
Architecture

A Service Worker is used as a way to persist the library after the initial visit to a website that deployed it.

After the Service Worker is downloaded and activated, it handles all fetch() events by running plugins in the configured order. These plugins can attempt fetching the resource directly from the website (the fetch plugin), from alternative endpoints (the alt-fetch plugin), or using alternative transports (for example, the dnslink-ipfs plugin); they can also cache the retrieved content for later use (the cache plugin) or verify that content.

Plugins

You can find the list of available plugins along with documentation on them here. You might also want to check out the Quickstart Guide for a walk-through explanation of how plugin configuration and composition works.

There are three kinds of plugins:

  • Transport plugins
    Plugins that retrieve remote content by any means, for example by using regular HTTPS fetch(), or by going through IPFS. They should also offer website admins a way to publish content (if relevant credentials or encryption keys are provided, depending on the method).
    Methods these plugins implement:

    • fetch - fetch content from an external source (e.g., from IPFS)
    • publish - publish the content to the external source (e.g., to IPFS)
  • Stashing plugins
    Plugins that stash content locally (for example, in the browser cache). This is useful when no transport plugin works, or before remote content is received.
    Methods these plugins implement:

    • fetch - fetch the locally stored content (e.g., from cache)
    • stash - stash the content locally (e.g., in cache)
    • unstash - clear the content from the local store (e.g., clear the cache)
  • Composing plugins and Wrapping plugins
    Plugins that compose multiple other plugins (for example, by running them simultaneously to retrieve content from whichever succeeds first); or that wrap other plugins, applying some action to the results returned by the wrapped plugin (for instance, checking known resource integrity hashes on returned content).
    Methods these plugins implement depend on which plugins they compose. Additionally, plugins being composed are configured via the uses key, which provides their configuration the same way the plugins key of LibResilientConfig (which is configurable via config.json) provides it for top-level plugins.

Every plugin needs to be implemented as a constructor function that is added to the LibResilientPluginConstructors Map() object for later instantiation.

The constructor function should return a structure as follows (fields depending on the plugin type):

{
    name: 'plugin-name',
    description: 'Plugin description. Just a few words, ideally.',
    version: 'any relevant plugin version information',
    fetch: functionImplementingFetch,
    publish|stash|unstash: functionsImplementingRelevantFunctionality,
    uses: []
}
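As an illustration, a minimal (hypothetical) transport plugin might be registered like this; the constructor signature shown here (receiving a context object and a configuration object) is an assumption made for the sketch, not the exact upstream API:

```javascript
// Sketch only: registering a plugin constructor for later instantiation.
// The (LR, config) constructor signature is an assumption, not the
// verbatim LibResilient API.
const LibResilientPluginConstructors = new Map();

LibResilientPluginConstructors.set('example-fetch', (LR, config) => {
  return {
    name: 'example-fetch',
    description: 'Hypothetical transport plugin using plain fetch().',
    version: '0.0.1',
    // Transport plugins implement fetch() (and optionally publish()).
    fetch: (url) => fetch(url),
    uses: []
  };
});

// Later, the Service Worker can instantiate configured plugins by name:
const plugin = LibResilientPluginConstructors.get('example-fetch')(null, {});
console.log(plugin.name); // "example-fetch"
```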

Transport plugins

Transport plugins must add X-LibResilient-Method and X-LibResilient-ETag headers to the response they return, so that the user can be informed about new content after content has already been displayed using a stashing plugin.

  • X-LibResilient-Method:
    contains the name of the plugin used to fetch the content.

  • X-LibResilient-ETag:
    contains the ETag for the content; this can be an actual ETag header for HTTPS-based plugins, or some arbitrary string identifying a particular version of the resource (e.g., for IPFS-based plugins this can be the IPFS address, since that is based on content and different content results in a different IPFS address).
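The sketch below shows one way a transport plugin might attach these headers to a Response it is about to return; decorateResponse is a hypothetical helper name, not part of the LibResilient API:

```javascript
// Sketch: copy the original headers, add the X-LibResilient-* headers,
// and return a new Response carrying the same body and status.
function decorateResponse(response, pluginName, etag) {
  const headers = new Headers(response.headers);
  headers.set('X-LibResilient-Method', pluginName);
  headers.set('X-LibResilient-ETag', etag);
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers
  });
}

const original = new Response('<html></html>', { status: 200 });
const decorated = decorateResponse(original, 'fetch', '"abc123"');
console.log(decorated.headers.get('X-LibResilient-Method')); // "fetch"
```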

Stashing plugins

Stashing plugins must stash the request along with the X-LibResilient-Method and X-LibResilient-ETag headers.
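As an illustration of that contract (not of the actual cache plugin, which uses the browser's Cache API), an in-memory stash would need to preserve those headers so later freshness checks can compare them:

```javascript
// Illustrative in-memory stash: the stored entry keeps the
// X-LibResilient-* headers alongside the body.
const stash = new Map();

function stashResponse(url, response) {
  stash.set(url, {
    method: response.headers.get('X-LibResilient-Method'),
    etag: response.headers.get('X-LibResilient-ETag'),
    body: response.body
  });
}

const resp = new Response('data', {
  headers: {
    'X-LibResilient-Method': 'alt-fetch',
    'X-LibResilient-ETag': '"v1"'
  }
});
stashResponse('https://example.com/index.html', resp);
console.log(stash.get('https://example.com/index.html').etag); // '"v1"'
```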

Composing plugins

Composing plugins compose other plugins, for example to run them simultaneously and retrieve content from the first one that succeeds, or to run them in a particular order. A composing plugin needs to set the uses key in the object returned by its constructor. The key should contain mappings from plugin names to configuration:

uses: [{
          name: "composed-plugin-1",
          configKey1: "whatever-data-here"
      },{
          name: "composed-plugin-2",
          configKey2: "whatever-data-here"
      },
      {...}
]

If these mappings are to be configured via the global configuration file (which is most often the case), the uses key should instead point to config.uses:

uses: config.uses

Wrapping plugins

Wrapping plugins wrap other plugins in order to perform some actions on request data sent to them, or on response data received from them. A wrapping plugin needs to set the uses key in the object returned by its constructor. The key should contain a mapping from the wrapped plugin's name to its configuration:

uses: [{
          name: "composed-plugin-1",
          configKey1: "whatever-data-here"
      }]

If this mapping is to be configured via the global configuration file (which is most often the case), the uses key should instead point to config.uses:

uses: config.uses

Fetching a resource via LibResilient

Whenever a resource is being fetched on a LibResilient-enabled site, the service-worker.js script dispatches plugins in the set order. This order is configured via the plugins key of the LibResilientConfig variable, usually set via the config.json config file.

A minimal default configuration is hard-coded in case no site-specific configuration is provided. This default configuration runs these plugins:

  1. fetch, to use the upstream site directly if it is available,
  2. cache, to display the site from the cache immediately if the regular fetch fails (provided content was already cached during a previous visit).
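Expressed as a config.json document, that hard-coded default corresponds roughly to the following (illustrative; the exact built-in defaults may differ):

```json
{
    "plugins": [
        { "name": "fetch" },
        { "name": "cache" }
    ]
}
```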

A more robust configuration could look like this:

{
    "plugins": [{
            "name": "fetch"
        },{
            "name": "cache"
        },{
            "name": "alt-fetch",
            "endpoints": [
                "https://fallback-endpoint.example.com"
            ]
        }]
}

For each resource, such a config would:

  1. Perform a regular fetch() to the main site domain first; if that succeeds, content is added to cache and displayed to the user.
  2. If the fetch() failed, the cache would be checked.
    1. If the resource was cached, it would be displayed; at the same time, a background request for that resource would be made to fallback-endpoint.example.com instead of the (failing) main domain; if that succeeded, the new version of the resource would be cached.
    2. If the resource was not cached, a request for that resource would be made to fallback-endpoint.example.com; if that succeeded, the resource would be displayed and cached.
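The sequential dispatch described above can be sketched as follows; this is a simplified model that omits background re-fetches and stashing:

```javascript
// Simplified sketch of sequential plugin dispatch: try each configured
// plugin in order until one returns a non-error response.
async function dispatch(plugins, url) {
  let lastError;
  for (const plugin of plugins) {
    try {
      const response = await plugin.fetch(url);
      if (response.status < 500) {
        return response; // pass to the browser
      }
      lastError = new Error(`${plugin.name}: HTTP ${response.status}`);
    } catch (e) {
      lastError = e; // plugin error: fall through to the next plugin
    }
  }
  throw lastError;
}

// Usage with stub plugins: the first fails, the second succeeds.
const plugins = [
  { name: 'fetch', fetch: async () => { throw new Error('offline'); } },
  { name: 'cache', fetch: async () => new Response('cached', { status: 200 }) }
];
dispatch(plugins, '/index.html').then((r) => console.log(r.status)); // 200
```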

Stashed versions invalidation

The invalidation heuristic is rather naïve, and boils down to checking whether either X-LibResilient-Method or X-LibResilient-ETag differs between the response from a transport plugin and whatever has already been stashed by a stashing plugin. If either differs, the transport plugin response is considered "fresher".
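That heuristic can be expressed as a small standalone function:

```javascript
// The naïve freshness check: the transport response counts as fresher
// if either X-LibResilient-* header differs from what was stashed.
function isFresher(transportHeaders, stashedHeaders) {
  return ['X-LibResilient-Method', 'X-LibResilient-ETag'].some(
    (h) => transportHeaders.get(h) !== stashedHeaders.get(h)
  );
}

const stashed = new Headers({ 'X-LibResilient-Method': 'fetch', 'X-LibResilient-ETag': '"v1"' });
const fetched = new Headers({ 'X-LibResilient-Method': 'fetch', 'X-LibResilient-ETag': '"v2"' });
console.log(isFresher(fetched, stashed)); // true
```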

This is far from ideal and will need improvements in the long-term. The difficulty is that different transport plugins can provide different ways of determining the "freshness" of fetched content -- HTTPS-based requests offer ETag, Date, Last-Modified, and other headers that can help with that; whereas IPFS can't really offer much apart from the address which itself is a hash of the content, so at least we know the content is different (but is it fresher though?).

Content versioning

Content versioning has not been implemented in any plugin yet, but might be necessary at some point. Some delivery mechanisms (IPFS, BitTorrent) might be slow to pick up newly published content, and while information about this might be available, it might be faster to fetch and display older content that has already propagated across multiple peers or network nodes, with a message informing the reader that new content is available and that they might want to retry fetching it.

An important consideration related to content versioning is that it needs to be consistent across a full set of published pieces of content.

For example, consider a simple site that consists of an index.html, style.css, and script.js. Non-trivial changes in index.html will render older versions of style.css and script.js broken. A particular version of the whole published site needs to be fetched, otherwise things will not work as expected.

This will probably need to be fleshed out later on, but the initial API needs to be designed in a way where content versioning can be introduced without breaking backwards compatibility with plugins.

Status information

Status information should be available to users, informing them that the content is being retrieved using non-standard means that might take longer.

LibResilient information is kept per-request in the Service Worker, meaning it is transient and does not survive Service Worker restarts, which might happen multiple times over the lifetime of an open tab. The Service Worker can communicate with the browser window by using Client.postMessage() to post messages to the browser window context, using the relevant Client ID retrieved from the fetch event object. This is also how information on Service Worker commit SHAs and available plugins is made available to the browser window context.

The data provided (per each requested URL handled by the Service Worker) is:

  • clientId – the Client ID for the request (that is, the Client ID of this browser window)
  • url – the URL of the request
  • serviceWorker – the commit SHA of the Service Worker that handled the request
  • lastError – the last error message emitted from any plugin
  • method – the name of the plugin by which the request was completed
  • state – the state of the request (running, failed, success)

The code in the browser window context is responsible for keeping a more permanent record of the URLs requested, the methods used, and the status of each, if needed.

When the browser window context wants to message the Service Worker, it uses the Worker.postMessage() call, with the clientId field set to the relevant Client ID if a response is expected. The Service Worker then responds using Client.postMessage(), taking the Client ID from that clientId field.
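On the browser window side, keeping track of these per-request status messages might look like the following sketch; the field names mirror the list above, while handleStatusMessage and requestStatus are hypothetical names:

```javascript
// Sketch: a page-side record of per-URL request status, fed by
// messages posted from the Service Worker.
const requestStatus = new Map();

function handleStatusMessage(data) {
  if (data && data.url) {
    requestStatus.set(data.url, data);
  }
}

// In a browser window context this would be wired up as:
// navigator.serviceWorker.addEventListener('message', (e) => handleStatusMessage(e.data));

handleStatusMessage({
  clientId: 'abc',
  url: '/index.html',
  serviceWorker: 'deadbeef',
  lastError: null,
  method: 'cache',
  state: 'success'
});
console.log(requestStatus.get('/index.html').state); // "success"
```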

MIME type guessing

Some plugins (for example, those based on IPFS, like dnslink-ipfs), receive content without a Content-Type header, because the transport simply does not support it. That's problematic, as the Content-Type header is used by the browser to decide what can be done with a file -- display it if it's an image, run it if it's a JavaScript file, and so on.

For the purpose of guessing the MIME type of a given piece of content (in order to populate the Content-Type header), LibResilient's Service Worker implements the guessMimeType() function, available to any plugin.

By default the function attempts to guess the MIME type of content only by considering the file extension of the path used to retrieve it. There is a hard-coded list of extension-to-MIME-type mappings that should cover most file types relevant on the Web.
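A minimal sketch of such extension-based guessing is shown below; the mapping here is a small illustrative subset, not LibResilient's actual hard-coded list:

```javascript
// Sketch: map a path's file extension to a MIME type, falling back to
// application/octet-stream for unknown extensions.
const mimeByExtension = {
  html: 'text/html',
  css: 'text/css',
  js: 'application/javascript',
  json: 'application/json',
  png: 'image/png',
  svg: 'image/svg+xml'
};

function guessMimeTypeByExtension(path) {
  const ext = path.split('?')[0].split('.').pop().toLowerCase();
  return mimeByExtension[ext] || 'application/octet-stream';
}

console.log(guessMimeTypeByExtension('/assets/logo.png')); // "image/png"
```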

This might not be enough, however. So, the Service Worker can optionally load an external library that can establish a MIME type based on the actual content. Currently that is file-type, distributed along with LibResilient in the lib/ directory.

To enable content-based MIME type guessing, set the useMimeSniffingLibrary setting to true in config.json.

By default, content-based MIME guessing is disabled, because it is somewhat slower (content needs to be read, inspected, and compared against many signatures), and because it relies on an external library that needs to be distributed along with LibResilient, even though most plugins do not need it and extension-based guessing is often sufficient for those that do.

When enabled, content-based MIME guessing is attempted first for any given piece of content that requires it. If it fails, extension-based MIME guessing is then used.

How configuration file is processed

When loading the configuration file, config.json, LibResilient's service worker attempts to make sure that a broken config file is never used.

The file's syntax is checked before any attempt at loading it is performed. When loading the file, its contents are merged with built-in defaults, which provide a sane baseline. Once a config file is successfully loaded, it is also cached in a way that provides fall-back in case loading a newer version later on fails for whatever reason.

There are two levels of cache of the config.json file employed here: "regular" and "verified". When attempting to load the configuration:

  1. first the "regular" cache is used;
  2. if it is empty or the configuration file is invalid for whatever reason (fails the syntax check, or loading it errors out), the "verified" cache is used;
  3. if that in turn is empty or broken, a regular fetch() is attempted to the original website;
  4. and finally if that fails for whatever reason, the built-in defaults are used.
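The fall-back order above can be sketched as a chain of candidate sources, tried in order until one yields a valid configuration; the loader functions below are hypothetical stand-ins for the cache reads and the fetch():

```javascript
// Sketch: try each configuration source in order; an empty or invalid
// result (or a thrown error) falls through to the next source, ending
// with the built-in defaults.
async function loadConfig(sources, builtinDefaults, isValid) {
  for (const source of sources) {
    try {
      const config = await source();
      if (config && isValid(config)) {
        return config;
      }
    } catch (e) {
      // broken source: fall through to the next one
    }
  }
  return builtinDefaults;
}

// Usage: the "regular" cache is empty, the "verified" cache holds a
// known-good config, and the regular fetch() would fail anyway.
const sources = [
  async () => null,                               // 1. "regular" cache
  async () => ({ plugins: [{ name: 'fetch' }] }), // 2. "verified" cache
  async () => { throw new Error('offline'); }     // 3. fetch() from the site
];
loadConfig(sources, { plugins: [] }, (c) => Array.isArray(c.plugins))
  .then((c) => console.log(c.plugins[0].name)); // "fetch"
```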

Whenever a configuration file is successfully loaded and applied, it gets saved to the "verified" cache, so that it is available as a known-good fall-back in the future. After the config.json file is loaded and applied, if it was loaded from any of the caches it is checked for staleness. If it is stale (older than 24h), an attempt is made to retrieve a newer version through the currently configured plugins. If that succeeds and the retrieved config.json passes verification, it is cached in the "regular" cache, to be used the next time the Service Worker is initialized.

This verification involves checking the syntax of the file and if it contains all the necessary fields. If the file was retrieved using means other than regular fetch(), it is also checked in case it requires any plugins whose code has not been loaded in the currently deployed service worker. If it does, it is discarded — the Service Workers API specifies that code loaded by the service worker can only come from the original domain; if the config file was loaded using some other means, it might not be possible to load the necessary plugin code when initializing the service worker later.

Still-loading screen

Depending on the plugin configuration, some requests can take a long time -- even tens of seconds. This is especially true for plugins using non-standard transports, like IPFS.

When requests take that long, the user experience suffers. Visitors have no way to tell whether something went wrong and the request is just hanging, or whether they should simply wait a bit longer. Browsers will at some point time out the request and display a generic error screen.

To improve this user experience, the LibResilient service worker implements a "still-loading" screen for navigate requests. Navigate requests are requests that the browser understands as navigation between two different pages (in code, this means the Request object has its mode property set to navigate).

Navigate requests are meant to return a resource that is directly displayed to the visitor. So, the still-loading screen will not be returned in case of fetch() requests for some HTML parts to inject in the page, or requests for style sheets, images, scripts, etc., that are to be used in an already displayed page. But it will be displayed when a visitor navigates to a resource to display it in their browser window and the request is taking too long -- even if that resource is a style sheet, script, or image.

If a navigate request takes too long (longer than the stillLoadingTimeout configuration setting, which by default is set to 5000ms), and there is a stashing plugin configured and enabled, the service worker will return a hard-coded "Still loading..." HTML page, with a simple throbber and an attempts counter to indicate things are still happening in the background. It also contains a short explainer text and a link for the user to click if they think the request is taking too long.

The still-loading screen listens for messages from the service worker (sent using the Client.postMessage() API call). When the service worker indicates that content is ready, the page automatically reloads to display it. If instead the service worker indicates a final failure of the request, the text on the still-loading screen is modified to reflect that.

The still-loading screen is only displayed when all of these conditions are met:

  1. the stillLoadingTimeout is set to a number greater than zero;
  2. there is at least one stashing plugin (normally, cache) configured and enabled;
  3. the request in question is a navigation request;
  4. the request is taking longer than stillLoadingTimeout.
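These conditions can be expressed as a single predicate; hasStashingPlugin is a hypothetical flag standing in for however the service worker tracks configured stashing plugins:

```javascript
// Sketch: all four conditions for displaying the still-loading screen.
function shouldShowStillLoading(config, request, elapsedMs) {
  return config.stillLoadingTimeout > 0 &&      // 1. timeout enabled
         config.hasStashingPlugin &&            // 2. a stashing plugin exists
         request.mode === 'navigate' &&         // 3. navigation request
         elapsedMs > config.stillLoadingTimeout; // 4. taking too long
}

const config = { stillLoadingTimeout: 5000, hasStashingPlugin: true };
console.log(shouldShowStillLoading(config, { mode: 'navigate' }, 6000)); // true
console.log(shouldShowStillLoading(config, { mode: 'no-cors' }, 6000));  // false
```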

The reason why a stashing plugin needs to be configured and enabled is to avoid loops. Consider a scenario where a visitor is navigating to a page, and the request is taking very long. The still-loading screen is displayed (by way of the service worker returning the relevant HTML in response to the request). Eventually, the request completes in the background, but the response is discarded due to the lack of a stashing plugin.

In such a case reloading the page will cause a repeat: request, still-loading screen, request completes in the background (and the result is discarded). The visitor would be stuck in a loop. If a stashing plugin (like cache) is enabled, this loop can be expected not to emerge, since the second request would quickly return the cached response.

Error handling

LibResilient's error handling focuses on attempting to "do the right thing". In some cases this means passing an HTTP error response from a plugin directly to the browser to be displayed to the user; in other cases it means ignoring an HTTP error response from a plugin so that other plugins can attempt to retrieve a given resource.

In general:

  • A response with status code value of 499 or lower is passed immediately to the browser
    no other plugins are used to try to retrieve the content; if a stashing plugin is configured, the response might be cached locally.

  • A response with status code value of 500 or higher is treated as a plugin error
    if other plugins are configured, they will be used to try to retrieve the content; if a stashing plugin is configured it will not stash that response.

  • Any exception thrown in a plugin will be caught and treated as a plugin error
    if other plugins are configured, they will be used to try to retrieve the content; there is no response object, so there is nothing to stash.

  • If a plugin rejects for whatever reason, it is treated as a plugin error
    if other plugins are configured, they will be used to try to retrieve the content; there is no response object, so there is nothing to stash.

All plugin errors (5xx HTTP responses, thrown exceptions, rejections) are logged internally. This data is printed in the console (if loggedComponents config field contains service-worker), and sent to the client using Client.postMessage(), to simplify debugging.

If all plugins fail in the case of a navigate request, a Response object is created with a 404 Not Found HTTP status, containing a simple HTML error page (similar to the still-loading screen mentioned above) to be displayed to the user. If the request is not a navigate request, the rejected promise is returned directly.

Mapping plugin errors onto HTTP errors is not always going to be trivial. For example, an IPFS-based transport plugin could in some circumstances return a 404 Not Found HTTP error, but the any-of plugin necessarily has to ignore any HTTP errors it receives from the plugins it is configured to use, while waiting for one to potentially return the resource successfully. If all of the configured plugins fail, with different HTTP errors, which one should the any-of plugin itself return?

At the same time, returning HTTP errors makes sense, as it allows the browser and the user to properly interpret well-understood errors. So, the fetch plugin will return any 4xx HTTP error it receives, for example, and the service worker will in turn treat that as a successfully completed retrieval and return that to the browser to be displayed to the user.

Plugin authors should consider this carefully. If in doubt, it's probably better to throw an exception or reject the promise with a meaningful error message than to try to fit a potentially complex failure mode into the limited and rigid constraints of HTTP error codes.