Making sense of the README

merge-requests/23/head
Michał "rysiek" Woźniak 2022-12-16 12:58:34 +00:00
parent b28c555bc3
commit 24cbec237c
2 changed files with 37 additions and 99 deletions

README.md

A browser-based decentralized content delivery network, implemented as a JavaScript ServiceWorker.
Ideally, users should not need to install any special software nor change any settings to continue being able to access an overloaded LibResilient-enabled site as soon as they are able to access it *once*.
**Project website: https://resilient.is
Documentation: https://resilient.is/docs**
## Current status
LibResilient is currently considered *beta*: the code works, and the API is mostly stable, but it has not yet been deployed in production on a reasonably high-traffic site and would benefit from real-world testing. During development it has been tested on Firefox, Chromium and Chrome on desktop, as well as Firefox for mobile on Android, but it should work in any browser implementing the Service Worker API.
Feel free to test it, but be aware that it might not work as expected. If you'd like to get in touch, please create an [issue](https://gitlab.com/rysiekpl/libresilient/-/issues/new).
## Rationale
While a number of content delivery technologies exist, these typically require enormous centralized services. This creates opportunities for gate-keeping, and [causes any disruption at these centralized providers to become a major problem for thousands of websites](https://blog.cloudflare.com/cloudflare-outage-on-july-17-2020/).
On the other hand, visitors have at their disposal many tools that allow them to work around potential localized problems with the availability of certain websites: Tor, proxies, VPNs, and so on. These tools, however, require visitors to install and configure them. While useful for a few dedicated users, it is unreasonable to expect a large part of the Internet-using public to switch to them just to access certain websites.
LibResilient explores the possibility of solving this conundrum in a way that would not require website visitors to install any special software or change any settings; the only things that are needed are a modern Web browser and the ability to visit a website once, so that the JavaScript ServiceWorker kicks in.
## Architecture
A more complete overview of the architecture and technicalities of LibResilient is available [here](./docs/ARCHITECTURE.md).
## Draft API
The plan is to have an API to enable the use of different strategies for getting content. There are two basic functions a plugin needs to perform:
- **resolution**
*where* a given piece of content (image, stylesheet, script, HTML file, etc.) is to be found
- **delivery**
*how* to get it
These need to be closely integrated. For example, if using Gun and IPFS, resolution is performed using Gun, and delivery is performed using IPFS. However, Gun needs to resolve content to something that is usable with IPFS. If, alternatively, we're also using Gun to resolve content available on BitTorrent, that will have to be a separate namespace in the Gun graph, since it will have to resolve to magnet links.
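To make the namespace point concrete, here is an illustrative sketch of how such a resolution graph could be laid out; the keys, structure, and addresses below are hypothetical placeholders, not a real LibResilient schema:

```javascript
// Illustrative only: separate namespaces in a resolution graph, so
// IPFS-addressed and BitTorrent-addressed content do not collide.
// All keys and values are hypothetical placeholders.
const resolutionGraph = {
  'example.com': {
    'ipfs':    { '/index.html': 'Qm-hypothetical-ipfs-address' },
    'torrent': { '/index.html': 'magnet:?xt=urn:btih:hypothetical' }
  }
};
```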
Therefore, it doesn't seem to make sense to separate resolution and delivery. Thus, a LibResilient plugin would need to implement the whole pipeline, and work by receiving a URL and returning a Promise that resolves to a valid Response object containing the content.
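As a rough illustration, a minimal plugin under this model could look like the sketch below; the object shape and field names are assumptions made for illustration, not the final API:

```javascript
// A minimal sketch of the pipeline described above: a plugin takes a
// URL and returns a Promise resolving to a Response. The { name,
// fetch } shape is an assumption, not LibResilient's final API.
const httpsPlugin = {
  name: 'plain-https',
  fetch: async (url) => {
    const response = await fetch(url);
    if (!response.ok) {
      // Treat HTTP errors as failures, so another plugin can take over.
      throw new Error(`HTTP ${response.status} for ${url}`);
    }
    return response;
  }
};
```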
It should be possible to chain the plugins (try the first one, in case of error try the next, and so on), or run them in parallel (fire requests using all available plugins and return the first complete successful response). Running in parallel might offer a better user experience, but will also be more resource-intensive.
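Assuming the hypothetical plugin shape sketched above, the two dispatch modes could look roughly like this:

```javascript
// Chained: try each plugin in order, moving on when one fails.
async function fetchSequentially(plugins, url) {
  let lastError;
  for (const plugin of plugins) {
    try {
      return await plugin.fetch(url);
    } catch (e) {
      lastError = e;
    }
  }
  throw lastError;
}

// Parallel: fire all plugins at once and take the first success;
// faster for the user, but every plugin does the work every time.
function fetchInParallel(plugins, url) {
  return Promise.any(plugins.map((plugin) => plugin.fetch(url)));
}
```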
An additional part of the API is going to deal with reporting the status of the plugins, their versions, and how a given piece of content was fetched (using which plugin). This will require modifying actual content from the ServiceWorker to pass that data to the DOM.
## Review of possible resolution/delivery methods
- **[Gun](https://gun.eco/)**
Better suited for resolution than for delivery, although it could handle both. A fairly new, actively developed project. No global network of public peers is currently available. Content is cryptographically signed.
- **[IPNS](https://docs.ipfs.io/guides/concepts/ipns/)**
Only suitable for resolution. Experimental, and not yet fully functional in the browser. Fits IPFS like a glove.
- **[DNSLink](https://docs.ipfs.io/guides/concepts/dnslink/)**
Only suitable for resolution. Deployed, stable, and well-documented. Fits IPFS like a glove. The downside is that it requires publishing new DNS records every time new content is published, which might make it difficult for website admins to implement.
- **[IPFS](https://ipfs.io/)**
Only suitable for delivery, since it is content-addressed. Resolution of a content URI to an IPFS address needs to be handled by some other technology (like Gun or IPNS, or using [gateways](https://ipfs.github.io/public-gateway-checker/)). Deployed and well-documented, with a large community of developers. Redeploying a new content package with certain files unchanged does not change the addresses of the unchanged files, meaning that small changes in content do not lead to the whole content tree needing to be re-seeded.
- **[WebTorrent](https://github.com/webtorrent/webtorrent)**
Only suitable for content delivery. It seems possible to fetch a particular file from a given torrent, so as not to have to download a torrent of the whole website just to display a single page with some CSS and JS. Requires a resolver to point to the newest torrent since torrents are immutable. Even small changes (for example, only a few files changed in the whole website tree) require creating a new torrent and re-seeding, which is obviously less than ideal.
- **Plain files via HTTPS**
This delivery method is obvious if we're talking simply about the originating site serving the files, but it can also mean non-standard strategies like pushing static HTML+CSS+JS to CloudFront or Wasabi, and having a minimal resolver kick in if the originating site is unavailable, fetching content seamlessly from alternative locations (effectively implementing domain fronting and collateral freedom in the browser; see the sketch after this list). However, this will require some thought about signing content deployed to third-party locations: perhaps the resolver (like Gun) could be responsible for keeping SHA sums of known-good content, or perhaps content should simply be addressed by its hashes, effectively imitating IPFS.
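Here is a rough sketch of that last strategy, assuming a hypothetical list of mirror endpoints and a known-good SHA-256 hash of the content; none of these names come from the actual codebase:

```javascript
// Fetch a resource from a list of alternative HTTPS endpoints and
// verify it against a known-good SHA-256 hash (hex-encoded).
// Endpoint URLs and the expected hash are hypothetical.
async function fetchFromAlternatives(path, endpoints, expectedSha256) {
  for (const endpoint of endpoints) {
    try {
      const response = await fetch(endpoint + path);
      if (!response.ok) continue;
      // Hash a clone so the original body stays usable.
      const body = await response.clone().arrayBuffer();
      const digest = await crypto.subtle.digest('SHA-256', body);
      const hex = Array.from(new Uint8Array(digest))
        .map((b) => b.toString(16).padStart(2, '0'))
        .join('');
      if (hex === expectedSha256) return response;
    } catch (e) {
      // This endpoint failed; try the next one.
    }
  }
  throw new Error('No alternative endpoint returned valid content');
}
```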
## Limitations
There are certain limitations to what can be done with LibResilient:
### Service worker cannot be updated if origin is down
The ServiceWorker script apparently cannot be delivered using any of the transport plugins, [since](https://gist.github.com/Rich-Harris/fd6c3c73e6e707e312d7c5d7d0f3b2f9#the-new-service-worker-isnt-fetched-by-the-old-one):
> when you call `navigator.serviceWorker.register('service-worker.js')` the request for service-worker.js isn't intercepted by any service worker's fetch event handler.

So, the ServiceWorker script will not be updateable via LibResilient if the origin site is down, unless we find a way to hack around it with caches etc.
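For reference, this is what standard registration looks like; as the quote above notes, this particular request always goes straight to the origin:

```javascript
// Standard Service Worker registration. This request bypasses any
// installed fetch handler, so it cannot be served by a plugin.
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/service-worker.js');
}
```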
### JS implementations of decentralized protocols are still bootstrapped using servers
Gun and IPFS (and probably other potential LibResilient strategies) still use bootstrapping servers (STUN/TURN, and other kinds of public nodes), so technically it would also be possible for all of these to be overwhelmed by traffic, rendering LibResilient ineffective. This is a limitation of browsers and is related to IPv4 and NATs.
One way to deal with this is to have a large list of such public nodes and send only 2-3 of them each time LibResilient calls home (including via already-working decentralized means), so that the traffic is spread more evenly.
Plus, the ever-increasing adoption of IPv6 will also partially fix this.
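A sketch of that node-spreading idea, assuming a hypothetical list of known public nodes; the function and its name are illustrative, not part of LibResilient:

```javascript
// Pick a small random subset of known public bootstrap nodes, so that
// traffic is spread evenly across them (Fisher-Yates shuffle).
function pickBootstrapNodes(allNodes, count = 3) {
  const nodes = [...allNodes];
  for (let i = nodes.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [nodes[i], nodes[j]] = [nodes[j], nodes[i]];
  }
  return nodes.slice(0, count);
}
```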
You can read a more in-depth overview of LibResilient [here](https://resilient.is/docs/ARCHITECTURE/). And [here](https://resilient.is/docs/PHILOSOPHY/) is a document describing the philosophy influencing project goals and relevant technical decisions.
## Related developments

docs/ARCHITECTURE.md

# Architecture
A [Service Worker](https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API) is used as a way to persist the library after the initial visit to a website that deployed it.
After the Service Worker is downloaded and activated, it handles all `fetch()` events by running plugins in the configured order. These plugins can attempt fetching the resource directly from the website (the [`fetch` plugin](../plugins/fetch/)), or from alternative endpoints (the [`alt-fetch`](../plugins/alt-fetch/) plugin), or using alternative transports (for example, the [`dnslink-ipfs`](../plugins/dnslink-ipfs/) plugin); they can also cache the retrieved content for later (the [`cache`](../plugins/cache/) plugin) or verify that content (like the [`basic-integrity`](../plugins/basic-integrity/) plugin).
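A minimal, self-contained sketch of that flow, with a stand-in plugin list; the `{ name, fetch }` shape is an assumption used for illustration, not the actual plugin interface:

```javascript
// Stand-in plugin list; a real deployment would configure real plugins.
const plugins = [
  { name: 'fetch', fetch: (url) => fetch(url) }
];

// On each fetch event, try the configured plugins in order until one
// of them produces a Response.
self.addEventListener('fetch', (event) => {
  event.respondWith((async () => {
    for (const plugin of plugins) {
      try {
        return await plugin.fetch(event.request.url);
      } catch (e) {
        // This plugin failed; fall through to the next one.
      }
    }
    return new Response('All plugins failed', { status: 502 });
  })());
});
```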
## Plugins
You can find the list of available plugins along with documentation on them [here](../plugins/). You might also want to check out the [Quickstart Guide](./QUICKSTART.md) for a walk-through explanation of how plugin configuration and composition works.
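To give a flavour of composition, a hypothetical configuration could list plugins in the order they should be tried; the exact schema is documented with the plugins, so treat the names and fields below as placeholders:

```javascript
// Hypothetical configuration: try a regular fetch first, fall back to
// alternative endpoints, and cache whatever succeeds. Field names and
// the endpoint URL are placeholders, not the documented schema.
const config = {
  plugins: [
    { name: 'fetch' },
    { name: 'alt-fetch', endpoints: ['https://mirror.example.net/'] },
    { name: 'cache' }
  ]
};
```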
There are three kinds of plugins:
- **Transport plugins**
Invalidation heuristic is rather naïve, and boils down to checking if either of…
This is far from ideal and will need improvements in the long term. The difficulty is that different transport plugins can provide different ways of determining the "*freshness*" of fetched content -- HTTPS-based requests offer `ETag`, `Date`, `Last-Modified`, and other headers that can help with that, whereas IPFS can't really offer much apart from the address, which is itself a hash of the content, so at least we know the content is *different* (but is it *fresher*?).
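As an illustration of how uneven that metadata is, a naïve header-based comparison might look like this; a sketch only, since header availability depends entirely on the transport:

```javascript
// Naïve freshness check using the Last-Modified header, when present.
// Transports like IPFS may expose no such metadata at all, in which
// case the comparison simply cannot be made.
function seemsFresher(candidate, current) {
  const a = Date.parse(candidate.headers.get('Last-Modified') || '');
  const b = Date.parse(current.headers.get('Last-Modified') || '');
  if (Number.isNaN(a) || Number.isNaN(b)) return null; // unknown
  return a > b;
}
```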
### Content versioning
Content versioning has not been implemented in any plugin yet, but might be necessary at some point. Some delivery mechanisms (IPFS, BitTorrent) might be slow to pick up newly published content, and while information about this might be available, it might be faster to fetch and display older content that has already propagated across multiple peers or network nodes, with a message informing the reader that new content is available and that they might want to retry fetching it.
An important consideration related to content versioning is that it needs to be consistent across a full set of published pieces of content.
For example, consider a simple site that consists of an `index.html`, `style.css`, and `script.js`. Non-trivial changes in `index.html` will render older versions of `style.css` and `script.js` broken. A particular version of the whole published site needs to be fetched, otherwise things will not work as expected.
This will probably need to be fleshed out later on, but the initial API needs to be designed in a way where content versioning can be introduced without breaking backwards compatibility with plugins.
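One hypothetical way to express that consistency requirement is a per-version manifest pinning the whole set of files together; nothing like this exists in any plugin yet:

```javascript
// Hypothetical site manifest: one version identifier covers the whole
// set of files, so index.html, style.css, and script.js are always
// fetched from the same published version. Hashes are placeholders.
const siteManifest = {
  version: '2022-12-16',
  files: {
    'index.html': 'sha256-PLACEHOLDER',
    'style.css': 'sha256-PLACEHOLDER',
    'script.js': 'sha256-PLACEHOLDER'
  }
};
```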
## Status information
Status information should be available to users, informing them that the content is being retrieved using non-standard means that might take longer.
LibResilient information is kept per-request in the Service Worker, meaning it is transient and does not survive Service Worker restarts, which might happen multiple times over the lifetime of an open tab. The Service Worker communicates with the browser window context via [`Client.postMessage()`](https://developer.mozilla.org/en-US/docs/Web/API/Client/postMessage), addressed using the relevant [`Client ID`](https://developer.mozilla.org/en-US/docs/Web/API/Client/id) retrieved from the fetch event object. This is also how information on Service Worker commit SHAs and available plugins is made available to the browser window context.
The data provided (per each requested URL handled by the Service Worker) is:
- `clientId` – the [Client ID](https://developer.mozilla.org/en-US/docs/Web/API/FetchEvent/clientId) for the request (that is, the Client ID of this browser window)
- `url` – the URL of the request
- `serviceWorker` – the commit SHA of the Service Worker that handled the request
- `fetchError` – `null` if the request completed successfully via regular HTTPS; otherwise the error message
- `method` – the method by which the request was completed: "`fetch`" is regular HTTPS `fetch()`, `gun-ipfs` means Gun and IPFS were used, etc.
- `state` – the state of the request (`running`, `error`, `success`)
The code in the browser window context is responsible for keeping a more permanent record of the URLs requested, the methods used, and the status of each, if needed.
When the browser window context wants to message the Service Worker, it uses the [`Worker.postMessage()`](https://developer.mozilla.org/en-US/docs/Web/API/Worker/postMessage) call, with the `clientId` field set to the relevant Client ID if a response is expected. The Service Worker then responds using `Client.postMessage()`, taking the `Client ID` from that `clientId` field.
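A sketch of what the window-side listener could look like, assuming the per-request fields listed above arrive as the message's `data` payload:

```javascript
// Window context: listen for per-request status updates posted by the
// Service Worker. Assumes the fields listed above form event.data.
navigator.serviceWorker.addEventListener('message', (event) => {
  const { url, method, state, fetchError } = event.data;
  console.log(`${url}: ${state} via ${method}`, fetchError || '');
});
```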
This section is a work in progress.