dnslink-fetch: initial but pretty complete code and README (ref. #63)

merge-requests/17/head
Michał 'rysiek' Woźniak 2022-10-21 23:24:43 +00:00
parent 0dc88d5226
commit 64cf1f4b42
2 changed files with 279 additions and 0 deletions


@ -0,0 +1,57 @@
# Plugin: `dnslink-fetch`
- **status**: alpha
- **type**: [transport plugin](../../docs/ARCHITECTURE.md#transport-plugins)
This transport plugin uses standard [`fetch()`](https://developer.mozilla.org/en-US/docs/Web/API/fetch) to retrieve remote content from alternative endpoints — that is, HTTPS endpoints that are not in the original domain. This enables retrieving content even if the website on the original domain is down for whatever reason. The list of alternative endpoints is itself retrieved using [DNSLink](https://dnslink.org/) for the original domain.
Compare: [`alt-fetch`](../alt-fetch/).
As per LibResilient architecture, this plugin adds `X-LibResilient-Method` and `X-LibResilient-ETag` headers to the returned response.
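A minimal sketch of how the headers for the returned response might be assembled, loosely following the plugin code. The helper name `buildLibResilientHeaders` is hypothetical, and a `Map` stands in for a real `Headers` object (both provide `forEach` and `get`). Omitting `X-LibResilient-ETag` when neither `ETag` nor `Last-Modified` is available is an assumption of this sketch.

```javascript
// Hypothetical helper: builds the plain `headers` object for a new Response.
function buildLibResilientHeaders(originalHeaders, pluginName) {
  const headers = {}
  // copy all original headers; Headers and Map both call back with (value, key)
  originalHeaders.forEach((value, name) => {
    headers[name] = value
  })
  // mark which plugin produced the response
  headers['X-LibResilient-Method'] = pluginName
  // prefer ETag, fall back to Last-Modified (ETag is often hidden by CORS rules)
  const etag = originalHeaders.get('etag') ?? originalHeaders.get('last-modified')
  // assumption: leave the header out entirely when there is nothing to put in it
  if (etag != null) {
    headers['X-LibResilient-ETag'] = etag
  }
  return headers
}

// stand-in for response.headers of a response without an ETag
const original = new Map([
  ['content-type', 'text/html'],
  ['last-modified', 'Fri, 21 Oct 2022 23:24:43 GMT']
])
const headers = buildLibResilientHeaders(original, 'dnslink-fetch')
console.log(headers)
```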
## Configuration
The `dnslink-fetch` plugin supports the following configuration options:
- `concurrency` (default: 3)
  Number of alternative endpoints to attempt fetching from simultaneously.
  If the number of available alternative endpoints is *lower* than or equal to `concurrency`, all are used for each request. If it is *higher*, only `concurrency` of them, chosen at random, are used for any given request.
- `dohProvider` (default: "`https://dns.google/resolve`")
  DNS-over-HTTPS JSON API provider/endpoint to query when resolving the DNSLink. Google's DoH provider is used by default. Other options:
  - "`https://cloudflare-dns.com/dns-query`"
    Cloudflare's DoH JSON API endpoint
  - "`https://mozilla.cloudflare-dns.com/dns-query`"
    Mozilla's DoH JSON API endpoint, operated in co-operation with Cloudflare.
- `ecsMasked` (default: `true`)
  Whether the [EDNS Client Subnet](https://en.wikipedia.org/wiki/EDNS_Client_Subnet) should be masked from authoritative DNS servers for privacy. See also: `edns_client_subnet` [parameter of the DoH JSON API](https://developers.google.com/speed/public-dns/docs/doh/json#supported_parameters).
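As a rough illustration of how these options fit together, the sketch below merges user-supplied settings over the defaults (the plugin code does this with an object spread) and builds the DoH query URL for a domain. The helper name `buildDohQuery` and the example `init` values are hypothetical; the `_dnslink.` label, `type=TXT` and `edns_client_subnet=0.0.0.0/0` are taken from the plugin code.

```javascript
// defaults as documented above
const defaultConfig = {
  concurrency: 3,
  dohProvider: 'https://dns.google/resolve',
  ecsMasked: true
}

// hypothetical settings passed to the plugin; only `concurrency` is overridden
const init = { concurrency: 2 }

// later keys win, so anything in `init` overrides the defaults
const config = { ...defaultConfig, ...init }

// Hypothetical helper: builds the DoH JSON API query for a domain's DNSLink.
function buildDohQuery(config, domain) {
  let query = `${config.dohProvider}?name=_dnslink.${domain}&type=TXT`
  // ask the DoH provider not to disclose the client subnet upstream
  if (config.ecsMasked) {
    query += '&edns_client_subnet=0.0.0.0/0'
  }
  return query
}

const query = buildDohQuery(config, 'example.com')
console.log(query)
```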
## Operation
When fetching a URL, `dnslink-fetch` removes the scheme and domain component. Then, for each alternative endpoint used for this particular request (up to `concurrency` endpoints, as described above), it concatenates the endpoint with the remaining part of the URL. Finally, it performs a [`fetch()`](https://developer.mozilla.org/en-US/docs/Web/API/fetch) request for every URL constructed this way.
Let's say the plugin is deployed for website `https://example.com`, with `concurrency` set to `2` and these are the alternative endpoints specified in DNS according to the DNSLink specification (so, in [multiaddr form](https://github.com/multiformats/multiaddr#encapsulation-based-on-context)):
- `/https/example.org`
- `/https/example.net/alt-example`
- `/https/eu.example.cloud`
- `/https/us.example.cloud`
***Notice**: `dnslink-fetch` currently only supports a rudimentary, naïve form of [multiaddr](https://multiformats.io/multiaddr/) addresses, which is `/https/domain_name[/optional/path]`; full multiaddr support might be implemented at a later date.*
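To make the supported address form concrete, here is a small sketch that turns DNSLink TXT record values into endpoint URLs, using the same regular expression as the plugin code. The function name `txtRecordsToEndpoints` and the sample records (including the `/ipfs/` entry and the SPF record) are made up for illustration.

```javascript
// matches "dnslink=/http/..." and "dnslink=/https/..." only
const dnslinkRe = /^dnslink=\/(https?)\/(.+)/

// Hypothetical helper: keeps only http(s) DNSLink records and
// rewrites them from /scheme/rest form to scheme://rest form.
function txtRecordsToEndpoints(records) {
  return records
    .filter(r => dnslinkRe.test(r))
    .map(r => r.replace(dnslinkRe, '$1://$2'))
}

const endpoints = txtRecordsToEndpoints([
  'dnslink=/https/example.org',
  'dnslink=/https/example.net/alt-example',
  'dnslink=/ipfs/QmHypotheticalHash', // not http(s), so it is skipped
  'v=spf1 -all'                       // unrelated TXT record, also skipped
])
console.log(endpoints)
```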
A visitor, who has visited the `https://example.com` website at least once before (and so, LibResilient is loaded and working), tries to access it. For whatever reason, the `https://example.com` site is down or otherwise inaccessible, and so the `dnslink-fetch` plugin kicks in.
The request for `https://example.com/index.html` is handled as follows:
1. scheme and domain are removed: `index.html`
2. two (based on the `concurrency` setting) random alternative endpoints are selected:
   - `/https/example.net/alt-example`
   - `/https/example.org`
3. endpoint multiaddrs are resolved to the URL of each endpoint:
   - `https://example.net/alt-example/`
   - `https://example.org/`
4. a `fetch()` request is issued simultaneously for each URL (so, the alternative endpoint concatenated with the path from the original request):
   - `https://example.net/alt-example/index.html`
   - `https://example.org/index.html`
5. the first successful response from either gets returned as the response for the whole plugin call.
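Steps 1 to 4 above can be sketched as follows. This is an illustration rather than the plugin code itself: the helper names `pickEndpoints` and `buildAlternativeUrls` are hypothetical, the endpoints are assumed to be already converted to URL form, and the `random` parameter exists only so that the selection can be made deterministic when testing.

```javascript
// Hypothetical helper: picks up to `concurrency` endpoints at random.
function pickEndpoints(endpoints, concurrency, random = Math.random) {
  // work on a copy so that the caller's array is left untouched
  const pool = [...endpoints]
  if (pool.length <= concurrency) return pool
  const picked = []
  while (picked.length < concurrency) {
    // splice() removes the chosen endpoint, so it cannot be picked twice
    picked.push(pool.splice(Math.floor(random() * pool.length), 1)[0])
  }
  return picked
}

// Hypothetical helper: builds the URLs that would be passed to fetch().
function buildAlternativeUrls(originalUrl, endpoints, concurrency, random = Math.random) {
  // step 1: drop the scheme and the domain, keep the path
  const parts = originalUrl.replace(/^https?:\/\//, '').split('/')
  parts.shift()
  const path = parts.join('/')
  // steps 2-4: select endpoints and append the path to each of them
  return pickEndpoints(endpoints, concurrency, random).map(e => e + '/' + path)
}

const endpoints = [
  'https://example.org',
  'https://example.net/alt-example',
  'https://eu.example.cloud',
  'https://us.example.cloud'
]
const urls = buildAlternativeUrls('https://example.com/index.html', endpoints, 2)
console.log(urls)
```

Each call yields two distinct URLs ending in `/index.html`; which two endpoints are used varies from call to call.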


@ -0,0 +1,222 @@
/* ========================================================================= *\
|* === HTTP(S) fetch() from alternative endpoints === *|
\* ========================================================================= */
/**
* this plugin does not implement any push method
*
* NOTICE: this plugin uses Promise.any()
* https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/any
* the polyfill is implemented in LibResilient's service-worker.js
*/
// no polluting of the global namespace please
(function(LRPC){
// this never changes
const pluginName = "dnslink-fetch"
LRPC.set(pluginName, (LR, init={})=>{
/*
* plugin config settings
*/
// sane defaults
let defaultConfig = {
// how many simultaneous connections to different endpoints do we want
//
// more concurrency means higher chance of a request succeeding
// but uses more bandwidth and other resources;
//
// 3 seems to be a reasonable default
concurrency: 3,
// DNS-over-HTTPS JSON API provider
// using Google's DoH provider; other options:
// 'https://cloudflare-dns.com/dns-query'
// 'https://mozilla.cloudflare-dns.com/dns-query'
dohProvider: 'https://dns.google/resolve',
// should the EDNS Client Subnet be masked from authoritative DNS servers for privacy?
// - https://en.wikipedia.org/wiki/EDNS_Client_Subnet
// - https://developers.google.com/speed/public-dns/docs/doh/json#supported_parameters
ecsMasked: true
}
// merge the defaults with settings from the init var
let config = {...defaultConfig, ...init}
/**
* retrieving the alternative endpoints list from dnslink
*
* returns an array of strings, each being a valid endpoint, in the form of
* scheme://example.org[/optional/path]
*/
let resolveEndpoints = async (domain) => {
// pretty self-explanatory:
// DoH provider, _dnslink label in the domain, TXT type, pretty please
var query = `${config.dohProvider}?name=_dnslink.${domain}&type=TXT`
// do we want to mask the EDNS Client Subnet?
//
// this protects user privacy somewhat by telling the DoH provider not to disclose
// the subnet from which the DNS request came to authoritative nameservers
if (config.ecsMasked) {
query += '&edns_client_subnet=0.0.0.0/0'
}
// make the query, get the response
var response = await fetch(
query, {
headers: {
'accept': 'application/json',
}
})
.then(r=>r.json())
// only Status == 0 is acceptable
// https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-6
if (response.Status != 0) {
throw new Error(`DNS request failure, status code: ${response.Status}`)
}
// we also do need the Answer section please
if (!('Answer' in response)) {
throw new Error(`DNS response did not contain an Answer section`)
}
// only get TXT records, and extract the data from them
response = response
.Answer
.filter(r => r.type == 16)
.map(r => r.data);
// did we get anything of value? anything at all?
if (response.length < 1) {
throw new Error(`Answer section of the DNS response did not contain any TXT records`)
}
// filter by 'dnslink="/https?/', morph into scheme://...
let re = /^dnslink=\/(https?)\/(.+)/
response = response
.filter(r => re.test(r))
.map(r => r.replace(re, "$1:\/\/$2"));
// do we have anything to work with?
if (response.length < 1) {
throw new Error(`No TXT record contained http or https endpoint definition`)
}
// in case we need some debugging
LR.log(pluginName, '+-- alternative endpoints from DNSLink:\n - ', response.join('\n - '))
// this should be what we're looking for - an array of URLs
return response
}
/**
* getting content using regular HTTP(S) fetch()
*/
let fetchContentFromAlternativeEndpoints = async (url, init={}) => {
// remove the https://original.domain/ bit to get the relative path
// TODO: this assumes that URLs we handle are always relative to the root
// TODO: of the original domain, this needs to be documented
url = url.replace(/https?:\/\//, '').split('/')
var domain = url.shift()
var path = url.join('/')
LR.log(pluginName, '+-- fetching:\n',
` - domain: ${domain}\n`,
` - path: ${path}\n`
)
// we really want to make fetch happen, Regina!
// TODO: this change should *probably* be handled on the Service Worker level
init.cache = 'reload'
// resolveEndpoints() returns a fresh array on every call, so it is safe to splice() it below
var sourceEndpoints = await resolveEndpoints(domain)
// if we have fewer than the configured concurrency or just as many, use all of them
if (sourceEndpoints.length <= config.concurrency) {
var useEndpoints = sourceEndpoints
// otherwise get `config.concurrency` endpoints at random
} else {
var useEndpoints = new Array()
while (useEndpoints.length < config.concurrency) {
useEndpoints.push(
sourceEndpoints
.splice(Math.floor(Math.random() * sourceEndpoints.length), 1)[0]
)
}
}
// add the rest of the path to each endpoint
useEndpoints.forEach((endpoint, index) => {
useEndpoints[index] = endpoint + '/' + path;
});
// debug log
LR.log(pluginName, `+-- fetching from alternative endpoints:\n - ${useEndpoints.join('\n - ')}`)
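// Promise.any() settles on the first *fulfilled* promise, and fetch() fulfills
// for any HTTP response, including 4xx and 5xx; so reject those per endpoint,
// letting a slower endpoint with a good response still win
return Promise.any(
useEndpoints.map(
u=>fetch(u, init)
.then((response) => {
// 4xx? 5xx? that's a paddlin'
if (response.status >= 400) {
// throw an Error so that this endpoint is skipped;
// if all endpoints fail, Promise.any() rejects and we fall back to other plugins
throw new Error('HTTP Error: ' + response.status + ' ' + response.statusText);
}
return response
})
))
.then((response) => {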
// all good, it seems
LR.log(pluginName, "fetched:", response.url);
// we need to create a new Response object
// with all the headers added explicitly,
// since response.headers is immutable
// note: the Response constructor has no `url` option, so none is set here
var responseInit = {
status: response.status,
statusText: response.statusText,
headers: {}
};
response.headers.forEach(function(val, header){
responseInit.headers[header] = val;
});
// add the X-LibResilient-* headers to the mix
responseInit.headers['X-LibResilient-Method'] = pluginName
// we will not have the ETag most of the time, due to CORS rules:
// https://developer.mozilla.org/en-US/docs/Glossary/CORS-safelisted_response_header
// so fall back to Last-Modified;
// far from perfect, but what are we going to do, eh?
var etag = response.headers.get('ETag') || response.headers.get('last-modified')
// only set the header if we have a value; a null would end up as the string "null"
if (etag !== null) {
responseInit.headers['X-LibResilient-ETag'] = etag
}
// return the new response, using the Blob from the original one
return response
.blob()
.then((blob) => {
return new Response(
blob,
responseInit
)
})
})
}
// return the plugin data structure
return {
name: pluginName,
description: 'HTTP(S) fetch() using alternative endpoints retrieved via DNSLink',
version: 'COMMIT_UNKNOWN',
fetch: fetchContentFromAlternativeEndpoints
}
})
// done with not polluting the global namespace
})(LibResilientPluginConstructors)