planetiler/ARCHITECTURE.md

141 wiersze
8.5 KiB
Markdown

# Planetiler Architecture
![Architecture Diagram](diagrams/architecture.png)
Planetiler builds a map in 3 phases:
1. [Process Input Files](#1-process-input-files) according to the [Profile](#profiles) and write vector tile features to
intermediate files on disk
2. [Sort Features](#2-sort-features) by tile ID
3. [Emit Vector Tiles](#3-emit-vector-tiles) by iterating through sorted features to group by tile ID, encoding, and
writing to the output tile archive
User-defined [profiles](#profiles) customize the behavior of each part of this pipeline.
## 1) Process Input Files
First, Planetiler
reads [SourceFeatures](planetiler-core/src/main/java/com/onthegomap/planetiler/reader/SourceFeature.java)
from each input source:
- For "simple
sources" [NaturalEarthReader](planetiler-core/src/main/java/com/onthegomap/planetiler/reader/NaturalEarthReader.java)
or [ShapefileReader](planetiler-core/src/main/java/com/onthegomap/planetiler/reader/ShapefileReader.java) can get the
latitude/longitude geometry directly from each feature
- For OpenStreetMap `.osm.pbf` files,
[OsmReader](planetiler-core/src/main/java/com/onthegomap/planetiler/reader/osm/OsmReader.java)
needs to make 2 passes through the input file to construct feature geometries:
- pass 1:
- nodes: store node latitude/longitude locations in-memory or on disk
using [LongLongMap](planetiler-core/src/main/java/com/onthegomap/planetiler/collection/LongLongMap.java)
- ways: nothing
- relations: call `preprocessOsmRelation` on the [profile](#profiles) and store information returned for each
relation of
interest, along with relation member IDs in-memory using
a [LongLongMultimap](planetiler-core/src/main/java/com/onthegomap/planetiler/collection/LongLongMultimap.java).
- pass 2:
- nodes: emit a point source feature
- ways:
- lookup the latitude/longitude for each node ID to get the way geometry and relations that the way is contained
in
- emit a source feature with the reconstructed geometry which can either be a line or polygon, depending on
the `area` tag and whether the way is closed
- if this way is part of a multipolygon, also save the way geometry in-memory for later in
a [LongLongMultimap](planetiler-core/src/main/java/com/onthegomap/planetiler/collection/LongLongMultimap.java)
- relations: for any multipolygon relation, fetch the member geometries and attempt to reconstruct the multipolygon
geometry
using [OsmMultipolygon](planetiler-core/src/main/java/com/onthegomap/planetiler/reader/osm/OsmMultipolygon.java),
then emit a polygon source feature with the reconstructed geometry if successful
Then, for each [SourceFeature](planetiler-core/src/main/java/com/onthegomap/planetiler/reader/SourceFeature.java),
generate vector tile features according to the [profile](#profiles) in a worker thread (default 1 per core):
- Call `processFeature` method on the [profile](#profiles) for each source feature
- For every vector tile feature added to
the [FeatureCollector](planetiler-core/src/main/java/com/onthegomap/planetiler/FeatureCollector.java):
- Call [FeatureRenderer#accept](planetiler-core/src/main/java/com/onthegomap/planetiler/render/FeatureRenderer.java)
which for each zoom level the feature appears in:
- Scale the geometry to that zoom level
- Simplify it in screen pixel coordinates
- Use [TiledGeometry](planetiler-core/src/main/java/com/onthegomap/planetiler/render/TiledGeometry.java)
to slice the geometry into subcomponents that appear in every tile it touches using the stripe clipping algorithm
derived from [geojson-vt](https://github.com/mapbox/geojson-vt):
- `sliceX` splits the geometry into vertical slices for each "column" representing the X coordinate of a vector
tile
- `sliceY` splits each "column" into "rows" representing the Y coordinate of a vector tile
- Uses an [IntRangeSet](planetiler-core/src/main/java/com/onthegomap/planetiler/collection/IntRangeSet.java) to
optimize processing for large filled areas (like oceans)
- If any features wrapped past -180 or 180 degrees longitude, repeat with a 360 or -360 degree offset
- Reassemble each vector tile geometry and round to tile precision (4096x4096)
- For
polygons, [GeoUtils#snapAndFixPolygon](planetiler-core/src/main/java/com/onthegomap/planetiler/geo/GeoUtils.java)
uses [JTS](https://github.com/locationtech/jts) utilities to fix any topology errors (i.e. self-intersections)
introduced by rounding. This is very expensive, but necessary since clients
like [MapLibre GL JS](https://github.com/maplibre/maplibre-gl-js) produce rendering artifacts for invalid
polygons.
- Encode the feature into compact binary format
using [FeatureGroup#newRenderedFeatureEncoder](planetiler-core/src/main/java/com/onthegomap/planetiler/collection/FeatureGroup.java)
consisting of a sortable 64-bit `long` key (zoom, x, y, layer, sort order) and a binary value encoded
using [MessagePack](https://msgpack.org/) (feature group/limit, feature ID, geometry type, tags, geometry)
- Add the encoded feature to
a [WorkQueue](planetiler-core/src/main/java/com/onthegomap/planetiler/worker/WorkQueue.java)
Finally, a single-threaded writer reads encoded features off of the work queue and writes them to disk
using [ExternalMergeSort#add](planetiler-core/src/main/java/com/onthegomap/planetiler/collection/ExternalMergeSort.java)
- Write features to a "chunk" file until that file hits a size limit (i.e. 1GB) then start writing to a new file
## 2) Sort Features
[ExternalMergeSort](planetiler-core/src/main/java/com/onthegomap/planetiler/collection/ExternalMergeSort.java) sorts all
of the intermediate features using a worker thread per core:
- Read each "chunk" file into memory
- Sort the features it contains by 64-bit `long` key
- Write the chunk back to disk
## 3) Emit Vector Tiles
[TileArchiveWriter](planetiler-core/src/main/java/com/onthegomap/planetiler/archive/TileArchiveWriter.java) is the main
driver.
First, a single-threaded reader reads features from disk:
- [ExternalMergeSort](planetiler-core/src/main/java/com/onthegomap/planetiler/collection/ExternalMergeSort.java) emits
sorted features by doing a k-way merge using a priority queue of iterators reading from each sorted chunk
- [FeatureGroup](planetiler-core/src/main/java/com/onthegomap/planetiler/collection/FeatureGroup.java) collects
consecutive features in the same tile into a `TileFeatures` instance, dropping features in the same group over the
grouping limit to limit point label density
- Then [TileArchiveWriter](planetiler-core/src/main/java/com/onthegomap/planetiler/archive/TileArchiveWriter.java)
groups tiles
into variable-sized batches for workers to process (complex tiles get their own batch to ensure workers stay busy
while the writer thread waits for finished tiles in order)
Then, process tile batches in worker threads (default 1 per core):
- For each tile in the batch, first check to see if it has the same contents as the previous tile to avoid re-encoding
the same thing for large filled areas (i.e. oceans)
- Encode using [VectorTile](planetiler-core/src/main/java/com/onthegomap/planetiler/VectorTile.java)
- gzip each encoded tile
- Pass the batch of encoded vector tiles to the writer thread
Finally, a single-threaded writer writes encoded vector tiles to the output archive format:
- For MBTiles, create the largest prepared statement supported by SQLite (999 parameters)
- Iterate through finished vector tile batches until the prepared statement is full, flush to disk, then repeat
- Then flush any remaining tiles at the end
## Profiles
To customize the behavior of this pipeline, custom profiles implement
the [Profile](planetiler-core/src/main/java/com/onthegomap/planetiler/Profile.java) interface to override:
- what vector tile features to generate from an input feature
- what information from OpenStreetMap relations we need to save for later use
- how to post-process vector features grouped into a tile before emitting
A Java project can implement this interface and add arbitrarily complex processing when overriding the methods.
The [custommap](planetiler-custommap) project defines
a [ConfiguredProfile](planetiler-custommap/src/main/java/com/onthegomap/planetiler/custommap/ConfiguredProfile.java)
implementation that loads instructions from a YAML config file to dynamically control how schemas are generated without
needing to write or compile Java code.