tweak wording and minor details relating to preview queries

pull/15/head
cblgh 2022-11-22 14:08:44 +01:00
parent 7c6a63ce2c
commit 9517f62de2
3 changed files with 15 additions and 15 deletions


@@ -41,9 +41,9 @@ func getAboutHeuristics(path string) []string {
 func getPreviewQueries(path string) []string {
 	previewQueries := util.ReadList(path, "\n")
 	if len(previewQueries) > 0 {
-		return previewQueries;
+		return previewQueries
 	} else {
-		return []string{"main p", "article p", "section p", "p"};
+		return []string{"main p", "article p", "section p", "p"}
 	}
 }
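The fallback in this hunk can be tried in isolation with the minimal sketch below. It is not the project's code: `readLines` is a hypothetical stand-in for `util.ReadList` (whose exact behavior is not shown in this diff), and `previewQueriesOrDefault` and `defaultPreviewQueries` are illustrative names. Only the four default selectors are taken from the hunk.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultPreviewQueries holds the fallback selectors shown in the hunk.
var defaultPreviewQueries = []string{"main p", "article p", "section p", "p"}

// readLines is a hypothetical stand-in for util.ReadList: it returns the
// trimmed, non-empty lines of the file, or nil if the file cannot be read.
func readLines(path string) []string {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil
	}
	var lines []string
	for _, line := range strings.Split(string(data), "\n") {
		if trimmed := strings.TrimSpace(line); trimmed != "" {
			lines = append(lines, trimmed)
		}
	}
	return lines
}

// previewQueriesOrDefault returns the configured selectors, or the defaults
// when the list is missing or empty.
func previewQueriesOrDefault(path string) []string {
	if queries := readLines(path); len(queries) > 0 {
		return queries
	}
	return defaultPreviewQueries
}

func main() {
	// A path that does not exist falls back to the defaults.
	fmt.Println(previewQueriesOrDefault("no-such-preview-query-list.txt"))
}
```

The sketch returns early instead of using `else`, which is a common Go style; the behavior is the same as the `if`/`else` form in the hunk.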


@@ -123,21 +123,19 @@ are stopped from entering the search index. The default wordlist consists of the
 interesting concepts and verbs—such as `reading` and `books`, for example.
 #### `previewQueryList`
-A list of css selectors (one per line) to fetch preview paragraphs,
-the first paragraph found that passes a check against the `heuristics` file makes
-it into the search index. For each selector lieu tries the first four paragraphs
-found with each selector before skipping to the next one.
-To get good results one usually wants to tune this to getting the first "real" paragraph
-after the header, or a summary paragraph if provided. It is also worth trying to avoind getting
-irelevant paragraphs as they clutter up your index and results, lieu will fall back to other
-preview sources.
-The default has been (at the time of writing) tuned for use with the Fediring.
-
-Depending on how well the websites you are indexing are with semantic HTML this will
-get you the 70 to 90% solution. For the rest use heuristics and contact the creators of the
-websites you are tring to index, they (usually) appreciate the feedback.
+A list of css selectors—one per line—used to fetch preview paragraphs. The first paragraph
+found passing a check against the `heuristics` file makes it into the search index. For
+each selector in `previewQueryList`, Lieu tries the first four paragraphs—as found by the
+selector—before trying to find a new set of paragraphs using the file's next selector.
+To get good results, one usually wants to tune this list to getting the first "real" paragraph
+after common page headers, or finding a summary paragraph. The default has been, at the time of
+writing, tuned for use with the [Fediring](https://fediring.net).
+
+Depending on the structure of the websites you are indexing, this will get you 70-90% of the
+way in terms of accurate link descriptions. For the rest of the way, fine-tune `heuristics.txt`
+and reach out the creators of the websites you are indexing; they often appreciate the
+feedback.
 #### OpenSearch metadata
 If you are running your own instance of Lieu, you might want to look into changing the URL
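The selection order described in the `previewQueryList` text of this hunk can be sketched roughly as follows. This is an illustration under stated assumptions, not Lieu's implementation: `pickPreview` and `passesHeuristics` are hypothetical names, the `matches` map stands in for real DOM queries, and the heuristics check is assumed to reject any paragraph containing a listed phrase (the diff does not say how the check actually works).

```go
package main

import (
	"fmt"
	"strings"
)

// maxParagraphsPerSelector mirrors the "first four paragraphs" limit
// mentioned in the README text.
const maxParagraphsPerSelector = 4

// passesHeuristics is a hypothetical check: a paragraph is rejected when it
// contains any phrase from the heuristics list (case-insensitive).
func passesHeuristics(paragraph string, heuristics []string) bool {
	lower := strings.ToLower(paragraph)
	for _, phrase := range heuristics {
		if strings.Contains(lower, strings.ToLower(phrase)) {
			return false
		}
	}
	return true
}

// pickPreview walks the selectors in order. For each one it inspects at most
// the first four matched paragraphs and returns the first that passes the
// heuristics check. The matches map stands in for querying a parsed page.
func pickPreview(selectors []string, matches map[string][]string, heuristics []string) (string, bool) {
	for _, selector := range selectors {
		paragraphs := matches[selector]
		if len(paragraphs) > maxParagraphsPerSelector {
			paragraphs = paragraphs[:maxParagraphsPerSelector]
		}
		for _, p := range paragraphs {
			if passesHeuristics(p, heuristics) {
				return p, true
			}
		}
	}
	return "", false
}

func main() {
	selectors := []string{"main p", "article p", "section p", "p"}
	matches := map[string][]string{
		"main p":    {"Subscribe to my newsletter!"},
		"article p": {"Welcome to my site", "I write about gardening and old books."},
	}
	heuristics := []string{"subscribe", "welcome to"}

	preview, ok := pickPreview(selectors, matches, heuristics)
	fmt.Println(preview, ok)
}
```

With this sample data, the `main p` match and the first `article p` match are rejected by the assumed heuristics, so the second `article p` paragraph becomes the preview.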


@@ -196,6 +196,8 @@ bannedSuffixes = "data/banned-suffixes.txt"
 boringWords = "data/boring-words.txt"
 # domains that won't be output as outgoing links
 boringDomains = "data/boring-domains.txt"
+# queries to search for finding preview text
+previewQueryList = "data/preview-query-list.txt"
 `)
 err := ioutil.WriteFile("lieu.toml", conf, 0644)
 Check(err)