mirror of https://github.com/cblgh/lieu
tweak wording and minor details relating to preview queries
parent
7c6a63ce2c
commit
9517f62de2
|
@ -41,9 +41,9 @@ func getAboutHeuristics(path string) []string {
|
||||||
func getPreviewQueries(path string) []string {
|
func getPreviewQueries(path string) []string {
|
||||||
previewQueries := util.ReadList(path, "\n")
|
previewQueries := util.ReadList(path, "\n")
|
||||||
if len(previewQueries) > 0 {
|
if len(previewQueries) > 0 {
|
||||||
return previewQueries;
|
return previewQueries
|
||||||
} else {
|
} else {
|
||||||
return []string{"main p", "article p", "section p", "p"};
|
return []string{"main p", "article p", "section p", "p"}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
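For readers without the repository at hand, the fallback behaviour in `getPreviewQueries` can be sketched in isolation. This is a minimal sketch, not the project's code: `readSelectors` is a hypothetical stand-in for `util.ReadList` (whose exact behaviour is not shown in this diff), and `previewQueriesOrDefault` is an illustrative name.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultPreviewQueries mirrors the fallback selectors shown in the hunk above.
var defaultPreviewQueries = []string{"main p", "article p", "section p", "p"}

// readSelectors is a hypothetical stand-in for util.ReadList: it returns the
// trimmed, non-empty lines of the file at path, or nil if the file cannot be read.
func readSelectors(path string) []string {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil
	}
	var selectors []string
	for _, line := range strings.Split(string(data), "\n") {
		if trimmed := strings.TrimSpace(line); trimmed != "" {
			selectors = append(selectors, trimmed)
		}
	}
	return selectors
}

// previewQueriesOrDefault returns the configured selectors, or the defaults
// when the list is missing or empty.
func previewQueriesOrDefault(path string) []string {
	if selectors := readSelectors(path); len(selectors) > 0 {
		return selectors
	}
	return defaultPreviewQueries
}

func main() {
	// A missing file falls through to the default selectors.
	fmt.Println(previewQueriesOrDefault("does-not-exist.txt"))
}
```

Returning early on the non-empty case keeps the default path as the last statement, which is the shape Go linters tend to prefer over an explicit `else` branch.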
@ -123,21 +123,19 @@ are stopped from entering the search index. The default wordlist consists of the
|
||||||
interesting concepts and verbs—such as `reading` and `books`, for example.
|
interesting concepts and verbs—such as `reading` and `books`, for example.
|
||||||
|
|
||||||
#### `previewQueryList`
|
#### `previewQueryList`
|
||||||
A list of css selectors (one per line) to fetch preview paragraphs,
|
A list of css selectors—one per line—used to fetch preview paragraphs. The first paragraph
|
||||||
the first paragraph found that passes a check against the `heuristics` file makes
|
found passing a check against the `heuristics` file makes it into the search index. For
|
||||||
it into the search index. For each selector lieu tries the first four paragraphs
|
each selector in `previewQueryList`, Lieu tries the first four paragraphs—as found by the
|
||||||
found with each selector before skipping to the next one.
|
selector—before trying to find a new set of paragraphs using the file's next selector.
|
||||||
|
|
||||||
To get good results one usually wants to tune this to getting the first "real" paragraph
|
To get good results, one usually wants to tune this list to getting the first "real" paragraph
|
||||||
after the header, or a summary paragraph if provided. It is also worth trying to avoind getting
|
after common page headers, or finding a summary paragraph. The default has been, at the time of
|
||||||
irelevant paragraphs as they clutter up your index and results, lieu will fall back to other
|
writing, tuned for use with the [Fediring](https://fediring.net).
|
||||||
preview sources.
|
|
||||||
|
|
||||||
The default has been (at the time of writing) tuned for use with the Fediring.
|
Depending on the structure of the websites you are indexing, this will get you 70-90% of the
|
||||||
|
way in terms of accurate link descriptions. For the rest of the way, fine-tune `heuristics.txt`
|
||||||
Depending on how well the websites you are indexing are with semantic HTML this will
|
and reach out the creators of the websites you are indexing; they often appreciate the
|
||||||
get you the 70 to 90% solution. For the rest use heuristics and contact the creators of the
|
feedback.
|
||||||
websites you are tring to index, they (usually) appreciate the feedback.
|
|
||||||
|
|
||||||
#### OpenSearch metadata
|
#### OpenSearch metadata
|
||||||
If you are running your own instance of Lieu, you might want to look into changing the URL
|
If you are running your own instance of Lieu, you might want to look into changing the URL
|
||||||
|
|
|
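The selection rule described in the `previewQueryList` text above (per selector, try the first four paragraphs, keep the first one that passes the heuristics check, otherwise move to the next selector) might be sketched as follows. This is an illustration under assumptions, not Lieu's implementation: the `matches` map stands in for real CSS-selector lookups against a page, and `passesHeuristics` assumes the heuristics file is a list of phrases that disqualify a paragraph.

```go
package main

import (
	"fmt"
	"strings"
)

// maxParagraphsPerSelector reflects the "first four paragraphs" rule in the README text.
const maxParagraphsPerSelector = 4

// passesHeuristics is a hypothetical check: a paragraph is rejected when it is
// blank or contains any of the given phrases (case-insensitive).
func passesHeuristics(paragraph string, rejectPhrases []string) bool {
	lowered := strings.ToLower(paragraph)
	if strings.TrimSpace(lowered) == "" {
		return false
	}
	for _, phrase := range rejectPhrases {
		if strings.Contains(lowered, strings.ToLower(phrase)) {
			return false
		}
	}
	return true
}

// pickPreview walks the selectors in order. For each one it inspects at most
// maxParagraphsPerSelector paragraphs and returns the first that passes the check.
// matches maps a selector to the paragraphs it found (an assumption standing in
// for a real HTML query).
func pickPreview(selectors []string, matches map[string][]string, rejectPhrases []string) (string, bool) {
	for _, selector := range selectors {
		paragraphs := matches[selector]
		if len(paragraphs) > maxParagraphsPerSelector {
			paragraphs = paragraphs[:maxParagraphsPerSelector]
		}
		for _, paragraph := range paragraphs {
			if passesHeuristics(paragraph, rejectPhrases) {
				return paragraph, true
			}
		}
	}
	return "", false
}

func main() {
	selectors := []string{"main p", "article p", "p"}
	matches := map[string][]string{
		"main p": {"Subscribe to our newsletter"},
		"p":      {"Cookie notice", "A site about reading and books."},
	}
	preview, ok := pickPreview(selectors, matches, []string{"subscribe", "cookie"})
	fmt.Println(preview, ok)
}
```

Ordering the selectors from most to least specific means the generic `p` selector only contributes when the more semantic containers yield nothing usable.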
@ -196,6 +196,8 @@ bannedSuffixes = "data/banned-suffixes.txt"
|
||||||
boringWords = "data/boring-words.txt"
|
boringWords = "data/boring-words.txt"
|
||||||
# domains that won't be output as outgoing links
|
# domains that won't be output as outgoing links
|
||||||
boringDomains = "data/boring-domains.txt"
|
boringDomains = "data/boring-domains.txt"
|
||||||
|
# queries to search for finding preview text
|
||||||
|
previewQueryList = "data/preview-query-list.txt"
|
||||||
`)
|
`)
|
||||||
err := ioutil.WriteFile("lieu.toml", conf, 0644)
|
err := ioutil.WriteFile("lieu.toml", conf, 0644)
|
||||||
Check(err)
|
Check(err)
|
||||||
|
|