Reorganize docs by project and archive legacy context files

2026-03-13 05:47:23 +00:00
parent 8d323c9393
commit 99df060f51
13 changed files with 132 additions and 20 deletions
@@ -0,0 +1,5 @@
# Archive
This folder stores superseded context docs.
- `legacy-context-2026-03-13/`: initial ungrouped context/runbook files moved during project-based reorganization.
@@ -0,0 +1,31 @@
# Architecture
## Request flow
1. User query enters SearXNG (`search.sethpc.xyz`).
2. SearXNG's `json_engine` calls the SethSearch API search endpoint.
3. SethSearch queries its local SQLite FTS5 index and returns normalized results.
4. SearXNG merges SethSearch with other engines and renders the result page.
## Data plane
- Index DB: `/opt/sethsearch/articles.db`
- Tables:
  - `documents` (canonical indexed records)
  - `documents_fts` (FTS5 virtual table)
- Source-level scoring and matching occur in SethSearch.
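A minimal sketch of how a canonical table and an FTS5 mirror can work together. The table names come from the list above; the column names (`source`, `title`, `url`, `body`), the in-memory database, and both helper functions are assumptions for illustration, not the actual SethSearch schema.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory index: a canonical table plus an FTS5 mirror.

    Column names are hypothetical; only the table names match the docs.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, source TEXT, title TEXT, url TEXT, body TEXT)"
    )
    conn.execute("CREATE VIRTUAL TABLE documents_fts USING fts5(title, body)")
    for source, title, url, body in rows:
        cur = conn.execute(
            "INSERT INTO documents (source, title, url, body) VALUES (?, ?, ?, ?)",
            (source, title, url, body),
        )
        # Reuse the canonical row id as the FTS rowid so the tables can be joined.
        conn.execute(
            "INSERT INTO documents_fts (rowid, title, body) VALUES (?, ?, ?)",
            (cur.lastrowid, title, body),
        )
    conn.commit()
    return conn


def search(conn, query, limit=40):
    """Full-text match on the FTS table, returning canonical fields by relevance."""
    sql = (
        "SELECT d.source, d.title, d.url "
        "FROM documents_fts "
        "JOIN documents AS d ON d.id = documents_fts.rowid "
        "WHERE documents_fts MATCH ? "
        "ORDER BY rank LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()
```

Keeping the FTS table as a mirror keyed by `rowid` means the canonical record stays the single source of truth and the FTS table can be rebuilt from it at any time.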
## Source adapters
- Caddy snapshot parser: domain discovery and tag generation.
- Gitea adapter: public repo metadata via REST.
- Wiki.js adapter: public crawl with fallback records.
- WordPress adapter: public posts/pages via `/wp-json/wp/v2/...`.
- Emby adapter: media index using server API token and deep links.
- FreshRSS adapter: GReader API article ingest.
## Reliability model
- SethSearch syncs sources independently.
- If one source fails, others continue and commit.
- Service runs under systemd with restart policy.
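The per-source isolation described above could look roughly like the sketch below. The `make_db` and `sync_all` helpers, the adapter callables, and the reduced `documents` columns are all hypothetical; the real sync code in `sethsearch.py` may be structured differently.

```python
import sqlite3


def make_db():
    """Hypothetical, simplified stand-in for the real index database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (source TEXT, title TEXT, url TEXT)")
    return conn


def sync_all(conn, adapters):
    """Run each adapter in isolation: commit on success, roll back on failure."""
    report = {}
    for name, fetch in adapters.items():
        try:
            rows = list(fetch())
            conn.executemany(
                "INSERT INTO documents (source, title, url) VALUES (?, ?, ?)",
                [(name, title, url) for title, url in rows],
            )
            conn.commit()  # this source's rows are durable from here on
            report[name] = f"ok ({len(rows)})"
        except Exception as exc:
            conn.rollback()  # discard only the failing source's partial work
            report[name] = f"failed: {exc}"
    return report
```

Because each source commits on its own, a failing adapter (for example an expired API token) leaves the rows from every other source untouched.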
@@ -0,0 +1,56 @@
# SearXNG Context
Last updated: 2026-03-13 05:27:14 UTC
## Homelab placement
- Cluster: `sethpc`
- SearXNG:
  - CT: `119`
  - Node: `pve173`
  - URL: `https://searxng.sethpc.xyz` and `https://search.sethpc.xyz`
  - Config: `/etc/searxng/settings.yml`
- SethSearch API:
  - CT: `620`
  - Node: `pve173`
  - URL: `https://sethsearch.sethpc.xyz`
  - Service: `sethsearch.service`
  - App path: `/opt/sethsearch/sethsearch.py`
  - Config: `/opt/sethsearch/config.json`
- Caddy:
  - CT: `600`
  - Node: `pve241`
  - Config: `/etc/caddy/Caddyfile`
## Search engines in use
- `sethsearch` (`shortcut: ss`, category: `general`)
  - URL: `https://sethsearch.sethpc.xyz/search?q={query}&source=general&limit=40`
  - Weight: `5.0`
- `sethflix` (`shortcut: sfx`, category: `videos`)
  - URL: `https://sethsearch.sethpc.xyz/search?q={query}&source=sethflix&limit=40`
  - Weight: `5.0`
- `libretranslate` (`shortcut: lt`)
  - Base URL: `https://translate.sethpc.xyz`
## SethSearch sources
- `sites`: Caddy host/domain catalog with tags.
- `gitea`: public repositories.
- `wikijs`: public crawl/fallback page catalog.
- `wordpress`: public pages/posts from `sethfreiberg.com`.
- `emby`: media discovery index (links require account session).
- `freshrss`: article index with stricter matching and lower weight.
## Matching policy
- General (`source=general`): includes Emby with stricter matching.
- Sethflix (`source=sethflix`): Emby only with liberal matching.
- FreshRSS: strict term matching and lower source weight.
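One possible reading of "strict" versus "liberal" matching is sketched below: strict requires every query term to appear, liberal accepts any single term. This interpretation and the `matches` helper are assumptions; the docs do not specify the actual rule SethSearch applies.

```python
def matches(text, query, strict):
    """Return True if `text` satisfies `query` under the chosen policy.

    strict=True  -> every query term must appear as a whole word
    strict=False -> at least one query term must appear
    (Hypothetical interpretation of the matching policy.)
    """
    terms = query.lower().split()
    if not terms:
        return False
    words = set(text.lower().split())
    hits = [term in words for term in terms]
    return all(hits) if strict else any(hits)
```

Under this reading, a query such as `always sunny` would surface a title containing only "sunny" in the Sethflix group but not in the general group.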
## API endpoints
- Health: `GET /health`
- Search: `GET /search?q=<query>&source=<group|source>&limit=<n>`
- Stats: `GET /stats`
- Manual sync: `POST /sync`
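As an illustration of the search endpoint's query string, the helper below builds a request URL from its three parameters. The function name and its defaults are assumptions; the base URL and parameter names are taken from this document.

```python
from urllib.parse import quote, urlencode


def build_search_url(query, source="general", limit=40,
                     base="https://sethsearch.sethpc.xyz"):
    """Build a `GET /search` URL; spaces are encoded as %20 rather than '+'."""
    params = urlencode(
        {"q": query, "source": source, "limit": limit},
        quote_via=quote,
    )
    return f"{base}/search?{params}"
```

For example, `build_search_url("always sunny", "sethflix", 5)` yields `https://sethsearch.sethpc.xyz/search?q=always%20sunny&source=sethflix&limit=5`.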
@@ -0,0 +1,36 @@
# Operations Runbook
## Common commands
- SethSearch service status:
  - `ssh pve173 "pct exec 620 -- systemctl status sethsearch --no-pager"`
- SethSearch logs:
  - `ssh pve173 "pct exec 620 -- journalctl -u sethsearch -n 100 --no-pager"`
- SearXNG service status:
  - `ssh pve173 "pct exec 119 -- systemctl status searxng --no-pager"`
- SearXNG logs:
  - `ssh pve173 "pct exec 119 -- journalctl -u searxng -n 100 --no-pager"`
## Verify behavior
- General search endpoint:
  - `curl -s "https://sethsearch.sethpc.xyz/search?q=home&source=general&limit=5"`
- Sethflix endpoint:
  - `curl -s "https://sethsearch.sethpc.xyz/search?q=always%20sunny&source=sethflix&limit=5"`
- Stats:
  - `curl -s "https://sethsearch.sethpc.xyz/stats"`
## Config touchpoints
- SethSearch config: `/opt/sethsearch/config.json`
- SethSearch code: `/opt/sethsearch/sethsearch.py`
- SearXNG config: `/etc/searxng/settings.yml`
- Caddy config: `/etc/caddy/Caddyfile`
## Change protocol
1. Edit SethSearch code/config.
2. Restart SethSearch and verify `/health` and `/stats`.
3. Edit SearXNG engines (if needed).
4. Restart SearXNG and verify `/config` engine list.
5. Re-run the most common queries and confirm the results still look right.
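Step 2 of the protocol could be scripted along these lines. This is a sketch under assumptions: it treats an HTTP 200 from `/health` and `/stats` as success, which the docs do not state explicitly, and the `http_get` and `verify` helpers are hypothetical.

```python
from urllib.request import urlopen

BASE = "https://sethsearch.sethpc.xyz"


def http_get(url):
    """Fetch a URL and return (status, body). Raises OSError on network failure."""
    with urlopen(url, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8")


def verify(base=BASE, get=http_get):
    """Check /health and /stats after a restart; True means HTTP 200 (assumed)."""
    checks = {}
    for path in ("/health", "/stats"):
        try:
            status, _body = get(base + path)
            checks[path] = status == 200
        except OSError:  # URLError and HTTPError are OSError subclasses
            checks[path] = False
    return checks
```

The `get` parameter is injectable so the check can be exercised without touching the live service.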
@@ -0,0 +1,24 @@
# SethSearch API Layer
## Live deployment
- Host CT: 620 (`sethsearch-api`)
- URL: `https://sethsearch.sethpc.xyz`
- App: `/opt/sethsearch/sethsearch.py`
- Config: `/opt/sethsearch/config.json`
## Source groups
- `source=general`: sites, gitea, wikijs, wordpress, freshrss, emby (strict)
- `source=sethflix`: emby (liberal)
## Weighting overview
- Higher: sites, gitea, wikijs, wordpress, emby
- Lower + strict: freshrss
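To show how a lower source weight can affect ordering, here is a small sketch. The numeric weights, the `score` field, and the `rank_results` helper are illustrative assumptions; the actual values live in the SethSearch config and are not given in this document.

```python
# Illustrative weights only: the docs say freshrss is lower, not by how much.
SOURCE_WEIGHTS = {
    "sites": 1.0,
    "gitea": 1.0,
    "wikijs": 1.0,
    "wordpress": 1.0,
    "emby": 1.0,
    "freshrss": 0.5,
}


def rank_results(results):
    """Sort results by match score scaled by the per-source weight, best first."""
    def weighted(result):
        return result["score"] * SOURCE_WEIGHTS.get(result["source"], 1.0)
    return sorted(results, key=weighted, reverse=True)
```

With these example numbers, a FreshRSS article needs roughly twice the raw match score of a Gitea repository to outrank it.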
## Maintenance
- Manual re-index: `POST /sync`
- Health check: `GET /health`
- Index summary: `GET /stats`
@@ -0,0 +1,19 @@
# SearXNG Layer
This folder documents SearXNG-side integration with SethSearch.
## Active custom engines
- `sethsearch` (general, highest weight)
- `sethflix` (videos, Emby-only)
- `libretranslate` (translate)
## Live config location
- `/etc/searxng/settings.yml` in CT 119 on `pve173`
## Important notes
- SearXNG blocks plain HTTP in engine requests; use HTTPS endpoints.
- Engine names should be lowercase to avoid startup warnings.
- `use_default_settings: true` lets `settings.yml` contain only the overrides; everything else is inherited from the SearXNG defaults.