diff --git a/docs/superpowers/specs/2026-05-17-sqlite-web-app-design.md b/docs/superpowers/specs/2026-05-17-sqlite-web-app-design.md new file mode 100644 index 0000000..2e297d8 --- /dev/null +++ b/docs/superpowers/specs/2026-05-17-sqlite-web-app-design.md @@ -0,0 +1,135 @@ +# SQLite Store and Web App Design + +## Goal + +Move Newsletter Link Catalog from spreadsheet-first output to a SQLite-first local catalog, while preserving spreadsheet output as an optional export/sync path. Add a functional local web app for browsing the catalog. + +## Scope + +This first pass keeps the working CLI ingestion flow. `nlc run` continues to fetch Gmail newsletters, parse links, categorize, enrich when enabled, and update incremental state. The main behavior change is that extracted data is persisted to SQLite by default. + +The web app is intentionally plain and functional. It should make the catalog useful before investing in visual polish. + +## Architecture + +Add a persistence layer alongside the existing output writers: + +- `DatabaseWriter` persists run results into SQLite. +- Spreadsheet writers remain available through the existing config-driven write path during `nlc run`. +- Web routes read from SQLite and do not re-run Gmail ingestion. +- Existing parsing, filtering, categorization, enrichment, state, and OAuth modules stay in place. + +The CLI remains the control surface for ingestion: + +- `nlc run` writes to SQLite by default. +- `nlc run` may also write Excel and/or Google Sheets if configured. +- `nlc serve` starts a local web server against the configured SQLite database. + +## SQLite Database + +Default database path: `./data/newsletter-catalog.sqlite`, configurable as `database.path`. + +Tables: + +- `newsletters`: newsletter identity and display name. +- `issues`: one row per processed email issue, linked to a newsletter. +- `links`: canonical URL-level records. +- `link_occurrences`: per-issue link appearances, including category, title, description, page metadata, and also-in text. +- `sponsors`: sponsor/ad appearances, linked to issue and URL where possible. +- `dead_links`: dead or unreachable link records. +- `runs`: import run history with counts, timestamps, mode, and error counts. + +Constraints: + +- URLs are deduplicated in `links` by normalized URL. +- Link occurrences are deduplicated per issue by normalized URL. +- Processed Gmail message IDs remain tracked by the existing state file for now, avoiding a risky migration in this first pass. + +## Configuration + +Extend `config.yaml`: + +```yaml +database: + enabled: true + path: "./data/newsletter-catalog.sqlite" +``` + +Behavior: + +- If `database.enabled` is omitted, it defaults to `true`. +- Spreadsheet output remains controlled by the existing `output.excel.enabled` and `output.sheets_api.enabled` settings. +- If all outputs are disabled, `nlc run` still writes to SQLite. + +## Web App + +Add `nlc serve [flags]`. + +Flags: + +- `--config PATH`, default `./config.yaml` +- `--host HOST`, default `127.0.0.1` +- `--port PORT`, default `3000` + +Views: + +- Dashboard: counts for newsletters, issues, links, sponsors, dead links, and recent runs. +- Links: searchable table of content links with filters for newsletter, category, and date range. +- Newsletter detail: issues and links for one newsletter. +- Sponsored links: global sponsor table. +- Dead links: global dead/unreachable table. +- Runs: recent import run summaries. + +The first UI can use server-rendered HTML with minimal CSS. No SPA framework is required. + +## Data Flow + +Import: + +1. Gmail fetches configured label messages. +2. Existing parser extracts links and sponsor entries. +3. Existing cleanup/categorization/enrichment pipeline prepares rows. +4. `DatabaseWriter` upserts newsletters, issues, links, occurrences, sponsors, dead links, and run summary. +5. Optional spreadsheet writers run after database persistence when configured. + +Web: + +1. `nlc serve` opens the SQLite database. +2. Request handlers query read-only catalog views. +3. HTML pages render tables and simple filters. + +## Error Handling + +- Database open/migration errors are critical and stop `nlc run` or `nlc serve`. +- A single failed row insert during import increments the run error count and stops the database transaction for that run. +- SQLite writes use transactions so partial import data is not committed on failure. +- Web route errors return a simple 500 page and log the error. + +## Testing + +Add tests for: + +- Config defaults for `database.enabled` and `database.path`. +- Schema migration creates expected tables. +- Database writer inserts newsletters, issues, links, sponsors, and dead links. +- Duplicate normalized URLs are deduplicated while occurrences remain per issue. +- `nlc run` writes to SQLite by default even when spreadsheet outputs are disabled. +- Web routes render dashboard, links, sponsored links, dead links, and run history using a fixture database. +- Existing spreadsheet writer tests continue to pass. + +## Out of Scope for This Pass + +- Browser-based Gmail OAuth setup. +- Running Gmail imports from a web button. +- User accounts or remote hosting. +- Full visual design polish. +- Migrating old JSON state into SQLite. +- Removing Excel or Google Sheets support. + +## Acceptance Criteria + +- `nlc run` creates/updates the SQLite database by default. +- `nlc serve` starts a local web app and displays imported catalog data. +- Spreadsheet output still works when enabled. +- Existing CLI validations remain green. +- Tests cover database persistence and web read paths without live Gmail or Google credentials.