Files
Newsletter-Link-Catalog/docs/superpowers/specs/2026-05-17-sqlite-web-app-design.md
2026-05-17 14:05:25 -05:00

5.1 KiB

SQLite Store and Web App Design

Goal

Move Newsletter Link Catalog from spreadsheet-first output to a SQLite-first local catalog, while preserving spreadsheet output as an optional export/sync path. Add a functional local web app for browsing the catalog.

Scope

This first pass keeps the working CLI ingestion flow. nlc run continues to fetch Gmail newsletters, parse links, categorize, enrich when enabled, and update incremental state. The main behavior change is that extracted data is persisted to SQLite by default.

The web app is intentionally plain and functional. It should make the catalog useful before investing in visual polish.

Architecture

Add a persistence layer alongside the existing output writers:

  • DatabaseWriter persists run results into SQLite.
  • Spreadsheet writers remain available through the existing config-driven write path during nlc run.
  • Web routes read from SQLite and do not re-run Gmail ingestion.
  • Existing parsing, filtering, categorization, enrichment, state, and OAuth modules stay in place.

The CLI remains the control surface for ingestion:

  • nlc run writes to SQLite by default.
  • nlc run may also write Excel and/or Google Sheets if configured.
  • nlc serve starts a local web server against the configured SQLite database.

SQLite Database

Default database path: ./data/newsletter-catalog.sqlite, configurable as database.path.

Tables:

  • newsletters: newsletter identity and display name.
  • issues: one row per processed email issue, linked to a newsletter.
  • links: canonical URL-level records.
  • link_occurrences: per-issue link appearances, including category, title, description, page metadata, and also-in text.
  • sponsors: sponsor/ad appearances, linked to issue and URL where possible.
  • dead_links: dead or unreachable link records.
  • runs: import run history with counts, timestamps, mode, and error counts.

Constraints:

  • URLs are deduplicated in links by normalized URL.
  • Link occurrences are deduplicated per issue by normalized URL.
  • Processed Gmail message IDs remain tracked by the existing state file for now, avoiding a risky migration in this first pass.

Configuration

Extend config.yaml:

database:
  enabled: true
  path: './data/newsletter-catalog.sqlite'

Behavior:

  • If database.enabled is omitted, it defaults to true.
  • Spreadsheet output remains controlled by the existing output.excel.enabled and output.sheets_api.enabled settings.
  • If all outputs are disabled, nlc run still writes to SQLite.

Web App

Add nlc serve [flags].

Flags:

  • --config PATH, default ./config.yaml
  • --host HOST, default 127.0.0.1
  • --port PORT, default 3000

Views:

  • Dashboard: counts for newsletters, issues, links, sponsors, dead links, and recent runs.
  • Links: searchable table of content links with filters for newsletter, category, and date range.
  • Newsletter detail: issues and links for one newsletter.
  • Sponsored links: global sponsor table.
  • Dead links: global dead/unreachable table.
  • Runs: recent import run summaries.

The first UI can use server-rendered HTML with minimal CSS. No SPA framework is required.

Data Flow

Import:

  1. Gmail fetches configured label messages.
  2. Existing parser extracts links and sponsor entries.
  3. Existing cleanup/categorization/enrichment pipeline prepares rows.
  4. DatabaseWriter upserts newsletters, issues, links, occurrences, sponsors, dead links, and run summary.
  5. Optional spreadsheet writers run after database persistence when configured.

Web:

  1. nlc serve opens the SQLite database.
  2. Request handlers query read-only catalog views.
  3. HTML pages render tables and simple filters.

Error Handling

  • Database open/migration errors are critical and stop nlc run or nlc serve.
  • A single failed row insert during import increments the run error count and stops the database transaction for that run.
  • SQLite writes use transactions so partial import data is not committed on failure.
  • Web route errors return a simple 500 page and log the error.

Testing

Add tests for:

  • Config defaults for database.enabled and database.path.
  • Schema migration creates expected tables.
  • Database writer inserts newsletters, issues, links, sponsors, and dead links.
  • Duplicate normalized URLs are deduplicated while occurrences remain per issue.
  • nlc run writes to SQLite by default even when spreadsheet outputs are disabled.
  • Web routes render dashboard, links, sponsored links, dead links, and run history using a fixture database.
  • Existing spreadsheet writer tests continue to pass.

Out of Scope for This Pass

  • Browser-based Gmail OAuth setup.
  • Running Gmail imports from a web button.
  • User accounts or remote hosting.
  • Full visual design polish.
  • Migrating old JSON state into SQLite.
  • Removing Excel or Google Sheets support.

Acceptance Criteria

  • nlc run creates/updates the SQLite database by default.
  • nlc serve starts a local web app and displays imported catalog data.
  • Spreadsheet output still works when enabled.
  • Existing CLI validations remain green.
  • Tests cover database persistence and web read paths without live Gmail or Google credentials.