
Summarize 📝 — Chrome Side Panel + CLI

Fast summaries from URLs, files, and media. Works in the terminal, in a Chrome Side Panel, and in a Firefox Sidebar.

Highlights

Feature overview

Summarize extension screenshot

One‑click summarizer for the current tab. Chrome Side Panel + Firefox Sidebar + local daemon for streaming Markdown.

Chrome Web Store: Summarize Side Panel

YouTube slide screenshots (from the browser):

Summarize YouTube slide screenshots

Beginner quickstart (extension)

  1. Install the CLI (choose one):
     • npm (cross‑platform): npm i -g @steipete/summarize
     • Homebrew (macOS arm64): brew install steipete/tap/summarize
  2. Install the extension (Chrome Web Store link above) and open the Side Panel.
  3. The panel shows a token + install command. Run it in Terminal:
     • summarize daemon install --token <TOKEN>

Why a daemon/service?

If you only want the CLI, you can skip the daemon install entirely.

Notes:

More:

Slides (extension)

Advanced (unpacked / dev)

  1. Build + load the extension (unpacked):
     • Chrome: pnpm -C apps/chrome-extension build
       • chrome://extensions → Developer mode → Load unpacked → Pick: apps/chrome-extension/.output/chrome-mv3
     • Firefox: pnpm -C apps/chrome-extension build:firefox
       • about:debugging#/runtime/this-firefox → Load Temporary Add-on → Pick: apps/chrome-extension/.output/firefox-mv3/manifest.json
  2. Open Side Panel/Sidebar → copy token.
  3. Install daemon in dev mode:
     • pnpm summarize daemon install --token <TOKEN> --dev

CLI

Summarize CLI screenshot

Install

Requires Node 22+.

npx -y @steipete/summarize "https://example.com"   # one-off run, no install
npm i -g @steipete/summarize                       # global CLI install
brew install steipete/tap/summarize                # Homebrew (macOS arm64)

npm i @steipete/summarize-core                     # library install
import { createLinkPreviewClient } from "@steipete/summarize-core/content";

Homebrew availability depends on the current tap formula for your architecture. If Homebrew install fails on Intel/x64, use the npm global install above.

Optional local dependencies

Install these if you want media-heavy features:

macOS (Homebrew):

brew install ffmpeg yt-dlp
brew install tesseract # optional, for --slides-ocr

If --slides is enabled and these tools are missing, Summarize warns and continues without slides.
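
A quick way to confirm which of those optional tools are present (a sketch; the tool names come from the Homebrew lines above):

```shell
# Report which optional media tools are on PATH; a missing tool only disables
# the features that need it (e.g. --slides, --slides-ocr).
for tool in ffmpeg yt-dlp tesseract; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: installed"
  else
    echo "$tool: missing"
  fi
done
```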

CLI vs extension

Quickstart

summarize "https://example.com"

Inputs

URLs or local paths:

summarize "/path/to/file.pdf" --model google/gemini-3-flash
summarize "https://example.com/report.pdf" --model google/gemini-3-flash
summarize "/path/to/audio.mp3"
summarize "/path/to/video.mp4"

Stdin (pipe content using -):

echo "content" | summarize -
pbpaste | summarize -
# binary stdin also works (PDF/image/audio/video bytes)
cat /path/to/file.pdf | summarize -

Notes:

YouTube (supports youtube.com and youtu.be):

summarize "https://youtu.be/dQw4w9WgXcQ" --youtube auto

Podcast RSS (transcribes latest enclosure):

summarize "https://feeds.npr.org/500005/podcast.xml"

Apple Podcasts episode page:

summarize "https://podcasts.apple.com/us/podcast/2424-jelly-roll/id360084272?i=1000740717432"

Spotify episode page (best-effort; may fail for exclusives):

summarize "https://open.spotify.com/episode/5auotqWAXhhKyb9ymCuBJY"

Output length

--length is a guideline for how much output the model is asked to produce, not a hard cap.

summarize "https://example.com" --length long
summarize "https://example.com" --length 20k
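
As a sketch of how a shorthand like 20k reads (the ×1000 expansion and the underlying unit are assumptions for illustration, not the CLI's documented internals):

```shell
# Expand a --length shorthand such as "20k" (assumption: trailing k = ×1000)
len="20k"
case "$len" in
  *k) echo $(( ${len%k} * 1000 )) ;;   # 20k -> 20000
  *)  echo "$len" ;;
esac
```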

What file types work?

Best effort and provider-dependent. These usually work well:

Notes:

Model ids

Use gateway-style ids: <provider>/<model>.

Examples:

Note: some models/providers do not support streaming or certain file media types. When that happens, the CLI prints a friendly error (or auto-disables streaming for that model when supported by the provider).
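
In a gateway-style id, the provider is everything before the first slash; the rest, which may itself contain slashes (as in OpenRouter ids elsewhere in this README), is the model. A shell sketch of that split:

```shell
# Split a gateway-style id on the FIRST "/" into provider and model.
model_id="openrouter/meta-llama/llama-3.1-8b-instruct:free"
provider="${model_id%%/*}"   # openrouter
model="${model_id#*/}"       # meta-llama/llama-3.1-8b-instruct:free
echo "$provider | $model"
```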

Limits

Common flags

summarize <input> [flags]

Use summarize --help or summarize help for the full help text.

Coding CLIs (Codex, Claude, Gemini, Agent)

Summarize can use common coding CLIs as local model backends:

Requirements:

Quick smoke test:

printf "Summarize CLI smoke input.\nOne short paragraph. Reply can be brief.\n" >/tmp/summarize-cli-smoke.txt

summarize --cli codex --plain --timeout 2m /tmp/summarize-cli-smoke.txt
summarize --cli claude --plain --timeout 2m /tmp/summarize-cli-smoke.txt
summarize --cli gemini --plain --timeout 2m /tmp/summarize-cli-smoke.txt
summarize --cli agent --plain --timeout 2m /tmp/summarize-cli-smoke.txt

Set explicit CLI allowlist/order:

{
  "cli": { "enabled": ["codex", "claude", "gemini", "agent"] }
}

Configure implicit auto CLI fallback:

{
  "cli": {
    "autoFallback": {
      "enabled": true,
      "onlyWhenNoApiKeys": true,
      "order": ["claude", "gemini", "codex", "agent"]
    }
  }
}

More details: docs/cli.md

Auto model ordering

--model auto builds candidate attempts from built-in rules (or your model.rules overrides). CLI attempts are prepended when:

Default fallback behavior: applies only when no API keys are configured; tries claude, gemini, codex, agent in that order, and remembers/prioritizes the last successful provider (~/.summarize/cli-state.json).
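
A sketch of that default selection, assuming each backend CLI is discoverable on PATH under its own name (this is an illustration of the ordering, not the CLI's actual code):

```shell
# Walk the default fallback order and pick the first coding CLI that is installed.
for cli in claude gemini codex agent; do
  if command -v "$cli" >/dev/null 2>&1; then
    echo "using: $cli"
    break
  fi
done
```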

Set explicit CLI attempts:

{
  "cli": { "enabled": ["gemini"] }
}

Disable implicit auto CLI fallback:

{
  "cli": { "autoFallback": { "enabled": false } }
}

Note: explicit --model auto does not trigger implicit auto CLI fallback unless cli.enabled is set.

Website extraction (Firecrawl + Markdown)

Non-YouTube URLs go through a fetch → extract pipeline. When direct fetch/extraction is blocked or too thin, --firecrawl auto can fall back to Firecrawl (if configured).

YouTube transcripts

--youtube auto tries best-effort web transcript endpoints first. When captions are not available, it falls back to:

  1. Apify (if APIFY_API_TOKEN is set): uses a scraping actor (faVsWy9VTSNVIhWpR)
  2. yt-dlp + Whisper (if yt-dlp is available): downloads audio, then transcribes with local whisper.cpp when installed (preferred), otherwise falls back to Groq (GROQ_API_KEY), AssemblyAI (ASSEMBLYAI_API_KEY), Gemini (GEMINI_API_KEY / Google aliases), OpenAI (OPENAI_API_KEY), then FAL (FAL_KEY)

Environment variables for yt-dlp mode:

Apify costs money but tends to be more reliable when captions exist.

Slide extraction (YouTube + direct video URLs)

Extract slide screenshots (scene detection via ffmpeg) and optional OCR:

Requirements:

summarize "https://www.youtube.com/watch?v=..." --slides
summarize "https://www.youtube.com/watch?v=..." --slides --slides-ocr

Outputs are written under ./slides/<sourceId>/ (or --slides-dir).

• OCR results are included in JSON output (--json) and stored in slides.json inside the slide directory.
• When scene detection is too sparse, the extractor also samples at a fixed interval to improve coverage.
• With --slides, supported terminals (kitty/iTerm/Konsole) render inline thumbnails automatically inside the summary narrative (the model inserts [slide:N] markers).
• Timestamp links are clickable when the terminal supports OSC-8 (YouTube/Vimeo/Loom/Dropbox).
• If inline images are unsupported, Summarize prints a note with the on-disk slide directory.

Use --slides --extract to print the full timed transcript and insert slide images inline at matching timestamps.

Format the extracted transcript as Markdown (headings + paragraphs) via an LLM:

summarize "https://www.youtube.com/watch?v=..." --extract --format md --markdown-mode llm

Media transcription (Whisper)

Local audio/video files are transcribed first, then summarized. --video-mode transcript forces direct media URLs (and embedded media) through Whisper first. Prefers local whisper.cpp when available; otherwise requires one of GROQ_API_KEY, ASSEMBLYAI_API_KEY, GEMINI_API_KEY (or Google aliases), OPENAI_API_KEY, or FAL_KEY.

Local ONNX transcription (Parakeet/Canary)

Summarize can use NVIDIA Parakeet/Canary ONNX models via a local CLI you provide. Auto selection (default) prefers ONNX when configured.

Verified podcast services (2025-12-25)

Run: summarize <url>

Transcription: prefers local whisper.cpp when installed; otherwise uses Groq, AssemblyAI, Gemini, OpenAI, or FAL when keys are set.

Translation paths

--language/--lang controls the output language of the summary (and other LLM-generated text). Default is auto.

When the input is audio/video, the CLI needs a transcript first. The transcript comes from one of these paths:

  1. Existing transcript (preferred)
     • YouTube: uses youtubei / captionTracks when available.
     • Podcasts: uses Podcasting 2.0 RSS <podcast:transcript> (JSON/VTT) when the feed publishes it.
  2. Whisper transcription (fallback)
     • YouTube: falls back to yt-dlp (audio download) + Whisper transcription when configured; Apify is a last resort.
     • Prefers local whisper.cpp when installed + model available.
     • Otherwise uses cloud transcription in this order: Groq (GROQ_API_KEY) → AssemblyAI (ASSEMBLYAI_API_KEY) → Gemini (GEMINI_API_KEY / Google aliases) → OpenAI (OPENAI_API_KEY) → FAL (FAL_KEY).
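
That key-based ordering can be sketched as a first-match scan (the environment variable names come from the list above; the selection loop itself is an illustration, not the CLI's actual code):

```shell
# Pick the first cloud transcription backend whose API key is set.
for pair in "GROQ_API_KEY:Groq" "ASSEMBLYAI_API_KEY:AssemblyAI" \
            "GEMINI_API_KEY:Gemini" "OPENAI_API_KEY:OpenAI" "FAL_KEY:FAL"; do
  var="${pair%%:*}"
  name="${pair#*:}"
  eval "val=\${$var:-}"
  if [ -n "$val" ]; then
    echo "transcribe with: $name"
    break
  fi
done
```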

For direct media URLs, use --video-mode transcript to force transcribe → summarize:

summarize https://example.com/file.mp4 --video-mode transcript --lang en

Configuration

Single config location:

Supported keys today:

{
  "model": { "id": "openai/gpt-5-mini" },
  "env": { "OPENAI_API_KEY": "sk-..." },
  "ui": { "theme": "ember" }
}

Shorthand (equivalent):

{
  "model": "openai/gpt-5-mini"
}

Also supported:

Note: the config is parsed leniently (JSON5), but comments are not allowed. Unknown keys are ignored.

Media cache defaults:

{
  "cache": {
    "media": { "enabled": true, "ttlDays": 7, "maxMb": 2048, "verify": "size" }
  }
}

Note: --no-cache bypasses summary caching only (LLM output). Extract/transcript caches still apply. Use --no-media-cache to skip media files.

Precedence:

  1. --model
  2. SUMMARIZE_MODEL
  3. ~/.summarize/config.json
  4. default (auto)
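
The four steps above can be sketched as a simple resolver (the config lookup here is a crude sed over the shorthand `"model": "…"` form; the real CLI parses the file as lenient JSON):

```shell
# Resolve the model id: flag > SUMMARIZE_MODEL > config file > "auto".
resolve_model() {
  if [ -n "${1:-}" ]; then echo "$1"; return; fi                  # 1. --model
  if [ -n "${SUMMARIZE_MODEL:-}" ]; then
    echo "$SUMMARIZE_MODEL"; return                               # 2. env var
  fi
  cfg=$(sed -n 's/.*"model"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
        ~/.summarize/config.json 2>/dev/null | head -n1)          # 3. config (shorthand form)
  if [ -n "$cfg" ]; then echo "$cfg"; return; fi
  echo "auto"                                                     # 4. default
}
resolve_model "openai/gpt-5-mini"
```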

Theme precedence:

  1. --theme
  2. SUMMARIZE_THEME
  3. ~/.summarize/config.json (ui.theme)
  4. default (aurora)

Environment variable precedence:

  1. process env
  2. ~/.summarize/config.json (env)
  3. ~/.summarize/config.json (apiKeys, legacy)

Environment variables

Set the key matching your chosen --model:

OpenAI-compatible chat completions toggle:

UI theme:

OpenRouter (OpenAI-compatible):

summarize refresh-free

Quick start: make free the default (keep auto available)

summarize refresh-free --set-default
summarize "https://example.com"
summarize "https://example.com" --model auto

Regenerates the free preset (models.free in ~/.summarize/config.json) by:

If --model free stops working, run:

summarize refresh-free

Flags:

Example:

OPENROUTER_API_KEY=sk-or-... summarize "https://example.com" --model openrouter/meta-llama/llama-3.1-8b-instruct:free
OPENROUTER_API_KEY=sk-or-... summarize "https://example.com" --model openrouter/minimax/minimax-m2.5

If your OpenRouter account enforces an allowed-provider list, make sure at least one provider is allowed for the selected model. When routing fails, summarize prints the exact providers to allow.

Legacy: OPENAI_BASE_URL=https://openrouter.ai/api/v1 (and either OPENAI_API_KEY or OPENROUTER_API_KEY) also works.

NVIDIA API Catalog (OpenAI-compatible; free credits):

export NVIDIA_API_KEY="nvapi-..."
summarize "https://example.com" --model nvidia/stepfun-ai/step-3.5-flash

Z.AI (OpenAI-compatible):

Optional services:

Model limits

The CLI uses the LiteLLM model catalog for model limits (like max output tokens):

Library usage (optional)

Recommended (minimal deps):

Compatibility (pulls in CLI deps):

Development

pnpm install
pnpm check

More

Troubleshooting

License: MIT