2026, hot

Vidistiller

Self-hosted video intelligence platform. Transcribes, detects slides, summarizes with LLMs, and exports to Obsidian-ready Markdown.

GitHub ↗

Vidistiller is a self-hosted video intelligence platform. Submit a URL from YouTube, Vimeo, Twitch, TikTok, X, Reddit, Rumble, or any direct video file, and it runs a full pipeline: transcript extraction, slide detection, LLM summarization, and snapshot capture. The output is structured Markdown ready for a language model, a vault, or a document.

Two modes are available at submission time. Transcript Mode extracts speech with timestamps and inserts platform chapter markers as navigable section headers. Presentation Mode adds a slide detection pipeline: SSIM (structural similarity) frame comparison, Tesseract OCR on each detected slide, LLM classification for ambiguous transitions, and per-slide transcript alignment.
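The SSIM comparison step can be illustrated with a minimal sketch. It assumes grayscale frames given as flat lists of 0-255 pixel values and computes a single global SSIM score per frame pair; a production pipeline would more likely use a windowed SSIM from an image library. The function names and the 0.9 threshold are illustrative assumptions, not Vidistiller's actual code.

```python
# Stabilizing constants from the standard SSIM definition, for 8-bit pixels.
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2


def global_ssim(a, b):
    """Global (single-window) SSIM between two equal-sized grayscale frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    numerator = (2 * mu_a * mu_b + C1) * (2 * cov + C2)
    denominator = (mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2)
    return numerator / denominator


def detect_slide_changes(frames, threshold=0.9):
    """Return indices of frames that differ enough from their predecessor.

    The threshold is a hypothetical value; it would need tuning on real video.
    """
    return [
        i
        for i in range(1, len(frames))
        if global_ssim(frames[i - 1], frames[i]) < threshold
    ]
```

Frame pairs scoring near the threshold are the natural candidates for the "ambiguous transition" case that the text hands to an LLM classifier.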

Summarization is a separate user-triggered step. The transcript is split into sections by chapter header, each classified by content type (intro, code walkthrough, demo, conceptual, conclusion) and summarized with a matching prompt. Four LLM providers are supported, configurable per user: Ollama (local, no key), OpenAI, Anthropic, and a self-hosted vLLM fleet with automatic model routing.
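A rough sketch of the section split and prompt selection described above. It assumes chapter headers appear in the transcript as lines starting with "## ", and that the content type has already been assigned by a classifier. The prompt texts, the PROMPTS table, and both function names are hypothetical placeholders rather than the project's real ones.

```python
# Hypothetical prompt table keyed by content type.
PROMPTS = {
    "intro": "Summarize the speaker's framing and goals.",
    "code walkthrough": "Summarize what the code does, step by step.",
    "demo": "Summarize what is demonstrated and the observed result.",
    "conceptual": "Summarize the key ideas and how they relate.",
    "conclusion": "Summarize the takeaways and next steps.",
}


def split_sections(transcript, header_prefix="## "):
    """Split a transcript into (title, body) pairs at chapter header lines."""
    sections = []
    title, body = None, []

    def flush():
        # Keep text that precedes the first header under a placeholder title.
        if title is not None or any(line.strip() for line in body):
            sections.append((title or "Preamble", "\n".join(body).strip()))

    for line in transcript.splitlines():
        if line.startswith(header_prefix):
            flush()
            title, body = line[len(header_prefix):].strip(), []
        else:
            body.append(line)
    flush()
    return sections


def build_prompt(content_type, body):
    """Pick the prompt for a content type, falling back to 'conceptual'."""
    instruction = PROMPTS.get(content_type, PROMPTS["conceptual"])
    return f"{instruction}\n\n{body}"
```

Summarizing per section keeps each LLM call small and lets the prompt match the material, which is presumably why the split happens before summarization.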

The browser UI is a VS Code-like panel layout: embedded video player on one side, timestamped transcript on the other. Clicking any timestamp in the transcript seeks the video. Slides appear in a bottom gallery; clicking a slide seeks to that moment. Export options include Obsidian Markdown ZIP (transcript or summary with images as a vault-ready bundle) and JSON for backup and portability.
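The Obsidian export can be pictured with a small sketch that writes a Markdown note plus image attachments into a ZIP. The bundle layout (one note at the root, images under attachments/), the segment tuple shape, and the function names are assumptions made for illustration. The `![[...]]` form is Obsidian's embed syntax.

```python
import io
import zipfile


def format_timestamp(seconds):
    """Render a second offset as HH:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def build_vault_zip(title, segments, images):
    """Build an in-memory ZIP holding one note and its images.

    segments: list of (start_seconds, text, image_name_or_None)
    images:   dict mapping image file name to raw bytes
    """
    lines = [f"# {title}", ""]
    for start, text, image in segments:
        lines.append(f"**[{format_timestamp(start)}]** {text}")
        if image:
            lines.append(f"![[{image}]]")  # Obsidian embed of an attachment
        lines.append("")

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{title}.md", "\n".join(lines))
        for name, data in images.items():
            archive.writestr(f"attachments/{name}", data)
    return buffer.getvalue()
```

Unzipping such a bundle into a vault folder should give a note whose snapshots render inline, since the note and its images travel together.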

The REST API exposes the full pipeline programmatically, with JWT auth for users and API key auth for headless service integration.
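As a sketch of what a headless submission might look like, the snippet below builds (but does not send) a job request using either credential type. The /api/jobs path, the X-API-Key header name, and the JSON field names are all assumptions; the project's API reference is the authority on the real endpoint and schema.

```python
import json
import urllib.request


def build_submit_request(base_url, video_url, mode="transcript", *, api_key=None, jwt=None):
    """Build a POST request that submits a video URL for processing.

    Exactly one of api_key (service integration) or jwt (user session)
    must be given. Endpoint path and header names are hypothetical.
    """
    if (api_key is None) == (jwt is None):
        raise ValueError("provide exactly one of api_key or jwt")

    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        headers["X-API-Key"] = api_key
    else:
        headers["Authorization"] = f"Bearer {jwt}"

    body = json.dumps({"url": video_url, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/jobs",
        data=body,
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; keeping construction separate makes the request easy to inspect before anything goes over the network.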

The name is a portmanteau of vidi (Latin: I saw) and distiller. It takes what was seen and reduces it to what is useful.

What it does

Accepts video from YouTube, Vimeo, Twitch, TikTok, X, Reddit, Rumble, direct MP4/MKV/MOV/WebM, and 50+ platforms via yt-dlp
Transcript Mode: speech-to-text with timestamps, platform chapter markers as section headers
Presentation Mode: SSIM slide detection, Tesseract OCR, LLM ambiguity classification, per-slide transcript alignment
LLM summarization with content-type-aware prompts: intro, code walkthrough, demo, conceptual, conclusion
Four LLM providers: Ollama (local), OpenAI, Anthropic, vLLM fleet with auto model routing
Browser UI with embedded video player and clickable transcript timestamps that seek the video
Obsidian Markdown ZIP export: transcript or summary with inline snapshot images, vault-ready
REST API with JWT user auth and API key auth for machine-to-machine integration
Celery job queue with real-time step logs and mid-run cancellation
Job export and import as self-contained JSON for backup and portability
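Step logs and mid-run cancellation, as listed above, can be sketched without Celery as a loop that checks a cancel flag between steps. This is a simplified stand-in: the use of `threading.Event`, the log format, and the `run_pipeline` name are assumptions, and a real Celery worker would read the flag from shared state rather than an in-process event.

```python
import threading


def run_pipeline(steps, cancel_event, log):
    """Run named steps in order, logging each and stopping if cancelled.

    steps:        list of (name, callable) pairs
    cancel_event: threading.Event set by whoever requests cancellation
    log:          list that receives one message per step transition
    """
    for name, step in steps:
        # Cancellation is cooperative: it takes effect between steps,
        # never in the middle of one.
        if cancel_event.is_set():
            log.append(f"cancelled before {name}")
            return "cancelled"
        log.append(f"start {name}")
        step()
        log.append(f"done {name}")
    return "completed"
```

Checking only at step boundaries keeps each step atomic, at the cost of waiting for a long step such as transcription to finish before the cancel is honored.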

Status

Hot. Used here for research, reference, and model context building.