Production-ready system for automated YouTube channel monitoring and video transcription. Leverages OpenAI Whisper and WhisperX for speech-to-text with speaker diarization, featuring a Python backend, FastAPI REST API, and Next.js dashboard.
Content creators, researchers, and businesses need to monitor YouTube channels and extract accurate transcriptions from videos at scale. Manual transcription is time-consuming, expensive, and lacks speaker identification capabilities.
We built YTVideoTranscriber as a comprehensive automated pipeline that monitors YouTube channels via RSS feeds, downloads audio using yt-dlp, and transcribes content using OpenAI Whisper with WhisperX for speaker diarization. The system includes a full-stack dashboard for management and search.
YTVideoTranscriber follows a three-tier architecture with clear separation between the CLI/orchestration layer, REST API, and web dashboard. The core pipeline handles video discovery, audio extraction, and AI-powered transcription with speaker identification.
Central coordinator managing the entire transcription pipeline from discovery to output
Python, Click CLI, State machine for video processing
Discovers new videos from subscribed channels using RSS feeds and yt-dlp
RSS parsing, yt-dlp integration, Duplicate detection
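As a rough illustration of RSS-based discovery with duplicate detection: YouTube publishes a per-channel Atom feed at `https://www.youtube.com/feeds/videos.xml?channel_id=...`, where each entry carries a `yt:videoId` element. The sketch below parses such a feed and keeps only video IDs that have not been seen before. The function name, the `seen_ids` set, and the sample feed are hypothetical and are not taken from the project's code; fetching the feed over HTTP and the yt-dlp step are left out.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by YouTube's Atom channel feeds.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}

# Hypothetical, trimmed-down feed used only for this illustration.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:yt="http://www.youtube.com/xml/schemas/2015">
  <entry>
    <yt:videoId>abc123</yt:videoId>
    <title>First video</title>
    <published>2024-01-01T00:00:00+00:00</published>
  </entry>
  <entry>
    <yt:videoId>def456</yt:videoId>
    <title>Second video</title>
    <published>2024-01-02T00:00:00+00:00</published>
  </entry>
</feed>"""


def discover_new_videos(feed_xml: str, seen_ids: set[str]) -> list[dict]:
    """Return feed entries whose video ID is not yet in seen_ids.

    seen_ids is updated in place so that a second call with the same
    feed returns nothing (simple duplicate detection).
    """
    root = ET.fromstring(feed_xml)
    new_videos = []
    for entry in root.findall("atom:entry", NS):
        video_id = entry.findtext("yt:videoId", namespaces=NS)
        if not video_id or video_id in seen_ids:
            continue  # skip malformed entries and already-known videos
        seen_ids.add(video_id)
        new_videos.append({
            "video_id": video_id,
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "published": entry.findtext("atom:published", default="", namespaces=NS),
        })
    return new_videos


if __name__ == "__main__":
    already_seen = {"abc123"}  # e.g. IDs loaded from the database
    for video in discover_new_videos(SAMPLE_FEED, already_seen):
        print(video["video_id"], video["title"])
```

In a real deployment the set of seen IDs would more likely live in the database than in memory, so that duplicate detection survives restarts.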
Core AI engine using Whisper for STT and WhisperX for alignment and speaker diarization
OpenAI Whisper, WhisperX, GPU acceleration, Multiple model sizes
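To make the combination of transcription and diarization more concrete, one common approach is to give each transcript segment the speaker whose diarization turn overlaps it the most in time. The plain-Python sketch below illustrates that idea only; the dictionary shapes and the `assign_speakers` name are assumptions made for this example and do not reproduce WhisperX's or the project's actual code.

```python
from collections import defaultdict


def assign_speakers(segments: list[dict], turns: list[dict]) -> list[dict]:
    """Label each transcript segment with the speaker that overlaps it most.

    segments: dicts with "start", "end" (seconds) and "text"
    turns:    dicts with "start", "end" (seconds) and "speaker"
    Segments with no overlapping turn get speaker=None.
    """
    labeled = []
    for seg in segments:
        overlap = defaultdict(float)  # speaker -> total overlapping seconds
        for turn in turns:
            shared = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if shared > 0:
                overlap[turn["speaker"]] += shared
        speaker = max(overlap, key=overlap.get) if overlap else None
        labeled.append({**seg, "speaker": speaker})
    return labeled


# Hypothetical sample data for illustration.
segments = [
    {"start": 0.0, "end": 4.0, "text": "Welcome back to the show."},
    {"start": 4.0, "end": 9.0, "text": "Thanks for having me."},
    {"start": 20.0, "end": 22.0, "text": "(music)"},
]
turns = [
    {"start": 0.0, "end": 4.5, "speaker": "SPEAKER_00"},
    {"start": 4.5, "end": 10.0, "speaker": "SPEAKER_01"},
]

labeled = assign_speakers(segments, turns)
```

Here the second segment overlaps both turns, but it shares 4.5 seconds with `SPEAKER_01` and only 0.5 seconds with `SPEAKER_00`, so it is attributed to `SPEAKER_01`. The third segment falls outside every turn and stays unlabeled.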
Full-featured API with 25+ endpoints for channel management, transcription control, and search
FastAPI, SQLAlchemy ORM, SQLite/PostgreSQL, Background tasks
The core of the system is a multi-stage pipeline that chains video discovery, audio extraction, speech recognition, and speaker identification into a single automated workflow.
Supports all Whisper model sizes (tiny, base, small, medium, large), so you can trade accuracy for speed based on your needs
Precise word-level timestamps through forced alignment, enabling accurate subtitle generation
pyannote-powered speaker diarization labels each segment with a speaker tag such as SPEAKER_00 or SPEAKER_01
Videos progress through states: PENDING → DOWNLOADING → TRANSCRIBING → COMPLETED with full error recovery
Search across all transcriptions to find specific content, speakers, or topics instantly
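The state progression listed above (PENDING → DOWNLOADING → TRANSCRIBING → COMPLETED) can be sketched as a small transition table. This is a minimal illustration rather than the project's implementation: the `FAILED` state and the retry edge back to `PENDING` are assumptions introduced here to model error recovery, and the `advance` helper is hypothetical.

```python
from enum import Enum


class VideoState(Enum):
    PENDING = "PENDING"
    DOWNLOADING = "DOWNLOADING"
    TRANSCRIBING = "TRANSCRIBING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"  # assumption: not named in the description above


# Allowed transitions. The FAILED -> PENDING edge is an assumed retry path.
ALLOWED_TRANSITIONS = {
    VideoState.PENDING: {VideoState.DOWNLOADING},
    VideoState.DOWNLOADING: {VideoState.TRANSCRIBING, VideoState.FAILED},
    VideoState.TRANSCRIBING: {VideoState.COMPLETED, VideoState.FAILED},
    VideoState.FAILED: {VideoState.PENDING},
    VideoState.COMPLETED: set(),
}


def advance(current: VideoState, target: VideoState) -> VideoState:
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Illegal transition: {current.name} -> {target.name}")
    return target


if __name__ == "__main__":
    state = VideoState.PENDING
    state = advance(state, VideoState.DOWNLOADING)
    state = advance(state, VideoState.FAILED)   # e.g. the download failed
    state = advance(state, VideoState.PENDING)  # queued again for a retry
    print(state.name)
```

Keeping the transitions in one table makes illegal jumps (for example, PENDING straight to COMPLETED) fail loudly instead of silently corrupting a video's status.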
YTVideoTranscriber is a production-ready system with comprehensive tooling for automated YouTube transcription at scale.