Sanzaru (Whisper MCP)
OpenAI Whisper/GPT-4o transcription, audio chat, and TTS in one local MCP server.
What it does
Community MCP server wrapping OpenAI's audio APIs: Whisper and GPT-4o transcription with enhancement templates, interactive 'chat with audio' analysis, format conversion/compression, and OpenAI TTS. A filmmaker-agent can transcribe and interrogate dailies, then generate scratch narration, all against a single OpenAI key.
Connect
claude mcp add sanzaru -e OPENAI_API_KEY=YOUR_KEY -e SANZARU_MEDIA_PATH=/absolute/path/to/media -- uvx "sanzaru[all]"
OPENAI_API_KEY: create one at https://platform.openai.com (API keys)
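Before running the command above, it can help to sanity-check the two values it passes in. The following is a minimal sketch only: `sanzaru_preflight` is a hypothetical helper, not part of Sanzaru or the Claude CLI, and the absolute-path requirement is an assumption inferred from the `/absolute/path/to/media` placeholder rather than documented behavior.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of Sanzaru): validates the two
# environment values that the Connect command passes to the server.
sanzaru_preflight() {
  key="$1"
  media_path="$2"

  # The key must be non-empty and not the placeholder from the listing.
  if [ -z "$key" ] || [ "$key" = "YOUR_KEY" ]; then
    echo "OPENAI_API_KEY is missing or still a placeholder" >&2
    return 1
  fi

  # Assumption: the media path should be absolute, as the placeholder suggests.
  case "$media_path" in
    /*) ;;
    *)
      echo "SANZARU_MEDIA_PATH should be an absolute path" >&2
      return 2
      ;;
  esac

  # The directory should exist before the server is pointed at it.
  if [ ! -d "$media_path" ]; then
    echo "SANZARU_MEDIA_PATH does not exist: $media_path" >&2
    return 3
  fi

  return 0
}
```

One way to use it is to chain the check in front of the registration, for example `sanzaru_preflight "$OPENAI_API_KEY" "$MEDIA_DIR" && claude mcp add ...`, where `MEDIA_DIR` is a variable you define yourself. The distinct return codes (1, 2, 3) make it easy to tell which value needs fixing.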
Tools you'll see
transcribe_audio, transcribe_with_enhancement, chat_with_audio, create_audio, convert_audio, compress_audio
Field notes
Successor to the widely listed (now deprecated) arcaputo3/mcp-server-whisper. v0.6.2, Apr 2026, MIT. For fully local/offline transcription, see jwulff/whisper-mcp and SmartLittleApps/local-stt-mcp (whisper.cpp, Apple Silicon, diarization).
Pairs well with
Martini, by Martini (Official)
The film set for AI videos — direct entire productions from your agent.
Remotion MCP, by Remotion (Official)
Official Remotion docs MCP — grounds agents in the programmatic-video framework.
Adobe Premiere Pro MCP, by hetpatel-11
Drive Premiere Pro timelines, effects, and exports from your agent via a CEP bridge.