Self-hosting a video subtitle pipeline (Whisper + ffmpeg + Go)

The hosted version of Subtitles King is the convenient default. There are good reasons to skip it. This guide is for the team that needs the pipeline running on infrastructure they own — usually because the video content cannot leave their network, sometimes because they want to amortize a GPU they already have, occasionally because they like operating their own boxes.

The system is intentionally simple to host. If you have shipped a Go binary behind nginx before, you have already done the hard part.

When DIY makes sense

Self-host the pipeline if any of these apply:

Privacy. Video stays on your own VPS. No third-party API key, no log lines on someone else's box.
Volume. Hundreds of hours per month where the flat plan stops being the cheapest option.
Latency control. A box in the same region as your storage, no cold start behavior.
Custom Whisper. You want to point at a fine-tuned model, swap in faster-whisper, or run Whisper.cpp on Apple silicon.

If none of these apply, the hosted version at brains.subtitlesking.com is faster to start and one less thing to maintain.

What the pipeline actually is

Three components, all open source, glued by a single Go binary:

ffmpeg — compresses the input video to keep Whisper happy and bandwidth bills sane. Then, after transcription, burns the subtitle track back into the video frames.
OpenAI Whisper (large) — runs transcription on the compressed audio. You can call the OpenAI API or run Whisper locally.
Go upload server + MCP binary — the HTTP API that accepts uploads and the MCP server that exposes the same operations as tools to an agent.

The pipeline is linear: upload → compress → transcribe → burn → return file. No queues, no microservices, no message bus. A single process handles a job from end to end.

System requirements

A baseline that works:

CPU: 4 vCPU is fine for compression and burn-in. ffmpeg is the hot path.
GPU: Optional. Required only if you run Whisper locally instead of via API. An RTX 3090 or better gives near real-time transcription on the large model. Without a GPU, use the OpenAI API.
RAM: 8 GB minimum, 16 GB comfortable.
Disk: Sized for retention. A 100 MB compressed video plus the burned output plus intermediate audio fits in roughly 250 MB. Multiply by your concurrency.
OS: Linux (Ubuntu 22.04 or 24.04 in CI). macOS works for local dev. Windows with WSL2 also works but is not the supported path.

Install dependencies

The build needs Go 1.22+, ffmpeg with libx264 and libass, and either an OpenAI API key or a local Whisper install:

sudo apt update
sudo apt install -y ffmpeg golang-go git
ffmpeg -version | head -1
go version

If you want local Whisper instead of API calls:

pip install -U openai-whisper
whisper --help

For a GPU box, install CUDA first, then pip install faster-whisper for a noticeably faster runtime.

Build from source

Clone the repo and build. The whole project compiles to a single static binary:

git clone https://github.com/kirillzubovsky/subtitlesking-mcp.git
cd subtitlesking
go build -o subtitlesking-server ./cmd/server
go build -o subtitlesking-mcp ./cmd/mcp

Two binaries fall out: the HTTP upload server and the stdio MCP server. They share code — the MCP binary is a thin shim that calls the HTTP API locally over a Unix socket or 127.0.0.1.

Configure

Configuration is environment variables. The minimal set:

export SK_LISTEN_ADDR=":8080"
export SK_DATA_DIR="/var/lib/subtitlesking"
export SK_RETENTION_HOURS="24"
export SK_MAX_UPLOAD_MB="100"

# transcription backend
export SK_WHISPER_MODE="api"          # or "local"
export OPENAI_API_KEY="sk-..."        # required if mode=api
export SK_WHISPER_BIN="/usr/local/bin/whisper"   # if mode=local
export SK_WHISPER_MODEL="large"

SK_DATA_DIR is where uploads, intermediate audio, and output videos live. Mount a real volume there in production. SK_RETENTION_HOURS controls the cleanup sweep — set it to 0 to keep files forever, or to something low if you handle sensitive content.

Run the upload server

./subtitlesking-server

That is the whole foreground command. It listens on :8080 and serves the JSON API documented at /docs/api. Health check at /health. Submit a job:

curl -X POST http://localhost:8080/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"video_url":"https://example.com/clip.mp4","language":"en"}'

Poll /api/jobs/:id until status is done, then GET /api/jobs/:id/download.

Wire up the MCP server

The MCP binary points at the upload server:

export SK_API_BASE="http://localhost:8080"
./subtitlesking-mcp

claude mcp add subtitlesking /usr/local/bin/subtitlesking-mcp

For Claude Desktop, add a stanza to claude_desktop_config.json with the binary path. Full client matrix at /docs/mcp.

Production deployment

Run the upload server behind a reverse proxy. nginx or Caddy both work. The minimum Caddy config:

sub.example.com {
    reverse_proxy localhost:8080
    request_body {
        max_size 200MB
    }
}

Match max_size to SK_MAX_UPLOAD_MB plus a safety margin. Caddy handles TLS automatically. If you use nginx, set client_max_body_size to the same value.

A systemd unit for the server:

[Unit]
Description=Subtitles King upload server
After=network.target

[Service]
ExecStart=/usr/local/bin/subtitlesking-server
EnvironmentFile=/etc/subtitlesking/env
Restart=always
User=subtitlesking

[Install]
WantedBy=multi-user.target

Drop the env vars in /etc/subtitlesking/env, systemctl enable --now subtitlesking-server, and you are done.

Operations notes

A few things you will want to know on day two:

Disk fills up faster than you expect. The compressed input plus burned output stays around for SK_RETENTION_HOURS. Add monitoring on SK_DATA_DIR usage.
Whisper API rate limits. If you run with SK_WHISPER_MODE=api, the OpenAI rate limit is the ceiling on your throughput. Watch 429 responses in the server logs.
ffmpeg failures are usually codec-related. A "moov atom not found" error means a truncated upload, not a pipeline bug.
Backups. The data dir is ephemeral by design. There is no database. Job state lives in JSON files alongside the videos.

When to upgrade to the hosted version anyway

Self-hosting is great until you find yourself doing infra work instead of building your product. If the answers to "do you have on-call?" or "do you want to keep up with Whisper releases?" are no, the hosted version at brains.subtitlesking.com is doing the same pipeline on someone else's box.

The full self-host reference, including Docker compose files and a hardened nginx config, lives at /self-host and /docs/self-host. The source is at github.com/kirillzubovsky/subtitlesking-mcp.