YouTube transcript API: any public video's text in one call

Founder, AnyAPI · June 23, 2026

Tags: youtube, transcripts, how-to, api

TL;DR

YouTube's official API only returns captions for videos you own, and the popular open-source library breaks the moment it runs on a server. One call to a managed transcript API returns the timestamped text of any public video, with no OAuth and no proxies, at $0.002 a call. The same call works on Shorts, long videos, and other platforms like TikTok and Instagram.

YouTube's official API will tell you a video has captions. To download them it wants OAuth, the right scope, and permission to edit the video.

You don't have any of those for a video you didn't publish. So captions.download returns 403, every time.

The popular open-source library skips the auth and works on your laptop. Then it runs on a server, YouTube blocks the cloud IP, and it stops.

I got tired of both. One request now returns the transcript of any public video, with no account and no proxy pool.

Quickstart: your first transcript

Send the video URL, get the transcript back. Here's the call I actually run, against the first video ever uploaded to YouTube, and the real response it returns:

curl -X POST https://api.getanyapi.com/v1/run/youtube.video_transcript \
  -H "Authorization: Bearer $ANYAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/watch?v=jNQXAC9IVRw"}'

import os

import requests

res = requests.post(
    "https://api.getanyapi.com/v1/run/youtube.video_transcript",
    headers={"Authorization": f"Bearer {os.environ['ANYAPI_KEY']}"},
    json={"url": "https://www.youtube.com/watch?v=jNQXAC9IVRw"},
    timeout=30,  # don't hang forever on a stalled connection
)
res.raise_for_status()  # a bad URL or missing key surfaces as an HTTP error
print(res.json()["output"]["data"]["transcript"])

const res = await fetch("https://api.getanyapi.com/v1/run/youtube.video_transcript", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.ANYAPI_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://www.youtube.com/watch?v=jNQXAC9IVRw" }),
});
const { output } = await res.json();
console.log(output.data.transcript);

{
  "output": {
    "found": true,
    "data": {
      "language": "English",
      "transcript": "[{\"text\":\"All right, so here we are, in front of the elephants\",\"startMs\":\"1200\",\"endMs\":\"3360\",\"startTimeText\":\"0:01\"},{\"text\":\"the cool thing about these guys is that they have really...\",\"startMs\":\"5318\",\"endMs\":\"7974\",\"startTimeText\":\"0:05\"},{\"text\":\"really really long trunks\",\"startMs\":\"7974\",\"endMs\":\"12616\",\"startTimeText\":\"0:07\"}]"
    }
  },
  "provider": "AnyAPI",
  "costUsd": 0.002
}

You need one thing to run it: an API key, which a new account gets for free. There's no OAuth flow and no Google Cloud project to set up.

The transcript field comes back as a JSON-encoded string holding an array of caption lines, so it needs one parse before use. Each line carries the spoken text, its start and end in milliseconds (as strings), and a readable startTimeText I can drop straight into a UI.

The language field reports the caption track that was served. The found flag tells me whether a transcript existed, so a video with captions off comes back as found: false instead of an exception I've got to catch.
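To make the response shape concrete, here's a minimal Python sketch that decodes it. It assumes the payload looks exactly like the sample response above; the function name parse_transcript and the snake_case output keys are my own choices for illustration, not part of the API.

```python
import json


def parse_transcript(payload):
    """Return a list of caption lines, or None when no transcript was found."""
    output = payload.get("output") or {}
    if not output.get("found") or not output.get("data"):
        return None  # captions off or missing: a normal outcome, not an error
    raw = output["data"]["transcript"]
    # In the sample response the transcript is a JSON-encoded string and the
    # millisecond offsets are strings too, so decode and convert both.
    lines = json.loads(raw) if isinstance(raw, str) else raw
    return [
        {
            "text": line["text"],
            "start_ms": int(line["startMs"]),
            "end_ms": int(line["endMs"]),
            "label": line["startTimeText"],
        }
        for line in lines
    ]


# A trimmed copy of the sample response, used as stand-in data.
sample = {
    "output": {
        "found": True,
        "data": {
            "language": "English",
            "transcript": json.dumps([
                {"text": "really really long trunks", "startMs": "7974",
                 "endMs": "12616", "startTimeText": "0:07"},
            ]),
        },
    }
}
no_captions = {"output": {"found": False, "data": None}}

print(parse_transcript(sample))
print(parse_transcript(no_captions))
```

Converting the offsets to integers once, at the edge, keeps later arithmetic on timestamps free of string handling.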

That's the whole integration. No token refresh, no headless browser, no proxy pool. When YouTube changes how captions load, we fix it on our side and your call keeps working.

What you actually get back

The transcript is the caption track YouTube serves, and that track is one of two things.

Most videos carry only YouTube's auto-generated captions, written by its own speech recognition. You can see the seams: the machine tags [music] and [laughter], marks each speaker change with >>, and drops the odd bit of punctuation. When a creator uploaded their own captions, you get those instead, clean and correctly cased.

You don't pick between them, and you can't force a human-quality track onto a video that never had one.

On accuracy, I'll be straight with you. Clear single-speaker narration comes back almost perfect. Heavy accents, crosstalk, and loud music are where the auto track gets rough. For search, summaries, and analysis it's more than good enough. It's not word-perfect, so don't lean on it where every word has to be exact.

If a video has no caption track at all, you get found: false and null data, not an error. Silent clips and some music videos are the usual case.

Shorts behave exactly like long videos. I tested both; they return from the same call, with no separate endpoint and no different URL shape.

Turn it into subtitles or clean text

The line array is the raw material for the three formats I reach for most. Parse it once.

const lines = JSON.parse(output.data.transcript);

For a plain reading copy or an LLM prompt, drop the timings and join the text:

const plain = lines.map((l) => l.text).join(" ");

For subtitles, the startMs and endMs on every line map straight to SRT:

// HH:MM:SS,mmm for SRT; the Date trick is valid for offsets under 24 hours
const ms = (n) => new Date(+n).toISOString().slice(11, 23).replace(".", ",");
const srt = lines
  .map((l, i) => `${i + 1}\n${ms(l.startMs)} --> ${ms(l.endMs)}\n${l.text}`)
  .join("\n\n");

Swap the comma back to a dot, prepend a WEBVTT header line followed by a blank line, and the same data is a .vtt file a <track> element loads directly.
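For a Python pipeline, the VTT conversion might look like the sketch below. It assumes the lines have already been decoded from the transcript string and still carry the startMs and endMs keys from the response; vtt_time and to_vtt are names I made up for the example.

```python
def vtt_time(ms):
    """Format a millisecond offset as an HH:MM:SS.mmm WebVTT timestamp."""
    ms = int(ms)  # the API returns offsets as strings
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def to_vtt(lines):
    """Build a WebVTT document from decoded caption lines."""
    cues = [
        f"{vtt_time(line['startMs'])} --> {vtt_time(line['endMs'])}\n{line['text']}"
        for line in lines
    ]
    # The WEBVTT header must be followed by a blank line before the first cue.
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


lines = [
    {"text": "All right, so here we are, in front of the elephants",
     "startMs": "1200", "endMs": "3360"},
    {"text": "really really long trunks", "startMs": "7974", "endMs": "12616"},
]
print(to_vtt(lines))
```

Doing the arithmetic with divmod instead of a date object means the hour field keeps counting past 24, which matters only for very long streams.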

Languages and translation

The language field reports the track YouTube served, English in the call above. A video can carry several caption tracks, and the endpoint returns the default one.

One thing trips people up, so I'll call it out. The "auto-translate" option in YouTube's web player is machine-translated on demand, not stored as a caption track, so this endpoint doesn't return it. There's no stored track to fetch.

To get a transcript in another language, pull the original and run the text through a translation model. It's plain text, so that's one extra call, not a second integration.

Why the official API returns 403

The YouTube Data API does have a captions resource, but it won't help you here.

Its captions.download method requires permission to edit the video, so you can only download captions for channels you already control.

For any other public video, it returns 403. The API will tell you a caption track exists; it won't give you the text.

That single restriction is what sends developers looking for a scraper. If you're building search, summarization, or analysis over videos you didn't publish, the official path is a dead end before you write a line of business logic.

Why the open-source libraries break in production

The usual workaround is youtube-transcript-api, the Python library with about 7,800 GitHub stars. It skips OAuth by calling YouTube's internal endpoint, and on your laptop it works.

Then you deploy it to a server, and it stops.

It reports the captions are disabled when they aren't, on the exact same video that worked from your laptop. The maintainer pinned that bug report and said plainly that there's no proper fix and probably never will be.

The cause is the datacenter IP. YouTube blocks the shared ranges that AWS, GCP, and Azure hand out, then returns an HTML page that reads "Sign in to confirm you're not a bot."

The library tries to parse that bot page as a caption file. It either throws a "no element found" parse error on the empty XML or hits a 429. Nothing is wrong with the video. Your IP is just on a list.

The maintainer's advice in the thread never changes: route every request through a VPN or a rotating proxy, or send authenticated cookies. No library version fixes it, because it isn't a library bug.

It isn't a Python problem either. The popular Node port hits the identical wall, and its top open issue is titled, simply, "Doesn't work in production".

So the free library shows up with a residential-proxy bill, a cookie rotation, and a parser that breaks every time YouTube touches the page. That's the maintenance you inherit the day you ship. The transcript was supposed to be the easy part.

Which path should you use?

There are really four ways to get a YouTube transcript, and I've run all of them:

Capability	Official Data API	OSS library	Roll your own scraper	AnyAPI
Works on videos you don't own	No (403)	Yes	Yes	Yes
Account / auth needed	OAuth + edit rights	None	None	One API key
Runs from a server	Yes	Breaks on cloud IPs	Breaks on cloud IPs	Yes
Cross-platform (TikTok, Instagram)	No	No	No	Yes
Price	Free quota	Free, plus your proxy bill	Free, plus proxies + your time	$0.002 / call
Who maintains it when YouTube changes	n/a	You wait on a maintainer	You	We do

Here's how I choose. Use the official Data API only for captions on your own channel. Skip it for anything else; the 403 is final. The OSS library is fine for a script on your laptop that you babysit. Rolling your own is the OSS library plus more code and the same IP-block wall. I reach for the managed call when it runs on a server or has to be reliable, because I'd rather pay $0.002 than buy proxies just to scrape.

The difference shows up before the first transcript, in setup.

[Figure: setup steps before your first transcript, by path; fewer is better. The official-API bar counts the steps to a working call, and no number of steps lets it return videos you don't own.]

Errors and rate limits

The block-and-proxy fight above is the one thing you don't inherit here, because the gateway handles the upstream blocking for you.

There's no rate limit on authenticated calls. Provider-level failover governs throughput by design, so you fan out instead of trickling through a quota.
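Fanning out can be as simple as a thread pool. The sketch below is one way to do it, not a recommended setting: fetch stands for whatever function wraps the POST from the quickstart, and fetch_many and the worker count of 8 are assumptions I picked for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_many(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL in parallel and map each URL to its result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(urls, pool.map(fetch, urls)))


def fake_fetch(url):
    """Stand-in for the real API call, so the sketch runs offline."""
    return f"transcript for {url}"


urls = [
    "https://www.youtube.com/watch?v=jNQXAC9IVRw",
    "https://youtu.be/jNQXAC9IVRw",
]
results = fetch_many(urls, fake_fetch)
print(len(results))
```

If any single call raises, pool.map re-raises it when its result is read, so wrap fetch in your own error handling if one bad URL shouldn't stop the batch.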

The errors you handle are simple. A video with no captions returns found: false, not an exception. A bad URL or missing key returns a normal HTTP error code. There's no 429-storm to back off from, because the IP rotation that avoids it happens on our side, so you never see it.
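Those two cases can be sketched in a few lines of Python. This assumes only what the paragraph above says: a non-2xx status for request problems and found: false for a video without captions. The TranscriptRequestError class and the interpret function are hypothetical names, and the exact status codes the gateway uses aren't specified here.

```python
class TranscriptRequestError(Exception):
    """Raised for non-2xx responses, such as a bad URL or a missing key."""


def interpret(status_code, body):
    """Map an HTTP status and decoded JSON body to the transcript or None."""
    if not 200 <= status_code < 300:
        raise TranscriptRequestError(f"HTTP {status_code}")
    output = (body or {}).get("output") or {}
    if not output.get("found"):
        return None  # no caption track: expected, so no exception
    return output["data"]["transcript"]


ok_body = {"output": {"found": True, "data": {"transcript": "[]"}}}
print(interpret(200, ok_body))
print(interpret(200, {"output": {"found": False, "data": None}}))
```

Keeping "no captions" as a None return and reserving exceptions for request failures mirrors the split the API itself makes.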

Run it across a whole channel

One transcript covers a single video. A channel's full set of transcripts is a searchable archive. Chain two endpoints: youtube.channel_videos to list a channel's uploads, and youtube.video_transcript to transcribe each one.

# 1. list a channel's recent video URLs
curl -s -X POST https://api.getanyapi.com/v1/run/youtube.channel_videos \
  -H "Authorization: Bearer $ANYAPI_KEY" -H "Content-Type: application/json" \
  -d '{"handle": "@mkbhd"}' \
  | jq -r '.output.data.videos[].url' > urls.txt

# 2. transcribe each one
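while IFS= read -r url; do
  # build the JSON body with jq so odd characters in the URL stay escaped;
  # "// empty" prints nothing for videos that have no transcript
  curl -s -X POST https://api.getanyapi.com/v1/run/youtube.video_transcript \
    -H "Authorization: Bearer $ANYAPI_KEY" -H "Content-Type: application/json" \
    -d "$(jq -n --arg url "$url" '{url: $url}')" | jq -r '.output.data.transcript // empty'
done < urls.txt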

A published video's transcript rarely changes, so I cache each one by video ID and pay $0.002 once per video. A thousand transcripts is $2, and the next run over the same videos is free.
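Caching by video ID could look like the Python sketch below. It's a minimal file-based version under a few assumptions: fetch is whatever function makes the API call, the cache directory name is arbitrary, and video_id only covers the URL shapes I know of (watch, youtu.be, shorts, embed, live).

```python
import json
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def video_id(url):
    """Extract the video ID from common YouTube URL shapes, or return None."""
    parsed = urlparse(url)
    if parsed.netloc.lower().endswith("youtu.be"):
        return parsed.path.strip("/") or None
    if parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [None])[0]
    parts = parsed.path.strip("/").split("/")
    if len(parts) == 2 and parts[0] in ("shorts", "embed", "live"):
        return parts[1]
    return None


def cached_transcript(url, fetch, cache_dir="transcripts"):
    """Return the transcript for url, calling fetch(url) only on a cache miss."""
    vid = video_id(url)
    if vid is None:
        raise ValueError(f"unrecognized YouTube URL: {url}")
    path = Path(cache_dir) / f"{vid}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = fetch(url)  # the only place a paid call happens
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


print(video_id("https://www.youtube.com/watch?v=jNQXAC9IVRw"))
```

Keying on the ID rather than the URL means a watch link and a youtu.be link for the same video share one cache entry.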

What you can build on top of it

A transcript is text with timestamps, which is exactly the input most AI and search features want. The things I see built on this one endpoint:

Video search that jumps to the moment. Index every line with its startMs, and a query lands the viewer on the exact second instead of the top of a 40-minute video. This is how podcast and lecture search works.
AI summaries and chapters. Feed the transcript to an LLM for a TL;DR, key takeaways, or auto-generated chapter markers. The timestamps turn a flat summary into clickable sections.
A knowledge base that answers from video. Chunk a channel's transcripts, embed them, and you've got a RAG assistant that cites the exact video and timestamp behind every answer.
Repurposing. One talk becomes a blog draft, a newsletter, an X thread, and the list of keywords it already ranks for, all from the text.
Subtitles and dubs at scale. Convert to SRT or VTT (above), or translate the text and ship multilingual captions across a whole back catalog.
On-camera monitoring. Watch a set of channels and alert when a competitor, product, or claim is spoken aloud, where the words never appear in the title or description.
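The first idea, search that jumps to the moment, can be sketched in Python with a plain substring match. This assumes decoded caption lines with the text and startMs keys from the response, and it uses YouTube's t= URL parameter for the deep link. A real index would use a search engine; search_lines is just an illustrative name.

```python
def search_lines(lines, query, vid):
    """Return (deep_link, text) pairs for caption lines containing the query."""
    needle = query.lower()
    hits = []
    for line in lines:
        if needle in line["text"].lower():
            seconds = int(line["startMs"]) // 1000  # t= takes whole seconds
            link = f"https://www.youtube.com/watch?v={vid}&t={seconds}s"
            hits.append((link, line["text"]))
    return hits


lines = [
    {"text": "All right, so here we are, in front of the elephants",
     "startMs": "1200", "endMs": "3360"},
    {"text": "really really long trunks", "startMs": "7974", "endMs": "12616"},
]
for link, text in search_lines(lines, "trunks", "jNQXAC9IVRw"):
    print(link, text)
```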

The AI summary is the one I start with most, and it's a few lines on top of the call:

const lines = JSON.parse(output.data.transcript);
const text = lines.map((l) => l.text).join(" ");
const summary = await llm(`Summarize this video transcript in 5 bullets:\n\n${text}`);

Each of these is the same first call, then your own logic on plain text.
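For the knowledge-base case, the "own logic" usually starts with chunking. Here's a minimal Python sketch that groups consecutive caption lines into chunks and keeps each chunk's time range, so an answer can cite a timestamp. The 1,000-character limit and the name chunk_lines are arbitrary assumptions; tune the size to your embedding model.

```python
def chunk_lines(lines, max_chars=1000):
    """Group consecutive caption lines into chunks of at most max_chars each.

    A single line longer than max_chars still becomes its own chunk.
    """
    chunks = []
    current = None
    for line in lines:
        text = line["text"]
        # Close the current chunk if adding this line would exceed the limit.
        if current and len(current["text"]) + 1 + len(text) > max_chars:
            chunks.append(current)
            current = None
        if current is None:
            current = {
                "start_ms": int(line["startMs"]),
                "end_ms": int(line["endMs"]),
                "text": text,
            }
        else:
            current["text"] += " " + text
            current["end_ms"] = int(line["endMs"])
    if current:
        chunks.append(current)
    return chunks


demo = [
    {"text": "aaaa", "startMs": "0", "endMs": "1000"},
    {"text": "bbbb", "startMs": "1000", "endMs": "2000"},
    {"text": "cccc", "startMs": "2000", "endMs": "3000"},
]
print(chunk_lines(demo, max_chars=9))
```

Each chunk's start_ms is what you would store next to the embedding so the assistant can link back to the exact moment.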

Try it without writing code

If you just need one transcript right now, the free tool does it in the browser. No key, no login, paste a link and copy the text.

It's also the fastest way to see the output shape before you wire up the API.

Frequently asked questions

Does YouTube have a transcript API?

Sort of, with a big catch. The YouTube Data API has a captions resource, but captions.download requires OAuth and permission to edit the video, so you can only pull captions for channels you own. For any other public video it returns 403, which is why most people reach for a scraper or a hosted endpoint like the one above.

How do I get the transcript of a YouTube video?

Two ways. For a one-off, paste the link into the free tool. For bulk or programmatic use, POST the video URL or ID to youtube.video_transcript and read output.data.transcript.

Why does the YouTube Data API return 403 for captions?

Because captions.download requires the youtube.force-ssl scope and edit rights on the video. Google scopes it to creators managing their own captions, not to anyone reading a public video's text. There's no public scope that lifts the restriction.

Why does youtube-transcript-api stop working or get blocked?

It calls an undocumented YouTube endpoint and YouTube now blocks most cloud-provider IP ranges, so it raises RequestBlocked or IpBlocked from a server even when it works on your laptop. Running it in production means adding a rotating proxy pool and keeping up with endpoint changes.

How much does a YouTube transcript cost?

$0.002 per call, billed in dollars, with no subscription. That's $2 per 1,000 transcripts.

Can I get a YouTube transcript for free?

Yes, for a one-off. The free tool does it in the browser with no key or login. For bulk or programmatic use, the endpoint is one call at $0.002.

What format does the transcript come back in?

A JSON-encoded array of caption lines, each with text, startMs, endMs, and a readable startTimeText. The millisecond offsets let you deep-link or build subtitles; the array sits under output.data.transcript as a JSON string, so parse it once before use.

What languages does it support?

Whatever caption track the video carries. The language field reports which track was served, so you can branch on it or pass the text to a translation step.

Can I transcribe an entire YouTube channel?

Yes. Chain youtube.channel_videos to list a channel's uploads, then youtube.video_transcript on each URL. The shell loop above does it and builds a searchable archive of the whole catalog.

Does it transcribe the audio or read existing captions?

It returns the caption track YouTube serves, which is the creator's captions when present and YouTube's auto-generated track otherwise. Either way you get timestamped spoken text, not the video description.

What happens if a video has no transcript?

The response comes back with found: false and null data, not an error. You check the flag instead of wrapping the call in a try/catch.
