
Video transcript API: one call for TikTok, YouTube, Instagram

Kevin Wang
Founder, AnyAPI · June 23, 2026
TL;DR

One API gives you video transcripts across TikTok, YouTube, and Instagram. It's the same request shape every time, and the transcript always comes back under output.data. It's $0.002 a call with no per-platform glue.

Three platforms, three different walls. YouTube hides its caption downloads behind OAuth, Instagram exposes no captions at all, and TikTok offers no official transcript endpoint.

Build it yourself and you maintain three scrapers, three auth schemes, and three output shapes.

That's three integrations to write and keep alive, for one feature: a transcript.

One request shape returns the spoken text from any of the three, under one envelope. The rest of this page walks through it.

One shape, every platform

The call is the same on all three. You pass a URL, and the transcript comes back under output.data. Here's the call I actually run, using YouTube as the example:

curl -X POST https://api.getanyapi.com/v1/run/youtube.video_transcript \
  -H "Authorization: Bearer $ANYAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'

You change only the SKU per platform: tiktok.video_transcript for TikTok, instagram.media_transcript for Instagram. Everything else stays the same.

One key, one auth header, one envelope across all three. No Google Cloud project for YouTube, no Graph API for Instagram, no headless browser for TikTok.
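Since the SKU is the only part of the request that changes, a client can derive it from the video URL itself. Here is a minimal sketch. `skuFor` and the hostname lists are hypothetical and not part of AnyAPI. They assume these are the domains your links will use.

```javascript
// Hypothetical helper: pick the SKU for a video URL by hostname.
// The host lists below are assumptions about which domains each platform uses.
const SKU_BY_HOST = [
  { hosts: ["youtube.com", "youtu.be"], sku: "youtube.video_transcript" },
  { hosts: ["tiktok.com"], sku: "tiktok.video_transcript" },
  { hosts: ["instagram.com"], sku: "instagram.media_transcript" },
];

function skuFor(videoUrl) {
  const host = new URL(videoUrl).hostname.toLowerCase();
  for (const { hosts, sku } of SKU_BY_HOST) {
    // Match the bare domain or any subdomain (www., m., vm., ...).
    if (hosts.some((h) => host === h || host.endsWith(`.${h}`))) return sku;
  }
  throw new Error(`No transcript SKU known for host "${host}"`);
}

// Build the same POST target the curl example uses.
const exampleUrl = "https://www.youtube.com/watch?v=dQw4w9WgXcQ";
console.log(`https://api.getanyapi.com/v1/run/${skuFor(exampleUrl)}`);
// prints https://api.getanyapi.com/v1/run/youtube.video_transcript
```

Throwing on an unknown host is a deliberate choice. It is better to fail loudly than to send a URL to the wrong SKU and pay for an empty result.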

[Diagram: Your code → AnyAPI. 1. POST any video URL; 2. route + normalize the response; 3. 200 + transcript under output.data.]
One request in, a transcript out under output.data, whichever platform the URL points to. The gateway routes the URL to the right source and normalizes the response, so your code reads one shape.

What each platform gives you

The calls match; the platforms don't. How much each one exposes decides what you get back, and it's worth knowing before you build:

| | TikTok | YouTube | Instagram |
|---|---|---|---|
| Source of the text | caption track | caption track | transcribed audio |
| Output shape | WebVTT | timed line array | plain text |
| Timestamps | yes | yes | no |
| Language tag | sometimes | yes | no |
| Price per call | $0.002 | $0.002 | $0.002 |

TikTok and YouTube publish a caption track, so you get timestamps for free. Instagram publishes none, so the audio is transcribed and you get clean text without offsets.

The price is the same on all three.
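To use TikTok's timestamps as numbers, you can parse the WebVTT string into cues. This is a minimal sketch. It assumes `output.data.transcript` holds a standard WebVTT document, which matches the wrapper's comment below. `parseWebVtt` and `parseTimestamp` are hypothetical helpers that cover common cues, not the full W3C parsing algorithm.

```javascript
// Convert a WebVTT timestamp ("hh:mm:ss.ttt" or "mm:ss.ttt") to seconds.
function parseTimestamp(ts) {
  const parts = ts.trim().split(":").map(Number);
  const [h, m, s] = parts.length === 3 ? parts : [0, ...parts];
  return h * 3600 + m * 60 + s;
}

// Minimal WebVTT cue parser (a sketch, not the full W3C algorithm): skips the
// header and any block without a timing line, drops cue settings, strips inline tags.
function parseWebVtt(vtt) {
  const cues = [];
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/); // a blank line ends a block
  for (const block of blocks) {
    const lines = block.split("\n").filter((l) => l.trim() !== "");
    const t = lines.findIndex((l) => l.includes("-->"));
    if (t === -1) continue; // "WEBVTT" header, NOTE, STYLE, or REGION block
    const [startRaw, rest] = lines[t].split("-->");
    const endRaw = rest.trim().split(/\s+/)[0]; // "00:06.000 align:start" -> "00:06.000"
    const text = lines.slice(t + 1).join("\n").replace(/<[^>]+>/g, "");
    cues.push({ start: parseTimestamp(startRaw), end: parseTimestamp(endRaw), text });
  }
  return cues;
}
```

Each cue's `start` and `end` are in seconds. That is the value you need to link a search hit back to a moment in the video.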

The deep dive per platform

Each platform has its own gotcha and its own deep dive. Start with the one you need:

Normalize once, branch where it matters

Because the envelope is shared, a small wrapper handles every platform and you branch only on the output shape. Here's the whole seam:

async function transcript(sku, url) {
  // Uses the global fetch (Node 18+ or any modern browser runtime).
  const res = await fetch(`https://api.getanyapi.com/v1/run/${sku}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.ANYAPI_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url }),
  });
  // Surface auth, quota, or server errors instead of parsing an error body as a result.
  if (!res.ok) throw new Error(`${sku} failed: HTTP ${res.status}`);
  const { output } = await res.json();
  if (!output?.found) return null;           // no transcript available
  // YouTube/TikTok: output.data.transcript ; Instagram: output.data.transcripts[].text
  return output.data.transcript ?? output.data.transcripts?.map((t) => t.text).join("\n\n") ?? null;
}

Write the fetch once, and adding a platform is just a new SKU string. That's the point of a single gateway: the surface you maintain doesn't grow with the number of sources.

Try them without writing code

Each platform has a free, no-login tool. Paste a link, copy the text, see the output shape before you wire up the API:

Frequently asked questions

Is there one API for video transcripts across platforms?

Yes. POST /v1/run/<sku> (for example /v1/run/tiktok.video_transcript) with a video URL returns the transcript under output.data for TikTok, YouTube, and Instagram, using one key and one request shape. You change the SKU per platform; everything else stays the same.

Which platforms are supported?

TikTok, YouTube, and Instagram today, each with its own deep-dive guide linked above. They sit alongside 200+ other data sources in the catalog.

Do all platforms return the same format?

The envelope is the same (output.data), but the shape inside differs because the platforms expose different things: TikTok returns WebVTT, YouTube a timed line array, and Instagram plain text. The wrapper above smooths over the field-name difference (transcript vs. transcripts[]). The format inside still varies by platform.
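If you want one plain string whatever the platform, a second pass can flatten each shape. This is a minimal sketch that assumes the shapes described on this page. TikTok's WebVTT string and YouTube's timed line array both sit under `transcript`, per the wrapper's comment, and Instagram uses `transcripts[].text`. The `text` field on YouTube's timed lines is a hypothetical name, so check a real response before relying on it.

```javascript
// Strip WebVTT structure (header, cue IDs, timing lines, inline tags), keep the words.
function vttToText(vtt) {
  return vtt
    .replace(/\r\n?/g, "\n")
    .split("\n")
    .map((line) => line.trim())
    // Note: a caption line made only of digits would be dropped along with cue IDs.
    .filter((l) => l !== "" && !l.startsWith("WEBVTT") && !l.includes("-->") && !/^\d+$/.test(l))
    .map((l) => l.replace(/<[^>]+>/g, "").trim())
    .filter((l) => l !== "")
    .join(" ");
}

// Flatten output.data into one string, whatever the platform returned.
function toPlainText(data) {
  const { transcript, transcripts } = data ?? {};
  if (typeof transcript === "string") {
    // TikTok (WebVTT) or any already-plain string.
    return transcript.trimStart().startsWith("WEBVTT") ? vttToText(transcript) : transcript;
  }
  if (Array.isArray(transcript)) {
    // YouTube timed lines; the per-line `text` field name is an assumption.
    return transcript.map((line) => (typeof line === "string" ? line : line.text ?? "")).join(" ");
  }
  if (Array.isArray(transcripts)) {
    // Instagram: transcripts[].text, joined as in the transcript() wrapper.
    return transcripts.map((t) => t.text).join("\n\n");
  }
  return null; // unknown shape
}
```

Flattening discards timestamps. Keep the raw `output.data` as well if you may need offsets later.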

Why does only Instagram lack timestamps?

TikTok and YouTube serve a caption track, which carries timing. Instagram serves no caption track, so the audio is transcribed to plain text with no offsets. See the Instagram guide for the detail.

How much does a transcript cost?

$0.002 per call on every platform, billed in dollars, with no subscription. That's $2 per 1,000 transcripts.

Can I get a transcript for free?

Yes, for a one-off. Each platform has a free in-browser tool (linked above) with no key or login. For bulk or programmatic use, the endpoint is one call at $0.002.

Do I need an account on each platform?

No. You don't touch YouTube's OAuth, Instagram's Graph API, or any platform login. One AnyAPI key covers all three.

Can I transcribe many videos at once?

Yes. Loop the endpoint over a list of URLs; each call is independent and priced the same. The per-platform guides show the shell pattern.
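In JavaScript you will usually want a few calls in flight rather than a strictly serial loop. This is a minimal sketch of a bounded-concurrency map. `mapWithConcurrency` is a hypothetical helper, and the limit you choose is an assumption because this page doesn't state rate limits.

```javascript
// Run `fn` over `items` with at most `limit` calls in flight, preserving input order.
// Failures are captured per item so one bad URL doesn't abort the whole batch.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  async function worker() {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two workers take the same item
      try {
        results[i] = { ok: true, value: await fn(items[i], i) };
      } catch (error) {
        results[i] = { ok: false, error };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With the `transcript` wrapper from earlier and a list of YouTube URLs, a call would look like `await mapWithConcurrency(urls, 4, (url) => transcript("youtube.video_transcript", url))`. Each call is still billed at $0.002, so the limit changes wall-clock time, not cost.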

What can I build with it?

Cross-platform search over a creator's whole footprint, an LLM summarizer or repurposing tool, subtitles, translation, or a RAG index for an assistant, all from one integration.
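For the LLM summarizer or RAG index, you typically split a long transcript into overlapping windows before embedding or prompting. This is a minimal sketch. `chunkWords` is hypothetical, and the size and overlap defaults are illustrative rather than AnyAPI recommendations. Tune them to your model's context limit.

```javascript
// Split text into word windows of `size` words, each overlapping the previous by `overlap`.
function chunkWords(text, size = 200, overlap = 40) {
  if (overlap >= size) throw new RangeError("overlap must be smaller than size");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window already reached the end
  }
  return chunks;
}
```

The overlap means a sentence cut at a window boundary still appears whole in one of the two neighboring chunks.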

Video transcripts on three platforms, plus 200+ other data sources, one key, priced in dollars. New accounts start with free credit.
