Instagram transcript API: transcribe any Reel in one call

Instagram has no transcript API and no caption track to scrape. One call transcribes the audio of any public Reel and hands you plain text. It costs $0.002 a call and follows the same pattern you'd use for TikTok and YouTube.
A Reel carries three kinds of text: the caption the creator typed, the stickers burned into the frames, and the words actually spoken.
Instagram's API gives you the first two. It never exposes the spoken audio, which is the part most people actually want.
There's no transcript endpoint, and unlike YouTube or TikTok, not even a caption track to scrape.
One request transcribes the audio and returns the spoken text of any public Reel.
The one call
Send the Reel URL, get the transcript back. This is a real call and its real response:
```bash
curl -X POST https://api.getanyapi.com/v1/run/instagram.media_transcript \
  -H "Authorization: Bearer $ANYAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.instagram.com/reel/DHsD6HGqJhp/"}'
```

```python
import os, requests
res = requests.post(
    "https://api.getanyapi.com/v1/run/instagram.media_transcript",
    headers={"Authorization": f"Bearer {os.environ['ANYAPI_KEY']}"},
    json={"url": "https://www.instagram.com/reel/DHsD6HGqJhp/"},
    timeout=60,
)
res.raise_for_status()  # raise on non-2xx HTTP responses
output = res.json()["output"]
if output["found"]:  # false when there is no speech to transcribe
    print(output["data"]["transcripts"][0]["text"])
```

```javascript
const res = await fetch("https://api.getanyapi.com/v1/run/instagram.media_transcript", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.ANYAPI_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://www.instagram.com/reel/DHsD6HGqJhp/" }),
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
const { output } = await res.json();
if (output.found) console.log(output.data.transcripts[0].text);
```

```json
{
  "output": {
    "found": true,
    "data": {
      "transcripts": [
        {
          "id": "3597267389859272809",
          "shortcode": "DHsD6HGqJhp",
          "text": "Let's fry up the perfect Banh Xeo. Beautiful. Everybody. Shh. The perfect Banh Xeo. Let me show you my Banh Xeo. When it comes down to Banh Xeo, we make it the crispiest. Hear the crunch? Heaven."
        }
      ]
    }
  },
  "provider": "AnyAPI",
  "costUsd": 0.002
}
```

The transcript is plain text under output.data.transcripts. It comes back as an array because a single post can hold more than one clip; each entry carries the spoken text plus the post's id and shortcode.
There are no timestamps and no language field here, and that's on purpose. Instagram serves no caption track to borrow offsets or a language tag from, so what you get is the transcribed audio as text.
The found flag is false when a Reel has no speech to transcribe, so you check the flag instead of catching an error.
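A minimal Python sketch of that check, assuming the response shape shown above. It returns every clip's text from a decoded response body, and an empty list when found is false, so a Reel without speech needs no exception handling. The function names are illustrative, not part of any client library.

```python
from typing import Any, Dict, List


def transcript_texts(body: Dict[str, Any]) -> List[str]:
    """Return the spoken text of every clip in an instagram.media_transcript response.

    Returns [] when output.found is false (e.g. a Reel with no speech), so the
    caller checks a value instead of catching an error.
    """
    output = body.get("output") or {}
    if not output.get("found"):
        return []
    entries = (output.get("data") or {}).get("transcripts") or []
    return [entry["text"] for entry in entries]


def full_transcript(body: Dict[str, Any]) -> str:
    """Join all clips of one post into a single string, one clip per paragraph."""
    return "\n\n".join(transcript_texts(body))
```

Joining with a blank line keeps clip boundaries visible. Swap the separator if your index wants one flat string per post.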
That's the whole integration. No OAuth, no headless browser, no scraping the player.
When Instagram changes its internals, we absorb the fix and your call keeps working.
Why Instagram gives you nothing
Instagram's official Graph API is built for managing your own account, not reading other people's audio. It exposes the typed caption, comment threads, and media fields, and it's got no transcript resource at all.
So of the three kinds of text on a Reel, the one you usually want, the spoken audio, is the one Instagram never publishes.
The endpoint above gets it by transcribing that audio, which is why it works on any public Reel without you touching the Graph API.
If you build it yourself, it's two jobs stacked: scrape the video, then run the audio through a speech-to-text model.
The scrape half alone is a slog. Instaloader, the most-used Instagram scraper at 12.6k stars, stops after roughly a dozen posts and returns a 401, and Instagram throttles a logged-in session with "Please wait a few minutes" before cutting it off. Then transcribing the audio is your problem on top of that.
YouTube serves captions; Instagram has to be transcribed
The three transcript endpoints in the catalog look identical to call but behave differently underneath, because each platform exposes different data. It's worth knowing which is which:
| | TikTok | YouTube | Instagram |
|---|---|---|---|
| Source of the text | caption track | caption track | transcribed audio |
| Output shape | WebVTT | timed line array | plain text |
| Timestamps | yes | yes | no |
| Language tag | sometimes | yes | no |
| Price per call | $0.002 | $0.002 | $0.002 |
TikTok and YouTube hand back a caption track, so you get timestamps for free. Instagram has none, so the audio is transcribed and you get clean text without offsets.
The call is the same either way: POST the video's URL to that platform's transcript endpoint. Only the response shape differs.
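Since the three endpoints are called the same way, one small helper can build the request for any of them. The Python sketch below uses only the standard library. It assumes each transcript endpoint accepts the same {"url": ...} JSON body under /v1/run/. This guide confirms only one endpoint name, instagram.media_transcript, so take the TikTok and YouTube names from the catalog. build_request and run are illustrative names.

```python
import json
import os
import urllib.request

# Assumption: every transcript endpoint shares this base path and body shape.
BASE_URL = "https://api.getanyapi.com/v1/run/"


def build_request(endpoint: str, media_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST for a catalog endpoint such as instagram.media_transcript."""
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=json.dumps({"url": media_url}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run(endpoint: str, media_url: str, timeout: float = 60.0) -> dict:
    """Send the request with ANYAPI_KEY from the environment and return the decoded JSON.

    urlopen raises urllib.error.HTTPError on non-2xx responses.
    """
    req = build_request(endpoint, media_url, os.environ["ANYAPI_KEY"])
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending means you can inspect or log the exact request without spending a call.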
Do it across many Reels
One transcript covers a single Reel. Point the endpoint at a list of URLs and you have a searchable archive of a whole content series or a competitor's catalog:
```bash
# urls.txt holds one public Reel URL per line
while IFS= read -r url; do
  [ -z "$url" ] && continue  # skip blank lines
  curl -s -X POST https://api.getanyapi.com/v1/run/instagram.media_transcript \
    -H "Authorization: Bearer $ANYAPI_KEY" -H "Content-Type: application/json" \
    -d "$(jq -n --arg url "$url" '{url: $url}')" \
    | jq -r '.output.data.transcripts[]?.text'
done < urls.txt
```

Index what was said, and you can search audio you could only watch before. Feed each transcript to an LLM for summaries, hooks, or a topic breakdown across a series.
It's plain text, so it drops straight into search, a RAG index, or a translation step.
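Here is a minimal illustration of the search use: a toy in-memory inverted index in Python. It maps each lowercased word to the shortcodes whose transcript contains that word, and it answers AND queries. It assumes you have already collected a {shortcode: text} mapping from the transcript entries. A production setup would use a real search engine or a database's full-text index instead.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Word characters, allowing inner apostrophes such as "let's".
TOKEN = re.compile(r"\w+(?:'\w+)*")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN.findall(text.lower())


def build_index(transcripts: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the set of shortcodes whose transcript contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for shortcode, text in transcripts.items():
        for token in tokenize(text):
            index[token].add(shortcode)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return shortcodes whose transcript contains every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))  # copy so the index is never mutated
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```

Keying by shortcode lets every hit link straight back to the Reel it came from.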
What a transcript unlocks
Turning a Reel's audio into text makes it searchable, summarizable, and reusable. The same field serves two jobs:
- If you build software: pipe the text into full-text search, a summarizer, a moderation scan, or a RAG index for an assistant that answers from the videos.
- If you do content or research: one Reel becomes a caption draft, a hook list, subtitles, or notes, without you replaying it and typing along.
Try it without writing code
If you just need one transcript right now, the free tool does it in the browser. No key, no login, paste a Reel link and copy the text.
It's also the fastest way to see the output before you wire up the API.
Frequently asked questions
Does Instagram have a transcript API?
No. Instagram's Graph API exposes the typed caption and media metadata, but it's got no transcript or caption-track resource, and there's no public endpoint for the spoken audio. To get the words spoken in a Reel you transcribe the audio, which is what the endpoint above does.
How do I get the transcript of an Instagram Reel?
Two ways. For a one-off, paste the link into the free tool. For bulk or programmatic use, POST the Reel URL to instagram.media_transcript and read output.data.transcripts[].text.
Is the transcript the caption or the spoken words?
The spoken words. Instagram has three kinds of text on a Reel: the typed caption, on-screen stickers, and the spoken audio. This endpoint returns the audio transcribed to text, which is the one most people need.
Why are there no timestamps in an Instagram transcript?
Because Instagram serves no caption track to take offsets from. The text is transcribed from the audio, so it comes back as plain prose. TikTok and YouTube do expose a caption track, so their endpoints include timestamps.
How much does an Instagram transcript cost?
$0.002 per call, billed in dollars, with no subscription. That's $2 per 1,000 transcripts.
Can I get an Instagram transcript for free?
Yes, for a one-off. The free tool does it in the browser with no key or login. For bulk or programmatic use, the endpoint is one call at $0.002.
Does it work on Stories and carousel posts?
It works on public Reels and video posts with spoken audio. A post can return more than one transcript entry, which is why transcripts is an array. Private accounts and videos with no speech come back with found: false.
What format does the transcript come back in?
Plain text, under output.data.transcripts[].text. Each entry also carries the post id and shortcode, so you can tie a transcript back to its Reel.
What language does it support?
It transcribes whatever language is spoken in the audio. There's no separate language tag, since Instagram exposes no caption metadata, so detect the language from the text if your pipeline needs it.
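If you only need a rough signal for routing transcripts, a stopword-overlap heuristic can be enough. The sketch below is a toy. Its three tiny word lists are illustrative assumptions, and it only knows English, Spanish, and French. Short or mixed-language clips will fool it, so use a dedicated language-identification library for real pipelines.

```python
import re
from typing import Dict, Optional, Set

# Illustrative stopword samples only; a real detector needs far more data.
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"the", "and", "you", "it", "is", "we", "my", "me", "to", "of"},
    "es": {"el", "la", "de", "que", "y", "los", "las", "es", "en", "un"},
    "fr": {"le", "la", "les", "de", "et", "est", "un", "une", "je", "vous"},
}

WORD = re.compile(r"[^\W\d_]+")  # runs of letters, including accented ones


def guess_language(text: str) -> Optional[str]:
    """Return the language whose stopwords occur most often, or None if none occur."""
    words = WORD.findall(text.lower())
    scores = {lang: sum(word in stops for word in words) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=lambda lang: scores[lang])
    return best if scores[best] > 0 else None
```

Returning None instead of a default language makes "no signal" explicit, so you can fall back to a stronger detector only for those transcripts.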
Can I transcribe a lot of Reels at once?
Yes. Put one Reel URL per line in a file and loop the endpoint over it, as in the shell snippet above. Each call is independent and priced the same, so a catalog of a few hundred Reels is a few hundred calls.
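Because each call is independent, you can also run the calls in parallel. This is a minimal Python sketch. It assumes you supply a fetch(url) function that performs the POST shown earlier and returns the decoded JSON. fetch, read_urls, and transcribe_all are illustrative names. The default of 4 workers is an arbitrary assumption, since this guide doesn't document rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List


def read_urls(lines: Iterable[str]) -> List[str]:
    """One URL per line; blank lines and '#' comments are skipped."""
    urls = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def transcribe_all(
    urls: Iterable[str],
    fetch: Callable[[str], Dict[str, Any]],
    max_workers: int = 4,
) -> Dict[str, List[str]]:
    """Call fetch(url) concurrently and map each URL to its transcript texts.

    A Reel whose response has found == false maps to an empty list.
    """
    url_list = list(urls)

    def texts(url: str) -> List[str]:
        output = fetch(url).get("output") or {}
        if not output.get("found"):
            return []
        entries = (output.get("data") or {}).get("transcripts") or []
        return [entry["text"] for entry in entries]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its own result
        return dict(zip(url_list, pool.map(texts, url_list)))
```

Open urls.txt and pass read_urls(file) together with your fetch function to get {url: [text, ...]} back. If fetch raises for any URL, the exception propagates out of transcribe_all. To record failures and keep going instead, wrap fetch so it catches the error for that URL.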
What can I build with Instagram transcripts?
Full-text search across a creator's Reels, an LLM summarizer or hook extractor, subtitles, a translation step, or a RAG index for an assistant. It's one of many Instagram endpoints, and the same one-call pattern covers TikTok and YouTube too.
The transcript endpoint plus 200+ other Instagram and cross-platform data sources, one key, priced in dollars. New accounts start with free credit.