Tapescribe Team

How to Add Captions to Videos: The Best Practices Guide (2026 Update)

Tags: captions, subtitles, accessibility, video-marketing, best-practices

Captions are the single highest-leverage edit you can make to a video. They lift view time on muted social feeds by 40% or more, expand reach to viewers who do not share your language, satisfy accessibility law in most jurisdictions where you sell, and give search engines text to index against. None of that requires more video. It requires getting captions right.

This guide covers the four decisions that determine whether your captions actually do all of those things: format, accessibility, styling, and platform fit. Every recommendation is grounded in current platform behavior and the data we have collected across thousands of caption files processed in 2025 and 2026.

Why Captions Now Drive More Performance Than Ever

Three trends compounded over the last 24 months and made captions a primary lever rather than a polish item.

Sound-off viewing is now the default. On TikTok, Instagram Reels, and Facebook, more than 85% of video views start without audio. On LinkedIn it is closer to 79%. On X (formerly Twitter) it is 93%. Without captions, your hook lands on a viewer who cannot hear it.

Algorithm signals favor watch time. Every major platform ranks videos on completion rate. Captions increase the probability that a viewer makes it to the second hook, the call to action, and the end of the video. The lift on completion ranges from 12% on YouTube longform to 38% on TikTok shortform.

Search engines now index video transcripts directly. Google's video search, YouTube's internal search, and the new visual-search features in Bing all pull from caption files when ranking videos. A video without a caption track is invisible to most of those systems.

Caption Formats: SRT vs VTT vs Burned-In

Format determines where your captions can be used and how flexible you are downstream. Three formats matter in practice.

| Format | Where It Lives | Editable | Style Control | Best For |
|---|---|---|---|---|
| SRT | Separate file alongside video | Yes | None | YouTube, podcast video, LMS, accessibility compliance |
| VTT | Separate file, web-native | Yes | CSS-stylable | Website embeds, HTML5 video, web courses |
| Burned-in | Pixels baked into video | No | Full creative control | TikTok, Reels, Shorts, ad creative |

SRT (SubRip). The universal subtitle format: plain text with timestamps, accepted by every major platform. The viewer can toggle the captions on or off. This is the right default for any video that will live on a platform where the viewer chooses (YouTube, Vimeo, Wistia, LMS platforms).
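To make "plain text with timestamps" concrete, here is a minimal sketch that turns timed cues into SRT text. The function names and the (start, end, text) tuple shape are illustrative assumptions, not Tapescribe's API. The SRT layout itself (cue number, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is the standard one.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp: HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples (hypothetical input shape)."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(build_srt([(0.0, 2.5, "Captions are the highest-leverage edit."),
                     (2.5, 5.0, "Here is why.")]))
```

Save the returned string with a `.srt` extension and it should upload as a caption track wherever SRT is accepted.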

VTT (WebVTT). The modern web standard. Functionally similar to SRT but supports CSS styling, positioning, and metadata. Use VTT when you are embedding video on your own site and want control over caption appearance via stylesheets. A separate guide gives the full breakdown of when to use each.
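Because the two formats are so close, a basic SRT file can usually be converted to VTT mechanically. The sketch below covers only the two differences that matter for simple files: the `WEBVTT` header line and a period instead of a comma before the milliseconds. The function name is an assumption for illustration, and it does not handle styling or positioning.

```python
import re

# Matches the SRT timestamp form HH:MM:SS,mmm
_SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT (hypothetical helper, basic cues only)."""
    lines = []
    for line in srt_text.strip().splitlines():
        # Only touch timing lines so commas inside caption text survive.
        if "-->" in line:
            line = _SRT_TIMESTAMP.sub(r"\1.\2", line)
        lines.append(line)
    # SRT cue numbers are kept; WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(srt_to_vtt("1\n00:00:00,000 --> 00:00:02,500\nHello, world\n"))
```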

Burned-in captions. The caption text is permanently rendered into the video frames. The viewer cannot turn it off. This is the format that dominates short-form social: TikTok, Instagram Reels, YouTube Shorts, LinkedIn video posts. Burned-in captions allow full creative styling (typography, animations, color, position) that algorithm-friendly social videos depend on.
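If you want to try a burn-in outside a captioning tool, one common route is ffmpeg's `subtitles` filter. The sketch below only builds the command. It assumes an ffmpeg build with libass on your PATH. The file names, font, and style values are placeholders, and the helper name is made up for illustration.

```python
from pathlib import Path


def burn_in_command(video, srt, output, font="Inter", font_size=24, margin_v=60):
    """Build an ffmpeg command that renders an SRT file into the video frames.

    Assumptions: ffmpeg with libass is installed. The style values are in ASS
    units, which libass scales against its own script resolution, so they are
    NOT raw video pixels and need tuning by eye.
    """
    # Alignment=2 is bottom-center in ASS numbering; MarginV lifts text off the edge.
    style = f"FontName={font},FontSize={font_size},Alignment=2,MarginV={margin_v}"
    # Paths containing colons or quotes need extra escaping that this sketch skips.
    video_filter = f"subtitles={Path(srt).as_posix()}:force_style='{style}'"
    return ["ffmpeg", "-i", str(video), "-vf", video_filter, "-c:a", "copy", str(output)]


if __name__ == "__main__":
    print(" ".join(burn_in_command("input.mp4", "captions.srt", "captioned.mp4")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would run the render. The audio stream is copied untouched while the video is re-encoded with the captions baked in.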

Most professional workflows produce both: an SRT for accessibility and search, and a burned-in version for social platforms. Tapescribe exports both from a single transcript.

Accessibility Requirements You Cannot Ignore

Captions are not optional for most commercial video in 2026. The legal landscape has tightened across three major jurisdictions.

United States. The Americans with Disabilities Act (ADA) Title III applies to commercial websites that qualify as places of public accommodation. Court rulings throughout the early 2020s established that video content without captions on such sites violates the ADA. The FCC also requires closed captions on broadcast TV content republished online.

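European Union. The European Accessibility Act has applied since June 28, 2025. It covers businesses that sell in-scope products and services, including e-commerce, to EU consumers. Service providers that qualify as microenterprises (fewer than 10 employees and no more than 2 million euros in annual turnover or balance sheet total) are exempt; everyone else in scope should expect to caption video content on their commercial platforms. Penalties are set by each member state and enforcement is active.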

Canada and UK. The Accessible Canada Act and the UK Equality Act both require accessible digital content from regulated entities, with captions as a specific requirement.

What this means practically: any company-owned video on your website, in your course, or in your sales process needs accessible captions. The exposure to ADA lawsuits alone is enough reason to caption everything. The cost of captioning is trivial relative to the cost of a single demand letter.

Our full accessibility compliance guide goes deeper on the specific WCAG 2.2 requirements and what passes audits.

Caption Styling: The Visual Rules That Move the Needle

Burned-in captions are a creative choice with measurable consequences. We have A/B tested caption styles across thousands of social videos. Five styling decisions have outsized impact on watch time.

Font choice. Sans-serif fonts (Inter, Helvetica, Montserrat) outperform serif fonts on screen by 8 to 14% in completion rate. Avoid decorative fonts. Readability beats personality every time.

Font size. Larger captions win. On 1080x1920 vertical video (TikTok, Reels, Shorts), 64 to 88 pixel font height performs best. Smaller fonts fail on mobile, where most viewing happens.

Position. Center-bottom or center-middle outperform top-aligned captions. On platforms with UI overlay at the bottom (TikTok username, like button column), shift captions up to avoid clipping. The safe zone is roughly 15 to 25% from the bottom edge.
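The size and position numbers above translate into pixels with a little arithmetic. This sketch uses the 64 to 88 pixel range at 1920 pixels tall and the 15 to 25% bottom offset from this guide. Scaling the font range in proportion to frame height for other resolutions is our assumption, not a tested result, and both function names are illustrative.

```python
REFERENCE_HEIGHT = 1920          # 1080x1920 vertical video
REFERENCE_FONT_RANGE = (64, 88)  # pixel font heights quoted in this guide


def font_size_range(frame_height: int):
    """Scale the 64-88 px range to another frame height (proportional scaling is an assumption)."""
    low, high = REFERENCE_FONT_RANGE
    return (round(low * frame_height / REFERENCE_HEIGHT),
            round(high * frame_height / REFERENCE_HEIGHT))


def caption_safe_zone(frame_height: int, min_offset: float = 0.15, max_offset: float = 0.25):
    """Return (top_y, bottom_y) pixel rows, measured from the top of the frame,
    that bound the band sitting 15-25% above the bottom edge."""
    top_y = frame_height - round(frame_height * max_offset)
    bottom_y = frame_height - round(frame_height * min_offset)
    return top_y, bottom_y


if __name__ == "__main__":
    print(font_size_range(1920), caption_safe_zone(1920))
```

On a 1920-pixel-tall frame this places the caption band between rows 1440 and 1632, clear of the bottom UI overlay.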

Background contrast. A semi-transparent dark background bar behind the text consistently outperforms text with no background, even with a stroke or shadow. The contrast guarantee matters more than the visual cleanliness.

Word emphasis. Highlighting key words (color shift, scale-up, bold) on the beat of speech holds attention noticeably better than uniform formatting. This is why "karaoke style" or "word-by-word" captions dominate top-performing short-form video. The visual rhythm matches the audio rhythm and the viewer locks in.
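Word-by-word captions need a start and end time for every word. Transcription engines normally supply real word timings. When all you have is a sentence-level cue, a rough fallback is to share the cue's duration across words in proportion to their length. That weighting is purely an assumption for illustration, and the function name is hypothetical.

```python
def split_cue_by_word(start: float, end: float, text: str):
    """Split one cue into per-word cues, weighting each word's share of the
    duration by its character count (a crude stand-in for real word timings)."""
    words = text.split()
    if not words:
        return []
    total_chars = sum(len(word) for word in words)
    duration = end - start
    cues = []
    cursor = start
    for position, word in enumerate(words):
        if position == len(words) - 1:
            word_end = end  # pin the last word to the cue end to avoid drift
        else:
            word_end = cursor + duration * len(word) / total_chars
        cues.append((round(cursor, 3), round(word_end, 3), word))
        cursor = word_end
    return cues


if __name__ == "__main__":
    for cue in split_cue_by_word(0.0, 2.0, "go big now"):
        print(cue)
```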

The data point that consistently surprises operators: poorly styled captions still outperform no captions. If you have to choose between fast-shipped basic captions and beautifully styled captions you will not finish today, ship the basic ones.

Platform-Specific Quirks

Each major platform has different caption behavior. Following the platform-native pattern beats forcing a single style across all of them.

YouTube longform (anything not published as a Short). Use SRT captions uploaded as a separate track. YouTube's auto-caption is good but not great. A reviewed SRT outperforms auto-generated captions for both watch time and search ranking. Allow viewers to toggle captions on and off. Do not burn captions into longform YouTube videos.

YouTube Shorts. Burned-in captions, word-by-word style, large font. Shorts viewers expect the visual rhythm and the algorithm rewards completion.

TikTok. Burned-in captions are standard. TikTok's auto-caption feature is improving but the styling is limited. Burned-in captions with custom typography consistently outperform auto-captions by 18 to 32% in completion rate.

Instagram Reels. Same playbook as TikTok. Burned-in word-by-word captions, sans-serif font, center-bottom position with a 20% offset to avoid the UI overlay.

LinkedIn video. Burned-in captions are effectively required, because mobile views autoplay with the sound off. LinkedIn's native auto-caption is unreliable; SRT upload is supported but does not display on mobile auto-play. Burn the captions in.

Facebook video. SRT support is solid. Both burned-in and SRT work. Captions are particularly important here because Facebook's sound-off auto-play rate is among the highest of any platform.

For platform-specific tutorials, see our guides on YouTube Shorts captions, TikTok and Reels captions, and LinkedIn videos.

AI Captions vs Manual Captions: When Each Wins

AI captioning has crossed a quality threshold in the last 18 months. For clear single-speaker audio, modern AI tools hit 96 to 98% accuracy. For multi-speaker recordings with crosstalk or technical vocabulary, AI lands at 92 to 95% and benefits from a human review pass.

Use AI captions when: the audio is clear, the speaker count is two or fewer, the vocabulary is general, and you need fast turnaround. This covers about 80% of marketing video work.

Use AI plus human review when: the recording has multiple speakers, technical or branded vocabulary, accented speech, or will be published as a permanent asset (course content, evergreen YouTube videos, gated lead magnets).

Use professional human transcription when: the content is legal, medical, or research-grade and 99%+ accuracy is contractually required.
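The three rules above can be written as a small triage function. This is a sketch of the decision logic as we read it, not a Tapescribe feature. The field names are made up, and treating "multiple speakers" as more than two (to match the two-or-fewer rule for pure AI) is our interpretation.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    # All fields are hypothetical labels for the criteria in this section.
    speakers: int = 1
    clear_audio: bool = True
    specialist_vocabulary: bool = False  # technical or branded terms
    accented_speech: bool = False
    permanent_asset: bool = False        # course content, evergreen video, lead magnet
    regulated_content: bool = False      # legal, medical, or research-grade


def captioning_route(rec: Recording) -> str:
    """Return 'human', 'ai+review', or 'ai' following the rules above."""
    if rec.regulated_content:
        return "human"
    needs_review = (
        rec.speakers > 2
        or not rec.clear_audio
        or rec.specialist_vocabulary
        or rec.accented_speech
        or rec.permanent_asset
    )
    return "ai+review" if needs_review else "ai"


if __name__ == "__main__":
    print(captioning_route(Recording(speakers=2)))
    print(captioning_route(Recording(speakers=4, specialist_vocabulary=True)))
```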

Tapescribe handles the first two cases. For social video, the raw AI output is publishable as is. For longform, the platform exports a transcript you can review in 10 to 15 minutes per hour of video before generating the SRT.

Try Tapescribe on Your Next Video

If you want to see what an SRT, VTT, and burned-in caption export looks like from a single upload, tapescribe.com processes your first three videos free with no watermark. Upload a recent video, get all three formats, and use whichever the platform demands.

A/B Test Results: What Actually Moves Numbers

We pulled the aggregated data from caption A/B tests run by Tapescribe customers in Q4 2025 and Q1 2026. The dataset covers roughly 14,000 social videos across TikTok, Reels, Shorts, and LinkedIn.

| Caption Treatment | Avg Completion Lift |
|---|---|
| No captions vs uniform burned-in | +24% |
| Uniform burned-in vs word-by-word | +11% |
| Small font (40px) vs large (80px) | +8% |
| Top-aligned vs center-bottom | +6% |
| Plain text vs background bar | +4% |
| Generic font vs branded typography | -2% |

The data tells a consistent story. Adding captions at all delivers the biggest single lift. Style refinements compound from there, but each one is smaller: 4 to 11% per row in the table, and branded typography actually trailed a generic font by 2%. Spending three hours perfecting the styling on a video that already has captions buys one of those smaller gains. Spending the same three hours adding captions to three new videos adds 24% to each. The order of operations matters.
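To see how these lifts might stack, here is a back-of-the-envelope calculation using the positive rows of the table. It assumes each lift is relative and multiplies the previous result, which is a simplification. The tests measured each treatment separately, so treat the combined figure as illustrative, not measured.

```python
import math

# Relative completion lifts taken from the table above.
LIFTS = {
    "add uniform burned-in captions": 0.24,
    "switch to word-by-word": 0.11,
    "large font (80px)": 0.08,
    "center-bottom position": 0.06,
    "background bar": 0.04,
}


def stacked_lift(lifts) -> float:
    """Combined relative lift if every step multiplies the one before it (an assumption)."""
    return math.prod(1 + lift for lift in lifts) - 1


captions_only = stacked_lift([LIFTS["add uniform burned-in captions"]])
everything = stacked_lift(LIFTS.values())

if __name__ == "__main__":
    print(f"captions only:  +{captions_only:.1%}")
    print(f"all five steps: +{everything:.1%}")
```

Under that assumption the full stack comes to roughly +64%, and the first step alone accounts for +24% of it. That is the arithmetic behind captioning more videos before polishing any one of them.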

Frequently Asked Questions

What is the best caption format for accessibility compliance?

SRT or VTT uploaded as a separate caption track. The viewer can toggle captions on and off, and assistive technology can read the caption text. WCAG 2.2 does accept open (burned-in) captions for its prerecorded captions criterion, but they cannot be toggled, restyled, or read by assistive technology, so treat a separate track as the default for compliance work and burned-in captions as a supplement.

Do I need both burned-in captions and SRT files for the same video?

For commercial video that will live on a platform where viewers can toggle captions (YouTube, your own website, an LMS), upload an SRT and skip the burn-in. For social short-form (TikTok, Reels, Shorts), burn the captions in. Most professional pipelines produce both formats from a single transcription pass.

How accurate do my captions need to be?

Neither WCAG nor the FCC sets a numeric accuracy threshold, but 99% is the benchmark most often cited for published commercial content. Modern AI transcription hits 96 to 98% on clean audio and benefits from a quick human review pass to clear the last few points. For social-only content, 96% raw AI accuracy is publishable without review.
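If you want to check where a caption file lands against those percentages, a common approach is word-level accuracy: one minus the word error rate against a human-corrected reference. The sketch below assumes that definition (vendors may measure accuracy differently) and ignores punctuation and case normalization beyond lowercasing.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - WER, where WER is the word-level edit distance
    (substitutions + insertions + deletions) divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Dynamic-programming edit distance, one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return max(0.0, 1.0 - previous[-1] / len(ref))


if __name__ == "__main__":
    reviewed = "captions lift completion on every platform"
    raw_ai = "captions lift completion on any platform"
    print(f"{word_accuracy(reviewed, raw_ai):.1%}")
```

Running it on a reviewed sample of a few hundred words gives a quick read on whether raw output is near the 96% mark or needs a review pass.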

Should I use auto-captions from the platform or upload my own?

Upload your own. Platform auto-captions are convenient but inconsistent across languages and audio conditions. A reviewed SRT or VTT outperforms auto-captions for both viewer experience and SEO. Auto-captions are a fallback, not a strategy.

What font and size should I use for burned-in captions on mobile?

Sans-serif font (Inter, Montserrat, Helvetica), 64 to 88 pixels at 1080x1920 resolution, centered horizontally and positioned 18 to 22% from the bottom edge. Add a semi-transparent dark background bar behind the text to guarantee contrast.

How long does it take to caption a 10-minute video?

With AI transcription, the file processes in under two minutes. SRT and burned-in exports happen instantly from the same transcript. A quick review pass on names and technical terms adds five to seven minutes. Total: roughly 7 to 9 minutes from raw video to publishable captions.

Make Captions a Default, Not an Afterthought

The teams that ship captions on every video, in the format each platform expects, win measurable share of attention against teams that do not. The work is small. The compound effect is large.

If you want to standardize this across your pipeline, Tapescribe outputs SRT, VTT, TXT, and burned-in captions from a single upload, in over 60 languages, with a free tier that covers three videos per month. Upload one and you will have everything you need in under 10 minutes.
