Closed captions are an important step in making video (and in some cases audio) materials accessible to more people. In addition to being a pivotal part of the video-watching experience for viewers who are Deaf or hard of hearing, closed captions are also a form of universal design: even viewers who can hear benefit from captions when they are watching in a setting where they can’t use sound, or when the audio is hard to make out. There are now many options for creating and editing closed captions, both manually and automatically with platforms that use artificial intelligence (AI). This post covers some options that I’ve personally explored, organized here from least expensive to most expensive.
Before diving in, it should be mentioned that closed caption files typically come in one of two file formats: SRT (SubRip Subtitle, or .srt) and VTT (Web Video Text Tracks, also called WebVTT, or .vtt). SRT is essentially just plain text with numbered captions and time markers, while VTT also allows for styling and metadata. When creating captions, it can be very helpful to have a transcript of what was said if it’s available. When it isn’t, you can listen to the video and type what you hear, or save time by starting with an auto-caption platform (like those below).
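To make the difference between the two formats concrete, here is a minimal Python sketch that turns a small SRT sample into VTT. The sample captions and the function name srt_to_vtt are made up for illustration, and the sketch assumes a simple, well-formed SRT file with no formatting tags. The two changes it makes are the ones that matter most in practice: VTT files begin with a WEBVTT header line, and VTT timestamps use a period before the milliseconds where SRT uses a comma.

```python
import re

# A made-up two-caption SRT sample: caption number, timestamp range, text.
SAMPLE_SRT = """\
1
00:00:01,000 --> 00:00:04,000
Welcome to the session.

2
00:00:04,500 --> 00:00:07,250
Today we will talk about captions.
"""

# SRT timestamps use a comma before milliseconds; WebVTT uses a period.
_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT text (hypothetical helper)."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Only timing lines are changed; caption text is left alone.
            line = _TIMESTAMP.sub(r"\1.\2", line)
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(srt_to_vtt(SAMPLE_SRT))
```

The caption numbers are kept as they are, since VTT accepts them as optional cue identifiers. Anything fancier than plain text in the SRT file would need more care than this sketch takes.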
Most operating systems now include a basic text editor or note-taking application for those who need to create low-tech captions without much of a budget (Notepad on Windows, TextEdit on Mac, Google Docs on Android, etc.). Both SRT and VTT files can be edited this way, as can basic TXT (.txt) files. Writing captions from scratch in these applications isn’t the most straightforward thing, so if possible it may be most useful to download automatic captions from one of the options below and then open them in a text editor. This will provide you with a template that includes timing for the captions in the form of timestamp ranges, which is crucial for captions to display correctly with a video.
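Because hand-editing makes it easy to break a timestamp range without noticing, a quick check can help before uploading. The following is a minimal Python sketch, not a full validator: the function name find_timing_problems is hypothetical, it assumes SRT-style timestamps (HH:MM:SS,mmm), and it treats any line containing "-->" as a timing line. It flags ranges that are malformed, that end before they start, or that overlap the previous caption. Overlaps are not strictly invalid, but in a hand-edited file they are usually worth a second look.

```python
import re

# One SRT timing line: start and end, each as HH:MM:SS,mmm.
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _to_ms(hours, minutes, seconds, millis):
    """Convert the four timestamp fields to total milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def find_timing_problems(srt_text):
    """Return a list of human-readable problems found in the timing lines."""
    problems = []
    previous_end = 0
    for number, line in enumerate(srt_text.splitlines(), start=1):
        if "-->" not in line:
            continue  # caption numbers, caption text, and blank lines
        match = _TIMING.fullmatch(line.strip())
        if match is None:
            problems.append(f"line {number}: malformed timestamp range")
            continue
        fields = match.groups()
        start, end = _to_ms(*fields[:4]), _to_ms(*fields[4:])
        if end <= start:
            problems.append(f"line {number}: cue ends before it starts")
        if start < previous_end:
            problems.append(f"line {number}: cue overlaps the previous one")
        previous_end = max(previous_end, end)
    return problems


if __name__ == "__main__":
    edited = "1\n00:00:05,000 --> 00:00:03,000\nHello\n"
    for problem in find_timing_problems(edited):
        print(problem)
```

The line numbers in the messages refer to lines in the caption file, so they can be found directly in Notepad or TextEdit.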
YouTube, available for free and as a native app for Android and other operating systems, provides automatic captioning for newer videos, and allows you to create captions from scratch for all videos. Once these captions are created, they can be downloaded as SRT or VTT files, or as SBV, which is YouTube’s native caption format. Caption files in these formats can also be uploaded to YouTube. The caption editor is quite user-friendly, and includes automatic syncing features, a “Pause while typing” feature, and keyboard shortcuts. Additionally, YouTube allows video titles, captions, and metadata to be translated into other languages. While the title and metadata must be translated by hand, the captions can be automatically translated by Google Translate.
Vimeo is available for free, with additional features available through subscriptions at varying tiers. Like YouTube, Vimeo offers automatic captioning, as well as caption file upload and download. Only VTT is supported. Vimeo also partners with Rev, allowing uploaders to pay $1.50 per minute of content for accurate captions in English or other languages.
Amara is a crowd-sourced caption editing site that allows caption editing through Vimeo and YouTube, as well as with video uploads in MP4, WebM, OGG, and MP3 formats. Upon uploading (or linking via URL to Vimeo or YouTube), videos are added to “Amara Public,” which is a “workspace…designed for collective creation and use for public videos by all Amara users.” In other words, once a video is in Amara, it’s theoretically available for others to help caption, though I have never personally had anyone assist with my caption editing. Alternatively, a private Amara workspace that is not publicly accessible costs $12 or more per month per user. Once a video is uploaded or linked, a caption file can be uploaded or created from scratch, but no auto-caption options are natively available. For this reason, Amara is really best for editing captions and improving their accuracy rather than creating them, but its user interface is very robust and flexible. It’s an especially great platform for creating captions in other languages, and works with SBV, SRT, TXT, and VTT file formats, as well as DFXP and SSA. Once captions are completed in Amara, they must be downloaded and then uploaded back to the original platform (i.e. Vimeo or YouTube), as needed. The Music Library Association (MLA) has used Amara to edit captions for conference session recordings in the past, though the MLA Web Team eventually decided that the work required more volunteer hours than we had available, so we transitioned to 3Play (below).
TechSmith’s Camtasia is a video editing software suite that has a feature for adding and editing captions, in addition to many others for video creation. The 2023 edition costs $299.99 for a perpetual license (with discounts for educational and governmental organizations), so it is not a worthwhile investment purely for caption editing, but for beginning-to-end video creation and editing it’s quite a bargain. It has had an auto-caption feature for years, which is honestly one of the most accurate I’ve seen (at least in terms of offline, AI caption software). Camtasia imports and exports captions as SAMI or SRT files.
For those with a larger volume of videos (as well as the funds and desire to have very accurate captions), 3Play is one of the very best captioning resources on the market. In addition to closed captions, they provide live captioning, audio description, subtitling, and translation services. Videos can be shared with 3Play from over twenty video-sharing platforms including YouTube, Facebook, Vimeo, and Panopto, and costs depend on how quickly accurate captions are requested (starting at $2.95 per minute of content for express captions in English, or even cheaper for captions requested 10 or more days out). 3Play prefers to start from scratch, so uploading captions for editing is neither necessary nor possible. For increased accuracy, 3Play requests that the customer provide any unusual words, names, or acronyms mentioned in the video, and words that the [human] captioner was unsure about after review are flagged for the customer to correct. While there is a bit of a learning curve with their user interface, once it’s learned the process is incredibly efficient, and even at the cheapest/slowest rate, 3Play often delivers early. Their customer support team takes great care in what they do, and they come highly recommended by other experts in the field of professional video editing.
While investing in accurate closed captions can be time-consuming and possibly expensive, it’s a very important step that should be taken to make your videos available to a larger audience. In this post, I only discussed creating and editing captions after a recording or event; live captioning is an entirely different beast, but one that MLA hopes to explore in greater detail soon. I also did not address transcripts for audio-only recordings, though that tends to be more straightforward since a special file format isn’t usually necessary for timing’s sake. In general, captions are easiest to edit when they’re initially created using software because of the timing element, but the very best captions still need a human touch for the highest accuracy. Happy captioning!