Creating and Editing Closed Captions

Closed captions are an important step in making video (and in some cases audio) materials accessible to a larger audience. In addition to being a pivotal part of the video-watching experience for viewers who are Deaf or hard of hearing, closed captions are also a matter of universal design: even viewers who can hear benefit from captions when they are watching in a setting where they can’t use sound, or when the audio is difficult to make out. Today there are many options available for creating and editing closed captions, both manually and automatically with platforms that use artificial intelligence (AI). This post will delve into some options that I’ve personally explored, organized here from least expensive to most expensive.

Before diving in, it should be mentioned that closed caption files typically come in one of two formats: SRT (SubRip Subtitle, .srt) and VTT (Web Video Text Tracks, .vtt). SRT is essentially just plain text with time markers, while VTT also allows for styling and metadata. When creating captions, a transcript of what was said is very helpful if one is available; otherwise, working from the audio itself or starting from an auto-captioning platform (like those below) can be even more efficient.
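To make the difference concrete, here is the same made-up caption cue in each format. In SRT, each cue gets a number and the timestamps use a comma before the milliseconds:

1
00:00:01,000 --> 00:00:04,500
Welcome to the Music Library Association.

The equivalent VTT file starts with a WEBVTT header and uses a period instead of the comma:

WEBVTT

00:00:01.000 --> 00:00:04.500
Welcome to the Music Library Association.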

Notepad

At this point, most operating systems have some sort of native text-editing application available for those who need to create low-tech captions without much of a budget (Notepad on PC, TextEdit on Mac, Google Docs on Android, etc.). Both SRT and VTT files can be edited this way, as well as basic TXT/.txt files. Writing captions from scratch in these applications isn’t very practical, so where possible it’s easiest to download automatic captions from one of the options below and then open them in a text editor. This gives you a template that already includes timing in the form of timestamp ranges, which is crucial for captions to display correctly with a video.

YouTube

YouTube, available for free on the web and as a native app for Android and other operating systems, provides automatic captioning for newer videos and allows you to create captions from scratch for any video. Once these captions are created, they can be downloaded as SRT or VTT files, or as SBV, YouTube’s native caption format. Caption files in these formats can also be uploaded to YouTube. The user interface is quite user-friendly and includes automatic syncing, a “Pause while typing” feature, and keyboard shortcuts. Additionally, YouTube allows video titles, captions, and metadata to be translated into other languages. While the title and metadata must be translated by hand, the captions can be translated automatically by Google Translate.

Vimeo

Vimeo is free to use, with additional features offered through subscriptions at varying tiers. Like YouTube, Vimeo offers automatic captioning, as well as caption file upload and download. Only VTT is supported, but Vimeo does partner with Rev to let uploaders pay $1.50 per minute of content for accurate captions in English or other languages.
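Since Vimeo only accepts VTT, it’s worth knowing that converting an existing SRT file is trivial: the two formats differ mainly in the header and the millisecond separator. Here is a minimal Python sketch of such a conversion, assuming a simple SRT file with no styling, and with the file names as placeholders:

import re

def srt_to_vtt(srt_path, vtt_path):
    with open(srt_path, encoding="utf-8") as f:
        srt = f.read()
    # VTT uses a period instead of a comma before the milliseconds
    vtt = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", srt)
    # Every VTT file must begin with a WEBVTT header line
    with open(vtt_path, "w", encoding="utf-8") as f:
        f.write("WEBVTT\n\n" + vtt)

srt_to_vtt("captions.srt", "captions.vtt")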

Amara

Amara is a crowd-sourced captioning site that allows caption editing for Vimeo and YouTube videos, as well as for uploads in MP4, WebM, OGG, and MP3 formats. Upon uploading (or linking via URL to Vimeo or YouTube), videos are added to “Amara Public,” which is a “workspace…designed for collective creation and use for public videos by all Amara users.” In other words, once a video is in Amara, it’s theoretically available for others to help caption, though I have never personally had anyone assist with my caption editing. Alternatively, $12 or more per month per user buys a private Amara workspace that is not publicly accessible. Once a video is uploaded or linked, a caption file can be uploaded or created from scratch; no auto-caption option is natively available. For this reason, Amara is really best for caption editing and accuracy checking rather than creation, but its user interface is very robust and flexible. It’s an especially great platform for creating foreign-language captions, and it works with SBV, SRT, TXT, and VTT file formats, as well as DFXP and SSA. Once captions are completed in Amara, they must be downloaded and then uploaded back to the original platform (i.e., Vimeo or YouTube) as needed. The Music Library Association (MLA) has used Amara for editing captions on conference session recordings in the past, though the MLA Web Team eventually decided that more volunteer hours were needed than we had available, so we transitioned to 3Play (below).

Camtasia

TechSmith’s Camtasia is a video editing suite that includes a caption creation and editing feature alongside countless others for video production. The 2023 edition costs $299.99 for a perpetual license (with discounts for educational and governmental organizations), so it is not a worthwhile investment purely for caption editing, but for beginning-to-end video creation and editing it’s quite a bargain. It has had an auto-caption feature for years, which is honestly one of the most accurate I’ve seen (at least among offline, AI-based caption software). Camtasia imports and exports captions as SAMI or SRT files.

3Play

For those with a larger volume of videos (as well as the funds and the desire for very accurate captions), 3Play is one of the best captioning resources on the market. In addition to closed captions, they provide live captioning, audio description, subtitling, and translation services. Videos can be shared with 3Play from over twenty video-sharing platforms including YouTube, Facebook, Vimeo, and Panopto, and pricing depends on turnaround time ($2.95 per minute of content for express captions in English, with lower rates for captions requested 10 or more days out). 3Play prefers to start from scratch, so uploading captions for editing is neither necessary nor possible. For increased accuracy, 3Play asks the customer to provide any unusual words, names, or acronyms mentioned in the video, and words the [human] captioner remained unsure about after review are flagged for the customer to correct. While there is a bit of a learning curve with their user interface, once it’s learned the process is incredibly efficient, and even at the cheapest/slowest rate, 3Play often delivers early. Their customer support team takes great care in what they do, and they come highly recommended by other experts in the field of professional video editing.

Conclusion

While investing in accurate closed captions can be time-consuming and possibly expensive, it’s a very important step toward making your videos available to a larger audience. In this post, I only discussed caption creation and editing after a recording or event; live captioning is an entirely different beast, but one that MLA hopes to explore in greater detail soon. I also did not address transcripts for audio-only recordings, though those tend to be more straightforward, since a special file format usually isn’t necessary for timing. In general, captions are easiest to edit when they’re initially created with software because of the timing element, but the very best captions still need a human touch for the highest accuracy. Happy captioning!

Gamified Instruction with Twine

Twine is a text-based, interactive fiction platform created in 2009 by Baltimore-based writer, game designer, and web developer Chris Klimas. Twine runs on Windows, macOS, and Linux, and is also available as a web app. It can be downloaded from twinery.org.

Interactive fiction means you, the reader, make choices about the direction of the narrative. Twine is a free, open-source program that helps you author a story whose twists and turns are determined by which links your reader chooses to click in each passage of text. Easy to use, Twine requires no experience with coding. The Twine Cookbook shows you first how to create a simple, basic story, then how to add features and complexity. Twine is designed to work with text, but it is easy to add graphics and audio and to customize the look and feel with CSS.

Authoring a narrative in Twine, you will see a graphic display that shows text passages as discrete elements and maps the relationships between them based on the links they contain. When you are ready, you can export your story as a small, stand-alone HTML file that can be opened in a web browser.

Part of the structure of the project Marion Sparkle and the Mansion of Fire, opened in Twine.
(source: University of York)

The splash screen of Marion Sparkle and the Mansion of Fire.
(source: University of York)
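For a feel of how those links define the structure pictured above, here is a tiny, made-up story in Twee notation (the plain-text form a Twine story can be written in or exported to). Each line beginning with :: names a passage, and double square brackets create the links that Twine maps out visually:

:: Start
You are standing at the reference desk. Do you head for the
[[score collection|Scores]] or the [[listening stations|Listening]]?

:: Scores
Shelves of miniature scores stretch into the distance.
[[Go back|Start]]

:: Listening
A turntable hums quietly in the corner.
[[Go back|Start]]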

Game designers embraced Twine right from the beginning. The same capabilities that make Twine ideal for authoring interactive fiction also make it a powerful tool for creating text-based games. The simplicity of using Twine makes it especially appealing to individual game authors, the kind who create prosocial games for reasons other than commercial ones. The most influential may be one you have never heard of: Depression Quest, a journey through the life of a person with clinical depression. Not very fun, really, but it made its own kind of history when it touched off the “Gamergate” harassment campaign against women in the video game industry in 2014. (Was that really less than ten years ago?) Another critically acclaimed title developed with Twine is Howling Dogs, a meditation on trauma and escape.

Thousands of interactive fiction works have been created with Twine; you can browse many of them on the Interactive Fiction Database and on itch.io.

The simplicity of use that makes Twine easy for an individual fiction or game author to work with – no need for a team of developers! – makes it a great platform to develop a gamified instruction activity. Because Twine games are published to HTML files, they can be distributed to students directly or published to interactive fiction repositories like IFDB. This flexibility makes them especially useful for asynchronous instruction. Here is an example of a gamified library instruction activity that guides players through a tour of library collections, subscriptions, services, and policies at the University of Denver: Aliens in the AAC.

Aliens in the AAC, a game that guides players through a tour of University of Denver’s
collections, subscriptions, services, and policies.

Gamified instruction allows you to introduce concepts in a fun context that bypasses the anxiety and resistance students sometimes have toward formal instruction. Twine is a simple, easy-to-use game authoring platform optimized for a busy instructor without a background in coding. Give it a try!

Open Research Practices

As I prepare to attend the 2023 Music Library Association conference in St. Louis, I find myself excitedly browsing through the draft schedule, planning how to juggle my time in order to hear from as many of my music research colleagues as possible. Conferences such as this one, which pull from many different communities, offer an opportunity to expand creativity by exploring ideas with people outside local communication channels (Rogers, 2003). I am thankful for the opportunity to connect with others and learn about ideas or practices they have implemented that I might perceive as new, and to consider how I might adopt those innovations as I seek to identify and solve problems in my own scholarly and creative work.

The Emerging Technologies and Services Committee (ETSC) Tech Hub will provide opportunities for attendees to explore a variety of innovations of potential use in identifying and solving problems. In response to MLA community feedback, this session will build on past years’ Tech Hub presentations and is planned to include topics and platforms facilitating data sonification for beginners, open research practices, music therapy and evidence synthesis, and digital music score platforms, among others. 

One of the platforms which will be provided for participant exploration is Pressbooks, an online platform intended to support the creation, adaptation and sharing of content. Attendees will have a hands-on opportunity to experiment with the platform via the OpenOKState program, the Oklahoma State University Libraries’ initiative supporting integration of open practices into research, teaching and learning at Oklahoma State University. A quick search of the term ‘music’ in the Pressbooks Directory returns at least 74 books whose authors have intentionally created and licensed them for use and customization by other scholars and instructors. During the ETSC Pressbooks session, participants will learn how they might adapt, customize, or even create similar resources of their own.

The Pressbooks session presentation is supported in part by a grant from the Institute of Museum and Library Services in support of a project using open research practices to explore open educational resources (OER) and lifelong learning. The goal of the project is to develop a replicable, reliable method to assess the efficacy of OER on lifelong learning competencies. Anticipated project deliverables include a toolkit applicable to multiple contexts which faculty can easily implement to measure the efficacy of OER on developing lifelong learning competencies in their own courses. A second deliverable will be an openly available book on research methodology focused on librarians conducting research. The entire project, including its final deliverables, has been intentionally implemented using open research practices.

An understanding of open research practices begins with an operational definition of research itself. The Open Lifelong Learning project has defined research as a systematic investigation whose goal is identifying and/or solving problems. The term systematic refers to the intentionality with which the investigation is planned and implemented, and the goal leaves room for continued curiosity, as well as for the provision of solutions, as acceptable outcomes. ‘Open’ refers to transparent processes and practices through which the project is strengthened by the input of others’ expertise and experiences.

One aspect of open processes has to do with the point at which the research is shared with others. Rather than waiting until the research has been completed and the project deliverables finalized to share their work, the Open Lifelong Learning team has presented at scholarly conferences throughout the research process. The intent of this transparency has been to seed ideas for a wide range of research projects, as well as to invite the unique expertise of other scholars. For example, a close study of empirical research into OER surfaces an emphasis on quantitative research investigating the impact of OER on outcomes such as DFW rates or grades. While these findings are useful, the field will benefit from research using a broader range of methodologies to explore a variety of outcomes. Another challenge has to do with the dispositions, skills, and subject matter understanding of individual researchers. As the Open Lifelong Learning team opened their work for input from others at scholarly conferences, questions were surfaced and answered by scholars and experts outside the disciplines represented by the researchers. The outcome of this democratized approach to the scholarly conversation is a survey instrument that has been strengthened through interdisciplinary interrogation. It will also be interesting to note to what extent interest in the final project is influenced by others’ interaction with the process overall.

The Pressbooks platform helps facilitate the open research process implemented by the Open Lifelong Learning team. Since both the process and the product embed ideas of contextual customization, the usability and discoverability of Pressbooks made use of the platform a logical choice. While open research practices can certainly take place independent of the Pressbooks platform, we hope those who are curious about its potential are able to come try it out during the MLA 2023 TechHub session.

Data Sonification for Beginners

Data Sonification

What if you could make data more engaging? Imagine a data presentation that could elicit an emotional response from your audience. Data that can talk to you. Even sing to you. This is the world of data sonification.

We are all familiar with data visualization, the realm of techniques that translate data into visual images. These images allow us to grasp data patterns quickly and easily. We learn to produce and consume simple visualizations – pie charts, bar charts, line graphs – as early as elementary school. These are so ubiquitous, we rarely notice them.

Data sonification is analogous to data visualization, but instead of perceptualizing data in the visual realm, it perceptualizes data in the sonic realm. Sonification has a reputation as a cutting-edge, experimental practice, and in many ways it is just that. But it has also been around longer than many of us realize. David Worrall, in his 2019 book, Sonification Design, describes how the Egyptian Pharaoh audited granary accounts by having independently-prepared ledgers read aloud before him, and listening for discrepancies. (In fact, the very word, “audit,” comes from the Latin word meaning “to hear.”)

Another newer, but still retro, manifestation of data sonification should be familiar from cold-war era science fiction movies, or maybe old episodes of Mission Impossible: the sound of a Geiger counter, an electronic instrument that measures ionizing radiation. Hand-held Geiger counters characteristically produce audible clicks in response to ionizing events, to optimize their usefulness when the user’s attention must be focused somewhere other than reading a meter visually.

Modern attempts at computer-assisted data sonification began to gather speed in the early 1990s. A typical study is Scaletti and Craig’s 1991 paper, “Using Sound to Extract Meaning from Complex Data,” which explored the possibilities of parameter-mapped sonification using technology available at the time. The International Community for Auditory Display (ICAD) was founded in 1992 and has held a conference most years since then. The Sound and Music Computing Conference and the Interactive Sonification Workshop both started in 2004. Sonification research is now regularly published in engineering journals, psychology journals, music journals, and a small handful of specialty interdisciplinary publications like the Journal of Multimodal User Interfaces.

Most data sonification projects fall into one of three categories: audification, parameter-mapped sonification, or model-based sonification. Of these, audification is the simplest; it involves shifting a data stream into the audible realm by using it to produce sound directly, often dramatically speeding it up or slowing it down in the process. This has often been applied in seismology to allow researchers to listen to earthquakes, such as this sonification of the 2011 Tohoku Earthquake in Japan. It has also been applied to astronomical data, notably by NASA, and also in the work of Wanda Diaz Merced.
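To make the idea concrete, here is a minimal Python sketch of audification, assuming NumPy and SciPy are installed; the data here is just random numbers standing in for a real sensor stream:

import numpy as np
from scipy.io import wavfile

# Hypothetical data stream, e.g. ten minutes of sensor readings taken at 100 Hz
data = np.random.randn(100 * 600)

# Normalize to the range -1..1 so the values can be treated as audio samples
audio = data / np.max(np.abs(data))

# Play the samples back at 44.1 kHz: ten minutes of data become roughly
# 1.4 seconds of sound, shifting the signal up into the audible range
wavfile.write("audification.wav", 44100, audio.astype(np.float32))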

Model-based sonification is a much more subtle process. Here the technique is to take a basic sound and modify particular aspects of it according to data values. The sound must first be represented by a mathematical model, as is done routinely with musical instruments for computer music applications. Then, different parts of the model are made to interact with values representing different data variables. The resulting transformations of the model yield a new sound, which reflects the influence of the data values. Think of the sound of a bell being rung. The sound of the bell depends on various qualities: its size, its thickness, the ratio of length to width, how much it flares, what kind of metal it is made of. Vary any one of these, and the sound is altered. This is how model-based sonification works, except that a mathematical model of a bell is subjected to variation, rather than an actual bell. (No bells were harmed in the course of this research!) These four sounds use this kind of process to sonify distributions of neurons in artificial networks: id=1 cluster, id=3 cluster, id=5 cluster, id=6 cluster. (The examples are from Chapter 16 of Thomas Hermann’s The Sonification Handbook, and can be found along with others here.)
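As a very loose illustration of the bell analogy, here is a Python sketch (same NumPy/SciPy assumptions as above) in which a crude “bell” built from a few decaying partials is varied by a data value. Real model-based sonification uses far richer models that the data itself excites, but the basic idea of data reshaping a modeled sound is the same:

import numpy as np
from scipy.io import wavfile

SR = 44100  # sample rate in Hz

def bell(data_value, duration=2.0):
    # A toy bell: a few exponentially decaying partials whose fundamental
    # is stretched by the data value (purely illustrative, not a real model)
    t = np.linspace(0, duration, int(SR * duration), endpoint=False)
    base = 220.0 * (1.0 + 0.5 * data_value)
    ratios = [1.0, 2.4, 3.9, 5.4]   # rough, bell-like partial ratios
    decays = [1.5, 1.0, 0.7, 0.5]   # seconds over which each partial fades
    tone = sum(np.exp(-t / d) * np.sin(2 * np.pi * base * r * t)
               for r, d in zip(ratios, decays))
    return tone / np.max(np.abs(tone))

# Three made-up data values become three differently tuned bell strikes
values = [0.1, 0.5, 0.9]
audio = np.concatenate([bell(v) for v in values]).astype(np.float32)
wavfile.write("bells.wav", SR, audio)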

Parameter-mapped sonification lies between these two approaches on the continuum of sophistication. In parameter-mapped sonification, individual sound parameters such as pitch, loudness, duration, or timbre are mapped to values in a dataset. This is the most accessible approach to sonification for most people: the easiest to grasp intuitively, and the easiest to experiment with. It works particularly well for single-variable, time-series data.
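Here is a minimal Python sketch in that spirit (again assuming NumPy and SciPy), mapping a made-up time series onto pitch; the two-octave range and fixed note length are arbitrary choices:

import numpy as np
from scipy.io import wavfile

SR = 44100  # sample rate in Hz

def tone(freq, dur=0.25):
    # A short sine tone with a smooth envelope to avoid clicks
    t = np.linspace(0, dur, int(SR * dur), endpoint=False)
    return 0.5 * np.sin(2 * np.pi * freq * t) * np.hanning(len(t))

# Made-up daily gate counts; any single-variable time series would do
counts = [12, 45, 80, 77, 60, 15, 5, 14, 50, 90]

# Map each value onto a MIDI pitch between C4 (60) and C6 (84), then to Hz
lo, hi = min(counts), max(counts)
pitches = [60 + 24 * (c - lo) / (hi - lo) for c in counts]
freqs = [440.0 * 2 ** ((p - 69) / 12) for p in pitches]

audio = np.concatenate([tone(f) for f in freqs]).astype(np.float32)
wavfile.write("parameter_mapped.wav", SR, audio)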

Low Barrier to Entry

A number of easy-to-use tools have been developed by sonification researchers to allow you to develop parameter-mapped sonifications on your own. One of these is TwoTone, developed by Sonify, Inc., in partnership with Google. TwoTone is available as a free web app with an intuitive user interface. It comes with a library of existing datasets for you to play around with, or you can upload a spreadsheet file of your own to sonify. TwoTone will map your data onto MIDI pitches, according to ranges and constraints you can specify. In addition to sonification, it shows a real-time animated graph indicating what part of the data you are listening to at any particular moment, making it a multi-modal tool for experiencing data. You can download your sonification as an MP3 file, but to capture the visualization you need to use a screen recorder.

Another free web app for data sonification is offered by Music Algorithms, developed by Jonathan N. Middleton of Eastern Washington University. Music Algorithms does not offer a visualization to go along with its sonification, but it does offer duration as a parameter for sonification, which TwoTone does not. Where TwoTone comes pre-loaded with sample datasets to play with, Music Algorithms offers mathematical series such as Pi, Fibonacci, or a DNA sequence, in addition to a “custom” function that allows you to input your own data. You can download your finished sonification as a MIDI file.

Much of the leading work in data sonification happens in the Sonification Lab at Georgia Tech. Their Sonification Sandbox was one of the first tools to allow public users to create their own sonifications. First released publicly in 2007, it is a free program you can download and install on your own computer. However, the program is written in Java, and the creators have not kept it current with Java version updates. The last version (still available) is from 2014 and includes modifications to support Java 7. The most recent Java version is Java 19, released in September 2022, and Sonification Sandbox works poorly with it. To get the best results with Sonification Sandbox, use a dedicated system (or a virtual installation within another system) running Java 7.

That doesn’t mean Georgia Tech has been sitting on its hands. Highcharts Sonification Studio, released in 2021, is a fully-updated web-based sonification platform, developed in partnership between GT and the data visualization software developer Highcharts. Users can upload a CSV file, choose data and sonification parameters, and produce a MIDI-based sonic rendering of their data.

Medium Barrier to Entry

Anyone who has spent much time around electronic composition is probably familiar with a visual object-oriented programming environment called Max, originally developed by Miller Puckette at IRCAM in the 1980s. Although not developed with data sonification in mind, this is one of Max’s capabilities. Max offers great flexibility, but it comes with a correspondingly steep learning curve. Fortunately, it is known for its great documentation, tutorials, and a user community not shy about posting instructional videos. If you are interested in using Max for sonification, tutorial 18 is the one you will be shooting for. Start at the beginning and take the tutorials one by one, and when you get to tutorial 18, you will learn how to use Max to convert spreadsheet data to sound and animated graphing.

Max is a little pricey, at least compared to a free web app; you can expect to pay around $400 for a license, or $250 with an academic discount. For the more adventurous, there is a free, open-source alternative called Pure Data. Pure Data (or PD), also developed by Puckette, is a completely separate and independent tool, but is designed to do the things Max does, using an interface similar to that of Max. The big difference is in the documentation: PD’s documentation is mostly community-developed, so it isn’t always as beginner-friendly as the documentation in Max. However, if you are patient, you can learn to do the same things in PD that you can do in Max. Besides being free, PD also has the advantage that it is available in a version for Linux, as well as for MacOS and Windows. (Max is available for Windows and Mac only.)

Sonification for Librarians

So what might you do once you get your hands on these tools? Good question! Here are a few sonifications I have created using the humblest data at my disposal: the log of the gate counter in my music library at University of Denver. Using TwoTone, I created a sonification (and recorded the animated graph) of patron visits to the library in FY 2018. Play the video, and watch for the small orange line moving from the left through the rows of blue lines. You will notice that higher pitch correlates to higher values in the sonified data.

The top row is a repeating sequence of seven values indicating weeks in the year; it is placed in the lowest range. The next row is the mornings, with the noon hour included; it is in a higher range. It begins with a period of lower values representing reduced library traffic in the middle and late summer, then jumps dramatically when school begins in the fall. The next row is the afternoon/evening reading; it is in an even higher range. You may notice that after school begins, the gaps in this row between groups of values seen in the summer disappear. This is because during school breaks we do not open on the weekend, while during school terms, we have weekend hours in the afternoon. (But not in the morning – note the contrast with the row above.) University of Denver is on the quarter system; you can easily identify Fall, Winter, and Spring Quarters, separated by a long Winter Break and a much shorter Spring Break. The last row is the night shift, and it is placed in the highest range of all; I have also further distinguished it by representing it in two-note arpeggios. You can see that we are open nights only during school terms.

The same data was treated differently for this sonification. Here, the daily totals were used instead of individual shifts, but Fall, Winter, and Spring Quarters were mapped against each other as a comparison.

Here is another sonification of the quarter-against-quarter data, this one created using Music Algorithms. Data values are again mapped to pitch. The time values in this sonification are manipulated in such a way that they grow longer in a repeating cycle of seven values, with Monday as the shortest and Sunday as the longest. This allows us to identify the weekend data as the two longest time values at the end of each series.

A sonification of the same data using Sonification Sandbox is here. Like the other tools, this one has its own look and sound. The recording is a little slow; it takes 60 seconds to hear all of it. You might want to try speeding it up on YouTube; alternatively, this recording completes the process in ten seconds.

Here is a sonification created with Max. In this case, three years’ gate-count data were superimposed on each other, with each displayed in a different-colored graph, and all three sonified simultaneously.

Listening to these sonifications, you are likely to experience them aesthetically, or even emotionally, in a way that rarely happens when we inspect a visual chart or graph. This is possibly the most unusual aspect of data sonification: the immediacy and urgency of sound make vision seem cool and analytical by comparison. Unlike our eyes, we cannot “close” our ears; nor can we “listen away” from one stimulus as we can look away in order to focus on another. This difference in quality between aural and visual perception helps explain why so many sonification tools are designed for multimodal presentation – hearing and vision are designed to reinforce each other.

Try some of these tools and see what you can create with data sonification. If you come up with something interesting, post a comment here!

ETSC TechHub

ETSC TechHub is a drop-in session held at MLA conferences that includes a variety of technology-related discussion groups. By attending TechHub, MLA members can get quick informal tutorials on various digital tools or ideas. The first four posts on our blog will be dedicated to videos and resources produced for MLA TechHub 2022.

Music Library Association Emerging Technologies and Services Committee

Welcome to the Music Library Association (MLA) Emerging Technologies and Services Committee (ETSC) blog! This committee works to identify and evaluate current trends, tools, services, and developments relating to technologies used by libraries and librarians, with special attention to their handling of music materials. It coordinates and facilitates the exchange of this information to the MLA membership. Our blog will be used to share information about emerging technologies and services identified by committee members and guests.