To deliver the highest-quality live closed captioning, SOVO leverages unique elements of its technology, personnel and operational methodology. In this section, we list some of the specific items that make SOVO’s service truly unmatched:


SOVO’s proprietary live-captioning software is unique in many respects, including its multi-modality: it combines a best-in-class speech recognition engine, a computer game joystick and a graphical user interface, allowing our live captioners to produce the best possible captions in real time using whichever input mode is most appropriate to the task. The software is especially efficient when captioning unscripted content delivered at a high speech rate.

The STDirect speech recognition sub-system relies on “acoustic models”, which represent all the different sounds that make up words in a given language. These models were built with advanced artificial intelligence techniques on very large server farms of hundreds of processors, using tens of thousands of hours of carefully annotated audio. SOVO is leading the way in applying deep learning to respeaking for the production of closed captions for live broadcasts. The resulting high-performance acoustic models are speaker-independent, so they do not need to be trained on the particular voice of each captioner. This leads to more consistent quality.

Version 8.0 of the STDirect software also introduces Dynamic-LM technology. Besides the acoustic models, another key component of a speech recognition system is the language model, which captures the probabilities of words and word sequences in a specific language or domain. Dynamic-LM technology allows the use of very complex, arbitrarily large language models that cover a sub-domain or language with unmatched precision. As one example, SOVO uses a very detailed sub-grammar that represents numbers in all their variations: decimals, fractions, ordinals, etc. Superior accuracy on numbers is, of course, critical to caption intelligibility in domains such as business, public affairs, sports and weather.

In addition to their voice, live captioners use a joystick to input punctuation, special characters, and text indicating a change of speaker, background noise, applause and the like. As a result, captioners do not have to verbalize every period, comma or other required character, which yields faster, more fluid and more precise delivery of live captions, especially when the speech rate is high.

Finally, a custom-designed graphical interface allows captioners to “click” words or other prepared text into the caption stream alongside their spoken input. This gives voice writers tremendous flexibility in delivering the best possible live captions. Complex, one-off names of teams or players (for instance, the members of the Ukrainian bobsleigh team at a Winter Olympics) or the name of a small town in Northern Manitoba can be prepared ahead of time (or even in real time during production) and “clicked” through instead of being dictated. Any scripted material available in advance can also be loaded into this interface, allowing 100% accuracy during those specific segments.

SOVO’s software architecture, unique in the industry, also allows data, hardware and software to be shared among all of SOVO’s voice writers. Unlike setups in which each captioner runs personal dictation software on their own computer, SOVO’s server-based approach taps much more powerful centralized software and hardware resources. It also lets each captioner benefit immediately from the system improvements their colleagues have contributed over the years. In other organizations, captioners often work in silos and can hardly share new words, grammars or other fixes for transcription problems.


Over the years, SOVO has built significant intellectual property in the live captioning of TV broadcasts, parliamentary sessions and other events. Much of this intellectual property takes the form of a wide variety of language models and dictionaries, which model how words, word sequences and sentences are used in different live productions. The SOVO dictionaries, which contain over half a million words, each associated with one or more pronunciations, are continuously updated and improved. SOVO has worked for a decade to build these dictionaries, which include tens of thousands of unique names of people, places, teams and other specific terms used in TV broadcasts: the names of all athletes and teams for sports broadcasts, names and terms related to politics, public affairs and popular culture, weather and local geography terms, etc. Shared among all captioners, present and future, the SOVO dictionaries are an invaluable asset.

The language models, which represent the likelihood that words occur in particular sequences according to the “domain” of the production, are optimized for the specific task at hand. Over the years, SOVO has developed and refined language models for each sport discipline covered on live television, as well as for public affairs, children’s shows, cultural events, the different types of news, weather, etc. The models are used in production and are updated automatically and continuously with corrected texts from the productions in order to achieve optimal captioning performance.

Every year, SOVO invests considerable time, money and energy in developing and maintaining its dictionaries and specialized language models. Unlike those of many of its competitors, these resources are shared among all captioners, now across both French and English where applicable.
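
To illustrate what it means for a domain language model to be updated with corrected texts, here is a minimal Python sketch of a toy bigram model with add-one smoothing. It is not STDirect’s Dynamic-LM: the class name `DomainBigramLM`, the whitespace tokenization and the smoothing method are assumptions chosen for illustration only. Each corrected transcript is folded into the word and word-pair counts, after which the model gives higher probability to word sequences typical of its domain.

```python
from collections import Counter
import math


class DomainBigramLM:
    """Toy bigram language model for one production "domain" (hypothetical).

    Counts are updated in place from corrected caption text. Real systems use
    far larger n-gram or neural models with better smoothing than add-one.
    """

    def __init__(self) -> None:
        self.unigrams: Counter = Counter()  # word -> count
        self.bigrams: Counter = Counter()   # (previous word, word) -> count

    def update(self, corrected_text: str) -> None:
        # Add sentence-boundary markers, then count single words and word pairs.
        tokens = ["<s>"] + corrected_text.lower().split() + ["</s>"]
        self.unigrams.update(tokens)
        self.bigrams.update(zip(tokens, tokens[1:]))

    def log_prob(self, sentence: str) -> float:
        # Add-one (Laplace) smoothing over the known vocabulary plus one unknown slot.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab_size = len(self.unigrams) + 1
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, word)] + 1
            denominator = self.unigrams[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total


# Usage: fold two corrected sports sentences into a hypothetical hockey model.
sports = DomainBigramLM()
sports.update("the puck crosses the blue line")
sports.update("he shoots from the blue line")
# In-domain word order scores higher than the same words scrambled.
print(sports.log_prob("the blue line") > sports.log_prob("line blue the"))  # True
```

Because the counts live in the model object, each new corrected transcript shifts probabilities toward the phrasing of that domain; a shared server-side copy of such a model is one way every captioner could benefit from the same updates.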



Even when a session is held live, it is always possible to prepare in advance to improve caption accuracy once it starts. At SOVO, voice writers are assigned a pre-production period for each live event. They use this time to review the production guide for the particular show or content type; SOVO creates and maintains a detailed production guide for each type of show or content we cover, in order to ensure consistency and quality. The captioners also review any documents the customer provides about the upcoming session (such as news items and topics of the day, names of people and places, guest lists, etc.). The dictionaries are updated if required, pronunciations are verified, and lists of words or names are prepared in advance to deliver the best possible results.

Immediately after the live production, or in the following hours, SOVO’s team analyses the results. If certain words caused problems, the language models and dictionaries are revisited. In many cases, SOVO carries out a systematic correction of the production’s transcriptions (depending on the level of service agreed with the customer). Corrected captions can serve up to three purposes: improving the language models for the specific “domain” used in the production, evaluating the live performance, and serving as the basis for a corrected caption file when needed.
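
The metric SOVO uses to evaluate live performance against corrected captions is not specified here; word error rate (WER) is a common choice for this kind of comparison. The minimal Python sketch below assumes WER: it aligns the live caption words with the corrected reference using a word-level Levenshtein distance, then divides the number of substitutions, deletions and insertions by the number of reference words. The function name is hypothetical, and the sketch only lowercases and splits on whitespace, so punctuation attached to a word is treated as part of that word.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Usage: a corrected caption versus the caption that went to air live.
corrected = "the team scored three goals in the second period"
live = "the team scored tree goals in second period"
print(round(word_error_rate(corrected, live), 3))  # 0.222 (1 substitution + 1 deletion, 9 words)
```

Tracking such a score per voice writer and per domain is one plausible way to benchmark live output over time, which is the kind of measurement the correction effort makes possible.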

Note that even when the contract or service level agreed with the customer does not require SOVO to correct the captions, the team still corrects a portion of every live event so that we can benchmark and continuously optimize the system. The considerable effort SOVO invests in post-production is one of the reasons the service keeps improving over time for each voice writer and for each type of show on our production calendar. Furthermore, improvements SOVO makes based on one customer’s production help improve performance for all other customers, current and prospective.


SOVO’s investment in continuous training and performance monitoring ensures consistently high quality in live captioning across our production portfolio.

All the technology and best operational practices in the world will not deliver the desired results unless the voice writers are fully proficient, well trained in the different tasks and highly motivated.

SOVO pays special attention to recruiting and to the ongoing training of its staff. Each voice writer’s performance in different domains is continuously measured, and practice/correction sessions are regularly booked in the voice writers’ schedules to optimize their results across a very wide variety of content.

Voice writers are assigned a variety of production types and have the opportunity to take part in R&D activities, pilot projects, etc. Because all employees perform their live captioning sessions on SOVO’s premises, the efficiency of our program is greatly improved.