System and method for providing descriptive video

US12587718B2 · US · B2

Patent metadata
Field                 Value
Publication number    US-12587718-B2
Application number    US-202318216190-A
Country               US
Kind code             B2
Filing date           Jun 29, 2023
Priority date         Dec 21, 2018
Publication date      Mar 24, 2026
Grant date            Mar 24, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for providing described video for media content generates a plurality of individual audio files, possibly using text-to-speech, for each line of a described video script. The described video script provides an indication of the timing, such as for example the start time and length, of the individual described video lines. The described video script can then be used to combine the individual audio files into a single audio file for inclusion with the media content.
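The combining step described in the abstract can be pictured as placing each line's audio on a silent timeline at its start time. The following is a minimal sketch under stated assumptions, not the patented implementation: the `Clip` type, the `combine_clips` function, the mono integer sample format, and the choice to truncate clips that run past the end of the track are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    """Audio for one described video line (hypothetical structure)."""
    start_s: float       # start time of the line in the media, in seconds
    samples: List[int]   # mono PCM samples generated for the line


def combine_clips(clips: List[Clip], total_s: float, sample_rate: int) -> List[int]:
    """Place each clip on a silent track at the offset given by its start time."""
    track = [0] * round(total_s * sample_rate)
    for clip in clips:
        offset = round(clip.start_s * sample_rate)
        if offset < 0 or offset >= len(track):
            continue  # assumption: skip clips that start outside the media
        # Assumption: a clip running past the end of the track is truncated.
        end = min(offset + len(clip.samples), len(track))
        track[offset:end] = clip.samples[: end - offset]
    return track


if __name__ == "__main__":
    clips = [Clip(0.0, [1, 1]), Clip(0.5, [2, 2, 2])]
    # A tiny sample rate keeps the printed result readable.
    print(combine_clips(clips, total_s=1.0, sample_rate=4))
```

A production pipeline would more likely operate on real audio files with an audio library or an edit decision list; the list-based track here only shows how the script's timing drives placement.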

First claim

Opening claim text (preview).

What is claimed is:

1. A method of preparing described video for media content comprising: receiving a described video script comprising a plurality of script lines each comprising timing information and an associated line of text; generating a plurality of speech synthesis markup language (SSML) files from the described video script, each SSML file corresponding to a respective script line of the plurality of script lines and having a maximum length based on the timing information of the respective script line, wherein a length of time the respective script line will take to playback is estimated, and generating at least one of the plurality of SSML files comprises applying a SSML tag to set a rate of synthesized speech for the respective script line according to the timing information of the respective script line and the estimated length of time the respective script line will take to playback; generating a plurality of audio files, each audio file generated from a respective SSML file; and combining the plurality of audio files into described video audio for the media content according to the timing information of the described video script.

2. The method of claim 1, wherein generating the plurality of audio files comprises: generating each of the audio files using a text to speech converter according to the respective SSML file.

3. The method of claim 2, wherein generating a respective one of the SSML files comprises: search for a match of words in the line of text to words in a pronunciation database; and if a match is found, replacing the matched word with an associated pronunciation from the pronunciation database.

4. The method of claim 2, wherein generating each of the audio files using the text to speech converter according to the respective SSML file comprises: transmitting each of the SSML files to the text to speech converter; and receiving each of the audio files from the text to speech converter.

5. The method of claim 1, the plurality of audio files are generated in parallel.

6. The method of claim 1, further comprising: mixing the described video audio with audio of the media content to provide a final described video audio mix; and multiplexing the final described video audio mix into the media content.

7. The method of claim 6, wherein mixing the described video audio with the audio of the media and multiplexing the final described video audio mix is done using an edit decision list (EDL).

8. The method of claim 1, wherein the timing information comprises at least two of: a start time; a stop time; and a duration.

9. The method of claim 1, wherein the described video script is received in a defined format.

10. The method of claim 1, further comprising generating the described video script by: displaying a low resolution version of the media content; for each of the plurality of script lines: receiving a first input indicative of a start point in the displayed media content; determining a start time in the media content for the start point; receiving a second input indicative of a stop point in the displayed media content; determining a stop time in the media content for the stop point; generating the timing information from the start time and stop time; and receiving a text input of the line of text associated with the timing information.

11. The method of claim 1, further comprising: generating a second described video script by converting each of the associated lines of text to a different language; generating a respective audio file from the line of text of each of the plurality of script lines in the second described video script; and combining the plurality of audio files into a second described video audio for the media content according to the timing information of the second described video script.

12. A method of generating an audio file comprising: receiving a script comprising a plurality of script lines each comprising timing information and an associated line of text; generating a plurality of speech synthesis markup language (SSML) files from the described video script, each SSML file corresponding to a respective script line of the plurality of script lines and having a maximum length based on the timing information of the respective script line, wherein a length of time the respective script line will take to playback is estimated, and generating at least one of the plurality of SSML files comprises applying a SSML tag to set a rate of synthesized speech for the respective script line according to the timing information of the respective script line and the estimated length of time the respective script line will take to playback; generating a plurality of audio files, each audio file generated from a respective SSML file; and combining the plurality of audio files into a complete audio file according to the timing information of the script.

13. The method of claim 12, wherein generating the plurality of audio files comprises: generating each of the audio files using a text to speech converter according to the respective SSML file.

14. The method of claim 13, wherein generating each of the audio files using the text to speech converter according to the respective SSML file comprises: transmitting each of the SSML files to the text to speech converter; and receiving each of the audio files from the text to speech converter.

15. The method of claim 12, the plurality of audio files are generated in parallel.

16. The method of claim 12, wherein the timing information comprises at least two of: a start time; a stop time; and a duration.

17. A system for preparing described video for media content, the system comprising: a processor for executing instructions; and a memory storing instructions, which when executed by the processor configure the system to perform a method according to claim 1.

18. A system for generating an audio file, the system comprising: a processor for executing instructions; and a memory storing instructions, which when executed by the processor configure the system to perform a method according to claim 12.
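The rate-setting step recited in claims 1 and 12 (estimate how long a line will take to play back, then apply an SSML tag so the speech fits its time window) could look roughly like the sketch below. Everything specific here is an assumption for illustration: the claims do not say how duration is estimated, so a fixed words-per-second figure stands in; the `ScriptLine` type, the 200% cap, and the function names are hypothetical; and the `<speak>` element omits the version and namespace attributes a real SSML document would normally carry.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape

WORDS_PER_SECOND = 2.5    # assumed average speaking rate at normal speed
MAX_RATE_PERCENT = 200    # assumed upper bound to keep speech intelligible


@dataclass
class ScriptLine:
    """One described video script line (hypothetical structure)."""
    start_s: float   # start time in seconds
    stop_s: float    # stop time in seconds
    text: str        # text to be spoken


def estimate_duration_s(text: str) -> float:
    """Estimate playback time from word count (a stand-in heuristic)."""
    return len(text.split()) / WORDS_PER_SECOND


def rate_percent(line: ScriptLine) -> int:
    """Return the speech rate, as a percentage, needed to fit the window."""
    available = line.stop_s - line.start_s
    if available <= 0:
        raise ValueError("stop time must be after start time")
    estimated = estimate_duration_s(line.text)
    if estimated <= available:
        return 100  # the line already fits at normal speed
    return min(MAX_RATE_PERCENT, round(100 * estimated / available))


def to_ssml(line: ScriptLine) -> str:
    """Build a minimal SSML document for one script line."""
    body = escape(line.text)  # escape &, < and > so the markup stays valid
    rate = rate_percent(line)
    if rate != 100:
        body = f'<prosody rate="{rate}%">{body}</prosody>'
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(to_ssml(ScriptLine(0.0, 2.0, "A man walks into the room")))
    print(to_ssml(ScriptLine(3.0, 8.0, "Rain falls")))
```

Each resulting SSML string would then be sent to a text to speech converter, and because the lines are independent, those requests can be issued in parallel as claims 5 and 15 describe.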

Assignees

Inventors

Classifications

  • Speech synthesis; Text to speech systems · CPC title

  • G11B27/10 · Primary

    Indexing; Addressing; Timing or synchronising; Measuring tape travel · CPC title

  • Insert-editing · CPC title

  • for automatically generating descriptors from content, e.g. when it is not made available by its provider, using content analysis techniques · CPC title

  • by media transcoding, e.g. video is transformed into a slideshow of still pictures or audio is converted into text · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12587718B2 cover?
A system and method for providing described video for media content generates a plurality of individual audio files, possibly using text-to-speech, for each line of a described video script. The described video script provides an indication of the timing, such as for example the start time and length, of the individual described video lines. The described video script can then be used to combin…
Who is the assignee on this patent?
BCE Inc.
What technology area does this patent fall under?
Primary CPC classification G11B27/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 24, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).