Automatic generation of descriptive video service tracks
US-2019069045-A1 · Feb 28, 2019 · US
US11729475B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11729475-B2 |
| Application number | US-201916699391-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 29, 2019 |
| Priority date | Dec 21, 2018 |
| Publication date | Aug 15, 2023 |
| Grant date | Aug 15, 2023 |
Official abstract text for this publication.
A system and method for providing described video for media content generates a plurality of individual audio files, possibly using text-to-speech, for each line of a described video script. The described video script provides an indication of the timing, such as for example the start time and length, of the individual described video lines. The described video script can then be used to combine the individual audio files into a single audio file for inclusion with the media content.
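The combining step described in the abstract can be pictured as placing each per-line audio clip on a silent timeline at its scripted start time. The following is a minimal sketch under stated assumptions, not the patented implementation: `ScriptLine`, `combine_clips`, the use of plain lists of integer samples, and the trimming of over-long clips are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ScriptLine:
    """One described video line (hypothetical structure for this sketch)."""
    text: str
    start: float     # start time of the line, in seconds
    duration: float  # length available to the line, in seconds


def combine_clips(lines: Sequence[ScriptLine],
                  clips: Sequence[Sequence[int]],
                  sample_rate: int,
                  total_seconds: float) -> List[int]:
    """Place each clip at its line's start time on an otherwise silent track."""
    track = [0] * int(round(total_seconds * sample_rate))
    for line, clip in zip(lines, clips):
        offset = int(round(line.start * sample_rate))
        if offset < 0 or offset >= len(track):
            continue  # the line starts outside the track; skip it
        # Assumption: a clip longer than its slot is trimmed as a safeguard.
        max_len = int(round(line.duration * sample_rate))
        trimmed = list(clip[:max_len])
        end = min(offset + len(trimmed), len(track))
        track[offset:end] = trimmed[: end - offset]
    return track
```

A real pipeline would read and write encoded audio rather than raw sample lists; the list form is used here only to keep the timing arithmetic visible.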
Claim text for this publication.
What is claimed is:
1. A method of preparing described video for media content comprising: receiving a described video script comprising: a plurality of script lines, each script line comprising: an associated line of text; and timing information defining timing of an audio gap within the media content for insertion of audio corresponding to the associated line of text within the media content; generating a plurality of speech synthesis markup language (SSML) files, each SSML file corresponding to a respective script line of the plurality of script lines and having a maximum length based on the timing information of the respective script line; generating a plurality of audio files, each audio file generated from a respective SSML file, wherein at least one of the plurality of audio files comprises audio generated from the line of text that has been sped up in order to fit within the timing of the respective audio gap; and combining the plurality of audio files into described video audio for the media content according to the timing information of the described video script.
2. The method of claim 1, wherein generating the plurality of audio files comprises: generating each of the audio files using a text to speech converter according to the respective SSML file.
3. The method of claim 2, wherein generating a respective one of the SSML files comprises: search for a match of words in the line of text to words in a pronunciation database; and if a match is found, replacing the matched word with an associated pronunciation from the pronunciation database.
4. The method of claim 2, wherein generating each of the audio files using the text to speech converter according to the respective SSML file comprises: transmitting each of the SSML files to the text to speech converter; and receiving each of the audio files from the text to speech converter.
5. The method of claim 1, further comprising: mixing the described video audio with audio of the media content to provide a final described video audio mix; and multiplexing the final described video audio mix into the media content.
6. The method of claim 5, wherein mixing the described video audio with the audio of the media and multiplexing the final described video audio mix is done using an edit decision list (EDL).
7. The method of claim 1, wherein the timing information comprises at least two of: a start time; a stop time; and a duration.
8. The method of claim 1, wherein the described video script is received in a defined format.
9. The method of claim 1, further comprising generating the described video script by: displaying a low resolution version of the media content; for each of the plurality of script lines: receiving a first input indicative of a start point in the displayed media content; determining a start time in the media content for the start point; receiving a second input indicative of a stop point in the displayed media content; determining a stop time in the media content for the stop point; generating the timing information from the start time and stop time; and receiving a text input of the line of text associated with the timing information.
10. The method of claim 1, further comprising: generating a second described video script by converting each of the associated lines of text to a different language; generating a respective audio file from the line of text of each of the plurality of script lines in the second described video script; and combining the plurality of audio files into a second described video audio for the media content according to the timing information of the second described video script.
11. A method of generating an audio file comprising: receiving a script comprising a plurality of script lines, each script line comprising: an associated line of text; and timing information defining possible timing of the associated line of text within a complete audio file; generating a plurality of speech synthesis markup language (SSML) files, each SSML file corresponding to a respective script line of the plurality of script lines and having a maximum length based on the timing information of the respective script line; generating a plurality of audio files, each audio file generated from a respective SSML file, wherein at least one of the plurality of audio files comprises audio generated from the line of text that has been sped up in order to fit within the timing of the respective audio gap; and combining the plurality of audio files into the complete audio file according to the timing information of the script.
12. The method of claim 11, wherein generating the plurality of audio files comprises: generating each of the audio files using a text to speech converter according to the respective SSML file.
13. The method of claim 12, wherein generating each of the audio files using the text to speech converter according to the respective SSML file comprises: transmitting each of the SSML files to the text to speech converter; and receiving each of the audio files from the text to speech converter.
14. The method of claim 11, wherein the timing information comprises at least two of: a start time; a stop time; and a duration.
15. A system for preparing described video for media content, the system comprising: a processor for executing instructions; and a memory storing instructions, which when executed by the processor configure the system to perform a method according to claim 1.
16. A system for generating an audio file, the system comprising: a processor for executing instructions; and a memory storing instructions, which when executed by the processor configure the system to perform a method according to claim 11.
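Several claimed steps lend themselves to a small illustration: deriving timing from any two of start, stop, and duration (claims 7 and 14), substituting pronunciations before synthesis (claim 3), and producing SSML whose speech is sped up when the text would overrun its gap (claims 1 and 11). The sketch below is one possible reading, not the patent's method. All function names, the dictionary used as a pronunciation database, the words-per-minute duration estimate, and the choice of the SSML `prosody` `rate` attribute to express the speed-up are assumptions made for illustration.

```python
import math
from typing import Dict, Optional, Tuple
from xml.sax.saxutils import escape


def resolve_timing(start: Optional[float] = None,
                   stop: Optional[float] = None,
                   duration: Optional[float] = None) -> Tuple[float, float]:
    """Return (start, duration) given at least two of start, stop, duration."""
    if sum(v is not None for v in (start, stop, duration)) < 2:
        raise ValueError("need at least two of start, stop, duration")
    if start is None:
        start = stop - duration
    if duration is None:
        duration = stop - start
    return start, duration


def apply_pronunciations(text: str, pronunciations: Dict[str, str]) -> str:
    """Replace whole words found in a (hypothetical) pronunciation table.

    Matching is a case-insensitive whole-token lookup; punctuation attached
    to a word prevents a match in this simplified version.
    """
    return " ".join(pronunciations.get(w.lower(), w) for w in text.split())


def build_ssml(text: str,
               gap_seconds: float,
               pronunciations: Optional[Dict[str, str]] = None,
               words_per_minute: float = 160.0) -> str:
    """Build SSML for one script line, speeding speech up to fit the gap."""
    if gap_seconds <= 0:
        raise ValueError("gap_seconds must be positive")
    spoken = apply_pronunciations(text, pronunciations or {})
    # Assumption: natural duration is estimated from a fixed speaking rate.
    estimated = len(spoken.split()) * 60.0 / words_per_minute
    # Never slow down; only speed up when the estimate exceeds the gap.
    rate = max(100, math.ceil(100 * estimated / gap_seconds))
    return f'<speak><prosody rate="{rate}%">{escape(spoken)}</prosody></speak>'
```

For example, `build_ssml("The cat sleeps", 1.0)` estimates 1.125 seconds of speech for a 1 second gap and therefore requests a rate of 113%. How a given text to speech converter interprets percentage rates varies, so the resulting clip length would still need to be checked against the gap.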
Generation or processing of descriptive data, e.g. content descriptors {(systems specially adapted for using meta-information in broadcast systems H04H60/73)} · CPC title
Insert-editing · CPC title
Indexing; Addressing; Timing or synchronising; Measuring tape travel · CPC title
Speech synthesis; Text to speech systems · CPC title
Content authoring · CPC title