What technology area does this patent fall under?

Primary CPC classification G11B27/10. Mapped technology areas include Physics.

When was this patent published?

Publication date: Mar 24, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and method for providing descriptive video

US12587718B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12587718-B2
Application number	US-202318216190-A
Country	US
Kind code	B2
Filing date	Jun 29, 2023
Priority date	Dec 21, 2018
Publication date	Mar 24, 2026
Grant date	Mar 24, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for providing described video for media content generates a plurality of individual audio files, possibly using text-to-speech, for each line of a described video script. The described video script provides an indication of the timing, such as for example the start time and length, of the individual described video lines. The described video script can then be used to combine the individual audio files into a single audio file for inclusion with the media content.
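
The combining step in the abstract can be pictured with a minimal sketch. Everything named here (`ScriptLine`, `combine`, the 8000 Hz sample rate, and the choice to truncate a clip that overruns its slot) is an assumption made for illustration; the abstract only says that the script's timing is used to merge the per-line audio into a single file.

```python
from dataclasses import dataclass
from typing import List

SAMPLE_RATE = 8000  # assumed sample rate in Hz; not specified by the source


@dataclass
class ScriptLine:
    start: float     # seconds from the start of the media content
    duration: float  # seconds available for this line
    text: str        # the described video line to be spoken


def combine(lines: List[ScriptLine], clips: List[List[float]],
            total_seconds: float) -> List[float]:
    """Place each per-line clip at its start time on one silent track."""
    track = [0.0] * int(round(total_seconds * SAMPLE_RATE))
    for line, clip in zip(lines, clips):
        offset = int(round(line.start * SAMPLE_RATE))
        limit = int(round(line.duration * SAMPLE_RATE))
        # Assumption: a clip longer than its slot is simply truncated.
        for i, sample in enumerate(clip[:limit]):
            if offset + i < len(track):
                track[offset + i] = sample
    return track


# Two hypothetical lines with placeholder "audio" (constant-valued samples).
lines = [ScriptLine(0.5, 1.0, "A car pulls up."),
         ScriptLine(2.0, 0.5, "She steps out.")]
clips = [[1.0] * 4000, [0.5] * 8000]
track = combine(lines, clips, total_seconds=3.0)
```

In this sketch the second clip is twice as long as its half-second slot, so only its first 4000 samples reach the track. A real pipeline would more likely avoid the overrun in the first place, for example by speeding up the synthesized speech.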

First claim

Opening claim text (preview).

What is claimed is:

1. A method of preparing described video for media content comprising: receiving a described video script comprising a plurality of script lines each comprising timing information and an associated line of text; generating a plurality of speech synthesis markup language (SSML) files from the described video script, each SSML file corresponding to a respective script line of the plurality of script lines and having a maximum length based on the timing information of the respective script line, wherein a length of time the respective script line will take to playback is estimated, and generating at least one of the plurality of SSML files comprises applying a SSML tag to set a rate of synthesized speech for the respective script line according to the timing information of the respective script line and the estimated length of time the respective script line will take to playback; generating a plurality of audio files, each audio file generated from a respective SSML file; and combining the plurality of audio files into described video audio for the media content according to the timing information of the described video script.

2. The method of claim 1, wherein generating the plurality of audio files comprises: generating each of the audio files using a text to speech converter according to the respective SSML file.

3. The method of claim 2, wherein generating a respective one of the SSML files comprises: search for a match of words in the line of text to words in a pronunciation database; and if a match is found, replacing the matched word with an associated pronunciation from the pronunciation database.

4. The method of claim 2, wherein generating each of the audio files using the text to speech converter according to the respective SSML file comprises: transmitting each of the SSML files to the text to speech converter; and receiving each of the audio files from the text to speech converter.

5. The method of claim 1, the plurality of audio files are generated in parallel.

6. The method of claim 1, further comprising: mixing the described video audio with audio of the media content to provide a final described video audio mix; and multiplexing the final described video audio mix into the media content.

7. The method of claim 6, wherein mixing the described video audio with the audio of the media and multiplexing the final described video audio mix is done using an edit decision list (EDL).

8. The method of claim 1, wherein the timing information comprises at least two of: a start time; a stop time; and a duration.

9. The method of claim 1, wherein the described video script is received in a defined format.

10. The method of claim 1, further comprising generating the described video script by: displaying a low resolution version of the media content; for each of the plurality of script lines: receiving a first input indicative of a start point in the displayed media content; determining a start time in the media content for the start point; receiving a second input indicative of a stop point in the displayed media content; determining a stop time in the media content for the stop point; generating the timing information from the start time and stop time; and receiving a text input of the line of text associated with the timing information.

11. The method of claim 1, further comprising: generating a second described video script by converting each of the associated lines of text to a different language; generating a respective audio file from the line of text of each of the plurality of script lines in the second described video script; and combining the plurality of audio files into a second described video audio for the media content according to the timing information of the second described video script.

12. A method of generating an audio file comprising: receiving a script comprising a plurality of script lines each comprising timing information and an associated line of text; generating a plurality of speech synthesis markup language (SSML) files from the described video script, each SSML file corresponding to a respective script line of the plurality of script lines and having a maximum length based on the timing information of the respective script line, wherein a length of time the respective script line will take to playback is estimated, and generating at least one of the plurality of SSML files comprises applying a SSML tag to set a rate of synthesized speech for the respective script line according to the timing information of the respective script line and the estimated length of time the respective script line will take to playback; generating a plurality of audio files, each audio file generated from a respective SSML file; and combining the plurality of audio files into a complete audio file according to the timing information of the script.

13. The method of claim 12, wherein generating the plurality of audio files comprises: generating each of the audio files using a text to speech converter according to the respective SSML file.

14. The method of claim 13, wherein generating each of the audio files using the text to speech converter according to the respective SSML file comprises: transmitting each of the SSML files to the text to speech converter; and receiving each of the audio files from the text to speech converter.

15. The method of claim 12, the plurality of audio files are generated in parallel.

16. The method of claim 12, wherein the timing information comprises at least two of: a start time; a stop time; and a duration.

17. A system for preparing described video for media content, the system comprising: a processor for executing instructions; and a memory storing instructions, which when executed by the processor configure the system to perform a method according to claim 1.

18. A system for generating an audio file, the system comprising: a processor for executing instructions; and a memory storing instructions, which when executed by the processor configure the system to perform a method according to claim 12.
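
The rate-setting step recited in claims 1 and 12 can be illustrated with a rough sketch. The claims do not say how playback length is estimated, so the word-count estimator, the 150 words-per-minute baseline, and the function names below are all assumptions; the only SSML used is the standard `<prosody rate="...">` element with a percentage value.

```python
import math
from xml.sax.saxutils import escape

ASSUMED_WPM = 150.0  # hypothetical baseline speaking rate, not from the patent


def estimate_seconds(text: str, wpm: float = ASSUMED_WPM) -> float:
    """Crude playback estimate: word count at an assumed speaking rate."""
    return len(text.split()) * 60.0 / wpm


def to_ssml(text: str, slot_seconds: float) -> str:
    """Wrap one script line in SSML, speeding it up only if it would overrun."""
    if slot_seconds <= 0:
        raise ValueError("slot_seconds must be positive")
    estimated = estimate_seconds(text)
    body = escape(text)  # escape &, <, > so the SSML stays well-formed
    if estimated > slot_seconds:
        # Round the percentage up so the sped-up line fits inside its slot.
        rate = math.ceil(estimated / slot_seconds * 100)
        body = f'<prosody rate="{rate}%">{body}</prosody>'
    return f"<speak>{body}</speak>"


# Ten words take an estimated 4.0 s at 150 wpm; the slot is only 2.0 s.
fast = to_ssml("one two three four five six seven eight nine ten", 2.0)
# Four words take an estimated 1.6 s, which fits a 10 s slot unchanged.
plain = to_ssml("Tom & Ann walk", 10.0)
```

Here `fast` carries a `rate="200%"` tag while `plain` has no prosody element at all. A production system would probably cap the rate so speech stays intelligible, and might measure the synthesized audio instead of relying on a word-count estimate.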

Assignees

BCE Inc

Classifications

G10L13/00 · Speech synthesis; Text to speech systems
G11B27/10 (Primary) · Indexing; Addressing; Timing or synchronising; Measuring tape travel
G11B27/036 · Insert-editing
H04N21/26603 · for automatically generating descriptors from content, e.g. when it is not made available by its provider, using content analysis techniques
H04N21/234336 · by media transcoding, e.g. video is transformed into a slideshow of still pictures or audio is converted into text

Patent family

Related publications grouped by family.

View patent family 71096945

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12587718B2 cover?: A system and method for providing described video for media content generates a plurality of individual audio files, possibly using text-to-speech, for each line of a described video script. The described video script provides an indication of the timing, such as for example the start time and length, of the individual described video lines. The described video script can then be used to combin…
Who is the assignee on this patent?: BCE Inc

Related patents

Contextually generated computer speech

Automatic generation of descriptive video service tracks

Generating audio rendering from textual content based on character models
