What technology area does this patent fall under?

Primary CPC classification G11B27/10. Mapped technology areas include Physics.

When was this patent published?

Publication date Nov 15, 2016 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus, or other publications that share the same primary CPC).

Systems, computer-implemented methods, and tangible computer-readable storage media for transcription alignment

US9495964B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9495964-B2
Application number	US-201615071644-A
Country	US
Kind code	B2
Filing date	Mar 16, 2016
Priority date	Aug 17, 2009
Publication date	Nov 15, 2016
Grant date	Nov 15, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for captioning a media presentation. The method includes receiving automatic speech recognition (ASR) output from a media presentation and a transcription of the media presentation. The method includes selecting via a processor a pair of anchor words in the media presentation based on the ASR output and transcription and generating captions by aligning the transcription with the ASR output between the selected pair of anchor words. The transcription can be human-generated. Selecting pairs of anchor words can be based on a similarity threshold between the ASR output and the transcription. In one variation, commonly used words on a stop list are ineligible as anchor words. The method includes outputting the media presentation with the generated captions. The presentation can be a recording of a live event.
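The abstract's two-stage idea is to fix reliable anchor words first and then align only the stretch between each pair of anchors. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the patented implementation. The stop list contents, the rule that an anchor must occur exactly once in both sequences, the use of `difflib.SequenceMatcher` as the similarity measure, the 0.5 threshold, and the linear interpolation of unmatched words are all hypothetical choices. ASR output is assumed to be a list of `(word, start_time)` tuples, and the transcription is assumed to be a plain list of words.

```python
from difflib import SequenceMatcher

# Hypothetical stop list: common words that may not serve as anchors.
STOP_LIST = {"a", "an", "and", "in", "is", "it", "of", "that", "the", "to"}


def find_anchors(asr, transcript, stop_list=STOP_LIST):
    """Return (asr_index, transcript_index) pairs usable as anchor words.

    Assumption: a word qualifies if it is not on the stop list and occurs
    exactly once in both the ASR output and the transcription.
    """
    asr_words = [w for w, _ in asr]
    chain = []
    for j, word in enumerate(transcript):
        if word in stop_list:
            continue
        if transcript.count(word) != 1 or asr_words.count(word) != 1:
            continue
        i = asr_words.index(word)
        # Keep anchors in the same order in both sequences (greedy monotone chain).
        if not chain or i > chain[-1][0]:
            chain.append((i, j))
    return chain


def align_span(asr_span, words):
    """Time the transcript words between (and including) two anchors."""
    asr_words = [w for w, _ in asr_span]
    times = [None] * len(words)
    matcher = SequenceMatcher(None, asr_words, words, autojunk=False)
    # Words that match the ASR output exactly take the ASR start time.
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            times[b + k] = asr_span[a + k][1]
    # Remaining words are linearly interpolated between their timed neighbours.
    known = [k for k, t in enumerate(times) if t is not None]
    for left, right in zip(known, known[1:]):
        for k in range(left + 1, right):
            frac = (k - left) / (right - left)
            times[k] = times[left] + (times[right] - times[left]) * frac
    return times


def align(asr, transcript, threshold=0.5, stop_list=STOP_LIST):
    """Return (word, start_time) captions for the transcription.

    Assumption: a span between consecutive anchors is aligned only if the
    difflib similarity ratio of its two word sequences reaches `threshold`;
    otherwise its inner words stay untimed (None).
    """
    asr_words = [w for w, _ in asr]
    times = [None] * len(transcript)
    anchors = find_anchors(asr, transcript, stop_list)
    for i, j in anchors:
        times[j] = asr[i][1]
    for (i0, j0), (i1, j1) in zip(anchors, anchors[1:]):
        ratio = SequenceMatcher(
            None, asr_words[i0:i1 + 1], transcript[j0:j1 + 1], autojunk=False
        ).ratio()
        if ratio >= threshold:
            times[j0:j1 + 1] = align_span(asr[i0:i1 + 1], transcript[j0:j1 + 1])
    return list(zip(transcript, times))
```

If the ASR output splits "everyone" into "every one", the transcript word "everyone" has no exact match, so in this sketch it receives a time interpolated between the anchors on either side. Stop-list words such as "to" still take their ASR times, because the stop list only restricts which words may act as anchors.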

First claim

Opening claim text (preview).

We claim:
1. A method comprising: comparing a transcription of a media presentation to a list of anchor word candidates to identify a pair of anchor words separated from one another within the media presentation by a time greater than an anchor word time duration requirement; and generating captions by aligning the transcription with an automatic speech recognition output of the media presentation according to the pair of anchor words.
2. The method of claim 1, wherein the list of anchor word candidates further comprises a stop word list of words not to be considered as the pair of anchor words.
3. The method of claim 1, wherein the anchor word time duration requirement represents a minimal time duration requirement between words in the pair of anchor words.
4. The method of claim 1, wherein generating the captions further comprises aligning the transcription between the pair of anchor words.
5. The method of claim 1, further comprising: outputting the media presentation with the captions.
6. The method of claim 1, wherein the media presentation is in real-time, the method further comprising: buffering the captions based on the media presentation and the aligning of the transcription to yield buffered caption; and outputting a delayed media presentation and the buffered captions together.
7. The method of claim 1, wherein the media presentation is in real-time, the method further comprising: buffering the media presentation to yield a delayed media presentation; and outputting the delayed media presentation and the captions together.
8. A system comprising: a processor; and a computer-readable storage device storing instructions which, when executed by the processor, cause the processor to perform operations comprising: comparing a transcription of a media presentation to a list of anchor word candidates to identify a pair of anchor words separated from one another within the media presentation by a time greater than an anchor word time duration requirement; and generating captions by aligning the transcription with an automatic speech recognition output of the media presentation according to the pair of anchor words.
9. The system of claim 8, wherein the list of anchor word candidates further comprises a stop word list of words not to be considered as the pair of anchor words.
10. The system of claim 8, wherein the anchor word time duration requirement represents a minimal time duration requirement between words in the pair of anchor words.
11. The system of claim 8, wherein generating the captions further comprises aligning the transcription between the pair of anchor words.
12. The system of claim 8, wherein the computer-readable storage device stores further instructions which, when executed by the processor, cause the processor to perform an operation further comprising outputting the media presentation with the captions.
13. The system of claim 8, wherein the media presentation is in real-time, and wherein the computer-readable storage device stores further instructions which, when executed by the processor, cause the processor to perform an operation further comprising: buffering the captions based on the media presentation and the aligning of the transcription to yield buffered caption; and outputting a delayed media presentation and the buffered captions together.
14. A computer-readable storage device storing instructions which, when executed by a processor, cause the processor to perform operations comprising: comparing a transcription of a media presentation to a list of anchor word candidates to identify a pair of anchor words separated from one another within the media presentation by a time greater than an anchor word time duration requirement; and generating captions by aligning the transcription with an automatic speech recognition output of the media presentation according to the pair of anchor words.
15. The computer-readable storage device of claim 14, wherein the list of anchor word candidates further comprises a stop word list of words not to be considered as the pair of anchor words.
16. The computer-readable storage device of claim 14, wherein the anchor word time duration requirement represents a minimal time duration requirement between words in the pair of anchor words.
17. The computer-readable storage device of claim 14, wherein generating the captions further comprises aligning the transcription between the pair of anchor words.
18. The computer-readable storage device of claim 14, wherein the computer-readable storage device stores further instructions which, when executed by the processor, cause the processor to perform an operation further comprising outputting the media presentation with the captions.
19. The computer-readable storage device of claim 14, wherein the media presentation is in real-time, and wherein the computer-readable storage device stores further instructions which, when executed by the processor, cause the processor to perform an operation further comprising: buffering the captions based on the media presentation and the aligning of the transcription to yield buffered caption; and outputting a delayed media presentation and the buffered captions together.
20. The computer-readable storage device of claim 14, wherein the media presentation is in real-time, and wherein the computer-readable storage device stores further instructions which, when executed by the processor, cause the processor to perform an operation further comprising: buffering the media presentation to yield a delayed media presentation; and outputting the delayed media presentation and the captions together.
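The independent claims tie anchor selection to a timing rule: the two anchor words must be separated by more than an anchor word time duration requirement, with stop words excluded from consideration. Claims 6 and 7 add buffering for real-time presentations. The Python sketch below illustrates one possible reading of those rules. The greedy chaining strategy, the input formats, and the names `select_anchor_pairs` and `DelayBuffer` are hypothetical and do not come from the patent.

```python
from collections import deque


def select_anchor_pairs(timed_words, candidates, stop_words, min_gap):
    """Chain anchor words so that each pair is more than `min_gap` seconds apart.

    Hypothetical inputs: `timed_words` is a list of (word, seconds) tuples for
    the transcription (times assumed to come from the ASR output),
    `candidates` is the anchor-candidate list and `stop_words` holds words
    that may never be anchors.
    """
    eligible = [(w, t) for w, t in timed_words
                if w in candidates and w not in stop_words]
    pairs = []
    start = None
    for word, t in eligible:
        if start is None:
            start = (word, t)
        elif t - start[1] > min_gap:  # strictly greater than the requirement
            pairs.append((start, (word, t)))
            start = (word, t)
    return pairs


class DelayBuffer:
    """Hypothetical fixed-delay buffer for a live presentation.

    Media frames are held for `delay` seconds so that captions can be
    output together with the delayed media once their span is aligned.
    """

    def __init__(self, delay):
        self.delay = delay
        self._queue = deque()

    def push(self, timestamp, frame):
        self._queue.append((timestamp, frame))

    def pop_ready(self, now):
        """Release, in arrival order, every frame at least `delay` seconds old."""
        ready = []
        while self._queue and now - self._queue[0][0] >= self.delay:
            ready.append(self._queue.popleft())
        return ready
```

In this sketch a span cannot be captioned until its closing anchor has been recognized, so a reasonable delay setting would be no shorter than the largest spacing between chained anchors.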

Assignees

AT&T Intellectual Property I, L.P.

Classifications

H04N21/4884
for displaying subtitles · CPC title
H04N21/44004
involving video buffer management, e.g. video decoder buffer or video display buffer · CPC title
G11B27/10 · Primary
Indexing; Addressing; Timing or synchronising; Measuring tape travel · CPC title
G10L15/265 · Primary
Physics · mapped topic
G10L21/055
for synchronising with other signals, e.g. video signals · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 43589105
