Content-based audio playback emphasis

US9454965B2 · US · B2

Patent metadata
Publication number: US-9454965-B2
Application number: US-201514852021-A
Country: US
Kind code: B2
Filing date: Sep 11, 2015
Priority date: Aug 20, 2004
Publication date: Sep 27, 2016
Grant date: Sep 27, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract


Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript.
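The abstract's example of emphasis, slower playback, can be illustrated by mapping a per-region emphasis factor to a playback-rate multiplier. This is a minimal sketch under stated assumptions: the `Region` type, the `playback_rate` and `playback_duration` functions, the reciprocal mapping, and the 0.5 to 1.5 clamping bounds are hypothetical illustrations and do not come from the patent. The sketch only computes rates and durations. Actually stretching the audio without shifting its pitch would need a time-scale modification algorithm (for example WSOLA), which is not shown.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A span of the spoken audio stream (hypothetical representation)."""
    start_s: float   # start time in seconds
    end_s: float     # end time in seconds
    emphasis: float  # emphasis factor; 1.0 means no extra emphasis


def playback_rate(emphasis: float, min_rate: float = 0.5, max_rate: float = 1.5) -> float:
    """Map an emphasis factor to a playback-rate multiplier.

    Hypothetical mapping: rate = 1 / emphasis, clamped to [min_rate, max_rate],
    so more emphasis means slower playback (rate < 1.0) and less emphasis
    means faster playback (rate > 1.0).
    """
    if emphasis <= 0:
        raise ValueError("emphasis factor must be positive")
    return max(min_rate, min(max_rate, 1.0 / emphasis))


def playback_duration(regions: list[Region]) -> float:
    """Wall-clock seconds needed to play every region at its adjusted rate."""
    return sum((r.end_s - r.start_s) / playback_rate(r.emphasis) for r in regions)


regions = [Region(0.0, 2.0, 1.0), Region(2.0, 3.0, 2.0), Region(3.0, 5.0, 0.8)]
print(playback_duration(regions))
```

Here three regions of 2 s, 1 s, and 2 s with emphasis 1.0, 2.0, and 0.8 play back in about 5.6 s: the emphasized middle second is slowed to half speed, while the low-emphasis final region is sped up by a factor of 1.25. Clamping keeps extreme emphasis values from making playback unintelligibly slow or fast.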

First claim


What is claimed is:

1. A method performed by a computer processor executing computer program instructions tangibly stored on a first computer-readable medium to perform a method comprising:
    (A) deriving, from a region of a document and a corresponding region of a spoken audio stream, a likelihood score representing a likelihood that the region of the document correctly represents content in the corresponding region of the spoken audio stream, and tangibly storing a representation of the likelihood score in a second computer-readable medium;
    (B) selecting a relevance score representing a measure of relevance of the region of the spoken audio stream, the measure of relevance representing a measure of importance that the region of the spoken audio stream be brought to the attention of a human proofreader, and tangibly storing a representation of the relevance score in a third computer-readable medium;
    (C) deriving, by dividing the relevance score by the likelihood score, an emphasis factor for modifying an emphasis placed on the region of the spoken audio stream when played back, and storing a representation of the emphasis factor in a fourth computer-readable medium; and
    (D) modifying, in accordance with the emphasis factor, the emphasis placed on the region of the spoken audio stream by gradually increasing an emphasis factor applied to at least one word occurring before a first word in the region of the spoken audio stream, producing an emphasis-adjusted audio stream.

2. The method of claim 1, wherein (C) comprises deriving, from the likelihood and the measure of relevance, a timescale adjustment factor for adjusting a playback rate of the region of the spoken audio stream.

3. The method of claim 1, wherein (C) comprises deriving, from the likelihood and the measure of relevance, a signal power adjustment factor for adjusting a signal power of the region of the spoken audio stream.

4. The method of claim 1, further comprising: (E) playing the emphasis-adjusted audio stream.

5. The method of claim 4, further comprising: (F) correcting errors in the document based on the emphasis-adjusted audio stream.

6. A method performed by a computer processor executing computer program instructions tangibly stored on a first computer-readable medium to perform a method comprising:
    (A) deriving, from a region of a document and a corresponding region of a spoken audio stream, a likelihood score representing a likelihood that the region of the document correctly represents content in the corresponding region of the spoken audio stream, and tangibly storing a representation of the likelihood score in a second computer-readable medium;
    (B) selecting a relevance score representing a measure of relevance of the region of the spoken audio stream, the measure of relevance representing a measure of importance that the region of the spoken audio stream be brought to the attention of a human proofreader, and tangibly storing a representation of the relevance score in a third computer-readable medium;
    (C) deriving, by dividing the relevance score by the likelihood score, an emphasis factor for modifying an emphasis placed on the region of the spoken audio stream when played back, and storing a representation of the emphasis factor in a fourth computer-readable medium; and
    (D) modifying, in accordance with the emphasis factor, the emphasis placed on the region of the spoken audio stream by gradually decreasing the emphasis factor applied to at least one word occurring after a last word in the region of the spoken audio stream, producing an emphasis-adjusted audio stream.

7. The method of claim 6, wherein (C) comprises deriving, from the likelihood and the measure of relevance, a timescale adjustment factor for adjusting a playback rate of the region of the spoken audio stream.

8. The method of claim 6, wherein (C) comprises deriving, from the likelihood and the measure of relevance, a signal power adjustment factor for adjusting a signal power of the region of the spoken audio stream.

9. The method of claim 6, further comprising: (E) playing the emphasis-adjusted audio stream.

10. The method of claim 9, further comprising: (F) correcting errors in the document based on the emphasis-adjusted audio stream.
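Steps (C) and (D) of claims 1 and 6 can be illustrated at word granularity. The sketch below assumes that a region is a contiguous run of word indices and that an emphasis of 1.0 means "neutral". The function names, the `ramp_words` parameter, and the linear ramp shape are hypothetical choices: the claims only require that the emphasis factor be the relevance score divided by the likelihood score, and that emphasis increase gradually before the region's first word (claim 1) or decrease gradually after its last word (claim 6).

```python
def emphasis_factor(relevance: float, likelihood: float) -> float:
    """Claim 1(C): divide the relevance score by the likelihood score.

    A region that matters a lot (high relevance) or was probably transcribed
    wrongly (low likelihood) gets a large emphasis factor.
    """
    if likelihood <= 0:
        raise ValueError("likelihood score must be positive")
    return relevance / likelihood


def ramped_word_emphasis(
    n_words: int,
    region_start: int,
    region_end: int,
    factor: float,
    ramp_words: int = 2,   # hypothetical: how many neighbouring words to ramp
    neutral: float = 1.0,  # hypothetical: emphasis of unaffected words
) -> list[float]:
    """Per-word emphasis for one emphasized region (hypothetical helper).

    Words region_start..region_end (inclusive) receive `factor`. Up to
    `ramp_words` words before the region ramp linearly up toward `factor`
    (cf. claim 1(D)); up to `ramp_words` words after it ramp back down
    (cf. claim 6(D)). All other words keep `neutral`.
    """
    if not 0 <= region_start <= region_end < n_words:
        raise ValueError("region must lie inside the word sequence")
    out = [neutral] * n_words
    for i in range(region_start, region_end + 1):
        out[i] = factor
    for k in range(1, ramp_words + 1):
        # Words closer to the region get a larger share of the emphasis.
        weight = (ramp_words + 1 - k) / (ramp_words + 1)
        value = neutral + weight * (factor - neutral)
        if region_start - k >= 0:
            out[region_start - k] = value  # ramp-in before the first word
        if region_end + k < n_words:
            out[region_end + k] = value  # ramp-out after the last word
    return out


factor = emphasis_factor(relevance=2.0, likelihood=0.5)
print(ramped_word_emphasis(9, 3, 4, factor))
```

With relevance 2.0 and likelihood 0.5 the emphasis factor is 4.0, and a nine-word sequence whose region covers words 3 and 4 yields per-word emphasis [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0, 1.0]. The ramp avoids an abrupt jump in playback rate or loudness at the region boundaries; the per-word values could then drive a timescale adjustment (claims 2 and 7) or a signal power adjustment (claims 3 and 8).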

Assignees

  • Mmodal Ip Llc

Inventors

Classifications

  • using prosody or stress

  • Time compression or expansion

  • G10L15/26 (primary): Speech to text systems (G10L15/08 takes precedence)

  • Orthographic correction, e.g. spell checking or vowelisation

  • G10L15/22 (primary): Procedures used during a speech recognition process, e.g. man-machine dialogue


Frequently asked questions


What does patent US9454965B2 cover?
Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example…
Who is the assignee on this patent?
Mmodal Ip Llc
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 27, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).