US9454965B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9454965-B2 |
| Application number | US-201514852021-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 11, 2015 |
| Priority date | Aug 20, 2004 |
| Publication date | Sep 27, 2016 |
| Grant date | Sep 27, 2016 |
Official abstract text for this publication.
Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript.
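As a rough illustration of the slow-down idea in the abstract, the sketch below maps a per-region emphasis value to a playback rate and reports how long the region would take to play. The function name, the reciprocal mapping from emphasis to rate, and the clamp limits are assumptions made for illustration; the abstract only says that emphasized regions may be played back more slowly.

```python
def stretched_duration(duration_s: float, emphasis: float,
                       min_rate: float = 0.5, max_rate: float = 2.0) -> float:
    """Return the playback time of a region after timescale adjustment.

    Assumption: playback rate is the reciprocal of the emphasis value,
    clamped to [min_rate, max_rate] so speech stays intelligible.
    """
    if emphasis <= 0:
        raise ValueError("emphasis must be positive")
    rate = min(max(1.0 / emphasis, min_rate), max_rate)
    return duration_s / rate


# A one-second region with emphasis 2.0 plays at half speed.
slow = stretched_duration(1.0, 2.0)
# A one-second region with emphasis 0.5 plays at double speed.
fast = stretched_duration(1.0, 0.5)
```

Clamping is a design choice of this sketch: without it, a very high emphasis value would slow playback to an unusable crawl.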
Opening claim text (preview).
What is claimed is:

1. A method performed by a computer processor executing computer program instructions tangibly stored on a first computer-readable medium to perform a method comprising:
(A) deriving, from a region of a document and a corresponding region of a spoken audio stream, a likelihood score representing a likelihood that the region of the document correctly represents content in the corresponding region of the spoken audio stream, and tangibly storing a representation of the likelihood score in a second computer-readable medium;
(B) selecting a relevance score representing a measure of relevance of the region of the spoken audio stream, the measure of relevance representing a measure of importance that the region of the spoken audio stream be brought to the attention of a human proofreader, and tangibly storing a representation of the relevance score in a third computer-readable medium;
(C) deriving, by dividing the relevance score by the likelihood score, an emphasis factor for modifying an emphasis placed on the region of the spoken audio stream when played back, and storing a representation of the emphasis factor in a fourth computer-readable medium; and
(D) modifying, in accordance with the emphasis factor, the emphasis placed on the region of the spoken audio stream by gradually increasing an emphasis factor applied to at least one word occurring before a first word in the region of the spoken audio stream, producing an emphasis-adjusted audio stream.

2. The method of claim 1, wherein (C) comprises deriving, from the likelihood and the measure of relevance, a timescale adjustment factor for adjusting a playback rate of the region of the spoken audio stream.

3. The method of claim 1, wherein (C) comprises deriving, from the likelihood and the measure of relevance, a signal power adjustment factor for adjusting a signal power of the region of the spoken audio stream.

4. The method of claim 1, further comprising: (E) playing the emphasis-adjusted audio stream.

5. The method of claim 4, further comprising: (F) correcting errors in the document based on the emphasis-adjusted audio stream.

6. A method performed by a computer processor executing computer program instructions tangibly stored on a first computer-readable medium to perform a method comprising:
(A) deriving, from a region of a document and a corresponding region of a spoken audio stream, a likelihood score representing a likelihood that the region of the document correctly represents content in the corresponding region of the spoken audio stream, and tangibly storing a representation of the likelihood score in a second computer-readable medium;
(B) selecting a relevance score representing a measure of relevance of the region of the spoken audio stream, the measure of relevance representing a measure of importance that the region of the spoken audio stream be brought to the attention of a human proofreader, and tangibly storing a representation of the relevance score in a third computer-readable medium;
(C) deriving, by dividing the relevance score by the likelihood score, an emphasis factor for modifying an emphasis placed on the region of the spoken audio stream when played back, and storing a representation of the emphasis factor in a fourth computer-readable medium; and
(D) modifying, in accordance with the emphasis factor, the emphasis placed on the region of the spoken audio stream by gradually decreasing the emphasis factor applied to at least one word occurring after a last word in the region of the spoken audio stream, producing an emphasis-adjusted audio stream.

7. The method of claim 6, wherein (C) comprises deriving, from the likelihood and the measure of relevance, a timescale adjustment factor for adjusting a playback rate of the region of the spoken audio stream.

8. The method of claim 6, wherein (C) comprises deriving, from the likelihood and the measure of relevance, a signal power adjustment factor for adjusting a signal power of the region of the spoken audio stream.

9. The method of claim 6, further comprising: (E) playing the emphasis-adjusted audio stream.

10. The method of claim 9, further comprising: (F) correcting errors in the document based on the emphasis-adjusted audio stream.
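Steps (C) and (D) of claims 1 and 6 can be sketched roughly as follows. Only the division of relevance by likelihood comes from the claim text. The function names, the division-by-zero floor, the linear ramp shape, the two-word ramp length, and the baseline factor of 1.0 are all hypothetical choices made for illustration, not details taken from the patent.

```python
def emphasis_factor(relevance: float, likelihood: float,
                    floor: float = 1e-3) -> float:
    """Step (C): divide the relevance score by the likelihood score.

    The floor is an assumption of this sketch; it avoids dividing by
    zero when a region is scored as almost certainly wrong.
    """
    return relevance / max(likelihood, floor)


def ramp_emphasis(n_words: int, start: int, end: int, peak: float,
                  ramp: int = 2, baseline: float = 1.0) -> list[float]:
    """Step (D) sketch: per-word factors with a gradual lead-in and lead-out.

    Words start..end (inclusive) form the emphasized region and get `peak`.
    Up to `ramp` words before the region rise linearly toward `peak`
    (claim 1), and up to `ramp` words after it fall back linearly toward
    `baseline` (claim 6). Linear interpolation is an assumed ramp shape.
    """
    factors = [baseline] * n_words
    for i in range(start, end + 1):
        factors[i] = peak
    for k in range(1, ramp + 1):
        # k = 1 is the word directly adjacent to the region.
        value = peak + (baseline - peak) * k / (ramp + 1)
        if start - k >= 0:
            factors[start - k] = value
        if end + k < n_words:
            factors[end + k] = value
    return factors


# A highly relevant word (relevance 2.0) transcribed with low confidence (0.5).
peak = emphasis_factor(2.0, 0.5)
per_word = ramp_emphasis(n_words=7, start=3, end=3, peak=peak)
```

The resulting per-word factors could then drive either a timescale adjustment (claims 2 and 7) or a signal power adjustment (claims 3 and 8). How a factor maps onto playback rate or gain is left open here.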
using prosody or stress · CPC title
Time compression or expansion · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
Orthographic correction, e.g. spell checking or vowelisation · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title