Continuous speech transcription performance indication

US9583107B2 · US · B2

Patent metadata
Publication number: US-9583107-B2
Application number: US-201414517720-A
Country: US
Kind code: B2
Filing date: Oct 17, 2014
Priority date: Apr 5, 2006
Publication date: Feb 28, 2017
Grant date: Feb 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of providing speech transcription performance indication includes receiving, at a user device data representing text transcribed from an audio stream by an ASR system, and data representing a metric associated with the audio stream; displaying, via the user device, said text; and via the user device, providing, in user-perceptible form, an indicator of said metric. Another method includes displaying, by a user device, text transcribed from an audio stream by an ASR system; and via the user device, providing, in user-perceptible form, an indicator of a level of background noise of the audio stream. Another method includes receiving data representing an audio stream; converting said data representing an audio stream to text via an ASR system; determining a metric associated with the audio stream; transmitting data representing said text to a user device; and transmitting data representing said metric to the user device.
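The server-side method in the abstract (convert audio to text, determine a metric, transmit both) can be illustrated with a minimal sketch. It assumes the metric is an RMS volume level computed over 16-bit PCM samples; the abstract does not specify how any metric is calculated, and the names `TranscriptMessage`, `rms_level_dbfs`, and `build_message` are hypothetical. The ASR step itself is out of scope, so the transcribed text is passed in as given.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TranscriptMessage:
    """Hypothetical payload for the user device: text plus one metric."""
    text: str
    metric_name: str
    metric_value: float


def rms_level_dbfs(samples: Sequence[int], full_scale: int = 32768) -> float:
    """Return the RMS level of 16-bit PCM samples in dB relative to full scale.

    This is one assumed choice of metric, not the one defined by the patent.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (full_scale * full_scale))


def build_message(text: str, samples: Sequence[int]) -> TranscriptMessage:
    """Pair already-transcribed text with the metric for the same audio."""
    return TranscriptMessage(text, "volume_dbfs", rms_level_dbfs(samples))
```

A background-noise or ASR-confidence metric could be substituted for the volume level without changing the shape of the payload.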

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving first text, wherein the first text was created by performing automatic speech recognition on a first portion of audio data, wherein the audio data includes a plurality of portions; receiving a first value associated with the first text; causing presentation of the first text with a first graphical element indicating the first value; receiving, during presentation of the first text, second text, wherein the second text was created by performing automatic speech recognition on a second portion of the plurality of portions of audio data and wherein the second portion of the audio data is subsequent to the first portion of the audio data; and causing presentation of the second text.

2. The computer-implemented method of claim 1, wherein the first value indicates a volume level of a portion of the first portion of the audio data corresponding to the first text, a level of background noise in the portion of the first portion of the audio data or a confidence level associated with the first text as determined by an automatic speech recognition engine.

3. The computer-implemented method of claim 1, wherein the first graphical element comprises font color, font grayscale, font weight, font size or underlining.

4. The computer-implemented method of claim 1, wherein the first portion of the audio data comprises at least part of a voicemail message.

5. The computer-implemented method of claim 1, further comprising: receiving a second value associated with the second text; and causing presentation of the second text with a second graphical element indicating the second value, wherein the second text comprises a modified version of the first text.

6. The computer-implemented method of claim 1, further comprising: receiving a third value associated with a third text, wherein the third text was created by performing automatic speech recognition on the first portion of the audio data; and causing presentation of the third text with a third graphical element indicating the third value.

7. A system comprising: an electronic data store configured to store transcription information; and one or more computing devices in communication with the electronic data store, the one or more computing devices configured to at least: receive first text, wherein the first text was created by performing automatic speech recognition on a first portion of audio data, wherein the audio data includes a plurality of portions; receive a first value associated with the first text; cause presentation of the first text with a first graphical element indicating the first value; receive, during presentation of the first text, second text, wherein the second text was created by performing automatic speech recognition on a second portion of the plurality of portions of audio data and wherein the second portion of the audio data is subsequent to the first portion of the audio data; and cause presentation of the second text.

8. The system of claim 7, wherein the first graphical element comprises a lighter color if the first value is in a lower range, or a darker color if the first value is in a higher range.

9. The system of claim 8, wherein the lighter color is gray and the darker color is black.

10. The system of claim 7, wherein the first portion of the audio data comprises at least part of a voicemail message.

11. The system of claim 7, wherein the one or more computing devices are further configured to: receive a second value associated with the second text; and cause presentation of the second text with a second graphical element indicating the second value, wherein the second text comprises a modified version of the first text.

12. The system of claim 7, wherein the one or more computing devices are further configured to: receive a third value associated with third text, wherein the third text was created by performing automatic speech recognition on the first portion of the audio data; and cause presentation of the third text with a third graphical element indicating the third value.

13. The system of claim 12, wherein the third value indicates a volume level of a portion of the first portion of the audio data corresponding to the third text, a level of background noise in the portion of the first portion of the audio data or a confidence level associated with the third text as determined by an automatic speech recognition engine.

14. A non-transitory computer-readable medium storing instructions that, when executed by a processor on a computing device, cause the computing device to at least: receive first text, wherein the first text was created by performing automatic speech recognition on a first portion of audio data, wherein the audio data includes a plurality of portions; receive a first value associated with the first text; cause presentation of the first text with a first graphical element indicating the first value; receive, during presentation of the first text, second text, wherein the second text was created by performing automatic speech recognition on a second portion of the plurality of portions of audio data and wherein the second portion of the audio data is subsequent to the first portion of the audio data; and cause presentation of the second text.

15. The non-transitory computer-readable medium of claim 14, wherein the first graphical element indicates a volume level associated with a portion of the first portion of the audio data corresponding to the first text and is presented substantially simultaneously with the presentation of the first text.

16. The non-transitory computer-readable medium of claim 14, wherein the first graphical element comprises font color, font grayscale, font weight, font size or underlining.

17. The non-transitory computer-readable medium of claim 14, wherein the first portion of the audio data comprises at least part of a voicemail message.

18. The non-transitory computer-readable medium of claim 14, further comprising instructions to filter the first text by replacing one or more words in the first text with corresponding numbers or digits.

19. The non-transitory computer-readable medium of claim 14, wherein the first portion of the audio data is captured at a first device and the first text is presented at a second device.

20. The non-transitory computer-readable medium of claim 14, further comprising instructions to: receive a second value associated with the second text; and cause presentation of the second text with a second graphical element indicating the second value, wherein the second text comprises a modified version of the first text.
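The client-side behavior recited in claims 1, 8, and 9 (present text with a graphical element that reflects a value, then add later text while the earlier text is still shown) might look roughly like the sketch below. It is an illustration under stated assumptions, not the patented implementation: the value is assumed to be normalized to the range 0.0 to 1.0, the 0.5 threshold separating the "lower" and "higher" ranges is arbitrary, HTML is an assumed output format, and `Segment`, `color_for`, `render`, and `present` are hypothetical names.

```python
import html
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Segment:
    """Hypothetical unit: text for one portion of the audio plus its value."""
    text: str
    value: float  # assumed normalized to 0.0-1.0, e.g. an ASR confidence


def color_for(value: float, threshold: float = 0.5) -> str:
    """Lighter color (gray) for the lower range, darker (black) for the higher."""
    return "black" if value >= threshold else "gray"


def render(segment: Segment) -> str:
    """Wrap the text in a span whose font color indicates the value."""
    color = color_for(segment.value)
    return f'<span style="color:{color}">{html.escape(segment.text)}</span>'


def present(segments: Iterable[Segment]) -> Iterator[str]:
    """Yield the growing display each time a later segment arrives."""
    shown: List[str] = []
    for segment in segments:
        shown.append(render(segment))
        yield " ".join(shown)
```

Because `present` is a generator, each yielded string is the full display at that moment, so text from a later portion of the audio is appended while the earlier text remains on screen. Font weight, font size, or underlining (claims 3 and 16) could replace the color mapping in `render`.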

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Announcement of recognition results · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • G10L15/26 · Primary

    Speech to text systems (G10L15/08 takes precedence) · CPC title

  • G10L15/01 · Primary

    Assessment or evaluation of speech recognition systems · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9583107B2 cover?
A method of providing speech transcription performance indication includes receiving, at a user device data representing text transcribed from an audio stream by an ASR system, and data representing a metric associated with the audio stream; displaying, via the user device, said text; and via the user device, providing, in user-perceptible form, an indicator of said metric. Another method inclu…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 28, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).