Automatic speech recognition based on user feedback

US10446141B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10446141-B2
Application number: US-201514591754-A
Country: US
Kind code: B2
Filing date: Jan 7, 2015
Priority date: Aug 28, 2014
Publication date: Oct 15, 2019
Grant date: Oct 15, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and processes for processing speech in a digital assistant are provided. In one example process, a first speech input can be received from a user. The first speech input can be processed using a first automatic speech recognition system to produce a first recognition result. An input indicative of a potential error in the first recognition result can be received. The input can be used to improve the first recognition result. For example, the input can include a second speech input that is a repetition of the first speech input. The second speech input can be processed using a second automatic speech recognition system to produce a second recognition result.
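
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `Recognition`, `handle_utterance`, and `get_repetition`, and the two recognizer callables, are hypothetical. The final combination step comes from the claims rather than the abstract, and the "keep the higher-confidence hypothesis" rule is only a stand-in for whatever system combination a real assistant would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Recognition:
    text: str
    confidence: float  # assumed overall score in [0, 1]; hypothetical field


# A recognizer maps raw audio bytes to a recognition result (assumed interface).
Recognizer = Callable[[bytes], Recognition]


def combine(first: Recognition, second: Recognition) -> Recognition:
    """Stand-in combination rule: keep the higher-confidence hypothesis."""
    return second if second.confidence >= first.confidence else first


def handle_utterance(
    first_audio: bytes,
    first_asr: Recognizer,
    second_asr: Recognizer,
    get_repetition: Callable[[Recognition], Optional[bytes]],
) -> Recognition:
    # Step 1: process the first speech input with the first ASR system.
    first = first_asr(first_audio)

    # Step 2: ask for feedback. `get_repetition` returns None when the user
    # accepts the result, or the repeated audio when an error is signalled.
    repeated_audio = get_repetition(first)
    if repeated_audio is None:
        return first

    # Step 3: process the repetition with a second, different ASR system.
    second = second_asr(repeated_audio)

    # Step 4: merge both results before determining the user intent.
    return combine(first, second)
```

In this sketch the second recognizer runs only after the user signals a problem, so any extra cost it carries is paid only on the error path.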

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing speech in a digital assistant, the method comprising: at an electronic device with a processor and memory storing one or more programs for execution by the processor: receiving, from a network interface, a first speech input; processing the first speech input using a first automatic speech recognition system to produce a first speech recognition result; performing a first task corresponding to a first user intent determined from the first speech recognition result; upon performing the first task, receiving, from the network interface, an input representing a rejection of the first task; in response to receiving the input, providing a prompt seeking a repetition of at least a portion of the first speech input; receiving, from the network interface, a second speech input; in accordance with the received input representing a rejection of the first task, processing the second speech input using a second automatic speech recognition system to produce a second speech recognition result, wherein the first automatic speech recognition system includes one or more speech recognition models, and the second automatic speech recognition system includes one or more speech recognition models that are different from the one or more speech recognition models of the first automatic speech recognition system; determining a combined speech recognition result based on the first speech recognition result and the second speech recognition result; and performing a second task corresponding to a second user intent determined from the combined speech recognition result.

2. The method of claim 1, wherein the input is a speech input that includes a predetermined utterance.

3. The method of claim 1, wherein the input comprises a selection of an affordance.

4. The method of claim 1, wherein at least a portion of text of the first speech recognition result is displayed on the electronic device, and wherein the input comprises a selection of at least a portion of the displayed text.

5. The method of claim 1, further comprising: in accordance with receiving the input, identifying a portion of the first speech input corresponding to a potential error in the first speech recognition result.

6. The method of claim 5, wherein processing the first speech input using the first automatic speech recognition system includes determining a confidence measure of each word in a text of the first speech recognition result, and wherein the portion of the first speech input associated with the potential error is identified based on the confidence measure of each word in the text.

7. The method of claim 5, wherein the prompt includes a request to repeat the identified portion of the first speech input corresponding to the potential error.

8. The method of claim 1, wherein the combined result is determined by performing automatic speech recognition system combination using the first speech recognition result and the second speech recognition result.

9. The method of claim 1, wherein the second automatic speech recognition system is associated with a greater computation cost than the first automatic speech recognition system in order to achieve greater accuracy.

10. A method for processing speech in a digital assistant, the method comprising: at an electronic device with a processor and memory storing one or more programs for execution by the processor: receiving an input containing user speech; processing the input using a first automatic speech recognition system to produce a first speech recognition result; performing a first task corresponding to a first user intent determined from the first speech recognition result; upon performing the first task, receiving a second input representing a rejection of the first task; in response to receiving the second input, processing at least a portion of the audio signal using a second automatic speech recognition system to produce a second speech recognition result, wherein the first automatic speech recognition system includes one or more speech recognition models, and the second automatic speech recognition system includes one or more speech recognition models that are different from the one or more speech recognition models of the first automatic speech recognition system; determining a combined speech recognition result based on the first speech recognition result and the second speech recognition result; and performing a second task corresponding to a second user intent determined from the combined speech recognition result.

11. The method of claim 10, wherein an error rate of the second automatic speech recognition system is lower than an error rate of the first automatic speech recognition system.

12. The method of claim 10, wherein a latency of the second automatic speech recognition system is greater than a latency of the first automatic speech recognition system.

13. The method of claim 10, wherein the combined result is determined by performing automatic speech recognition system combination using the first speech recognition result and the second speech recognition result.

14. The method of claim 13, wherein performing automatic speech recognition system combination comprises implementing at least one of recognition output voting error reduction, cross-adaptation, confusion network combination, and lattice combination.

15. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device, the one or more programs comprising instructions for: receiving, from a network interface, a first speech input; processing the first speech input using a first automatic speech recognition system to produce a first speech recognition result; performing a first task corresponding to a first user intent determined from the first speech recognition result; receiving, from the network interface, a second speech input; determining whether a phonemic transcription of the second speech input has an error rate that is less than a predetermined value when compared against a phonemic transcription of a corresponding portion of the first speech input; in response to determining that the phonemic transcription of the second speech input has an error rate that is less than the predetermined value when compared against the phonemic transcription of a corresponding portion of the first speech input, processing the second speech input using a second automatic speech recognition system to produce a second speech recognition result; and performing a second task corresponding to a second user intent determined based on the second speech recognition result.

16. The non-transitory computer-readable storage medium of claim 15, wherein the first automatic speech recognition system includes one or more speech recognition models, and the second automatic speech recognition system includes one or more speech recognition models that are different from the one or more speech recognition models of the first automatic speech recognition system.

17. The non-transitory computer-readable storage medium of claim 15, wherein the one or more programs further including instructions for: determining a combined speech recognition result based on the first speech recognition result and the second speech recognition result, wherein the second user intent is determined further based on the combined speech recognition result.

18. The non-transitory computer-readable storage medium of claim 17, wherein the combined speech recognition
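
The repetition test in claim 15 compares phonemic transcriptions of the two inputs against a predetermined error-rate value. One way such a check could look is sketched below, assuming the error rate is a Levenshtein edit distance normalized by the length of the first transcription. The claim does not specify the metric, the phoneme inventory, or the threshold, so the function names, the ARPAbet-style symbols, and the 0.3 default are all illustrative assumptions.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phoneme
                curr[j - 1] + 1,     # insertion of a hypothesis phoneme
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the reference length (assumed metric)."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def looks_like_repetition(
    first_phonemes: Sequence[str],
    second_phonemes: Sequence[str],
    threshold: float = 0.3,  # hypothetical "predetermined value"
) -> bool:
    """True when the second input is phonemically close to the first."""
    return phoneme_error_rate(first_phonemes, second_phonemes) < threshold
```

With this rule, a second input that differs from the first by a single phoneme out of six (error rate of about 0.17) would count as a repetition and be routed to the second recognizer, while an unrelated utterance would not.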

Assignees

Apple Inc

Inventors

Classifications

  • G10L15/22 (Primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Assessment or evaluation of speech recognition systems · CPC title

  • Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title

  • Phonemes, fenemes or fenones being the recognition units · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10446141B2 cover?
Systems and processes for processing speech in a digital assistant are provided. In one example process, a first speech input can be received from a user. The first speech input can be processed using a first automatic speech recognition system to produce a first recognition result. An input indicative of a potential error in the first recognition result can be received. The input can be used t…
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 15, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).