US11423908B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11423908-B2 |
| Application number | US-201916554475-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 28, 2019 |
| Priority date | May 6, 2019 |
| Publication date | Aug 23, 2022 |
| Grant date | Aug 23, 2022 |
Official abstract text for this publication.
In an exemplary process for interpreting spoken requests, audio input containing a user utterance is received. In accordance with a determination that a text representation of the user utterance does not exactly match any of a plurality of user-defined invocation phrases, the process determines whether a comparison between the text representation and a user-defined invocation phrase of the plurality of user-defined invocation phrases satisfies one or more rule-based conditions. In accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, the text representation and the user-defined invocation phrase are processed using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase. In accordance with a determination that the score satisfies a threshold condition, a predefined task corresponding to the user-defined invocation phrase is performed.
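The staged decision described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `route_utterance`, `shares_word`, `jaccard`, the task names, and the `0.5` threshold are all hypothetical, and the token-overlap score is only a stand-in for the machine-learned model, which the source does not specify. The sketch also assumes the threshold condition means "score at or above a cutoff".

```python
from typing import Callable, Dict, Optional, Tuple


def tokens(text: str) -> set:
    """Lowercased whitespace tokens; a deliberately crude tokenizer."""
    return set(text.lower().split())


def shares_word(text: str, phrase: str) -> bool:
    """Toy rule-based condition: the two strings have a word in common."""
    return bool(tokens(text) & tokens(phrase))


def jaccard(text: str, phrase: str) -> float:
    """Toy stand-in for the machine-learned scorer (NOT a learned model)."""
    a, b = tokens(text), tokens(phrase)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def route_utterance(
    text: str,
    shortcuts: Dict[str, str],  # invocation phrase -> predefined task name
    passes_rules: Callable[[str, str], bool] = shares_word,
    score: Callable[[str, str], float] = jaccard,
    threshold: float = 0.5,  # assumed cutoff, chosen only for this example
) -> Tuple[str, Optional[str]]:
    """Return (route, task), where route is 'exact', 'model', or 'nlp'."""
    # Stage 1: an exact match runs the predefined task directly.
    if text in shortcuts:
        return ("exact", shortcuts[text])

    # Stage 2: cheap rule-based filter; phrases that fail are never scored.
    candidates = [p for p in shortcuts if passes_rules(text, p)]

    # Stage 3: score surviving candidates and keep the highest-scoring one.
    best_phrase, best_score = None, float("-inf")
    for phrase in candidates:
        s = score(text, phrase)
        if s > best_score:
            best_phrase, best_score = phrase, s

    # Stage 4: threshold check; otherwise fall back to general NLP.
    if best_phrase is not None and best_score >= threshold:
        return ("model", shortcuts[best_phrase])
    return ("nlp", None)


SHORTCUTS = {
    "heading home": "send_eta_and_navigate",  # hypothetical task names
    "good morning": "read_weather",
}
```

With these toy definitions, `route_utterance("i am heading home", SHORTCUTS)` takes the model route, while `route_utterance("what time is it", SHORTCUTS)` skips scoring entirely and falls through to the general natural-language path. The point of the ordering is that the inexpensive checks run first, so the scorer is only invoked for plausible candidates.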
Opening claim text (preview).
What is claimed is:

1. A method for processing spoken requests, performed by an electronic device having one or more processors and memory, the method comprising: at the electronic device: receiving audio input containing a user utterance; determining, from the audio input, a text representation of the utterance; in accordance with a determination that the text representation does not exactly match any of a plurality of user-defined invocation phrases, determining whether a comparison between the text representation and a user-defined invocation phrase of the plurality of user-defined invocation phrases satisfies one or more rule-based conditions; in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, processing the text representation and the user-defined invocation phrase using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase without performing semantic parsing on the text representation; in accordance with a determination that the determined score satisfies a threshold condition, performing a predefined task flow corresponding to the user-defined invocation phrase, wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows; and in accordance with a determination that the determined score does not satisfy the threshold condition: performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation by performing semantic parsing on the text representation; and performing a task flow corresponding to the actionable intent.

2. The method of claim 1, further comprising: in accordance with a determination that the text representation exactly matches one of the plurality of user-defined invocation phrases, performing a second predefined task flow corresponding to the one of the plurality of user-defined invocation phrases.

3. The method of claim 1, further comprising: in accordance with a determination that a comparison between the text representation and each of the plurality of user-defined invocation phrases does not satisfy the one or more rule-based conditions, forgo processing the text representation through the machine-learned model.

4. The method of claim 3, further comprising: in accordance with a determination that the comparison between the text representation and each of the plurality of user-defined invocation phrases does not satisfy the one or more rule-based conditions: performing natural language processing on the text representation to determine the actionable intent corresponding to the text representation; and performing the task flow corresponding to the actionable intent.

5. The method of claim 1, further comprising: in accordance with the determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions and in accordance with the determination that a comparison between the text representation and a second user-defined invocation phrase of the plurality of user-defined invocation phrases does not satisfy the one or more rule-based conditions: processing the text representation and the user-defined invocation phrase through the machine-learned model without processing the text representation and the second user-defined invocation phrase through the machine-learned model.

6. The method of claim 1, wherein the one or more rule-based conditions include a first rule-based condition that the text representation contains a word that is also contained in the user-defined invocation phrase.

7. The method of claim 1, wherein the one or more rule-based conditions include a second rule-based condition that the text representation contains: the user-defined invocation phrase; and additional text positioned before or after the user-defined invocation phrase.

8. The method of claim 1, wherein the one or more rule-based conditions include a third rule-based condition that a text normalization of the text representation contains a text normalization of the user-defined invocation phrase.

9. The method of claim 1, further comprising: in accordance with a determination that a comparison between the text representation and a third user-defined invocation phrase of the plurality of user-defined invocation phrases satisfies the one or more rule-based conditions, processing the text representation and the third user-defined invocation phrase through the machine-learned model to determine a second score representing a degree of semantic equivalence between the text representation and the third user-defined invocation phrase; and performing the predefined task flow in accordance with the determination that the determined score satisfies the threshold condition and in accordance with the determined score being greater than the determined second score.

10. The method of claim 1, wherein the machine-learned model is configured to: receive, as input, a feature vector of the text representation and a feature vector of the user-defined invocation phrase; and generate, as output, the score.

11. The method of claim 1, wherein the machine-learned model determines the score representing the degree of semantic equivalence between the text representation and the user-defined invocation phrase without determining an actionable intent corresponding to the text representation.

12. The method of claim 1, wherein the machine-learned model is trained using a plurality of sets of one or more text representations that correspond to a plurality of sets of one or more user utterances received prior to receiving the audio input.

13. The method of claim 12, wherein each set of the one or more user utterances of the plurality of sets of the one or more user utterances is associated with a respective user-defined invocation phrase of the plurality of user-defined invocation phrases.

14. The method of claim 13, wherein each set of the one or more user utterances of the plurality of sets of the one or more user utterances is received within a respective predetermined time period prior to the respective user-defined invocation phrase being invoked.

15. The method of claim 1, wherein each of the plurality of user-defined invocation phrases is assigned to a respective predefined task flow of the plurality of predefined task flows in accordance with user input received at the electronic device or at one or more other electronic devices, and wherein the electronic device and the one or more other electronic devices are each registered to a same user.

16. The method of claim 1, wherein performing natural language processing includes determining, from a plurality of domains of an ontology, a domain corresponding to the text representation, and wherein the method further comprises: resolving, based on the text representation, one or more parameters of the determined domain.

17. An electronic device, comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: receiving audio input containing a user utterance; determining, from the audio input, a text representation of the utterance; in accordance with a determin
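The three rule-based conditions recited in claims 6 to 8 could be checked along these lines. This is a hedged sketch: the function names, the particular normalization (lowercasing, stripping punctuation, collapsing whitespace), and the choice to combine the conditions with a logical OR are all assumptions made for illustration; the claims only say "one or more rule-based conditions" and do not define the normalization.

```python
import re
from typing import List


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse spaces."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def _has_token_run(tokens: List[str], run: List[str]) -> bool:
    """True if `run` appears as a contiguous sub-sequence of `tokens`."""
    n = len(run)
    return n > 0 and any(
        tokens[i:i + n] == run for i in range(len(tokens) - n + 1)
    )


def shares_word(text: str, phrase: str) -> bool:
    """Claim 6 style: the text contains a word also found in the phrase."""
    return bool(set(text.split()) & set(phrase.split()))


def contains_phrase_with_extra(text: str, phrase: str) -> bool:
    """Claim 7 style: the text contains the phrase plus text before or after."""
    t, p = text.split(), phrase.split()
    # Strictly more tokens than the phrase guarantees additional text exists.
    return len(t) > len(p) and _has_token_run(t, p)


def normalized_contains(text: str, phrase: str) -> bool:
    """Claim 8 style: normalized text contains the normalized phrase."""
    return _has_token_run(normalize(text).split(), normalize(phrase).split())


def passes_rules(text: str, phrase: str) -> bool:
    """Assumed combination: any single condition is enough to pass."""
    return (
        shares_word(text, phrase)
        or contains_phrase_with_extra(text, phrase)
        or normalized_contains(text, phrase)
    )
```

Matching on whole tokens rather than raw substrings is a design choice in this sketch, so that a phrase such as "home" does not match inside "homemade". A phrase that passes none of the conditions would simply never be sent to the scoring model, which is the behavior claims 3 and 5 describe.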
Probabilistic grammars, e.g. word n-grams · CPC title
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Execution procedure of a spoken command · CPC title
using neural networks · CPC title