Interpreting spoken requests

US11423908B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11423908-B2
Application number  US-201916554475-A
Country             US
Kind code           B2
Filing date         Aug 28, 2019
Priority date       May 6, 2019
Publication date    Aug 23, 2022
Grant date          Aug 23, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an exemplary process for interpreting spoken requests, audio input containing a user utterance is received. In accordance with a determination that a text representation of the user utterance does not exactly match any of a plurality of user-defined invocation phrases, the process determines whether a comparison between the text representation and a user-defined invocation phrase of the plurality of user-defined invocation phrases satisfies one or more rule-based conditions. In accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, the text representation and the user-defined invocation phrase is processed using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase. In accordance with a determination that the score satisfies a threshold condition, a predefined task corresponding to the user-defined invocation phrase is performed.
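
The staged decision described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `shortcuts` mapping, the default threshold, and especially `token_overlap` (a simple word-overlap ratio standing in for the machine-learned model) are all assumptions made for the example. The exact-match branch follows claim 2, and returning `None` stands for the fallback to general natural language processing described in claim 1.

```python
from typing import Callable, Dict, Optional


def token_overlap(a: str, b: str) -> float:
    """Placeholder scorer (Jaccard word overlap). NOT the patent's learned model."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def shares_a_word(text: str, phrase: str) -> bool:
    """Example rule-based gate: the two strings have at least one word in common."""
    return bool(set(text.lower().split()) & set(phrase.lower().split()))


def interpret(
    text: str,
    shortcuts: Dict[str, str],
    passes_rules: Callable[[str, str], bool] = shares_a_word,
    score: Callable[[str, str], float] = token_overlap,
    threshold: float = 0.5,  # hypothetical value, chosen only for the example
) -> Optional[str]:
    """Return the task for the best-matching invocation phrase, or None."""
    # Stage 1: an exact match short-circuits everything else.
    if text in shortcuts:
        return shortcuts[text]

    best_task, best_score = None, threshold
    for phrase, task in shortcuts.items():
        # Stage 2: cheap rule-based gate; phrases that fail never reach the model.
        if not passes_rules(text, phrase):
            continue
        # Stage 3: semantic-equivalence score from the (stand-in) model.
        s = score(text, phrase)
        if s >= best_score:
            best_task, best_score = task, s

    # None tells the caller to fall back to general natural language processing.
    return best_task
```

With `shortcuts = {"heading home": "send_eta"}`, the utterance "heading home now" is not an exact match but passes the word-sharing gate and scores 2/3 under the placeholder scorer, so `interpret` returns `"send_eta"`. An unrelated utterance such as "what is the weather" fails the gate and yields `None`.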

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing spoken requests, performed by an electronic device having one or more processors and memory, the method comprising: at the electronic device: receiving audio input containing a user utterance; determining, from the audio input, a text representation of the utterance; in accordance with a determination that the text representation does not exactly match any of a plurality of user-defined invocation phrases, determining whether a comparison between the text representation and a user-defined invocation phrase of the plurality of user-defined invocation phrases satisfies one or more rule-based conditions; in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, processing the text representation and the user-defined invocation phrase using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase without performing semantic parsing on the text representation; in accordance with a determination that the determined score satisfies a threshold condition, performing a predefined task flow corresponding to the user-defined invocation phrase, wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows; and in accordance with a determination that the determined score does not satisfy the threshold condition: performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation by performing semantic parsing on the text representation; and performing a task flow corresponding to the actionable intent.

2. The method of claim 1, further comprising: in accordance with a determination that the text representation exactly matches one of the plurality of user-defined invocation phrases, performing a second predefined task flow corresponding to the one of the plurality of user-defined invocation phrases.

3. The method of claim 1, further comprising: in accordance with a determination that a comparison between the text representation and each of the plurality of user-defined invocation phrases does not satisfy the one or more rule-based conditions, forgo processing the text representation through the machine-learned model.

4. The method of claim 3, further comprising: in accordance with a determination that the comparison between the text representation and each of the plurality of user-defined invocation phrases does not satisfy the one or more rule-based conditions: performing natural language processing on the text representation to determine the actionable intent corresponding to the text representation; and performing the task flow corresponding to the actionable intent.

5. The method of claim 1, further comprising: in accordance with the determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions and in accordance with the determination that a comparison between the text representation and a second user-defined invocation phrase of the plurality of user-defined invocation phrases does not satisfy the one or more rule-based conditions: processing the text representation and the user-defined invocation phrase through the machine-learned model without processing the text representation and the second user-defined invocation phrase through the machine-learned model.

6. The method of claim 1, wherein the one or more rule-based conditions include a first rule-based condition that the text representation contains a word that is also contained in the user-defined invocation phrase.

7. The method of claim 1, wherein the one or more rule-based conditions include a second rule-based condition that the text representation contains: the user-defined invocation phrase; and additional text positioned before or after the user-defined invocation phrase.

8. The method of claim 1, wherein the one or more rule-based conditions include a third rule-based condition that a text normalization of the text representation contains a text normalization of the user-defined invocation phrase.

9. The method of claim 1, further comprising: in accordance with a determination that a comparison between the text representation and a third user-defined invocation phrase of the plurality of user-defined invocation phrases satisfies the one or more rule-based conditions, processing the text representation and the third user-defined invocation phrase through the machine-learned model to determine a second score representing a degree of semantic equivalence between the text representation and the third user-defined invocation phrase; and performing the predefined task flow in accordance with the determination that the determined score satisfies the threshold condition and in accordance with the determined score being greater than the determined second score.

10. The method of claim 1, wherein the machine-learned model is configured to: receive, as input, a feature vector of the text representation and a feature vector of the user-defined invocation phrase; and generate, as output, the score.

11. The method of claim 1, wherein the machine-learned model determines the score representing the degree of semantic equivalence between the text representation and the user-defined invocation phrase without determining an actionable intent corresponding to the text representation.

12. The method of claim 1, wherein the machine-learned model is trained using a plurality of sets of one or more text representations that correspond to a plurality of sets of one or more user utterances received prior to receiving the audio input.

13. The method of claim 12, wherein each set of the one or more user utterances of the plurality of sets of the one or more user utterances is associated with a respective user-defined invocation phrase of the plurality of user-defined invocation phrases.

14. The method of claim 13, wherein each set of the one or more user utterances of the plurality of sets of the one or more user utterances is received within a respective predetermined time period prior to the respective user-defined invocation phrase being invoked.

15. The method of claim 1, wherein each of the plurality of user-defined invocation phrases is assigned to a respective predefined task flow of the plurality of predefined task flows in accordance with user input received at the electronic device or at one or more other electronic devices, and wherein the electronic device and the one or more other electronic devices are each registered to a same user.

16. The method of claim 1, wherein performing natural language processing includes determining, from a plurality of domains of an ontology, a domain corresponding to the text representation, and wherein the method further comprises: resolving, based on the text representation, one or more parameters of the determined domain.

17. An electronic device, comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: receiving audio input containing a user utterance; determining, from the audio input, a text representation of the utterance; in accordance with a determin
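
The three rule-based conditions named in claims 6 through 8 could look roughly like the sketch below. The claims do not specify how words are delimited or what "text normalization" involves, so the whitespace tokenization and the `normalize` helper (lowercasing and stripping punctuation) are assumptions for illustration only, as are all function names.

```python
import re
from typing import List


def normalize(text: str) -> str:
    """Hypothetical normalization: lowercase, drop punctuation, collapse spaces."""
    return " ".join(re.sub(r"[^a-z0-9\s]", " ", text.lower()).split())


def _contains_tokens(haystack: List[str], needle: List[str]) -> bool:
    """True if `needle` appears as a contiguous run of whole words in `haystack`."""
    n = len(needle)
    if n == 0:
        return False
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def shares_word(text: str, phrase: str) -> bool:
    """Claim 6 style: the text contains a word also found in the phrase."""
    return bool(set(text.split()) & set(phrase.split()))


def contains_with_extra(text: str, phrase: str) -> bool:
    """Claim 7 style: the text contains the phrase plus text before or after it."""
    t, p = text.split(), phrase.split()
    return len(t) > len(p) and _contains_tokens(t, p)


def normalized_contains(text: str, phrase: str) -> bool:
    """Claim 8 style: the normalized text contains the normalized phrase."""
    return _contains_tokens(normalize(text).split(), normalize(phrase).split())
```

Matching on whole words rather than raw substrings is a design choice of this sketch: it keeps a phrase such as "home" from matching inside "homework". Each check is cheap compared with running a model, which fits the role the claims give these conditions as a gate in front of the machine-learned scoring step.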

Assignees

Apple Inc

Inventors

Classifications

  • Probabilistic grammars, e.g. word n-grams

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning

  • G10L15/22 (primary): Procedures used during a speech recognition process, e.g. man-machine dialogue

  • Execution procedure of a spoken command

  • using neural networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11423908B2 cover?
In an exemplary process for interpreting spoken requests, audio input containing a user utterance is received. In accordance with a determination that a text representation of the user utterance does not exactly match any of a plurality of user-defined invocation phrases, the process determines whether a comparison between the text representation and a user-defined invocation phrase of the plur…
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 23, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).