Speech interpretation based on environmental context

US12266354B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12266354-B2
Application number: US-202117500518-A
Country: US
Kind code: B2
Filing date: Oct 13, 2021
Priority date: Jul 15, 2021
Publication date: Apr 1, 2025
Grant date: Apr 1, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and processes for speech interpretation based on environmental context are provided. For example, a user gaze direction is detected, and a speech input is received from a first user of the electronic device. In accordance with a determination that the user gaze is directed at a digital assistant object, the speech input is processed by the digital assistant. In accordance with a determination that the user gaze is not directed at a digital assistant object, contextual information associated with the electronic device is obtained, wherein the contextual information includes speech from a second user. Determination is made whether the speech input is directed to a digital assistant of the electronic device. In accordance with a determination that the speech input is directed to a digital assistant of the electronic device, the speech input is processed by the digital assistant.
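The decision flow in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Context`, `directed_at_assistant`, and `handle_speech` are hypothetical, and the placeholder heuristic (treat speech as conversation whenever a second user has spoken) is an assumption made only so the example runs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Context:
    # Hypothetical container for contextual information. The abstract only
    # states that this information includes speech from a second user.
    second_user_speech: List[str] = field(default_factory=list)


def directed_at_assistant(speech: str, context: Context) -> bool:
    # Placeholder heuristic (an assumption, not the patent's method):
    # if a second user has spoken, treat the speech as human conversation.
    return len(context.second_user_speech) == 0


def handle_speech(speech: str, gaze_on_assistant_object: bool,
                  context: Context) -> str:
    # Branch 1: gaze is directed at the digital assistant object,
    # so the speech input is processed without consulting context.
    if gaze_on_assistant_object:
        return "process"
    # Branch 2: gaze is elsewhere, so contextual information decides
    # whether the speech input is directed to the assistant.
    if directed_at_assistant(speech, context):
        return "process"
    return "ignore"
```

For example, `handle_speech("about three", False, Context(["what time is it?"]))` returns `"ignore"` in this sketch, because the gaze is elsewhere and a second user has just spoken.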

First claim

Opening claim text (preview).

What is claimed is:

1. An electronic device, comprising: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions, which when executed, cause the electronic device to: detect a user gaze direction, wherein the user gaze direction is associated with a first user of the electronic device; receive, from a first user of the electronic device, a first speech input including first content; in accordance with a determination that the user gaze direction associated with the first user is not directed at a displayed digital assistant object: obtain contextual information associated with the electronic device, wherein the contextual information includes a second speech input from a second user, wherein the second speech input includes second content; adjust a confidence value based on the first content and the second content; determine, based on the contextual information and the confidence value, whether the first speech input is directed to the digital assistant of the electronic device; and in accordance with a determination that the first speech input is directed to the digital assistant of the electronic device: process, by the digital assistant, the first speech input.

2. The electronic device of claim 1, wherein the instructions cause the electronic device to: detect a beginning of the second speech input from the second user; and in response to detecting the beginning of the second speech input from the second user, store, in the memory, the second speech input from the second user.

3. The electronic device of claim 2, wherein the instructions cause the electronic device to: in accordance with a determination that the user gaze direction is directed at a displayed digital assistant object, remove, from the memory, the second speech input from the second user.

4. The electronic device of claim 2, wherein the instructions cause the electronic device to: in accordance with a determination that the first speech input is directed to a digital assistant of the electronic device, remove, from the memory, the second speech input from the second user.

5. The electronic device of claim 2, wherein the instructions cause the electronic device to: identify a first time associated with the storing of the second speech input from the second user; and in accordance with a determination that a current time is not within a threshold time duration from the first time, remove, from the memory, the second speech input from the second user.

6. The electronic device of claim 1, wherein the instructions cause the electronic device to: detect, at a first time, motion corresponding to the second user; identify a second time associated with a beginning of the second speech input from the second user; and in accordance with a determination that the first time is not within a threshold duration of time from the second time, adjust a confidence value associated with the first speech input.

7. The electronic device of claim 6, wherein the detected motion corresponds to one of movement of the second user and movement of an avatar associated with second user.

8. The electronic device of claim 1, wherein determining, based on the contextual information, whether the first speech input is directed to a digital assistant of the electronic device comprises: obtaining a confidence value corresponding to a confidence that the first speech input is directed to the digital assistant of the electronic device; and in accordance with a determination that the confidence value exceeds a threshold confidence value, determining that the first speech input is directed to the digital assistant.

9. The electronic device of claim 1, wherein the instructions cause the electronic device to: determine a direction associated with the second speech input from the second user; and in accordance with a determination that the direction associated with the second speech input from the second user corresponds to the user gaze direction associated with the first user, adjust a confidence value associated with the first speech input.

10. The electronic device of claim 1, wherein the instructions cause the electronic device to: identify a time associated with the second speech input from the second user; determine a direction associated with the second speech input from the second user; and obtain second contextual information within a time range from the identified time, wherein the second contextual information includes user gaze information within the time range.

11. The electronic device of claim 10, wherein the instructions cause the electronic device to: in accordance with a determination that the second contextual information includes a user gaze direction corresponding to the direction associated with the second speech input from the second user: adjust a confidence value associated with the first speech input.

12. The electronic device of claim 1, wherein the instructions cause the electronic device to: identify a first time associated with the first speech input; identify a second time associated with the second speech input from the second user; and in accordance with a determination that the first time and the second time are within a predetermined time range, adjust a confidence value associated with the first speech input.

13. The electronic device of claim 1, wherein the instructions cause the electronic device to: determine a first word included within the first speech input; determine a second word included within the second speech input from the second user; and in accordance with a determination that the first word corresponds to the second word, adjust a confidence value associated with the first speech input.

14. The electronic device of claim 1, wherein the instructions cause the electronic device to: obtain a first semantic representation of the first speech input; obtain a second semantic representation of the second speech input from the second user; and in accordance with a determination that the first semantic representation corresponds to the second semantic representation, adjust a confidence value associated with the first speech input.

15. The electronic device of claim 1, wherein the instructions cause the electronic device to: determine content associated with the second speech input from the second user; and in accordance with a determination that the determined content corresponds to predefined content, adjust a confidence value associated with the first speech input.

16. The electronic device of claim 15, wherein the predefined content includes at least one of an interrogatory sentence, a name associated with the first user, and a reference to a parameter associated with a profile corresponding to the first user.

17. The electronic device of claim 1, wherein at least one of the first content and the second content includes a word.

18. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device, the one or more programs including instructions, which when executed, cause the electronic device to: detect a user gaze direction, wherein the user gaze direction is associated with a first user of the electronic device; receive, from a first user of the electronic device, a first speech input including first content; in accordance with a determination that the user gaze direction associated with the first us
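The confidence logic in claims 8, 12, and 13 can be illustrated with a short Python sketch. The claims say only that a confidence value is "adjusted"; they do not give a direction, magnitude, or threshold. Everything numeric here (`base`, `window_s`, `penalty`, `threshold`) and the choice to lower confidence when the two utterances look like a conversation are assumptions for illustration, as are the names `Utterance`, `adjust_confidence`, and `is_directed_to_assistant`.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    time_s: float  # when the utterance began, in seconds


def adjust_confidence(first: Utterance, second: Utterance,
                      base: float = 0.8,
                      window_s: float = 5.0,
                      penalty: float = 0.3) -> float:
    """Return a confidence that `first` is directed to the assistant.

    Assumption: evidence that the first user is answering the second user
    lowers the confidence. The claims do not specify the direction.
    """
    confidence = base
    # Claim 12: the two utterances fall within a predetermined time range.
    if abs(first.time_s - second.time_s) <= window_s:
        confidence -= penalty
    # Claim 13: a word in the first input corresponds to a word in the
    # second. Simple whitespace tokenization; punctuation is not stripped.
    first_words = set(first.text.lower().split())
    second_words = set(second.text.lower().split())
    if first_words & second_words:
        confidence -= penalty
    # Clamp to the range [0, 1].
    return max(0.0, min(1.0, confidence))


def is_directed_to_assistant(confidence: float,
                             threshold: float = 0.5) -> bool:
    # Claim 8: directed to the assistant when the confidence value
    # exceeds a threshold confidence value.
    return confidence > threshold
```

With these assumed defaults, a reply such as "the movie starts at eight" spoken two seconds after "when does the movie start" takes both penalties and falls below the threshold, while an unrelated command spoken much later keeps the base confidence and is treated as directed to the assistant.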

Assignees

Apple Inc

Inventors

Classifications

  • using position of the lips, movement of the lips or face analysis · CPC title

  • Eye tracking input arrangements (G06F3/015 takes precedence) · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • G06F3/167 · Primary

    Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12266354B2 cover?
Systems and processes for speech interpretation based on environmental context are provided. For example, a user gaze direction is detected, and a speech input is received from a first user of the electronic device. In accordance with a determination that the user gaze is directed at a digital assistant object, the speech input is processed by the digital assistant. In accordance with a determi…
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/1815. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 1, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).