Sensor Fusion Model to Enhance Machine Conversational Awareness
US-2019139541-A1 · May 9, 2019 · US
US12266354B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12266354-B2 |
| Application number | US-202117500518-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 13, 2021 |
| Priority date | Jul 15, 2021 |
| Publication date | Apr 1, 2025 |
| Grant date | Apr 1, 2025 |
Official abstract text for this publication.
Systems and processes for speech interpretation based on environmental context are provided. For example, a user gaze direction is detected, and a speech input is received from a first user of the electronic device. In accordance with a determination that the user gaze is directed at a digital assistant object, the speech input is processed by the digital assistant. In accordance with a determination that the user gaze is not directed at a digital assistant object, contextual information associated with the electronic device is obtained, wherein the contextual information includes speech from a second user. Determination is made whether the speech input is directed to a digital assistant of the electronic device. In accordance with a determination that the speech input is directed to a digital assistant of the electronic device, the speech input is processed by the digital assistant.
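The gaze-gated flow in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Context` class, the `should_process` function, and the pluggable `is_directed` callback are hypothetical names introduced here, and the way directedness is actually determined is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Context:
    """Hypothetical container for contextual information (assumption)."""
    # Speech captured from a second user, as mentioned in the abstract.
    second_user_speech: List[str] = field(default_factory=list)


def should_process(
    gaze_on_assistant: bool,
    speech: str,
    context: Context,
    is_directed: Callable[[str, Context], bool],
) -> bool:
    """Decide whether the digital assistant should process the speech input."""
    # Gaze directed at the digital assistant object: process the speech.
    if gaze_on_assistant:
        return True
    # Otherwise fall back to contextual information to decide whether
    # the speech is directed to the assistant.
    return is_directed(speech, context)
```

With this shape, the contextual check only runs when the gaze test fails, which mirrors the ordering of the two "in accordance with a determination" branches in the abstract.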
Opening claim text (preview).
What is claimed is:

1. An electronic device, comprising: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions, which when executed, cause the electronic device to: detect a user gaze direction, wherein the user gaze direction is associated with a first user of the electronic device; receive, from a first user of the electronic device, a first speech input including first content; in accordance with a determination that the user gaze direction associated with the first user is not directed at a displayed digital assistant object: obtain contextual information associated with the electronic device, wherein the contextual information includes a second speech input from a second user, wherein the second speech input includes second content; adjust a confidence value based on the first content and the second content; determine, based on the contextual information and the confidence value, whether the first speech input is directed to the digital assistant of the electronic device; and in accordance with a determination that the first speech input is directed to the digital assistant of the electronic device: process, by the digital assistant, the first speech input.

2. The electronic device of claim 1, wherein the instructions cause the electronic device to: detect a beginning of the second speech input from the second user; and in response to detecting the beginning of the second speech input from the second user, store, in the memory, the second speech input from the second user.

3. The electronic device of claim 2, wherein the instructions cause the electronic device to: in accordance with a determination that the user gaze direction is directed at a displayed digital assistant object, remove, from the memory, the second speech input from the second user.

4. The electronic device of claim 2, wherein the instructions cause the electronic device to: in accordance with a determination that the first speech input is directed to a digital assistant of the electronic device, remove, from the memory, the second speech input from the second user.

5. The electronic device of claim 2, wherein the instructions cause the electronic device to: identify a first time associated with the storing of the second speech input from the second user; and in accordance with a determination that a current time is not within a threshold time duration from the first time, remove, from the memory, the second speech input from the second user.

6. The electronic device of claim 1, wherein the instructions cause the electronic device to: detect, at a first time, motion corresponding to the second user; identify a second time associated with a beginning of the second speech input from the second user; and in accordance with a determination that the first time is not within a threshold duration of time from the second time, adjust a confidence value associated with the first speech input.

7. The electronic device of claim 6, wherein the detected motion corresponds to one of movement of the second user and movement of an avatar associated with second user.

8. The electronic device of claim 1, wherein determining, based on the contextual information, whether the first speech input is directed to a digital assistant of the electronic device comprises: obtaining a confidence value corresponding to a confidence that the first speech input is directed to the digital assistant of the electronic device; and in accordance with a determination that the confidence value exceeds a threshold confidence value, determining that the first speech input is directed to the digital assistant.

9. The electronic device of claim 1, wherein the instructions cause the electronic device to: determine a direction associated with the second speech input from the second user; and in accordance with a determination that the direction associated with the second speech input from the second user corresponds to the user gaze direction associated with the first user, adjust a confidence value associated with the first speech input.

10. The electronic device of claim 1, wherein the instructions cause the electronic device to: identify a time associated with the second speech input from the second user; determine a direction associated with the second speech input from the second user; and obtain second contextual information within a time range from the identified time, wherein the second contextual information includes user gaze information within the time range.

11. The electronic device of claim 10, wherein the instructions cause the electronic device to: in accordance with a determination that the second contextual information includes a user gaze direction corresponding to the direction associated with the second speech input from the second user: adjust a confidence value associated with the first speech input.

12. The electronic device of claim 1, wherein the instructions cause the electronic device to: identify a first time associated with the first speech input; identify a second time associated with the second speech input from the second user; and in accordance with a determination that the first time and the second time are within a predetermined time range, adjust a confidence value associated with the first speech input.

13. The electronic device of claim 1, wherein the instructions cause the electronic device to: determine a first word included within the first speech input; determine a second word included within the second speech input from the second user; and in accordance with a determination that the first word corresponds to the second word, adjust a confidence value associated with the first speech input.

14. The electronic device of claim 1, wherein the instructions cause the electronic device to: obtain a first semantic representation of the first speech input; obtain a second semantic representation of the second speech input from the second user; and in accordance with a determination that the first semantic representation corresponds to the second semantic representation, adjust a confidence value associated with the first speech input.

15. The electronic device of claim 1, wherein the instructions cause the electronic device to: determine content associated with the second speech input from the second user; and in accordance with a determination that the determined content corresponds to predefined content, adjust a confidence value associated with the first speech input.

16. The electronic device of claim 15, wherein the predefined content includes at least one of an interrogatory sentence, a name associated with the first user, and a reference to a parameter associated with a profile corresponding to the first user.

17. The electronic device of claim 1, wherein at least one of the first content and the second content includes a word.

18. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device, the one or more programs including instructions, which when executed, cause the electronic device to: detect a user gaze direction, wherein the user gaze direction is associated with a first user of the electronic device; receive, from a first user of the electronic device, a first speech input including first content; in accordance with a determination that the user gaze direction associated with the first us
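
The confidence handling recited in claims 8, 12, and 13 can be illustrated with a short sketch. The claims say only that a confidence value is adjusted and compared against a threshold; the direction of each adjustment (lowering confidence when the two speech inputs share a word or occur close together in time), the step size, the time window, the threshold value, and the function names below are all assumptions made for illustration.

```python
def adjust_confidence(
    confidence: float,
    first_speech: str,
    second_speech: str,
    t_first: float,
    t_second: float,
    time_window: float = 5.0,  # hypothetical "predetermined time range", seconds
    step: float = 0.2,         # hypothetical adjustment size
) -> float:
    """Adjust the confidence that the first speech input targets the assistant."""
    first_words = set(first_speech.lower().split())
    second_words = set(second_speech.lower().split())

    # Claim 13 style check: a word in the first input corresponds to a word
    # in the second input. Assumption: this suggests person-to-person talk,
    # so confidence is lowered.
    if first_words & second_words:
        confidence -= step

    # Claim 12 style check: both inputs fall within a predetermined time range.
    # Assumption: temporal proximity also lowers confidence.
    if abs(t_first - t_second) <= time_window:
        confidence -= step

    # Keep the value in [0, 1].
    return max(0.0, min(1.0, confidence))


def is_directed_to_assistant(confidence: float, threshold: float = 0.5) -> bool:
    """Claim 8 style check: directed when confidence exceeds the threshold."""
    return confidence > threshold
```

In this sketch, a reply such as "what time is dinner" spoken two seconds after a second user says "dinner is at six" is penalized twice and falls below the example threshold, while an unrelated command spoken much later keeps its starting confidence.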
using position of the lips, movement of the lips or face analysis · CPC title
Eye tracking input arrangements (G06F3/015 takes precedence) · CPC title
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title