Human-machine interfaces and methods which determine intended responses by humans

US11848014B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11848014-B2
Application number: US-202016924879-A
Country: US
Kind code: B2
Filing date: Jul 9, 2020
Priority date: Jul 11, 2019
Publication date: Dec 19, 2023
Grant date: Dec 19, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Human-machine interfaces may capture interactions by humans with robots (e.g., robots with a humanoid appearance), the interactions taking a variety of forms (e.g., audio, visual), and may determine an intent of the humans or meaning of human responses via analysis of the interactions. Intent can be determined based on analysis of aural response, including meaning or semantics and/or tone. Intent can be determined based on analysis of visually detectable responses, including head motions, facial gestures, hand or arm gestures, eye gestures. Responses may be compared for consistency. Humans may be queried to confirm determined intended response.
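The comparison step the abstract describes (check aural and visual responses for consistency, then query the human when they disagree) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Polarity` enum, the `combine` merging rule, and the wording of the follow-up question are all assumptions introduced here.

```python
from enum import Enum
from typing import Optional


class Polarity(Enum):
    """Hypothetical three-way reading of a response."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


def combine(first: Polarity, second: Polarity) -> Polarity:
    """Merge two readings from one modality (e.g. words and tone).

    Assumed rule: agreement wins, a neutral reading defers to the
    other reading, and a direct conflict collapses to NEUTRAL.
    """
    if first == second:
        return first
    if first is Polarity.NEUTRAL:
        return second
    if second is Polarity.NEUTRAL:
        return first
    return Polarity.NEUTRAL


def follow_up_query(aural: Polarity, non_aural: Polarity) -> Optional[str]:
    """Return a clarifying question when the modalities disagree, else None."""
    if aural == non_aural:
        return None
    return (
        f"Your words sounded {aural.value} but your gesture looked "
        f"{non_aural.value}. Could you confirm your answer?"
    )


# Example: the words say "yes", the tone is flat, the gesture is a head shake.
intended_aural = combine(Polarity.POSITIVE, Polarity.NEUTRAL)
question = follow_up_query(intended_aural, Polarity.NEGATIVE)
```

Treating polarity itself as the value being compared keeps the sketch short; a real system could compare any parameter derived from each modality.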

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of operation in a processor-based system to facilitate machine-human interaction between a user and the processor-based system, the method comprising: accessing, by at least one processor of the processor-based system, a first captured response to a first user query; accessing, by at least one processor of the processor-based system, a second captured response to the first user query; extracting, by at least one processor of the processor-based system, a first aural response from the first captured response; extracting, by at least one processor of the processor-based system, a second aural response from the first captured response; determining, by at least one processor of the processor-based system, an intended aural response based on the first aural response and the second aural response; extracting, by at least one processor of the processor-based system, one or more non-aural responses from the second captured response, wherein extracting the one or more non-aural responses comprises applying the second captured response to an input of a first trained machine learning model trained to extract the one or more non-aural responses from the second captured response; determining, by at least one processor of the processor-based system, an intended non-aural response based on the one or more non-aural responses; determining, by at least one processor of the processor-based system, a value of a first consistency parameter for the intended aural response; determining, by at least one processor of the processor-based system, a value of the first consistency parameter for the intended non-aural response; and in response to determining that the value of the first consistency parameter for the intended aural response is different from the value of the first consistency parameter for the intended non-aural response, generating a second user query based at least in part on the intended aural response, the intended non-aural response, and the values of the first consistency parameter.

2. The method of claim 1 wherein the first captured response comprises audio data, and wherein extracting, by at least one processor of the processor-based system, the first aural response from the first captured response comprises: deriving a set of words from the audio data; and determining whether the set of words includes any words that indicate a positive response to the first user query or any words that indicate a negative response to the first user query.

3. The method of claim 2 wherein extracting, by at least one processor of the processor-based system, the second aural response from the first captured response comprises: deriving a tone of voice from the audio data; and determining whether the tone of voice indicates a positive response to the first user query, indicates a negative response to the first user query, or indicates neither a positive nor a negative response to the first user query.

4. The method of claim 3, further comprising: determining, by at least one processor of the processor-based system, whether the words derived from the audio data and the tone of voice derived from the audio data are consistent with one another, and storing an indicator of their consistency in a long term storage repository.

5. The method of claim 1 wherein the second captured response comprises video data, and wherein the trained machine learning model is trained to perform operations comprising: deriving at least one gesture from the video data; and determining whether the at least one gesture indicates a positive response to the first user query, indicates a negative response to the first user query, or indicates neither a positive nor a negative response to the first user query.

6. The method of claim 5 wherein determining whether the at least one gesture indicates a positive response to the first user query, indicates a negative response to the first user query, or indicates neither a positive nor a negative response to the first user query includes determining whether the at least one gesture appears in a defined set of key gestures.

7. The method of claim 6 wherein determining whether the at least one gesture appears in a defined set of key gestures includes determining whether the at least one gesture appears in the defined set of key gestures which includes: an extension of a thumb upwards, an upward/downward nod of a head, a left/right sweeping of a head, or a movement away of a body or head relative to a viewpoint of a number of images that comprise the video data.

8. The method of claim 7, further comprising: determining, by at least one processor of the processor-based system, whether two or more of the gestures are consistent with one another, and storing an indicator of their consistency in a long term storage repository.

9. The method of claim 1 further comprising: determining, by at least one processor of the processor-based system, a value of a second consistency parameter for the intended aural response; and determining, by at least one processor of the processor-based system, a value of the second consistency parameter for the intended non-aural response.

10. The method of claim 9 further comprising: in response to determining that the value of the second consistency parameter for the intended aural response is the same as the value of the second consistency parameter for the intended non-aural response, storing an indication of the value of the second consistency parameter along with at least a portion of the first and second captured responses in a long term storage repository.

11. The method of claim 1 further comprising: causing a presentation of the second user query to the user.

12. The method of claim 1 wherein extracting the first aural response or extracting the second aural response from the first captured response comprises: providing the first captured response as input to a trained neural network/reinforced learning system taught to ascertain whether a response to the first user query is a positive response or a negative response to the first user query.

13. The method of claim 1, further comprising: generating, by at least one processor of the processor-based system, an intended response based on the intended aural response and the intended non-aural response; and causing, by at least one processor of the processor-based system, a confirmation request to be presented to the user to confirm that the intended response determined by the at least one processor-based system matches the response to the first user query intended by the user.

14. The method of claim 1, further comprising: providing the first and second captured responses or processed data derived therefrom for review by a human.

15. The method of claim 1, further comprising: presenting the first user query to the user by a humanoid robot.

16. The method of claim 15, further comprising: receiving the first captured response via one or more microphones at the humanoid robot; and receiving the second captured response via one or more cameras at the humanoid robot.

17. The method of claim 16 determining, by at least one processor that is part of the humanoid robot, an intended response to the first user query based on the intended aural response and the intended non-aural response.

18. The method of claim 16, further comprising: receiving, by the processor-based system, the first and second captured responses from the humanoid robot, wherein the humanoid robot is remotely located from the processor-b
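Claims 2, 6, and 7 describe two lookups: checking a derived set of words for positive or negative indicators, and checking a detected gesture against a defined set of key gestures. The sketch below illustrates both under stated assumptions. The word lists, the gesture labels, and the polarity assigned to each gesture are hypothetical; the claims name the gestures but do not say which polarity each one carries, and speech recognition and gesture detection are assumed to have already run.

```python
import re

# Hypothetical vocabularies; the claims do not list specific words.
POSITIVE_WORDS = {"yes", "yeah", "sure", "okay"}
NEGATIVE_WORDS = {"no", "nope", "never"}

# Labels for the key gestures named in claim 7. The polarity assigned
# to each label is an assumption based on common usage.
KEY_GESTURES = {
    "thumb_up": "positive",
    "head_nod": "positive",
    "head_shake": "negative",   # left/right sweeping of the head
    "move_away": "negative",    # body or head moving away from the camera
}


def classify_words(transcript: str) -> str:
    """Classify a transcript by the indicator words it contains."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    has_positive = bool(words & POSITIVE_WORDS)
    has_negative = bool(words & NEGATIVE_WORDS)
    if has_positive and not has_negative:
        return "positive"
    if has_negative and not has_positive:
        return "negative"
    # No indicator words, or indicators of both kinds.
    return "neutral"


def classify_gesture(label: str) -> str:
    """Classify a gesture label; anything outside the key set is neutral."""
    return KEY_GESTURES.get(label, "neutral")
```

Matching on whole words rather than substrings avoids false hits such as "no" inside "know".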

Assignees

Sanctuary Cognitive Systems Corp

Inventors

Classifications

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Querying · CPC title

  • using artificial neural networks · CPC title

  • using position of the lips, movement of the lips or face analysis · CPC title

  • for estimating an emotional state · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11848014B2 cover?
Human-machine interfaces may capture interactions by humans with robots (e.g., robots with a humanoid appearance), the interactions taking a variety of forms (e.g., audio, visual), and may determine an intent of the humans or meaning of human responses via analysis of the interactions. Intent can be determined based on analysis of aural response, including meaning or semantics and/or tone. Inte…
Who is the assignee on this patent?
Sanctuary Cognitive Systems Corp
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 19, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).