Interactive robot and human-robot interaction method
US-2019043511-A1 · Feb 7, 2019 · US
US11848014B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11848014-B2 |
| Application number | US-202016924879-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 9, 2020 |
| Priority date | Jul 11, 2019 |
| Publication date | Dec 19, 2023 |
| Grant date | Dec 19, 2023 |
Official abstract text for this publication.
Human-machine interfaces may capture interactions by humans with robots (e.g., robots with a humanoid appearance), the interactions taking a variety of forms (e.g., audio, visual), and may determine an intent of the humans or meaning of human responses via analysis of the interactions. Intent can be determined based on analysis of aural response, including meaning or semantics and/or tone. Intent can be determined based on analysis of visually detectable responses, including head motions, facial gestures, hand or arm gestures, eye gestures. Responses may be compared for consistency. Humans may be queried to confirm determined intended response.
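As an illustration only, the consistency comparison and confirmation query described in the abstract might be sketched as follows. The names `Polarity`, `Interpretation`, and `interpret`, and the wording of the follow-up queries, are hypothetical and do not come from the patent. The sketch assumes each channel has already been reduced to a single positive, negative, or neutral reading.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Polarity(Enum):
    """Hypothetical three-way reading of one response channel."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass
class Interpretation:
    intended: Optional[Polarity]  # None when the two channels disagree
    follow_up: str                # query to present to the human


def interpret(aural: Polarity, visual: Polarity) -> Interpretation:
    """Compare the aural and visual readings and choose a follow-up query."""
    if aural == visual:
        # Channels agree: ask the human to confirm the determined response.
        return Interpretation(
            aural,
            f"I understood your answer as {aural.value}. Is that right?",
        )
    # Channels disagree: build a second query that mentions both readings.
    return Interpretation(
        None,
        f"Your words seemed {aural.value} but your gestures seemed "
        f"{visual.value}. Which did you mean?",
    )
```

When the two readings differ, the sketch reports no intended response and asks the human to resolve the conflict. This is one possible reading of "queried to confirm determined intended response", not the only one.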
Opening claim text (preview).
The invention claimed is:
1. A method of operation in a processor-based system to facilitate machine-human interaction between a user and the processor-based system, the method comprising: accessing, by at least one processor of the processor-based system, a first captured response to a first user query; accessing, by at least one processor of the processor-based system, a second captured response to the first user query; extracting, by at least one processor of the processor-based system, a first aural response from the first captured response; extracting, by at least one processor of the processor-based system, a second aural response from the first captured response; determining, by at least one processor of the processor-based system, an intended aural response based on the first aural response and the second aural response; extracting, by at least one processor of the processor-based system, one or more non-aural responses from the second captured response, wherein extracting the one or more non-aural responses comprises applying the second captured response to an input of a first trained machine learning model trained to extract the one or more non-aural responses from the second captured response; determining, by at least one processor of the processor-based system, an intended non-aural response based on the one or more non-aural responses; determining, by at least one processor of the processor-based system, a value of a first consistency parameter for the intended aural response; determining, by at least one processor of the processor-based system, a value of the first consistency parameter for the intended non-aural response; and in response to determining that the value of the first consistency parameter for the intended aural response is different from the value of the first consistency parameter for the intended non-aural response, generating a second user query based at least in part on the intended aural response, the intended non-aural response, and the values of the first consistency parameter.
2. The method of claim 1 wherein the first captured response comprises audio data, and wherein extracting, by at least one processor of the processor-based system, the first aural response from the first captured response comprises: deriving a set of words from the audio data; and determining whether the set of words includes any words that indicate a positive response to the first user query or any words that indicate a negative response to the first user query.
3. The method of claim 2 wherein extracting, by at least one processor of the processor-based system, the second aural response from the first captured response comprises: deriving a tone of voice from the audio data; and determining whether the tone of voice indicates a positive response to the first user query, indicates a negative response to the first user query, or indicates neither a positive nor a negative response to the first user query.
4. The method of claim 3, further comprising: determining, by at least one processor of the processor-based system, whether the words derived from the audio data and the tone of voice derived from the audio data are consistent with one another, and storing an indicator of their consistency in a long term storage repository.
5. The method of claim 1 wherein the second captured response comprises video data, and wherein the trained machine learning model is trained to perform operations comprising: deriving at least one gesture from the video data; and determining whether the at least one gesture indicates a positive response to the first user query, indicates a negative response to the first user query, or indicates neither a positive nor a negative response to the first user query.
6. The method of claim 5 wherein determining whether the at least one gesture indicates a positive response to the first user query, indicates a negative response to the first user query, or indicates neither a positive nor a negative response to the first user query includes determining whether the at least one gesture appears in a defined set of key gestures.
7. The method of claim 6 wherein determining whether the at least one gesture appears in a defined set of key gestures includes determining whether the at least one gesture appears in the defined set of key gestures which includes: an extension of a thumb upwards, an upward/downward nod of a head, a left/right sweeping of a head, or a movement away of a body or head relative to a viewpoint of a number of images that comprise the video data.
8. The method of claim 7, further comprising: determining, by at least one processor of the processor-based system, whether two or more of the gestures are consistent with one another, and storing an indicator of their consistency in a long term storage repository.
9. The method of claim 1 further comprising: determining, by at least one processor of the processor-based system, a value of a second consistency parameter for the intended aural response; and determining, by at least one processor of the processor-based system, a value of the second consistency parameter for the intended non-aural response.
10. The method of claim 9 further comprising: in response to determining that the value of the second consistency parameter for the intended aural response is the same as the value of the second consistency parameter for the intended non-aural response, storing an indication of the value of the second consistency parameter along with at least a portion of the first and second captured responses in a long term storage repository.
11. The method of claim 1 further comprising: causing a presentation of the second user query to the user.
12. The method of claim 1 wherein extracting the first aural response or extracting the second aural response from the first captured response comprises: providing the first captured response as input to a trained neural network/reinforced learning system taught to ascertain whether a response to the first user query is a positive response or a negative response to the first user query.
13. The method of claim 1, further comprising: generating, by at least one processor of the processor-based system, an intended response based on the intended aural response and the intended non-aural response; and causing, by at least one processor of the processor-based system, a confirmation request to be presented to the user to confirm that the intended response determined by the at least one processor-based system matches the response to the first user query intended by the user.
14. The method of claim 1, further comprising: providing the first and second captured responses or processed data derived therefrom for review by a human.
15. The method of claim 1, further comprising: presenting the first user query to the user by a humanoid robot.
16. The method of claim 15, further comprising: receiving the first captured response via one or more microphones at the humanoid robot; and receiving the second captured response via one or more cameras at the humanoid robot.
17. The method of claim 16 determining, by at least one processor that is part of the humanoid robot, an intended response to the first user query based on the intended aural response and the intended non-aural response.
18. The method of claim 16, further comprising: receiving, by the processor-based system, the first and second captured responses from the humanoid robot, wherein the humanoid robot is remotely located from the processor-b
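As a rough sketch of the classification steps recited in claims 2, 6, and 7, the word check and key-gesture lookup could look like the following. The word lists, gesture labels, polarity assigned to each gesture, and function names are all assumptions made for illustration. The claims list the key gestures but do not say which polarity each one carries, and they do not specify how mixed signals are resolved. The sketch also assumes that speech has already been transcribed and that gestures have already been labelled by an upstream model.

```python
import re
from typing import Iterable

# Hypothetical vocabularies; the patent does not enumerate specific words.
POSITIVE_WORDS = {"yes", "yeah", "sure", "okay"}
NEGATIVE_WORDS = {"no", "nope", "never"}

# Key gestures named in claim 7; the polarity of each one is an assumption.
KEY_GESTURES = {
    "thumb_up": "positive",    # extension of a thumb upwards
    "head_nod": "positive",    # upward/downward nod of a head
    "head_sweep": "negative",  # left/right sweeping of a head
    "move_away": "negative",   # movement away of a body or head
}


def classify_words(transcript: str) -> str:
    """Return 'positive', 'negative', or 'neither' for a transcript."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    has_pos = bool(words & POSITIVE_WORDS)
    has_neg = bool(words & NEGATIVE_WORDS)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return "neither"  # no cue words, or conflicting cue words


def classify_gestures(labels: Iterable[str]) -> str:
    """Return the shared polarity of the key gestures found, if any."""
    polarities = {KEY_GESTURES[g] for g in labels if g in KEY_GESTURES}
    if polarities == {"positive"}:
        return "positive"
    if polarities == {"negative"}:
        return "negative"
    return "neither"  # no key gesture seen, or key gestures conflict
```

Returning "neither" for conflicting cues is a design choice of this sketch. Claims 4 and 8 instead describe storing an indicator of consistency, which a fuller implementation would need to record separately.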
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Querying · CPC title
using artificial neural networks · CPC title
using position of the lips, movement of the lips or face analysis · CPC title
for estimating an emotional state · CPC title