Multi-modal user interface

US11348581B2 · US · B2

Patent metadata
Publication number: US-11348581-B2
Application number: US-201916685946-A
Country: US
Kind code: B2
Filing date: Nov 15, 2019
Priority date: Jul 12, 2019
Publication date: May 31, 2022
Grant date: May 31, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device for multi-modal user input includes a processor configured to process first data received from a first input device. The first data indicates a first input from a user based on a first input mode. The first input corresponds to a command. The processor is configured to send a feedback message to an output device based on processing the first data. The feedback message instructs the user to provide, based on a second input mode that is different from the first input mode, a second input that identifies a command associated with the first input. The processor is configured to receive second data from a second input device, the second data indicating the second input, and to update a mapping to associate the first input to the command identified by the second input.
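The flow in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the class `MultiModalMapper`, the `handle` method, the callback names, and the choice to prompt only when no mapping exists yet are all hypothetical, and the second input is assumed to arrive already decoded as a command name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class MultiModalMapper:
    """Toy mapping from (input mode, raw input) to a command name."""

    # Hypothetical storage; the abstract only speaks of "a mapping".
    mapping: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def handle(
        self,
        first_mode: str,
        first_input: str,
        send_feedback: Callable[[str], None],
        read_second_input: Callable[[], str],
        second_mode: str,
    ) -> str:
        """Return the command for first_input, learning it if unknown."""
        key = (first_mode, first_input)
        if key in self.mapping:
            # Known input: activate the mapped command directly.
            return self.mapping[key]
        if second_mode == first_mode:
            raise ValueError("second input mode must differ from the first")
        # Assumption: feedback is sent when the first input has no mapping.
        send_feedback(f"Please identify the command using {second_mode} input.")
        # Assumption: the second input directly names the command.
        command = read_second_input()
        # Update the mapping so the first input triggers the command next time.
        self.mapping[key] = command
        return command


messages = []
mapper = MultiModalMapper()
first = mapper.handle("gesture", "swipe_up", messages.append,
                      lambda: "volume_up", second_mode="speech")
second = mapper.handle("gesture", "swipe_up", messages.append,
                       lambda: "unused", second_mode="speech")
```

In this sketch the first call prompts the user once and stores the association; the second call resolves the same gesture from the mapping without sending another feedback message.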

First claim

Opening claim text (preview).

What is claimed is:

1. A device for multi-modal user input, the device comprising: one or more processors configured to: process first data received from a first input device, the first data indicating a first input from a user based on a first input mode; send a feedback message to an output device based on processing the first data, wherein the feedback message instructs the user to provide, based on a second input mode that is different from the first input mode, a second input that identifies a command associated with the first input; receive second data from a second input device, the second data indicating the second input; identify the command based on the second data; and update a mapping to associate the first input to the command to enable activation of the command in response to subsequent receipt of the first data via the first input device.

2. The device of claim 1, wherein the first input mode is one of a speech mode, a gesture mode, or a video mode, and wherein the second input mode is a different one of the speech mode, the gesture mode, or the video mode.

3. The device of claim 1, wherein the feedback message instructs the user to provide the second input to disambiguate the first input.

4. The device of claim 3, wherein the one or more processors are further configured to send the feedback message in response to a confidence level associated with recognition processing of the first input failing to satisfy a confidence threshold.

5. The device of claim 1, wherein the updated mapping associates a combination of the first input and the second input with the command.

6. The device of claim 1, wherein the one or more processors include a multi-modal recognition engine, the multi-modal recognition engine including: a fusion embedding network configured to combine outputs of a first embedding network associated with the first input mode and a second embedding network associated with the second input mode to generate combined embedding vectors; and a classifier configured to map the combined embedding vectors to particular commands.

7. The device of claim 6, further comprising a memory configured to store: first embedding network data and first weight data corresponding to the user; and second embedding network data and second weight data corresponding to a second user, the first embedding network data differing from the second embedding network data based on input command differences between the user and the second user, and the first weight data differing from the second weight data based on input mode reliability differences between the user and the second user.

8. The device of claim 1, wherein the first input mode corresponds to a video mode, and wherein the one or more processors are configured to send the feedback message in response to an ambient light metric having a value below a lighting threshold.

9. The device of claim 1, wherein the first input mode corresponds to a speech mode, and wherein the one or more processors are configured to send the feedback message in response to a noise metric having a value exceeding a noise threshold.

10. The device of claim 1, further comprising a display configured to represent a graphical user interface.

11. The device of claim 1, further comprising one or more microphones configured to capture audio input that includes one or more keywords or voice commands.

12. The device of claim 1, further comprising one or more cameras configured to capture video input that includes one or more gestures or visual commands.

13. The device of claim 1, further comprising one or more antennas configured to receive data indicative of a gesture input.

14. The device of claim 1, further comprising one or more loudspeakers configured to render or direct the feedback message to the user.

15. The device of claim 1, wherein the user includes a robot or other electronic device.

16. The device of claim 1, wherein the first input device and the output device are incorporated into a virtual reality headset or augmented reality headset.

17. The device of claim 1, wherein the first input device and the output device are incorporated into a vehicle.

18. A method for multi-modal user input, the method comprising: processing, at one or more processors of a device, first data received from a first input device, the first data indicating a first input from a user based on a first input mode; sending, from the one or more processors, a feedback message to an output device based on processing the first data, wherein the feedback message instructs the user to provide, based on a second input mode that is different from the first input mode, a second input that identifies a command associated with the first input; receiving, at the one or more processors, second data from a second input device, the second data indicating the second input; identifying the command based on the second data; and updating, at the one or more processors, a mapping to associate the first input to the command to enable activation of the command in response to subsequent receipt of the first data via the first input device.

19. The method of claim 18, wherein the first input mode is one of a speech mode, a gesture mode, or a video mode, and wherein the second input mode is a different one of the speech mode, the gesture mode, or the video mode.

20. The method of claim 18, wherein the feedback message instructs the user to provide the second input to disambiguate the first input.

21. The method of claim 20, wherein the feedback message is sent in response to a confidence level associated with recognition processing of the first input failing to satisfy a confidence threshold.

22. The method of claim 18, wherein the updated mapping associates a combination of the first input and the second input with the command.

23. The method of claim 18, wherein updating the mapping includes at least one of: updating embedding network data associated with the user; or updating weight data associated with the user.

24. The method of claim 18, wherein the first input mode corresponds to a video mode, and wherein the feedback message is sent in response to an ambient light metric having a value below a lighting threshold.

25. The method of claim 18, wherein the first input mode corresponds to a speech mode, and wherein the feedback message is sent in response to a noise metric having a value exceeding a noise threshold.

26. An apparatus for multi-modal user input, the apparatus comprising: means for processing first data received from a first input device, the first data indicating a first input from a user based on a first input mode; means for sending a feedback message to an output device based on processing the first data, wherein the feedback message instructs the user to provide, based on a second input mode that is different from the first input mode, a second input that identifies a command associated with the first input; means for receiving second data from a second input device, the second data indicating the second input; means for identifying the command based on the second data; and means for updating a mapping to associate the first input to the command to enable activation of the command in response to subsequent receipt of the first data via the first input device.

27. The apparatus of claim 26, wherein the updated mapping associates
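The conditions in claims 4, 8, and 9 (and their method counterparts, claims 21, 24, and 25) for sending the feedback message can be illustrated with a small Python check. This is a hedged sketch only: the names `Thresholds` and `needs_second_input`, the numeric defaults, and the reading of "failing to satisfy a confidence threshold" as "confidence below the threshold" are assumptions for illustration; the claims give no values or units.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    """Illustrative limits only; the claims do not specify numbers."""

    confidence: float = 0.8      # minimum acceptable recognition confidence
    ambient_light: float = 50.0  # minimum light level for video input
    noise: float = 70.0          # maximum noise level for speech input


def needs_second_input(
    mode: str,
    confidence: float,
    ambient_light: Optional[float] = None,
    noise: Optional[float] = None,
    thresholds: Thresholds = Thresholds(),
) -> bool:
    """Decide whether to ask the user for input in a different mode."""
    # Claim 4: recognition confidence fails to satisfy the threshold.
    if confidence < thresholds.confidence:
        return True
    # Claim 8: video mode with ambient light below the lighting threshold.
    if (mode == "video" and ambient_light is not None
            and ambient_light < thresholds.ambient_light):
        return True
    # Claim 9: speech mode with noise exceeding the noise threshold.
    if mode == "speech" and noise is not None and noise > thresholds.noise:
        return True
    return False


low_confidence = needs_second_input("speech", confidence=0.5)
noisy_speech = needs_second_input("speech", confidence=0.9, noise=80.0)
quiet_speech = needs_second_input("speech", confidence=0.9, noise=60.0)
dark_video = needs_second_input("video", confidence=0.9, ambient_light=10.0)
```

The environmental checks are tied to the mode they affect, so a dark room triggers the prompt for video input but not for speech, mirroring how claims 8 and 9 each name a specific first input mode.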

Assignees

Qualcomm Inc

Inventors

Classifications

  • Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Execution arrangements for user interfaces · CPC title

  • for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11348581B2 cover?
A device for multi-modal user input includes a processor configured to process first data received from a first input device. The first data indicates a first input from a user based on a first input mode. The first input corresponds to a command. The processor is configured to send a feedback message to an output device based on processing the first data. The feedback message instructs the use…
Who is the assignee on this patent?
Qualcomm Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date May 31, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).