Multi-modal user interface

US11348581B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11348581-B2
Application number	US-201916685946-A
Country	US
Kind code	B2
Filing date	Nov 15, 2019
Priority date	Jul 12, 2019
Publication date	May 31, 2022
Grant date	May 31, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device for multi-modal user input includes a processor configured to process first data received from a first input device. The first data indicates a first input from a user based on a first input mode. The first input corresponds to a command. The processor is configured to send a feedback message to an output device based on processing the first data. The feedback message instructs the user to provide, based on a second input mode that is different from the first input mode, a second input that identifies a command associated with the first input. The processor is configured to receive second data from a second input device, the second data indicating the second input, and to update a mapping to associate the first input to the command identified by the second input.
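The flow described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `MultiModalMapper`, the dictionary used as the mapping, the string labels for inputs and commands, and the `ask_user` callback (which stands in for the feedback message and the second input device) are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


class MultiModalMapper:
    """Toy model of the abstract: learn what an unmapped input means by
    asking the user to identify the command in a different input mode."""

    def __init__(self) -> None:
        # Hypothetical mapping: (input mode, input label) -> command name.
        self.mapping: Dict[Tuple[str, str], str] = {}

    def handle(
        self,
        mode: str,
        first_input: str,
        ask_user: Callable[[str], Tuple[str, str]],
    ) -> Optional[str]:
        key = (mode, first_input)
        if key in self.mapping:
            # The first input is already associated with a command.
            return self.mapping[key]

        # Feedback message: request a second input in a different mode.
        second_mode, command = ask_user(
            f"Unrecognized {mode} input. Please identify the command "
            "using a different input mode."
        )
        if second_mode == mode:
            # The second input must use a mode other than the first.
            return None

        # Update the mapping so the first input alone activates the
        # command on subsequent receipt.
        self.mapping[key] = command
        return command


mapper = MultiModalMapper()
# First encounter: the gesture is unmapped, so the user names it by speech.
print(mapper.handle("gesture", "swipe_up", lambda msg: ("speech", "volume_up")))
# Later encounter: the gesture alone now resolves to the stored command.
print(mapper.handle("gesture", "swipe_up", lambda msg: ("speech", "unused")))
```

Both calls print `volume_up`; the second never consults the callback because the mapping was updated on the first call.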

First claim

Opening claim text (preview).

What is claimed is:
1. A device for multi-modal user input, the device comprising: one or more processors configured to: process first data received from a first input device, the first data indicating a first input from a user based on a first input mode; send a feedback message to an output device based on processing the first data, wherein the feedback message instructs the user to provide, based on a second input mode that is different from the first input mode, a second input that identifies a command associated with the first input; receive second data from a second input device, the second data indicating the second input; identify the command based on the second data; and update a mapping to associate the first input to the command to enable activation of the command in response to subsequent receipt of the first data via the first input device.
2. The device of claim 1, wherein the first input mode is one of a speech mode, a gesture mode, or a video mode, and wherein the second input mode is a different one of the speech mode, the gesture mode, or the video mode.
3. The device of claim 1, wherein the feedback message instructs the user to provide the second input to disambiguate the first input.
4. The device of claim 3, wherein the one or more processors are further configured to send the feedback message in response to a confidence level associated with recognition processing of the first input failing to satisfy a confidence threshold.
5. The device of claim 1, wherein the updated mapping associates a combination of the first input and the second input with the command.
6. The device of claim 1, wherein the one or more processors include a multi-modal recognition engine, the multi-modal recognition engine including: a fusion embedding network configured to combine outputs of a first embedding network associated with the first input mode and a second embedding network associated with the second input mode to generate combined embedding vectors; and a classifier configured to map the combined embedding vectors to particular commands.
7. The device of claim 6, further comprising a memory configured to store: first embedding network data and first weight data corresponding to the user; and second embedding network data and second weight data corresponding to a second user, the first embedding network data differing from the second embedding network data based on input command differences between the user and the second user, and the first weight data differing from the second weight data based on input mode reliability differences between the user and the second user.
8. The device of claim 1, wherein the first input mode corresponds to a video mode, and wherein the one or more processors are configured to send the feedback message in response to an ambient light metric having a value below a lighting threshold.
9. The device of claim 1, wherein the first input mode corresponds to a speech mode, and wherein the one or more processors are configured to send the feedback message in response to a noise metric having a value exceeding a noise threshold.
10. The device of claim 1, further comprising a display configured to represent a graphical user interface.
11. The device of claim 1, further comprising one or more microphones configured to capture audio input that includes one or more keywords or voice commands.
12. The device of claim 1, further comprising one or more cameras configured to capture video input that includes one or more gestures or visual commands.
13. The device of claim 1, further comprising one or more antennas configured to receive data indicative of a gesture input.
14. The device of claim 1, further comprising one or more loudspeakers configured to render or direct the feedback message to the user.
15. The device of claim 1, wherein the user includes a robot or other electronic device.
16. The device of claim 1, wherein the first input device and the output device are incorporated into a virtual reality headset or augmented reality headset.
17. The device of claim 1, wherein the first input device and the output device are incorporated into a vehicle.
18. A method for multi-modal user input, the method comprising: processing, at one or more processors of a device, first data received from a first input device, the first data indicating a first input from a user based on a first input mode; sending, from the one or more processors, a feedback message to an output device based on processing the first data, wherein the feedback message instructs the user to provide, based on a second input mode that is different from the first input mode, a second input that identifies a command associated with the first input; receiving, at the one or more processors, second data from a second input device, the second data indicating the second input; identifying the command based on the second data; and updating, at the one or more processors, a mapping to associate the first input to the command to enable activation of the command in response to subsequent receipt of the first data via the first input device.
19. The method of claim 18, wherein the first input mode is one of a speech mode, a gesture mode, or a video mode, and wherein the second input mode is a different one of the speech mode, the gesture mode, or the video mode.
20. The method of claim 18, wherein the feedback message instructs the user to provide the second input to disambiguate the first input.
21. The method of claim 20, wherein the feedback message is sent in response to a confidence level associated with recognition processing of the first input failing to satisfy a confidence threshold.
22. The method of claim 18, wherein the updated mapping associates a combination of the first input and the second input with the command.
23. The method of claim 18, wherein updating the mapping includes at least one of: updating embedding network data associated with the user; or updating weight data associated with the user.
24. The method of claim 18, wherein the first input mode corresponds to a video mode, and wherein the feedback message is sent in response to an ambient light metric having a value below a lighting threshold.
25. The method of claim 18, wherein the first input mode corresponds to a speech mode, and wherein the feedback message is sent in response to a noise metric having a value exceeding a noise threshold.
26. An apparatus for multi-modal user input, the apparatus comprising: means for processing first data received from a first input device, the first data indicating a first input from a user based on a first input mode; means for sending a feedback message to an output device based on processing the first data, wherein the feedback message instructs the user to provide, based on a second input mode that is different from the first input mode, a second input that identifies a command associated with the first input; means for receiving second data from a second input device, the second data indicating the second input; means for identifying the command based on the second data; and means for updating a mapping to associate the first input to the command to enable activation of the command in response to subsequent receipt of the first data via the first input device.
27. The apparatus of claim 26, wherein the updated mapping associates
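
Claims 4, 8, and 9 (and their method counterparts, claims 21, 24, and 25) name three conditions under which the feedback message is sent. The sketch below shows one way that decision could be expressed. It is an illustration only: the function name `should_request_second_input`, the `Thresholds` container, the numeric values, and the reading of "failing to satisfy" as "falling below" are assumptions; the claims do not specify units or values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # All values are illustrative assumptions; the claims give no numbers.
    confidence: float = 0.8  # minimum acceptable recognition confidence
    lighting: float = 50.0   # minimum ambient light metric for video mode
    noise: float = 70.0      # maximum noise metric for speech mode


def should_request_second_input(
    mode: str,
    confidence: float,
    ambient_light: float,
    noise: float,
    t: Thresholds = Thresholds(),
) -> bool:
    """Return True when a feedback message asking for a second input mode
    would be sent under the conditions named in claims 4, 8, and 9."""
    if confidence < t.confidence:
        # Claim 4: confidence level fails to satisfy the threshold
        # (assumed here to mean it falls below the threshold).
        return True
    if mode == "video" and ambient_light < t.lighting:
        # Claim 8: ambient light metric is below the lighting threshold.
        return True
    if mode == "speech" and noise > t.noise:
        # Claim 9: noise metric exceeds the noise threshold.
        return True
    return False


# A confident speech recognition result in a loud environment still
# triggers a request for a second input mode.
print(should_request_second_input("speech", 0.95, 100.0, 85.0))
# A confident gesture input is unaffected by light or noise in this sketch.
print(should_request_second_input("gesture", 0.95, 10.0, 85.0))
```

Keeping the thresholds in one immutable object mirrors the per-condition structure of the claims and makes it easy to vary them per user or per environment.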

Assignees

Qualcomm Inc

Classifications

G10L15/20
Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title
G10L15/26
Speech to text systems (G10L15/08 takes precedence) · CPC title
G10L15/22 (Primary)
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G06F9/451
Execution arrangements for user interfaces · CPC title
G06F3/0484
for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range · CPC title

Patent family

Related publications grouped by family.

View patent family 74101815

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11348581B2 cover? A device for multi-modal user input includes a processor configured to process first data received from a first input device. The first data indicates a first input from a user based on a first input mode. The first input corresponds to a command. The processor is configured to send a feedback message to an output device based on processing the first data. The feedback message instructs the use…
Who is the assignee on this patent? Qualcomm Inc
What technology area does this patent fall under? Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published? Publication date May 31, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).