Learning intended user actions

US2016239259A1 · US · A1

Patent metadata
Publication number: US-2016239259-A1
Application number: US-201514748296-A
Country: US
Kind code: A1
Filing date: Jun 24, 2015
Priority date: Feb 16, 2015
Publication date: Aug 18, 2016
Grant date: none listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

A method and system are provided. The method includes receiving, by a microphone and camera, user utterances indicative of user commands and associated user gestures for the user utterances. The method further includes parsing, by a hardware-based recognizer, sample utterances and the user utterances into verb parts and noun parts. The method also includes recognizing, by a hardware-based recognizer, the user utterances and the associated user gestures based on the sample utterances and descriptions of associated supporting gestures for the sample utterances. The recognizing step includes comparing the verb parts and the noun parts from the user utterances individually and as pairs to the verb parts and the noun parts of the sample utterances. The method additionally includes selectively performing a given one of the user commands responsive to a recognition result.
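
The matching step in the abstract (verb and noun parts compared individually and as pairs, with the gesture checked against a description of a supporting gesture) can be illustrated with a small Python sketch. Everything here is an assumption for illustration, not the patent's implementation: a tiny word-list parser stands in for the hardware-based recognizer, gestures are plain string labels, each sample's gesture description is modeled as a predicate, and the scores (2 for a verb+noun pair match, 1 for a single-part match) are invented. The names `parse`, `Sample`, `match_score`, and `recognize` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical word lists standing in for a real parser / part-of-speech tagger.
VERBS = {"move", "put", "delete", "open"}
STOPWORDS = {"the", "a", "an", "this", "that", "there", "here", "to"}


@dataclass(frozen=True)
class Parsed:
    verb: Optional[str]
    noun: Optional[str]


def parse(utterance: str) -> Parsed:
    """Split an utterance into its first verb part and first noun part."""
    verb: Optional[str] = None
    noun: Optional[str] = None
    for raw in utterance.lower().split():
        word = raw.strip(".,!?")
        if word in STOPWORDS:
            continue
        if word in VERBS:
            if verb is None:
                verb = word
        elif noun is None:
            noun = word  # any other content word is treated as the noun part
    return Parsed(verb, noun)


@dataclass
class Sample:
    command: str
    parsed: Parsed
    # Stand-in for the "description of an associated supporting gesture".
    gesture_fits: Callable[[str], bool]


def match_score(user: Parsed, sample: Parsed) -> int:
    """Compare verb and noun as a pair (score 2) or individually (score 1)."""
    verb_ok = user.verb is not None and user.verb == sample.verb
    noun_ok = user.noun is not None and user.noun == sample.noun
    if verb_ok and noun_ok:
        return 2
    return 1 if (verb_ok or noun_ok) else 0


def recognize(utterance: str, gesture: str, samples: List[Sample]) -> Optional[str]:
    """Return the best-matching command whose gesture description fits, else None."""
    user = parse(utterance)
    best, best_score = None, 0
    for sample in samples:
        score = match_score(user, sample.parsed)
        if score > best_score and sample.gesture_fits(gesture):
            best, best_score = sample.command, score
    return best


samples = [
    Sample("move_item", parse("move the file"), lambda g: g.startswith("point")),
    Sample("delete_item", parse("delete the file"), lambda g: g == "swipe_left"),
]
print(recognize("Move this file there", "point_at_target", samples))  # move_item
```

A sample whose verb or noun matches is still rejected when the user's gesture does not fit its description, so "delete file" with a pointing gesture does not trigger `delete_item` in this sketch.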

Claims

What is claimed is:

1. A method, comprising receiving, by a microphone and camera, user utterances indicative of user commands and associated user gestures for the user utterances; parsing, by a hardware-based recognizer, sample utterances and the user utterances into verb parts and noun parts; recognizing, by a hardware-based recognizer, the user utterances and the associated user gestures based on the sample utterances and descriptions of associated supporting gestures for the sample utterances, wherein said recognizing step comprises comparing the verb parts and the noun parts from the user utterances individually and as pairs to the verb parts and the noun parts of the sample utterances; and selectively performing a given one of the user commands responsive to a recognition result.

2. The method of claim 1, wherein said recognizing step comprises forming triples of a verb, a noun, and a gesture from the user utterances of the user commands and the associated user gestures for the user utterances.

3. The method of claim 2, wherein said recognizing step comprises: at least one of, comparing at least one of the verb and the noun in a triple to at least one of a verb and a noun from one or more of the sample utterances, and comparing at least one synonym of at least one of the verb and the noun from the one or more of the sample utterances; and determining whether the gesture in the triple fits a description of a corresponding one or more of the associated supporting gestures.

4. The method of claim 2, wherein said recognizing step compares the verb and the noun to the gesture as a pair and individually.

5. The method of claim 4, wherein the given one of the user commands is selectively performed in an absence of one of the verb or the noun corresponding thereto, responsive to a match between an existing one of the verb or the noun and a lack of contrary intent evidence that the existing one of the verb or the noun is unrelated to the gesture.

6. The method of claim 1, further comprising: learning from multiple recognition sessions by acquiring user accepted examples and user rejected examples of the user utterances and the associated user gestures; and selectively performing a given one of the user commands responsive to the user accepted examples and the user rejected examples.

7. The method of claim 6, further comprising generating respective confidence values for at least one of the noun, the verb, the gesture, and a combination thereof including at least the gesture, responsive to at least one of a number of user accepted examples and a number of user rejected examples involving the gesture and at least one of the noun and the verb for a particular one of the user commands.

8. The method of claim 7, wherein said recognizing step comprises recognizing multiple possible intended actions, and the method further comprises arbitrating between the possible intended actions based on the respective confidence values corresponding thereto.

9. The method of claim 6, further comprising generating respective error values for at least one of the noun, the verb, the gesture, and a combination thereof including at least the gesture, responsive to at least one of a number of user accepted examples and a number of user rejected examples involving the gesture and at least one of the noun and the verb for a particular one of the user commands.

10. The method of claim 6, wherein said learning step comprises acquiring at least one of user spoken words and user performed gestures potentially applicable to one or more of the user commands, for storing in a memory device as at least one of new sample utterances and new descriptions of associated sample gestures for the new sample utterances.

11. The method of claim 6, wherein said learning step: acquires a user accepted example of at least one particular user utterance and at least one particular associated user gesture responsive to the user allowing a particular one of the user commands, represented by the at least one particular user utterance and the at least one particular associated user gesture, to be ultimately performed; and acquires a user rejected example of the at least one particular user utterance and the at least one particular associated user gesture responsive to the user preventing or undoing the particular one of the user commands represented by the at least one particular user utterance and the at least one particular associated user gesture.

12. The method of claim 6, wherein said learning step comprises generating statistical data to inform subsequent trials based on whether the user allows the given one of the user commands to proceed or intends to undo the given one of the user commands.

13. The method of claim 6, wherein said learning step comprises learning one or more ways in which the user expresses an intention to perform a particular one of the user commands using a combination of user gestures and deixis.

14. The method of claim 1, wherein the user commands comprise a command for moving content from a first location to a second location in a virtual environment.
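
Claims 6 through 8 and 11 describe learning from user-accepted examples (the user lets the command run) and user-rejected examples (the user prevents or undoes it), generating confidence values for the verb, the noun, the gesture, and their combination, and arbitrating between possible intended actions. The claims do not give a formula, so the Python sketch below assumes a Laplace-smoothed acceptance rate, (accepted + 1) / (accepted + rejected + 2), and arbitrates on the combination value only. The names `IntentLearner`, `record`, `confidences`, `arbitrate`, and `smoothed_rate` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple


def smoothed_rate(accepted: int, rejected: int) -> float:
    """Laplace-smoothed acceptance rate (assumed formula); 0.5 when there is no data."""
    return (accepted + 1) / (accepted + rejected + 2)


class IntentLearner:
    """Counts user-accepted and user-rejected examples for each command.

    Separate counters are kept for the verb, the noun, the gesture, and the
    verb+noun+gesture combination.
    """

    def __init__(self) -> None:
        # key -> [accepted_count, rejected_count]
        self.counts = defaultdict(lambda: [0, 0])

    @staticmethod
    def _keys(command: str, verb: str, noun: str, gesture: str) -> Dict[str, Tuple]:
        return {
            "verb": (command, "verb", verb),
            "noun": (command, "noun", noun),
            "gesture": (command, "gesture", gesture),
            "combo": (command, "combo", verb, noun, gesture),
        }

    def record(self, command: str, verb: str, noun: str, gesture: str, accepted: bool) -> None:
        """accepted=True if the user let the command run; False if it was prevented or undone."""
        for key in self._keys(command, verb, noun, gesture).values():
            self.counts[key][0 if accepted else 1] += 1

    def confidences(self, command: str, verb: str, noun: str, gesture: str) -> Dict[str, float]:
        """One confidence value per part, plus one for the combination."""
        result = {}
        for part, key in self._keys(command, verb, noun, gesture).items():
            accepted, rejected = self.counts.get(key, (0, 0))
            result[part] = smoothed_rate(accepted, rejected)
        return result

    def arbitrate(self, candidates: Sequence[str], verb: str, noun: str, gesture: str) -> Optional[str]:
        """Pick among possible intended commands by their combination confidence."""
        if not candidates:
            return None
        return max(candidates, key=lambda c: self.confidences(c, verb, noun, gesture)["combo"])


learner = IntentLearner()
learner.record("move_item", "move", "file", "point", accepted=True)
learner.record("move_item", "move", "file", "point", accepted=True)
learner.record("delete_item", "move", "file", "point", accepted=False)  # user undid it
print(learner.arbitrate(["move_item", "delete_item"], "move", "file", "point"))  # move_item
```

Smoothing keeps an unseen command at a neutral 0.5 instead of 0 or 1, so a single accepted or rejected example does not settle the arbitration on its own. The per-part values are computed but unused by `arbitrate` here; a fuller sketch might weight them when the combination has little data.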

Assignees

IBM

Inventors

Classifications

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • G06F3/167 (primary) · Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • Speech recognition using non-acoustical features · CPC title

  • Gesture based interaction, e.g. based on a set of recognized hand gestures (interaction based on gestures traced on a digitiser G06F3/04883) · CPC title

  • Parsing for meaning understanding · CPC title

Frequently asked questions

What does patent US2016239259A1 cover?
A method and system for recognizing user commands from speech combined with gestures. User utterances and associated gestures are received by a microphone and camera; the utterances are parsed into verb and noun parts, which are compared individually and as pairs against stored sample utterances, and the gestures are checked against descriptions of supporting gestures. A command is then selectively performed based on the recognition result. Dependent claims add learning from user-accepted and user-rejected examples.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
The primary CPC classification is G06F3/167 (audio in a user interface, e.g. voice commands), which falls under CPC section G (Physics).
When was this patent published?
Published on August 18, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).