US9298287B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9298287-B2 |
| Application number | US-201113077368-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 31, 2011 |
| Priority date | Mar 31, 2011 |
| Publication date | Mar 29, 2016 |
| Grant date | Mar 29, 2016 |
Official abstract text for this publication.
A user interaction activation may be provided. A plurality of signals received from a user may be evaluated to determine whether the plurality of signals are associated with a visual display. If so, the plurality of signals may be translated into an agent action and a context associated with the visual display may be retrieved. The agent action may be performed according to the retrieved context and a result associated with the performed agent action may be displayed to the user.
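The flow in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the `Signal` type, its fields, the `handle_signals` function, and the rule that every signal must target the display are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signal:
    kind: str              # hypothetical label: "speech" or "gesture"
    payload: str           # recognized phrase or gesture name
    targets_display: bool  # whether the signal was directed at the visual display


def handle_signals(signals: list, display_context: dict) -> Optional[str]:
    """Return a result string, or None if the signals are not tied to the display."""
    # Evaluate whether the signals are associated with the visual display.
    # Assumption: all of them must be; the abstract does not fix this rule.
    if not signals or not all(s.targets_display for s in signals):
        return None
    # Translate the signals into an agent action (here, simply the spoken words).
    action = " ".join(s.payload for s in signals if s.kind == "speech")
    # Retrieve the context associated with the display, then perform the action
    # according to that context and return the result to be displayed.
    title = display_context.get("title", "unknown")
    return f"{action} -> {title}"
```

Calling `handle_signals` with a speech signal and a gesture that both target the display yields a result string; signals that do not target the display are ignored and yield `None`.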
Opening claim text (preview).
What is claimed is:
1. A system for providing a user-interaction activation, the system comprising: a display configured to display a multimedia presentation, the multimedia presentation comprising a series of frames that are displayed sequentially, each frame having a plurality of spatial areas, each spatial area having a context that comprises information relating to the multimedia presentation; at least one camera operatively coupled to the display; at least one microphone; a memory storage; and a processing unit coupled to the memory storage, the camera and the microphone, wherein the processing unit is operative to: receive, at the at least one microphone, a speech signal; analyze the speech signal to identify a request relating to the multimedia presentation, the request relating to a frame; receive, at the at least one camera, a gesture identifying a spatial area of the frame; create a query based on the request, the identified spatial area, and the context associated with the spatial area; perform the query; and display a result based on the performed query.
2. The method of claim 1, wherein the speech signal comprises a spoken phrase.
3. The method of claim 1, wherein the multimedia presentation comprises at least [one of the following: a still image displayed to the user and] a video image displayed to the user.
4. The method of claim 1, wherein the gesture and the speech signal are received from the user contemporaneously.
5. The method of claim 1, wherein the display comprises an image captured by a recording device associated with the user.
6. The method of claim 1, wherein the gesture comprises an activation gesture.
7. The method of claim 1, wherein the gesture is identified via a plurality of cameras.
8. The method of claim 1, wherein perform the query comprises querying a database for the result.
9. The method of claim 1, wherein the context associated with the multimedia presentation comprises background information about the multimedia presentation.
10. The method of claim 1, further comprising: receive, at the at least one microphone, a second speech signal; analyze the second speech signal to identify a second request relating to the multimedia presentation, the second request relating to the frame; receive, at the at least one camera, a second gesture identifying a second spatial area of the frame; create a second query based on the second request, the identified second spatial area, and the context associated with the second spatial area; perform the second query; and display a second result based on the performed query.
11. A system for providing user-interaction activation, including: at least one processor; and a memory operatively coupled to the at least one processor and including instructions that, when executed by the at least one processor, cause the at least one processor to perform a method, the method comprising: receiving, at a microphone, a speech signal; analyzing the speech signal to identify a request relating to a multimedia presentation displayed on a display, wherein the multimedia presentation comprises a series of frames, each frame having a plurality of spatial areas, each spatial area having different information, wherein the frames are displayed sequentially and wherein each frame in the series of frames is associated with a context that comprises informational content of the multimedia presentation; wherein the request relates to a frame of the multimedia presentation; receiving, at a camera, a gesture identifying a spatial area of the frame; creating a query based on the request, the identified spatial area, and the context associated with the frame; performing the query; and providing a result based on the performed query.
12. A device comprising: a display for displaying a multimedia presentation, the multimedia presentation comprising a series of frames, each frame having a plurality of spatial areas, each spatial area having different information, wherein the frames are displayed sequentially, and wherein each frame in the series of frames is associated with a context that comprises informational content of the multimedia presentation; a microphone for capturing one or more speech signals; a motion detector for detecting one or more user gestures, the motion detector operatively connected to the display; a programmable circuit operatively connected to the display, the microphone, and the motion detector, the programmable circuit configured to execute program instructions which, when executed, cause the device to: receive, at the microphone, a speech signal; analyze the speech signal to identify a request relating to the multimedia presentation, the request relating to a frame; receive, at the motion detector, a user gesture identifying a spatial area of the frame; create a query based on the request, the identified spatial area, and the context associated with the frame; send the query to a server; receive, from the server, a result of the query; and display the result.
13. The device of claim 12, wherein the multimedia presentation comprises at least [one of the following: a still image, a photo, and] a video.
14. The device of claim 12, wherein the device receives the speech signal and the user gesture contemporaneously.
15. The device of claim 12, wherein an image captured by a recording device associated with the user is displayed on the display of the device.
16. The device of claim 12, wherein one of the detected user gestures comprises an activation gesture.
17. The device of claim 12, further comprises a second motion detector operatively connected to the programmable circuit.
18. The device of claim 12, wherein the motion detector comprises at least one of: a video camera and a still camera.
19. The device of claim 12, wherein the context of the multimedia presentation further comprises background information about the multimedia presentation displayed on the device.
20. The device of claim 19, wherein the result is received, over the network from a database.
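The query-creation step recited in claim 1 (request plus gesture-identified spatial area plus that area's context) might be modeled roughly as follows. This is an illustrative sketch only: the rectangular `SpatialArea`, the normalized coordinates, the `Frame.area_at` lookup, and the dictionary-shaped query are hypothetical choices that the claims do not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpatialArea:
    # Assumption: areas are axis-aligned rectangles in normalized [0, 1) coordinates.
    x0: float
    y0: float
    x1: float
    y1: float
    context: str  # information relating to the presentation for this area

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


@dataclass
class Frame:
    areas: list  # the plurality of spatial areas in this frame

    def area_at(self, x: float, y: float) -> Optional[SpatialArea]:
        # Return the first area containing the gesture point, if any.
        for area in self.areas:
            if area.contains(x, y):
                return area
        return None


def create_query(request: str, frame: Frame, point: tuple) -> Optional[dict]:
    """Combine the spoken request, the gestured area, and that area's context."""
    area = frame.area_at(*point)
    if area is None:
        return None  # the gesture did not identify any spatial area
    return {
        "request": request,
        "area": (area.x0, area.y0, area.x1, area.y1),
        "context": area.context,
    }
```

For example, with a frame split into a left and a right area, a gesture at `(0.7, 0.4)` combined with the request "who is that" would produce a query carrying the right-hand area's context. Performing the query (locally, or by sending it to a server as in claim 12) is left out of the sketch.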
Gesture based interaction, e.g. based on a set of recognized hand gestures (interaction based on gestures traced on a digitiser G06F3/04883) · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer · CPC title
Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry · CPC title
using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser · CPC title