System and method for processing multi-modal device interactions in a natural language voice services environment

US10553213B2 · US · B2

Patent metadata
Field · Value
Publication number · US-10553213-B2
Application number · US-201815957158-A
Country · US
Kind code · B2
Filing date · Apr 19, 2018
Priority date · Feb 20, 2009
Publication date · Feb 4, 2020
Grant date · Feb 4, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
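
The flow the abstract describes (combine context from a non-voice interaction with a related utterance, determine an intent, route a request to a device) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Interaction` type, the keyword table, the device names, and the keyword-matching approach, which stands in for whatever natural language processing an actual system would use.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One multi-modal interaction: non-voice context plus a spoken utterance."""
    non_voice_context: dict  # hypothetical shape, e.g. {"selected": "Pizza Palace"}
    utterance: str


# Hypothetical keyword-to-intent table (a stand-in for real language understanding).
INTENT_KEYWORDS = {
    "call": "place_call",
    "navigate": "get_directions",
    "play": "play_media",
}

# Hypothetical mapping from intent to the device that should handle the request.
DEVICE_FOR_INTENT = {
    "place_call": "phone",
    "get_directions": "navigation",
    "play_media": "media_player",
}


def determine_intent(interaction):
    """Combine utterance keywords with the non-voice selection into one intent."""
    words = interaction.utterance.lower().split()
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in words:
            target = interaction.non_voice_context.get("selected")
            return {"intent": intent, "target": target}
    return None  # no keyword matched, so no intent could be determined


def route_request(interaction):
    """Return (device, request) for the determined intent, or None."""
    request = determine_intent(interaction)
    if request is None:
        return None
    return DEVICE_FOR_INTENT[request["intent"]], request
```

In this sketch, an utterance such as "Call this place" is ambiguous on its own; the selection supplied by the non-voice interaction is what resolves "this place" to a concrete target.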

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing one or more multi-modal user interactions in a natural language voice services environment that includes one or more electronic devices, the method being implemented by a computer system that includes one or more physical processors executing one or more computer program instructions which, when executed, perform the method, the method comprising: detecting a multi-modal user interaction received via one or more electronic devices, the multi-modal user interaction comprising at least a non-voice input and a natural language utterance, wherein the non-voice input is received from a non-voice input component of the one or more electronic devices, and wherein the natural language utterance is received from a voice input component of the one or more electronic devices and is related to the non-voice input; obtaining an indication of a first time at which the non-voice input was received by the non-voice input component; obtaining an indication of a second time at which the natural language utterance was received by the voice input component; determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time; and responsive to determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time, performing the following steps: determining first context information relating to the non-voice input; determining second context information relating to the natural language utterance; determining an intent of the multi-modal user interaction based on the first context information and the second context information; identifying a transaction lead based on the determined intent; and transmitting the identified transaction lead to a user via the one or more electronic devices.
2. The method of claim 1, wherein the one or more processors, the non-voice input component, and the voice input component are housed within a single electronic device.
3. The method of claim 1, wherein the one or more processors are housed in a first electronic device, the non-voice input component is housed in a second electronic device, and the voice input component is housed in a third electronic device.
4. The method of claim 1, wherein the one or more processors are housed in a first electronic device, and wherein the non-voice input component and the voice input component are housed in a second electronic device.
5. The method of claim 1, wherein the one or more processors and the non-voice input component are housed in a first electronic device, and wherein the voice input component is housed in a second electronic device.
6. The method of claim 1, wherein the one or more processors and the voice input component are housed in a first electronic device, and wherein the non-voice input component is housed in a second electronic device.
7. The method of claim 1, wherein the non-voice input comprises a point of focus input on a display of the non-voice input component.
8. The method of claim 1, wherein the non-voice input comprises a highlighting of text on a display of the non-voice input component.
9. The method of claim 1, the method further comprising: obtaining preference information of a user, wherein the transaction lead is identified based further on the preference information.
10. The method of claim 1, wherein the transaction lead comprises at least one of an advertisement or a recommendation related to the determined intent of the multi-modal user interaction.
11. The method of claim 1, the method further comprising: receiving a further input after the transaction lead was transmitted; determining a second intent of the further input; and providing further information relating to the transaction lead based on the second intent.
12. The method of claim 1, the method further comprising: receiving a further input after the transaction lead was transmitted; determining a second intent of the further input; and completing a purchase transaction in response to receiving the further input based on the determined second intent.
13. The method of claim 12, wherein the further input comprises a second natural language utterance.
14. The method of claim 12, wherein the further input comprises a second non-voice input.
15. The method of claim 1, wherein the non-voice input component comprises a map display, and wherein the transaction lead is presented as a point on the map display.
16. A system of processing one or more multi-modal user interactions in a natural language voice services environment that includes one or more electronic devices, the system comprising: one or more physical processors programmed with one or more computer program instructions which, when executed, cause the one or more physical processors to: detect a multi-modal user interaction received via one or more electronic devices, the multi-modal user interaction comprising at least a non-voice input and a natural language utterance, wherein the non-voice input is received from a non-voice input component of the one or more electronic devices, and wherein the natural language utterance is received from a voice input component of the one or more electronic devices and is related to the non-voice input; obtain an indication of a first time at which the non-voice input was received by the non-voice input component; obtain an indication of a second time at which the natural language utterance was received by the voice input component; determine that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time; and responsive to determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time, perform the following steps: determine first context information relating to the non-voice input; determine second context information relating to the natural language utterance; determine an intent of the multi-modal user interaction based on the first context information and the second context information; identify a transaction lead based on the determined intent; and transmit the identified transaction lead to a user via the one or more electronic devices.
17. The system of claim 16, wherein the one or more processors, the non-voice input component, and the voice input component are housed within a single electronic device.
18. The system of claim 16, wherein the one or more processors are housed in a first electronic device, the non-voice input component is housed in a second electronic device, and the voice input component is housed in a third electronic device.
19. The system of claim 16, wherein the one or more processors are housed in a first electronic device, and wherein the non-voice input component and the voice input component are housed in a second electronic device.
20. The system of claim 16, wherein the one or more processors and the non-voice input component are housed in a first electronic device, and wherein the voice input component is housed in a second electronic device.
21. The system of claim 16, wherein the one or more processors and the voice input component are housed in a first electronic device, and wherein the non-voice input component is housed in a second electronic device.
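
The steps of claim 1 can be walked through with a rough sketch. The claim states only that the two inputs are determined to be related "based on the first time and the second time"; it does not say how. The fixed time window used below is therefore an assumption, as are all names (`NonVoiceInput`, `Utterance`, `MAX_GAP_SECONDS`, `LEAD_TEMPLATES`) and the keyword matching used to stand in for intent determination.

```python
from dataclasses import dataclass

# Assumed threshold: inputs received within this many seconds are treated as related.
MAX_GAP_SECONDS = 5.0

# Hypothetical transaction-lead templates keyed by topic word.
LEAD_TEMPLATES = {
    "restaurant": "Recommendation: restaurants near {place}",
    "hotel": "Recommendation: hotels near {place}",
}


@dataclass
class NonVoiceInput:
    received_at: float  # the "first time", in seconds
    context: dict       # e.g. {"focus": "Seattle"} from a point of focus on a map


@dataclass
class Utterance:
    received_at: float  # the "second time", in seconds
    text: str


def are_related(nv, utt, max_gap=MAX_GAP_SECONDS):
    """Assumed rule: the inputs are related if received within max_gap seconds."""
    return abs(utt.received_at - nv.received_at) <= max_gap


def determine_intent(first_context, second_context):
    """Combine both contexts; returns None if no known topic word was spoken."""
    for topic in LEAD_TEMPLATES:
        if topic in second_context["words"]:
            place = first_context.get("focus", "the selected location")
            return {"topic": topic, "place": place}
    return None


def process(nv, utt):
    """Return a transaction lead string, or None if the inputs are unrelated."""
    if not are_related(nv, utt):
        return None
    first_context = nv.context                            # non-voice context
    second_context = {"words": utt.text.lower().split()}  # utterance context
    intent = determine_intent(first_context, second_context)
    if intent is None:
        return None
    return LEAD_TEMPLATES[intent["topic"]].format(place=intent["place"])
```

For example, a map focus on Seattle received at t = 100.0 s and the utterance "Find a restaurant here" received at t = 102.5 s fall inside the assumed window, so the two are interpreted together. The same utterance arriving at t = 120.0 s would not be.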

Assignees

Inventors

Classifications

  • Execution procedure of a spoken command · CPC title

  • of the speaker; Human-factor methodology · CPC title

  • Advertisements · CPC title

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Interactive procedures; Man-machine interfaces · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10553213B2 cover?
A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of t…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 4, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).