US10553213B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10553213-B2 |
| Application number | US-201815957158-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 19, 2018 |
| Priority date | Feb 20, 2009 |
| Publication date | Feb 4, 2020 |
| Grant date | Feb 4, 2020 |
Official abstract text for this publication.
A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
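The flow in the abstract (extract context from each input, combine it into an intent, route a request to a device) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `NonVoiceInteraction` and `Utterance` types, the keyword rules in `determine_intent`, the `ROUTES` table, and the device names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class NonVoiceInteraction:
    device: str   # device that received the interaction (hypothetical field)
    target: str   # what the user selected or pointed at, e.g. a map location


@dataclass
class Utterance:
    text: str     # recognized words of the natural language utterance


def determine_intent(interaction: NonVoiceInteraction, utterance: Utterance) -> Dict[str, str]:
    """Combine context from both inputs into one intent (toy keyword rules)."""
    words = utterance.text.lower().split()
    if "navigate" in words or "directions" in words:
        action = "navigate"
    elif "play" in words:
        action = "play"
    else:
        action = "unknown"
    # The non-voice context resolves what the utterance refers to ("here", "this").
    return {"action": action, "object": interaction.target}


# Hypothetical routing table: which device handles which action.
ROUTES = {"navigate": "navigation_device", "play": "media_player"}


def route_request(intent: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Pick a destination device for the request based on the determined intent."""
    device = ROUTES.get(intent["action"], "voice_assistant")
    return device, intent


# Usage: a tap on a map location combined with a spoken request.
tap = NonVoiceInteraction(device="touchscreen", target="selected map location")
device, request = route_request(determine_intent(tap, Utterance("Get directions here")))
print(device, request)
```

The point of the sketch is only that neither input is sufficient alone: the utterance supplies the action, and the non-voice interaction supplies the object the utterance refers to.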
Opening claim text (preview).
What is claimed is:

1. A method for processing one or more multi-modal user interactions in a natural language voice services environment that includes one or more electronic devices, the method being implemented by a computer system that includes one or more physical processors executing one or more computer program instructions which, when executed, perform the method, the method comprising: detecting a multi-modal user interaction received via one or more electronic devices, the multi-modal user interaction comprising at least a non-voice input and a natural language utterance, wherein the non-voice input is received from a non-voice input component of the one or more electronic devices, and wherein the natural language utterance is received from a voice input component of the one or more electronic devices and is related to the non-voice input; obtaining an indication of a first time at which the non-voice input was received by the non-voice input component; obtaining an indication of a second time at which the natural language utterance was received by the voice input component; determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time; and responsive to determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time, performing the following steps: determining first context information relating to the non-voice input; determining second context information relating to the natural language utterance; determining an intent of the multi-modal user interaction based on the first context information and the second context information; identifying a transaction lead based on the determined intent; and transmitting the identified transaction lead to a user via the one or more electronic devices.

2. The method of claim 1, wherein the one or more processors, the non-voice input component, and the voice input component are housed within a single electronic device.

3. The method of claim 1, wherein the one or more processors are housed in a first electronic device, the non-voice input component is housed in a second electronic device, and the voice input component is housed in a third electronic device.

4. The method of claim 1, wherein the one or more processors are housed in a first electronic device, and wherein the non-voice input component and the voice input component are housed in a second electronic device.

5. The method of claim 1, wherein the one or more processors and the non-voice input component are housed in a first electronic device, and wherein the voice input component is housed in a second electronic device.

6. The method of claim 1, wherein the one or more processors and the voice input component are housed in a first electronic device, and wherein the non-voice input component is housed in a second electronic device.

7. The method of claim 1, wherein the non-voice input comprises a point of focus input on a display of the non-voice input component.

8. The method of claim 1, wherein the non-voice input comprises a highlighting of text on a display of the non-voice input component.

9. The method of claim 1, the method further comprising: obtaining preference information of a user, wherein the transaction lead is identified based further on the preference information.

10. The method of claim 1, wherein the transaction lead comprises at least one of an advertisement or a recommendation related to the determined intent of the multi-modal user interaction.

11. The method of claim 1, the method further comprising: receiving a further input after the transaction lead was transmitted; determining a second intent of the further input; and providing further information relating to the transaction lead based on the second intent.

12. The method of claim 1, the method further comprising: receiving a further input after the transaction lead was transmitted; determining a second intent of the further input; and completing a purchase transaction in response to receiving the further input based on the determined second intent.

13. The method of claim 12, wherein the further input comprises a second natural language utterance.

14. The method of claim 12, wherein the further input comprises a second non-voice input.

15. The method of claim 1, wherein the non-voice input component comprises a map display, and wherein the transaction lead is presented as a point on the map display.

16. A system of processing one or more multi-modal user interactions in a natural language voice services environment that includes one or more electronic devices, the system comprising: one or more physical processors programmed with one or more computer program instructions which, when executed, cause the one or more physical processors to: detect a multi-modal user interaction received via one or more electronic devices, the multi-modal user interaction comprising at least a non-voice input and a natural language utterance, wherein the non-voice input is received from a non-voice input component of the one or more electronic devices, and wherein the natural language utterance is received from a voice input component of the one or more electronic devices and is related to the non-voice input; obtain an indication of a first time at which the non-voice input was received by the non-voice input component; obtain an indication of a second time at which the natural language utterance was received by the voice input component; determine that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time; and responsive to determining that the non-voice input and the natural language utterance are related and are to be interpreted together based on the first time and the second time, perform the following steps: determine first context information relating to the non-voice input; determine second context information relating to the natural language utterance; determine an intent of the multi-modal user interaction based on the first context information and the second context information; identify a transaction lead based on the determined intent; and transmit the identified transaction lead to a user via the one or more electronic devices.

17. The system of claim 16, wherein the one or more processors, the non-voice input component, and the voice input component are housed within a single electronic device.

18. The system of claim 16, wherein the one or more processors are housed in a first electronic device, the non-voice input component is housed in a second electronic device, and the voice input component is housed in a third electronic device.

19. The system of claim 16, wherein the one or more processors are housed in a first electronic device, and wherein the non-voice input component and the voice input component are housed in a second electronic device.

20. The system of claim 16, wherein the one or more processors and the non-voice input component are housed in a first electronic device, and wherein the voice input component is housed in a second electronic device.

21. The system of claim 16, wherein the one or more processors and the voice input component are housed in a first electronic device, and wherein the non-voice input component is housed in a second electronic device.
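The steps recited in claims 1 and 16 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the claims say only that relatedness is determined "based on the first time and the second time", so the fixed `MAX_GAP_SECONDS` window is an assumed rule, and the `LEADS` catalog, the input kinds, and the keyword-based intent rules are hypothetical.

```python
from typing import Optional

# Assumed threshold; the claims do not specify how the two times are compared.
MAX_GAP_SECONDS = 5.0


def are_related(first_time: float, second_time: float,
                max_gap: float = MAX_GAP_SECONDS) -> bool:
    """Treat the two inputs as one interaction if they arrived close together."""
    return abs(second_time - first_time) <= max_gap


# Hypothetical catalog mapping an intent to a transaction lead.
LEADS = {
    "find_restaurant": "Recommendation: restaurants near the selected point",
    "buy_music": "Advertisement: offer for the selected song",
}


def process_interaction(non_voice_kind: str, first_time: float,
                        utterance: str, second_time: float) -> Optional[str]:
    """Return a transaction lead, or None if the inputs are unrelated or nothing matches."""
    if not are_related(first_time, second_time):
        return None  # the inputs would be interpreted separately (not shown)

    first_context = {"kind": non_voice_kind}                # from the non-voice input
    second_context = {"words": utterance.lower().split()}   # from the utterance

    # Toy rule: the intent depends on both contexts together.
    if first_context["kind"] == "map_point" and "eat" in second_context["words"]:
        intent = "find_restaurant"
    elif first_context["kind"] == "song_title" and "buy" in second_context["words"]:
        intent = "buy_music"
    else:
        intent = "unknown"

    return LEADS.get(intent)  # None when no lead exists for the intent


# Usage: a map tap at t=10.0 s followed by speech at t=12.0 s.
print(process_interaction("map_point", 10.0, "Where can I eat near here", 12.0))
```

A real system could use a different relatedness test (for example, overlapping input intervals); the fixed window is simply the smallest rule that depends on both times.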
Execution procedure of a spoken command · CPC title
of the speaker; Human-factor methodology · CPC title
Advertisements · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Interactive procedures; Man-machine interfaces · CPC title