Specifying trip destinations from spoken dialogs
US-2022013108-A1 · Jan 13, 2022 · US
US11501754B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11501754-B2 |
| Application number | US-202016922946-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 7, 2020 |
| Priority date | Jul 7, 2020 |
| Publication date | Nov 15, 2022 |
| Grant date | Nov 15, 2022 |
Official abstract text for this publication.
Desired vehicle destinations may be determined from spoken dialogs. A speech input may be received from a user through a voice user interface. Current utterance variables may be obtained by tokenizing the user speech input. One or more of a plurality of utterance templates for a reply to the user speech input may be determined by a trained automatic agent based on the plurality of current utterance variables. One of a plurality of Application Programming Interfaces (API) to call and one or more parameters for the API to call with may be determined by the trained automatic agent based on the plurality of current utterance variables. A response may be obtained from the API call. A context string for the reply to the user speech input by the trained automatic agent may be constructed based on the utterance templates and the response of the API call.
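The flow in the abstract can be illustrated with a minimal sketch: tokenize the utterance, let an agent pick utterance templates plus an API and its parameters, call the API, then extend a context string. Everything named here is hypothetical and not taken from the patent: the template texts, the stubbed `find_place` function and its coordinates, the `[user]`/`[api]`/`[agent]` markers, and above all `rule_based_agent`, which is a hand-written stand-in for the trained automatic agent.

```python
# Hypothetical utterance templates the agent can choose from.
TEMPLATES = {
    "confirm_destination": "I found {name} at ({lat}, {lng}). Is that right?",
    "ask_destination": "Where would you like to go?",
}


def find_place(name, start):
    """Stub for a find-place API; the coordinates are made-up sample data."""
    places = {"airport": (37.62, -122.38)}
    lat, lng = places.get(name, (0.0, 0.0))
    return {"name": name, "lat": lat, "lng": lng}


# Hypothetical registry of callable APIs.
APIS = {"find_place": find_place}


def tokenize(utterance):
    """Split an utterance into lowercase word tokens (the utterance variables)."""
    cleaned = utterance.lower().replace("?", "").replace(".", "")
    return cleaned.split()


def rule_based_agent(variables):
    """Stand-in for the trained agent: returns (template keys, API name, params)."""
    if "airport" in variables:
        return ["confirm_destination"], "find_place", {"name": "airport"}
    return ["ask_destination"], None, {}


def reply(utterance, context, start=(37.77, -122.42)):
    """Produce a reply and the extended context string for one dialog turn."""
    variables = tokenize(utterance)
    templates, api_name, params = rule_based_agent(variables)
    response = {}
    if api_name is not None:
        # The starting location is supplied locally, not by the user.
        response = APIS[api_name](start=start, **params)
    text = " ".join(TEMPLATES[t].format(**response) for t in templates)
    # Concatenate this turn onto the existing context string.
    context = (
        context
        + " [user] " + " ".join(variables)
        + " [api] " + repr(response)
        + " [agent] " + text
    )
    return text, context


text, ctx = reply("Take me to the airport.", "")
print(text)
```

In a real system the agent would be a learned model conditioned on the context string, so the growing string carries the dialog history from one turn to the next.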
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for determining a destination, comprising: receiving a user speech input through a voice user interface; obtaining a plurality of current utterance variables by tokenizing the user speech input; determining, by a trained automatic agent, one or more of a plurality of utterance templates for a reply to the user speech input based on the plurality of current utterance variables; determining, by the trained automatic agent based on the plurality of current utterance variables, one of a plurality of Application Programming Interfaces (API) to call and one or more parameters for the API to call with, wherein the one or more parameters are based on information of the plurality of current utterance variables; obtaining a response from the determined API call; and constructing a context string for the reply to the user speech input by the trained automatic agent based on the one or more of the plurality of utterance templates and the response of the API call, wherein the trained automatic agent is trained based on a plurality of training samples collected by: listing the plurality of utterance templates and the plurality of APIs in a Graphical User Interface (GUI) for a training agent to select from; tokenizing a training user speech input into a plurality of training utterance variables; and recording one or more of the plurality of training utterance templates, one or more of the plurality of APIs, and one or more of the plurality of training utterance variables that the training agent selects through the GUI in response to the training user speech input as a training sample of the plurality of training samples.
2. The method of claim 1, wherein the plurality of training samples are recorded as fully-executable code.
3. The method of claim 1, further comprising: providing a button for the training agent to add a new utterance template that is available for other training agents to use.
4. The method of claim 1, wherein the trained automatic agent comprises a Bidirectional Encoder Representations from Transformers (BERT) natural language processing model.
5. The method of claim 1, wherein the trained automatic agent comprises a generative pre-trained natural language processing model.
6. The method of claim 1, wherein the API call comprises a find-place API call.
7. The method of claim 1, wherein: the one or more parameters comprise a name of a trip destination from the plurality of current utterance variables and a starting location obtained locally; and the response from the API call comprises a latitude and longitude of a destination prediction.
8. The method of claim 1, wherein: the one or more parameters comprise a starting location obtained locally and a destination; and the response from the API call comprises a distance and a duration from the starting location to the destination.
9. The method of claim 1, wherein constructing the context string for the reply to the user speech input by the trained automatic agent comprises concatenating at least one of the current utterance variables and the response of the API call onto an existing context string.
10. A system for determining a destination, comprising one or more processors and one or more non-transitory computer-readable memories coupled to the one or more processors and configured with instructions executable by the one or more processors to cause the system to perform operations comprising: listing a plurality of utterance templates and a plurality of Application Programming Interfaces (APIs) in a Graphical User Interface (GUI) for a training agent to select from; tokenizing a training user speech input into a plurality of training utterance variables; and recording one or more of a plurality of training utterance templates, one or more of the plurality of APIs, and one or more of the plurality of training utterance variables that a training agent selects through the GUI in response to the training user speech input to obtain a plurality of training samples; obtaining a trained automatic agent trained based on the plurality of training samples; receiving a user speech input through a voice user interface; obtaining a plurality of current utterance variables by tokenizing the user speech input; determining, by the trained automatic agent, one or more of the plurality of utterance templates for a reply to the user speech input based on the plurality of current utterance variables and one of the plurality of APIs to call; obtaining a response from the determined API call; and constructing a context string for the reply to the user speech input based on the one or more of the plurality of utterance templates and the response of the API call.
11. The system of claim 10, wherein the plurality of training samples are recorded as fully-executable code.
12. The system of claim 10, wherein the trained automatic agent comprises a Bidirectional Encoder Representations from Transformers (BERT) natural language processing model.
13. The system of claim 10, wherein the determining of the one of the plurality of APIs further comprises determining one or more parameters for the API to call with, the one or more parameters comprise a name of a trip destination from the plurality of current utterance variables and a starting location obtained locally; and the response from the API call comprises a latitude and longitude of a destination prediction.
14. The system of claim 10, wherein the determining of the one of the plurality of APIs further comprises determining one or more parameters for the API to call with, the one or more parameters comprise a starting location obtained locally and a destination; and the response from the API call comprises a distance and a duration from the starting location to the destination.
15. The system of claim 10, wherein constructing the context string for the reply to the user speech input comprises concatenating at least one of the current utterance variables and the response of the API call onto an existing context string.
16. A non-transitory computer-readable storage medium configured with instructions executable by one or more processors to cause the one or more processors to perform operations comprising: listing a plurality of utterance templates and a plurality of Application Programming Interfaces (APIs) in a Graphical User Interface (GUI) for a training agent to select from; tokenizing a training user speech input into a plurality of training utterance variables; and recording one or more of a plurality of training utterance templates, one or more of the plurality of APIs, and one or more of the plurality of training utterance variables that a training agent selects through the GUI in response to the training user speech input to obtain a plurality of training samples; obtaining a trained automatic agent trained based on the plurality of training samples; receiving a user speech input through a voice user interface; obtaining a plurality of current utterance variables by tokenizing the user speech input; determining, by the trained automatic agent, one or more of the plurality of utterance templates for a reply to the user speech input based on the plurality of current utterance variables and one of the plurality of APIs to call; obtaining a response from the determined API call; and constructing a context string for the reply to the user speech input based on the one or more of the plurality of utterance templates and the response of the API call.
17. The non-transitory computer-reada
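The training-sample collection recited in the claims (a training agent picks templates and an API for a tokenized utterance, and the selection is recorded, optionally as fully-executable code) might look roughly like the sketch below. The `TrainingSample` class, its field names, `record_selection`, and the generated code layout are all assumptions made for illustration; the claims do not specify a recording format.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingSample:
    """One recorded selection by a training agent (hypothetical layout)."""
    utterance_variables: list
    templates: list
    api: str
    api_params: dict = field(default_factory=dict)

    def to_code(self):
        """Render the sample as a small executable Python snippet."""
        args = ", ".join(f"{k}={v!r}" for k, v in self.api_params.items())
        lines = [
            f"variables = {self.utterance_variables!r}",
            f"response = {self.api}({args})",
            f"templates = {self.templates!r}",
        ]
        return "\n".join(lines)


def record_selection(training_utterance, templates, api, api_params, log):
    """Tokenize the training utterance and append the agent's choices to log."""
    sample = TrainingSample(
        utterance_variables=training_utterance.lower().split(),
        templates=list(templates),
        api=api,
        api_params=dict(api_params),
    )
    log.append(sample)
    return sample


# Simulate one choice a training agent might make through the GUI.
log = []
sample = record_selection(
    "Take me to the airport",
    templates=["confirm_destination"],
    api="find_place",
    api_params={"name": "airport"},
    log=log,
)
print(sample.to_code())
```

Storing samples in an executable form means a recorded dialog turn can be replayed against the real APIs, which is one plausible reading of why the claims mention fully-executable code.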
Training · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
Discourse or dialogue representation · CPC title