Specifying trip destinations from spoken dialogs

US11501754B2 · US · B2

Patent metadata
Field · Value
Publication number · US-11501754-B2
Application number · US-202016922946-A
Country · US
Kind code · B2
Filing date · Jul 7, 2020
Priority date · Jul 7, 2020
Publication date · Nov 15, 2022
Grant date · Nov 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Desired vehicle destinations may be determined from spoken dialogs. A speech input may be received from a user through a voice user interface. Current utterance variables may be obtained by tokenizing the user speech input. One or more of a plurality of utterance templates for a reply to the user speech input may be determined by a trained automatic agent based on the plurality of current utterance variables. One of a plurality of Application Programming Interfaces (API) to call and one or more parameters for the API to call with may be determined by the trained automatic agent based on the plurality of current utterance variables. A response may be obtained from the API call. A context string for the reply to the user speech input by the trained automatic agent may be constructed based on the utterance templates and the response of the API call.
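
As an illustration only, the flow described in the abstract (tokenize the input, choose a reply template and an API, call the API, build the reply) can be sketched in Python. Everything named here is an assumption made for the sketch and does not come from the patent: the template texts, the `find_place` stub and its fixed coordinates, the whitespace tokenizer, and the rule-based function standing in for the trained automatic agent.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical utterance templates; the patent does not list their wording.
TEMPLATES = {
    "confirm_destination": "I found {name} at ({lat}, {lng}). Is that right?",
    "ask_destination": "Where would you like to go?",
}


def find_place(name: str, start: tuple) -> dict:
    """Stand-in for a find-place API; returns a fixed, made-up result."""
    return {"name": name, "lat": 37.4275, "lng": -122.1697}


# Hypothetical registry of callable APIs, keyed by name.
APIS: dict[str, Callable[..., dict]] = {"find_place": find_place}


def tokenize(utterance: str) -> list:
    """Naive whitespace tokenizer standing in for the real tokenization step."""
    return utterance.lower().replace("?", "").replace(".", "").split()


@dataclass
class Decision:
    template: str                 # which reply template to use
    api: Optional[str] = None     # which API to call, if any
    params: dict = field(default_factory=dict)


def rule_based_agent(tokens: list) -> Decision:
    """Placeholder for the trained agent: picks a template, an API, and parameters."""
    if "to" in tokens and tokens.index("to") + 1 < len(tokens):
        name = " ".join(tokens[tokens.index("to") + 1:])
        # The starting location is assumed to be known locally.
        return Decision("confirm_destination", "find_place",
                        {"name": name, "start": (37.0, -122.0)})
    return Decision("ask_destination")


def reply(utterance: str) -> str:
    tokens = tokenize(utterance)
    decision = rule_based_agent(tokens)
    response = APIS[decision.api](**decision.params) if decision.api else {}
    return TEMPLATES[decision.template].format(**response)


print(reply("Take me to Stanford University"))
```

In the patent the selection step is performed by a trained model rather than hand-written rules; the rules above only make the data flow visible.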

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for determining a destination, comprising: receiving a user speech input through a voice user interface; obtaining a plurality of current utterance variables by tokenizing the user speech input; determining, by a trained automatic agent, one or more of a plurality of utterance templates for a reply to the user speech input based on the plurality of current utterance variables; determining, by the trained automatic agent based on the plurality of current utterance variables, one of a plurality of Application Programming Interfaces (API) to call and one or more parameters for the API to call with, wherein the one or more parameters are based on information of the plurality of current utterance variables; obtaining a response from the determined API call; and constructing a context string for the reply to the user speech input by the trained automatic agent based on the one or more of the plurality of utterance templates and the response of the API call, wherein the trained automatic agent is trained based on a plurality of training samples collected by: listing the plurality of utterance templates and the plurality of APIs in a Graphical User Interface (GUI) for a training agent to select from; tokenizing a training user speech input into a plurality of training utterance variables; and recording one or more of the plurality of training utterance templates, one or more of the plurality of APIs, and one or more of the plurality of training utterance variables that the training agent selects through the GUI in response to the training user speech input as a training sample of the plurality of training samples.

2. The method of claim 1, wherein the plurality of training samples are recorded as fully-executable code.

3. The method of claim 1, further comprising: providing a button for the training agent to add a new utterance template that is available for other training agents to use.

4. The method of claim 1, wherein the trained automatic agent comprises a Bidirectional Encoder Representations from Transformers (BERT) natural language processing model.

5. The method of claim 1, wherein the trained automatic agent comprises a generative pre-trained natural language processing model.

6. The method of claim 1, wherein the API call comprises a find-place API call.

7. The method of claim 1, wherein: the one or more parameters comprise a name of a trip destination from the plurality of current utterance variables and a starting location obtained locally; and the response from the API call comprises a latitude and longitude of a destination prediction.

8. The method of claim 1, wherein: the one or more parameters comprise a starting location obtained locally and a destination; and the response from the API call comprises a distance and a duration from the starting location to the destination.

9. The method of claim 1, wherein constructing the context string for the reply to the user speech input by the trained automatic agent comprises concatenating at least one of the current utterance variables and the response of the API call onto an existing context string.

10. A system for determining a destination, comprising one or more processors and one or more non-transitory computer-readable memories coupled to the one or more processors and configured with instructions executable by the one or more processors to cause the system to perform operations comprising: listing a plurality of utterance templates and a plurality of Application Programming Interfaces (APIs) in a Graphical User Interface (GUI) for a training agent to select from; tokenizing a training user speech input into a plurality of training utterance variables; and recording one or more of a plurality of training utterance templates, one or more of the plurality of APIs, and one or more of the plurality of training utterance variables that a training agent selects through the GUI in response to the training user speech input to obtain a plurality of training samples; obtaining a trained automatic agent trained based on the plurality of training samples; receiving a user speech input through a voice user interface; obtaining a plurality of current utterance variables by tokenizing the user speech input; determining, by the trained automatic agent, one or more of the plurality of utterance templates for a reply to the user speech input based on the plurality of current utterance variables and one of the plurality of APIs to call; obtaining a response from the determined API call; and constructing a context string for the reply to the user speech input based on the one or more of the plurality of utterance templates and the response of the API call.

11. The system of claim 10, wherein the plurality of training samples are recorded as fully-executable code.

12. The system of claim 10, wherein the trained automatic agent comprises a Bidirectional Encoder Representations from Transformers (BERT) natural language processing model.

13. The system of claim 10, wherein the determining of the one of the plurality of APIs further comprises determining one or more parameters for the API to call with, the one or more parameters comprise a name of a trip destination from the plurality of current utterance variables and a starting location obtained locally; and the response from the API call comprises a latitude and longitude of a destination prediction.

14. The system of claim 10, wherein the determining of the one of the plurality of APIs further comprises determining one or more parameters for the API to call with, the one or more parameters comprise a starting location obtained locally and a destination; and the response from the API call comprises a distance and a duration from the starting location to the destination.

15. The system of claim 10, wherein constructing the context string for the reply to the user speech input comprises concatenating at least one of the current utterance variables and the response of the API call onto an existing context string.

16. A non-transitory computer-readable storage medium configured with instructions executable by one or more processors to cause the one or more processors to perform operations comprising: listing a plurality of utterance templates and a plurality of Application Programming Interfaces (APIs) in a Graphical User Interface (GUI) for a training agent to select from; tokenizing a training user speech input into a plurality of training utterance variables; and recording one or more of a plurality of training utterance templates, one or more of the plurality of APIs, and one or more of the plurality of training utterance variables that a training agent selects through the GUI in response to the training user speech input to obtain a plurality of training samples; obtaining a trained automatic agent trained based on the plurality of training samples; receiving a user speech input through a voice user interface; obtaining a plurality of current utterance variables by tokenizing the user speech input; determining, by the trained automatic agent, one or more of the plurality of utterance templates for a reply to the user speech input based on the plurality of current utterance variables and one of the plurality of APIs to call; obtaining a response from the determined API call; and constructing a context string for the reply to the user speech input based on the one or more of the plurality of utterance templates and the response of the API call.

17. The non-transitory computer-reada
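
Two steps in the claims lend themselves to a small sketch: recording what a training agent selects as a training sample (claim 1) and concatenating utterance variables and an API response onto an existing context string (claims 9 and 15). The Python below is a rough illustration under assumptions, not the patented implementation: the `TrainingSample` fields, the `" | "` separator, and the JSON rendering of the API response are all hypothetical choices made for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class TrainingSample:
    """One recorded selection by a human training agent (field names are assumed)."""
    utterance_variables: list
    templates: list
    api: str
    api_params: dict


def record_sample(log, tokens, templates, api, api_params):
    """Append the training agent's selections to the log as one sample."""
    sample = TrainingSample(list(tokens), list(templates), api, dict(api_params))
    log.append(sample)
    return sample


def extend_context(context, utterance_variables, api_response):
    """Concatenate utterance variables and an API response onto a context string."""
    parts = [context] if context else []
    parts.append(" ".join(utterance_variables))
    parts.append(json.dumps(api_response, sort_keys=True))
    return " | ".join(parts)


log = []
record_sample(log, ["go", "to", "airport"], ["confirm_destination"],
              "find_place", {"name": "airport"})

context = extend_context("", ["go", "to", "airport"], {"lat": 1.0, "lng": 2.0})
context = extend_context(context, ["yes"], {})
print(context)
```

Keeping the context as a single growing string means each new turn can be handed to the agent together with everything said and returned so far, which is one plausible reading of why the claims describe concatenation.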

Assignees

Beijing Didi Infinity Technology & Dev Co Ltd

Inventors

Classifications

  • G10L15/063 · Primary

    Training · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • G06F40/35 · Primary

    Discourse or dialogue representation · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11501754B2 cover?
Desired vehicle destinations may be determined from spoken dialogs. A speech input may be received from a user through a voice user interface. Current utterance variables may be obtained by tokenizing the user speech input. One or more of a plurality of utterance templates for a reply to the user speech input may be determined by a trained automatic agent based on the plurality of current utter…
Who is the assignee on this patent?
Beijing Didi Infinity Technology & Dev Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 15, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).