Speech-processing system
US-2022115016-A1 · Apr 14, 2022 · US
US11869488B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11869488-B2 |
| Application number | US-202017106701-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 30, 2020 |
| Priority date | Dec 18, 2019 |
| Publication date | Jan 9, 2024 |
| Grant date | Jan 9, 2024 |
Official abstract text for this publication.
In cases in which a confidence score of an inferred intent label is a predetermined threshold or less, an intent inference section searches for whether or not wording pertaining to a location, such as “on the door”, is present in a question. In cases in which a word relating to a location is present, the intent inference section consults individual function identification data associated with OM item codes in order to find intent labels including individual functions relevant to the location (such as “door”). In cases in which an intent label including an individual function relevant to the “door” is found, an OMA interaction control section consults QA data to find and acquire associated response information based on the found intent label and the OM item code, and notifies a HMI interaction control section of such response information.
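The fallback described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the threshold value, the location phrases, the OM item code `OM123`, the intent labels, and the contents of both lookup tables are hypothetical placeholders, and the function names are invented for this example.

```python
from dataclasses import dataclass

# Hypothetical value; the abstract only says "a predetermined threshold".
CONFIDENCE_THRESHOLD = 0.5

# Hypothetical location wording mapped to a location keyword.
LOCATION_WORDS = {
    "on the door": "door",
    "on the steering wheel": "steering wheel",
}

# Hypothetical "individual function identification data":
# (OM item code, location) -> intent labels of functions found at that location.
FUNCTION_DATA = {
    ("OM123", "door"): ["door_lock_switch", "power_window_switch"],
    ("OM123", "steering wheel"): ["cruise_control_switch"],
}

# Hypothetical "QA data": (OM item code, intent label) -> response information.
QA_DATA = {
    ("OM123", "door_lock_switch"): "The door lock switch locks and unlocks all doors.",
    ("OM123", "power_window_switch"): "The power window switch opens and closes the windows.",
    ("OM123", "cruise_control_switch"): "The cruise control switch sets the cruising speed.",
}


@dataclass
class Inference:
    """Output of the intent inference step: a label plus its confidence score."""
    intent_label: str
    confidence: float


def find_location(question: str):
    """Return the location keyword if location wording appears in the question."""
    lowered = question.lower()
    for phrase, location in LOCATION_WORDS.items():
        if phrase in lowered:
            return location
    return None


def respond(question: str, inference: Inference, om_item_code: str) -> list:
    """Return response information, using the location fallback for low confidence."""
    # Confident inference: look up the answer for the inferred label directly.
    if inference.confidence > CONFIDENCE_THRESHOLD:
        answer = QA_DATA.get((om_item_code, inference.intent_label))
        return [answer] if answer else []

    # Confidence is the threshold or less: search the question for location wording.
    location = find_location(question)
    if location is None:
        return []

    # Find intent labels of functions at that location, then fetch their responses.
    labels = FUNCTION_DATA.get((om_item_code, location), [])
    return [QA_DATA[(om_item_code, label)] for label in labels
            if (om_item_code, label) in QA_DATA]
```

For example, `respond("What is the switch on the door?", Inference("unknown", 0.2), "OM123")` falls through to the location lookup and returns the responses for both door-related functions, which a real system would then hand to the HMI layer.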
Opening claim text (preview).
What is claimed is:
1. An agent device comprising: a processor configured to: recognize spoken content as text data; ascertain an intent of the spoken content using a trained model that is generated by performing machine learning employing training data in which text data is input, and an intent label associated with an intent expressed in the text data is output, the trained model being configured to output the intent label if the intent of the spoken content can be ascertained, and output that a corresponding intent label does not exist if the intent of the spoken content cannot be ascertained; if the trained model outputs the intent outputting response information based on the intent label; and if the trained model outputs that the corresponding intent label does not exist, and a predetermined word used to identify an individual function pertaining to a vehicle has been recognized in the spoken content, obtain a relevant list that is relevant to the recognized word based on association data that includes association between the individual function pertaining to the vehicle identified by the recognized word and at least one intent label, and output the obtained relevant list as the response information, so as to supplement the output of the trained model.
2. The agent device of claim 1, wherein: in cases in which response information for the spoken content cannot be prepared and in which, in the spoken content, the predetermined word and a word pertaining to a predetermined individual function has been recognized, the processor outputs a relevant list that is relevant to both the predetermined word and the predetermined individual function as the response information.
3. The agent device of claim 2, wherein: the predetermined word is at least one of a word expressing a location where the individual function is disposed, a word expressing a shape of the individual function, or a word expressing a color of the individual function.
4. The agent device of claim 2, wherein the processor is configured to output pre-prepared candidate images as the relevant list.
5. The agent device of claim 2, wherein: in cases in which there is no corresponding relevant list, the processor is configured to output an error message including content requesting an utterance including a word used to identify the individual function.
6. The agent device of claim 2, wherein the processor is configured to recognize spoken content regarding a manual of a vehicle.
7. The agent device of claim 1, wherein: the predetermined word is at least one of a word expressing a location where the individual function is disposed, a word expressing a shape of the individual function, or a word expressing a color of the individual function.
8. The agent device of claim 1, wherein the processor is configured to output pre-prepared candidate images as the relevant list.
9. The agent device of claim 1, wherein: in cases in which there is no corresponding relevant list, the processor is configured to output an error message including content requesting an utterance including a word used to identify the individual function.
10. The agent device of claim 1, wherein the processor is configured to recognize spoken content regarding a manual of a vehicle.
11. The agent device of claim 7, wherein the processor is configured to output pre-prepared candidate images as the relevant list.
12. The agent device of claim 7, wherein: in cases in which there is no corresponding relevant list, the processor is configured to output an error message including content requesting an utterance including a word used to identify the individual function.
13. The agent device of claim 7, wherein the processor is configured to recognize spoken content regarding a manual of a vehicle.
14. An agent system comprising: an agent device including a first processor, the first processor being configured to: recognize spoken content as text data; ascertain an intent of the spoken content using a trained model that is generated by performing machine learning employing training data in which text data is input, and an intent label associated with an intent expressed in the text data is output, the trained model being configured to output the intent label if the intent of the spoken content can be ascertained, and output that a corresponding intent label does not exist if the intent of the spoken content cannot be ascertained; if the trained model outputs the intent, output response information based on the intent label; and if the trained model outputs that the corresponding intent label does not exist, and a predetermined word used to identify an individual function pertaining to a vehicle has been recognized in the spoken content, obtain a relevant list that is relevant to the recognized word based on association data that includes association between the individual function pertaining to the vehicle identified by the recognized word and at least one intent label, and output the obtained relevant list as the response information; and an information provision device that is installed in a vehicle and that includes a second processor that is configured to detect an utterance of an occupant, provide the detected utterance to the agent device, and report the response information output from the agent device, so as to supplement the output of the trained model.
15. A non-transitory computer-readable storage medium stored with an agent program for causing a computer to: recognize spoken content as text data; ascertain an intent of the spoken content using a trained model that is generated by performing machine learning employing training data in which text data is input, and an intent label associated with an intent expressed in the text data is output, the trained model being configured to output the intent label if the intent of the spoken content can be ascertained, and output that a corresponding intent label does not exist if the intent of the spoken content cannot be ascertained; if the trained model outputs the intent outputting response information based on the intent label; and if the trained model outputs that the corresponding intent label does not exist, and a predetermined word used to identify an individual function pertaining to a vehicle has been recognized in the spoken content, obtain a relevant list that is relevant to the recognized word based on association data that includes association between the individual function pertaining to the vehicle identified by the recognized word and at least one intent label, and output the obtained relevant list as the response information, so as to supplement the output of the trained model.
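As a reading aid only, the branching recited in the independent claims, together with the narrowing of the dependent claim on "both the predetermined word and the predetermined individual function" and the error-message claims, might be sketched like this. It is one possible interpretation under stated assumptions and has no bearing on legal scope: the word tables, intent labels, return format, and function names are all hypothetical, and the trained model is represented only by its output (`None` standing for "a corresponding intent label does not exist").

```python
import re
from typing import Optional

# Hypothetical association data: predetermined word (a location or a color here)
# -> intent labels of the individual functions that word can identify.
ASSOCIATION_DATA = {
    "door": ["door_lock_switch", "power_window_switch", "door_ajar_lamp"],
    "red": ["door_ajar_lamp", "hazard_switch"],
}

# Hypothetical words pertaining to an individual function -> matching intent labels.
FUNCTION_WORDS = {
    "switch": {"door_lock_switch", "power_window_switch", "hazard_switch"},
    "lamp": {"door_ajar_lamp"},
}

ERROR_MESSAGE = (
    "Please ask again using a word that identifies the function, "
    "such as its location, shape, or color."
)


def relevant_list(text: str) -> list[str]:
    """Collect labels tied to recognized predetermined words, narrowed by function words."""
    tokens = re.findall(r"[a-z]+", text.lower())

    # Gather labels for every predetermined word found, keeping first-seen order.
    labels: list[str] = []
    for token in tokens:
        for label in ASSOCIATION_DATA.get(token, []):
            if label not in labels:
                labels.append(label)

    # If a function word was also recognized, keep only labels relevant to both.
    for token in tokens:
        if token in FUNCTION_WORDS:
            labels = [label for label in labels if label in FUNCTION_WORDS[token]]
    return labels


def handle(text: str, model_label: Optional[str]) -> dict:
    """Route one utterance given the trained model's output (None = no label exists)."""
    if model_label is not None:
        return {"type": "answer", "intent": model_label}

    candidates = relevant_list(text)
    if candidates:
        return {"type": "relevant_list", "items": candidates}

    # No relevant list: request an utterance that includes an identifying word.
    return {"type": "error", "message": ERROR_MESSAGE}
```

With these placeholder tables, `handle("what is the lamp on the door", None)` yields a relevant list containing only `door_ajar_lamp`, while `handle("what does this do", None)` yields the error message, since no predetermined word was recognized.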
Parsing for meaning understanding · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
Execution procedure of a spoken command · CPC title
Feedback of the input speech · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title