US10573307B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10573307-B2 |
| Application number | US-201715797411-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 30, 2017 |
| Priority date | Oct 31, 2016 |
| Publication date | Feb 25, 2020 |
| Grant date | Feb 25, 2020 |
Official abstract text for this publication.
A syntactic analysis unit 104 performs a syntactic analysis of linguistic information in the acquired speech of a user (hereinafter simply referred to as “user speech”). A non-linguistic information analysis unit 106 analyzes non-linguistic information, which is different from the linguistic information, in the acquired user speech. A topic continuation determination unit 110 determines, according to the non-linguistic information analysis result, whether the topic of the current conversation should be continued or changed to a different topic. A response generation unit 120 generates a response according to the result of the determination by the topic continuation determination unit 110.
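The data flow in the abstract (analysis result, then a continue-or-change determination, then a response) can be sketched as below. This is a minimal illustration and not the patented implementation: the class `NonLinguisticResult`, the function names, the 60-second threshold, and the response wording are all hypothetical. The example rule follows claim 3, which changes the topic when the same topic has lasted at least a predetermined threshold.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class NonLinguisticResult:
    """Hypothetical container for the output of analysis unit 106."""
    topic_duration_s: float                      # how long the current topic has lasted
    features: Dict[str, float] = field(default_factory=dict)  # prosodic feature quantities


# A determination rule maps an analysis result to True (continue) or False (change).
Rule = Callable[[NonLinguisticResult], bool]


def duration_rule(max_duration_s: float) -> Rule:
    """Rule in the spirit of claim 3: change topic once it has run long enough.

    The threshold value is an assumption chosen by the caller.
    """
    def rule(result: NonLinguisticResult) -> bool:
        return result.topic_duration_s < max_duration_s
    return rule


def determine_topic_continuation(result: NonLinguisticResult, rule: Rule) -> bool:
    """Stand-in for unit 110: apply the rule to the analysis result."""
    return rule(result)


def generate_response(continue_topic: bool, current_topic: str, next_topic: str) -> str:
    """Stand-in for unit 120: the response depends only on the determination."""
    if continue_topic:
        return f"Tell me more about {current_topic}."
    return f"By the way, shall we talk about {next_topic}?"


def respond(result: NonLinguisticResult, rule: Rule,
            current_topic: str, next_topic: str) -> str:
    """Run determination and response generation in sequence."""
    keep = determine_topic_continuation(result, rule)
    return generate_response(keep, current_topic, next_topic)
```

For example, `respond(NonLinguisticResult(topic_duration_s=90.0), duration_rule(60.0), "travel", "food")` yields the topic-change response, because 90 seconds is at or above the assumed 60-second threshold.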
Opening claim text (preview).
What is claimed is:
1. A voice interaction apparatus configured to have a conversation with a user by using a voice, comprising: a processor configured to: determine a first relation between normalized values of fundamental frequencies at phrase endings in user speech acquired in advance and frequencies of occurrences of cases where a topic is changed, and a second relation between the normalized values of the fundamental frequencies at the phrase endings in the user speech acquired in advance and frequencies of occurrences of cases where the topic is continued; acquire user speech given by the user; acquire frequency information by analyzing prosodic information on the user speech given by the user; determine whether or not a current topic of a current conversation should be continued according to a result of comparing the acquired frequency information with at least one of: the first relation or the second relation; generate a response according to a result of the determination of whether or not the current topic of the current conversation should be continued; and cause a voice corresponding to the generated response to be output.
2. The voice interaction apparatus according to claim 1, wherein the processor is further configured to determine whether or not the current topic of the current conversation should be continued based on a comparison between at least one feature quantity included in a prosodic information analysis result and a predetermined threshold corresponding to the at least one feature quantity.
3. The voice interaction apparatus according to claim 2, wherein the processor is further configured to determine that the current topic of the current conversation should be changed when a duration of the same topic is equal to or longer than a predetermined threshold.
4. The voice interaction apparatus according to claim 1, wherein the processor is further configured to determine whether or not the current topic of the current conversation should be continued by determining whether a feature indicated by a prosodic information analysis result corresponds to continuation of the current topic or corresponds to a change of the current topic by using a determination model generated in advance through machine learning.
5. The voice interaction apparatus according to claim 1, wherein the analyzing the prosodic information on the user speech given by the user includes analyzing history information.
6. The voice interaction apparatus according to claim 1, wherein the processor is further configured to: analyze the prosodic information based on a voice waveform by performing a voice analysis for the acquired user speech; and calculate a value indicating a feature quantity indicating the prosodic information.
7. The voice interaction apparatus according to claim 6, wherein the processor is further configured to calculate, for the acquired user speech, a fundamental frequency for each of frames that are obtained by dividing the acquired user speech at predetermined time intervals.
8. The voice interaction apparatus according to claim 1, wherein the at least one feature quantity includes one of: an average of frequency in a predetermined time period before phrase end, a standard deviation of frequency in the predetermined time period before phrase end, a maximum value of frequency in the predetermined time period before phrase end, or an inclination of frequency in the predetermined time period before phrase end.
9. The voice interaction apparatus according to claim 1, wherein the normalized values are normalized maximum values.
10. A voice interaction method performed by using a voice interaction apparatus configured to have a conversation with a user by using a voice, the voice interaction method comprising: determining a first relation between normalized values of fundamental frequencies at phrase endings in user speech acquired in advance and frequencies of occurrences of cases where a topic is changed, and a second relation between the normalized values of the fundamental frequencies at the phrase endings in the user speech acquired in advance and frequencies of occurrences of cases where the topic is continued; acquiring user speech given by the user; acquiring frequency information by analyzing prosodic information on the user speech given by the user; determining whether or not a current topic of a current conversation should be continued according to a result of comparing the acquired frequency information with at least one of: the first relation or the second relation; generating a response according to a result of the determination of whether or not the current topic of the current conversation should be continued; and outputting a voice corresponding to the generated response.
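The feature quantities of claim 8 and the relation comparison of claims 1 and 10 can be sketched as follows. This is one possible reading, offered under stated assumptions: per-frame fundamental frequencies are already available and voiced, normalized values lie in [0, 1], each "relation" is modelled as a histogram of occurrence counts, and the decision simply picks whichever histogram has more occurrences in the bin of the new value. The function names, bin count, and all sample numbers are hypothetical; the claims do not specify the normalization or the comparison procedure.

```python
import math
from typing import Dict, List, Sequence


def phrase_end_features(f0_frames: Sequence[float], frame_s: float,
                        window_s: float) -> Dict[str, float]:
    """Mean, standard deviation, maximum, and slope of F0 near the phrase end.

    f0_frames: one fundamental-frequency value per frame (assumed all voiced).
    frame_s:   frame interval in seconds.
    window_s:  length of the period before the phrase end, in seconds.
    """
    n = max(2, int(round(window_s / frame_s)))   # need two points for a slope
    tail = list(f0_frames[-n:])
    if len(tail) < 2:
        raise ValueError("need at least two frames before the phrase end")
    mean = sum(tail) / len(tail)
    std = math.sqrt(sum((f - mean) ** 2 for f in tail) / len(tail))  # population std
    # Least-squares slope of F0 against time ("inclination" in claim 8).
    times = [i * frame_s for i in range(len(tail))]
    t_mean = sum(times) / len(times)
    num = sum((t - t_mean) * (f - mean) for t, f in zip(times, tail))
    den = sum((t - t_mean) ** 2 for t in times)
    return {"mean": mean, "std": std, "max": max(tail), "slope": num / den}


def bin_index(value: float, n_bins: int) -> int:
    """Bin of a normalized value, clamped to the assumed range [0, 1]."""
    clamped = min(max(value, 0.0), 1.0)
    return min(int(clamped * n_bins), n_bins - 1)


def build_relation(normalized_values: Sequence[float], n_bins: int) -> List[int]:
    """Histogram of occurrence counts over normalized F0 values."""
    counts = [0] * n_bins
    for v in normalized_values:
        counts[bin_index(v, n_bins)] += 1
    return counts


def should_continue(value: float, change_relation: Sequence[int],
                    continue_relation: Sequence[int]) -> bool:
    """Continue if continuation was at least as frequent as change in this bin."""
    b = bin_index(value, len(change_relation))
    return continue_relation[b] >= change_relation[b]
```

With made-up training values, `build_relation([0.1, 0.2, 0.25, 0.8], 4)` serves as the first relation (topic changed) and `build_relation([0.7, 0.75, 0.9, 0.3], 4)` as the second (topic continued). A new normalized value is then classified by `should_continue`. Comparing raw counts favours whichever case has more training samples, so a practical system might compare relative frequencies or use a learned model as in claim 4.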
Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
using prosody or stress · CPC title
specially adapted for particular use · CPC title
Parsing for meaning understanding · CPC title