Voice interaction apparatus and voice interaction method

US10573307B2 · US · B2

Patent metadata
Publication number: US-10573307-B2
Application number: US-201715797411-A
Country: US
Kind code: B2
Filing date: Oct 30, 2017
Priority date: Oct 31, 2016
Publication date: Feb 25, 2020
Grant date: Feb 25, 2020

Abstract

Official abstract text for this publication.

A syntactic analysis unit 104 performs a syntactic analysis for linguistic information on acquired user's speech (hereinafter simply referred to as “user speech”). A non-linguistic information analysis unit 106 analyzes non-linguistic information different from the linguistic information for the acquired user speech. A topic continuation determination unit 110 determines whether a topic of the current conversation should be continued or should be changed to a different topic according to the non-linguistic information analysis result. A response generation unit 120 generates a response according to a result of a determination by the topic continuation determination unit 110.
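The flow from analysis result to response can be pictured with a minimal Python sketch. It is an illustration only, not the patented implementation: the class and function names, the two features, both thresholds, the direction of the pitch comparison, and the response strings are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class NonLinguisticResult:
    """Hypothetical output of the non-linguistic information analysis unit 106."""
    f0_max_norm: float       # assumed feature: normalized maximum F0 near the phrase end
    topic_duration_s: float  # assumed feature: seconds spent on the current topic


def should_continue_topic(result, f0_threshold=0.5, max_topic_s=120.0):
    """Stand-in for the topic continuation determination unit 110.

    Both thresholds are placeholders, and treating a higher phrase-end pitch
    as a sign of continuation is an assumption made for this sketch.
    """
    if result.topic_duration_s >= max_topic_s:
        return False  # the topic has run long enough, so change it
    return result.f0_max_norm >= f0_threshold


def generate_response(continue_topic):
    """Stand-in for the response generation unit 120 (canned text only)."""
    if continue_topic:
        return "Tell me more about that."
    return "Shall we talk about something else?"


if __name__ == "__main__":
    result = NonLinguisticResult(f0_max_norm=0.8, topic_duration_s=10.0)
    print(generate_response(should_continue_topic(result)))
```

Note that the decision uses only the non-linguistic result; the syntactic analysis of unit 104 is left out of the sketch for brevity.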

First claim

Opening claim text (preview).

What is claimed is:

1. A voice interaction apparatus configured to have a conversation with a user by using a voice, comprising: a processor configured to: determine a first relation between normalized values of fundamental frequencies at phrase endings in user speech acquired in advance and frequencies of occurrences of cases where a topic is changed, and a second relation between the normalized values of the fundamental frequencies at the phrase endings in the user speech acquired in advance and frequencies of occurrences of cases where the topic is continued; acquire user speech given by the user; acquire frequency information by analyzing prosodic information on the user speech given by the user; determine whether or not a current topic of a current conversation should be continued according to a result of comparing the acquired frequency information with at least one of: the first relation or the second relation; generate a response according to a result of the determination of whether or not the current topic of the current conversation should be continued; and cause a voice corresponding to the generated response to be output.

2. The voice interaction apparatus according to claim 1, wherein the processor is further configured to determine whether or not the current topic of the current conversation should be continued based on a comparison between at least one feature quantity included in a prosodic information analysis result and a predetermined threshold corresponding to the at least one feature quantity.

3. The voice interaction apparatus according to claim 2, wherein the processor is further configured to determine that the current topic of the current conversation should be changed when a duration of the same topic is equal to or longer than a predetermined threshold.

4. The voice interaction apparatus according to claim 1, wherein the processor is further configured to determine whether or not the current topic of the current conversation should be continued by determining whether a feature indicated by a prosodic information analysis result corresponds to continuation of the current topic or corresponds to a change of the current topic by using a determination model generated in advance through machine learning.

5. The voice interaction apparatus according to claim 1, wherein the analyzing the prosodic information on the user speech given by the user includes analyzing history information.

6. The voice interaction apparatus according to claim 1, wherein the processor is further configured to: analyze the prosodic information based on a voice waveform by performing a voice analysis for the acquired user speech; and calculate a value indicating a feature quantity indicating the prosodic information.

7. The voice interaction apparatus according to claim 6, wherein the processor is further configured to calculate, for the acquired user speech, a fundamental frequency for each of frames that are obtained by dividing the acquired user speech at predetermined time intervals.

8. The voice interaction apparatus according to claim 1, wherein the at least one feature quantity includes one of: an average of frequency in a predetermined time period before phrase end, a standard deviation of frequency in the predetermined time period before phrase end, a maximum value of frequency in the predetermined time period before phrase end, or an inclination of frequency in the predetermined time period before phrase end.

9. The voice interaction apparatus according to claim 1, wherein the normalized values are normalized maximum values.

10. A voice interaction method performed by using a voice interaction apparatus configured to have a conversation with a user by using a voice, the voice interaction method comprising: determining a first relation between normalized values of fundamental frequencies at phrase endings in user speech acquired in advance and frequencies of occurrences of cases where a topic is changed, and a second relation between the normalized values of the fundamental frequencies at the phrase endings in the user speech acquired in advance and frequencies of occurrences of cases where the topic is continued; acquiring user speech given by the user; acquiring frequency information by analyzing prosodic information on the user speech given by the user; determining whether or not a current topic of a current conversation should be continued according to a result of comparing the acquired frequency information with at least one of: the first relation or the second relation; generating a response according to a result of the determination of whether or not the current topic of the current conversation should be continued; and outputting a voice corresponding to the generated response.
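One possible reading of the claimed comparison, sketched in Python under stated assumptions: the two "relations" are modeled as histograms over binned normalized F0 values, the feature quantities named in claim 8 are computed over the final frames of a phrase, and the decision picks whichever outcome was more frequent in the matching bin. The function names, bin count, frame length, window length, and tie-breaking rule are hypothetical and do not come from the patent text shown here.

```python
import statistics
from collections import Counter


def phrase_end_features(f0_frames, frame_s=0.032, window_s=0.5):
    """Mean, standard deviation, maximum and slope of per-frame F0 values
    over the last `window_s` seconds (frame and window lengths are assumed)."""
    n = max(2, round(window_s / frame_s))
    tail = list(f0_frames[-n:])
    if len(tail) < 2:
        raise ValueError("need at least two F0 frames")
    times = [i * frame_s for i in range(len(tail))]
    t_mean = statistics.fmean(times)
    f_mean = statistics.fmean(tail)
    # Least-squares slope of F0 against time, in Hz per second.
    denom = sum((t - t_mean) ** 2 for t in times)
    slope = sum((t - t_mean) * (f - f_mean) for t, f in zip(times, tail)) / denom
    return {"mean": f_mean, "std": statistics.pstdev(tail),
            "max": max(tail), "slope": slope}


def _bin(value, n_bins):
    """Map a normalized value, assumed to lie in [0, 1], to a bin index."""
    return max(0, min(int(value * n_bins), n_bins - 1))


def build_relations(samples, n_bins=10):
    """samples: (normalized_f0, topic_changed) pairs from speech acquired in advance.
    Returns two histograms: occurrences of topic change and of continuation."""
    changed, continued = Counter(), Counter()
    for value, topic_changed in samples:
        (changed if topic_changed else continued)[_bin(value, n_bins)] += 1
    return changed, continued


def should_continue(value, changed, continued, n_bins=10):
    """Continue when continuation was at least as frequent as change in this bin
    (ties resolve to continuation, an arbitrary choice for the sketch)."""
    b = _bin(value, n_bins)
    return continued[b] >= changed[b]
```

A short usage example with made-up data:

```python
history = [(0.10, True), (0.15, True), (0.12, False),
           (0.80, False), (0.85, False), (0.82, True)]
changed, continued = build_relations(history)
print(should_continue(0.13, changed, continued))  # False with this toy data
print(should_continue(0.83, changed, continued))  # True with this toy data
```

Claim 4 describes an alternative in which a model trained in advance replaces this kind of lookup; the histogram form is shown here only because it maps most directly onto the wording of claim 1.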

Assignees

Inventors

Classifications

  • Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • using prosody or stress · CPC title

  • specially adapted for particular use · CPC title

  • Parsing for meaning understanding · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10573307B2 cover?
A syntactic analysis unit 104 performs a syntactic analysis for linguistic information on acquired user's speech (hereinafter simply referred to as “user speech”). A non-linguistic information analysis unit 106 analyzes non-linguistic information different from the linguistic information for the acquired user speech. A topic continuation determination unit 110 determines whether a topic of…
Who is the assignee on this patent?
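No assignee is shown on this page. The names recorded for this patent, Skantze Gabriel, Johansson Martin, Hori Tatsuro, and 3 more, are individuals, which is consistent with inventor credits rather than assignees.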
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 25, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).