Voice interaction device, voice interaction method, voice interaction program, and robot

US10650815B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10650815-B2
Application number  US-201715834030-A
Country             US
Kind code           B2
Filing date         Dec 6, 2017
Priority date       Dec 14, 2016
Publication date    May 12, 2020
Grant date          May 12, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A topic providing device includes a candidate topic extractor, a provided topic determiner, a voice synthesizer, and a speaker. When a determination is made that a parent and child are conversing and that there is a need to provide a new topic to the parent and child, based on a conversation history database and a child activity database storing at least one activity name indicating an activity the child was engaged in for a first predetermined period of time, the candidate topic extractor extracts at least one candidate topic that corresponds to the at least one activity name in the child activity database and does not correspond to an activity name included in text data recorded in a first database. From the at least one candidate topic, the provided topic determiner selects one topic to provide to the parent and the child. The voice synthesizer generates voice data containing the one topic. The speaker outputs the voice data.
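
The extraction step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the list-based stand-ins for the two databases, and the case-insensitive substring matching are all assumptions made for illustration.

```python
# Illustrative sketch only. Plain lists stand in for the conversation history
# database and the child activity database; all names here are hypothetical.

def needs_new_topic(conversation_texts, key_phrases):
    """Return True if any recorded utterance contains one of the key phrases."""
    return any(
        phrase.lower() in text.lower()
        for text in conversation_texts
        for phrase in key_phrases
    )


def extract_candidate_topics(activity_names, conversation_texts):
    """Keep activity names that have not yet come up in the conversation.

    Assumption: an activity "corresponds to" the conversation text when its
    name appears as a case-insensitive substring of the recorded text.
    """
    spoken = " ".join(conversation_texts).lower()
    return [name for name in activity_names if name.lower() not in spoken]


history = ["We did some drawing today", "What else shall we talk about?"]
activities = ["drawing", "dancing", "reading"]

if needs_new_topic(history, ["what else"]):
    candidates = extract_candidate_topics(activities, history)
else:
    candidates = []
```

Here `candidates` holds the activities not already mentioned, from which one topic would then be selected and spoken.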

First claim

Opening claim text (preview).

What is claimed is:

1. A device performing voice interaction with a plurality of users, the device comprising:
    a sensor obtaining image data of an area around the device;
    a microphone obtaining audio of the area around the device;
    a speaker;
    a processor; and
    a non-transitory memory storing thereon a computer program, which when executed by the processor, causes the processor to perform operations including
    storing a plurality of image data corresponding to the plurality of users, the plurality of users including an adult and a child;
    identifying a person contained in the obtained image data based on the obtained image data and the stored plurality of image data, and outputting user information indicating the identified person;
    extracting a voice from the obtained audio, extracting a feature value of the voice and text data corresponding to the voice, and associating the text data with the feature value and recording the associated text data and feature value in a first database;
    first determining, based on the user information and the first database, whether the adult and the child are conversing, and determining that the adult and the child are conversing when the adult and the child are the identified persons and the feature value contains a plurality of mutually dissimilar feature values;
    second determining, based on the first database, whether there is a need to provide a new topic to the adult and the child when the adult and the child are determined to be conversing, and determining that there is a need to provide a new topic to the adult and the child when a first key phrase is contained in the text data indicating the conversation between the adult and the child during a current predetermined period of time;
    extracting at least one candidate topic based on the first database and a second database when providing the topic is determined to be necessary, the second database storing at least one activity name indicating an activity the child was engaged in for a first predetermined period of time, which is earlier than the current predetermined period of time, the at least one candidate topic corresponding to the at least one activity name in the second database and not corresponding to the at least one activity name included in the text data indicating the conversation between the adult and the child during the current predetermined period of time recorded in the first database;
    selecting from the at least one candidate topic one topic to provide to the adult and the child;
    generating voice data containing the one topic; and
    outputting the generated voice data via the speaker.

2. The device according to claim 1, wherein the second database further stores movement amount information indicating an amount of movement corresponding to the activity name, audio level information indicating an audio level corresponding to the activity name, and date information indicating a date corresponding to the activity name, in the extracting, specifying the newest activity name based on the second database and extracting, as the at least one candidate topic, at least one second activity name different from the newest activity name and the at least one activity name included in the text data, and in the selecting, selecting, as the one topic, a third activity name from the at least one second activity name based on a first movement amount corresponding to the newest activity name, a first audio level corresponding to the newest activity name, a second movement amount corresponding to the at least one second activity name among the activity names, and a second audio level corresponding to the at least one second activity name.

3. The device according to claim 2, wherein in the selecting, selecting, as the third activity name, the second activity name having the largest sum calculated according to the following formula: (A−B)² + (C−D)², where A represents the first movement amount, B represents the second movement amount, C represents the first audio level, and D represents the second audio level.

4. The device according to claim 2, wherein in the extracting, extracting, as the at least one candidate topic, at least one second activity name different from the newest activity name and the at least one activity name included in the text data, the at least one second activity name being recorded in a second predetermined period of time.

5. The device according to claim 2, wherein the movement amount information is a value obtained by multiplying a first coefficient by the movement amount, and the audio level information is a value obtained by multiplying a second coefficient by the audio level.

6. The device according to claim 2, wherein in the generating, based on the second database, when a third movement amount corresponding to the third activity name is equal to or greater than a first threshold value, generating the voice data containing a second key phrase and, based on the second database, when the third movement amount corresponding to the third activity name is less than the first threshold value, generating the voice data containing a third key phrase.

7. The device according to claim 6, wherein the second key phrase and the third key phrase contain phrasing providing feedback on the child's engagement level in the third activity name, and a meaning indicated by the second key phrase is the opposite of a meaning indicated by the third key phrase.

8. The device according to claim 2, wherein in the generating, based on the second database, when a third audio level corresponding to the third activity name is equal to or greater than a first threshold value, generating the voice data containing a second key phrase and, based on the second database, when the third audio level corresponding to the third activity name is less than the first threshold value, generating the voice data containing a third key phrase.

9. The device according to claim 8, wherein the second key phrase and the third key phrase contain phrasing providing feedback on the child's engagement level in the third activity name, and a meaning indicated by the second key phrase is the opposite of a meaning indicated by the third key phrase.

10. The device according to claim 1, wherein the feature value contains a voice-print of a speaker from whom a voice issues.

11. The device according to claim 1, wherein the first key phrase includes wording that indicates the topic.

12. A robot comprising: the device according to claim 1; a casing incorporating the device; and a displacement mechanism displacing the casing.

13. A method in a device performing voice interaction with a plurality of users, wherein the device includes a processor and a non-transitory memory, the method comprising:
    obtaining image data of an area around the device via a sensor;
    obtaining audio of the area around the device via a microphone;
    identifying a person contained in the obtained image data based on the obtained image data and a plurality of image data stored in a memory storing a plurality of image data corresponding to the plurality of users, and outputting user information indicating the identified person, the plurality of users including an adult and a child;
    extracting a voice from the obtained audio, extracting a feature value of the voice and text data corresponding to the voice, and associating the text data with the feature value and recording the associated text data and feature value in a first database;
    first determining, based on the user information and the first database, whether the adult and the child are conversing, and when the adult and
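
The selection rule in claims 2 and 3 and the threshold rule in claim 6 can be sketched in Python as follows. This is a reading of the claim language under stated assumptions, not the patentee's code: the `Activity` record, the function names, the sample values, and the example phrases are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Hypothetical record for one row of the child activity database."""
    name: str
    movement: float     # movement amount (claim 2)
    audio_level: float  # audio level (claim 2)


def select_topic(newest, candidates):
    """Pick the candidate with the largest (A-B)^2 + (C-D)^2 (claim 3).

    A and C are the movement amount and audio level of the newest activity;
    B and D are those of each candidate activity.
    """
    def score(candidate):
        return ((newest.movement - candidate.movement) ** 2
                + (newest.audio_level - candidate.audio_level) ** 2)
    return max(candidates, key=score)


def choose_key_phrase(activity, threshold, high_phrase, low_phrase):
    """Return high_phrase when the movement amount meets the threshold,
    otherwise low_phrase (claim 6). The phrases themselves are assumptions."""
    return high_phrase if activity.movement >= threshold else low_phrase


newest = Activity("reading", movement=1.0, audio_level=2.0)
candidates = [
    Activity("dancing", movement=9.0, audio_level=8.0),
    Activity("drawing", movement=2.0, audio_level=3.0),
]

topic = select_topic(newest, candidates)
phrase = choose_key_phrase(topic, 5.0, "moved around a lot", "stayed fairly still")
```

With these sample values, "dancing" scores 64 + 36 = 100 against 1 + 1 = 2 for "drawing", so the sketch selects the activity that differs most from the newest one.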

Assignees

Inventors

Classifications

  • Mobile robot · CPC title

  • Speech recognition using non-acoustical features · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • Sensing device · CPC title

  • Execution procedure of a spoken command · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10650815B2 cover?
A topic providing device includes a candidate topic extractor, a provided topic determiner, a voice synthesizer, and a speaker. When a determination is made that a parent and child are conversing and that there is a need to provide a new topic to the parent and child, based on a conversation history database and a child activity database storing at least one activity name indicating an activity…
Who is the assignee on this patent?
Panasonic Ip Man Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date May 12, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).