Voice interaction apparatus and voice interaction method
US-10388279-B2 · Aug 20, 2019 · US
US10650815B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10650815-B2 |
| Application number | US-201715834030-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 6, 2017 |
| Priority date | Dec 14, 2016 |
| Publication date | May 12, 2020 |
| Grant date | May 12, 2020 |
Official abstract text for this publication.
A topic providing device includes a candidate topic extractor, a provided topic determiner, a voice synthesizer, and a speaker. When a determination is made that a parent and child are conversing and that there is a need to provide a new topic to the parent and child, based on a conversation history database and a child activity database storing at least one activity name indicating an activity the child was engaged in for a first predetermined period of time, the candidate topic extractor extracts at least one candidate topic that corresponds to the at least one activity name in the child activity database and does not correspond to an activity name included in text data recorded in a first database. From the at least one candidate topic, the provided topic determiner selects one topic to provide to the parent and the child. The voice synthesizer generates voice data containing the one topic. The speaker outputs the voice data.
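The candidate extraction step described in the abstract amounts to a filter: keep the activity names from the child activity database that have not already come up in the recorded conversation. The following is a minimal sketch of that idea, not the patented implementation. The function name `extract_candidate_topics`, the sample data, and the case-insensitive substring match are all assumptions made for illustration; the source does not state how "corresponds to" is tested.

```python
from typing import Iterable, List


def extract_candidate_topics(
    activity_names: Iterable[str],
    conversation_text: Iterable[str],
) -> List[str]:
    """Return activity names that do not appear in the recorded conversation."""
    # Assumption: correspondence is tested with a naive, case-insensitive
    # substring match over the joined utterances.
    spoken = " ".join(conversation_text).lower()
    candidates: List[str] = []
    for name in activity_names:
        # Skip activities already mentioned, and avoid duplicate candidates.
        if name.lower() not in spoken and name not in candidates:
            candidates.append(name)
    return candidates


# Hypothetical sample data standing in for the two databases.
activities = ["drawing", "dancing", "reading", "dancing"]
utterances = ["What did you do today?", "I was reading a book."]
print(extract_candidate_topics(activities, utterances))
```

Here "reading" is dropped because it already appears in the conversation text, leaving the remaining activities as candidates for the topic determiner.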
Opening claim text (preview).
What is claimed is:
1. A device performing voice interaction with a plurality of users, the device comprising: a sensor obtaining image data of an area around the device; a microphone obtaining audio of the area around the device; a speaker; a processor; and a non-transitory memory storing thereon a computer program, which when executed by the processor, causes the processor to perform operations including storing a plurality of image data corresponding to the plurality of users, the plurality of users including an adult and a child; identifying a person contained in the obtained image data based on the obtained image data and the stored plurality of image data, and outputting user information indicating the identified person; extracting a voice from the obtained audio, extracting a feature value of the voice and text data corresponding to the voice, and associating the text data with the feature value and recording the associated text data and feature value in a first database; first determining, based on the user information and the first database, whether the adult and the child are conversing, and determining that the adult and the child are conversing when the adult and the child are the identified persons and the feature value contains a plurality of mutually dissimilar feature values; second determining, based on the first database, whether there is a need to provide a new topic to the adult and the child when the adult and the child are determined to be conversing, and determining that there is a need to provide a new topic to the adult and the child when a first key phrase is contained in the text data indicating the conversation between the adult and the child during a current predetermined period of time; extracting at least one candidate topic based on the first database and a second database when providing the topic is determined to be necessary, the second database storing at least one activity name indicating an activity the child was engaged in for a first predetermined period of time, which is earlier than the current predetermined period of time, the at least one candidate topic corresponding to the at least one activity name in the second database and not corresponding to the at least one activity name included in the text data indicating the conversation between the adult and the child during the current predetermined period of time recorded in the first database; selecting from the at least one candidate topic one topic to provide to the adult and the child; generating voice data containing the one topic; and outputting the generated voice data via the speaker.
2. The device according to claim 1, wherein the second database further stores movement amount information indicating an amount of movement corresponding to the activity name, audio level information indicating an audio level corresponding to the activity name, and date information indicating a date corresponding to the activity name, in the extracting, specifying the newest activity name based on the second database and extracting, as the at least one candidate topic, at least one second activity name different from the newest activity name and the at least one activity name included in the text data, and in the selecting, selecting, as the one topic, a third activity name from the at least one second activity name based on a first movement amount corresponding to the newest activity name, a first audio level corresponding to the newest activity name, a second movement amount corresponding to the at least one second activity name among the activity names, and a second audio level corresponding to the at least one second activity name.
3. The device according to claim 2, wherein in the selecting, selecting, as the third activity name, the second activity name having the largest sum calculated according to the following formula: $(A - B)^2 + (C - D)^2$ where A represents the first movement amount, B represents the second movement amount, C represents the first audio level, and D represents the second audio level.
4. The device according to claim 2, wherein in the extracting, extracting, as the at least one candidate topic, at least one second activity name different from the newest activity name and the at least one activity name included in the text data, the at least one second activity name being recorded in a second predetermined period of time.
5. The device according to claim 2, wherein the movement amount information is a value obtained by multiplying a first coefficient by the movement amount, and the audio level information is a value obtained by multiplying a second coefficient by the audio level.
6. The device according to claim 2, wherein in the generating, based on the second database, when a third movement amount corresponding to the third activity name is equal to or greater than a first threshold value generating the voice data containing a second key phrase and, based on the second database, when the third movement amount corresponding to the third activity name is less than the first threshold value, generating the voice data containing a third key phrase.
7. The device according to claim 6, wherein the second key phrase and the third key phrase contain phrasing providing feedback on the child's engagement level in the third activity name, and a meaning indicated by the second key phrase is the opposite of a meaning indicated by the third key phrase.
8. The device according to claim 2, wherein in the generating based on the second database, when a third audio level corresponding to the third activity name is equal to or greater than a first threshold value, generating the voice data containing a second key phrase and, based on the second database, when the third audio level corresponding to the third activity name is less than the first threshold value, generating the voice data containing a third key phrase.
9. The device according to claim 8, wherein the second key phrase and the third key phrase contain phrasing providing feedback on the child's engagement level in the third activity name, and a meaning indicated by the second key phrase is the opposite of a meaning indicated by the third key phrase.
10. The device according to claim 1, wherein the feature value contains a voice-print of a speaker from whom a voice issues.
11. The device according to claim 1, wherein the first key phrase includes wording that indicates the topic.
12. A robot comprising: the device according to claim 1; a casing incorporating the device; and a displacement mechanism displacing the casing.
13. A method in a device performing voice interaction with a plurality of users, wherein the device includes a processor and a non-transitory memory, the method comprising: obtaining image data of an area around the device via a sensor; obtaining audio of the area around the device via a microphone; identifying a person contained in the obtained image data based on the obtained image data and a plurality of image data stored in a memory storing a plurality of image data corresponding to the plurality of users, and outputting user information indicating the identified person, the plurality of users including an adult and a child; extracting a voice from the obtained audio, extracting a feature value of the voice and text data corresponding to the voice, and associating the text data with the feature value and recording the associated text data and feature value in a first database; first determining, based on the user information and the first database, whether the adult and the child are conversing, and when the adult and
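The selection rule in claim 3 and the threshold rule in claim 6 can be illustrated with a short sketch. Claim 3 picks the candidate activity whose movement amount and audio level differ most from the newest activity, measured by the sum of squared differences; claim 6 then chooses between two key phrases depending on whether the selected activity's movement amount reaches a threshold. Everything named below (`Activity`, `select_topic`, `feedback_phrase`, the sample values, the threshold, and the phrase wording) is hypothetical and chosen only for illustration; the claims do not specify any of them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Activity:
    name: str
    movement: float     # movement amount recorded for the activity (claim 2)
    audio_level: float  # audio level recorded for the activity (claim 2)


def select_topic(newest: Activity, candidates: List[Activity]) -> Activity:
    """Pick the candidate maximizing (A - B)^2 + (C - D)^2 from claim 3."""
    def score(candidate: Activity) -> float:
        # A, C belong to the newest activity; B, D belong to the candidate.
        return ((newest.movement - candidate.movement) ** 2
                + (newest.audio_level - candidate.audio_level) ** 2)
    # max() raises ValueError on an empty list; callers should check first.
    return max(candidates, key=score)


def feedback_phrase(topic: Activity, threshold: float) -> str:
    """Choose a key phrase by comparing movement to a threshold (claim 6)."""
    # The wording of both phrases is an assumption; claim 7 only requires
    # that the two phrases carry opposite meanings about engagement.
    if topic.movement >= threshold:
        return f"You were really into {topic.name} today."
    return f"You did not seem that into {topic.name} today."


# Hypothetical sample values.
newest = Activity("reading", movement=2.0, audio_level=1.0)
candidates = [
    Activity("dancing", movement=8.0, audio_level=6.0),
    Activity("drawing", movement=3.0, audio_level=2.0),
]
topic = select_topic(newest, candidates)
print(topic.name, "|", feedback_phrase(topic, threshold=5.0))
```

With these sample values, "dancing" scores 36 + 25 = 61 against 1 + 1 = 2 for "drawing", so the activity least like the newest one is selected. Claim 8 describes the same threshold test applied to the audio level instead of the movement amount.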
Mobile robot · CPC title
Speech recognition using non-acoustical features · CPC title
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title
Sensing device · CPC title
Execution procedure of a spoken command · CPC title