Voice recognition method, recording medium, voice recognition device, and robot

US10650802B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10650802-B2
Application number  US-201816019701-A
Country             US
Kind code           B2
Filing date         Jun 27, 2018
Priority date       Jul 5, 2017
Publication date    May 12, 2020
Grant date          May 12, 2020

Abstract

Official abstract text for this publication.

A voice recognition method is provided that includes extracting a first speech from the sound collected with a microphone connected to a voice processing device, and calculating a recognition result for the first speech and the confidence level of the first speech. The method also includes performing a speech for a repetition request based on the calculated confidence level of the first speech, and extracting with the microphone a second speech obtained through the repetition request. The method further includes calculating a recognition result for the second speech and the confidence level of the second speech, and generating a recognition result from the recognition result for the first speech and the recognition result for the second speech, based on the confidence level of the calculated second speech.
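
The control flow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the callbacks `listen`, `recognize`, `speak`, and `combine`, the prompt text, and the single `threshold` parameter are all hypothetical names introduced here.

```python
# Hypothetical prompt text; the abstract only says a repetition request is spoken.
REPEAT_PROMPT = "Could you say that again?"

def recognize_with_repetition(listen, recognize, speak, combine, threshold):
    """Two-pass recognition with a repetition request.

    listen()      -> captured utterance (assumed callback)
    recognize(u)  -> (result, confidence) for an utterance (assumed callback)
    speak(text)   -> voices a prompt (assumed callback)
    combine(a, b) -> merged result from two (result, confidence) pairs
    """
    first = recognize(listen())
    if first[1] >= threshold:
        return first[0]              # first utterance was confident enough

    speak(REPEAT_PROMPT)             # ask the speaker to repeat
    second = recognize(listen())
    if second[1] >= threshold:
        return second[0]             # the repetition was confident enough

    # Both attempts were uncertain: build a result from both of them.
    return combine(first, second)
```

Passing the steps in as callbacks keeps the sketch independent of any particular microphone, recognizer, or loudspeaker API.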

First claim

Opening claim text (preview).

What is claimed is:

1. A voice recognition method, comprising:
    receiving, via a microphone, a first speech that a speaker makes intending one word, the first speech including N phonemes, where N is a natural number of 2 or more;
    calculating occurrence probabilities of all kinds of phonemes for each of the N phonemes included in the first speech;
    recognizing a phoneme string, in which phonemes each having the highest probability are lined in order, to be a first phoneme string corresponding to the first speech, the phonemes corresponding to the respective N phonemes from a first phoneme to an N-th phoneme included in the first speech;
    calculating a first value by multiplying together occurrence probabilities that the N phonemes included in the first phoneme string have;
    when the first value is smaller than a first threshold, outputting a voice to prompt the speaker to repeat the one word, via a loudspeaker;
    receiving, via the microphone, a second speech that the speaker repeats intending the one word, the second speech including M phonemes, where M is a natural number of 2 or more;
    calculating occurrence probabilities of all kinds of phonemes for each of the M phonemes included in the second speech;
    recognizing a phoneme string, in which phonemes each having the highest probability are lined in order, to be a second phoneme string corresponding to the second speech, the phonemes corresponding to the respective M phonemes from a first phoneme to an M-th phoneme included in the second speech;
    calculating a second value by multiplying together occurrence probabilities that the M phonemes included in the second phoneme string have;
    when the second value is smaller than the first threshold, extracting a phoneme having occurrence probability higher than a second threshold out of the first phoneme string and a phoneme having occurrence probability higher than the second threshold out of the second phoneme string;
    extracting a word including the extracted phonemes from a dictionary stored in a memory, the dictionary associating words with respective phoneme strings; and
    when the number of extracted words is one, recognizing the extracted word to be the one word.

2. The voice recognition method according to claim 1, further comprising:
    when the number of the extracted words is plural, outputting a voice to ask the speaker whether the speaker said each of the extracted words, via the loudspeaker;
    receiving an affirmative answer or a negative answer from the speaker via the microphone; and
    recognizing a word corresponding to the affirmative answer to be the one word.

3. A non-transitory computer-readable recording medium, storing a program that causes a computer to execute the voice recognition method according to claim 1.

4. A voice recognition method, comprising:
    receiving, via a microphone, a first speech that a speaker makes intending one word string, the first speech including N phonemes, where N is a natural number of 2 or more;
    calculating a confidence level X1 of a word string estimated for the first speech

    $$X1 = \max \prod_{t=1}^{T} P_{A1}(o_t, s_t \mid s_{t-1}) \, P_{L1}(s_t, s_{t-1})$$

    where $t$ is a number specifying one of frames constituting the first speech, $T$ is the total number of the frames constituting the first speech, $P_{A1}(o_t, s_t \mid s_{t-1})$ is a probability that a certain phoneme appears at a $t$-th frame, which is next to a phoneme string corresponding to a state $s_{t-1}$ of from a first frame to a $(t-1)$-th frame of the first speech, and the phoneme string corresponding to the state $s_{t-1}$ transitions to a phoneme string corresponding to a state $s_t$, $o_t$ is a physical quantity that is for estimating the certain phoneme and is obtained from the first speech, the certain phoneme is one of all kinds of phonemes, and $P_{L1}(s_t, s_{t-1})$ is a probability that a certain word appears at a $t$-th frame next to a word string corresponding to a state $s_{t-1}$, and the word string corresponding to the state $s_{t-1}$ transitions to a word string corresponding to a state $s_t$ in the first speech;
    determining whether the confidence level X1 is higher than or equal to a threshold;
    when the confidence level X1 is lower than the threshold, outputting a voice to prompt the speaker to repeat the one word string, via a loudspeaker;
    receiving, via the microphone, a second speech that the speaker repeats intending the one word string;
    when the confidence level X1 of the second speech is lower than the threshold, calculating a combined confidence level X for each of all word strings estimated from the first speech and the second speech $X = \prod_{t=1}^{T}$ …
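
The phoneme-level steps of claims 1 and 2 can be illustrated with a short sketch. Everything concrete here is an assumption made for illustration: the data layout (one dict of phoneme probabilities per position), the toy dictionary, the threshold values, the `ask` callback, and the reading of "a word including the extracted phonemes" as simple membership that ignores phoneme position.

```python
from math import prod

def best_phonemes(posteriors):
    """Pick the (phoneme, probability) pair with the highest probability per position."""
    return [max(dist.items(), key=lambda kv: kv[1]) for dist in posteriors]

def string_confidence(best):
    """Multiply the winning probabilities together (the claim's first/second value)."""
    return prod(p for _, p in best)

def reliable_phonemes(best, second_threshold):
    """Keep phonemes whose probability is higher than the second threshold."""
    return {ph for ph, p in best if p > second_threshold}

def candidate_words(first, second, dictionary, second_threshold):
    """Words whose phoneme string includes every reliable phoneme from both utterances.

    Assumption: inclusion is tested as set membership, ignoring position.
    """
    extracted = (reliable_phonemes(best_phonemes(first), second_threshold)
                 | reliable_phonemes(best_phonemes(second), second_threshold))
    return [w for w, phs in dictionary.items() if extracted <= set(phs)]

def confirm(candidates, ask):
    """Claim 2 step: ask about each candidate; `ask` is a hypothetical yes/no callback."""
    for word in candidates:
        if ask(word):
            return word
    return None

# Illustrative data only (not from the patent).
DICTIONARY = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "bat": ["b", "ae", "t"]}
FIRST = [{"k": 0.9, "b": 0.1}, {"ae": 0.6, "eh": 0.4}, {"t": 0.4, "p": 0.35, "d": 0.25}]
SECOND = [{"k": 0.55, "b": 0.45}, {"ae": 0.9, "eh": 0.1}, {"t": 0.85, "p": 0.15}]
FIRST_THRESHOLD = 0.7
SECOND_THRESHOLD = 0.8

first_value = string_confidence(best_phonemes(FIRST))    # 0.9 * 0.6 * 0.4
second_value = string_confidence(best_phonemes(SECOND))  # 0.55 * 0.9 * 0.85
words = candidate_words(FIRST, SECOND, DICTIONARY, SECOND_THRESHOLD)
print(first_value < FIRST_THRESHOLD, second_value < FIRST_THRESHOLD, words)
```

With this data both utterances fall below the first threshold, the reliable phonemes are "k" from the first utterance and "ae" and "t" from the second, and only "cat" contains all three. When more than one word survives, `confirm` stands in for the spoken yes/no exchange of claim 2.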

Assignees

Panasonic Ip Man Co Ltd

Inventors

Classifications

CPC classification titles.

  • Execution procedure of a spoken command

  • Procedures used during a speech recognition process, e.g. man-machine dialogue

  • Phonemes, fenemes or fenones being the recognition units

  • Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams

  • Speech to text systems (G10L15/08 takes precedence)

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10650802B2 cover?
A voice recognition method is provided that includes extracting a first speech from the sound collected with a microphone connected to a voice processing device, and calculating a recognition result for the first speech and the confidence level of the first speech. The method also includes performing a speech for a repetition request based on the calculated confidence level of the first speech,…
Who is the assignee on this patent?
Panasonic Ip Man Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/02. Mapped technology areas include Physics.
When was this patent published?
Publication date May 12, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).