US10650802B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10650802-B2 |
| Application number | US-201816019701-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 27, 2018 |
| Priority date | Jul 5, 2017 |
| Publication date | May 12, 2020 |
| Grant date | May 12, 2020 |
Abstract
A voice recognition method is provided that includes extracting a first speech from the sound collected with a microphone connected to a voice processing device, and calculating a recognition result for the first speech and the confidence level of the first speech. The method also includes outputting speech that makes a repetition request, based on the calculated confidence level of the first speech, and extracting, with the microphone, a second speech obtained through the repetition request. The method further includes calculating a recognition result for the second speech and the confidence level of the second speech, and generating a recognition result from the recognition results for the first and second speeches, based on the calculated confidence level of the second speech.
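The abstract leaves both the confidence measure and the rule for combining the two attempts open; claims 1 and 2 make one version concrete (product of per-phoneme maximum probabilities, then dictionary lookup on the high-probability phonemes of both attempts). The Python sketch that follows is a hedged reading of those claims, not the patentee's implementation. The per-position probability dicts, every function name, the toy dictionary, and both threshold values are hypothetical, and matching a word by set containment of the confident phonemes (ignoring their order and position) is a simplifying assumption about "extracting a word including the extracted phonemes".

```python
from math import prod
from typing import Callable, Optional

# Hypothetical input format (not from the patent): one dict per phoneme
# position, mapping every phoneme label to its occurrence probability there.
Posteriors = list[dict[str, float]]
PhonemeString = list[tuple[str, float]]


def best_phoneme_string(posteriors: Posteriors) -> PhonemeString:
    """Most probable phoneme at each position, kept with its probability."""
    return [max(dist.items(), key=lambda kv: kv[1]) for dist in posteriors]


def confidence(string: PhonemeString) -> float:
    """Product of the winning probabilities (claim 1's first or second value)."""
    return prod(p for _, p in string)


def candidate_words(first: PhonemeString, second: PhonemeString,
                    dictionary: dict[str, list[str]],
                    second_threshold: float) -> list[str]:
    """Words whose phonemes include every phoneme scoring above second_threshold
    in either attempt. Set containment ignores phoneme order (a simplification)."""
    wanted = {ph for ph, p in first + second if p > second_threshold}
    return [word for word, phs in dictionary.items() if wanted <= set(phs)]


def resolve(first: PhonemeString, second: PhonemeString,
            dictionary: dict[str, list[str]], second_threshold: float,
            confirm: Callable[[str], bool]) -> Optional[str]:
    """Accept a unique candidate; otherwise ask about each one in turn (claim 2)."""
    words = candidate_words(first, second, dictionary, second_threshold)
    if len(words) == 1:
        return words[0]
    for word in words:
        if confirm(word):  # stands in for the spoken yes/no question
            return word
    return None  # no candidate, or every candidate was rejected


if __name__ == "__main__":
    dictionary = {"cat": ["k", "a", "t"], "cap": ["k", "a", "p"], "kit": ["k", "i", "t"]}
    first = best_phoneme_string([{"k": 0.9, "g": 0.1}, {"a": 0.5, "o": 0.4, "e": 0.1},
                                 {"t": 0.6, "d": 0.4}])
    second = best_phoneme_string([{"k": 0.6, "g": 0.4}, {"a": 0.8, "o": 0.2},
                                  {"d": 0.5, "t": 0.45, "p": 0.05}])
    first_threshold, second_threshold = 0.5, 0.7  # hypothetical values
    # Both attempts are unconfident, so fall back to dictionary matching.
    print(confidence(first) < first_threshold, confidence(second) < first_threshold)
    print(candidate_words(first, second, dictionary, second_threshold))
    print(resolve(first, second, dictionary, second_threshold, lambda w: w == "cap"))
```

Running the demo prints `True True`, then `['cat', 'cap']`, then `cap`. Both attempts score below the first threshold (0.27 and 0.24 against 0.5), only /k/ from the first attempt and /a/ from the second clear 0.7, and two dictionary entries contain both, so the `confirm` callback decides, as in claim 2.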
Claims (preview)
What is claimed is:

1. A voice recognition method, comprising: receiving, via a microphone, a first speech that a speaker makes intending one word, the first speech including N phonemes, where N is a natural number of 2 or more; calculating occurrence probabilities of all kinds of phonemes for each of the N phonemes included in the first speech; recognizing a phoneme string, in which phonemes each having the highest probability are lined in order, to be a first phoneme string corresponding to the first speech, the phonemes corresponding to the respective N phonemes from a first phoneme to an N-th phoneme included in the first speech; calculating a first value by multiplying together occurrence probabilities that the N phonemes included in the first phoneme string have; when the first value is smaller than a first threshold, outputting a voice to prompt the speaker to repeat the one word, via a loudspeaker; receiving, via the microphone, a second speech that the speaker repeats intending the one word, the second speech including M phonemes, where M is a natural number of 2 or more; calculating occurrence probabilities of all kinds of phonemes for each of the M phonemes included in the second speech; recognizing a phoneme string, in which phonemes each having the highest probability are lined in order, to be a second phoneme string corresponding to the second speech, the phonemes corresponding to the respective M phonemes from a first phoneme to an M-th phoneme included in the second speech; calculating a second value by multiplying together occurrence probabilities that the M phonemes included in the second phoneme string have; when the second value is smaller than the first threshold, extracting a phoneme having occurrence probability higher than a second threshold out of the first phoneme string and a phoneme having occurrence probability higher than the second threshold out of the second phoneme string; extracting a word including the extracted phonemes from a dictionary stored in a memory, the dictionary associating words with respective phoneme strings; and when the number of extracted words is one, recognizing the extracted word to be the one word.

2. The voice recognition method according to claim 1, further comprising: when the number of the extracted words is plural, outputting a voice to ask the speaker whether the speaker said each of the extracted words, via the loudspeaker; receiving an affirmative answer or a negative answer from the speaker via the microphone; and recognizing a word corresponding to the affirmative answer to be the one word.

3. A non-transitory computer-readable recording medium, storing a program that causes a computer to execute the voice recognition method according to claim 1.

4. A voice recognition method, comprising: receiving, via a microphone, a first speech that a speaker makes intending one word string, the first speech including N phonemes, where N is a natural number of 2 or more; calculating a confidence level $X_1$ of a word string estimated for the first speech:

$$X_1 = \max \prod_{t=1}^{T} P_{A1}(o_t, s_t \mid s_{t-1})\, P_{L1}(s_t, s_{t-1})$$

where $t$ is a number specifying one of frames constituting the first speech, $T$ is the total number of the frames constituting the first speech, $P_{A1}(o_t, s_t \mid s_{t-1})$ is a probability that a certain phoneme appears at a $t$-th frame, which is next to a phoneme string corresponding to a state $s_{t-1}$ of from a first frame to a $(t-1)$-th frame of the first speech, and the phoneme string corresponding to the state $s_{t-1}$ transitions to a phoneme string corresponding to a state $s_t$, $o_t$ is a physical quantity that is for estimating the certain phoneme and is obtained from the first speech, the certain phoneme is one of all kinds of phonemes, and $P_{L1}(s_t, s_{t-1})$ is a probability that a certain word appears at a $t$-th frame next to a word string corresponding to a state $s_{t-1}$, and the word string corresponding to the state $s_{t-1}$ transitions to a word string corresponding to a state $s_t$ in the first speech; determining whether the confidence level $X_1$ is higher than or equal to a threshold; when the confidence level $X_1$ is lower than the threshold, outputting a voice to prompt the speaker to repeat the one word string, via a loudspeaker; receiving, via the microphone, a second speech that the speaker repeats intending the one word string; when the confidence level $X_1$ of the second speech is lower than the threshold, calculating a combined confidence level $X$ for each of all word strings estimated from the first speech and the second speech: $X = \prod_{t=1}^{T}$
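Each factor in the $X_1$ formula of claim 4 depends only on the pair $(s_{t-1}, s_t)$, so the maximum over state sequences can be computed frame by frame with a Viterbi-style max-product recursion rather than by enumerating every sequence. The claim does not say how the maximum is evaluated, so the following Python sketch is an assumption-laden illustration: it presumes a small explicit state set, a fixed start state $s_0$, and caller-supplied functions `acoustic(t, s, s_prev)` for $P_{A1}(o_t, s_t \mid s_{t-1})$ (with $o_t$ implied by the frame index) and `language(s, s_prev)` for $P_{L1}(s_t, s_{t-1})$. All of these names are hypothetical, and a practical decoder would search a far larger state space with pruning.

```python
import math
from collections.abc import Callable, Hashable, Sequence

State = Hashable  # hypothetical: any hashable label for a decoder state


def max_product_confidence(T: int, states: Sequence[State], start: State,
                           acoustic: Callable[[int, State, State], float],
                           language: Callable[[State, State], float]) -> float:
    """Viterbi-style evaluation of X = max over s_1..s_T of
    prod_{t=1..T} acoustic(t, s_t, s_{t-1}) * language(s_t, s_{t-1}),
    with s_0 = start and T >= 1.

    Scores are kept in log space so long utterances do not underflow;
    zero probabilities map to -inf and can never win the max.
    """
    def logp(x: float) -> float:
        return math.log(x) if x > 0 else -math.inf

    # best[s]: best log score of any state path over frames 1..t ending in s
    best = {s: logp(acoustic(1, s, start)) + logp(language(s, start)) for s in states}
    for t in range(2, T + 1):
        best = {
            s: max(best[p] + logp(acoustic(t, s, p)) + logp(language(s, p)) for p in states)
            for s in states
        }
    return math.exp(max(best.values()))


if __name__ == "__main__":
    # Toy two-state example with made-up probabilities.
    def toy_acoustic(t: int, s: str, p: str) -> float:
        return 0.7 if (t, s) in {(1, "a"), (2, "b")} else 0.3

    def toy_language(s: str, p: str) -> float:
        return 0.9 if s != p else 0.1

    print(round(max_product_confidence(2, ["a", "b"], "^", toy_acoustic, toy_language), 4))
```

In the toy example the best path is `a` then `b`, giving $0.7 \cdot 0.9 \cdot 0.7 \cdot 0.9 = 0.3969$, which matches a brute-force check of all four two-frame sequences.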
CPC classification titles:
- Execution procedure of a spoken command
- Procedures used during a speech recognition process, e.g. man-machine dialogue
- Phonemes, fenemes or fenones being the recognition units
- Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- Speech to text systems (G10L15/08 takes precedence)