Speech recognition method and electronic apparatus
US-2015112675-A1 · Apr 23, 2015 · US
US10403274B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10403274-B2 |
| Application number | US-201615264722-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 14, 2016 |
| Priority date | Sep 15, 2015 |
| Publication date | Sep 3, 2019 |
| Grant date | Sep 3, 2019 |
Official abstract text for this publication.
An automatic speech recognition device with detection of at least one contextual element, and its application to aircraft flying and maintenance, are provided. The automatic speech recognition device comprises a unit for acquiring an audio signal, a device for detecting the state of at least one contextual element, and a language decoder for determining an oral instruction corresponding to the audio signal. The language decoder comprises at least one acoustic model defining an acoustic probability law and at least two syntax models each defining a syntax probability law. The language decoder also comprises an oral instruction construction algorithm implementing the acoustic model and a plurality of active syntax models taken from among the syntax models, a contextualization processor to select, based on the state of the or each contextual element detected by the detection device, at least one syntax model from among the plurality of active syntax models, and a processor for determining the oral instruction corresponding to the audio signal.
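The decoding flow summarized in the abstract can be illustrated with a small sketch. It is not the patent's implementation: it assumes one phoneme per frame, models each syntax probability law as a bigram table, and uses a Viterbi search to maximize the product of acoustic and syntax probabilities. All names (`best_sequence`, `recognize`, `CONTEXT_TO_MODELS`) and all probability values are hypothetical.

```python
import math


def _log(p):
    """Log of a probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def best_sequence(acoustic, syntax, phonemes):
    """Viterbi search for the phoneme sequence maximizing the product of
    acoustic and syntax probabilities (assumption: one phoneme per frame,
    syntax law given as bigram probabilities keyed by (previous, current))."""
    # scores[p] = (best log-score of a path ending in p, that path)
    scores = {
        p: (_log(acoustic[0][p]) + _log(syntax.get((None, p), 0.0)), [p])
        for p in phonemes
    }
    for frame in acoustic[1:]:
        new_scores = {}
        for p in phonemes:
            prev = max(
                phonemes,
                key=lambda q: scores[q][0] + _log(syntax.get((q, p), 0.0)),
            )
            total = (scores[prev][0]
                     + _log(syntax.get((prev, p), 0.0))
                     + _log(frame[p]))
            new_scores[p] = (total, scores[prev][1] + [p])
        scores = new_scores
    return max(scores.values(), key=lambda item: item[0])


def recognize(acoustic, syntax_models, context_to_models, state, phonemes):
    """Build one candidate per active syntax model, then keep the candidate
    of the model(s) selected by the detected context state."""
    candidates = {
        name: best_sequence(acoustic, model, phonemes)
        for name, model in syntax_models.items()
    }
    selected = context_to_models[state]
    best = max(selected, key=lambda name: candidates[name][0])
    return best, candidates[best][1]


# Hypothetical toy data: two phonemes, two frames, two syntax models.
PHONEMES = ["a", "b"]
ACOUSTIC = [{"a": 0.6, "b": 0.4}, {"a": 0.5, "b": 0.5}]
SYNTAX_MODELS = {
    "left": {(None, "a"): 0.9, (None, "b"): 0.1, ("a", "a"): 0.1,
             ("a", "b"): 0.9, ("b", "a"): 0.5, ("b", "b"): 0.5},
    "right": {(None, "a"): 0.1, (None, "b"): 0.9, ("a", "a"): 0.5,
              ("a", "b"): 0.5, ("b", "a"): 0.2, ("b", "b"): 0.8},
}
CONTEXT_TO_MODELS = {
    "gaze_left": ["left"],
    "gaze_right": ["right"],
    "gaze_between": ["left", "right"],
}

print(recognize(ACOUSTIC, SYNTAX_MODELS, CONTEXT_TO_MODELS,
                "gaze_right", PHONEMES))
```

In this toy setup the "left" model yields the higher-scoring candidate overall, but when the detected state is `gaze_right` only the "right" model is selected, so its candidate is returned. When several models are selected (`gaze_between`), the candidate with the maximal product wins.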
Opening claim text (preview).
What is claimed is:

1. An automatic speech recognition device comprising: an acquisition unit for acquiring an audio signal, a forming member for forming the audio signal, to divide the audio signal into frames, a detection device, and a language decoder for determining an oral instruction corresponding to the audio signal, the detection device being a gaze detector configured to detect which of a plurality of states is represented by a direction of a user's gaze and/or a pointing detector configured to detect which of a plurality of states is represented by a position of a pointing member, the language decoder comprising: at least one acoustic model defining an acoustic probability law for calculating, for each phoneme of a sequence of phonemes, an acoustic probability of that phoneme and a corresponding frame of the audio signal matching; at least two different syntax models, each of the syntax models being associated with a respective one of the states of the direction of the user's gaze detected by the gaze detector and/or one of the states of the position of the pointing member detected by the pointing detector or a respective combination of the states, each of the syntax models being definable as active or inactive, each of the active syntax models defining a different respective syntax probability law for calculating, for each phoneme of a sequence of phonemes analyzed using said acoustic model, a different respective syntax probability of that phoneme following the phoneme or group of phonemes preceding said phoneme in the sequence of phonemes; an oral instruction construction algorithm implementing the acoustic model and a plurality of the active syntax models from among the syntax models to build, for each active syntax model, a candidate sequence of phonemes associated with said active syntax model so that the product of the acoustic and the respective different syntax probabilities of the different phonemes making up said candidate sequence of phonemes is maximal; a contextualization processor to select at least one syntax model selected from among the plurality of active syntax models based on the state of the direction of the user's gaze detected by the gaze detector and/or the state of the position of the pointing member detected by the pointing detector; and a determination processor for determining the oral instruction corresponding to the audio signal, to define the candidate sequence of phonemes associated with the selected syntax model or, if several syntax models are selected, the sequence of phonemes, from among the candidate sequences of phonemes associated with the selected acoustic models, for which the product of the acoustic and syntax probabilities of different phonemes making up said sequence of phonemes is maximal, as constituting the oral instruction corresponding to the audio signal.

2. The automatic speech recognition device according to claim 1, wherein the contextualization processor is configured for: assigning, based on the state of the direction of the user's gaze detected by the gaze detector and/or the state of the position of the pointing member detected by the pointing detector, an order number to each active syntax model, seeking, among the active syntax models, candidate syntax models with which candidate sequences of phonemes are associated for which the product of the acoustic and syntax probabilities of the different phonemes making up said candidate sequences of phonemes is above a predetermined threshold, and selecting the candidate syntax model(s) having the highest order number.

3. The automatic speech recognition device according to claim 1, wherein the pointing member is a cursor.

4. The automatic speech recognition device as recited in claim 1 wherein the contextualization processor is configured for: assigning, based on the state of the direction of the user's gaze detected by the gaze detector and/or the state of the position of the pointing member detected by the pointing detector, an order number to each active syntax model, seeking, among the active syntax models, candidate syntax models with which candidate sequences of phonemes are associated for which the product of the acoustic and syntax probabilities of the different phonemes making up said candidate sequences of phonemes is above a predetermined threshold, and selecting the candidate syntax model(s) having the highest order number, the automatic speech recognition device further comprising a display device displaying objects, each syntax model being associated with a respective object from among the displayed objects, the contextualization processor being configured for assigning an order number thereof to each syntax model based on the distance between the direction of the user's gaze or the position of the pointer and the displayed object with which said syntax model is associated.

5. An assistance system to assist with the piloting or maintenance of an aircraft, comprising: the automatic speech recognition device according to claim 1; and a command execution unit configured to execute the oral instruction corresponding to the audio signal.

6. An automatic speech recognition method comprising: determining an oral instruction corresponding to an audio signal, the determining of the oral instruction being implemented by an automatic speech recognition device comprising: at least one acoustic model defining an acoustic probability law for calculating, for each phoneme of a sequence of phonemes, an acoustic probability of that phoneme and a corresponding frame of the audio signal matching, at least two different syntax models, each of the syntax models being associated with a respective state of a direction of a user's gaze and/or of a position of a pointing member or a respective combination of the states, each of the syntax models being definable as active or inactive, each of the active syntax models defining a different respective syntax probability law for calculating, for each phoneme of a sequence of phonemes analyzed using said acoustic model, a different respective syntax probability of that phoneme following the phoneme or group of phonemes preceding said phoneme in the sequence of phonemes, wherein the determining of the oral instruction comprises: acquiring the audio signal, detecting a detected state represented by a direction of a user's gaze and/or by a position of a pointing member, activating a plurality of syntax models forming active syntax models, forming the audio signal, said forming comprising dividing the audio signal into frames, building, for each active syntax model, using the acoustic model and said active syntax model, a candidate sequence of phonemes associated with said active syntax model so that the product of the acoustic and the respective different syntax probabilities of the different phonemes making up said candidate sequence of phonemes is maximal, selecting at least one syntax model from among the active syntax models based on the detected state of the direction of the user's gaze and/or the detected state of the position of the pointing member, and defining the candidate sequence of phonemes associated with the selected syntax model or, if several syntax models are selected, the sequence of phonemes, from among the candidate sequences of phonemes associated with the selected syntax models, for which the product of the acoustic and syntax probabilities of different phonemes making up said sequence of phonemes is maximal, as constituting the oral instruction corresponding to the audio signal.

7. The automatic speech recognition method according to claim 6, wherein the selection step comprises the following sub-steps: assigning, based on the detected state of the direction of the user's gaze and/or
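The order-number selection recited in claims 2 and 4 can be sketched as follows. This is an illustrative reading only. The claims say the order number depends on the distance between the gaze direction (or pointer position) and the displayed object, but not how. The sketch assumes that the closest object receives the highest order number and that objects at equal distance share one. The function names, coordinates, scores, and threshold are all hypothetical.

```python
import math


def assign_order_numbers(gaze_xy, object_positions):
    """Give each syntax model an order number from the distance between the
    gaze point and its displayed object.
    Assumption: closest object -> highest order number; ties share a number."""
    distances = {
        model: math.dist(gaze_xy, pos)
        for model, pos in object_positions.items()
    }
    farthest_first = sorted(set(distances.values()), reverse=True)
    return {
        model: farthest_first.index(d) + 1
        for model, d in distances.items()
    }


def select_models(candidate_scores, order_numbers, threshold):
    """Keep models whose candidate score (product of acoustic and syntax
    probabilities) is above the threshold, then return those with the
    highest order number."""
    eligible = [m for m, s in candidate_scores.items() if s > threshold]
    if not eligible:
        return []
    top = max(order_numbers[m] for m in eligible)
    return [m for m in eligible if order_numbers[m] == top]


# Hypothetical display layout: one syntax model per displayed object.
OBJECTS = {"radio": (0.0, 0.0), "map": (10.0, 0.0), "autopilot": (3.0, 4.0)}
GAZE = (1.0, 0.0)
# Hypothetical candidate scores, one per active syntax model.
SCORES = {"radio": 1e-6, "map": 0.2, "autopilot": 0.05}

ORDER = assign_order_numbers(GAZE, OBJECTS)
print(ORDER)
print(select_models(SCORES, ORDER, threshold=1e-3))
```

Here the user looks closest to the radio, but the radio model's candidate falls below the threshold, so the next-closest eligible model ("autopilot") is selected instead of the better-scoring but more distant "map" model.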
Remote controls · CPC title
Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title
of application context · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title