Methods and systems for cockpit speech recognition acoustic model training with multi-level corpus data augmentation

US2020335084A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2020335084-A1
Application number  US-201916388647-A
Country             US
Kind code           A1
Filing date         Apr 18, 2019
Priority date       Apr 18, 2019
Publication date    Oct 22, 2020
Grant date          (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for initializing a device for performing acoustic speech recognition (ASR) using an ASR model, by a computer system including at least one processor and a system memory element. The method includes obtaining a plurality of voice data articulations of predetermined phrases, by the at least one processor via a user interface. The plurality of voice data articulations includes a first quantity of audio samples of actual articulated voice data, and each of the plurality of voice data articulations includes one of the audio samples including acoustic frequency components. The method further includes performing a plurality of augmentations to the plurality of voice data articulations of predetermined phrases, to generate a corpus audio data set that includes the first quantity of audio samples and a second quantity of audio samples including augmented versions of the first quantity of audio samples.

First claim

Opening claim text (preview).

What is claimed is:

1. A method for initializing a device for performing acoustic speech recognition (ASR) using an ASR model, by a computer system comprising at least one processor and a system memory element, the method comprising:

  obtaining a plurality of voice data articulations of predetermined phrases, by the at least one processor via a user interface, wherein the plurality of voice data articulations comprises a first quantity of audio samples of actual articulated voice data, and wherein each of the plurality of voice data articulations comprises one of the audio samples including acoustic frequency components;

  performing a plurality of augmentations to the plurality of voice data articulations of predetermined phrases, to generate a corpus audio data set that includes the first quantity of audio samples and a second quantity of audio samples comprising augmented versions of the first quantity of audio samples, by:

    performing a first level augmentation by processing each of the plurality of voice data articulations to enhance a first subset of the acoustic frequency components and to suppress a second subset of the acoustic frequency components, to generate transformed voice data articulations including a plurality of voice transformations; and

    performing a second level augmentation by processing the transformed voice data articulations, by:

      combining the transformed voice data articulations with noise-based audio data, to generate combined voice data articulations, by the at least one processor; and

      adjusting levels of the noise-based audio data for each of the combined voice data articulations to generate the corpus audio data set including various noise levels, by the at least one processor, wherein each audio sample of the corpus audio data set includes one of the plurality of voice transformations and one of the various noise levels; and

  training the ASR model to perform ASR, using the corpus audio data set, by the at least one processor.

2. The method of claim 1, wherein the device is implemented in an aircraft, and wherein obtaining the plurality of voice articulations is performed using a headset comprising a microphone and a speaker that is communicatively coupled with the aircraft.

3. The method of claim 1, wherein performing the first level augmentation comprises utilizing a voice random transformation algorithm that selects the first and second subsets at random.

4. The method of claim 1, wherein the first subset of the acoustic frequency components comprises frequency components of the same frequency range.

5. The method of claim 1, wherein the second subset of the acoustic frequency components comprises frequency components of the same frequency range.

6. The method of claim 1, wherein the device is implemented in an aircraft, and wherein adjusting the noise-based audio data comprises adjusting cockpit noise profile data.

7. The method of claim 1, wherein the device is implemented in an aircraft, and wherein the method further comprises receiving a flight crew or air traffic control voice communication and automatically recognizing words spoken in the voice communication using the ASR model.

8. The method of claim 7, further comprising automatically performing an aircraft function based on the recognized words spoken.

9. The method of claim 1, further comprising generating an updated ASR model using a further plurality of voice data articulations of the predetermined phrases subsequently received via the user interface.

10. The method of claim 1, further comprising initiating an upload of the ASR model into the device for performing the ASR, by the at least one processor.

11. A computer system for performing acoustic speech recognition (ASR) using an ASR model, comprising: a system memory element; a user interface; and at least one processor, wherein the at least one processor is configured to:

  obtain a plurality of voice data articulations of predetermined phrases via the user interface, wherein the plurality of voice data articulations comprises a first quantity of audio samples of actual articulated voice data, and wherein each of the plurality of voice data articulations comprises one of the audio samples including acoustic frequency components;

  perform a plurality of augmentations to the plurality of voice data articulations of predetermined phrases, to generate a corpus audio data set that includes the first quantity of audio samples and a second quantity of audio samples comprising augmented versions of the first quantity of audio samples, by:

    performing a first level augmentation by processing each of the plurality of voice data articulations to enhance a first subset of the acoustic frequency components and to suppress a second subset of the acoustic frequency components, to generate transformed voice data articulations including a plurality of voice transformations; and

    performing a second level augmentation by processing the transformed voice data articulations, by:

      combining the transformed voice data articulations with noise-based audio data, to generate combined voice data articulations; and

      adjusting levels of the noise-based audio data for each of the combined voice data articulations to generate the corpus audio data set including various noise levels, wherein each audio sample of the corpus audio data set includes one of the plurality of voice transformations and one of the various noise levels; and

  train the ASR model to perform ASR, using the corpus audio data set.

12. The computer system of claim 11, wherein the computer system in an aircraft, and wherein the at least one processor is configured to obtain the plurality of voice articulations via electronic communication with a headset comprising a microphone and a speaker that is communicatively coupled with the aircraft.

13. The computer system of claim 11, wherein the at least one computer processor is configured to perform the first level augmentation comprises utilizing a voice random transformation algorithm that selects the first and second subsets at random.

14. The computer system of claim 11, wherein the first subset of the acoustic frequency components comprises frequency components of the same frequency range.

15. The computer system of claim 11, wherein the second subset of the acoustic frequency components comprises frequency components of the same frequency range.

16. The computer system of claim 11, wherein the computer system is implemented in an aircraft, and wherein the at least one processor is configured to adjust cockpit noise profile data.

17. The computer system of claim 11, wherein the computer system is implemented in an aircraft, and wherein the at least one processor is further configured to receive a flight crew or air traffic control voice communication and automatically recognize words spoken in the voice communication using the ASR model.

18. The computer system of claim 17, wherein the at least one processor is further configured to command automatically performing an aircraft function based on the recognized words spoken.

19. The computer system of claim 11, wherein the at least one processor is further configured to generate an updated ASR model using a further plurality of voice data articulations of the predetermined phrases subsequently received via the user interface.

20. An aircraft comprising the computer system of claim 11.
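
Illustrative sketch (not part of the patent text). The two-level augmentation recited in claim 1 can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions, not the applicant's implementation: the function names (first_level, second_level, build_corpus), the split of the spectrum into four equal bands, the gain values, and the signal-to-noise levels are all hypothetical choices made for illustration. A naive discrete Fourier transform is used only to keep the example free of third-party dependencies; a real pipeline would more likely use an FFT library. The random choice of bands loosely mirrors claim 3.

```python
import cmath
import math
import random

def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short demo signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def first_level(sample, rng, n_bands=4, boost=2.0, cut=0.5):
    """First level (assumed form): boost one random frequency band, attenuate another."""
    n = len(sample)
    half = n // 2
    spec = dft(sample)
    enhanced, suppressed = rng.sample(range(n_bands), 2)  # two distinct bands
    for k in range(1, half + 1):
        band = (k - 1) * n_bands // half
        if band == enhanced:
            gain = boost
        elif band == suppressed:
            gain = cut
        else:
            gain = 1.0
        spec[k] *= gain
        if k != n - k:  # scale the mirrored bin too so the output stays real-valued
            spec[n - k] *= gain
    return idft(spec)

def rms(x):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def second_level(voice, noise, snr_db):
    """Second level (assumed form): add noise scaled to a target signal-to-noise ratio."""
    scale = rms(voice) / (rms(noise) * 10 ** (snr_db / 20))
    return [v + scale * n for v, n in zip(voice, noise)]

def build_corpus(originals, noise, n_transforms=2, snr_levels_db=(20, 10, 0), seed=0):
    """Return the originals plus every (voice transformation, noise level) combination."""
    rng = random.Random(seed)
    corpus = list(originals)  # the "first quantity": unmodified recordings
    for sample in originals:
        for _ in range(n_transforms):
            transformed = first_level(sample, rng)
            for snr_db in snr_levels_db:
                corpus.append(second_level(transformed, noise, snr_db))
    return corpus

# Demo with synthetic stand-ins for a recorded phrase and a cockpit noise profile.
n = 64
voice = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 20 * t / n)
         for t in range(n)]
noise_rng = random.Random(1)
noise = [noise_rng.gauss(0, 1) for _ in range(n)]
corpus = build_corpus([voice], noise)
print(len(corpus))  # 1 original + 2 transformations * 3 noise levels = 7
```

In this sketch every augmented sample carries exactly one voice transformation and one noise level, which is one way to read the final "wherein" clause of the second-level step. The resulting list would then be passed to whatever training routine the ASR model uses, which the sketch does not attempt to model.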

Assignees

Honeywell Int Inc

Inventors

Classifications

  • Procedures used during a speech recognition process, e.g. man-machine dialogue

  • Execution procedure of a spoken command

  • G10L15/063 (Primary): Training

  • actuated automatically, e.g. responsive to gust detectors

  • Arrangements or adaptations of instruments

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020335084A1 cover?
A method for initializing a device for performing acoustic speech recognition (ASR) using an ASR model, by a computer system including at least one processor and a system memory element. The method includes obtaining a plurality of voice data articulations of predetermined phrases, by the at least one processor via a user interface. The plurality of voice data articulations includes a first qua…
Who is the assignee on this patent?
Honeywell Int Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 22, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).