Aircraft speech recognition systems and methods
US-2021233411-A1 · Jul 29, 2021 · US
US11961524B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11961524-B2 |
| Application number | US-202117305913-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 16, 2021 |
| Priority date | May 27, 2021 |
| Publication date | Apr 16, 2024 |
| Grant date | Apr 16, 2024 |
Official abstract text for this publication.
A system for extracting speaker information in an ATC transcription and displaying the speaker information on a graphical display unit is provided. The system is configured to: segment a stream of audio received from an ATC and other aircraft into a plurality of chunks; determine, for each chunk, if the speaker is enrolled in an enrolled speaker database; when the speaker is enrolled in the enrolled speaker database, decode the chunk using a speaker-dependent automatic speech recognition (ASR) model and tag the chunk with a permanent name for the speaker; when the speaker is not enrolled in the enrolled speaker database, assign a temporary name for the speaker, tag the chunk with the temporary name, and decode the chunk using a speaker independent speech recognition model; format the decoded chunk as text; and signal the graphical display unit to display the formatted text along with an identity for the speaker.
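The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Transcriber` class, the `voice_id` keys, the "Speaker N" naming scheme, and the stub decoders are all assumptions made for illustration. Each chunk is represented by already-transcribed words so that no real ASR engine is required.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Transcriber:
    """Toy router for audio chunks (hypothetical, for illustration only)."""

    # voice_id -> permanent name; stands in for the enrolled speaker database
    enrolled: Dict[str, str]
    # voice_id -> temporary name assigned to speakers that are not enrolled
    temporary: Dict[str, str] = field(default_factory=dict)

    def _decode_speaker_dependent(self, voice_id: str, words: str) -> str:
        # Stub: a real system would run an ASR model specific to this speaker.
        return words

    def _decode_speaker_independent(self, words: str) -> str:
        # Stub: a real system would run a generic, speaker-independent model.
        return words

    def process(self, voice_id: str, words: str) -> str:
        """Tag one chunk with a speaker name and return the formatted text."""
        if voice_id in self.enrolled:
            name = self.enrolled[voice_id]
            raw = self._decode_speaker_dependent(voice_id, words)
        else:
            # Reuse the temporary name if this voice was already seen,
            # otherwise assign the next "Speaker N" label.
            name = self.temporary.setdefault(
                voice_id, f"Speaker {len(self.temporary) + 1}"
            )
            raw = self._decode_speaker_independent(words)
        # Formatting step, reduced here to trimming and capitalization.
        return f"[{name}] {raw.strip().capitalize()}"


transcriber = Transcriber(enrolled={"v-atc": "Tower"})
print(transcriber.process("v-atc", "cleared to land"))
print(transcriber.process("v-x", "roger"))
```

In this sketch the string returned by `process` is what would be handed to the display, with the speaker identity shown next to the formatted text.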
Opening claim text (preview).
What is claimed is:

1. A flight deck system for extracting speaker information in an ATC (Air Traffic Controller) conversation and displaying the speaker information on a graphical display unit, the flight deck system comprising a controller configured to: segment a stream of audio received over radio from an ATC and other aircraft into a plurality of chunks, wherein each chunk has a speaker; extract vocal cord and prosody based features from a chunk; generate a plurality of similarity scores for the extracted vocal cord and prosody based features for the chunk, wherein each similarity score of the plurality of similarity scores is based on a comparison of the extracted vocal cord and prosody based features for the chunk with a different model file from a plurality of model files in an enrolled speaker database for a plurality of speakers from the enrolled speaker database, wherein the plurality of model files are associated with different speakers from the enrolled speaker database; when a specific similarity score from the plurality of similarity scores determined based on the comparison of the extracted vocal cord and prosody based features with the plurality of model files in the enrolled speaker database for the plurality of speakers exceeds a threshold level, associate the chunk with a particular speaker from the enrolled speaker database, decode the chunk using a speaker-dependent automatic speech recognition (ASR) model that is specific for the speaker, and tag the chunk with a permanent name for the speaker; when a specific similarity score from the plurality of similarity scores determined based on the comparison of the extracted vocal cord and prosody based features with the plurality of model files in the enrolled speaker database for the plurality of speakers does not exceed a threshold level, assign a temporary name for the speaker of the chunk, tag the chunk with the temporary name, and decode the chunk using a speaker independent speech recognition model; format the decoded chunk as text; signal the graphical display unit to display the formatted text along with an identity for the speaker of the formatted text, the identity comprising the permanent name of the speaker, or the temporary name assigned to the speaker; and enroll a non-enrolled speaker into the speaker database and create a speaker-dependent ASR model for the non-enrolled speaker after a predetermined number of chunks of audio from the non-enrolled speaker are received.

2. The flight deck system of claim 1, wherein the controller is further configured to: receive flight crew member initiated substitution of a temporary name with a flight crew member inputted replacement name; and signal the graphical display unit to display the identity for the speaker of the formatted text using the replacement name.

3. The flight deck system of claim 1, wherein when the speaker for a chunk is not enrolled as a speaker in the speaker database, the controller is further configured to: accumulate a predetermined number of chunks of audio that are tagged with a same temporary name; and when the predetermined number of chunks is reached, generate a speaker dependent ASR model for the speaker with the temporary name using the predetermined number of accumulated chunks; and enroll the speaker with the temporary name and the speaker dependent ASR model for the speaker with the temporary name in the speaker database.

4. The flight deck system of claim 1, wherein: to decode the chunk the controller is configured to decode the chunk as raw text; and to format the decoded chunk the controller is configured to format the raw text as formatted text using natural language processing, an expert system, or a rule-based system.

5. The flight deck system of claim 1, wherein the controller is further configured to signal the graphical display unit to display, along with the formatted text and the identity for the speaker of the formatted text, extracted information including count and duration of messages for the speaker of the formatted text, number of total speakers during a flight journey, and percentage of messages in the flight journey for the speaker of the formatted text.

6. The flight deck system of claim 1, wherein the controller is further configured to save the extracted vocal cord and prosody based features for the chunk as a model file for a speaker with a temporary name when none of the plurality of similarity scores for the extracted vocal cord and prosody based features exceeds the threshold level.

7. The flight deck system of claim 6, wherein to accumulate a predetermined number of chunks of audio that are tagged with a same temporary name the controller is configured to: generate a similarity score for the extracted vocal cord and prosody based features based on a comparison of the extracted vocal cord and prosody based features with the model file for the speaker with the temporary name; and when the similarity score determined based on the comparison of the extracted vocal cord and prosody based features to the model file for the speaker with the temporary name exceeds a threshold level, associate the chunk with the speaker with the temporary name.

8. The flight deck system of claim 7, wherein the controller is further configured to update the speaker database with the enrollment of unknown speakers and speaker dependent speech recognition models iteratively.

9. The flight deck system of claim 1, wherein to signal the graphical display unit to display the formatted text along with an identity for the speaker of the formatted text the controller is configured to: filter out ATC conversations with traffic aircraft based on flight crew member inputted filter criteria; and signal the graphical display unit to display the formatted text along with an identity for the speaker of the formatted text for ATC conversations that were not filtered out.

10. A method in a flight deck system for extracting speaker information in an ATC (Air Traffic Controller) conversation and displaying the speaker information on a graphical display unit, the method comprising: segmenting a stream of audio received over radio from an ATC and other aircraft into a plurality of chunks, wherein each chunk has a speaker; extracting vocal cord and prosody based features from a chunk; generating a plurality of similarity scores for the extracted vocal cord and prosody based features for the chunk, wherein each similarity score of the plurality of similarity scores is based on a comparison of the extracted vocal cord and prosody based features for the chunk with a different model file from a plurality of model files in an enrolled speaker database for a plurality of speakers from the enrolled speaker database, wherein the plurality of model files are associated with different speakers from the enrolled speaker database; when a specific similarity score from the plurality of similarity scores determined based on the comparison of the extracted vocal cord and prosody based features with the plurality of model files in the enrolled speaker database for the plurality of speakers exceeds a threshold level, associate the chunk with a particular speaker from the enrolled speaker database, decoding the chunk using a speaker-dependent automatic speech recognition (ASR) model that is specific for the speaker and tagging the chunk with a permanent name for the speaker; when a specific similarity score from the plurality of similarity scores determined based on the comparison of the extracted vocal cord and prosody based features with the plurality of model files in the enrolled speaker database for the plurality of speakers does not exceed a threshold level, assigning a
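
The similarity scoring and enrollment steps recited in claims 1, 3, 6, and 7 can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the claimed implementation: the claims do not say how a similarity score is computed, so cosine similarity over plain feature vectors is substituted here. The names `cosine`, `best_match`, `SpeakerTracker`, the "Speaker N" labels, and the default values for `threshold` and `enroll_after` are hypothetical. Building the speaker-dependent ASR model at enrollment is omitted.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity (an assumed stand-in for the patent's score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(features: List[float],
               model_files: Dict[str, List[float]],
               threshold: float) -> Optional[str]:
    """Score features against every model file; return the best-scoring
    name only if its score exceeds the threshold, otherwise None."""
    scores = {name: cosine(features, model) for name, model in model_files.items()}
    if not scores:
        return None
    name = max(scores, key=scores.get)
    return name if scores[name] > threshold else None


class SpeakerTracker:
    def __init__(self, enrolled: Dict[str, List[float]],
                 threshold: float = 0.8, enroll_after: int = 3) -> None:
        self.enrolled = dict(enrolled)   # name -> model file (feature vector)
        # temporary name -> (model file, number of chunks seen so far)
        self.pending: Dict[str, Tuple[List[float], int]] = {}
        self.threshold = threshold
        self.enroll_after = enroll_after
        self._next_id = 1

    def identify(self, features: List[float]) -> str:
        """Return the name to tag a chunk with, enrolling when due."""
        name = best_match(features, self.enrolled, self.threshold)
        if name is not None:
            return name
        # No enrolled speaker matched: compare with temporary model files.
        temp_models = {n: m for n, (m, _) in self.pending.items()}
        temp = best_match(features, temp_models, self.threshold)
        if temp is None:
            # Save the features as the model file for a new temporary name.
            temp = f"Speaker {self._next_id}"
            self._next_id += 1
            self.pending[temp] = (list(features), 0)
        model, seen = self.pending[temp]
        seen += 1
        if seen >= self.enroll_after:
            # Predetermined chunk count reached: move to the enrolled database.
            del self.pending[temp]
            self.enrolled[temp] = model
        else:
            self.pending[temp] = (model, seen)
        return temp


tracker = SpeakerTracker({"Tower": [1.0, 0.0]}, threshold=0.9, enroll_after=2)
print(tracker.identify([0.99, 0.05]))
print(tracker.identify([0.0, 1.0]))
```

A newly enrolled speaker keeps the temporary name in this sketch. The crew-initiated renaming of claim 2 would amount to replacing that key with the entered name.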