System and method for extracting and displaying speaker information in an ATC transcription

US11961524B2 · US · B2

Patent metadata

Publication number: US-11961524-B2
Application number: US-202117305913-A
Country: US
Kind code: B2
Filing date: Jul 16, 2021
Priority date: May 27, 2021
Publication date: Apr 16, 2024
Grant date: Apr 16, 2024

Abstract

Official abstract text for this publication.

A system for extracting speaker information in an ATC transcription and displaying the speaker information on a graphical display unit is provided. The system is configured to: segment a stream of audio received from an ATC and other aircraft into a plurality of chunks; determine, for each chunk, if the speaker is enrolled in an enrolled speaker database; when the speaker is enrolled in the enrolled speaker database, decode the chunk using a speaker-dependent automatic speech recognition (ASR) model and tag the chunk with a permanent name for the speaker; when the speaker is not enrolled in the enrolled speaker database, assign a temporary name for the speaker, tag the chunk with the temporary name, and decode the chunk using a speaker independent speech recognition model; format the decoded chunk as text; and signal the graphical display unit to display the formatted text along with an identity for the speaker.
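
The per-chunk routing described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `voice_id` key stands in for whatever speaker-identification step the system uses, and the class names, the `Speaker-N` naming scheme, and the model labels are assumptions made for illustration only.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class TaggedChunk:
    speaker_name: str   # permanent or temporary name shown on the display
    is_enrolled: bool   # True when the speaker was found in the database
    asr_model: str      # which kind of ASR model would decode the chunk


class ChunkRouter:
    """Hypothetical router: picks a name and an ASR model type per chunk."""

    def __init__(self, enrolled):
        # enrolled maps a (hypothetical) voice identifier to a permanent name
        self.enrolled = dict(enrolled)
        self.temporary = {}          # voice identifier -> temporary name
        self._counter = count(1)

    def route(self, voice_id):
        if voice_id in self.enrolled:
            # Enrolled speaker: permanent name, speaker-dependent ASR model
            return TaggedChunk(self.enrolled[voice_id], True, "speaker-dependent")
        if voice_id not in self.temporary:
            # Unknown speaker: assign a temporary name once and reuse it
            self.temporary[voice_id] = f"Speaker-{next(self._counter)}"
        return TaggedChunk(self.temporary[voice_id], False, "speaker-independent")


router = ChunkRouter({"voice-atc": "Tower"})
known = router.route("voice-atc")
unknown = router.route("voice-x")
print(known.speaker_name, known.asr_model)
print(unknown.speaker_name, unknown.asr_model)
```

Decoding, text formatting, and the display signal are left out; the sketch only shows how the enrolled and non-enrolled branches diverge.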

First claim

Opening claim text (preview).

What is claimed is:

1. A flight deck system for extracting speaker information in an ATC (Air Traffic Controller) conversation and displaying the speaker information on a graphical display unit, the flight deck system comprising a controller configured to: segment a stream of audio received over radio from an ATC and other aircraft into a plurality of chunks, wherein each chunk has a speaker; extract vocal cord and prosody based features from a chunk; generate a plurality of similarity scores for the extracted vocal cord and prosody based features for the chunk, wherein each similarity score of the plurality of similarity scores is based on a comparison of the extracted vocal cord and prosody based features for the chunk with a different model file from a plurality of model files in an enrolled speaker database for a plurality of speakers from the enrolled speaker database, wherein the plurality of model files are associated with different speakers from the enrolled speaker database; when a specific similarity score from the plurality of similarity scores determined based on the comparison of the extracted vocal cord and prosody based features with the plurality of model files in the enrolled speaker database for the plurality of speakers exceeds a threshold level, associate the chunk with a particular speaker from the enrolled speaker database, decode the chunk using a speaker-dependent automatic speech recognition (ASR) model that is specific for the speaker, and tag the chunk with a permanent name for the speaker; when a specific similarity score from the plurality of similarity scores determined based on the comparison of the extracted vocal cord and prosody based features with the plurality of model files in the enrolled speaker database for the plurality of speakers does not exceed a threshold level, assign a temporary name for the speaker of the chunk, tag the chunk with the temporary name, and decode the chunk using a speaker independent speech recognition model; format the decoded chunk as text; signal the graphical display unit to display the formatted text along with an identity for the speaker of the formatted text, the identity comprising the permanent name of the speaker, or the temporary name assigned to the speaker; and enroll a non-enrolled speaker into the speaker database and create a speaker-dependent ASR model for the non-enrolled speaker after a predetermined number of chunks of audio from the non-enrolled speaker are received.

2. The flight deck system of claim 1, wherein the controller is further configured to: receive flight crew member initiated substitution of a temporary name with a flight crew member inputted replacement name; and signal the graphical display unit to display the identity for the speaker of the formatted text using the replacement name.

3. The flight deck system of claim 1, wherein when the speaker for a chunk is not enrolled as a speaker in the speaker database, the controller is further configured to: accumulate a predetermined number of chunks of audio that are tagged with a same temporary name; and when the predetermined number of chunks is reached, generate a speaker dependent ASR model for the speaker with the temporary name using the predetermined number of accumulated chunks; and enroll the speaker with the temporary name and the speaker dependent ASR model for the speaker with the temporary name in the speaker database.

4. The flight deck system of claim 1, wherein: to decode the chunk the controller is configured to decode the chunk as raw text; and to format the decoded chunk the controller is configured to format the raw text as formatted text using natural language processing, an expert system, or a rule-based system.

5. The flight deck system of claim 1, wherein the controller is further configured to signal the graphical display unit to display, along with the formatted text and the identity for the speaker of the formatted text, extracted information including count and duration of messages for the speaker of the formatted text, number of total speakers during a flight journey, and percentage of messages in the flight journey for the speaker of the formatted text.

6. The flight deck system of claim 1, wherein the controller is further configured to save the extracted vocal cord and prosody based features for the chunk as a model file for a speaker with a temporary name when none of the plurality of similarity scores for the extracted vocal cord and prosody based features exceeds the threshold level.

7. The flight deck system of claim 6, wherein to accumulate a predetermined number of chunks of audio that are tagged with a same temporary name the controller is configured to: generate a similarity score for the extracted vocal cord and prosody based features based on a comparison of the extracted vocal cord and prosody based features with the model file for the speaker with the temporary name; and when the similarity score determined based on the comparison of the extracted vocal cord and prosody based features to the model file for the speaker with the temporary name exceeds a threshold level, associate the chunk with the speaker with the temporary name.

8. The flight deck system of claim 7, wherein the controller is further configured to update the speaker database with the enrollment of unknown speakers and speaker dependent speech recognition models iteratively.

9. The flight deck system of claim 1, wherein to signal the graphical display unit to display the formatted text along with an identity for the speaker of the formatted text the controller is configured to: filter out ATC conversations with traffic aircraft based on flight crew member inputted filter criteria; and signal the graphical display unit to display the formatted text along with an identity for the speaker of the formatted text for ATC conversations that were not filtered out.

10. A method in a flight deck system for extracting speaker information in an ATC (Air Traffic Controller) conversation and displaying the speaker information on a graphical display unit, the method comprising: segmenting a stream of audio received over radio from an ATC and other aircraft into a plurality of chunks, wherein each chunk has a speaker; extracting vocal cord and prosody based features from a chunk; generating a plurality of similarity scores for the extracted vocal cord and prosody based features for the chunk, wherein each similarity score of the plurality of similarity scores is based on a comparison of the extracted vocal cord and prosody based features for the chunk with a different model file from a plurality of model files in an enrolled speaker database for a plurality of speakers from the enrolled speaker database, wherein the plurality of model files are associated with different speakers from the enrolled speaker database; when a specific similarity score from the plurality of similarity scores determined based on the comparison of the extracted vocal cord and prosody based features with the plurality of model files in the enrolled speaker database for the plurality of speakers exceeds a threshold level, associate the chunk with a particular speaker from the enrolled speaker database, decoding the chunk using a speaker-dependent automatic speech recognition (ASR) model that is specific for the speaker and tagging the chunk with a permanent name for the speaker; when a specific similarity score from the plurality of similarity scores determined based on the comparison of the extracted vocal cord and prosody based features with the plurality of model files in the enrolled speaker database for the plurality of speakers does not exceed a threshold level, assigning a
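
The threshold test in claim 1 and the accumulate-then-enroll behavior in claims 3, 6, and 7 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the claims do not say how similarity is computed, so cosine similarity over plain feature vectors is assumed here, and the class name, the default threshold, and the default chunk count are hypothetical. Building the speaker-dependent ASR model itself is not shown.

```python
import math


def cosine_similarity(a, b):
    # Assumed similarity measure; the claims do not name one
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SpeakerDatabase:
    """Hypothetical store of model files (here: plain feature vectors)."""

    def __init__(self, threshold=0.9, chunks_to_enroll=3):
        self.threshold = threshold
        self.chunks_to_enroll = chunks_to_enroll
        self.enrolled = {}    # name -> model file
        self.pending = {}     # temporary name -> (model file, accumulated chunks)
        self._next_temp = 1

    @staticmethod
    def _best_match(features, models):
        # One similarity score per model file; keep the highest
        best_name, best_score = None, -1.0
        for name, model in models.items():
            score = cosine_similarity(features, model)
            if score > best_score:
                best_name, best_score = name, score
        return best_name, best_score

    def identify(self, features):
        """Return (name, is_enrolled) for one chunk's feature vector."""
        name, score = self._best_match(features, self.enrolled)
        if name is not None and score > self.threshold:
            return name, True
        # Not enrolled: compare against model files of temporary speakers
        pending_models = {n: m for n, (m, _) in self.pending.items()}
        name, score = self._best_match(features, pending_models)
        if name is None or score <= self.threshold:
            # No match anywhere: save the features as a new temporary model file
            name = f"Speaker-{self._next_temp}"
            self._next_temp += 1
            self.pending[name] = (list(features), [])
        model, chunks = self.pending[name]
        chunks.append(list(features))
        if len(chunks) >= self.chunks_to_enroll:
            # Predetermined number of chunks reached: enroll the speaker
            self.enrolled[name] = model
            del self.pending[name]
        return name, False


db = SpeakerDatabase(threshold=0.9, chunks_to_enroll=2)
db.enrolled["Tower"] = [1.0, 0.0]
print(db.identify([0.99, 0.05]))   # close to the enrolled "Tower" vector
print(db.identify([0.0, 1.0]))     # unknown voice, gets a temporary name
```

In this sketch the chunk that completes the accumulation is still reported as non-enrolled, since it would already have been routed to the speaker-independent model; later chunks from the same voice match the newly enrolled model file.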

Assignees

Honeywell Int Inc

Inventors

Classifications

  • for a single aircraft · CPC title

  • for cruising · CPC title

  • Transmission of traffic-related information between aircraft and ground stations · CPC title

  • located onboard the aircraft · CPC title

  • G10L17/04 · Primary

    Training, enrolment or model building · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11961524B2 cover?
A system for extracting speaker information in an ATC transcription and displaying the speaker information on a graphical display unit is provided. The system is configured to: segment a stream of audio received from an ATC and other aircraft into a plurality of chunks; determine, for each chunk, if the speaker is enrolled in an enrolled speaker database; when the speaker is enrolled in the enr…
Who is the assignee on this patent?
Honeywell Int Inc
What technology area does this patent fall under?
Primary CPC classification G10L17/04. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 16, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).