Method and apparatus for dialogue understandability assessment

US12400676B2 · US · B2

Patent metadata
Publication number: US-12400676-B2
Application number: US-202217846864-A
Country: US
Kind code: B2
Filing date: Jun 22, 2022
Priority date: Dec 23, 2019
Publication date: Aug 26, 2025
Grant date: Aug 26, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method comprises: obtaining a mixed soundtrack that includes dialogue mixed with non-dialogue sound; converting the mixed soundtrack to comparison text; obtaining reference text for the dialogue as a reference for intelligibility of the dialogue; determining a measure of intelligibility of the dialogue of the mixed soundtrack to a listener based on a comparison of the comparison text against the reference text; and reporting the measure of intelligibility of the dialogue.
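The abstract's core loop (convert the mix to text, compare that text against reference text, derive a score) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not DTS's method: it assumes the ASR transcript is already available as a string, and it uses word error rate (word-level Levenshtein distance divided by reference length) as the comparison. The names `normalize`, `word_edit_distance`, and `intelligibility` are hypothetical.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # h has no counterpart in ref
                curr[j - 1] + 1,         # r has no counterpart in hyp
                prev[j - 1] + (h != r),  # align h with r (cost 0 if equal)
            )
        prev = curr
    return prev[-1]


def intelligibility(comparison_text: str, reference_text: str) -> float:
    """Hypothetical score in [0, 1]: 1 minus word error rate, clipped at 0."""
    ref = normalize(reference_text)
    hyp = normalize(comparison_text)
    if not ref:
        return 1.0 if not hyp else 0.0
    wer = word_edit_distance(hyp, ref) / len(ref)
    return max(0.0, 1.0 - wer)
```

For example, `intelligibility("the quick fox", "The quick brown fox.")` returns 0.75, because one of the four reference words was lost. Word error rate is only one plausible "comparison"; the abstract does not fix a particular metric.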

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving an original mixed soundtrack that includes dialogue mixed with non-dialogue sound; acoustically modifying, by an audio-visual device, the original mixed soundtrack with emulated sound effects to produce a mixed soundtrack, wherein the emulated sound effects emulate frequency responses of one or more room acoustics, sound reproduction system playback acoustics, or background noise; converting the mixed soundtrack to comparison text using automatic speech recognition (ASR); obtaining reference text for the dialogue as a reference for intelligibility of the dialogue; determining a measure of intelligibility of the dialogue of the mixed soundtrack to a listener based on a comparison of the comparison text against the reference text, wherein the determining the measure of intelligibility of the dialogue includes: computing individual measures of intelligibility of the dialogue for time slices of the mixed soundtrack based on the comparison by determining differences between segments of the comparison text corresponding to the time slices of the mixed soundtrack and corresponding segments of the reference text; and computing the measure of intelligibility of the dialogue based on the individual measures of intelligibility of the dialogue; determining whether the measure of intelligibility of the dialogue indicates a degraded intelligibility; and in response to the measure of intelligibility of the dialogue indicating the degraded intelligibility, producing a second mixed soundtrack.

2. The method of claim 1, wherein the reporting includes: displaying the measure of intelligibility of the dialogue and the individual measures of intelligibility of the dialogue.

3. The method of claim 1, wherein the reporting includes: displaying the measure of intelligibility of the dialogue, the individual measures of intelligibility of the dialogue, the segments of the comparison text, and the corresponding ones of the segments of the reference text.

4. The method of claim 1, further comprising: generating metadata configured for a digital reproduction device and that includes at least the individual measures of intelligibility of the dialogue.

5. The method of claim 1, wherein: the reference text includes chunks of subtitle text that span respective time intervals; and the determining the measure of intelligibility includes determining individual differences between (i) segments of the comparison text corresponding to the time slices of the mixed soundtrack, and (ii) corresponding ones of the chunks of subtitle text that convey common dialogue to the segments of the comparison text.

6. The method of claim 5, further comprising: matching the segments of the comparison text to the corresponding ones of the chunks of subtitle text using a text matching algorithm that maximizes text similarity between each of the segments of the comparison text and matching ones of the chunks of subtitle text, wherein the determining the individual differences includes determining the individual differences based on results of the matching.

7. The method of claim 1, wherein the obtaining the reference text includes converting a dialogue-only soundtrack to the reference text.

8. The method of claim 1, wherein the obtaining the reference text includes receiving text-based subtitles of the dialogue as the reference text.

9. The method of claim 1, wherein the converting includes: using a machine-learning dialogue extractor, extracting the dialogue from the mixed soundtrack to produce a predominantly dialogue soundtrack; and converting the predominantly dialogue soundtrack to the comparison text.

10. The method of claim 1, wherein the determining the measure of intelligibility of the dialogue includes computing a difference between the comparison text and the reference text, and computing the measure of intelligibility of the dialogue based on the difference.

11. The method of claim 10, wherein the computing the difference includes computing the difference as a text distance representative of differences in letters or words, or as a phonetic text distance representative of differences in sound.

12. The method of claim 10, wherein the computing the difference includes: computing a first difference between the comparison text and the reference text using a first compare algorithm; computing a second difference between the comparison text and the reference text using a second compare algorithm that is different from the first compare algorithm; and computing the difference as a weighted combination of the first difference and the second difference.

13. An apparatus comprising: a processor configured to: receive an original mixed soundtrack that includes dialogue mixed with non-dialogue sound; acoustically modify, by an audio-visual device, the original mixed soundtrack with emulated sound effects to produce a mixed soundtrack, wherein the emulated sound effects emulate frequency responses of one or more room acoustics, sound reproduction system playback acoustics, or background noise; convert the mixed soundtrack to comparison text using automatic speech recognition (ASR); obtain reference text for the dialogue as a reference for intelligibility of the dialogue to a listener; compute individual measures of intelligibility of the dialogue of the mixed soundtrack based on a comparison between the comparison text and the reference text by determining differences between segments of the comparison text corresponding to time slices of the mixed soundtrack and corresponding segments of the reference text; compute an overall measure of intelligibility of the dialogue of the mixed soundtrack based on the individual measures of intelligibility of the dialogue; generate a report including the overall measure of intelligibility of the dialogue; determine whether the measure of intelligibility of the dialogue indicates a degraded intelligibility; and in response to the measure of intelligibility of the dialogue indicating the degraded intelligibility, producing a second mixed soundtrack.

14. The apparatus of claim 13, wherein the processor is configured to obtain the reference text by receiving text-based subtitles of the dialogue as the reference text.

15. A non-transitory computer readable medium encoded with instructions that, when executed by a processor, cause the processor to: receive an original mixed soundtrack that includes dialogue mixed with non-dialogue sound; acoustically modify, by an audio-visual device, the original mixed soundtrack with emulated sound effects to produce a mixed soundtrack, wherein the emulated sound effects emulate frequency responses of one or more room acoustics, sound reproduction system playback acoustics, or background noise; convert time slices of the mixed soundtrack to comparison text using automatic speech recognition (ASR); obtain reference text for the dialogue as a reference for intelligibility of the dialogue; compute individual measures of intelligibility of the dialogue of the mixed soundtrack for the time slices based on differences between the comparison text and the reference text by determining differences between segments of the comparison text corresponding to the time slices of the mixed soundtrack and corresponding segments of the reference text; compute an overall measure of intelligibility of the dialogue of the mixed soundtrack based on the individual measures of intelligibility of the dialogue; generate a report including the overall measure of intelligibility of the dialogue and the individual measures
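Claims 1, 13, and 15 open by acoustically modifying the original mix with emulated effects whose frequency responses stand in for room acoustics, playback systems, or background noise. One hedged way to approximate that step is convolution with an impulse response (which imposes that response's frequency response on the signal) followed by additive noise at a target signal-to-noise ratio. The impulse response, the white Gaussian noise model, and the helper names `convolve`, `add_noise`, and `emulate_playback` are assumptions for illustration, not processing specified by the patent.

```python
import math
import random


def convolve(signal: list[float], impulse_response: list[float]) -> list[float]:
    """Full linear convolution: imposes the impulse response's frequency response."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[i + k] += x * h
    return out


def add_noise(signal: list[float], snr_db: float, seed: int = 0) -> list[float]:
    """Add white Gaussian noise scaled so signal/noise power equals snr_db (in dB)."""
    if not signal:
        return []
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Choose scale so that p_signal / (scale**2 * p_noise) == 10**(snr_db / 10).
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(signal, noise)]


def emulate_playback(mix: list[float], impulse_response: list[float],
                     snr_db: float, seed: int = 0) -> list[float]:
    """Hypothetical emulation: room/playback coloration, then background noise."""
    return add_noise(convolve(mix, impulse_response), snr_db, seed)
```

A two-tap response such as `[1.0, 0.0, 0.5]` models a direct path plus one echo; a measured room impulse response could be substituted. The emulated output would then feed the ASR and comparison steps.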
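Claims 1, 13, and 15 compute individual measures for time slices, derive an overall measure from them, and then test for degraded intelligibility. The sketch below is a minimal illustration, assuming each slice already carries its ASR text and aligned reference text. It scores each slice with the standard-library `difflib.SequenceMatcher.ratio()` over word lists, weights the overall average by reference word count, and flags degradation below a hypothetical threshold of 0.8. `SliceResult`, `slice_score`, and `assess` are invented names; producing the second mix is out of scope, so the function only returns a flag.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class SliceResult:
    start_s: float   # slice start time, seconds
    end_s: float     # slice end time, seconds
    score: float     # individual intelligibility measure in [0, 1]
    ref_words: int   # reference word count, used as the aggregation weight


def slice_score(comparison: str, reference: str) -> float:
    """Word-sequence similarity in [0, 1]; 1.0 means the texts match exactly."""
    hyp, ref = comparison.lower().split(), reference.lower().split()
    if not hyp and not ref:
        return 1.0
    return SequenceMatcher(None, hyp, ref, autojunk=False).ratio()


def assess(slices, threshold: float = 0.8):
    """slices: iterable of (start_s, end_s, comparison_text, reference_text).

    Returns (overall_score, per_slice_results, degraded_flag).
    """
    results = [SliceResult(start, end, slice_score(comp, ref), len(ref.split()))
               for start, end, comp, ref in slices]
    total = sum(res.ref_words for res in results)
    overall = (sum(res.score * res.ref_words for res in results) / total
               if total else 1.0)
    return overall, results, overall < threshold
```

With one perfect slice and one slice where "hear" was recognized as "here", the per-slice scores are 1.0 and 0.75 and the overall score is 0.875, which passes the 0.8 threshold. Weighting by word count keeps short interjections from dominating the overall measure; a plain mean or a worst-slice minimum would be equally valid design choices.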
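Claims 5 and 6 match segments of the comparison text to subtitle chunks with a text matching algorithm that maximizes similarity, then measure differences on the matched pairs. A minimal sketch, assuming case-insensitive character-level `difflib.SequenceMatcher` similarity as the matching criterion and an independent best-match search per segment; `similarity` and `match_segments` are hypothetical names.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower(), autojunk=False).ratio()


def match_segments(segments: list[str], chunks: list[str]) -> list[tuple[int, float]]:
    """Pair each ASR segment with the subtitle chunk it is most similar to.

    Returns one (chunk_index, difference) tuple per segment, where
    difference = 1 - similarity of the matched pair.
    """
    if not chunks:
        raise ValueError("need at least one subtitle chunk")
    matches = []
    for seg in segments:
        scores = [similarity(seg, chunk) for chunk in chunks]
        best = max(range(len(chunks)), key=scores.__getitem__)
        matches.append((best, 1.0 - scores[best]))
    return matches
```

Given the chunks "Where are you going?", "I can't hear you.", and "Get down!", the segments "i can't here you" and "get down" match chunks 1 and 2 with small differences. Because claim 5 ties chunks to time intervals, a fuller version would likely restrict candidates to chunks overlapping each slice rather than searching every chunk.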
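Claims 10 through 12 compute a difference between the comparison and reference texts, optionally as a weighted combination of two different compare algorithms. The sketch assumes a word-level and a character-level `difflib.SequenceMatcher` difference as the two algorithms and an equal default weight; a phonetic distance (claim 11), for example one built on phonetic codes, could replace either algorithm but is not shown. All function names and the weight `w_word` are hypothetical.

```python
from difflib import SequenceMatcher


def word_difference(a: str, b: str) -> float:
    """First compare algorithm: 1 - similarity of the word sequences."""
    return 1.0 - SequenceMatcher(None, a.lower().split(), b.lower().split(),
                                 autojunk=False).ratio()


def char_difference(a: str, b: str) -> float:
    """Second compare algorithm: 1 - similarity of the character sequences."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower(),
                                 autojunk=False).ratio()


def combined_difference(comparison: str, reference: str,
                        w_word: float = 0.5) -> float:
    """Weighted combination of the two differences; w_word is a hypothetical weight."""
    if not 0.0 <= w_word <= 1.0:
        raise ValueError("w_word must be in [0, 1]")
    return (w_word * word_difference(comparison, reference)
            + (1.0 - w_word) * char_difference(comparison, reference))
```

For "i can't here you" against "i can't hear you", the word-level difference is 0.25 (one of four words differs), the character-level difference is 0.0625, and the equal-weight combination is 0.15625. The character term softens the penalty for near-miss recognitions that a word-only metric would count as fully wrong.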

Assignees

DTS, Inc.

Inventors

Classifications

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • for evaluating synthetic or decoded voice signals · CPC title

  • G10L25/60 (primary)

    for measuring the quality of voice signals · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12400676B2 cover?
A method comprises: obtaining a mixed soundtrack that includes dialogue mixed with non-dialogue sound; converting the mixed soundtrack to comparison text; obtaining reference text for the dialogue as a reference for intelligibility of the dialogue; determining a measure of intelligibility of the dialogue of the mixed soundtrack to a listener based on a comparison of the comparison text against …
Who is the assignee on this patent?
DTS, Inc.
What technology area does this patent fall under?
Primary CPC classification G10L25/60. Mapped technology areas include Physics.
When was this patent published?
Published on August 26, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).