Scene based audio mixing for generating audio description content

US2025142139A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2025142139-A1
Application number: US-202318536053-A
Country: US
Kind code: A1
Filing date: Dec 11, 2023
Priority date: Oct 31, 2023
Publication date: May 1, 2025
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure generally relates to systems and methods for generating an AD content. In some implementation examples, an AD content system obtains an input audio and an AD narration, and normalizes a loudness of a section of the AD narration using a loudness of the input audio during a scene that the section corresponds to for generating a normalized section. Based on a loudness of the normalized section, the AD content system compresses a first audio channel of the input audio during the scene to generate a first compressed audio channel, and mixes the normalized section to the first compressed audio channel during the scene to generate a first sound channel of the AD content.
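
The flow described in the abstract (normalize the narration against the scene, reduce the scene audio based on the normalized narration, then mix) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: loudness is approximated by RMS level in dB because the abstract does not name a loudness measure, the "compression" step is a plain static gain reduction rather than a dynamics compressor, and the function names and the `offset_db` and `margin_db` values are hypothetical.

```python
import math

def rms_db(samples):
    """Approximate loudness as RMS level in dB re full scale (an assumption)."""
    if not samples:
        return float("-inf")
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")

def apply_gain_db(samples, gain_db):
    """Scale samples by a gain expressed in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]

def normalize_narration(narration, scene, offset_db=3.0):
    """Scale the narration section to sit offset_db above the scene level.

    offset_db is a hypothetical target, not a value from the patent.
    """
    scene_db, narr_db = rms_db(scene), rms_db(narration)
    if math.isinf(scene_db) or math.isinf(narr_db):
        return list(narration)  # nothing to normalize against silence
    return apply_gain_db(narration, scene_db + offset_db - narr_db)

def duck_scene(scene, narration_db, margin_db=6.0):
    """Stand-in for compression: attenuate the scene so that it sits at
    least margin_db below the normalized narration (hypothetical rule)."""
    excess = rms_db(scene) - (narration_db - margin_db)
    return apply_gain_db(scene, -excess) if excess > 0 else list(scene)

def mix(a, b):
    """Sum two equal-rate signals sample by sample, clipping to [-1, 1]."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]

# One scene of one channel, with its corresponding narration section.
scene = [0.5, -0.5] * 4
narration = [0.1, -0.1] * 4
normalized = normalize_narration(narration, scene)
ducked = duck_scene(scene, rms_db(normalized))
ad_channel = mix(normalized, ducked)
```

In this sketch each scene is processed independently, which mirrors the per-scene wording of the abstract; a production system would more likely use a standardized loudness measure and a time-varying compressor.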

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising:
    under control of a computing system comprising one or more computer processors configured to execute specific instructions,
    obtaining an input audio comprising one or more audio channels, the input audio being an audio track of a video content item;
    obtaining an audio description (AD) narration, wherein the AD narration comprises a plurality of AD sections of narration of the video content item between sections of dialogue of the input audio, a first AD section and a second AD section of the plurality of AD sections respectively corresponding to a first scene and a second scene of the input audio;
    normalizing, using a first loudness level associated with the one or more audio channels during the first scene, a second loudness level of the first AD section to generate a first normalized AD section with a first normalized loudness level;
    normalizing, using a third loudness level associated with the one or more audio channels during the second scene, a fourth loudness level of the second AD section to generate a second normalized AD section with a second normalized loudness level;
    compressing, by a first computer processor of the one or more computer processors, a first audio channel of the one or more audio channels during the first scene based at least in part on the first normalized loudness level of the first normalized AD section to generate a first portion of a first compressed audio channel;
    compressing, by a second computer processor of the one or more computer processors, the first audio channel of the one or more audio channels during the second scene based at least in part on the second normalized loudness level of the second normalized AD section to generate a second portion of the first compressed audio channel; and
    mixing the first normalized AD section to the first compressed audio channel during the first scene and the second normalized AD section to the first compressed audio channel during the second scene to generate a first sound channel of an AD content, wherein the AD content comprises the video content item and provides narration of the video content item between the sections of dialogue.

2. The computer-implemented method of claim 1, further comprising: adjusting a dynamic range of the first audio channel prior to normalizing the second loudness level of the first AD section and the fourth loudness level of the second AD section.

3. The computer-implemented method of claim 1, wherein compressing the first audio channel during the first scene is based on a difference between the first normalized loudness level of the first normalized AD section and the first loudness level associated with the one or more audio channels during the first scene, and wherein compressing the first audio channel during the second scene is based on a difference between the second normalized loudness level of the second normalized AD section and the third loudness level associated with the one or more audio channels during the second scene.

4. The computer-implemented method of claim 3, further comprising:
    determining the difference between the first normalized loudness level of the first normalized AD section and the first loudness level associated with the one or more audio channels during the first scene;
    determining the difference between the second normalized loudness level of the second normalized AD section and the third loudness level associated with the one or more audio channels during the second scene;
    classifying the difference between the first normalized loudness level of the first normalized AD section and the first loudness level associated with the one or more audio channels during the first scene to a first range of a plurality of ranges;
    classifying the difference between the second normalized loudness level of the second normalized AD section and the third loudness level associated with the one or more audio channels during the second scene to a second range of the plurality of ranges;
    generating, based on the first range, a first parameter set for compressing the first audio channel during the first scene; and
    generating, based on the second range, a second parameter set for compressing the first audio channel during the second scene.

5. A system for generating an audio description (AD) content, the system comprising:
    memory that stores computer-executable instructions; and
    one or more processors in communication with the memory, wherein the computer-executable instructions, when executed by the one or more processors, cause the one or more processors to:
    obtain an input audio comprising audio for a video content item;
    obtain an AD narration, wherein the AD narration comprises a plurality of AD sections, a first AD section of the plurality of AD sections corresponding to a first scene of the input audio;
    modify, using a loudness level associated with the first scene, a loudness level of the first AD section to generate a first modified AD section;
    modify, based at least in part on a loudness level of the first modified AD section, the first scene of the input audio to generate a first modified scene; and
    mix the first modified AD section and the first modified scene to generate a first AD content scene.

6. The system of claim 5, wherein a second AD section of the plurality of AD sections corresponds to a second scene of the input audio, and wherein the computer-executable instructions, when executed, further cause the one or more processors to:
    modify, using a loudness level associated with the second scene, a loudness level of the second AD section to generate a second modified AD section;
    modify, based at least in part on a loudness level of the second modified AD section, the second scene of the input audio to generate a second modified scene; and
    mix the second modified AD section and the second modified scene to generate a second AD content scene.

7. The system of claim 6, wherein the input audio comprises a third scene between the first scene and the second scene, and wherein the third scene of the input audio is unmodified.

8. The system of claim 7, wherein the computer-executable instructions, when executed, further cause the one or more processors to: concatenate the first scene of the input audio and the third scene of the input audio.

9. The system of claim 5, wherein the input audio comprises one or more audio channels of the audio for the video content item.

10. The system of claim 9, wherein the computer-executable instructions, when executed, further cause the one or more processors to: boost the first AD content scene.

11. The system of claim 5, wherein the computer-executable instructions, when executed, further cause the one or more processors to: insert the first modified AD section, according to a start time or an end time of the first scene, to a silent audio file to generate a normalized narration file.

12. The system of claim 11, wherein a duration of the normalized narration file equals a duration of the input audio.

13. The system of claim 5, wherein the computer-executable instructions, when executed, further cause the one or more processors to: generate, based on an AD script, the AD narration.

14. The system of claim 13, wherein the AD script is generated by a machine learning (ML) model or a human operator.

15. The system of claim 13, wherein the AD narration is generated using a computer synthesized speech voice.

16. The system of claim 5, wherein the computer-executable instructions, wh
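
Two of the claimed steps lend themselves to a short illustration. Claims 3 and 4 derive compression settings by classifying the loudness difference between a normalized AD section and its scene into one of several ranges. Claims 11 and 12 place a modified AD section into a silent audio file whose duration equals that of the input audio. The sketch below shows one possible reading of both. The range boundaries, the `threshold_db` and `ratio` parameter names and values, and the helper names are all hypothetical, since the claim text shown here does not specify them.

```python
from bisect import bisect_right

# Hypothetical range boundaries (dB) for the narration-minus-scene difference.
RANGE_EDGES_DB = [3.0, 6.0, 12.0]

# Hypothetical compressor parameter sets, one per range. A small difference
# means the narration barely exceeds the scene, so compression is stronger.
PARAMETER_SETS = [
    {"threshold_db": -30.0, "ratio": 6.0},  # difference < 3 dB
    {"threshold_db": -24.0, "ratio": 4.0},  # 3 dB <= difference < 6 dB
    {"threshold_db": -18.0, "ratio": 2.0},  # 6 dB <= difference < 12 dB
    {"threshold_db": 0.0, "ratio": 1.0},    # difference >= 12 dB (no change)
]

def classify_difference(narration_db, scene_db):
    """Return the index of the range containing the loudness difference."""
    return bisect_right(RANGE_EDGES_DB, narration_db - scene_db)

def parameter_set_for(narration_db, scene_db):
    """Pick the parameter set for one scene from its classified range."""
    return PARAMETER_SETS[classify_difference(narration_db, scene_db)]

def build_narration_track(total_samples, sections):
    """Place each (start_sample, samples) section into a silent track
    whose length equals that of the input audio."""
    track = [0.0] * total_samples
    for start, samples in sections:
        if start < 0 or start >= total_samples:
            continue  # section falls outside the input audio; skip it
        end = min(start + len(samples), total_samples)
        track[start:end] = samples[: end - start]
    return track

# Example: a 7 dB difference falls in the third range.
params = parameter_set_for(narration_db=-20.0, scene_db=-27.0)
track = build_narration_track(10, [(2, [0.5, 0.5]), (8, [0.1, 0.2, 0.3])])
```

Keeping the narration track the same length as the input audio means it can be mixed with the processed audio sample by sample, with no per-section offset bookkeeping.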

Assignees

Inventors

Classifications

  • involving reformatting operations of audio signals, e.g. by converting from one coding standard to another (details of audio signal transcoding G10L19/173) · CPC title

  • involving special audio data, e.g. different tracks for different languages · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025142139A1 cover?
The present disclosure generally relates to systems and methods for generating an AD content. In some implementation examples, an AD content system obtains an input audio and an AD narration, and normalizes a loudness of a section of the AD narration using a loudness of the input audio during a scene that the section corresponds to for generating a normalized section. Based on a loudness of th…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification H04N21/2335. Mapped technology areas include Electricity.
When was this patent published?
Publication date May 1, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).