Systems and methods for upmixing audiovisual data

US12273697B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12273697-B2
Application number	US-202018042258-A
Country	US
Kind code	B2
Filing date	Aug 26, 2020
Priority date	Aug 26, 2020
Publication date	Apr 8, 2025
Grant date	Apr 8, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method for upmixing audiovisual data can include obtaining audiovisual data including input audio data and video data accompanying the input audio data. Each frame of the video data can depict only a portion of a larger scene. The input audio data can have a first number of audio channels. The computer-implemented method can include providing the audiovisual data as input to a machine-learned audiovisual upmixing model. The audiovisual upmixing model can include a sequence-to-sequence model configured to model a respective location of one or more audio sources within the larger scene over multiple frames of the video data. The computer-implemented method can include receiving upmixed audio data from the audiovisual upmixing model. The upmixed audio data can have a second number of audio channels. The second number of audio channels can be greater than the first number of audio channels.

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for upmixing audiovisual data, the computer-implemented method comprising: obtaining, by a computing system comprising one or more computing devices, audiovisual data comprising input audio data and video data accompanying the input audio data, wherein each frame of the video data depicts only a portion of a larger scene, and wherein the input audio data has a first number of audio channels; providing, by the computing system, the audiovisual data as input to a machine-learned audiovisual upmixing model, the audiovisual upmixing model comprising a sequence-to-sequence model configured to model a respective location of one or more audio sources within the larger scene over multiple frames of the video data; and receiving, by the computing system, upmixed audio data from the audiovisual upmixing model, the upmixed audio data having a second number of audio channels, the second number of audio channels greater than the first number of audio channels.
2. The computer-implemented method of claim 1, wherein the audiovisual upmixing model comprises an encoder-decoder model.
3. The computer-implemented method of claim 1, wherein the audiovisual upmixing model comprises a transformer model.
4. The computer-implemented method of claim 1, wherein the audiovisual upmixing model comprises an attention mechanism.
5. The computer-implemented method of claim 4, wherein the attention mechanism comprises a plurality of context vectors and an alignment model.
6. The computer-implemented method of claim 1, wherein the audiovisual upmixing model comprises a plurality of input streams, each of the plurality of input streams corresponding to a respective audio channel of the input audio data, and a plurality of output streams, each of the plurality of output streams corresponding to a respective audio channel of the upmixed audio data.
7. The computer-implemented method of claim 1, wherein the video data comprises two-dimensional video data.
8. The computer-implemented method of claim 1, wherein the input audio data comprises mono audio data, the mono audio data having a single audio channel.
9. The computer-implemented method of claim 1, wherein the upmixed audio data comprises stereo audio data, the stereo audio data having a left audio channel and a right audio channel.
10. The computer-implemented method of claim 1, wherein the input audio data comprises stereo audio data, the stereo audio data having a left audio channel and a right audio channel.
11. The computer-implemented method of claim 1, wherein the upmixed audio data comprises surround sound audio data, the surround sound audio data having three or more audio channels.
12. The computer-implemented method of claim 1, wherein training the machine-learned audiovisual upmixing model comprises: obtaining, by the computing system, audiovisual training data comprising video training data and audio training data having the second number of audio channels; downmixing, by the computing system, the audio training data to produce downmixed audio training data comprising the first number of audio channels; providing, by the computing system, the video training data and corresponding downmixed audio training data to the audiovisual upmixing model; obtaining, by the computing system, a predicted upmixed audio data output comprising the second number of audio channels from the audiovisual upmixing model; determining, by the computing system, a difference between the predicted upmixed audio data and the audio training data; and updating one or more parameters of the model based the difference.
13. A computing system configured for upmixing audiovisual data, the computing system comprising: one or more processors; and one or more memory devices storing computer-readable data comprising instructions that, when implemented, cause the one or more processors to perform operations, the operations comprising: obtaining audiovisual data comprising input audio data and video data accompanying the input audio data, the input audio data having a first number of audio channels; providing the audiovisual data as input to a machine-learned audiovisual upmixing model, the audiovisual upmixing model comprising a sequence-to-sequence model; and receiving upmixed audio data from the audiovisual upmixing model, the upmixed audio data having a second number of audio channels, the second number of audio channels greater than the first number of audio channels.
14. The computing system of claim 13, wherein the audiovisual upmixing model comprises an encoder-decoder model.
15. The computing system of claim 13, wherein the audiovisual upmixing model comprises a transformer model.
16. The computing system of claim 13, wherein the audiovisual upmixing model comprises an attention mechanism.
17. The computing system of claim 16, wherein the attention mechanism comprises a plurality of context vectors and an alignment model.
18. The computing system of claim 13, wherein the audiovisual upmixing model comprises a plurality of internal state vectors.
19. The computing system of claim 13, wherein the audiovisual upmixing model comprises a plurality of input streams, each of the plurality of input streams corresponding to a respective audio channel of the input audio data, and a plurality of output streams, each of the plurality of output streams corresponding to a respective audio channel of the upmixed audio data.
20. The computing system of claim 13, wherein the video data comprises two-dimensional video data.
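
The training steps recited in claim 12 (downmix the multichannel training audio, predict an upmix, measure the difference, update parameters) can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: the averaging downmix, the mean-squared-error difference, and especially `toy_upmix`, a hypothetical one-parameter panner that stands in for the machine-learned model. It is not the patent's sequence-to-sequence model, and it ignores the video input entirely.

```python
import math
from typing import List, Sequence, Tuple

Channels = List[List[float]]  # one inner list of samples per audio channel


def downmix(audio: Sequence[Sequence[float]]) -> Channels:
    """Average all channels into one (an assumed downmix; the claim names no formula)."""
    n_channels = len(audio)
    return [[sum(frame) / n_channels for frame in zip(*audio)]]


def toy_upmix(mono: Sequence[Sequence[float]], pan: float) -> Channels:
    """Hypothetical stand-in for the model: a single learnable pan parameter."""
    samples = mono[0]
    left = [2.0 * (1.0 - pan) * s for s in samples]
    right = [2.0 * pan * s for s in samples]
    return [left, right]


def difference(predicted: Channels, target: Channels) -> float:
    """Mean squared error over every channel and sample (an assumed loss)."""
    errors = [(p - t) ** 2
              for p_ch, t_ch in zip(predicted, target)
              for p, t in zip(p_ch, t_ch)]
    return sum(errors) / len(errors)


def train(target: Channels, steps: int = 50, lr: float = 0.1) -> Tuple[float, float]:
    """Fit the pan parameter by gradient descent on the reconstruction error."""
    pan, eps = 0.5, 1e-4
    mono = downmix(target)  # the "first number" of channels (here, 1)
    for _ in range(steps):
        # Central-difference estimate of d(loss)/d(pan).
        grad = (difference(toy_upmix(mono, pan + eps), target)
                - difference(toy_upmix(mono, pan - eps), target)) / (2 * eps)
        pan -= lr * grad  # update the parameter based on the difference
    return pan, difference(toy_upmix(mono, pan), target)


# Synthetic stereo "training" clip: one source placed left of center.
source = [math.sin(2 * math.pi * k / 16) for k in range(64)]
stereo = [[1.5 * s for s in source], [0.5 * s for s in source]]
pan, loss = train(stereo)
```

In this toy setup the downmix recovers the source exactly, so the error is minimized at `pan = 0.25`, the value that reproduces the 1.5/0.5 left/right gains. The point of the sketch is only that multichannel audio can supply its own training targets once it is downmixed; a real model would also condition on the video frames.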

Assignees

Google LLC

Inventors

Classifications

G06N3/0464: Convolutional networks [CNN, ConvNet]
G06N3/0455 (primary): Auto-encoder networks; Encoder-decoder networks
G06N3/0442: characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
G06N3/09: Supervised learning
G06N3/0895: Weakly supervised learning, e.g. semi-supervised or self-supervised learning
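
CPC symbols such as G06N3/0455 encode a hierarchy: a section letter, a two-digit class, a subclass letter, a main group, and a subgroup after the slash. The leading G is the Physics section, which is why this page maps the patent to Physics. The sketch below splits a symbol into those levels. The function name `parse_cpc`, the regular expression, and the abbreviated section labels are illustrative assumptions, not part of any official tool.

```python
import re
from typing import Dict

# Abbreviated labels for the CPC sections (shortened here for illustration).
CPC_SECTIONS = {
    "A": "Human necessities",
    "B": "Operations and transport",
    "C": "Chemistry and metallurgy",
    "D": "Textiles and paper",
    "E": "Fixed constructions",
    "F": "Mechanical engineering",
    "G": "Physics",
    "H": "Electricity",
    "Y": "Cross-sectional tagging",
}

# section letter, 2-digit class, subclass letter, main group, "/", subgroup
_CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> Dict[str, str]:
    """Split a CPC symbol like 'G06N3/0455' into its hierarchy levels."""
    match = _CPC_PATTERN.match(symbol.strip().replace(" ", ""))
    if match is None:
        raise ValueError(f"not a recognizable CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_label": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": f"{section}{cls}{subclass}{main_group}",
        "subgroup": subgroup,
    }


primary = parse_cpc("G06N3/0455")
```

All five classifications listed above share the subclass G06N and main group G06N3, so they differ only in the subgroup.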

Patent family

Related publications grouped by family.

Patent family 72470588

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12273697B2 cover?: A computer-implemented method for upmixing audiovisual data can include obtaining audiovisual data including input audio data and video data accompanying the input audio data. Each frame of the video data can depict only a portion of a larger scene. The input audio data can have a first number of audio channels. The computer-implemented method can include providing the audiovisual data as input…
Who is the assignee on this patent?: Google LLC
What technology area does this patent fall under?: Primary CPC classification G06N3/0455. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 8, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Processing text sequences using neural networks

Spatial-based audio object generation using image information

Wide and deep machine learning models

Enhancing hybrid self-attention structure with relative-position-aware bias for speech synthesis

Generating spatial audio using a predictive model