What technology area does this patent fall under?

Primary CPC classification G10L19/018. Mapped technology areas include Physics.

When was this patent published?

Publication date Apr 4, 2024 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Generating audio files from text input

US2024112687A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2024112687-A1
Application number	US-202318477859-A
Country	US
Kind code	A1
Filing date	Sep 29, 2023
Priority date	Sep 29, 2022
Publication date	Apr 4, 2024
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and storage media for generating audio data includes receiving a text input. The method also includes receiving a plurality of representative audio sources and encoding the plurality of representative audio sources into a plurality of audio tokens. The method includes encoding the text input into a plurality of text representations. The method comprises mapping each audio tokens of the plurality of audio tokens to a text representation of the plurality of text representations. The method also comprises determining a relationship score based on mapping each audio tokens to the text representation, wherein the relationship score identifies a distribution of audio tokens from the plurality of audio tokens. The method and systems can also comprise decoding the subgroup of audio tokens to yield a reconstructed audio source.
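The scoring and selection steps in the abstract can be illustrated with a small sketch. This is not the patent's implementation: the dot-product score, the softmax, the top-k selection, the function names, and the toy 2-D embeddings are all assumptions chosen for illustration, and a single text vector stands in for the plurality of text representations.

```python
import math

def relationship_scores(audio_tokens, text_repr):
    """Assumed score: dot product of each audio-token embedding with one text vector."""
    return [sum(a * t for a, t in zip(tok, text_repr)) for tok in audio_tokens]

def softmax(scores):
    """Turn raw scores into a probability distribution over the audio tokens."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_subgroup(distribution, k):
    """Return the indices of the k most probable audio tokens, in original order."""
    ranked = sorted(range(len(distribution)),
                    key=lambda i: distribution[i], reverse=True)
    return sorted(ranked[:k])

# Hypothetical embeddings; real encoders would produce these from audio and text.
audio_tokens = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
text_repr = [1.0, 0.5]

scores = relationship_scores(audio_tokens, text_repr)
distribution = softmax(scores)
subgroup = select_subgroup(distribution, k=2)
print(subgroup)
```

In this toy setup the two tokens most aligned with the text vector form the subgroup that a decoder would then turn back into audio. The decoder itself is left out because the abstract does not describe its structure.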

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for generating audio data, the method comprising: receiving a text input; receiving a plurality of representative audio sources; encoding the plurality of audio sources into a plurality of audio tokens; encoding the text input into a plurality of text representations; mapping each audio tokens of the plurality of audio tokens to a text representation of the plurality of text representations; determining a relationship score based on mapping each audio tokens to the text representation, wherein the relationship score identifies a distribution of audio tokens from the plurality of audio tokens; in response on the relationship score, determining a subgroup of audio tokens from the distribution of audio tokens; and decoding the subgroup of audio tokens to yield a reconstructed audio source.
2. The method of claim 1, wherein decoding the plurality of audio tokens comprises, determining a time domain loss and a frequency domain loss and implementing a Fourier transform to reduce the time domain loss and the frequency domain loss.
3. The method of claim 1, wherein decoding the subgroup of audio tokens to yield a reconstructed audio source comprises decompressing the subgroup of audio tokens.
4. The method of claim 1 wherein encoding the text input is performed by a trained text encoder model.
5. The method of claim 1 further comprising, transmitting the reconstructed audio source to a virtual reality or augment reality environment.
6. The method of claim 1 further comprising training a compression and decompression model for the plurality of audio resources based on encoding the plurality of audio resources and decoding the subgroup of audio tokens.
7. The method of claim 1 further comprising, transmitting the reconstructed audio source to a virtual reality or augment reality environment.
8. A system for generating audio data, comprising: one or more processors; a memory comprising instructions stored thereon, which when executed by the one or more processors, causes the one or more processors to perform: receiving a text input; receiving a plurality of representative audio sources; encoding the plurality of representative audio sources into a plurality of audio tokens; encoding the text input into a plurality of text representations; mapping each audio tokens of the plurality of audio tokens to a text representation of the plurality of text representations; determining a relationship score based on mapping each audio tokens to the text representation, wherein the relationship score identifies a distribution of audio tokens from the plurality of audio tokens; in response to the relationship score, determining a subgroup of audio tokens from the distribution of audio tokens; and decoding the subgroup of audio tokens to yield a reconstructed audio source, wherein decoding the subgroup of audio tokens to yield a reconstructed audio source comprises decompressing the subgroup of audio tokens.
9. The system of claim 8, wherein decoding the plurality of audio tokens comprises, determining a time domain loss and a frequency domain loss and implementing a Fourier transform to reduce the time domain loss and the frequency domain loss.
10. The system of claim 8, wherein encoding the text input is performed by a trained text encoder model.
11. The system of claim 8, further comprising, transmitting the reconstructed audio source to a virtual reality or augment reality environment.
12. The system of claim 8, further comprising training a compression and decompression model for the plurality of audio resources based on encoding the plurality of audio resources and decoding the subgroup of audio tokens.
13. The system of claim 8, further comprising, transmitting the reconstructed audio source to a virtual reality or augment reality environment.
14. A non-transitory storage medium comprising instructions stored thereon, which when executed by one or more processors, cause the one or more processors to perform operations for generating audio: receiving a text input; receiving a plurality of representative audio sources; encoding the plurality of representative audio sources into a plurality of audio tokens; encoding the text input into a plurality of text representations; mapping each audio tokens of the plurality of audio tokens to a text representation of the plurality of text representations; determining a relationship score based on mapping each audio tokens to the text representation, wherein the relationship score identifies a distribution of audio tokens from the plurality of audio tokens; in response to the relationship score, determining a subgroup of audio tokens from the distribution of audio tokens; and decoding the subgroup of audio tokens to yield a reconstructed audio source.
15. The non-transitory storage medium of claim 14, wherein decoding the plurality of audio tokens comprises, determining a time domain loss and a frequency domain loss and implementing a Fourier transform to reduce the time domain loss and the frequency domain loss.
16. The non-transitory storage medium of claim 14, wherein decoding the subgroup of audio tokens to yield a reconstructed audio source comprises decompressing the subgroup of audio tokens.
17. The non-transitory storage medium of claim 14, wherein encoding the text input is performed by a trained text encoder model.
18. The non-transitory storage medium of claim 14, further comprising stored sequences of instructions, which when executed by the one or more processors, cause the one or more processors to perform, transmitting the reconstructed audio source to a virtual reality or augment reality environment.
19. The non-transitory storage medium of claim 14 further comprising stored sequences of instructions, which when executed by the one or more processors, cause the one or more processors to perform training a compression and decompression model for the plurality of audio resources based on encoding the plurality of audio resources and decoding the subgroup of audio tokens.
20. The non-transitory storage medium of claim 14 further comprising stored sequences of instructions, which when executed by the one or more processors, cause the one or more processors to perform, transmitting the reconstructed audio source to a virtual reality or augment reality environment.
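Claims 2, 9, and 15 refer to a time domain loss and a frequency domain loss computed with a Fourier transform. The sketch below shows one common way such losses are measured between an original and a reconstructed signal. The mean-absolute-error definitions, the naive DFT, the function names, and the test signal are assumptions for illustration; the claims do not specify the formulas, and this sketch only measures the losses rather than reducing them.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Magnitude spectrum from a naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]

def time_domain_loss(original, reconstructed):
    """Assumed definition: mean absolute error between waveform samples."""
    return sum(abs(a - b) for a, b in zip(original, reconstructed)) / len(original)

def frequency_domain_loss(original, reconstructed):
    """Assumed definition: mean absolute error between magnitude spectra."""
    mag_o = dft_magnitudes(original)
    mag_r = dft_magnitudes(reconstructed)
    return sum(abs(a - b) for a, b in zip(mag_o, mag_r)) / len(mag_o)

# Hypothetical signals: one sine cycle and a slightly attenuated reconstruction.
original = [math.sin(2 * math.pi * t / 8) for t in range(8)]
reconstructed = [0.9 * x for x in original]

t_loss = time_domain_loss(original, reconstructed)
f_loss = frequency_domain_loss(original, reconstructed)
print(t_loss, f_loss)
```

A training loop for a compression and decompression model could combine the two values into one objective, but how the patent weights or minimizes them is not stated on this page.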

Assignees

Meta Platforms Tech LLC

Inventors

Classifications

G10L19/02
using spectral analysis, e.g. transform vocoders or subband vocoders · CPC title
G10L19/018 · Primary
Audio watermarking, i.e. embedding inaudible data in the audio signal · CPC title
G10L19/0204
using subband decomposition · CPC title
G10L13/08 · Primary
Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title
G10L13/047
Architecture of speech synthesisers · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 90471160

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024112687A1 cover?: Methods, systems, and storage media for generating audio data includes receiving a text input. The method also includes receiving a plurality of representative audio sources and encoding the plurality of representative audio sources into a plurality of audio tokens. The method includes encoding the text input into a plurality of text representations. The method comprises mapping each audio toke…
Who is the assignee on this patent?: Meta Platforms Tech LLC

Related patents

Three-dimensional audio signal processing method and apparatus

Generative neural network model for processing audio samples in a filter-bank domain

Audio processing of missing audio information

Methods and systems for augmenting audio content