What technology area does this patent fall under?

Primary CPC classification G10L21/01. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 17, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Media segment prediction for media generation

US12170094B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12170094-B2
Application number	US-202218047572-A
Country	US
Kind code	B2
Filing date	Oct 18, 2022
Priority date	Oct 18, 2022
Publication date	Dec 17, 2024
Grant date	Dec 17, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device includes one or more processors configured to input one or more segments of an input media stream into a feature extractor. The one or more processors are further configured to pass an output of the feature extractor into an utterance classifier to produce at least one representation of at least one utterance class of a plurality of utterance classes. The one or more processors are further configured to pass the output of the feature extractor and the at least one representation into a segment matcher to produce a media output segment identifier.
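The data flow in the abstract (feature extractor, then utterance classifier, then segment matcher) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy mean/RMS features, the centroid-based classifier, and the scoring rule are hypothetical and are not taken from the patent, which does not disclose them on this page.

```python
import math

# Hypothetical toy components; the patent does not specify any of these.

def extract_features(segment):
    """Toy feature extractor: (mean, RMS energy) of a list of samples."""
    n = len(segment)
    mean = sum(segment) / n
    rms = math.sqrt(sum(x * x for x in segment) / n)
    return (mean, rms)

def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify_utterance(features, class_centroids):
    """Toy utterance classifier: softmax over negative distances to centroids.

    Returns a probability per utterance class, standing in for the
    'representation of at least one utterance class'.
    """
    scores = {c: -squared_distance(features, cen)
              for c, cen in class_centroids.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def match_segment(features, class_probs, stored_segments):
    """Toy segment matcher: uses both the features and the class representation.

    stored_segments maps segment id -> (utterance class, stored features).
    Returns the id of the best-scoring stored segment.
    """
    best_id, best_score = None, -math.inf
    for seg_id, (seg_class, seg_features) in stored_segments.items():
        closeness = 1.0 / (1.0 + squared_distance(features, seg_features))
        score = class_probs.get(seg_class, 0.0) * closeness
        if score > best_score:
            best_id, best_score = seg_id, score
    return best_id

def predict_segment_id(segment, class_centroids, stored_segments):
    """Run the three stages in the order the abstract describes."""
    features = extract_features(segment)
    class_probs = classify_utterance(features, class_centroids)
    return match_segment(features, class_probs, stored_segments)

if __name__ == "__main__":
    centroids = {"vowel": (0.0, 0.5), "fricative": (0.0, 0.1)}
    stored = {"seg_a": ("vowel", (0.0, 0.5)),
              "seg_b": ("fricative", (0.0, 0.1))}
    print(predict_segment_id([0.5, -0.5, 0.5, -0.5], centroids, stored))
```

Note that the matcher receives both the extractor output and the classifier output, which mirrors the wording of the abstract; how a real implementation combines them is not stated here.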

First claim

Opening claim text (preview).

What is claimed is:
1. A device comprising: one or more processors configured to: input one or more segments of an input media stream into a feature extractor; pass an output of the feature extractor into an utterance classifier to produce at least one representation of at least one utterance class of a plurality of utterance classes; and pass the output of the feature extractor and the at least one representation into a segment matcher to determine a media output segment identifier; pass the media output segment identifier into one or more memory units, wherein each of the one or more memory units includes a set of weights representing a respective media segment.
2. The device of claim 1, wherein the segment matcher is configured to obtain, based on the at least one representation, data representing one or more candidate frames of a media output segment and perform a comparison of the data representing the one or more candidate frames to the output of the feature extractor, and wherein the media output segment identifier is determined based on a result of the comparison.
3. The device of claim 1, wherein the segment matcher is configured to: obtain data representing a plurality of candidate frames, each candidate frame of the plurality of candidate frames corresponding to a portion of a respective media output segment; and determine frame match scores for the plurality of candidate frames, wherein a frame match score of a particular candidate frame indicates an estimate of similarity of the particular candidate frame to an input frame represented by the output of the feature extractor, wherein the media output segment identifier is determined based, at least partially, on the frame match score.
4. The device of claim 3, wherein the segment matcher is configured to determine the frame match score for the particular candidate frame by passing the output of the feature extractor and data representing the particular candidate frame into a trained machine-learning model to cause the trained machine-learning model to output the frame match score for the particular candidate frame.
5. The device of claim 3, wherein the output of the feature extractor includes one or more speech parameter values for the input frame, and wherein the segment matcher is configured to determine the frame match score for the particular candidate frame based on comparison of speech parameter values of the one or more speech parameter values for the input frame and one or more corresponding speech parameter values for the particular candidate frame.
6. The device of claim 3, wherein the segment matcher is configured to: determine, based on the one or more frame match scores, one or more candidate segments; and for each of the one or more candidate segments, determine a segment match score, wherein the segment match score of a particular candidate segment indicates an estimate of similarity of the particular candidate segment to an input segment represented by the output of the feature extractor, wherein the media output segment identifier is determined based, at least partially, on the one or more segment match scores.
7. The device of claim 6, wherein the media output segment identifier identifies a media output segment associated with a largest segment match score among the one or more candidate segments.
8. The device of claim 6, wherein the segment matcher is configured to determine the segment match score of the particular candidate segment based on dynamic time warping of data representing the particular candidate segment and data representing the input segment.
9. The device of claim 6, wherein the segment matcher is configured to, for each input frame of an input segment, determine multiple frame match scores with respect to the input frame, wherein the input segment includes multiple input frames, and wherein the segment match score of the particular candidate segment is based on frame match scores for candidate frames as compared to different ones of the multiple input frames and is further based on memory locations associated with the candidate frames.
10. The device of claim 6, wherein the segment match score of the particular candidate segment is further based on the at least one representation of at least one utterance class.
11. The device of claim 1, wherein the segment matcher is configured to: obtain data representing one or more candidate frames of a media output segment; perform a comparison of the data representing the one or more candidate frames to the output of the feature extractor to identify one or more candidate segments; and determine, based on the at least one representation, a best match media output segment of the one or more candidate segments, and wherein the media output segment identifier identifies the best match media output segment.
12. The device of claim 1, wherein the one or more processors are further configured to pass one or more constraints to the segment matcher to determine the media output segment identifier.
13. The device of claim 12, wherein the one or more constraints include a talker identifier.
14. The device of claim 1, wherein the media output segment identifier identifies a recorded media segment corresponding to at least one phoneme.
15. The device of claim 1, further comprising a modem coupled to the one or more processors, the modem configured to transmit data indicating the media output segment identifier to another device.
16. The device of claim 1, further comprising one or more receivers configured to receive the input media stream over a communication channel.
17. The device of claim 1, wherein the one or more processors are further configured to determine that a particular media segment of the input media stream is not available for playout, and wherein the media output segment identifier corresponds to an estimate of the particular media segment.
18. The device of claim 17, wherein the one or more processors are further configured to concatenate the estimate of the particular media segment with one or more media segments of the input media stream to generate an audio stream.
19. The device of claim 1, wherein the input media stream includes audio representing speech of at least one first person, and the media output segment identifier enables output of corresponding speech of at least one second person.
20. The device of claim 1, wherein the input media stream includes audio representing first speech having a first accent, and the media output segment identifier enables output of corresponding second speech having a second accent.
21. The device of claim 1, wherein the input media stream includes audio representing speech and first noise, and the media output segment identifier enables output of corresponding speech without the first noise.
22. The device of claim 1, wherein the input media stream includes audio representing speech of at least one first person, and the media output segment identifier enables output of corresponding anonymized speech.
23. The device of claim 1, wherein the one or more processors are further configured to concatenate a media segment associated with the media output segment identifier with one or more additional media segments to generate an audio stream.
24. The device of claim 1, further comprising one or more microphones coupled to the one or more processors, the one or more microphones configured to receive audio data and to generate the i…
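Claims 6 through 8 describe scoring candidate segments, optionally with dynamic time warping, and selecting the segment with the largest score. The sketch below shows one way such scoring could look, assuming each frame is reduced to a single scalar feature and that a DTW distance is converted to a similarity with 1 / (1 + distance). Both simplifications, and all function names, are hypothetical and are not taken from the patent.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two scalar sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def segment_match_score(input_frames, candidate_frames):
    """Hypothetical similarity in (0, 1]; higher means a closer match."""
    return 1.0 / (1.0 + dtw_distance(input_frames, candidate_frames))

def best_segment(input_frames, candidates):
    """Return the id of the candidate with the largest segment match score.

    candidates maps segment id -> list of scalar frame features.
    """
    return max(candidates,
               key=lambda seg_id: segment_match_score(input_frames,
                                                      candidates[seg_id]))

if __name__ == "__main__":
    stored = {"s1": [1, 1, 2, 3], "s2": [3, 2, 1]}
    print(best_segment([1, 2, 3], stored))
```

Because DTW allows frames to be repeated during alignment, a candidate that is a time-stretched copy of the input can still align with zero cost, which is why it is a natural fit for comparing speech segments of different lengths.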

Assignees

Qualcomm Inc

Classifications

G10L15/26
Speech to text systems (G10L15/08 takes precedence) · CPC title
G10L13/06
Elementary speech units used in speech synthesisers; Concatenation rules · CPC title
G10L25/54
for retrieval · CPC title
G10L17/02
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title
G10L15/08
Speech classification or search · CPC title

Patent family

Related publications grouped by family.

View patent family 88689859

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12170094B2 cover? A device includes one or more processors configured to input one or more segments of an input media stream into a feature extractor. The one or more processors are further configured to pass an output of the feature extractor into an utterance classifier to produce at least one representation of at least one utterance class of a plurality of utterance classes. The one or more processors are fur…
Who is the assignee on this patent? Qualcomm Inc

Related patents

Text-to-speech (TTS) processing with transfer of vocal characteristics

Text-to-speech processing using previously speech processed data

Voice profile management and speech signal generation
