Media segment representation using fixed weights

US12300233B2 · US · B2

Patent metadata
Publication number: US-12300233-B2
Application number: US-202218047562-A
Country: US
Kind code: B2
Filing date: Oct 18, 2022
Priority date: Oct 18, 2022
Publication date: May 13, 2025
Grant date: May 13, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection. Read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device includes a memory configured to store a collection of sets of weights, each of the sets of weights representing a respective media segment. The device also includes one or more processors configured to generate data representing the detected first input speech segment and to pass the data representing the detected first input speech segment into a collection of memory units. Each memory unit of the collection of memory units includes a set of weights from the collection of sets of weights. The one or more processors are also configured to generate a first estimate of an associated media segment that represents the detected first input speech segment. The associated media segment corresponds to a first memory unit in the collection of memory units.
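The retrieval step in the abstract can be pictured as a lookup over a fixed table of weight vectors, where each memory unit behaves like a node in a network layer (claim 5). The following is a minimal sketch under stated assumptions, not the patentee's implementation: it assumes the input speech segment has already been converted into a query vector, that each unit's activation is a dot product with its stored weights, and that the estimate is the most active unit together with the weights it stores (standing in for media parameters such as acoustic features). The names `dot` and `estimate_segment` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def estimate_segment(
    query: Sequence[float], memory: Sequence[Sequence[float]]
) -> Tuple[int, List[float]]:
    """Pass `query` through every memory unit and return the best match.

    Assumption: each memory unit is one fixed set of weights, and its
    activation is the dot product with the query. The estimate is the index
    of the most active unit plus a copy of the weights it stores, which here
    stand in for that unit's media parameters.
    """
    if not memory:
        raise ValueError("memory must contain at least one unit")
    activations = [dot(query, weights) for weights in memory]
    best = max(range(len(memory)), key=lambda i: activations[i])
    return best, list(memory[best])


# Hypothetical memory of three units and a hypothetical encoded segment.
memory = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
print(estimate_segment([0.2, 0.9], memory))  # (1, [0.0, 1.0])
```

The weights are never modified during lookup, which is the sense in which the memory holds "fixed weights"; only the representation fed into it changes from segment to segment.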

First claim

Full claim text. Claim 1 (device) and claim 18 (method) are the independent claims.

What is claimed is:

1. A device comprising: a memory configured to store a collection of sets of weights, each of the sets of weights representing a respective media segment; one or more processors configured to: detect a first input speech segment; generate data representing the detected first input speech segment; pass the data representing the detected first input speech segment into a collection of memory units, each memory unit of the collection of memory units including a set of weights from the collection of sets of weights, wherein each of the sets of weights represent one or more media parameters of the respective media segment associated with that set of weights, and wherein the one or more media parameters include at least one of: speech parameters including pulse code modulated (PCM) sample values associated with a respective memory unit, compressed representations of the PCM sample values associated with the respective memory unit, or acoustic features associated with the respective memory unit; and generate a first estimate of an associated media segment that represents the detected first input speech segment, the associated media segment corresponding to a first memory unit in the collection of memory units.

2. The device of claim 1, wherein the first estimate is part of a reconstructed media representation of the detected first input speech segment.

3. The device of claim 1, wherein the one or more media parameters include at least one of: pixel values of a video frame associated with a respective memory unit, visual landmarks of the video frame associated with the respective memory unit, a head pose vector, or a body skeleton vector.

4. The device of claim 1, wherein the first estimate additionally includes the one or more media parameters of the associated media segment.

5. The device of claim 1, wherein the collection of memory units represent nodes of one or more layers of a network.

6. The device of claim 5, wherein the network is a neural network.

7. The device of claim 1, wherein the one or more processors are further configured to: detect a second input speech segment; pass second data representing the detected second input speech segment into the collection of memory units; and generate a second estimate of a second associated media segment that represents the detected second input speech segment, the second associated media segment corresponding to a second memory unit in the collection of memory units.

8. The device of claim 7, wherein the one or more processors are configured to receive the detected first input speech segment and the detected second input speech segment over a communication channel, and wherein the first estimate and the second estimate are part of a reconstructed speech representation of the detected first input speech segment and the detected second input speech segment.

9. The device of claim 8, wherein the first estimate corresponds to a best match for the detected first input speech segment, and wherein the second estimate does not correspond to a best match for the detected second input speech segment.

10. The device of claim 7, wherein the one or more processors are further configured to: generate multiple estimates of associated media segments that represent the detected second input speech segment; and select the second estimate from among the multiple estimates based on the first estimate.

11. The device of claim 1, wherein the one or more processors are configured to process the detected first input speech segment using a first stage neural network to generate the data representing the detected first input speech segment.

12. The device of claim 11, wherein the one or more processors are configured to, as part of a training operation: perform a comparison of the first estimate to a target estimate for the detected first input speech segment; and update the first stage neural network based on the comparison.

13. The device of claim 11, wherein the one or more processors are configured to, as part of a training operation: determine target media parameters based on features of the detected first input speech segment; perform a comparison of the target media parameters with media parameters of the media segment that is associated with the first estimate; and update the first stage neural network based on the comparison.

14. The device of claim 11, wherein the one or more processors are configured to, as part of a training operation: determine a target media segment based on a target estimate for the detected first input speech segment; determine target media parameters of the target media segment; perform a comparison of the target media parameters with media parameters of the media segment that is associated with the first estimate; and update the first stage neural network based on the comparison.

15. The device of claim 1, further comprising a modem configured to send the first estimate to a second device via a communication channel.

16. The device of claim 1, further comprising one or more microphones configured to generate audio data that includes the detected first input speech segment.

17. The device of claim 1, further comprising one or more speakers configured to play out audio data corresponding to the associated media segment.

18. A method comprising: detecting, at a device, a first input speech segment; passing data representing the detected first input speech segment into a collection of memory units, where each memory unit includes a set of weights representing one or more media parameters of a respective media segment associated with that set of weights, and wherein the one or more media parameters include at least one of: speech parameters including pulse code modulated (PCM) sample values associated with a respective memory unit, compressed representations of the PCM sample values associated with the respective memory unit, or acoustic features associated with the respective memory unit; and outputting a first estimate of an associated media segment that represents the detected first input speech segment, the associated media segment corresponding to a first memory unit in the collection of memory units.

19. The method of claim 18, further comprising sending the first estimate over a communication channel to another device.

20. The method of claim 19, wherein the first estimate is part of a reconstructed media representation of the detected first input speech segment.

21. The method of claim 18, wherein the one or more media parameters include at least one of: pixel values of a video frame associated with a respective memory unit, visual landmarks of the video frame associated with the respective memory unit, a head pose vector, or a body skeleton vector.

22. The method of claim 21, wherein the first estimate includes the one or more media parameters.

23. The method of claim 18, wherein the detected first input speech segment is received over a communication channel.

24. The method of claim 18, further comprising: detecting a second input speech segment; passing second data representing the detected second input speech segment into the collection of memory units; and outputting a second estimate of a second associated speech segment that represents the detected second input speech segment, the second associated speech segment corresponding to a second memory unit in the collection of memory units.
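Claims 7 to 10 add a second speech segment and let its estimate be chosen from several candidates "based on the first estimate", so the second estimate need not be the best match (claim 9). The claims do not say how that choice is made. The sketch below is one hypothetical rule, not the claimed method: keep the top-k units by activation, then penalize each candidate by the squared distance between its weights and the weights of the unit chosen for the previous segment, a simple continuity cost. The names `select_next`, `k`, and `smoothness` are illustrative assumptions.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_next(
    query: Sequence[float],
    memory: Sequence[Sequence[float]],
    prev_index: int,
    k: int = 2,
    smoothness: float = 1.0,
) -> int:
    """Pick a memory unit for the second segment given the first estimate.

    1. Score every unit against the query and keep the top-k candidates
       (the "multiple estimates" of claim 10).
    2. Penalize each candidate by its squared distance from the weights of
       the unit chosen for the previous segment (hypothetical rule).
    3. Return the candidate with the best combined score; it need not be
       the unit with the highest raw activation (compare claim 9).
    """
    activations = [dot(query, w) for w in memory]
    ranked = sorted(range(len(memory)), key=lambda i: activations[i], reverse=True)
    prev = memory[prev_index]
    return max(
        ranked[:k],
        key=lambda i: activations[i] - smoothness * sq_dist(memory[i], prev),
    )


# Hypothetical memory; unit 0 was the best match for the first segment.
memory = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.3]]
print(select_next([0.3, 0.8], memory, prev_index=0))                  # 2
print(select_next([0.3, 0.8], memory, prev_index=0, smoothness=0.0))  # 1
```

With the continuity penalty switched off, the raw best match (unit 1) wins; with it on, unit 2 is selected because it lies much closer to the previously chosen unit. A Viterbi-style search over longer sequences would be a natural generalization, but nothing in the claims requires it.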

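Claims 11 to 14 train the first-stage neural network by comparing the first estimate (or the media parameters behind it) with a target and then updating only that network, which is consistent with the memory weights staying fixed. The claims do not name a loss or optimizer. The sketch below assumes a one-layer linear encoder as the first stage, softmax scores over the fixed memory units, a target unit chosen as the one whose stored weights lie closest to the target media parameters (loosely following claims 13 and 14), and a plain cross-entropy gradient step. This is a swapped-in training rule for illustration; `train_step`, `nearest_unit`, `matvec`, and `softmax` are hypothetical helpers.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Matrix-vector product for row-major nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def nearest_unit(target: Sequence[float], memory: Sequence[Sequence[float]]) -> int:
    """Index of the memory unit whose stored weights are closest to `target`."""
    return min(
        range(len(memory)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(memory[i], target)),
    )


def train_step(
    encoder: Matrix,
    memory: Sequence[Sequence[float]],
    features: Sequence[float],
    target_params: Sequence[float],
    lr: float = 0.5,
) -> float:
    """One gradient step on the linear encoder; `memory` is never modified.

    Returns the cross-entropy loss measured before the update.
    """
    query = matvec(encoder, features)        # data representing the segment
    probs = softmax(matvec(memory, query))   # soft scores over memory units
    target = nearest_unit(target_params, memory)
    loss = -math.log(probs[target])
    # dL/dlogits = probs - one_hot(target); dL/dquery = memory^T @ that.
    err = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    grad_q = [
        sum(err[i] * memory[i][j] for i in range(len(memory)))
        for j in range(len(query))
    ]
    # dL/dencoder[j][k] = grad_q[j] * features[k]; update rows in place.
    for j, row in enumerate(encoder):
        for k in range(len(row)):
            row[k] -= lr * grad_q[j] * features[k]
    return loss


memory = [[1.0, 0.0], [0.0, 1.0]]               # fixed memory units
encoder = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]    # trainable first stage
features = [1.0, 0.5, 0.2]                      # hypothetical input features
losses = [train_step(encoder, memory, features, [0.1, 0.9]) for _ in range(20)]
print(losses[0] > losses[-1])  # True
```

Keeping the memory frozen and training only the encoder means the stored media segments act as a fixed vocabulary, and learning is confined to mapping input speech onto that vocabulary.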
Assignees

Qualcomm Inc

Classifications (CPC titles)

  • Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10)

  • using artificial neural networks

  • Segmentation; Word boundary detection

  • Elementary speech units used in speech synthesisers; Concatenation rules

  • for retrieval

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12300233B2 cover?
A device that stores a collection of sets of weights, each representing a media segment, passes data representing a detected input speech segment through memory units holding those weights, and generates an estimate of the associated media segment. The Abstract section above gives the full summary.
Who is the assignee on this patent?
Qualcomm Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Published May 13, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).