Sound event detection learning

US11664044B2 · US · B2

Patent metadata
Publication number: US-11664044-B2
Application number: US-202017102797-A
Country: US
Kind code: B2
Filing date: Nov 24, 2020
Priority date: Nov 25, 2019
Publication date: May 30, 2023
Grant date: May 30, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device includes a processor configured to receive audio data samples and provide the audio data samples to a first neural network to generate a first output corresponding to a first set of sound classes. The processor is further configured to provide the audio data samples to a second neural network to generate a second output corresponding to a second set of sound classes. A second count of classes of the second set of sound classes is greater than a first count of classes of the first set of sound classes. The processor is also configured to provide the first output to a neural adapter to generate a third output corresponding to the second set of sound classes. The processor is further configured to provide the second output and the third output to a merger adapter to generate sound event identification data based on the audio data samples.
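The data flow in the abstract can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not Qualcomm's implementation. Each neural network is a hypothetical single linear layer with a sigmoid, and the feature size and class counts (10 and 15) are invented. The merger adapter is assumed to average the second and third outputs element by element and then apply a hypothetical 0.5 threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 64  # hypothetical feature size of one audio data sample
N_OLD = 10       # first count of classes (hypothetical)
N_NEW = 15       # second count of classes, greater than N_OLD (hypothetical)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Toy single-layer stand-ins for the two neural networks (random, untrained weights).
W1 = rng.normal(size=(N_FEATURES, N_OLD))
W2 = rng.normal(size=(N_FEATURES, N_NEW))
# Neural adapter weights: map N_OLD class scores onto the N_NEW-class space.
WA = rng.normal(size=(N_OLD, N_NEW))


def first_network(x):
    return sigmoid(x @ W1)        # first output: one score per first-set class


def second_network(x):
    return sigmoid(x @ W2)        # second output: one score per second-set class


def neural_adapter(first_out):
    return sigmoid(first_out @ WA)  # third output: same width as the second output


def merger_adapter(second_out, third_out, threshold=0.5):
    # Assumed merge rule: element-by-element average, then a per-class threshold.
    merged = (second_out + third_out) / 2.0
    return merged, merged > threshold


x = rng.normal(size=(1, N_FEATURES))  # one hypothetical audio data sample
first = first_network(x)
second = second_network(x)
third = neural_adapter(first)
merged, detected = merger_adapter(second, third)
print(first.shape, second.shape, third.shape, merged.shape)
# prints: (1, 10) (1, 15) (1, 15) (1, 15)
```

Because the neural adapter projects the first network's 10 scores onto the 15-class space, the second and third outputs line up position by position and can be merged. The abstract does not say how the merge is computed; averaging is only one element-by-element option.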

First claim

Opening claim text (preview).

What is claimed is:

1. A device comprising: a processor configured to: receive audio data samples including a first audio data sample; provide the audio data samples to a first neural network trained to generate a first output corresponding to a first count of classes of a first set of sound classes; provide the audio data samples to a second neural network to generate a second output corresponding to a second count of classes of a second set of sound classes, the second count of classes greater than the first count of classes, the first output and the second output generated based on the first audio data sample; provide the first output to a neural adapter to generate a third output corresponding to the second count of classes of the second set of sound classes; and provide the second output and the third output to a merger adapter to generate sound event identification data based on the audio data samples.

2. The device of claim 1, wherein the first neural network has a base topology and a first output layer and the second neural network has the base topology and a second output layer, and wherein the first output layer includes a first count of nodes, the second output layer includes a second count of nodes, and the second count of nodes is greater than the first count of nodes.

3. The device of claim 2, wherein the neural adapter has an input layer including the first count of nodes and an output layer including the second count of nodes.

4. The device of claim 1, wherein the merger adapter is configured to merge the second output and the third output, element-by-element, to form a merged output.

5. The device of claim 4, wherein the merger adapter is configured to generate output data including the sound event identification data based on the merged output.

6. The device of claim 1, wherein the audio data samples include features extracted from audio data.

7. The device of claim 1, wherein the audio data samples include Mel spectrum features extracted from audio data.

8. The device of claim 1, further comprising one or more microphones coupled to the processor and configured to capture audio data to generate the audio data samples.

9. The device of claim 8, wherein the processor and the one or more microphones are integrated within a mobile computing device and the audio data represents an acoustic environment of the mobile computing device.

10. The device of claim 8, wherein the processor and the one or more microphones are integrated within a vehicle.

11. The device of claim 8, wherein the processor and the one or more microphones are integrated within a wearable device and the audio data represents an acoustic environment of the wearable device.

12. The device of claim 8, wherein the processor and the one or more microphones are integrated within a headset and the audio data represents an acoustic environment of the headset.

13. The device of claim 1, wherein the processor is included in an integrated circuit.

14. A method comprising: receiving audio data samples including a first audio data sample; providing, by a processor, the audio data samples to a first neural network trained to generate a first output corresponding to a first count of classes of a first set of sound classes; providing, by the processor, the audio data samples to a second neural network to generate a second output corresponding to a second count of classes of a second set of sound classes, the second count of classes greater than the first count of classes, the first output and the second output generated based on the first audio data sample; providing, by the processor, the first output to a neural adapter to generate a third output corresponding to the second count of classes of the second set of sound classes; and providing, by the processor, the second output and the third output to a merger adapter to generate sound event identification data based on the audio data samples.

15. The method of claim 14, wherein the first neural network has a base topology and a first output layer and the second neural network has the base topology and a second output layer, and wherein the first output layer includes a first count of nodes, the second output layer includes a second count of nodes, and the second count of nodes is greater than the first count of nodes.

16. The method of claim 15, wherein the neural adapter has an input layer including the first count of nodes and an output layer including the second count of nodes.

17. The method of claim 14, wherein the merger adapter merges the second output and the third output, element-by-element, to form a merged output.

18. The method of claim 17, wherein the merger adapter generates output data including the sound event identification data based on the merged output.

19. The method of claim 14, further comprising generating the audio data samples including extracting features from audio data representing an acoustic environment.

20. The method of claim 14, further comprising capturing audio data at one or more microphones coupled to the processor, wherein the audio data samples are generated based on the captured audio data.

21. The method of claim 14, further comprising performing an action responsive to the sound event identification data.

22. A non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a processor, cause the processor to: provide audio data samples to a first neural network trained to generate a first output corresponding to a first count of classes of a first set of sound classes, the audio data samples including a first audio data sample; provide the audio data samples to a second neural network to generate a second output corresponding to a second count of classes of a second set of sound classes, the second count of classes greater than the first count of classes, the first output and the second output generated based on the first audio data sample; provide the first output to a neural adapter to generate a third output corresponding to the second count of classes of the second set of sound classes; and provide the second output and the third output to a merger adapter to generate sound event identification data based on the audio data samples.

23. The non-transitory computer-readable storage medium of claim 22, wherein the first neural network has a base topology and a first output layer and the second neural network has the base topology and a second output layer, and wherein the first output layer includes a first count of nodes, the second output layer includes a second count of nodes, and the second count of nodes is greater than the first count of nodes.

24. The non-transitory computer-readable storage medium of claim 22, wherein the instructions when executed by the processor further cause the processor to perform an action responsive to the sound event identification data.

25. The non-transitory computer-readable storage medium of claim 22, wherein the merger adapter generates the sound event identification data based on merged output based on element-by-element merger of the third output and the second output.

26. A device comprising: means for generating a first output based on audio data samples, the first output corresponding to a first count of classes of a first set of sound classes, the audio data samples including a first audio data s…
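Claims 2 and 3 constrain only the layer structure: both networks share a base topology, the second output layer has more nodes than the first, and the neural adapter's input and output layers match those two node counts. The NumPy sketch below illustrates one way such networks might be built, in the spirit of the "Transfer learning" classification. The `Net` class, the `expand` helper, the layer sizes, and the choice to copy the first network's base weights and existing output weights into the second network are all hypothetical; the claims require the same topology, not shared or copied weights.

```python
import numpy as np

rng = np.random.default_rng(1)

N_FEATURES, N_HIDDEN = 64, 32  # hypothetical sizes of the shared base topology
N_OLD, N_NEW = 10, 15          # first and second count of output nodes (hypothetical)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class Net:
    """Toy network: a one-hidden-layer base followed by an output layer."""

    def __init__(self, base_w, out_w):
        self.base_w = base_w  # base topology weights, shape (N_FEATURES, N_HIDDEN)
        self.out_w = out_w    # output layer weights, shape (N_HIDDEN, n_classes)

    @property
    def output_nodes(self):
        return self.out_w.shape[1]

    def __call__(self, x):
        return sigmoid(relu(x @ self.base_w) @ self.out_w)


def expand(first, n_new):
    """Assumed transfer-learning step: reuse the base, grow the output layer."""
    n_old = first.output_nodes
    # New output nodes start from small random weights; old ones are copied.
    new_cols = rng.normal(scale=0.1, size=(first.out_w.shape[0], n_new - n_old))
    out_w = np.concatenate([first.out_w, new_cols], axis=1)
    return Net(first.base_w.copy(), out_w)


first = Net(rng.normal(scale=0.1, size=(N_FEATURES, N_HIDDEN)),
            rng.normal(scale=0.1, size=(N_HIDDEN, N_OLD)))
second = expand(first, N_NEW)

# Neural adapter: an input layer of N_OLD nodes feeding an output layer of N_NEW nodes.
adapter_w = rng.normal(scale=0.1, size=(N_OLD, N_NEW))

x = rng.normal(size=(4, N_FEATURES))  # a batch of 4 hypothetical feature vectors
third = sigmoid(first(x) @ adapter_w)  # third output, same width as second(x)
```

With the old output columns copied, `second(x)[:, :N_OLD]` equals `first(x)` before any further training, so the larger network starts from the smaller one's behavior on the original classes. Whether the patent's training procedure initializes the second network this way is not stated in the claims shown here.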

Assignees

Qualcomm Inc

Classifications

  • Transfer learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • Activation functions · CPC title

  • for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11664044B2 cover?
A device includes a processor configured to receive audio data samples and provide the audio data samples to a first neural network to generate a first output corresponding to a first set of sound classes. The processor is further configured to provide the audio data samples to a second neural network to generate a second output corresponding to a second set of sound classes. A second count of …
Who is the assignee on this patent?
Qualcomm Inc
What technology area does this patent fall under?
Primary CPC classification G10L25/51. Mapped technology areas include Physics.
When was this patent published?
Published May 30, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).