Audio source identification

US10748554B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10748554-B2
Application number: US-201916249654-A
Country: US
Kind code: B2
Filing date: Jan 16, 2019
Priority date: Jan 16, 2019
Publication date: Aug 18, 2020
Grant date: Aug 18, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments facilitating audio source identification are provided. A computer-implemented method comprises: receiving, by a device operatively coupled to one or more processors, an audio signal under inspection; generating, by the device, an image of time-frequency spectrum of low frequency component and high frequency component of the audio signal; and identifying, by the device, a source of the audio signal based on the generated image and one or more patterns of time-frequency spectrum, wherein each of the one or more patterns is corresponding to low frequency feature and high frequency feature of a specific audio source.
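The generating step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `merged_spectrogram`, the Hann window, the frame and hop sizes, and the use of NumPy are all assumptions. The 200 Hz and 5 kHz cutoffs follow claims 7 and 16 on this page.

```python
import numpy as np

def merged_spectrogram(signal, sample_rate, frame=2048, hop=512,
                       low_cut=200.0, high_cut=5000.0):
    """Return (image_db, freqs) holding only the low and high frequency rows.

    Hypothetical helper: window type, frame size, and hop size are
    illustrative choices, not values taken from the patent.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame:
        raise ValueError("signal is shorter than one analysis frame")

    # Short-time Fourier transform: slice into overlapping windowed frames.
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])

    # Magnitude spectrum, arranged as (frequency bins, time frames).
    spectrum = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(frame, d=1.0 / sample_rate)

    # Convert to decibels so the image carries time, frequency, and dB.
    image_db = 20.0 * np.log10(spectrum + 1e-10)

    # Keep rows below low_cut or above high_cut and stack them together.
    keep = (freqs < low_cut) | (freqs > high_cut)
    return image_db[keep], freqs[keep]
```

Calling `merged_spectrogram(samples, 16000)` on one second of audio returns an image whose rows cover only the bands below 200 Hz and above 5 kHz; the mid band is discarded before any matching takes place.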

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: receiving, by a device operatively coupled to one or more processors, an audio signal under inspection; generating, by the device, a merged image of a time-frequency spectrum of a low frequency component and a high frequency component of the audio signal; identifying, by the device, a source that recorded the audio signal based on the generated image and one or more source patterns of the time-frequency spectrum, wherein each of the one or more source patterns is corresponding to a low frequency feature and a high frequency feature of a specific audio source; and detecting, by the device, a brand and a type of the source that recorded the audio signal.

2. The computer-implemented method of claim 1, wherein the generating comprises: conducting Fourier transformation, by the device, on the audio signal to generate a full time-frequency spectral image of the audio signal; extracting, by the device, the low frequency component and the high frequency component of the source time-frequency spectral image; and combining, by the device, an extracted low frequency component and an extracted high frequency component to generate the merged image of time-frequency spectrum of the audio signal.

3. The computer-implemented method of claim 1, wherein the identifying comprises: comparing, by the device, the generated image with the one or more patterns through image pixel matching to identify the source of the audio signal.

4. The computer-implemented method of claim 1, wherein the identifying comprises: feeding, by the device, the generated image to a convolutional neural network (CNN); extracting, by the device, the low frequency feature and the high frequency feature from the generated image through the CNN; comparing, by the device, the extracted low frequency feature and the high frequency feature with the one or more patterns to determine the matched pattern through the CNN; and identifying, by the device, an audio source that corresponds to the matched pattern as the source of the audio signal under inspection through the CNN.

5. The computer-implemented method of claim 1, further comprising: obtaining, by the device, the one or more patterns corresponding to one or more specific audio sources respectively.

6. The computer-implemented method of claim 5, wherein the obtaining comprises: receiving, by the device, multiple sample audio signals recorded through the specific audio source; conducting Fourier transformation, by the device, on the sample audio signals to generate multiple template full time-frequency spectral images of the sample audio signals; extracting, by the device, low frequency component and high frequency component of each of the template full time-frequency spectral images of the multiple sample audio signals; merging, by the device, the low frequency component and the high frequency component of each of the template full time-frequency spectral images to generate multiple template images corresponding to the specific audio source; and training, by the device, a convolutional neural network with the multiple template images to learn a hidden pattern of the specific audio source that is related to features of low frequency and high frequency corresponding to the specific audio source.

7. The computer-implemented method of claim 1, wherein the low frequency refers to frequency below 200 Hz, and the high frequency refers to frequency above 5 KHz.

8. The computer-implemented method of claim 1, wherein the specific audio source has a frequency response characteristic in low frequency and high frequency that is distinct from other audio sources.

9. The computer-implemented method of claim 1, wherein the image of time-frequency spectrum contains information of time, frequency and decibel (dB) of the audio signal.

10. A system, comprising: a memory that stores computer executable components; and a processing unit operably coupled to the memory, and that executes the computer executable components stored in the memory, wherein the computer executable components comprise: at least one computer-executable component that: receives an audio signal under inspection; generates a merged image of a time-frequency spectrum of a low frequency component and a high frequency component of the audio signal; identifies a source that recorded the audio signal based on the generated image and one or more source patterns of the time-frequency spectrum, wherein each of the one or more source patterns corresponds to a low frequency feature and a high frequency feature of a specific audio source; and detects a brand and a type of the source that recorded the audio signal.

11. The system of claim 10, wherein the generation comprises: a conducting of Fourier transformation on the audio signal to generate a full time-frequency spectral image of the audio signal; extraction of the low frequency component and the high frequency component of the source time-frequency spectral image; and a combining of the extracted low frequency component and the extracted high frequency component to generate the merged image of time-frequency spectrum of the audio signal.

12. The system of claim 10, wherein the identification comprises: a comparing of the generated image with the one or more patterns through image pixel matching to identify the source of the audio signal.

13. The system of claim 10, wherein the identification comprises: a feeding of the generated image to a convolutional neural network (CNN); extraction of the low frequency feature and the high frequency feature from the generated image through the CNN; a comparing of the low frequency feature and the high frequency feature with the one or more patterns to determine the matched pattern through the CNN; and identification of an audio source that corresponds to the matched pattern as the source of the audio signal under inspection through the CNN.

14. The system of claim 10, wherein the at least one computer-executable component also: obtains the one or more patterns corresponding to one or more specific audio sources respectively.

15. The system of claim 14, wherein the obtaining comprises: a receiving of multiple sample audio signals recorded through the specific audio source; and a conducting of Fourier transformation on the sample audio signals to generate multiple template full time-frequency spectral images of the sample audio signals; extraction of low frequency component and high frequency component of each of the template full time-frequency spectral images of the multiple sample audio signals; a merging of the low frequency component and the high frequency component of each of the template full time-frequency spectral images to generate multiple template images corresponding to the specific audio source; and a training of a convolutional neural network with the multiple template images to learn a hidden pattern of the specific audio source that is related to features of low frequency and high frequency corresponding to the specific audio source.

16. The system of claim 10, wherein the low frequency refers to frequency below 200 Hertz (Hz), and the high frequency refers to frequency above 5 Kilohertz (KHz).

17. The system of claim 10, wherein the specific audio source has a frequency response characteristic in low frequency and high frequency that is distinctive from other audio sources.

18. The system of claim 10, wherein the image of time-frequency spectrum contains information of tim
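Claims 3 and 12 describe identifying the source by comparing the generated image with stored patterns through image pixel matching. The claims do not specify a distance measure, so the sketch below assumes mean absolute pixel difference as the score; the function name `identify_source` and the pattern labels are hypothetical.

```python
import numpy as np

def identify_source(image, patterns):
    """Return (best_label, scores) for the pattern closest to the image.

    Assumption: "pixel matching" is approximated by the mean absolute
    difference between same-shaped images; lower means a closer match.
    """
    scores = {}
    for label, pattern in patterns.items():
        if pattern.shape != image.shape:
            raise ValueError(f"pattern {label!r} has shape {pattern.shape}, "
                             f"expected {image.shape}")
        scores[label] = float(np.mean(np.abs(image - pattern)))

    # The candidate source is the label with the smallest pixel difference.
    best_label = min(scores, key=scores.get)
    return best_label, scores
```

In practice the image and every pattern would have to be produced with the same frame size, cutoffs, and duration so that their pixel grids line up. The CNN route in claims 4 and 13 removes that rigidity by learning features rather than comparing raw pixels.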

Assignees

Inventors

Classifications

  • using classification, e.g. of video objects

  • the extracted parameters being spectral information of each sub-band

  • characterised by the analysis technique

  • G10L25/51 (Primary): for comparison or discrimination

  • using neural networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10748554B2 cover?
Embodiments facilitating audio source identification are provided. A computer-implemented method comprises: receiving, by a device operatively coupled to one or more processors, an audio signal under inspection; generating, by the device, an image of time-frequency spectrum of low frequency component and high frequency component of the audio signal; and identifying, by the device, a source of t…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G10L25/51. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 18, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).