Multi-channel audio video fingerprinting

US9367887B1 · US · B1

Patent metadata
Publication number: US-9367887-B1
Application number: US-201514880762-A
Country: US
Kind code: B1
Filing date: Oct 12, 2015
Priority date: Sep 5, 2013
Publication date: Jun 14, 2016
Grant date: Jun 14, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Implementations are provided herein relating to audiovisual matching. Audio and video channel data is merged to create a single multi-channel fingerprint used to match media content. Audio channel data is used to generate audio fingerprints. Video channel data is used to generate video fingerprints. Multi-channel fingerprints can then be generated based on the audio channel fingerprints and video channel fingerprints. In this sense, entropy can be increased while the multi-channel fingerprint can be more resistant to noise.
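
Claim 4 names one way to merge the channels: concatenating audio and video min-hashes. The sketch below illustrates that idea under stated assumptions and is not the patent's implementation. It assumes each channel has already been reduced, for one time window, to a set of integer features. The seeded BLAKE2b hash family, the 64-bit hash width, and the names seeded_hash, minhash, and multichannel_fingerprint are hypothetical choices made for illustration.

```python
import hashlib

MASK64 = (1 << 64) - 1


def seeded_hash(value: int, seed: int) -> int:
    """64-bit hash of an integer feature; each seed acts as a separate hash function."""
    data = f"{seed}:{value}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(features: set[int], num_hashes: int) -> list[int]:
    """Min-hash signature: for each seed, the minimum hash over the feature set."""
    if not features:
        raise ValueError("min-hash of an empty feature set is undefined")
    return [min(seeded_hash(f, seed) for f in features) for seed in range(num_hashes)]


def multichannel_fingerprint(audio_sig: list[int], video_sig: list[int]) -> list[int]:
    """Concatenate the i-th audio min-hash and the i-th video min-hash into one 128-bit key."""
    if len(audio_sig) != len(video_sig):
        raise ValueError("audio and video signatures must have equal length")
    # Audio hash occupies the high 64 bits, video hash the low 64 bits.
    return [(a << 64) | v for a, v in zip(audio_sig, video_sig)]


# Hypothetical per-channel features for one time window, e.g. quantized
# spectrogram peaks (audio) and quantized video interest points (video).
audio_features = {3, 17, 42, 99}
video_features = {5, 8, 13}

audio_sig = minhash(audio_features, num_hashes=4)
video_sig = minhash(video_features, num_hashes=4)
fingerprint = multichannel_fingerprint(audio_sig, video_sig)
print([hex(k) for k in fingerprint])
```

One possible reading of the abstract's remark about entropy: each combined key is drawn from a 128-bit space instead of a 64-bit one, so an index lookup on a key matches only when both channels agree at that signature position.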

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for generating multi-channel fingerprints, the method comprising: receiving audio channel data and video channel data associated with a video; generating a set of audio fingerprints based on the audio channel data; generating a set of mean frames of the video based on a sliding time window applied to the video channel data; generating a set of video fingerprints based on the set of mean frames of the video; and generating a set of multi-channel fingerprints based on both the set of audio fingerprints and the set of video fingerprints.

2. The computer-implemented method of claim 1, further comprising: generating an audio spectrogram based on the audio channel data; and generating a downscaled audio spectrogram based on the audio spectrogram, wherein the set of audio fingerprints is generated based on the downscaled audio spectrogram and wherein audio fingerprints in the set of audio fingerprints are min-hashes.

3. The computer-implemented method of claim 2, further comprising: generating a set of wavelet min-hashes based on the set of mean frames, wherein the set of video fingerprints is generated based on the set of wavelet min-hashes and wherein video fingerprints in the set of video fingerprints are min-hashes.

4. The computer-implemented method of claim 3, wherein generating the set of multi-channel fingerprints is based on concatenating min-hashes of audio fingerprints from the set of audio fingerprints and the min-hashes of video fingerprints from the set of video fingerprints.

5. The computer-implemented method of claim 4, wherein generating the set of multi-channel fingerprints is based on a consistent output rate.

6. The computer-implemented method of claim 3, further comprising: generating a set of weighted audio min-hashes based on the set of audio fingerprints, an aggregate hash time window, and an audio channel identifier; generating a set of weighted video min-hashes based on the set of video fingerprints, the aggregate hash time window, and a video channel identifier; and generating a set of concatenated pairs based on the set of weighted audio min-hashes and the set of weighted video min-hashes wherein generating the set of multi-channel fingerprints is based on the set of concatenated pairs.

7. The computer-implemented method of claim 6, wherein concatenated pairs in the set of concatenated pairs are comprised of at least one weighted audio min-hash from the set of weighted audio min-hashes and at least one weighted video min-hash from the set of weighted video min-hashes.

8. The computer-implemented method of claim 2, further comprising: generating a set of interest points based on the audio spectrogram; and generating a set of descriptors based on the set of interest points; wherein the set of audio fingerprints is generated based on the set of descriptors.

9. The computer-implemented method of claim 8, further comprising: generating a set of pairs wherein each pair in the set of pairs contains an anchor interest point and a paired interest point; generating a third point for each pair in the set of pairs based on a search path wherein the third point is a time-frequency point of a maximum along the search path; generating a set of triples wherein respective triples in the set of triples contain the anchor interest point, the paired interest point and the third point; determining a binary bit associated with each triple in the set of triples based on whether the third point lies on a first half of the search path or a second half of the search path; and wherein generating descriptors in the set of descriptors is based on a triple in the set of triples and contains a quantized frequency of the anchor interest point, a first quantized frequency ratio of a frequency of the paired interest point and a frequency of the anchor interest point, a second quantized frequency ratio of a frequency of the third point and the frequency of the anchor interest point, a time span between the anchor interest point and the paired interest point, and the binary bit associated with the triple.

10. The computer-implemented method of claim 8, further comprising: generating a set of video interest points based on the set of mean frames; and generating a set of quantized video interest points based on the set of video interest points wherein the set of video fingerprints is generated based on the set of quantized video interest points.

11. The computer-implemented method of claim 10, wherein generating the set of multi-channel fingerprints comprises: combining an audio fingerprint from the set of audio fingerprints and a video fingerprint from the set of video fingerprints based on at least one of a common time offset, a closest in time offset, or a spatial similarity.

12. A computer program product comprising a non-transitory computer-readable storage medium storing executable code for generating multi-channel fingerprints, the code when executed by a computer processor causes the computer processor to perform steps comprising: receiving audio channel data and video channel data associated with a video; generating a set of audio fingerprints based on the audio channel data; generating a set of mean frames of the video based on a sliding time window applied to the video channel data; generating a set of video fingerprints based on the set of mean frames of the video; and generating a set of multi-channel fingerprints based on both the set of audio fingerprints and the set of video fingerprints.

13. The computer program product of claim 12, wherein the code when executed by the computer processor causes the computer processor to perform further steps comprising: generating an audio spectrogram based on the audio channel data; and generating a downscaled audio spectrogram based on the audio spectrogram, wherein the set of audio fingerprints is generated based on the downscaled audio spectrogram and wherein audio fingerprints in the set of audio fingerprints are min-hashes.

14. The computer program product of claim 13, wherein the code when executed by the computer processor causes the computer processor to perform further steps comprising: generating a set of wavelet min-hashes based on the set of mean frames, wherein the set of video fingerprints is generated based on the set of wavelet min-hashes and wherein video fingerprints in the set of video fingerprints are min-hashes.

15. The computer program product of claim 14, wherein generating the set of multi-channel fingerprints is based on concatenating min-hashes of audio fingerprints from the set of audio fingerprints and the min-hashes of video fingerprints from the set of video fingerprints.

16. The computer program product of claim 15, wherein generating the set of multi-channel fingerprints is based on a consistent output rate.

17. The computer program product of claim 14, wherein the code when executed by the computer processor causes the computer processor to perform further steps comprising: generating a set of weighted audio min-hashes based on the set of audio fingerprints, an aggregate hash time window, and an audio channel identifier; generating a set of weighted video min-hashes based on the set of video fingerprints, the aggregate hash time window, and a video channel identifier; and generating a set of concatenated pairs based on the set of weighted audio min-hashes and the set of weighted video min-hashes wherein generating the set of multi-channel fingerprints is based on the set of concatenated pairs.
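
Claims 1 and 3 describe the video path: average frames over a sliding time window, then derive wavelet min-hashes from each mean frame. The sketch below illustrates that path and is not the patented method. It assumes frames arrive as equal-length flat lists of grayscale intensities whose length is a power of two. Several pieces are hypothetical simplifications chosen for illustration, and the claims do not specify them: the 1-D Haar transform over the flattened frame, the top-k coefficient-sign encoding, the seeded BLAKE2b hash family, all function names, and the default parameters.

```python
import hashlib


def mean_frames(frames: list[list[float]], window: int, step: int) -> list[list[float]]:
    """Average each run of `window` consecutive frames, sliding forward by `step` frames."""
    means = []
    for start in range(0, len(frames) - window + 1, step):
        chunk = frames[start:start + window]
        means.append([sum(pixels) / window for pixels in zip(*chunk)])
    return means


def haar_1d(signal: list[float]) -> list[float]:
    """Full 1-D Haar decomposition (unnormalized averages and differences).

    Output layout: [overall average, coarsest detail, ..., finest details].
    """
    coeffs = list(signal)
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a nonzero power of two")
    while n > 1:
        half = n // 2
        avgs = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        diffs = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = avgs + diffs
        n = half
    return coeffs


def top_k_sign_features(coeffs: list[float], k: int) -> set[int]:
    """Keep the k largest-magnitude coefficients, encoding each as 2*index + (1 if positive else 0)."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:k]
    return {2 * i + (1 if coeffs[i] > 0 else 0) for i in ranked}


def seeded_hash(value: int, seed: int) -> int:
    """64-bit hash of an integer feature; each seed acts as a separate hash function."""
    data = f"{seed}:{value}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(features: set[int], num_hashes: int) -> list[int]:
    """Min-hash signature: for each seed, the minimum hash over the feature set."""
    return [min(seeded_hash(f, seed) for f in features) for seed in range(num_hashes)]


def video_fingerprints(frames, window=3, step=1, k=2, num_hashes=4):
    """One wavelet min-hash signature per mean frame."""
    return [
        minhash(top_k_sign_features(haar_1d(m), k), num_hashes)
        for m in mean_frames(frames, window, step)
    ]


# Six hypothetical 4-pixel grayscale frames.
frames = [[10, 20, 30, 40], [12, 18, 33, 41], [11, 22, 29, 39],
          [50, 10, 45, 5], [52, 12, 44, 6], [49, 9, 46, 4]]
sigs = video_fingerprints(frames)
print(len(sigs), len(sigs[0]))
```

A 2-D wavelet over the frame grid would likely preserve spatial structure better. The 1-D transform is used here only to keep the sketch short. The same minhash function could serve the audio path of claim 2, applied to features taken from a downscaled spectrogram.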

Assignees

Google Inc

Inventors

Classifications

  • G06T1/0021 · Primary

    Image watermarking · CPC title

  • G06V20/46 · Primary

    Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title

  • using low-level visual features of the video content · CPC title

  • using audio features · CPC title

  • using metadata automatically derived from the content · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9367887B1 cover?
Implementations are provided herein relating to audiovisual matching. Audio and video channel data is merged to create a single multi-channel fingerprint used to match media content. Audio channel data is used to generate audio fingerprints. Video channel data is used to generate video fingerprints. Multi-channel fingerprints can then be generated based on the audio channel fingerprints and v…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06T1/0021. Mapped technology areas include Physics.
When was this patent published?
Published Jun 14, 2016 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).