Example-based cross-modal denoising

US9576587B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9576587-B2
Application number: US-201414301676-A
Country: US
Kind code: B2
Filing date: Jun 11, 2014
Priority date: Jun 12, 2013
Publication date: Feb 21, 2017
Grant date: Feb 21, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for cross-modal signal denoising, the method comprising using at least one hardware processor for: providing a first multi-modal signal comprising at least two relatively clear modalities; correlating features exhibited simultaneously in the at least two relatively clear modalities of the first multi-modal signal; providing a second multi-modal signal comprising at least one relatively noisy modality and at least one relatively clear modality; and denoising the at least one relatively noisy modality of the second multi-modal signal by associating between (a) features exhibited in the at least one relatively noisy modality of the second multi-modal signal and (b) the features of the first multi-modal signal.
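The steps in the abstract can be illustrated with a minimal sketch. It assumes features are already extracted as fixed-length numeric vectors, one per time step, and it uses a nearest-neighbor lookup with a weighted Euclidean distance as the "associating" step. The abstract does not prescribe that matching rule, and the names `build_examples`, `denoise`, and `audio_weight` are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def build_examples(clean_video_feats, clean_audio_feats):
    """Pair features seen at the same time in the two clear modalities
    of the first (example) signal."""
    return list(zip(clean_video_feats, clean_audio_feats))


def denoise(video_feats, noisy_audio_feats, examples, audio_weight=0.5):
    """For each time step of the second signal, pick the example whose
    video and audio features are closest, and return its clean audio.

    The nearest-neighbor rule and audio_weight are assumptions made for
    this illustration, not details taken from the patent text.
    """
    restored = []
    for video, noisy_audio in zip(video_feats, noisy_audio_feats):
        def cost(example):
            ex_video, ex_audio = example
            return dist(video, ex_video) + audio_weight * dist(noisy_audio, ex_audio)

        _, best_audio = min(examples, key=cost)
        restored.append(best_audio)  # replace noisy features with clean ones
    return restored


# Toy data: two examples, one noisy query step.
examples = build_examples([[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [5.0, 5.0]])
restored = denoise([[0.9, 1.1]], [[4.0, 7.0]], examples)
print(restored)  # [[5.0, 5.0]]
```

In this toy run the clear video features of the query sit close to the second example, so its clean audio features are returned in place of the noisy ones.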

First claim

Opening claim text (preview).

What is claimed is:

1. A method for cross-modal signal denoising, the method comprising using at least one hardware processor for: providing a first multi-modal signal comprising at least two relatively clear modalities; correlating features exhibited simultaneously in the at least two relatively clear modalities of the first multi-modal signal; providing a second multi-modal signal comprising at least one relatively noisy modality and at least one relatively clear modality; and denoising the at least one relatively noisy modality of the second multi-modal signal by associating between (a) features exhibited in the at least one relatively noisy modality of the second multi-modal signal and (b) the correlated features of the first multi-modal signal.

2. The method according to claim 1, wherein said denoising comprises replacing the features exhibited in the at least one relatively noisy modality of the second multi-modal signal with the features exhibited in one of the at least two relatively clear modalities of the first multi-modal signal.

3. The method according to claim 2, wherein said replacing is based on a statistical analysis of the features of: one of the at least two relatively clear modalities of the first multi-modal signal; and features exhibited in the at least one relatively clear modality of the second multi-modal signal.

4. The method according to claim 2, wherein said replacing is based on a pattern recognition of the features of: one of the at least two relatively clear modalities of the first multi-modal signal, and features exhibited in the at least one relatively clear modality of the second multi-modal signal.

5. The method according to claim 1, wherein: the at least two relatively clear modalities of the first multi-modal signal are an audio modality and a video modality; the at least one relatively noisy modality of the second multi-modal signal is an audio modality; and the at least one relatively clear modality of the second multi-modal signal is a video modality.

6. The method according to claim 1, further comprising dividing one of the at least two relatively clear modalities of the first multi-modal signal into a plurality of temporal segments.

7. The method according to claim 6, wherein each of the plurality of temporal segments is between 0.2 and 0.4 seconds long.

8. An apparatus comprising: an image sensor configured for video capture; a microphone; a non-transient memory having stored thereon correlated features exhibited simultaneously in a relatively clear video modality and in a relatively clear audio modality both belonging to a first multi-modal signal; and at least one hardware processor configured to: (a) receive a second multi-modal signal comprising a relatively clear video modality from said image sensor and a relatively noisy audio modality from said microphone, and (b) denoise the relatively noisy audio modality of the second multi-modal signal by associating between (i) features exhibited in the relatively noisy audio modality of the second multi-modal signal and (ii) the correlated features of the first multi-modal signal.

9. The apparatus according to claim 8, wherein said at least one hardware processor is further configured to replace the features exhibited in the relatively noisy audio modality of the second multi-modal signal with the features exhibited in the relatively clear audio modality of the first multi-modal signal.

10. The apparatus according to claim 9, wherein said replace is based on a statistical analysis of the features of: the relatively clear video modality of the first multi-modal signal; and the relatively clear video modality of the second multi-modal signal.

11. The apparatus according to claim 9, wherein said replace is based on a pattern recognition of the features of: the relatively clear video modality of the first multi-modal signal; and the relatively clear video modality of the second multi-modal signal.

12. The apparatus according to claim 9, wherein said at least one hardware processor is further configured to divide the relatively clear audio modality of the first multi-modal signal into a plurality of temporal segments.

13. The apparatus according to claim 12, wherein each of the plurality of temporal segments is between 0.2 and 0.4 seconds long.

14. A computer program product for cross-modal signal denoising, comprising a non-transitory computer-readable storage medium having program code embodied therewith, the program code executable by at least one hardware processor to: provide a first multi-modal signal comprising at least two relatively clear modalities; correlate features exhibited simultaneously in the at least two relatively clear modalities of the first multi-modal signal; provide a second multi-modal signal comprising at least one relatively noisy modality and at least one relatively clear modality; and denoise the at least one relatively noisy modality of the second multi-modal signal by associating between (a) features exhibited in the at least one relatively noisy modality of the second multi-modal signal and (b) the correlated features of the first multi-modal signal.

15. The computer program product according to claim 14, wherein said denoise comprises replacing the features exhibited in the at least one relatively noisy modality of the second multi-modal signal with the features exhibited in one of the at least two relatively clear modalities of the first multi-modal signal.

16. The computer program product according to claim 15, wherein said replacing is based on a statistical analysis of the features of: one of the at least two relatively clear modalities of the first multi-modal signal; and features exhibited in the at least one relatively clear modality of the second multi-modal signal.

17. The computer program product according to claim 16, wherein said replacing is based on a pattern recognition of the features of: one of the at least two relatively clear modalities of the first multi-modal signal, and features exhibited in the at least one relatively clear modality of the second multi-modal signal.

18. The computer program product according to claim 14, wherein: the at least two relatively clear modalities of the first multi-modal signal are an audio modality and a video modality; the at least one relatively noisy modality of the second multi-modal signal is an audio modality; and the at least one relatively clear modality of the second multi-modal signal is a video modality.

19. The computer program product according to claim 14, wherein said program code is further executable to divide one of the at least two relatively clear modalities of the first multi-modal signal into a plurality of temporal segments.

20. The computer program product according to claim 19, wherein each of the plurality of temporal segments is between 0.2 and 0.4 seconds long.

21. A method for cross-modal signal denoising, the method comprising using at least one hardware processor for: providing correlated features exhibited simultaneously in a relatively clear video modality and in a relatively clear audio modality both belonging to a first multi-modal signal; providing a second multi-modal signal comprising at least one relatively noisy modality and at least one relatively clear modality; and denoising the at least one relatively noisy modality of the second multi-modal signal by associating between (a) features exhibited in the at least one relati
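The temporal segmentation recited in claims 6, 7, 12, 13, 19 and 20 can be sketched as follows. This is only an illustration: the claims give the 0.2 to 0.4 second range but not how segments are cut, so the fixed-length, non-overlapping split, the dropping of a trailing partial segment, and the names `split_into_segments`, `sample_rate`, and `segment_seconds` are all assumptions.

```python
def split_into_segments(samples, sample_rate, segment_seconds=0.25):
    """Cut a clean signal into equal-length temporal segments.

    Assumptions for this sketch (not stated in the claims): segments do
    not overlap, and a trailing partial segment is discarded.
    """
    if not 0.2 <= segment_seconds <= 0.4:
        raise ValueError("segment length must be between 0.2 and 0.4 seconds")
    seg_len = round(sample_rate * segment_seconds)  # samples per segment
    last_start = len(samples) - seg_len
    return [samples[i:i + seg_len] for i in range(0, last_start + 1, seg_len)]


# One second of placeholder audio at 16 kHz, cut into 0.25 s segments.
audio = list(range(16000))
segments = split_into_segments(audio, sample_rate=16000, segment_seconds=0.25)
print(len(segments), len(segments[0]))  # 4 4000
```

Each resulting segment could then serve as one example unit when correlating features across modalities.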

Assignees

Inventors

Classifications

  • using position of the lips, movement of the lips or face analysis

  • Processing of audio elementary streams

  • Noise filtering

  • for processing of video signals

  • the noise being separate speech, e.g. cocktail party

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9576587B2 cover?
A method for cross-modal signal denoising, the method comprising using at least one hardware processor for: providing a first multi-modal signal comprising at least two relatively clear modalities; correlating features exhibited simultaneously in the at least two relatively clear modalities of the first multi-modal signal; providing a second multi-modal signal comprising at least one relatively…
Who is the assignee on this patent?
Technion Res & Dev Foundation
What technology area does this patent fall under?
Primary CPC classification G10L21/0208. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 21, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).