Example-based audio inpainting

US9583111B2 · US · B2

Patent metadata
Publication number: US-9583111-B2
Application number: US-201414332913-A
Country: US
Kind code: B2
Filing date: Jul 16, 2014
Priority date: Jul 17, 2013
Publication date: Feb 28, 2017
Grant date: Feb 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for packet loss concealment, that includes: continuously receiving a digital audio stream; extracting audio features from the digital audio stream while the digital audio stream is unharmed; and upon detecting a gap in the digital audio stream, filling the gap with one or more previous segments of the digital audio stream, wherein the filling is based on a matching of the one or more of the extracted audio features with one or more audio features adjacent to the gap.
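The matching step in the abstract can be illustrated with a small sketch. This is not the patented implementation: it assumes the stream is already split into fixed-size frames, uses per-frame RMS energy as a stand-in for the "audio features", and picks the earlier position whose preceding frames best match the frames just before the gap. The names `frame_energy`, `fill_gap`, and the `context` parameter are hypothetical.

```python
import math


def frame_energy(frame):
    """Hypothetical stand-in feature: root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def fill_gap(frames, gap_start, gap_len, context=2):
    """Fill a gap of `gap_len` frames starting at `gap_start` from earlier audio.

    The frames just before the gap are compared (by feature distance) with
    every earlier run of `context` frames; the frames that followed the best
    match are copied into the gap. Returns (restored_frames, best_position).
    """
    # Features are extracted only from the unharmed part of the stream.
    feats = [frame_energy(f) for f in frames[:gap_start]]
    target = feats[gap_start - context:gap_start]

    best_pos, best_cost = None, float("inf")
    # Candidate i is the first frame to copy; it needs `context` frames before
    # it and `gap_len` frames from it onward, all located before the gap.
    for i in range(context, gap_start - gap_len + 1):
        cand = feats[i - context:i]
        cost = sum((a - b) ** 2 for a, b in zip(cand, target))
        if cost < best_cost:
            best_pos, best_cost = i, cost

    if best_pos is None:
        raise ValueError("not enough history to fill the gap")

    restored = list(frames)  # leave the caller's list untouched
    restored[gap_start:gap_start + gap_len] = frames[best_pos:best_pos + gap_len]
    return restored, best_pos


# Usage: a repeating quiet/loud/mid/soft pattern with two lost frames.
A, B, C, D = [0.1, 0.1], [0.9, 0.9], [0.5, 0.5], [0.3, 0.3]
stream = [A, B, C, D, A, B, None, None]
restored, pos = fill_gap(stream, gap_start=6, gap_len=2)
```

In this toy stream the frames before the gap (`A`, `B`) also occur at the start, so the frames that followed them there (`C`, `D`) are copied into the gap. A real system would use richer features than a single energy value per frame.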

First claim

Opening claim text (preview).

What is claimed is:

1. A method for packet loss concealment, comprising: continuously receiving a digital audio stream that comprises speech of a user; extracting audio features from the digital audio stream while the digital audio stream is unharmed; and upon detecting a gap in the digital audio stream, filling the gap with one or more previous segments of the digital audio stream, to produce a perceptually-acceptable digital audio output having a mean opinion score (MOS) of 2.5 or more, wherein said filling is based on a matching of extracted audio features of the one or more previous segments with one or more audio features adjacent to the gap.

2. The method according to claim 1, wherein said matching is based on prior statistics of the digital audio stream.

3. The method according to claim 2, wherein the prior statistics comprise a probability distribution of temporal syllable sequences.

4. The method according to claim 1, further comprising dividing the digital audio stream into consecutive segments.

5. The method according to claim 4, wherein the consecutive segments are partially-overlapping.

6. The method according to claim 4, further comprising clustering the consecutive segments, wherein said clustering is to a number of clusters based syllable types.

7. The method according to claim 6, wherein the number of clusters is between 250 and 350.

8. The method according to claim 4, further comprising dividing each of the consecutive segments into audio tiles according to mel frequency cepstral coefficients (MFCC).

9. The method according to claim 1, wherein said filling of the gap comprises synthesizing a restored digital audio signal by adjusting pitch and gain values of the one or more audio features adjacent to the gap.

10. The method according to claim 9, wherein said synthesizing further comprises preventing discontinuities in the restored digital audio signal by synthesizing a gradual transition at edges of the gap.

11. The method according to claim 1, wherein: said digital audio stream is comprised within a digital video stream; the method further comprises extracting visual features from the digital video stream; and said filling is further based on a matching of the one or more of the extracted visual features with one or more of the extracted audio features.

12. An apparatus comprising: a speaker; a network interface module; and at least one hardware processor configured to: (a) continuously receive a digital audio stream using said network interface module, wherein the digital audio stream comprises speech of a user, (b) extract audio features from the digital audio stream while the digital audio stream is unharmed, (c) upon detecting a gap in the digital audio stream, synthesize a restored digital audio signal by filling the gap with one or more previous segments of the digital audio stream wherein said filling is based on a matching of the extracted audio features of the one or more previous segments with one or more audio features adjacent to the gap, and (d) sounding the restored digital audio signal using said speaker, wherein the restored digital audio signal is perceptually-acceptable and has a mean opinion score (MOS) of 2.5 or more.

13. The apparatus according to claim 12, wherein said matching is based on prior statistics of the digital audio stream, the prior statistics comprising a probability distribution of temporal syllable sequence.

14. The apparatus according to claim 12, wherein said at least one hardware processor is further configured to divide the digital audio stream into partially-overlapping consecutive segments.

15. The apparatus according to claim 14, wherein said at least one hardware processor is further configured to cluster the consecutive segments, wherein the cluster is to a number of clusters based syllable types.

16. The apparatus according to claim 12, wherein said filling of the gap comprises synthesizing the restored digital audio signal also by adjusting pitch and gain values of the one or more audio features adjacent to the gap, and preventing discontinuities in the restored digital audio signal by synthesizing a gradual transition at edges of the gap.

17. The apparatus according to claim 12, wherein said digital audio stream is comprised within a digital video stream; said at least one hardware processor is further configured to extract visual features from the digital video stream; and said filling is further based on a matching of the one or more of the extracted visual features with one or more of the extracted audio features.

18. A method for packet loss concealment, comprising using at least one hardware processor for filling a gap in a digital audio stream with previously received audio of the digital audio stream, to produce a perceptually-acceptable digital audio output that has a mean opinion score (MOS) of 2.5 or more, wherein said filling is based on feature matching between audio adjacent to the gap and the previously received audio, and wherein the digital audio stream comprises speech of a user.
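Two of the dependent claims describe steps that are easy to picture in code: dividing the stream into partially-overlapping consecutive segments (claims 4, 5 and 14) and synthesizing a gradual transition at the edges of the gap (claims 10 and 16). The sketch below is only an illustration under assumptions the claims do not state: the segment length, hop size, and linear crossfade are arbitrary choices, and `split_overlapping` and `crossfade` are hypothetical helper names.

```python
def split_overlapping(samples, seg_len, hop):
    """Divide samples into consecutive segments of `seg_len` samples.

    A hop smaller than the segment length makes neighbouring segments
    partially overlap; a trailing remainder shorter than `seg_len` is dropped.
    """
    if not 0 < hop <= seg_len:
        raise ValueError("hop must be in the range 1..seg_len")
    last_start = len(samples) - seg_len
    return [samples[i:i + seg_len] for i in range(0, last_start + 1, hop)]


def crossfade(outgoing, incoming):
    """Linearly blend `outgoing` into `incoming` over their common length.

    Assumed here as one simple way to avoid an audible discontinuity at a
    gap edge; the weight of `incoming` rises steadily across the overlap.
    """
    if len(outgoing) != len(incoming):
        raise ValueError("both signals must have the same length")
    n = len(outgoing)
    blended = []
    for k in range(n):
        w = (k + 1) / (n + 1)  # strictly between 0 and 1
        blended.append((1 - w) * outgoing[k] + w * incoming[k])
    return blended


# Usage: 4-sample segments with a hop of 2 overlap by half their length.
segments = split_overlapping(list(range(10)), seg_len=4, hop=2)
# Fade from a constant 1.0 signal into silence over three samples.
edge = crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

In a concealment pipeline, a blend like `crossfade` would be applied where the received audio meets the start of the filled-in audio, and again where the filled-in audio meets the audio received after the gap.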

Assignees

Technion Res & Dev Foundation

Inventors

Classifications

  • using orthogonal transformation · CPC title

  • G10L19/005 (Primary)

    Correction of errors induced by the transmission channel, if related to the coding algorithm · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9583111B2 cover?
A method for packet loss concealment, that includes: continuously receiving a digital audio stream; extracting audio features from the digital audio stream while the digital audio stream is unharmed; and upon detecting a gap in the digital audio stream, filling the gap with one or more previous segments of the digital audio stream, wherein the filling is based on a matching of the one or more o…
Who is the assignee on this patent?
Technion Res & Dev Foundation
What technology area does this patent fall under?
Primary CPC classification G10L19/005. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 28, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).