Obfuscating audio samples for health privacy contexts

US11929063B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11929063-B2
Application number: US-202117534396-A
Country: US
Kind code: B2
Filing date: Nov 23, 2021
Priority date: Nov 23, 2021
Publication date: Mar 12, 2024
Grant date: Mar 12, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A supervised discriminator for detecting bio-markers in an audio sample dataset is trained and a denoising autoencoder is trained to learn a latent space that is used to reconstruct an output audio sample with a same fidelity as an input audio sample of the audio sample dataset. A conditional auxiliary generative adversarial network (GAN) trained to generate the output audio sample with the same fidelity as the input audio sample, wherein the output audio sample is void of the bio-markers. The conditional auxiliary generative adversarial network (GAN), the corresponding supervised discriminator, and the corresponding denoising autoencoder are deployed in an audio processing system.
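
The abstract asks the autoencoder to reconstruct audio "with a same fidelity" as the input, and claim 3 describes the training loss as a KL-divergence based reconstruction error plus a fidelity term. The sketch below is one possible reading of that objective, not the patent's implementation. It assumes each frame is a list of non-negative spectral magnitudes, uses mean squared error as a stand-in fidelity term, and the names `normalize`, `kl_divergence`, `reconstruction_loss`, and `fidelity_weight` are hypothetical.

```python
import math

def normalize(frame, eps=1e-12):
    """Turn a non-negative spectral frame into a probability distribution."""
    total = sum(frame) + eps * len(frame)
    return [(v + eps) / total for v in frame]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def reconstruction_loss(original, reconstructed, fidelity_weight=0.1):
    """KL-based reconstruction error plus a weighted fidelity term.

    Assumption: the fidelity term is mean squared error, a simple stand-in
    for the frequency-response, distortion, noise, and time-based errors
    that claim 4 lists.
    """
    kl = kl_divergence(normalize(original), normalize(reconstructed))
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return kl + fidelity_weight * mse

frame = [1.0, 2.0, 3.0]
same = reconstruction_loss(frame, [1.0, 2.0, 3.0])    # identical frames
louder = reconstruction_loss(frame, [2.0, 4.0, 6.0])  # same shape, doubled level
```

In this sketch the KL term compares only the normalized spectral shape, so a reconstruction that doubles every magnitude has almost no KL error; the fidelity term is what penalizes the level change.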

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: training, using at least one processor, a supervised discriminator to detect bio-markers in an audio sample dataset; training, using the at least one processor, a denoising autoencoder to learn a latent space that is used to reconstruct an output audio sample with a same fidelity as an input audio sample of the audio sample dataset; training, using the at least one processor, a conditional auxiliary generative adversarial network (GAN) to generate the output audio sample with the same fidelity as the input audio sample, wherein the output audio sample is void of the bio-markers; and deploying the conditional auxiliary generative adversarial network (GAN), the supervised discriminator, and the denoising autoencoder in an audio processing system.

2. The method of claim 1, further comprising minimizing a classification generalization error during the training of the supervised discriminator.

3. The method of claim 1, wherein the training of the denoising autoencoder to learn the latent space that is used to reconstruct the output audio sample is performed by minimizing a KL-divergence based reconstruction error loss plus a fidelity term.

4. The method of claim 3, wherein the KL-divergence based reconstruction error loss plus the fidelity term is based on one or more of a frequency response, a distortion, noise, and time-based errors.

5. The method of claim 1, further comprising using a discriminator function as the supervised discriminator in the conditional auxiliary generative adversarial network (GAN), and the denoising autoencoder as a generator, the conditional auxiliary generative adversarial network (GAN) being trained such that the discriminator function attempts to maximize an entropy that clean samples pass through the discriminator and minimize an entropy that a denoised representation of bad samples containing the bio-markers pass through the supervised discriminator.

6. The method of claim 5, further comprising freezing the generator and backpropagating through the discriminator function using a gradient from the generative adversarial network loss.

7. The method of claim 6, further comprising freezing the discriminator function and propagating through the generator using the gradient from the generative adversarial network loss combined with a decaying constant times a reconstruction error loss of the generator.

8. The method of claim 1, further comprising iterating the training of the denoising autoencoder and the training of the conditional auxiliary generative adversarial network until convergence.

9. The method of claim 1, wherein the supervised discriminator comprises a convolutional neural network that inputs mel-frequency cepstral coefficients (MFCC) representations of the audio sample dataset and classifies a presence of the bio-marker, where a first classification represents the presence of the corresponding bio-marker and a second classification represents an absence of the corresponding bio-marker.

10. The method of claim 1, further comprising creating the supervised discriminator via model distillation from a black box teacher model.

11. The method of claim 1, wherein the training of the supervised discriminator is based on extracted features from a mel-representation of the audio sample dataset.

12. The method of claim 1, wherein the denoising autoencoder comprises a convolutional neural network that inputs MFCC representations of the audio sample dataset and produces a denoised version of the MFCC representations.

13. The method of claim 1, further comprising obfuscating one or more bio-markers of speech of a human subject using the conditional auxiliary generative adversarial network (GAN), the supervised discriminator, and the denoising autoencoder so that the audio processing system has access to an intelligible version of the speech but does not have access to the one or more bio-markers of the human subject.

14. An apparatus comprising: a memory; and at least one processor, coupled to said memory, and operative to perform operations comprising: training a supervised discriminator to detect bio-markers in an audio sample dataset; training a denoising autoencoder to learn a latent space that is used to reconstruct an output audio sample with a same fidelity as an input audio sample of the audio sample dataset; training a conditional auxiliary generative adversarial network (GAN) to generate the output audio sample with the same fidelity as the input audio sample, wherein the output audio sample is void of the bio-markers; and deploying the conditional auxiliary generative adversarial network (GAN), the supervised discriminator, and the denoising autoencoder in an audio processing system.

15. The apparatus of claim 14, the operations further comprising minimizing a classification generalization error during the training of the supervised discriminator.

16. The apparatus of claim 14, wherein the training of the denoising autoencoder to learn the latent space that is used to reconstruct the output audio sample is performed by minimizing a KL-divergence based reconstruction error loss plus a fidelity term.

17. The apparatus of claim 14, wherein the operations further comprise using a discriminator function as the supervised discriminator in the conditional auxiliary generative adversarial network (GAN), and the denoising autoencoder as a generator, the conditional auxiliary generative adversarial network (GAN) being trained such that the discriminator function attempts to maximize an entropy that clean samples pass through the discriminator and minimize an entropy that a denoised representation of bad samples containing the bio-markers pass through the supervised discriminator.

18. The apparatus of claim 17, the operations further comprising freezing the generator and backpropagating through the discriminator function using a gradient from the generative adversarial network loss.

19. The apparatus of claim 18, the operations further comprising freezing the discriminator function and propagating through the generator using the gradient from the generative adversarial network loss combined with a decaying constant times a reconstruction error loss of the generator.

20. A computer program product for federated learning, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform operations comprising: training a supervised discriminator to detect bio-markers in an audio sample dataset; training a denoising autoencoder to learn a latent space that is used to reconstruct an output audio sample with a same fidelity as an input audio sample of the audio sample dataset; training a conditional auxiliary generative adversarial network (GAN) to generate the output audio sample with the same fidelity as the input audio sample, wherein the output audio sample is void of the bio-markers; and deploying the conditional auxiliary generative adversarial network (GAN), the supervised discriminator, and the denoising autoencoder in an audio processing system.
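
The alternating schedule in claims 5 to 7 (update the discriminator with the generator frozen, then update the generator with the discriminator frozen, using the adversarial gradient plus a decaying constant times the reconstruction error) can be illustrated with a deliberately tiny toy. This is a sketch under strong assumptions, not the patented networks: the "generator" is a single gain `g` in [0, 1] applied to a one-dimensional marker channel, the "discriminator" is a logistic unit with parameters `a` and `b`, gradients are written out by hand, and the geometric decay schedule, learning rate, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def discriminator_step(a, b, g, lr):
    """One gradient step on the discriminator while the generator gain g is frozen.

    The discriminator scores a marker-channel value y as sigmoid(a * y + b),
    its estimate that a bio-marker is present. Clean samples have y = 0;
    generator outputs for marked samples have y = g.
    Loss = -log(1 - D(0)) - log(D(g)).
    """
    p_clean = sigmoid(b)           # pushed toward 0
    p_marked = sigmoid(a * g + b)  # pushed toward 1
    grad_a = (p_marked - 1.0) * g
    grad_b = p_clean + (p_marked - 1.0)
    return a - lr * grad_a, b - lr * grad_b

def generator_step(a, b, g, recon_weight, lr):
    """One gradient step on the generator while the discriminator (a, b) is frozen.

    Loss = -log(1 - D(g)) + recon_weight * (g - 1)^2: the adversarial term
    plus a weighted reconstruction error against the unmodified input (g = 1).
    """
    p_marked = sigmoid(a * g + b)
    grad_g = p_marked * a + 2.0 * recon_weight * (g - 1.0)
    return min(1.0, max(0.0, g - lr * grad_g))  # keep the gain in [0, 1]

def train(rounds=60, pretrain_steps=100, lr=0.1, recon_weight0=1.0, decay=0.9):
    a, b, g = 0.0, 0.0, 1.0
    # Stage 1: supervised pre-training of the discriminator on raw samples (g = 1).
    for _ in range(pretrain_steps):
        a, b = discriminator_step(a, b, g, lr)
    pretrained_a = a
    # Stage 2: alternate the two frozen-partner updates. The reconstruction
    # weight shrinks geometrically each round (an assumed decay schedule).
    for t in range(rounds):
        a, b = discriminator_step(a, b, g, lr)
        g = generator_step(a, b, g, recon_weight0 * decay ** t, lr)
    return pretrained_a, g

pretrained_a, final_gain = train()
```

Early rounds keep the output close to the input because the reconstruction weight is still large; as the weight decays, the adversarial term dominates and the gain on the marker channel is driven down. A real system would replace the scalar gain and logistic unit with the convolutional autoencoder and MFCC-based classifier described in claims 9 and 12.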

Assignees

IBM

Inventors

Classifications

  • G10L15/16 (Primary)

    using artificial neural networks · CPC title

  • Combinations of networks · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • using neural networks · CPC title

  • Detection of discrete points within a voice signal · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11929063B2 cover?
A supervised discriminator for detecting bio-markers in an audio sample dataset is trained and a denoising autoencoder is trained to learn a latent space that is used to reconstruct an output audio sample with a same fidelity as an input audio sample of the audio sample dataset. A conditional auxiliary generative adversarial network (GAN) trained to generate the output audio sample with the sam…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 12, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).