System and method for continuous privacy-preserved audio collection
US-2020258535-A1 · Aug 13, 2020 · US
US11929063B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11929063-B2 |
| Application number | US-202117534396-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 23, 2021 |
| Priority date | Nov 23, 2021 |
| Publication date | Mar 12, 2024 |
| Grant date | Mar 12, 2024 |
Abstract
A supervised discriminator for detecting bio-markers in an audio sample dataset is trained, and a denoising autoencoder is trained to learn a latent space that is used to reconstruct an output audio sample with the same fidelity as an input audio sample of the audio sample dataset. A conditional auxiliary generative adversarial network (GAN) is trained to generate the output audio sample with the same fidelity as the input audio sample, wherein the output audio sample is void of the bio-markers. The conditional auxiliary GAN, the corresponding supervised discriminator, and the corresponding denoising autoencoder are deployed in an audio processing system.
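Read together with claims 3, 5 and 7, the training described in the abstract can be summarized by the objectives below. This is a hedged reconstruction, not the patent's own notation: the symbols D (supervised discriminator), G (denoising autoencoder acting as generator), λ_t (the decaying constant of claim 7), L_KL and F (the KL-divergence and fidelity parts of claim 3's loss) are assumptions, and the cross-entropy discriminator objective and non-saturating generator term are the standard GAN choices rather than something the claims spell out.

```latex
% D: supervised discriminator, read as the probability that its input is clean
% G: denoising autoencoder acting as the generator
% x: clean sample;  \tilde{x}: sample containing bio-markers
\begin{aligned}
&\max_{D}\; \mathbb{E}_{x}\big[\log D(x)\big]
  + \mathbb{E}_{\tilde{x}}\big[\log\big(1 - D(G(\tilde{x}))\big)\big] \\
&\mathcal{L}_{\mathrm{rec}}(G) = \mathcal{L}_{\mathrm{KL}}(G)
  + \mathcal{F}\big(\tilde{x}, G(\tilde{x})\big) \\
&\mathcal{L}_{G} = -\,\mathbb{E}_{\tilde{x}}\big[\log D(G(\tilde{x}))\big]
  + \lambda_t\,\mathcal{L}_{\mathrm{rec}}(G), \qquad \lambda_{t+1} < \lambda_t
\end{aligned}
```

Under this reading, the first line is claim 5's discriminator objective (high scores for clean samples, low scores for denoised samples that still carry bio-markers), F would bundle the frequency-response, distortion, noise and time-based errors listed in claim 4, and the shrinking λ_t is claim 7's decaying constant, so the weight on reconstruction falls as training proceeds.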
Claims
What is claimed is:
1. A method comprising: training, using at least one processor, a supervised discriminator to detect bio-markers in an audio sample dataset; training, using the at least one processor, a denoising autoencoder to learn a latent space that is used to reconstruct an output audio sample with a same fidelity as an input audio sample of the audio sample dataset; training, using the at least one processor, a conditional auxiliary generative adversarial network (GAN) to generate the output audio sample with the same fidelity as the input audio sample, wherein the output audio sample is void of the bio-markers; and deploying the conditional auxiliary generative adversarial network (GAN), the supervised discriminator, and the denoising autoencoder in an audio processing system.
2. The method of claim 1, further comprising minimizing a classification generalization error during the training of the supervised discriminator.
3. The method of claim 1, wherein the training of the denoising autoencoder to learn the latent space that is used to reconstruct the output audio sample is performed by minimizing a KL-divergence based reconstruction error loss plus a fidelity term.
4. The method of claim 3, wherein the KL-divergence based reconstruction error loss plus the fidelity term is based on one or more of a frequency response, a distortion, noise, and time-based errors.
5. The method of claim 1, further comprising using a discriminator function as the supervised discriminator in the conditional auxiliary generative adversarial network (GAN), and the denoising autoencoder as a generator, the conditional auxiliary generative adversarial network (GAN) being trained such that the discriminator function attempts to maximize an entropy that clean samples pass through the discriminator and minimize an entropy that a denoised representation of bad samples containing the bio-markers pass through the supervised discriminator.
6. The method of claim 5, further comprising freezing the generator and backpropagating through the discriminator function using a gradient from the generative adversarial network loss.
7. The method of claim 6, further comprising freezing the discriminator function and propagating through the generator using the gradient from the generative adversarial network loss combined with a decaying constant times a reconstruction error loss of the generator.
8. The method of claim 1, further comprising iterating the training of the denoising autoencoder and the training of the conditional auxiliary generative adversarial network until convergence.
9. The method of claim 1, wherein the supervised discriminator comprises a convolutional neural network that inputs mel-frequency cepstral coefficients (MFCC) representations of the audio sample dataset and classifies a presence of the bio-marker, where a first classification represents the presence of the corresponding bio-marker and a second classification represents an absence of the corresponding bio-marker.
10. The method of claim 1, further comprising creating the supervised discriminator via model distillation from a black box teacher model.
11. The method of claim 1, wherein the training of the supervised discriminator is based on extracted features from a mel-representation of the audio sample dataset.
12. The method of claim 1, wherein the denoising autoencoder comprises a convolutional neural network that inputs MFCC representations of the audio sample dataset and produces a denoised version of the MFCC representations.
13. The method of claim 1, further comprising obfuscating one or more bio-markers of speech of a human subject using the conditional auxiliary generative adversarial network (GAN), the supervised discriminator, and the denoising autoencoder so that the audio processing system has access to an intelligible version of the speech but does not have access to the one or more bio-markers of the human subject.
14. An apparatus comprising: a memory; and at least one processor, coupled to said memory, and operative to perform operations comprising: training a supervised discriminator to detect bio-markers in an audio sample dataset; training a denoising autoencoder to learn a latent space that is used to reconstruct an output audio sample with a same fidelity as an input audio sample of the audio sample dataset; training a conditional auxiliary generative adversarial network (GAN) to generate the output audio sample with the same fidelity as the input audio sample, wherein the output audio sample is void of the bio-markers; and deploying the conditional auxiliary generative adversarial network (GAN), the supervised discriminator, and the denoising autoencoder in an audio processing system.
15. The apparatus of claim 14, the operations further comprising minimizing a classification generalization error during the training of the supervised discriminator.
16. The apparatus of claim 14, wherein the training of the denoising autoencoder to learn the latent space that is used to reconstruct the output audio sample is performed by minimizing a KL-divergence based reconstruction error loss plus a fidelity term.
17. The apparatus of claim 14, wherein the operations further comprise using a discriminator function as the supervised discriminator in the conditional auxiliary generative adversarial network (GAN), and the denoising autoencoder as a generator, the conditional auxiliary generative adversarial network (GAN) being trained such that the discriminator function attempts to maximize an entropy that clean samples pass through the discriminator and minimize an entropy that a denoised representation of bad samples containing the bio-markers pass through the supervised discriminator.
18. The apparatus of claim 17, the operations further comprising freezing the generator and backpropagating through the discriminator function using a gradient from the generative adversarial network loss.
19. The apparatus of claim 18, the operations further comprising freezing the discriminator function and propagating through the generator using the gradient from the generative adversarial network loss combined with a decaying constant times a reconstruction error loss of the generator.
20. A computer program product for federated learning, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform operations comprising: training a supervised discriminator to detect bio-markers in an audio sample dataset; training a denoising autoencoder to learn a latent space that is used to reconstruct an output audio sample with a same fidelity as an input audio sample of the audio sample dataset; training a conditional auxiliary generative adversarial network (GAN) to generate the output audio sample with the same fidelity as the input audio sample, wherein the output audio sample is void of the bio-markers; and deploying the conditional auxiliary generative adversarial network (GAN), the supervised discriminator, and the denoising autoencoder in an audio processing system.
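The alternating schedule of claims 5 to 8 (freeze the generator and update the discriminator on the GAN loss, then freeze the discriminator and update the generator on the GAN loss plus a decaying constant times the reconstruction loss, and repeat) can be sketched in plain Python on a toy problem. Everything below is an illustrative assumption, not the patented implementation: two-dimensional features [content, bio_marker] stand in for MFCC frames, a logistic regression stands in for the CNN discriminator, a per-feature affine map stands in for the convolutional denoising autoencoder, squared error stands in for the KL-divergence plus fidelity loss, the decay is geometric (lam0 * decay**t), a fixed round count replaces the convergence test, and the conditional and auxiliary parts of the GAN are omitted. All class and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class Discriminator:
    """Hypothetical logistic scorer: prob(x) estimates P(x is clean, i.e. marker-free)."""
    w: list = field(default_factory=lambda: [0.0, 0.0])
    c: float = 0.0

    def prob(self, x: list) -> float:
        return sigmoid(self.w[0] * x[0] + self.w[1] * x[1] + self.c)


@dataclass
class Generator:
    """Hypothetical per-feature affine map standing in for the denoising autoencoder."""
    a: list = field(default_factory=lambda: [1.0, 1.0])  # starts as the identity map
    b: list = field(default_factory=lambda: [0.0, 0.0])

    def __call__(self, x: list) -> list:
        return [self.a[i] * x[i] + self.b[i] for i in range(2)]


def discriminator_step(D, G, clean, bad, lr):
    """Generator frozen: one gradient step on -log D(clean) - log(1 - D(G(bad)))."""
    fake = [G(x) for x in bad]  # G is only evaluated, never updated here
    gw, gc = [0.0, 0.0], 0.0
    for x in clean:  # derivative of -log D with respect to the logit is D - 1
        g = D.prob(x) - 1.0
        gw = [gw[i] + g * x[i] / len(clean) for i in range(2)]
        gc += g / len(clean)
    for x in fake:  # derivative of -log(1 - D) with respect to the logit is D
        g = D.prob(x)
        gw = [gw[i] + g * x[i] / len(fake) for i in range(2)]
        gc += g / len(fake)
    D.w = [D.w[i] - lr * gw[i] for i in range(2)]
    D.c -= lr * gc


def generator_step(D, G, bad, lr, lam):
    """Discriminator frozen: one step on -log D(G(x)) + lam * ||G(x) - x||^2."""
    ga, gb = [0.0, 0.0], [0.0, 0.0]
    for x in bad:
        y = G(x)
        d = D.prob(y)
        for i in range(2):
            # chain rule: adversarial term through the logit, plus the reconstruction term
            dy = (d - 1.0) * D.w[i] + lam * 2.0 * (y[i] - x[i])
            ga[i] += dy * x[i] / len(bad)
            gb[i] += dy / len(bad)
    G.a = [G.a[i] - lr * ga[i] for i in range(2)]
    G.b = [G.b[i] - lr * gb[i] for i in range(2)]


def train(clean, bad, rounds=200, lr=0.1, lam0=1.0, decay=0.98):
    D, G = Discriminator(), Generator()
    for t in range(rounds):  # fixed budget stands in for "until convergence"
        discriminator_step(D, G, clean, bad, lr)
        generator_step(D, G, bad, lr, lam0 * decay ** t)  # decaying constant
    return D, G


# Toy features [content, bio_marker]; only the "bad" samples carry the marker.
CLEAN = [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
BAD = [[-1.0, 1.0], [0.0, 1.0], [1.0, 1.0]]

if __name__ == "__main__":
    D, G = train(CLEAN, BAD)
    print([G(x) for x in BAD])
```

Because the clean and bio-marker samples in this toy share the same content distribution, the discriminator gets no signal on the content feature and the generator leaves it unchanged, while the marker feature is pushed toward the clean value. How a real conditional auxiliary GAN behaves depends on its architecture and data; the toy only illustrates the freeze-and-alternate update order.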