US12039995B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12039995-B2 |
| Application number | US-202217667370-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 8, 2022 |
| Priority date | Jan 2, 2020 |
| Publication date | Jul 16, 2024 |
| Grant date | Jul 16, 2024 |
Abstract
This application discloses an audio signal processing method performed by an electronic device. Embedding processing is performed on a mixed audio signal by mapping it to an embedding space, yielding an embedding feature of the mixed audio signal in that space. Generalized feature extraction is then performed on the embedding feature to extract a generalized feature of a target component in the mixed audio signal. The generalized feature of the target component has good generalization and expressive capability and can be used in different scenarios. Audio signal processing is performed on the mixed audio signal based on this generalized feature to obtain information about the audio signal of the target object, which improves the robustness, generalization, and accuracy of audio signal processing.
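The two-stage pipeline in the abstract (embed the mixture, then extract a compact generalized feature) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the sizes F, D, and K, the single tanh layer used as the encoder, and the sigmoid-gated recursion used as the abstractor are hypothetical stand-ins, not the patent's encoder and abstractor networks. It only illustrates the shape relationships recited in claim 1 (a low-dimensional input mapped up to a higher-dimensional embedding, then reduced to a generalized feature of lower dimension than the embedding) and one possible form of the "recursive weighting" of claims 2 and 3.

```python
import numpy as np

rng = np.random.default_rng(0)

F, D, K = 8, 32, 4  # hypothetical sizes: input bins per frame, embedding dim, feature dim

# Hypothetical encoder weights: one linear layer followed by tanh.
W_enc = rng.standard_normal((F, D)) / np.sqrt(F)
# Hypothetical abstractor weights: a projection to K dims and a scalar gate.
W_proj = rng.standard_normal((D, K)) / np.sqrt(D)
w_gate = rng.standard_normal(D) / np.sqrt(D)


def encode(mixture: np.ndarray) -> np.ndarray:
    """Map each frame of a (T, F) mixture to a higher-dimensional (T, D) embedding."""
    return np.tanh(mixture @ W_enc)


def abstract(embedding: np.ndarray) -> np.ndarray:
    """Recursively weight (T, D) embeddings into (T, K) generalized features."""
    c = np.zeros(K)
    out = []
    for h in embedding:  # process frames in time order (autoregressive)
        a = 1.0 / (1.0 + np.exp(-(h @ w_gate)))  # gate value in (0, 1)
        c = a * c + (1.0 - a) * (h @ W_proj)  # weighted mix of past state and new frame
        out.append(c)
    return np.stack(out)


mixture = rng.standard_normal((100, F))  # stand-in for 100 frames of a mixed signal
emb = encode(mixture)
feat = abstract(emb)
print(emb.shape, feat.shape)  # (100, 32) (100, 4)
```

The gate lets each frame decide how much of the running state to keep, which is one simple way to realize recursive weighting; an LSTM or GRU, as suggested by the CPC tags below, would be a heavier alternative.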
Claims (preview)
What is claimed is:
1. An audio signal processing method performed by an electronic device, the method comprising: performing embedding processing on a mixed audio signal by mapping the mixed audio signal from a low-dimensional space to a high-dimensional embedding space using an encoder network, to obtain an embedding feature of the mixed audio signal in the embedding space; performing generalized feature extraction on the embedding feature using an abstractor network, to obtain a generalized feature of a target component in the mixed audio signal, the target component corresponding to an audio signal of a target object in the mixed audio signal, wherein a dimension of the generalized feature is lower than a dimension of the embedding feature of the mixed audio signal; and performing audio signal processing on the mixed audio signal based on the generalized feature of the target component to obtain information of the audio signal of the target object in the mixed audio signal used for separating the audio signal of the target object from the mixed audio signal, wherein the encoder network and the abstractor network are obtained by collaboratively training a teacher model and a student model through unsupervised machine learning using unlabeled sample mixed signals in multiple iterations, wherein the student model comprises a first encoder network and a first abstractor network, the teacher model comprises a second encoder network and a second abstractor network, an output of the first encoder network is used as an input of the first abstractor network, and an output of the second encoder network is used as an input of the second abstractor network, and the teacher model in each iteration process is obtained by weighting the teacher model in a previous iteration process and the student model in the current iteration process.
2. The method according to claim 1, wherein the performing generalized feature extraction on the embedding feature, to obtain a generalized feature of a target component in the mixed audio signal comprises: performing recursive weighting processing on the embedding feature, to obtain the generalized feature of the target component.
3. The method according to claim 1, wherein the abstractor network is an autoregressive model, and the inputting the embedding feature into an abstractor network, and performing generalized feature extraction on the embedding feature by using the abstractor network, to obtain the generalized feature of the target component in the mixed audio signal comprises: inputting the embedding feature into the autoregressive model, and performing recursive weighting processing on the embedding feature by using the autoregressive model, to obtain the generalized feature of the target component.
4. The method according to claim 1, wherein the collaboratively training a teacher model and a student model based on an unlabeled sample mixed signal, to obtain the encoder network and the abstractor network comprises: obtaining, in any iteration process, the teacher model in the current iteration process based on the student model in the current iteration process and the teacher model in a previous iteration process; respectively inputting the unlabeled sample mixed signal into the teacher model and the student model in the current iteration process, and respectively outputting a teacher generalized feature and a student generalized feature of a target component in the sample mixed signal; obtaining a loss function value of the current iteration process based on at least one of the sample mixed signal, the teacher generalized feature, or the student generalized feature; adjusting, when the loss function value does not meet a training end condition, a parameter of the student model to obtain the student model in a next iteration process, and performing the next iteration process based on the student model in the next iteration process; and obtaining the encoder network and the abstractor network based on the student model or the teacher model in the current iteration process when the loss function value meets the training end condition.
5. The method according to claim 4, wherein the obtaining a loss function value of the current iteration process based on at least one of the sample mixed signal, the teacher generalized feature, or the student generalized feature comprises: obtaining a mean squared error (MSE) between the teacher generalized feature and the student generalized feature; obtaining a mutual information (MI) value between the sample mixed signal and the student generalized feature; and determining at least one of the MSE or the MI value as the loss function value of the current iteration process.
6. The method according to claim 5, wherein the training end condition is that the MSE does not decrease in a first target quantity of consecutive iteration processes; or the training end condition is that the MSE is less than or equal to a first target threshold and the MI value is greater than or equal to a second target threshold; or the training end condition is that a quantity of iterations reaches a second target quantity.
7. The method according to claim 4, wherein the obtaining the teacher model in the current iteration process based on the student model in the current iteration process and the teacher model in a previous iteration process comprises: multiplying a parameter set of the teacher model in the previous iteration process by a first smoothing coefficient, to obtain a first parameter set; multiplying a parameter set of the student model in the current iteration process by a second smoothing coefficient, to obtain a second parameter set, the sum of the first smoothing coefficient and the second smoothing coefficient being 1; determining a sum of the first parameter set and the second parameter set as a parameter set of the teacher model in the current iteration process; and performing parameter update on the teacher model in the previous iteration process based on the parameter set of the teacher model in the current iteration process, to obtain the teacher model in the current iteration process.
8. The method according to claim 4, wherein the obtaining the encoder network and the abstractor network based on the student model or the teacher model in the current iteration process comprises: respectively determining the first encoder network and the first abstractor network in the student model in the current iteration process as the encoder network and the abstractor network; or respectively determining the second encoder network and the second abstractor network in the teacher model in the current iteration process as the encoder network and the abstractor network.
9. The method according to claim 1, wherein the performing audio signal processing on the mixed audio signal based on the generalized feature of the target component to obtain information of the audio signal of the target object in the mixed audio signal comprises: performing speech-to-text conversion on the audio signal of the target object based on the generalized feature of the target component, and outputting text information corresponding to the audio signal of the target object; or performing voiceprint recognition on the audio signal of the target object based on the generalized feature of the target component, and outputting a voiceprint recognition result corresponding to the audio signal of the target object; or generating a response speech corresponding to the audio signal of the target object based on the generalized feature of the target component, and outputting the response speech.
10. An electronic device, comprising one or more processors and one or more memories…
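Claims 4 through 8 describe a teacher-student training loop in which the teacher is a weighted average of its previous self and the current student. The sketch below is a toy NumPy illustration under stated assumptions: a single linear map stands in for the encoder plus abstractor, two Gaussian-noised copies of each unlabeled batch feed the two models, and the values of ALPHA, LR, PATIENCE, and MAX_ITERS are hypothetical. It shows the claim 7 update (teacher parameters times a first coefficient plus student parameters times a second coefficient, the two summing to 1), the MSE term of claim 5, two of the end conditions of claim 6, and the claim 8 choice of final model. The mutual-information term of claim 5 is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 16, 4          # hypothetical input and generalized-feature sizes
ALPHA = 0.99          # first smoothing coefficient (teacher); second is 1 - ALPHA
LR = 0.1              # hypothetical learning rate
PATIENCE = 10         # hypothetical "first target quantity" of non-improving iterations
MAX_ITERS = 500       # hypothetical "second target quantity" of iterations


def features(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Toy linear model standing in for encoder + abstractor: (N, D) -> (N, K)."""
    return x @ w


def ema_update(teacher: np.ndarray, student: np.ndarray, alpha: float = ALPHA) -> np.ndarray:
    """Weighted teacher update: alpha * teacher + (1 - alpha) * student."""
    return alpha * teacher + (1.0 - alpha) * student


student = 0.1 * rng.standard_normal((D, K))
teacher = student.copy()                  # teacher starts as a copy of the student
data = rng.standard_normal((256, D))      # stand-in for unlabeled sample mixtures

best, stale, history = np.inf, 0, []
for _ in range(MAX_ITERS):
    # Teacher for this iteration: previous teacher weighted with the current student.
    teacher = ema_update(teacher, student)
    # Two noisy views of the same unlabeled batch (an assumed augmentation).
    xs = data + 0.1 * rng.standard_normal(data.shape)
    xt = data + 0.1 * rng.standard_normal(data.shape)
    diff = features(xs, student) - features(xt, teacher)
    mse = float(np.mean(diff ** 2))       # MSE between student and teacher features
    history.append(mse)
    # End condition: MSE has not decreased for PATIENCE consecutive iterations.
    if mse < best:
        best, stale = mse, 0
    else:
        stale += 1
        if stale >= PATIENCE:
            break
    # Gradient of the MSE with respect to the student only (teacher held fixed).
    grad = 2.0 * xs.T @ diff / diff.size
    student = student - LR * grad

final_model = teacher  # claim 8 allows keeping either the teacher or the student
print(f"stopped after {len(history)} iterations, final MSE {history[-1]:.5f}")
```

Only the student receives gradient updates; the teacher changes solely through the weighted average. On its own, an MSE consistency objective like this can be lowered by shrinking every feature toward zero, so a second term such as the MI value of claim 5 is one way to discourage that trivial solution.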
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
the noise being separate speech, e.g. cocktail party · CPC title
Processing in the frequency domain · CPC title