US11854572B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11854572-B2 |
| Application number | US-202117302981-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 18, 2021 |
| Priority date | May 18, 2021 |
| Publication date | Dec 26, 2023 |
| Grant date | Dec 26, 2023 |
Official abstract text for this publication.
Computer-implemented methods, computer program products, and computer systems for mitigating frequency loss may include one or more processors configured for receiving first audio data corresponding to unobstructed user utterances, receiving second audio data corresponding to first obstructed user utterances, generating a frequency loss (FL) model representing frequency loss between the first audio data and the second audio data, receiving third audio data corresponding to one or more second obstructed user utterances, processing the third audio data using the FL model to generate fourth audio data corresponding to a frequency loss mitigated version of the second obstructed user utterances, and transmitting the fourth audio data to a recipient computing device. The first obstructed user utterances are obstructed by a facemask and the one or more second obstructed user utterances is obstructed by the facemask. The FL model may be executed as an audio plugin in a web conferencing program.
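The flow in the abstract (learn a frequency loss model from an unobstructed and an obstructed recording, then use it to restore later obstructed audio) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the patent text on this page does not say how the FL model is represented, so the sketch treats it as one compensating gain per DFT bin, and the helper names (`estimate_fl_model`, `mitigate`, `two_tone`) and the synthetic two-tone "utterances" are hypothetical. A naive DFT is used so the example needs only the standard library.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform (adequate for short demo frames)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def estimate_fl_model(clear, obstructed, max_gain=10.0, eps=1e-6):
    """ASSUMED model form: per-bin gain |clear| / |obstructed|, capped at max_gain.

    Bins where either recording has negligible energy get a neutral gain of 1.0,
    because they carry no usable evidence about the loss.
    """
    if len(clear) != len(obstructed):
        raise ValueError("recordings must have the same number of samples")
    gains = []
    for c, o in zip(dft(clear), dft(obstructed)):
        if abs(c) < eps or abs(o) < eps:
            gains.append(1.0)
        else:
            gains.append(min(abs(c) / abs(o), max_gain))
    return gains


def mitigate(audio, gains):
    """Apply the per-bin gains to a frame and return the restored time-domain frame."""
    if len(audio) != len(gains):
        raise ValueError("frame length must match the model length")
    return idft([s * g for s, g in zip(dft(audio), gains)])


def two_tone(n, low_amp, high_amp):
    """Hypothetical stand-in for an utterance: a low tone (bin 2) plus a high tone (bin 10)."""
    return [
        low_amp * math.sin(2 * math.pi * 2 * t / n)
        + high_amp * math.sin(2 * math.pi * 10 * t / n)
        for t in range(n)
    ]


N = 64
first_audio = two_tone(N, 1.0, 1.0)    # unobstructed calibration recording
second_audio = two_tone(N, 1.0, 0.25)  # same content, high band muffled to 25%
fl_model = estimate_fl_model(first_audio, second_audio)

intended = two_tone(N, 0.5, 0.8)       # what the speaker actually produced
third_audio = two_tone(N, 0.5, 0.2)    # what the microphone captured (0.8 * 0.25)
fourth_audio = mitigate(third_audio, fl_model)
```

In this toy setup the obstruction only scales the high tone, so the estimated gains restore `fourth_audio` to `intended`. Real speech would more plausibly be processed in short, overlapping, windowed frames, which this sketch leaves out.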
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: receiving, by one or more processors, first audio data corresponding to one or more unobstructed utterances of a user, wherein the received first audio data is associated with a received age of the user; receiving, by one or more processors, second audio data corresponding to one or more first obstructed utterances of the user; generating and storing, by one or more processors, a frequency loss model (FLM) for the user representing frequency loss between the first audio data and the second audio data at the received age of the user; adjusting, by one or more processors, the stored FLM for the user based on changes to the user's age and stored measurements for a given age; receiving, by one or more processors, third audio data corresponding to one or more second obstructed user utterances; processing, by one or more processors, the third audio data using the tuned FLM to generate fourth audio data corresponding to a frequency loss mitigated version of the one or more second obstructed user utterances; and transmitting, by one or more processors, the fourth audio data to a recipient computing device.
2. The computer-implemented method of claim 1, wherein the first audio data and the second audio data are captured via a microphone of a computing device.
3. The computer-implemented method of claim 1, wherein generating the FLM further comprises: converting, by one or more processors, the first audio data and the second audio data to frequency domains; determining, by one or more processors, frequency deltas for one or more of a range of frequencies in the frequency domains; determining, by one or more processors, attenuation values for one or more of the range of frequencies for the first audio data and the second audio data; and mapping, by one or more processors, the frequency deltas and the attenuation values in the frequency domains for the first audio data and the second audio data in a matrix representing the FLM.
4. The computer-implemented method of claim 3, further comprising: generating, by one or more processors, a graphical display of the FLM on a user interface of a computing device, wherein the graphical display comprises visualizations of time domain images and frequency domain images of: the first audio data, the second audio data, the attenuation values, and the frequency deltas.
5. The computer-implemented method of claim 1, wherein the recipient computing device is a wearable device configured to process and reproduce the fourth audio data as an audio signal via a speaker of the wearable device.
6. The computer-implemented method of claim 1, wherein the FLM is executed as an audio plugin in a web conferencing program.
7. The computer-implemented method of claim 1, wherein generating the fourth audio data utilizes an adjustment selected from the group consisting of: bell-shaped equalizer response and Brickwall.
8. A computer program product comprising: one or more computer readable storage media and program instructions collectively stored on the one or more computer readable storage media, the stored program instructions comprising: program instructions to receive first audio data corresponding to one or more unobstructed utterances of a user, wherein the received first audio data is associated with a received age of the user; program instructions to receive second audio data corresponding to one or more first obstructed utterances of the user; program instructions to generate and store a frequency loss model (FLM) for the user representing frequency loss between the first audio data and the second audio data at the received age of the user; program instructions to adjust the stored FLM for the user based on changes to the user's age and stored measurements for a given age; program instructions to receive third audio data corresponding to one or more second obstructed user utterances; program instructions to process the third audio data using the tuned FLM to generate fourth audio data corresponding to a frequency loss mitigated version of the one or more second obstructed user utterances; and program instructions to transmit the fourth audio data to a recipient computing device.
9. The computer program product of claim 8, wherein the first audio data and the second audio data are captured via a microphone of a computing device.
10. The computer program product of claim 8, wherein generating the FLM further comprises: program instructions to convert the first audio data and the second audio data to frequency domains; program instructions to determine frequency deltas for each of a range of frequencies in the frequency domains; program instructions to determine attenuation values for each of the range of frequencies for the first audio data and the second audio data; and program instructions to map the frequency deltas and the attenuation values in the frequency domains for the first audio data and the second audio data in a matrix representing the FLM.
11. The computer program product of claim 10, further comprising: program instructions to generate a graphical display of the FLM on a user interface of a computing device, wherein the graphical display comprises visualizations of time domain images and frequency domain images of: the first audio data, the second audio data, the attenuation values, and the frequency deltas.
12. The computer program product of claim 8, wherein the recipient computing device is a wearable device configured to process and reproduce the fourth audio data as an audio signal via a speaker of the wearable device.
13. The computer program product of claim 8, wherein the one or more first obstructed user utterances is obstructed by a facemask and the one or more second obstructed user utterances is obstructed by the facemask.
14. The computer program product of claim 8, wherein the FLM is executed as an audio plugin in a web conferencing program.
15. A computer system comprising: one or more computer processors; one or more computer readable storage media; program instructions collectively stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the stored program instructions comprising: program instructions to receive first audio data corresponding to one or more unobstructed utterances of a user, wherein the received first audio data is associated with a received age of the user; program instructions to receive second audio data corresponding to one or more first obstructed utterances of the user; program instructions to generate and store a frequency loss model (FLM) for the user representing frequency loss between the first audio data and the second audio data at the received age of the user; program instructions to adjust the stored FLM for the user based on changes to the user's age and stored measurements for a given age; program instructions to receive third audio data corresponding to one or more second obstructed user utterances; program instructions to process the third audio data using the tuned FLM to generate fourth audio data corresponding to a frequency loss mitigated version of the one or more second obstructed user utterances; and program instructions to transmit the fourth audio data to a recipient computing device.
16. The computer system of claim 15, wherein the first audio data and the second audio data are captured via a microphone of a computing device, and the FL model is executed as an audio plugin in a web conferencing program.
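Claims 3 and 10 describe building the FLM as a matrix of frequency deltas and attenuation values over a range of frequencies. The sketch below is one possible reading, not the patentee's method: the claims do not define either quantity, so it assumes the "frequency delta" is the difference in spectral magnitude between the unobstructed and obstructed recordings at each frequency, and the "attenuation value" is the same loss expressed in decibels. The function names, the row layout `(frequency_hz, delta, attenuation_db)`, and the synthetic test tones are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def build_flm_matrix(clear, obstructed, sample_rate, eps=1e-6):
    """Return rows of (frequency_hz, delta, attenuation_db).

    ASSUMPTIONS (not from the claims):
      delta          = |clear| - |obstructed| at that frequency
      attenuation_db = 20 * log10(|obstructed| / |clear|), negative when energy is lost
    Bins with no energy in the unobstructed reference are skipped.
    """
    if len(clear) != len(obstructed):
        raise ValueError("recordings must have the same number of samples")
    n = len(clear)
    rows = []
    pairs = zip(magnitude_spectrum(clear), magnitude_spectrum(obstructed))
    for k, (c, o) in enumerate(pairs):
        if c < eps:
            continue
        frequency_hz = k * sample_rate / n
        delta = c - o
        attenuation_db = 20 * math.log10(max(o, eps) / c)
        rows.append((frequency_hz, delta, attenuation_db))
    return rows


def tones(n, low_amp, high_amp):
    """Hypothetical test signal with energy in DFT bins 2 and 10 only."""
    return [
        low_amp * math.sin(2 * math.pi * 2 * t / n)
        + high_amp * math.sin(2 * math.pi * 10 * t / n)
        for t in range(n)
    ]


SAMPLE_RATE = 6400  # Hz; with 64 samples each bin is 100 Hz wide
unobstructed = tones(64, 1.0, 1.0)
masked = tones(64, 1.0, 0.5)  # high tone halved by the obstruction
flm = build_flm_matrix(unobstructed, masked, SAMPLE_RATE)

for frequency_hz, delta, attenuation_db in flm:
    print(f"{frequency_hz:7.1f} Hz  delta={delta:6.2f}  attenuation={attenuation_db:6.2f} dB")
```

Storing both columns is redundant under these assumptions, since each can be derived from the two magnitudes; the sketch keeps them separate only to mirror the claim wording.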
the extracted parameters being spectral information of each sub-band · CPC title
using distance or distortion measures between unknown speech and reference templates · CPC title
using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title
Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing · CPC title
using subband decomposition · CPC title