Enhanced audio file generator
US-2024105203-A1 · Mar 28, 2024 · US
US12505830B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12505830-B2 |
| Application number | US-202218046137-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 12, 2022 |
| Priority date | Oct 12, 2022 |
| Publication date | Dec 23, 2025 |
| Grant date | Dec 23, 2025 |
Official abstract text for this publication.
A voice morphing model can transform diverse voices to one or a small number of target voices. Speech recognition on diverse voices can be performed by morphing it to a target voice and then performing recognition on audio with the target voice. A source of requests for speech recognition can pass audio and a voiceprint with requests. Speech recognition can run with improved accuracy by biasing an acoustic model for the voice in the audio using the voiceprint. The audio can be used to calculate a new voiceprint, which can be used to update the voiceprint included with the audio. The updated voiceprint can be sent back to the source and then used with future speech recognition requests.
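The request flow in the abstract (audio and a voiceprint arrive together, a new voiceprint is calculated from the audio, and an updated voiceprint goes back to the source) can be sketched roughly as follows. This is only an illustration under stated assumptions: `RecognitionRequest`, `compute_voiceprint`, `update_voiceprint`, `handle_request`, the chunk-energy stand-in for a voiceprint, and the weighted-average update rule are all hypothetical and are not taken from the patent, which does not say how the update is computed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecognitionRequest:
    # Hypothetical request shape: raw samples plus the caller's stored voiceprint.
    audio: List[float]
    voiceprint: Optional[List[float]] = None


def compute_voiceprint(audio: List[float], dim: int = 4) -> List[float]:
    """Stand-in voiceprint: mean absolute amplitude over `dim` equal chunks.

    A real system would use a speaker-embedding model; this toy version only
    exists so the update flow can be run end to end.
    """
    chunk = max(1, len(audio) // dim)
    voiceprint = []
    for i in range(dim):
        segment = audio[i * chunk:(i + 1) * chunk]
        voiceprint.append(sum(abs(s) for s in segment) / len(segment) if segment else 0.0)
    return voiceprint


def update_voiceprint(old: Optional[List[float]], new: List[float],
                      weight: float = 0.2) -> List[float]:
    """Blend the stored voiceprint with the new one (assumed update rule)."""
    if old is None:
        return list(new)
    return [(1.0 - weight) * o + weight * n for o, n in zip(old, new)]


def handle_request(request: RecognitionRequest) -> List[float]:
    """Return the updated voiceprint that would be sent back to the source."""
    new_voiceprint = compute_voiceprint(request.audio)
    # Recognition biased by request.voiceprint would run here; it is omitted.
    return update_voiceprint(request.voiceprint, new_voiceprint)
```

In this sketch the source would store the returned list and attach it to its next request, so the voiceprint is refined over repeated use without the recognizer keeping per-user state.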
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method of training an acoustic model, the method comprising: obtaining a voiceprint calculator that calculates a score for the distance between a voice in speech audio and a target voice; training a voice morphing model to morph speech audio to the target voice, the training using speech audio of multiple distinct voices with a loss function dependent on the score; training an acoustic model on transcribed speech in the target voice; and tuning the voice morphing model and acoustic model by backpropagation of error reduction based on a measurement of the error rate of phoneme inference, wherein the acoustic model can infer phonemes from audio morphed by the voice morphing model.
2. The method of claim 1 wherein the transcribed speech in the target voice is from a single speaker without morphing.
3. The method of claim 1 wherein the transcribed speech in the target voice is generated by morphing speech audio of multiple distinct voices.
4. The method of claim 3 further comprising: finetuning the voice morphing model with a loss function dependent on an error rate of the acoustic model when run on the morphed audio of transcribed speech.
5. The method of claim 1 further comprising tuning the voice morphing model while keeping the acoustic model fixed.
6. The method of claim 1 further comprising tuning the acoustic model while keeping the voice morphing model fixed.
7. The method of claim 1 further comprising measuring the amount of noise in the morphed speech audio, wherein the loss function further depends on the amount of noise.
8. A computer implemented method of phoneme inference, the method comprising: calculating a plurality of scores for the distances between a voice in speech audio from multiple distinct voices and a target voice; training a voice morphing model to morph speech audio to the target voice, the training using speech audio of the multiple distinct voices with a loss function dependent on the scores; morphing audio of sampled speech to a target voice using the voice morphing model to generate morphed audio; and inferring a sequence of phonemes from the morphed audio using an acoustic model, wherein the acoustic model has an accuracy bias in favor of the target voice.
9. The method of claim 8 wherein the acoustic model is conditioned by a choice of the target voice from among a plurality of target voices.
10. A computer-implemented method of training an acoustic model, the method comprising: obtaining a voiceprint calculator that calculates a plurality of scores for the distances between each voice of multiple distinct voices in speech audio and a target voice; training a voice morphing model to morph speech audio to the target voice, the training using speech audio of the multiple distinct voices with a loss function dependent on the scores; and training an acoustic model on transcribed speech in the target voice, wherein the acoustic model can infer phonemes from audio morphed by the voice morphing model.
11. The method of claim 1 wherein the transcribed speech in the target voice is from a single speaker without morphing.
12. The method of claim 1 wherein the transcribed speech in the target voice is generated by morphing speech audio of multiple distinct voices.
13. The method of claim 12 further comprising: finetuning the voice morphing model with a loss function dependent on an error rate of the acoustic model when run on the morphed audio of transcribed speech.
14. The method of claim 1 further comprising tuning the voice morphing model while keeping the acoustic model fixed.
15. The method of claim 1 further comprising tuning the acoustic model while keeping the voice morphing model fixed.
16. The method of claim 1 further comprising tuning the voice morphing model and acoustic model by backpropagation of error reduction based on a measurement of the error rate of phoneme inference.
17. The method of claim 1 further comprising measuring the amount of noise in the morphed speech audio, wherein the loss function further depends on the amount of noise.
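The claims describe a training loss for the voice morphing model that depends on a voiceprint distance score, optionally extended with a noise measurement. A minimal sketch of one such loss appears below. It rests on assumptions the claims do not state: voiceprints are treated as plain vectors, the score is taken to be cosine distance, and the noise term is a weighted mean. The names `cosine_distance`, `morphing_loss`, and `noise_weight` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed distance score: 0.0 for identical direction, 1.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty or silent voiceprint as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def morphing_loss(morphed_voiceprints: List[Sequence[float]],
                  target_voiceprint: Sequence[float],
                  noise_levels: Optional[List[float]] = None,
                  noise_weight: float = 0.1) -> float:
    """Mean distance of morphed voices to the target, plus an optional noise penalty.

    `morphed_voiceprints` holds one voiceprint per morphed utterance, drawn from
    multiple distinct source voices. `noise_levels`, if given, holds one noise
    measurement per utterance.
    """
    scores = [cosine_distance(vp, target_voiceprint) for vp in morphed_voiceprints]
    loss = sum(scores) / len(scores)
    if noise_levels is not None:
        loss += noise_weight * sum(noise_levels) / len(noise_levels)
    return loss
```

For example, `morphing_loss([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])` averages a distance of 0.0 (already matching the target) and 1.0 (orthogonal to it). In an actual training setup the same quantity would be computed with a differentiable framework so that it can drive updates to the morphing model.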
Training · CPC title
Voice conversion or morphing · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title
using natural language modelling · CPC title
characterised by the process used · CPC title