Who is the assignee on this patent?

Soundhound Inc, Soundhound Ai Ip Llc

What technology area does this patent fall under?

Primary CPC classification G10L15/18. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 23, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Automatic speech recognition with voice personalization and generalization

US12505830B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12505830-B2
Application number	US-202218046137-A
Country	US
Kind code	B2
Filing date	Oct 12, 2022
Priority date	Oct 12, 2022
Publication date	Dec 23, 2025
Grant date	Dec 23, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A voice morphing model can transform diverse voices to one or a small number of target voices. Speech recognition on diverse voices can be performed by morphing it to a target voice and then performing recognition on audio with the target voice. A source of requests for speech recognition can pass audio and a voiceprint with requests. Speech recognition can run with improved accuracy by biasing an acoustic model for the voice in the audio using the voiceprint. The audio can be used to calculate a new voiceprint, which can be used to update the voiceprint included with the audio. The updated voiceprint can be sent back to the source and then used with future speech recognition requests.
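The request flow in the abstract (audio and a voiceprint go in, a transcript and an updated voiceprint come back) can be sketched roughly as follows. This is an illustration only, not the patented implementation: the function names, the toy voiceprint calculation, and the moving-average update rule are all assumptions made for the example, since the abstract does not say how the update is computed.

```python
from typing import List, Optional, Tuple

Voiceprint = List[float]


def compute_voiceprint(audio: List[float], dim: int = 4) -> Voiceprint:
    """Toy stand-in for a voiceprint calculator (hypothetical).

    Returns the mean absolute amplitude of `dim` equal-length chunks.
    A real system would compute a learned speaker embedding instead.
    """
    chunk = max(1, len(audio) // dim)
    out = []
    for i in range(dim):
        part = audio[i * chunk:(i + 1) * chunk]
        out.append(sum(abs(x) for x in part) / len(part) if part else 0.0)
    return out


def update_voiceprint(old: Optional[Voiceprint], new: Voiceprint,
                      weight: float = 0.2) -> Voiceprint:
    """Blend the newly calculated voiceprint into the stored one.

    The exponential moving average is an assumption for illustration.
    """
    if old is None:
        return list(new)
    return [(1.0 - weight) * o + weight * n for o, n in zip(old, new)]


def handle_request(audio: List[float],
                   voiceprint: Optional[Voiceprint]) -> Tuple[str, Voiceprint]:
    """Serve one recognition request and return (transcript, updated voiceprint)."""
    # A real server would bias its acoustic model with `voiceprint` here.
    transcript = "<transcript placeholder>"
    fresh = compute_voiceprint(audio)
    return transcript, update_voiceprint(voiceprint, fresh)


# The source of requests keeps the returned voiceprint and sends it next time.
_, stored = handle_request([1.0] * 8, None)
_, stored = handle_request([0.0] * 8, stored)
```

Keeping the voiceprint with the source of requests, as the abstract describes, means the recognizer in this sketch holds no per-speaker state between calls.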

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method of training an acoustic model, the method comprising: obtaining a voiceprint calculator that calculates a score for the distance between a voice in speech audio and a target voice; training a voice morphing model to morph speech audio to the target voice, the training using speech audio of multiple distinct voices with a loss function dependent on the score; training an acoustic model on transcribed speech in the target voice; and tuning the voice morphing model and acoustic model by backpropagation of error reduction based on a measurement of the error rate of phoneme inference, wherein the acoustic model can infer phonemes from audio morphed by the voice morphing model.

2. The method of claim 1 wherein the transcribed speech in the target voice is from a single speaker without morphing.

3. The method of claim 1 wherein the transcribed speech in the target voice is generated by morphing speech audio of multiple distinct voices.

4. The method of claim 3 further comprising: finetuning the voice morphing model with a loss function dependent on an error rate of the acoustic model when run on the morphed audio of transcribed speech.

5. The method of claim 1 further comprising tuning the voice morphing model while keeping the acoustic model fixed.

6. The method of claim 1 further comprising tuning the acoustic model while keeping the voice morphing model fixed.

7. The method of claim 1 further comprising measuring the amount of noise in the morphed speech audio, wherein the loss function further depends on the amount of noise.

8. A computer implemented method of phoneme inference, the method comprising: calculating a plurality of scores for the distances between a voice in speech audio from multiple distinct voices and a target voice; training a voice morphing model to morph speech audio to the target voice, the training using speech audio of the multiple distinct voices with a loss function dependent on the scores; morphing audio of sampled speech to a target voice using the voice morphing model to generate morphed audio; and inferring a sequence of phonemes from the morphed audio using an acoustic model, wherein the acoustic model has an accuracy bias in favor of the target voice.

9. The method of claim 8 wherein the acoustic model is conditioned by a choice of the target voice from among a plurality of target voices.

10. A computer-implemented method of training an acoustic model, the method comprising: obtaining a voiceprint calculator that calculates a plurality of scores for the distances between each voice of multiple distinct voices in speech audio and a target voice; training a voice morphing model to morph speech audio to the target voice, the training using speech audio of the multiple distinct voices with a loss function dependent on the scores; and training an acoustic model on transcribed speech in the target voice, wherein the acoustic model can infer phonemes from audio morphed by the voice morphing model.

11. The method of claim 1 wherein the transcribed speech in the target voice is from a single speaker without morphing.

12. The method of claim 1 wherein the transcribed speech in the target voice is generated by morphing speech audio of multiple distinct voices.

13. The method of claim 12 further comprising: finetuning the voice morphing model with a loss function dependent on an error rate of the acoustic model when run on the morphed audio of transcribed speech.

14. The method of claim 1 further comprising tuning the voice morphing model while keeping the acoustic model fixed.

15. The method of claim 1 further comprising tuning the acoustic model while keeping the voice morphing model fixed.

16. The method of claim 1 further comprising tuning the voice morphing model and acoustic model by backpropagation of error reduction based on a measurement of the error rate of phoneme inference.

17. The method of claim 1 further comprising measuring the amount of noise in the morphed speech audio, wherein the loss function further depends on the amount of noise.
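The claims describe a loss function for the voice morphing model that depends on a voiceprint distance score and, in some dependent claims, on the amount of noise and on an acoustic-model error rate. A minimal sketch of how such terms might be combined is below. The cosine distance, the edit-distance error rate, the additive form, and the weights are all assumptions for illustration; the claims do not specify any of them.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance score between two voiceprints (cosine is an assumed choice)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def phoneme_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Levenshtein edit distance between phoneme sequences, divided by reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref in enumerate(reference, start=1):
        cur = [i]
        for j, hyp in enumerate(hypothesis, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ref != hyp))) # substitution or match
        prev = cur
    return prev[-1] / max(1, len(reference))


def morphing_loss(distance: float, noise: float = 0.0, error_rate: float = 0.0,
                  noise_weight: float = 0.1, error_weight: float = 1.0) -> float:
    """Hypothetical combined loss: distance score plus weighted noise and error terms."""
    return distance + noise_weight * noise + error_weight * error_rate


# Example: morphed audio whose voiceprint is orthogonal to the target's,
# with some measured noise and one substituted phoneme out of three.
loss = morphing_loss(
    cosine_distance([1.0, 0.0], [0.0, 1.0]),
    noise=0.5,
    error_rate=phoneme_error_rate(["k", "ae", "t"], ["k", "ah", "t"]),
)
```

In an actual training setup these terms would need to be differentiable for backpropagation; the plain-Python version here only shows which quantities feed the loss.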

Assignees

Soundhound Inc

Soundhound Ai Ip Llc

Inventors

Mohajer Keyvan

Classifications

G10L15/063
Training · CPC title
G10L2021/0135
Voice conversion or morphing · CPC title
G10L15/02
Feature extraction for speech recognition; Selection of recognition unit · CPC title
G10L15/18 · Primary
using natural language modelling · CPC title
G10L21/007 · Primary
characterised by the process used · CPC title

Patent family

Related publications grouped by family.

View patent family 90626742

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12505830B2 cover?: A voice morphing model can transform diverse voices to one or a small number of target voices. Speech recognition on diverse voices can be performed by morphing it to a target voice and then performing recognition on audio with the target voice. A source of requests for speech recognition can pass audio and a voiceprint with requests. Speech recognition can run with improved accuracy by biasing…

Related patents

Enhanced audio file generator

Systems and methods of pre-processing of speech signals for improved speech recognition

Generating and using text-to-speech data for speech recognition models

System and Method for Voice Morphing

Generation of voice data as data augmentation for acoustic model training

Dynamic pitch adjustment of inbound audio to improve speech recognition

Device and method for privacy-preserving vocal interaction
