Voice morphing apparatus having adjustable parameters

US11600284B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11600284-B2
Application number: US-202016740440-A
Country: US
Kind code: B2
Filing date: Jan 11, 2020
Priority date: Jan 11, 2020
Publication date: Mar 7, 2023
Grant date: Mar 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A voice morphing apparatus having adjustable parameters is described. The disclosed system and method include a voice morphing apparatus that morphs input audio to mask a speaker's identity. Parameter adjustment uses evaluation of an objective function that is based on the input audio and output of the voice morphing apparatus. The voice morphing apparatus includes objectives that are based adversarially on speaker identification and positively on audio fidelity. Thus, the voice morphing apparatus is adjusted to reduce identifiability of speakers while maintaining fidelity of the morphed audio. The voice morphing apparatus may be used as part of an automatic speech recognition system.
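One way to read the two objectives in the abstract is as a single weighted objective. The formula below is an illustrative assumption, not text from the patent: f_θ stands for the voice morphing apparatus with parameters θ, x for the input audio, S for a speaker identification measure (lower means less identifiable), F for an audio fidelity measure (higher means closer to the input), and λ for a hypothetical weight that trades the two off.

```latex
\theta^{*} = \arg\min_{\theta} \; \Big[ \lambda \, S\big(f_{\theta}(x)\big) \;-\; F\big(f_{\theta}(x),\, x\big) \Big], \qquad \lambda > 0
```

Under this reading, the speaker identification term enters adversarially (it is driven down) and the fidelity term enters positively (it is driven up), which matches the abstract's description of reducing identifiability while maintaining fidelity.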

First claim

Opening claim text (preview).

The invention claimed is:

1. A voice morphing apparatus comprising: a neural network architecture to map input audio data to output audio data, the input audio data comprising a representation of speech from a speaker, the neural network architecture including a set of parameters, the set of parameters being trained to maximize a speaker identification distance from the input audio data to a set of speaker identification vectors and to optimize a speaker intelligibility score for the output audio data.

2. The voice morphing apparatus of claim 1 further comprising a noise filter to pre-process the input audio data.

3. The voice morphing apparatus of claim 2, wherein the noise filter removes a noise component from the input audio data and the voice morphing apparatus adds the noise component to the set of speaker identification vectors from the neural network architecture.

4. The voice morphing apparatus of claim 1, wherein the neural network architecture comprises one or more recurrent connections.

5. The voice morphing apparatus of claim 1, wherein the voice morphing apparatus is configured to output time-series audio waveform data based on the set of speaker identification vectors from the neural network architecture.

6. A non-transitory computer-readable storage medium for storing instructions that, when executed by at least one processor, cause the at least one processor to: load input audio data from a data source; input the input audio data to a voice morphing apparatus, the voice morphing apparatus including a set of trainable parameters; process the input audio data using the voice morphing apparatus to generate morphed audio data; apply a speaker identification system to at least the morphed audio data to output a measure of speaker identification; apply an audio fidelity system to the morphed audio data and the input audio data to output a measure of audio fidelity; evaluate an objective function based on the measure of speaker identification and the measure of audio fidelity; and adjust the set of trainable parameters for the voice morphing apparatus based on a gradient of the objective function, wherein the objective function is configured to adjust the set of trainable parameters to optimize the measure of audio fidelity between the morphed audio data and the input audio data and to reduce the measure of speaker identification while maintaining speech intelligibility.

7. A method for optimizing training parameters, the method comprising: loading input audio data from a data source; inputting the input audio data to a voice morphing apparatus, the voice morphing apparatus including a set of trainable parameters; processing the input audio data using the voice morphing apparatus to generate morphed audio data; applying a speaker identification system to at least the morphed audio data to output a measure of speaker identification; applying an audio fidelity system to the morphed audio data and the input audio data to output a measure of audio fidelity; evaluating an objective function based on the measure of speaker identification and the measure of audio fidelity; and adjusting the set of trainable parameters for the voice morphing apparatus based on a gradient of the objective function, wherein the objective function is configured to adjust the set of trainable parameters to optimize the measure of audio fidelity between the morphed audio data and the input audio data and to reduce the measure of speaker identification while maintaining speech intelligibility.
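The steps recited in the method claim (morph, measure speaker identification, measure fidelity, evaluate an objective, adjust parameters along a gradient) can be sketched as a toy loop. This is a minimal illustration under heavy assumptions, not the patented implementation: the "audio" is a three-element feature list, the morpher is a per-feature offset rather than a neural network, the gradient is a finite-difference estimate, and every function name, the weight `id_weight`, and the example vectors are hypothetical. Speech intelligibility is not modelled here.

```python
def morph(audio, params):
    """Toy morpher (assumption): add one trainable offset per feature."""
    return [a + p for a, p in zip(audio, params)]

def speaker_id_measure(morphed, speaker_vec):
    """Toy identifiability in (0, 1]: high when morphed audio is near the speaker vector."""
    d2 = sum((m - s) ** 2 for m, s in zip(morphed, speaker_vec))
    return 1.0 / (1.0 + d2)

def audio_fidelity(morphed, audio):
    """Toy fidelity: negative mean squared error, so 0.0 is a perfect match."""
    return -sum((m - a) ** 2 for m, a in zip(morphed, audio)) / len(audio)

def objective(params, audio, speaker_vec, id_weight=1.0):
    """Loss to minimize: penalize identifiability, reward fidelity."""
    morphed = morph(audio, params)
    return (id_weight * speaker_id_measure(morphed, speaker_vec)
            - audio_fidelity(morphed, audio))

def numerical_gradient(f, params, eps=1e-6):
    """Central-difference gradient estimate (stand-in for backpropagation)."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def train(audio, speaker_vec, steps=200, lr=0.1):
    """Plain gradient descent on the toy objective, starting from an identity morph."""
    params = [0.0] * len(audio)

    def loss(p):
        return objective(p, audio, speaker_vec)

    for _ in range(steps):
        grad = numerical_gradient(loss, params)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params

# Hypothetical example data: input features and an enrolled speaker vector.
audio = [0.5, -0.2, 0.1]
speaker_vec = [0.4, -0.1, 0.0]

params = train(audio, speaker_vec)
before = speaker_id_measure(audio, speaker_vec)
after = speaker_id_measure(morph(audio, params), speaker_vec)
print(f"speaker-ID measure before: {before:.3f}, after: {after:.3f}")
```

The single weight `id_weight` is one simple way to balance the two measures; a real system would more likely use learned speaker identification and fidelity models and compute gradients by backpropagation through the morphing network.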

Assignees

Soundhound Inc

Inventors

Classifications

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

  • Generative networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11600284B2 cover?
A voice morphing apparatus having adjustable parameters is described. The disclosed system and method include a voice morphing apparatus that morphs input audio to mask a speaker's identity. Parameter adjustment uses evaluation of an objective function that is based on the input audio and output of the voice morphing apparatus. The voice morphing apparatus includes objectives that are based adv…
Who is the assignee on this patent?
Soundhound Inc
What technology area does this patent fall under?
Primary CPC classification G10L21/013. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 7, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).