Speaker recognition with quality indicators

US2025124945A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2025124945-A1
Application number	US-202418989690-A
Country	US
Kind code	A1
Filing date	Dec 20, 2024
Priority date	Aug 21, 2020
Publication date	Apr 17, 2025
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments described herein provide for a machine-learning architecture for modeling quality measures for enrollment signals. Modeling these enrollment signals enables the machine-learning architecture to identify deviations from expected or ideal enrollment signal in future test phase calls. These differences can be used to generate quality measures for the various audio descriptors or characteristics of audio signals. The quality measures can then be fused at the score-level with the speaker recognition's embedding comparisons for verifying the speaker. Fusing the quality measures with the similarity scoring essentially calibrates the speaker recognition's outputs based on the realities of what is actually expected for the enrolled caller and what was actually observed for the current inbound caller.
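
The abstract does not say how the quality measures are fused with the similarity score. As a minimal sketch, assuming a weighted sum passed through a sigmoid (a common score-level fusion choice, not necessarily the one used in the patent), the fusion and threshold check could look like the following. The function names, weights, bias, and threshold are all hypothetical.

```python
import math


def fuse_scores(initial_similarity, quality_measures, weights, bias=0.0):
    """Fuse a similarity score with quality measures at the score level.

    Assumption: fusion is a weighted sum squashed by a sigmoid. weights[0]
    applies to the similarity score; the remaining weights apply to the
    quality measures in order.
    """
    if len(weights) != len(quality_measures) + 1:
        raise ValueError("need one weight per quality measure plus one for the similarity score")
    z = bias + weights[0] * initial_similarity
    for w, q in zip(weights[1:], quality_measures):
        z += w * q
    return 1.0 / (1.0 + math.exp(-z))


def verify(final_score, threshold):
    """Accept the inbound speaker when the fused score meets the threshold."""
    return final_score >= threshold


# Hypothetical values: one similarity score and two quality measures.
final = fuse_scores(0.8, [0.9, 0.7], weights=[2.0, 1.0, 1.0], bias=-2.0)
accepted = verify(final, threshold=0.5)
```

With positive weights, a call whose audio quality matches what was seen at enrollment raises the final score, and a mismatch lowers it, which is the calibration effect the abstract describes. In practice the weights would be learned from labeled calls rather than set by hand.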

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method comprising: extracting from an inbound audio signal for an inbound speaker, by a computer, a feature vector for one or more acoustic features; generating, by the computer, one or more quality measures and an overall quality measure for the inbound audio signal, by executing a machine-learning architecture using as input the feature vector for the one or more acoustic features, the one or more quality measures corresponding to a similarity between one or more expected quality descriptors and one or more quality descriptors for call audio of the inbound audio signal; generating, by the computer, a final similarity score for verifying the inbound speaker by combining an initial similarity score with the one or more quality measures or the overall quality measure; and verifying, by the computer, the inbound speaker as an enrolled speaker based upon comparing the final similarity score against a verification threshold.
2. The method according to claim 1, further comprising generating, by the computer, the initial similarity score by executing a second machine-learning architecture using as input an inbound speaker embedding and an enrolled voiceprint for the enrolled speaker.
3. The method according to claim 2, further comprising generating, by the computer, the enrolled voiceprint by combining a plurality of enrollee embeddings.
4. The method according to claim 3, further comprising generating, by the computer, the plurality of enrollee embeddings by executing the second machine-learning architecture using as input a plurality of enrollee audio samples, wherein the inbound speaker embedding is generated by executing the second machine-learning architecture using as input the feature vector for the one or more acoustic features of the inbound audio signal.
5. The method according to claim 1, wherein generating the one or more quality measures for the inbound audio signal includes generating, by the computer, the overall quality measure based upon each of the quality measures.
6. The method according to claim 1, wherein generating the one or more quality measures includes: generating, by the computer, a plurality of speech segments from the inbound audio signal; and determining, by the computer, a total duration of speech based upon the plurality of speech segments.
7. The method according to claim 1, wherein generating a quality measure includes determining, by the computer, a level of similarity between an inbound speaker embedding and a corresponding enrolled speaker embedding for an enrolled audio signal.
8. The method according to claim 1, further comprising: receiving, by the computer, one or more clean enrollment audio signals for the enrolled speaker; generating, by the computer, one or more degraded enrollment audio signals corresponding to the one or more clean enrollment audio signals according to a type of degradation; and extracting, by the computer, one or more enrolled quality embeddings for the enrolled speaker by applying a second machine-learning architecture on the one or more clean enrollment audio signals and the one or more degraded enrollment audio signals.
9. The method according to claim 8, further comprising enabling, by the computer, classification layers and loss layers of the second machine-learning architecture in a training phase of the second machine-learning architecture.
10. The method according to claim 8, further comprising disabling, by the computer, classification layers and loss layers of the second machine-learning architecture in a deployment phase of the second machine-learning architecture.
11. A system comprising: a database configured store an enrolled voiceprint for an enrolled speaker; and a server comprising a processor configured to: generate one or more quality measures and an overall quality measure for an inbound audio signal, by executing a machine-learning architecture using as input a feature vector for one or more acoustic features, the one or more quality measures corresponding to a similarity between one or more expected quality descriptors and one or more quality descriptors for call audio of the inbound audio signal; generate a final similarity score for verifying the inbound speaker by combining an initial similarity score with the one or more quality measures or the overall quality measure; and verify the inbound speaker as an enrolled speaker based upon comparing the final similarity score against a verification threshold.
12. The system according to claim 11, wherein the processor is further configured to generate the initial similarity score by executing a second machine-learning architecture using as input an inbound speaker embedding and an enrolled voiceprint for the enrolled speaker.
13. The system according to claim 12, wherein the processor is further configured to generate the enrolled voiceprint by combining a plurality of enrollee embeddings.
14. The system according to claim 13, wherein the processor is further configured to generate the plurality of enrollee embeddings by executing the second machine-learning architecture using as input a plurality of enrollee audio samples, wherein the inbound speaker embedding is generated by executing the second machine-learning architecture using as input the feature vector for the one or more acoustic features of the inbound audio signal.
15. The system according to claim 11, wherein the processor is further configured to generate the one or more quality measures for the inbound audio signal by generating the overall quality measure based upon each of the quality measures.
16. The system according to claim 11, wherein the processor is further configured to generate the one or more quality measures by: generating a plurality of speech segments from the inbound audio signal; and determining a total duration of speech based upon the plurality of speech segments.
17. The system according to claim 11, wherein the processor is further configured to generate a quality measure by determining a level of similarity between an inbound speaker embedding and a corresponding enrolled speaker embedding for an enrolled audio signal.
18. The system according to claim 11, wherein the processor is further configured to: receive one or more clean enrollment audio signals for the enrolled speaker; generate one or more degraded enrollment audio signals corresponding to the one or more clean enrollment audio signals according to a type of degradation; and extract one or more enrolled quality embeddings for the enrolled speaker by applying a second machine-learning architecture on the one or more clean enrollment audio signals and the one or more degraded enrollment audio signals.
19. The system according to claim 18, wherein the processor is further configured to enable classification layers and loss layers of the second machine-learning architecture in a training phase of the second machine-learning architecture.
20. The system according to claim 19, wherein the processor is further configured to disable the classification layers and loss layers of the second machine-learning architecture in a deployment phase of the second machine-learning architecture.
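
Several dependent claims name operations without fixing a formula: combining enrollee embeddings into a voiceprint (claims 3 and 13), scoring an inbound embedding against that voiceprint (claims 2 and 12), and totaling speech duration from segments (claims 6 and 16). The sketch below illustrates one plausible reading, assuming the voiceprint is an element-wise mean, the initial similarity score is a cosine similarity, and speech segments arrive as (start, end) pairs in seconds. The claims describe the scoring step as a second machine-learning architecture, so cosine similarity here is a stand-in, and every function name is hypothetical.

```python
import math


def average_embeddings(embeddings):
    """Combine enrollee embeddings into one voiceprint (assumed: element-wise mean)."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    n = len(embeddings)
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]


def cosine_similarity(a, b):
    """Stand-in for the initial similarity score between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; treated as no similarity
    return dot / (norm_a * norm_b)


def total_speech_duration(segments):
    """Sum the lengths of (start, end) speech segments, in seconds."""
    return sum(end - start for start, end in segments)


# Hypothetical two-dimensional embeddings and detected speech segments.
voiceprint = average_embeddings([[1.0, 0.0], [0.0, 1.0]])
initial_score = cosine_similarity([1.0, 1.0], voiceprint)
speech_seconds = total_speech_duration([(0.0, 1.5), (2.0, 3.0)])
```

Real speaker embeddings have hundreds of dimensions and the segments would come from a voice activity detector; the small values here only keep the arithmetic easy to follow.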

Assignees

Pindrop Security Inc

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/09
Supervised learning · CPC title
G10L15/02
Feature extraction for speech recognition; Selection of recognition unit · CPC title
G06N20/20 · Primary
Ensemble learning · CPC title
G06N3/045
Combinations of networks · CPC title

Patent family

Related publications grouped by family.

View patent family 80269005

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025124945A1 cover?: Embodiments described herein provide for a machine-learning architecture for modeling quality measures for enrollment signals. Modeling these enrollment signals enables the machine-learning architecture to identify deviations from expected or ideal enrollment signal in future test phase calls. These differences can be used to generate quality measures for the various audio descriptors or charac…
Who is the assignee on this patent?: Pindrop Security Inc
What technology area does this patent fall under?: Primary CPC classification G06N20/20. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 17, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).