What technology area does this patent fall under?

Primary CPC classification H04M3/2281. Mapped technology areas include Electricity.

When was this patent published?

Publication date Mar 31, 2026 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Active voice liveness detection system

US12592239B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12592239-B2
Application number	US-202418646310-A
Country	US
Kind code	B2
Filing date	Apr 25, 2024
Priority date	Apr 28, 2023
Publication date	Mar 31, 2026
Grant date	Mar 31, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are systems and methods including software processes executed by a server that detect audio-based synthetic speech (“deepfakes”) in a call conversation. Embodiments include systems and methods for detecting fraudulent presentation attacks using multiple functional engines that implement various fraud-detection techniques, to produce calibrated scores and/or fused scores. A computer may, for example, evaluate the audio quality of speech signals within audio signals, where speech signals contain the speech portions having speaker utterances.
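The abstract's remark about evaluating audio quality ties in with claims 4 to 6 on this page, which describe a speech quality score, calibration of the content verification score, and a request for an improved speech signal. The sketch below is only an illustration of that idea. The two acoustic parameters, the formulas, the neutral value, and the threshold are all assumptions made for the example; the publication text shown here does not specify any of them.

```python
from dataclasses import dataclass


@dataclass
class AcousticParams:
    # Hypothetical degradation measures; the page does not list specific ones.
    snr_db: float          # signal-to-noise ratio in decibels
    clipping_ratio: float  # fraction of clipped samples, in [0, 1]


def _clamp01(x: float) -> float:
    """Clamp a value into the closed interval [0, 1]."""
    return min(max(x, 0.0), 1.0)


def speech_quality_score(p: AcousticParams) -> float:
    """Map degradation measures to a quality score in [0, 1] (assumed formula)."""
    snr_term = _clamp01(p.snr_db / 30.0)        # assumption: 30 dB or more is clean
    clip_term = 1.0 - _clamp01(p.clipping_ratio)
    return snr_term * clip_term


def calibrate(raw_score: float, quality: float, neutral: float = 0.5) -> float:
    """Pull a raw score toward a neutral value as quality drops (assumed rule)."""
    return neutral + _clamp01(quality) * (raw_score - neutral)


def needs_retry(quality: float, threshold: float = 0.4) -> bool:
    """True when quality fails the (assumed) threshold and a new sample is needed."""
    return quality < threshold


if __name__ == "__main__":
    noisy = AcousticParams(snr_db=6.0, clipping_ratio=0.1)
    q = speech_quality_score(noisy)
    print("quality:", q)
    print("calibrated content score:", calibrate(0.9, q))
    print("request improved speech signal:", needs_retry(q))
```

Shrinking toward a neutral value is one simple way to keep a degraded recording from producing a confident score in either direction; a deployed system could calibrate quite differently.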

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for detecting machine-based speech in calls, comprising: obtaining, by a computer, an inbound audio signal comprising a speech signal containing response content as an utterance of the speaker, wherein the response content in the speech signal purportedly matches to challenge content of a verification prompt; extracting, by the computer, a text embedding using a first set of features extracted for text of the challenge content, a spoken content embedding using a second set of features extracted for the speech signal, and a fakeprint using a third set one or more features extracted for one or more fraud artifacts of the speech signal; generating, by the computer, a content verification score based upon a distance between the text embedding and the spoken content embedding; executing, by the computer, a passive liveness detector to generate a passive liveness score for the inbound audio signal, the passive liveness detector having a set of layers of a machine-learning architecture trained to classify and score the input audio signal based upon the fakeprint extracted for the fraud artifacts of the inbound audio signal; generating, by the computer, a fused liveness score based upon the content verification score and the passive liveness score; and identifying, by the computer, the inbound audio signal as genuine or fraudulent based upon comparing the fused liveness score against an overall risk threshold.
2. The method according to claim 1, further comprising: extracting, by the computer, an inbound voiceprint for the speech signal using a fourth set of one or more features extracted for one or more acoustic features of the speech signal of the inbound audio signal; and generating, by the computer, a speaker verification score for the speech signal indicating a speaker recognition likelihood that the speaker is an enrolled user based upon a second distance between the inbound voiceprint and an enrolled voiceprint.
3. The method according to claim 2, wherein the computer generates the fused liveness score further using the speaker verification score.
4. The method according to claim 1, further comprising: generating, by the computer, one or more acoustic parameters corresponding to one or more types of degradation in the speech signal of the inbound audio signal; and generating, by the computer, a speech quality score for the speech signal based upon the one or more acoustic parameters.
5. The method according to claim 4, wherein generating the content verification score includes: calibrating, by the computer, the content verification score based upon the speech quality score.
6. The method according to claim 4, further comprising: determining, by the computer, that the speech quality score for the speech signal fails a speech quality threshold; and transmitting, by the computer, to the user device a request for an improved speech signal for the caller.
7. The method according to claim 1, further comprising: extracting, by the computer, an inbound audioprint using one or more features extracted from the audio signal; generating, by the computer, an audio replay score for the inbound audio signal indicating an audio recording recognition likelihood that the inbound audio signal matches a prior audio signal based upon a distance between the inbound audioprint and a stored audioprint for the prior audio signal.
8. The method according to claim 7, further comprising identifying, by the computer, the inbound audio signal as fraudulent, in response to determining that the audio replay score satisfies a replay detection threshold value.
9. The method according to claim 7, further comprising storing, by the computer, the inbound audioprint into a database as a new stored audioprint.
10. The method according to claim 1, further comprising generating, by the computer, a verification prompt including the challenge content for display at a user interface of the user device associated with the caller.
11. A system for detecting machine-based speech in calls, comprising: a computer having at least one processor, configured to: obtain an inbound audio signal comprising a speech signal containing response content as an utterance of the speaker, wherein the response content in the speech signal purportedly matches to challenge content of a verification prompt; extract a text embedding using a first set of features extracted for text of the challenge content, a spoken content embedding using a second set of features extracted for the speech signal, and a fakeprint using a third set one or more features extracted for one or more fraud artifacts of the speech signal; generate a content verification score based upon a distance between the text embedding and the spoken content embedding; execute a passive liveness detector having a set of layers of a machine-learning architecture to generate a passive liveness score for the inbound audio signal, the passive liveness detector trained to classify and score the input audio signal based upon the fakeprint extracted for the fraud artifacts of the inbound audio signal; generate a fused liveness score based upon the content verification score and the passive liveness score; and identify the inbound audio signal as genuine or fraudulent based upon comparing the fused liveness score against an overall risk threshold.
12. The system according to claim 11, wherein the computer is further configured to: extract an inbound voiceprint for the speech signal using a fourth set of one or more features extracted for one or more acoustic features of the speech signal of the inbound audio signal; and generate a speaker verification score for the speech signal indicating a speaker recognition likelihood that the speaker is an enrolled user based upon a second distance between the inbound voiceprint and an enrolled voiceprint.
13. The system according to claim 12, wherein the computer generates the fused liveness score further using the speaker verification score.
14. The system according to claim 11, wherein the computer is further configured to: generate one or more acoustic parameters corresponding to one or more types of degradation in the speech signal of the inbound audio signal; and generate a speech quality score for the speech signal based upon the one or more acoustic parameters.
15. The system according to claim 14, wherein when generating the content verification score the computer is further configured to calibrate the content verification score based upon the speech quality score.
16. The system according to claim 14, wherein the computer is further configured to: determine that the speech quality score for the speech signal fails a speech quality threshold; and transmit to the user device a request for an improved speech signal for the caller.
17. The system according to claim 11, wherein the computer is further configured to: extract an inbound audioprint using one or more features extracted from the audio signal; generate an audio replay score for the inbound audio signal indicating an audio recording recognition likelihood that the inbound audio signal matches a prior audio signal based upon a distance between the inbound audioprint and a stored audioprint for the prior audio signal.
18. The system according to claim 17, wherein the computer is further configured to identify the inbound audio signal as fraudulent, in response to determining that the audio replay score satis
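For readers who find the scoring flow of claim 1 easier to follow as code, here is a minimal sketch of its last three steps: a content verification score from an embedding distance, fusion with a passive liveness score, and comparison against a risk threshold. It is not the patented implementation. The claim does not name a distance metric, a fusion rule, or threshold values, so cosine distance, the weighted average, the weight `w_content`, and the threshold 0.7 are all assumptions. The embeddings and the passive liveness score are taken as given inputs, and the text and spoken-content embeddings are assumed to share one vector space.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0 for same direction, 2 for opposite."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def content_verification_score(text_emb: Sequence[float],
                               spoken_emb: Sequence[float]) -> float:
    """Map the distance in [0, 2] to a score in [0, 1]; higher means closer."""
    return 1.0 - cosine_distance(text_emb, spoken_emb) / 2.0


def fused_liveness_score(content_score: float, passive_score: float,
                         w_content: float = 0.5) -> float:
    """Weighted average of the two scores (assumed fusion rule)."""
    return w_content * content_score + (1.0 - w_content) * passive_score


def classify(fused: float, risk_threshold: float = 0.7) -> str:
    """Label the call by comparing the fused score with an assumed threshold."""
    return "genuine" if fused >= risk_threshold else "fraudulent"


if __name__ == "__main__":
    # Toy vectors standing in for the challenge text and the spoken response.
    text_emb = [1.0, 0.0]
    spoken_emb = [1.0, 0.0]
    passive = 0.9  # hypothetical output of the passive liveness detector
    fused = fused_liveness_score(
        content_verification_score(text_emb, spoken_emb), passive)
    print(fused, classify(fused))
```

Dependent claims 2 and 3 add a speaker verification score to the fusion; under the same assumptions that would mean one more weighted term in `fused_liveness_score`.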

Assignees

Pindrop Security Inc

Inventors

Classifications

H04M3/42221
Conversation recording systems (at the subscriber's set H04M1/656) · CPC title
G10L25/51
for comparison or discrimination · CPC title
H04M3/5175
Call or contact centers supervision arrangements · CPC title
G10L17/18
Artificial neural networks; Connectionist approaches · CPC title
G06F21/32
using biometric data, e.g. fingerprints, iris scans or voiceprints · CPC title

Patent family

Related publications grouped by family.

View patent family 93215773

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12592239B2 cover?: Disclosed are systems and methods including software processes executed by a server that detect audio-based synthetic speech (“deepfakes”) in a call conversation. Embodiments include systems and methods for detecting fraudulent presentation attacks using multiple functional engines that implement various fraud-detection techniques, to produce calibrated scores and/or fused scores. A computer ma…
Who is the assignee on this patent?: Pindrop Security Inc