Voice modification detection using physical models of speech production

US11495244B2 · US · B2

Patent metadata
Publication number: US-11495244-B2
Application number: US-201916375785-A
Country: US
Kind code: B2
Filing date: Apr 4, 2019
Priority date: Apr 4, 2018
Publication date: Nov 8, 2022
Grant date: Nov 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer may train a single-class machine learning using normal speech recordings. The machine learning model or any other model may estimate the normal range of parameters of a physical speech production model based on the normal speech recordings. For example, the computer may use a source-filter model of speech production, where voiced speech is represented by a pulse train and unvoiced speech by a random noise and a combination of the pulse train and the random noise is passed through an auto-regressive filter that emulates the human vocal tract. The computer leverages the fact that intentional modification of human voice introduces errors to source-filter model or any other physical model of speech production. The computer may identify anomalies in the physical model to generate a voice modification score for an audio signal. The voice modification score may indicate a degree of abnormality of human voice in the audio signal.
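
The source-filter model described in the abstract can be illustrated with a minimal sketch. This is not the patent's implementation: the function names, the 8 kHz sample rate, the two-pole resonator standing in for the vocal tract, and the `voicing` mix parameter are all assumptions made for illustration. The sketch mixes a pulse train with random noise, passes the mix through an auto-regressive (all-pole) filter, and then inverse-filters the result to show that a signal that truly fits the model yields back its excitation as the residual.

```python
import math
import random

def excitation(n_samples, pitch_period, voicing, seed=0):
    """Mix a pulse train (voiced part) with Gaussian noise (unvoiced part)."""
    rng = random.Random(seed)
    out = []
    for n in range(n_samples):
        pulse = 1.0 if n % pitch_period == 0 else 0.0
        noise = rng.gauss(0.0, 1.0)
        out.append(voicing * pulse + (1.0 - voicing) * noise)
    return out

def ar_filter(source, a):
    """All-pole filter: y[n] = source[n] - sum_k a[k] * y[n-1-k]."""
    y = []
    for n, x in enumerate(source):
        acc = x
        for k, coeff in enumerate(a):
            if n - 1 - k >= 0:
                acc -= coeff * y[n - 1 - k]
        y.append(acc)
    return y

def inverse_filter(signal, a):
    """Recover the residual: e[n] = y[n] + sum_k a[k] * y[n-1-k]."""
    e = []
    for n, s in enumerate(signal):
        acc = s
        for k, coeff in enumerate(a):
            if n - 1 - k >= 0:
                acc += coeff * signal[n - 1 - k]
        e.append(acc)
    return e

def resonator(freq_hz, radius, sample_rate):
    """AR coefficients of one resonance (a crude, hypothetical single formant)."""
    theta = 2.0 * math.pi * freq_hz / sample_rate
    return [-2.0 * radius * math.cos(theta), radius * radius]

SAMPLE_RATE = 8000                                      # assumed telephone-band rate
a = resonator(500.0, 0.95, SAMPLE_RATE)                 # assumed 500 Hz resonance
source = excitation(400, pitch_period=80, voicing=0.9)  # 100 Hz pitch
speech = ar_filter(source, a)
residual = inverse_filter(speech, a)                    # matches `source` up to rounding
```

In this toy setting the filter coefficients are known, so the residual equals the excitation. In practice the coefficients would have to be estimated from the audio, and the abstract's premise is that modified voices fit such a model less well than normal speech.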

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: segmenting, by the computer, an audio signal from an incoming phone call into a plurality of audio frames; extracting, by the computer, a pitch parameter, a set of formant parameters, and a residual parameter for an audio frame of the plurality of audio frames based on a source-filter model; generating, by the computer, a pitch parameter statistic based upon the pitch parameter of the audio frame and respective pitch parameters of other audio frames of the plurality of audio frames; generating, by the computer, formant parameters statistics based upon the set of formant parameters for the audio frame and respective sets of formant parameters of other audio frames of the plurality of audio frames; generating, by the computer, a residual parameter statistic based upon the residual parameter of the audio frame and respective residual parameters of the other audio frames of the plurality of audio frames; calculating, by the computer executing a machine-learning model on one or more parameter statistics, a voice modification score for the audio signal based upon comparing the pitch parameter statistic with a normal human speech pitch parameter statistic, comparing the formant parameters statistics with corresponding normal human speech formant parameter statistics, and comparing the residual parameter statistic with a normal human speech residual parameter statistic, the voice modification score indicating probability of the audio signal containing a modified human speech; and determining, by the computer, whether the incoming phone call is fraudulent based upon the voice modification score.

2. The computer-implemented method of claim 1, wherein the pitch parameter statistic includes at least one of an average pitch value and pitch consistency.

3. The computer-implemented method of claim 1, wherein the format parameters statistics include at least one of average formant values and inter-formant consistency.

4. The computer-implemented method of claim 1, wherein the residual parameter includes residual kurtosis and the residual parameter statistic includes residual kurtosis consistency.

5. The computer-implemented method of claim 1, wherein the residual parameter indicates at least one of glottal closure instances, glottal opening instances, and a model of glottal activity.

6. The computer-implemented method of claim 1, further comprising: extracting, by the computer, a linear predictive coding (LPC) model order parameter from the audio frame; generating, by the computer, an LPC model order statistic for the audio signal based upon the LPC model order parameter of the audio frame and respective model order parameters of the other audio frames; and calculating, by the computer, the voice modification score for the audio signal based upon comparing the LPC model order statistic with a normal human speech LPC model order statistic.

7. The computer-implemented method of claim 6, wherein the LPC model order statistic includes model order consistency.

8. The computer-implemented method of claim 1, wherein the machine learning model is a single-class model trained on normal speech recordings.

9. A system comprising: a non-transitory storage medium storing a plurality of computer program instructions; a processor electrically coupled to the non-transitory storage medium and configured to execute the plurality of computer program instructions to: segment an audio signal from an incoming phone call into a plurality of audio frames; extract a pitch parameter, a set of formant parameters, and a residual parameter for an audio frame of the plurality of audio frames based on a source-filter model; generate a pitch parameter statistic based upon the pitch parameter of the audio frame and respective pitch parameters of other audio frames of the plurality of audio frames; generate formant parameters statistics based upon the set of formant parameters for the audio frame and respective sets of formant parameters of other audio frames of the plurality of audio frames; generate a residual parameter statistic based upon the residual parameter of the audio frame and respective residual parameters of the other audio frames of the plurality of the audio frames; calculate, by executing a machine-learning model on one or more parameter statistics, a voice modification score for the audio signal based upon comparing the pitch parameter statistic with a normal human speech pitch parameter statistic, comparing the formant parameters statistics with corresponding normal human speech formant parameter statistics, and comparing the residual parameter statistic with a normal human speech residual parameter statistic, the voice modification score indicating probability of the audio signal containing a modified human speech; and determine whether the incoming phone call is fraudulent based upon the voice modification score.

10. The system of claim 9, wherein the pitch parameter statistic includes at least one of an average pitch value and pitch consistency.

11. The system of claim 9, wherein the format parameters statistics include at least one of average formant values and inter-formant consistency.

12. The system of claim 9, wherein the residual parameter includes residual kurtosis and the residual parameter statistic includes residual kurtosis consistency.

13. The system of claim 9, wherein the residual parameter indicates at least one of glottal closure instances, glottal opening instances, and a model of glottal activity.

14. The system of claim 9, wherein the processor is configured to further execute the computer program instructions to: extract a linear predictive coding (LPC) model order parameter from the audio frame of the plurality of audio frames; generate an LPC model order statistic for the audio signal based upon the LPC model order parameter of the audio frame and respective model order parameters of the other audio frames; and calculate the voice modification score for the audio signal based upon comparing the LPC model order statistic with a normal human speech LPC model order statistic.

15. The system of claim 14, wherein the LPC model order statistic includes model order consistency.

16. The system of claim 9, wherein the machine learning model is a single-class model trained on normal speech recordings.

17. A computer-implemented method comprising: extracting, by a computer, frame level parameters from an audio signal of an incoming phone call based upon a physical model of human speech; generating, by the computer, parameter statistics for the audio signal from the frame level parameters; executing, by the computer, a single-class machine learning model trained on normal human speech recordings on the parameter statistics to generate a voice modification score based upon comparisons of the parameter statistics; and determining, by the computer, whether the incoming phone call is fraudulent based upon the voice modification score.

18. The computer-implemented method of claim 17, wherein the physical model is a source-filter model.
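
The pipeline recited in the claims (frames, frame-level parameters, per-signal statistics, a score against normal speech) can be sketched roughly as follows. Everything here is an assumption made for illustration rather than the patented method: the function names, the frame size and hop, the LPC order of 2, the use of standard deviation as a "consistency" measure, and the per-statistic z-score model that stands in for a single-class machine-learning model. Formant extraction (typically from the roots of the LPC polynomial) is omitted for brevity, and `toy_voice` is a hypothetical test-signal generator, not real speech.

```python
import math
import random
import statistics

def split_frames(signal, size, hop):
    """Segment a signal into fixed-size, overlapping frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def lpc(r, order):
    """Levinson-Durbin recursion; a[k] multiplies z^-(k+1) in A(z)."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err
        prev = a[:]
        for j in range(i):
            a[j] = prev[j] + k * prev[i - 1 - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def kurtosis(x):
    """Non-excess kurtosis m4 / m2^2 (3.0 for a Gaussian)."""
    mean = sum(x) / len(x)
    m2 = sum((v - mean) ** 2 for v in x) / len(x)
    m4 = sum((v - mean) ** 4 for v in x) / len(x)
    return m4 / (m2 * m2) if m2 > 0.0 else 0.0

def frame_parameters(frame, rate, order=2, min_hz=60.0, max_hz=400.0):
    """Pitch (Hz) from the autocorrelation peak, plus LPC residual kurtosis."""
    lo, hi = int(rate / max_hz), int(rate / min_hz)
    r = autocorr(frame, hi)
    lag = max(range(lo, hi + 1), key=lambda l: r[l])
    a = lpc(r, order)
    residual = []
    for n in range(len(frame)):
        past = sum(a[k] * frame[n - 1 - k] for k in range(order) if n - 1 - k >= 0)
        residual.append(frame[n] + past)
    return rate / lag, kurtosis(residual)

def signal_statistics(signal, rate, size=400, hop=200):
    """Mean and spread ("consistency") of the frame-level parameters."""
    params = [frame_parameters(f, rate) for f in split_frames(signal, size, hop)]
    pitches, kurts = zip(*params)
    return [statistics.mean(pitches), statistics.pstdev(pitches),
            statistics.mean(kurts), statistics.pstdev(kurts)]

def fit_normal_model(normal_stats):
    """Single-class stand-in: per-statistic mean and std over normal recordings."""
    return [(statistics.mean(c), statistics.pstdev(c) or 1e-9)
            for c in zip(*normal_stats)]

def modification_score(stats, model):
    """Mean absolute z-score; larger means further from the normal range."""
    return sum(abs(s - m) / sd for s, (m, sd) in zip(stats, model)) / len(stats)

def toy_voice(pitch_period, n_samples=1200, seed=0):
    """Hypothetical signal: pulse train plus faint noise through a resonator."""
    rng = random.Random(seed)
    a1 = -2.0 * 0.95 * math.cos(2.0 * math.pi * 500.0 / 8000.0)
    a2 = 0.95 ** 2
    y = []
    for n in range(n_samples):
        x = (1.0 if n % pitch_period == 0 else 0.0) + 0.01 * rng.gauss(0.0, 1.0)
        y.append(x - a1 * (y[-1] if n >= 1 else 0.0) - a2 * (y[-2] if n >= 2 else 0.0))
    return y

RATE = 8000
normal = [signal_statistics(toy_voice(p, seed=p), RATE) for p in (76, 78, 80, 82, 84)]
model = fit_normal_model(normal)
normal_score = modification_score(normal[2], model)
shifted_score = modification_score(signal_statistics(toy_voice(30, seed=1), RATE), model)
```

Here the "normal" signals have pitch near 100 Hz, and the last signal has a pitch period of 30 samples (about 267 Hz), so its statistics fall far outside the fitted range and its score is much larger. Turning such a score into a fraud decision, as the claims' final step does, would need a threshold chosen on real data; none is suggested here.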

Assignees

Pindrop Security Inc

Inventors

Classifications

  • G10L25/51 · Primary

    for comparison or discrimination · CPC title

  • Fraud preventions · CPC title

  • Pitch determination of speech signals · CPC title

  • Training · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11495244B2 cover?
A computer may train a single-class machine learning using normal speech recordings. The machine learning model or any other model may estimate the normal range of parameters of a physical speech production model based on the normal speech recordings. For example, the computer may use a source-filter model of speech production, where voiced speech is represented by a pulse train and unvoiced sp…
Who is the assignee on this patent?
Pindrop Security Inc
What technology area does this patent fall under?
Primary CPC classification G10L25/51. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 8, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).