System and method for anomaly detection and extraction
US-9786275-B2 · Oct 10, 2017 · US
US11495244B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11495244-B2 |
| Application number | US-201916375785-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 4, 2019 |
| Priority date | Apr 4, 2018 |
| Publication date | Nov 8, 2022 |
| Grant date | Nov 8, 2022 |
Official abstract text for this publication.
A computer may train a single-class machine learning model using normal speech recordings. The machine learning model or any other model may estimate the normal range of parameters of a physical speech production model based on the normal speech recordings. For example, the computer may use a source-filter model of speech production, where voiced speech is represented by a pulse train and unvoiced speech by random noise, and a combination of the pulse train and the random noise is passed through an auto-regressive filter that emulates the human vocal tract. The computer leverages the fact that intentional modification of human voice introduces errors into the source-filter model or any other physical model of speech production. The computer may identify anomalies in the physical model to generate a voice modification score for an audio signal. The voice modification score may indicate a degree of abnormality of human voice in the audio signal.
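The source-filter model described in the abstract can be illustrated with a small sketch. It is not taken from the patent: the function names, the second-order filter coefficients, and the pitch period are hypothetical values chosen only to keep the example short and numerically stable. The sketch builds an excitation from a pulse train plus optional noise, passes it through an all-pole (auto-regressive) filter, and then inverse-filters the result to recover the excitation as a residual.

```python
import random


def excitation(n, pitch_period=80, pulse_gain=1.0, noise_gain=0.0, seed=0):
    """Source signal: a pulse train (voiced part) plus white noise (unvoiced part)."""
    rng = random.Random(seed)
    return [
        pulse_gain * (1.0 if i % pitch_period == 0 else 0.0)
        + noise_gain * rng.gauss(0.0, 1.0)
        for i in range(n)
    ]


def ar_filter(source, coeffs):
    """All-pole filter: y[t] = x[t] + sum_k coeffs[k] * y[t - 1 - k]."""
    out = []
    for t, x in enumerate(source):
        y = x
        for k, a in enumerate(coeffs):
            if t - 1 - k >= 0:
                y += a * out[t - 1 - k]
        out.append(y)
    return out


def inverse_filter(signal, coeffs):
    """Prediction residual: e[t] = y[t] - sum_k coeffs[k] * y[t - 1 - k]."""
    residual = []
    for t, y in enumerate(signal):
        predicted = sum(
            a * signal[t - 1 - k] for k, a in enumerate(coeffs) if t - 1 - k >= 0
        )
        residual.append(y - predicted)
    return residual


# Hypothetical second-order "vocal tract"; its poles lie inside the unit
# circle (squared pole radius 0.8), so the filter is stable.
COEFFS = [1.3, -0.8]

source = excitation(400)               # purely voiced: pulse train only
speech = ar_filter(source, COEFFS)     # synthetic "speech"
residual = inverse_filter(speech, COEFFS)

# With the matching filter, the residual reproduces the excitation.
recovery_error = max(abs(r - s) for r, s in zip(residual, source))
```

When the inverse filter uses coefficients that do not match the filter that produced the signal, the residual no longer reproduces the clean excitation. That mismatch is one simple way to picture the kind of model error the abstract refers to, although the patent does not state that it measures error this way.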
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: segmenting, by the computer, an audio signal from an incoming phone call into a plurality of audio frames; extracting, by the computer, a pitch parameter, a set of formant parameters, and a residual parameter for an audio frame of the plurality of audio frames based on a source-filter model; generating, by the computer, a pitch parameter statistic based upon the pitch parameter of the audio frame and respective pitch parameters of other audio frames of the plurality of audio frames; generating, by the computer, formant parameters statistics based upon the set of formant parameters for the audio frame and respective sets of formant parameters of other audio frames of the plurality of audio frames; generating, by the computer, a residual parameter statistic based upon the residual parameter of the audio frame and respective residual parameters of the other audio frames of the plurality of audio frames; calculating, by the computer executing a machine-learning model on one or more parameter statistics, a voice modification score for the audio signal based upon comparing the pitch parameter statistic with a normal human speech pitch parameter statistic, comparing the formant parameters statistics with corresponding normal human speech formant parameter statistics, and comparing the residual parameter statistic with a normal human speech residual parameter statistic, the voice modification score indicating probability of the audio signal containing a modified human speech; and determining, by the computer, whether the incoming phone call is fraudulent based upon the voice modification score.
2. The computer-implemented method of claim 1, wherein the pitch parameter statistic includes at least one of an average pitch value and pitch consistency.
3. The computer-implemented method of claim 1, wherein the formant parameters statistics include at least one of average formant values and inter-formant consistency.
4. The computer-implemented method of claim 1, wherein the residual parameter includes residual kurtosis and the residual parameter statistic includes residual kurtosis consistency.
5. The computer-implemented method of claim 1, wherein the residual parameter indicates at least one of glottal closure instances, glottal opening instances, and a model of glottal activity.
6. The computer-implemented method of claim 1, further comprising: extracting, by the computer, a linear predictive coding (LPC) model order parameter from the audio frame; generating, by the computer, an LPC model order statistic for the audio signal based upon the LPC model order parameter of the audio frame and respective model order parameters of the other audio frames; and calculating, by the computer, the voice modification score for the audio signal based upon comparing the LPC model order statistic with a normal human speech LPC model order statistic.
7. The computer-implemented method of claim 6, wherein the LPC model order statistic includes model order consistency.
8. The computer-implemented method of claim 1, wherein the machine learning model is a single-class model trained on normal speech recordings.
9. A system comprising: a non-transitory storage medium storing a plurality of computer program instructions; a processor electrically coupled to the non-transitory storage medium and configured to execute the plurality of computer program instructions to: segment an audio signal from an incoming phone call into a plurality of audio frames; extract a pitch parameter, a set of formant parameters, and a residual parameter for an audio frame of the plurality of audio frames based on a source-filter model; generate a pitch parameter statistic based upon the pitch parameter of the audio frame and respective pitch parameters of other audio frames of the plurality of audio frames; generate formant parameters statistics based upon the set of formant parameters for the audio frame and respective sets of formant parameters of other audio frames of the plurality of audio frames; generate a residual parameter statistic based upon the residual parameter of the audio frame and respective residual parameters of the other audio frames of the plurality of the audio frames; calculate, by executing a machine-learning model on one or more parameter statistics, a voice modification score for the audio signal based upon comparing the pitch parameter statistic with a normal human speech pitch parameter statistic, comparing the formant parameters statistics with corresponding normal human speech formant parameter statistics, and comparing the residual parameter statistic with a normal human speech residual parameter statistic, the voice modification score indicating probability of the audio signal containing a modified human speech; and determine whether the incoming phone call is fraudulent based upon the voice modification score.
10. The system of claim 9, wherein the pitch parameter statistic includes at least one of an average pitch value and pitch consistency.
11. The system of claim 9, wherein the formant parameters statistics include at least one of average formant values and inter-formant consistency.
12. The system of claim 9, wherein the residual parameter includes residual kurtosis and the residual parameter statistic includes residual kurtosis consistency.
13. The system of claim 9, wherein the residual parameter indicates at least one of glottal closure instances, glottal opening instances, and a model of glottal activity.
14. The system of claim 9, wherein the processor is configured to further execute the computer program instructions to: extract a linear predictive coding (LPC) model order parameter from the audio frame of the plurality of audio frames; generate an LPC model order statistic for the audio signal based upon the LPC model order parameter of the audio frame and respective model order parameters of the other audio frames; and calculate the voice modification score for the audio signal based upon comparing the LPC model order statistic with a normal human speech LPC model order statistic.
15. The system of claim 14, wherein the LPC model order statistic includes model order consistency.
16. The system of claim 9, wherein the machine learning model is a single-class model trained on normal speech recordings.
17. A computer-implemented method comprising: extracting, by a computer, frame level parameters from an audio signal of an incoming phone call based upon a physical model of human speech; generating, by the computer, parameter statistics for the audio signal from the frame level parameters; executing, by the computer, a single-class machine learning model trained on normal human speech recordings on the parameter statistics to generate a voice modification score based upon comparisons of the parameter statistics; and determining, by the computer, whether the incoming phone call is fraudulent based upon the voice modification score.
18. The computer-implemented method of claim 17, wherein the physical model is a source-filter model.
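The sequence recited in claims 1 and 17 (segment into frames, extract frame-level parameters, aggregate them into signal-level statistics, score against normal speech, decide) can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the claimed implementation: it covers only the pitch parameter (formant and residual statistics are omitted), estimates pitch with a plain autocorrelation peak, treats "pitch consistency" as the standard deviation across frames, and stands in for the single-class model with per-statistic z-scores learned from normal data only. The score is a distance from normal, not a probability. All names, frame sizes, and the threshold are hypothetical.

```python
import math
import statistics


def segment(signal, frame_len, hop):
    """Split a signal into overlapping fixed-length frames."""
    return [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]


def frame_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate pitch in Hz as the lag with the largest autocorrelation."""
    lo = int(sample_rate / fmax)
    hi = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, hi + 1):
        val = sum(frame[t] * frame[t - lag] for t in range(lag, len(frame)))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag


def pitch_statistics(signal, sample_rate, frame_len=400, hop=200):
    """Signal-level statistics: [average pitch, pitch spread across frames]."""
    pitches = [
        frame_pitch(f, sample_rate) for f in segment(signal, frame_len, hop)
    ]
    return [statistics.mean(pitches), statistics.pstdev(pitches)]


class NormalRangeModel:
    """Toy single-class model: fitted on statistics of normal speech only."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.mean = [statistics.mean(c) for c in columns]
        # Guard against zero spread so the division below stays defined.
        self.std = [statistics.pstdev(c) or 1e-9 for c in columns]
        return self

    def score(self, row):
        """Largest absolute z-score; higher means further from normal."""
        return max(
            abs(x - m) / s for x, m, s in zip(row, self.mean, self.std)
        )


FRAUD_THRESHOLD = 3.0  # hypothetical decision threshold

# Demo on a synthetic 100 Hz tone (a stand-in for audio, not real speech).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 100 * t / SAMPLE_RATE) for t in range(1600)]
tone_stats = pitch_statistics(tone, SAMPLE_RATE)

# Made-up "normal" rows: [average pitch in Hz, pitch spread in Hz].
model = NormalRangeModel().fit([[100.0, 2.0], [110.0, 3.0], [120.0, 4.0]])
shifted_score = model.score([160.0, 3.0])  # e.g. a pitch-shifted voice
is_suspicious = shifted_score > FRAUD_THRESHOLD
```

A fuller version would compute formant and residual statistics per frame in the same way and append them to the statistics row before scoring, so that the model compares every statistic named in the claims against its normal range.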
for comparison or discrimination · CPC title
Fraud preventions · CPC title
Pitch determination of speech signals · CPC title
Training · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title