System and Method for Performing Caller Identity Verification Using Multi-Step Voice Analysis
US-2018130473-A1 · May 10, 2018 · US
US10529328B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10529328-B2 |
| Application number | US-201615739085-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 22, 2016 |
| Priority date | Jun 22, 2015 |
| Publication date | Jan 7, 2020 |
| Grant date | Jan 7, 2020 |
Official abstract text for this publication.
This document describes a data processing system for processing a speech signal for voice-based profiling. The data processing system segments the speech signal into a plurality of segments, with each segment representing a portion of the speech signal. For each segment, the data processing system generates a feature vector comprising data indicative of one or more features of the portion of the speech signal represented by that segment and determines whether the feature vector comprises data indicative of one or more features with a threshold amount of confidence. For each of a subset of the generated feature vectors, the system processes data in that feature vector to generate a prediction of a value of a profile parameter and transmits an output responsive to machine executable code that generates a visual representation of the prediction of the value of the profile parameter.
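The flow in the abstract (segment the signal, build a feature vector per segment, keep only vectors whose features meet a confidence threshold, then predict a profile parameter from each kept vector) can be sketched as below. This is a minimal illustration, not the patented implementation. The features (`energy`, `zcr`), the energy-based confidence heuristic, and every function name are assumptions made for the example, and requiring all features in a vector to exceed their thresholds is only one possible reading of the selection step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class FeatureVector:
    """Features for one segment, each paired with a confidence in [0, 1]."""
    values: Dict[str, float]
    confidence: Dict[str, float]


def segment_signal(samples: Sequence[float], size: int) -> List[Sequence[float]]:
    """Split the signal into consecutive, non-overlapping segments."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def extract_features(segment: Sequence[float]) -> FeatureVector:
    """Toy extractor (hypothetical): mean energy and zero-crossing rate."""
    n = len(segment)
    energy = sum(x * x for x in segment) / n
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a < 0) != (b < 0))
    zcr = crossings / max(n - 1, 1)
    # Placeholder confidence (an assumption): louder segments are trusted more.
    conf = min(1.0, energy)
    return FeatureVector({"energy": energy, "zcr": zcr},
                         {"energy": conf, "zcr": conf})


def passes_thresholds(fv: FeatureVector, thresholds: Dict[str, float]) -> bool:
    """Keep a vector only if each listed feature exceeds its own threshold."""
    return all(fv.confidence[name] > t for name, t in thresholds.items())


def profile(samples: Sequence[float], size: int,
            thresholds: Dict[str, float],
            predictor: Callable[[FeatureVector], float]) -> List[float]:
    """Run segmentation, feature extraction, gating, and prediction in order."""
    vectors = [extract_features(s) for s in segment_signal(samples, size)]
    kept = [fv for fv in vectors if passes_thresholds(fv, thresholds)]
    return [predictor(fv) for fv in kept]
```

Calling `profile(samples, 4, {"energy": 0.5, "zcr": 0.5}, predictor)` on a signal whose second half is very quiet would run the predictor only on the first segment, since the quiet segment fails the confidence gate under this heuristic.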
Opening claim text (preview).
What is claimed is:
1. A data processing system for processing a speech signal, the data processing system comprising: a detection system front end server that routes data representing a speech signal to one or more processing devices that generate a response to the speech signal; and a segmentation server that includes the one or more processing devices and that processes the speech signal to segment the data into a plurality of segments, with each segment representing a portion of the speech signal, and with the segmentation server further performing operations comprising: for each segment, generating a feature vector comprising data indicative of one or more features of the portion of the speech signal represented by that segment; determining confidence values for the one or more features of the feature vector; and comparing the confidence values for the one or more features to respective threshold values; for each of a subset of the generated feature vectors, processing data in that feature vector to generate a prediction of a value of a profile parameter, with the subset comprising one or more feature vectors determined to have respective features that have confidence values exceeding the respective threshold values; wherein the segmentation server generates and transmits an output responsive to machine executable code that generates a visual representation of the prediction of the value of the profile parameter, and wherein the detection system front end server uses the output responsive to the machine executable code to remotely update a display of a client device that submitted a request to present the visual representation of the prediction of the value of the profile parameter; and wherein the segmentation server executes a first selected prediction algorithm on detection data processed in accordance with a second selected prediction algorithm, wherein the detection data represents two or more features having a predetermined correlation or a predetermined dependency between the two or more features, the two or more features each having respective confidence values that exceed the respective threshold values.
2. The data processing system of claim 1, wherein the output comprises the value of the profile parameter and forensic profile data through one or more application interfaces.
3. The data processing system of claim 1, wherein processing data in the feature vector to generate a prediction comprises selecting a predictor algorithm, based on the data indicative of the one or more features with the confidence value that exceeds the respective threshold, for processing the data in each of the subset of the generated feature vectors.
4. The data processing system of claim 1, wherein the profile parameter comprises one or more of a bio-relevant parameter, a socio-personal parameter, and an environmental parameter.
5. The data processing system of claim 4, wherein the bio-relevant parameter comprises one of a physical parameter, a physiological parameter, a medical parameter, or a psychological parameter.
6. The data processing system of claim 4, wherein the socio-personal parameter comprises one of a behavioral parameter, a demographic parameter, or a sociological parameter.
7. The data processing system of claim 1, wherein one or more features comprise one or more micro-properties of the speech signal, the micro-properties comprising one or more of formants, pitch, harmonicity, jitter, shimmer, formant bandwidths, harmonic bandwidths, voicing onset and offset times, glottal pulse shape, pitch onset pattern, aphonicity, biphonicity, flutter, wobble, breathiness, and resonance.
8. The data processing system of claim 1, wherein the one or more features comprise a spectral feature characterizing time-frequency characteristics of the signal, the time-frequency characteristics comprising one or more of short-time Fourier transforms, segmental cepstral features and power-normalized cepstra.
9. The data processing system of claim 1, wherein the one or more features comprise a trend feature, the trend feature comprising a modulation feature, long-term formant statistics, and a formant trajectory feature.
10. The data processing system of claim 1, wherein the one or more features comprise one or more of phonetic and linguistic features, the phonetic and linguistic features comprising phoneme durations and timing patterns.
11. The data processing system of claim 1, the segmentation server further performing operations comprising: generating, based on data of a feature vector of the subset, a category for the data segment associated with that feature vector; and assigning the category to a forensic profile.
12. The data processing system of claim 1, the segmentation server further performing operations comprising: comparing the speech signal to an additional speech signal by comparing one or more feature vectors of the subset of the generated feature vectors to one or more feature vectors of an additional subset of generated feature vectors of the additional speech signal, the additional subset comprising one or more additional feature vectors determined to have one or more features having respective confidence values that exceed the respective threshold values.
13. The data processing system of claim 1, wherein generating the prediction of the value comprises executing a machine learning algorithm to determine a strength of an association between the feature vector and the profile parameter.
14. The data processing system of claim 1, the segmentation server further performing operations comprising determining which of the one or more features having respective confidence values that exceed the respective threshold values in the feature vector represents a masking-invariant pattern in a segment.
15. The data processing system of claim 1, the segmentation server further performing operations comprising recovering data in a segment by modifying the segment.
16. The data processing system of claim 1, wherein the value of the profile parameter is determined in real-time or near real-time based on execution of a predictive algorithm.
17. The data processing system of claim 1, the segmentation server further performing operations comprising identifying a source based on the value of the profile parameter.
18. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising: processing a speech signal to segment the speech signal into a plurality of segments, with each segment of the plurality of segments representing a portion of the speech signal; for each segment, generating a feature vector comprising data indicative of one or more features of the portion of the speech signal represented by that segment; determining confidence values for the one or more features of the feature vector; and comparing the confidence values for the one or more features to respective threshold values; for each of a subset of the generated feature vectors, processing data in that feature vector to generate a prediction of a value of a profile parameter, with the subset comprising one or more feature vectors determined to have respective features that have confidence values exceeding the respective threshold values; and generating an output responsive to machine executable code as a program-returned response including the prediction of the value of the profile parameter; and executing a first selected prediction
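Claim 7 names jitter and shimmer among the micro-properties of the speech signal but the claim text shown here does not say how they are measured. As a rough illustration only, the sketch below uses the commonly cited "local" definitions: the mean absolute difference between consecutive glottal cycles divided by the mean, applied to cycle periods for jitter and to cycle peak amplitudes for shimmer. The function names and the assumption that per-cycle periods and amplitudes have already been extracted are hypothetical and are not taken from the patent.

```python
from statistics import mean
from typing import Sequence


def local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute cycle-to-cycle difference, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods: Sequence[float]) -> float:
    """Jitter (assumed local definition): perturbation of cycle periods."""
    return local_perturbation(periods)


def local_shimmer(peak_amplitudes: Sequence[float]) -> float:
    """Shimmer (assumed local definition): perturbation of cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)
```

A perfectly periodic voice with constant amplitude gives zero for both measures; larger values indicate more cycle-to-cycle irregularity.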
for estimating an emotional state · CPC title
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title
Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices · CPC title
for comparison or discrimination · CPC title
for extracting parameters related to health condition (detecting or measuring for diagnostic purposes A61B5/00) · CPC title