What technology area does this patent fall under?

Primary CPC classification G10L25/51. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 27, 2022 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Processing speech signals in voice-based profiling

US11538472B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11538472-B2
Application number	US-201916700712-A
Country	US
Kind code	B2
Filing date	Dec 2, 2019
Priority date	Jun 22, 2015
Publication date	Dec 27, 2022
Grant date	Dec 27, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This document describes a data processing system for processing a speech signal for voice-based profiling. The data processing system segments the speech signal into a plurality of segments, with each segment representing a portion of the speech signal. For each segment, the data processing system generates a feature vector comprising data indicative of one or more features of the portion of the speech signal represented by that segment and determines whether the feature vector comprises data indicative of one or more features with a threshold amount of confidence. For each of a subset of the generated feature vectors, the system processes data in that feature vector to generate a prediction of a value of a profile parameter and transmits an output responsive to machine executable code that generates a visual representation of the prediction of the value of the profile parameter.
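The flow in the abstract (segment the signal, build a feature vector per segment, keep only vectors that meet a confidence threshold, then predict a profile parameter) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the two features (RMS energy and zero-crossing rate), the confidence rule, the threshold of 0.6, and the placeholder predictor are all hypothetical.

```python
import math
from typing import List, Tuple

Features = Tuple[float, float]  # (rms_energy, zero_crossing_rate); hypothetical choice


def segment(signal: List[float], size: int) -> List[List[float]]:
    """Split the signal into consecutive, non-overlapping segments of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def feature_vector(seg: List[float]) -> Features:
    """Compute illustrative features for one segment."""
    rms = math.sqrt(sum(x * x for x in seg) / len(seg))
    crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a < 0) != (b < 0))
    return rms, crossings / (len(seg) - 1)


def confidence(features: Features) -> float:
    """Placeholder confidence: louder segments are trusted more, capped at 1.0."""
    rms, _ = features
    return min(1.0, rms / 0.5)


def predict(features: Features) -> float:
    """Placeholder predictor mapping zero-crossing rate to a made-up parameter value."""
    _, zcr = features
    return 100.0 * zcr


def profile(signal: List[float], size: int = 8, threshold: float = 0.6) -> List[float]:
    """Run the predictor only on segments whose features meet the confidence threshold."""
    predictions = []
    for seg in segment(signal, size):
        fv = feature_vector(seg)
        if confidence(fv) >= threshold:
            predictions.append(predict(fv))
    return predictions


# One loud alternating segment followed by one near-silent segment.
signal = [1.0, -1.0] * 4 + [0.01] * 8
print(profile(signal))  # only the loud segment passes the confidence gate
```

The gate is the point of the sketch: the near-silent segment still yields a feature vector, but its low confidence keeps it out of the subset passed to the predictor.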

First claim

Opening claim text (preview).

What is claimed is:
1. A data processing system for processing a speech signal, the data processing system comprising: an interface configured to receive a speech signal; and at least one processor configured to execute a predictor algorithm of a predictor module, the predictor module comprising logic for processing the speech signal received from the interface, wherein at least one processor is configured to perform operations comprising: measuring at least one signal characteristic of the speech signal to generate feature data, the at least one signal characteristic comprising a signal frequency, a signal spectrum, or a combination of the signal frequency and the signal spectrum; selecting a predictor module for analyzing the feature data based on a confidence value associated with the feature data, the predictor module comprising one or more predictor algorithms being trained to process the feature data differently than predictor algorithms of one or more other available predictor modules, the predictor module being configured, based on data derived from statistical ensembles, for processing features represented in the feature data associated with the confidence value; executing a predictor algorithm of the predictor module, the predictor algorithm receiving the feature data as input data, the predictor algorithm configured to generate a prediction value for a profile parameter that describes a speaker represented in the speech signal; and based on the prediction value for the profile parameter, generating a forensic profile of the speaker that includes the profile parameter, the forensic profile configured for providing a representation of the speaker based on profile parameters included in the forensic profile.
2. The data processing system of claim 1, wherein the predictor module is one of a plurality of predictor modules, and wherein each of the predictor modules comprises at least one unique predictor algorithm configured to assign a prediction value to one or more profile parameters uniquely associated with that predictor module.
3. The data processing system of claim 2, wherein each unique predictor algorithm is trained using training data that is different to other training data for training other unique predictor algorithms, and wherein the training data for the unique predictor algorithm corresponds to the one or more profile parameters uniquely associated with the predictor module that comprises that predictor algorithm.
4. The data processing system of claim 2, wherein at least two of the plurality of predictor modules are configured to exchange data with one another to generate correlated outputs.
5. The data processing system of claim 2, wherein at least two profile parameters are generated in parallel.
6. The data processing system of claim 1, wherein the predictor module comprises at least one machine learning sub-module, the at least one machine learning sub-module including a probability-based algorithm, a regression-based algorithm, a knowledge-based algorithm, or any combination thereof.
7. The data processing system of claim 1, wherein the feature data comprises at least one of Mel-frequency, cepstral coefficients, power-normalized cepstral coefficients, modulation features, glottal features, or a combination thereof.
8. The data processing system of claim 1, wherein the at least one signal characteristic comprises one of a phoneme enunciation signal signature, a speech cadence signal signature, a fundamental frequency, a voice onset time, long-term average spectra, a format frequency, a format trajectory, long-term format distributions (LTF), format frequency dispersion, a vowel format frequency, a high-range spectral energy, an output-cost ratio, spectra of nasal phonemes, prosody, vocal range, signal to noise ratio (SNR), temporal resolution, and a resonance level.
9. The data processing system of claim 1, wherein the profile parameter of the speaker comprises a physical parameter representing a height of the speaker, a weight of the speaker, a body-shape of the speaker, or a facial structure of the speaker.
10. The data processing system of claim 1, wherein the profile parameter of the speaker includes a physiological parameter of the speaker representing a presence or absence of medications being taken by the speaker.
11. The data processing system of claim 1, wherein the profile parameter of the speaker includes a medical parameter of the speaker representing a presence or absence of a disease in the speaker, a state of physical health of the speaker, a state of metal health of the speaker, presence of an intoxicating substance in the speaker, or the presence of a disability of the speaker.
12. The data processing system of claim 1, wherein the profile parameter of the speaker includes a socio-personal parameter representing a behavioral aggression of the speaker, a level of education of the speaker, a race of the speaker, a geographical origin of the speaker, or an income of the speaker.
13. The data processing system of claim 1, wherein the profile parameter of the speaker includes an environmental parameter of the speaker including a location of the speaker when the speech signal is recorded or a presence of an object in the environment of the speaker.
14. The data processing system of claim 1, wherein executing the predictor algorithm comprises: selecting, based on the feature data, a first prediction algorithm; executing, the first prediction algorithm on additional feature data processed in accordance with a second prediction algorithm, wherein the additional feature data represents two or more features having a predetermined correlation or a predetermined dependency between the two or more features; and generating, based on executing the first prediction algorithm, the prediction value for the profile parameter that describes the speaker represented in the speech signal.
15. The data processing system of claim 1, wherein the prediction value generated by the predictor algorithm is included in an input to second predictor algorithm, and wherein the predictor algorithm is pipelined with the second predictor algorithm.
16. The data processing system of claim 1, wherein the operations further comprise: segmenting the speech signal into a plurality of segments; for each segment, generating a feature vector comprising data indicative of one or more features of the portion of the speech signal represented by that segment; determining confidence values for the one or more features of the feature vector; and comparing the confidence values for the one or more features to respective threshold values; and executing the predictor algorithm for segments comprising one or more features having confidence values satisfying the respective threshold values.
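
Two ideas from the claims, choosing a predictor module from the confidence value attached to the feature data (claim 1) and feeding one predictor's output into a second, pipelined predictor (claim 15), can be sketched as follows. This is a hedged illustration only: the `PredictorModule` class, the selection rule (highest confidence floor that is still met), the feature name `fundamental_frequency_hz`, the output keys, and all formulas are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

FeatureData = Dict[str, float]


@dataclass
class PredictorModule:
    """Hypothetical module: a name, a confidence floor, and one predictor algorithm."""
    name: str
    min_confidence: float
    algorithm: Callable[[FeatureData], float]


def select_module(modules: List[PredictorModule], confidence: float) -> PredictorModule:
    """Assumed rule: pick the eligible module with the highest confidence floor."""
    eligible = [m for m in modules if confidence >= m.min_confidence]
    if not eligible:
        raise ValueError("no predictor module accepts this confidence value")
    return max(eligible, key=lambda m: m.min_confidence)


def build_profile(
    features: FeatureData,
    confidence: float,
    modules: List[PredictorModule],
    second_stage: Callable[[FeatureData], float],
) -> Dict[str, object]:
    """Run the selected predictor, then pipeline its output into a second predictor."""
    module = select_module(modules, confidence)
    first = module.algorithm(features)
    # The first prediction value becomes part of the second predictor's input.
    second = second_stage({**features, "first_prediction": first})
    return {"module": module.name, "first_parameter": first, "second_parameter": second}


# Placeholder algorithms with made-up formulas, for demonstration only.
modules = [
    PredictorModule("coarse", 0.0, lambda f: 170.0),
    PredictorModule("fine", 0.8, lambda f: 190.0 - f["fundamental_frequency_hz"] / 10),
]


def second_stage(f: FeatureData) -> float:
    return f["first_prediction"] / 2


features = {"fundamental_frequency_hz": 120.0}
print(build_profile(features, 0.9, modules, second_stage))  # uses the "fine" module
print(build_profile(features, 0.3, modules, second_stage))  # falls back to "coarse"
```

Keeping the confidence floor on the module, rather than inside each algorithm, means the selection step can be changed without touching the predictors themselves.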

Assignees

Univ Carnegie Mellon

Inventors

Singh Rita

Classifications

G10L15/02
Feature extraction for speech recognition; Selection of recognition unit · CPC title
G10L25/66
for extracting parameters related to health condition (detecting or measuring for diagnostic purposes A61B5/00) · CPC title
G10L15/30
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title
G10L15/04
Segmentation; Word boundary detection · CPC title
G10L25/51 (Primary)
for comparison or discrimination · CPC title

Patent family

Related publications grouped by family.

View patent family 57586621

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11538472B2 cover?: This document describes a data processing system for processing a speech signal for voice-based profiling. The data processing system segments the speech signal into a plurality of segments, with each segment representing a portion of the speech signal. For each segment, the data processing system generates a feature vector comprising data indicative of one or more features of the portion of th…
Who is the assignee on this patent?: Univ Carnegie Mellon