Systems and methods for estimating age of a speaker based on speech

US10269356B2 · US · B2

Patent metadata
Publication number: US-10269356-B2
Application number: US-201615243635-A
Country: US
Kind code: B2
Filing date: Aug 22, 2016
Priority date: Aug 22, 2016
Publication date: Apr 23, 2019
Grant date: Apr 23, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

There is provided a system comprising a microphone, configured to receive an input speech from an individual, an analog-to-digital (A/D) converter to convert the input speech to digital form and generate a digitized speech, a memory storing an executable code and an age estimation database, a hardware processor executing the executable code to receive the digitized speech, identify a plurality of boundaries in the digitized speech delineating a plurality of phonemes in the digitized speech, extract a plurality of formant-based feature vectors from each phoneme in the digitized speech based on at least one of a formant position, a formant bandwidth, and a formant dispersion, compare the plurality of formant-based feature vectors with age determinant formant-based feature vectors of the age estimation database, determine the age of the individual when the comparison finds a match in the age estimation database, and communicate an age-appropriate response to the individual.
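The comparison step in the abstract can be illustrated with a minimal sketch. It assumes a nearest-neighbour lookup with a distance threshold standing in for "finds a match"; the abstract does not say how the comparison is performed. The names `AGE_DB`, `euclidean`, `estimate_age`, and `max_distance`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical age estimation database: age in years -> reference
# formant-based feature vector. The numbers are invented for this sketch.
AGE_DB = {
    6: [1050.0, 3200.0, 4300.0],
    12: [850.0, 2600.0, 3600.0],
    30: [700.0, 2200.0, 3000.0],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_age(features, db=AGE_DB, max_distance=300.0):
    """Return the age of the closest reference vector.

    Returns None when no reference lies within max_distance, which this
    sketch treats as "no match found" (an assumption, not the patent's rule).
    """
    best_age, best_dist = None, float("inf")
    for age, reference in db.items():
        dist = euclidean(features, reference)
        if dist < best_dist:
            best_age, best_dist = age, dist
    return best_age if best_dist <= max_distance else None
```

Calling `estimate_age([860.0, 2590.0, 3610.0])` selects the 12-year reference entry, while a vector far from every entry returns `None`, so no age is reported without a match.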

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a microphone configured to receive an input speech from an individual; an analog-to-digital (A/D) converter configured to convert the input speech from an analog form to a digital form and generate a digitized speech; a memory storing an executable code and an age estimation database including a plurality of age determinant formant-based feature vectors; a hardware processor executing the executable code to: receive the digitized speech from the A/D converter; identify a plurality of boundaries between a plurality of phonemes in the digitized speech; extract a plurality of formant-based feature vectors from one or more phonemes of the plurality of phonemes delineated by the plurality of boundaries, based on a formant position, a formant bandwidth, and a formant dispersion, wherein the formant dispersion is a geometric mean of the formant spacing; compare the plurality of formant-based feature vectors with the age determinant formant-based feature vectors of the age estimation database; estimate the age of the individual when the comparison finds a match in the age estimation database; and communicate an age-appropriate response to the individual based on the estimated age of the individual.

2. The system of claim 1, wherein estimating the age of the individual includes a weighted combination of two or more age determinant formant-based feature vectors of the plurality of age determinant formant-based feature vectors.

3. The system of claim 1, wherein the input speech is one of a predetermined sequence of phonemes and natural speech.

4. The system of claim 1, wherein the age determinant formant-based feature vectors of the age estimation database include a plurality of formant-based feature vectors corresponding to a plurality of most predictive phonemes, wherein each of the plurality of most predictive phonemes corresponds to a different age.

5. The system of claim 1, wherein the digitized speech includes at least one of a silence and a filled pause.

6. The system of claim 1, wherein the input speech includes a plurality of formants where each formant of the plurality of formants is a resonance of a vocal tract of the individual.

7. The system of claim 1, wherein the input speech is one of English and a language that is not English.

8. The system of claim 1, wherein the age of the individual is estimated probabilistically.

9. A method for use with a system having a microphone, an analog-to-digital (A/D) converter, a memory storing an executable code, and a hardware processor, the method comprising: receiving, using the hardware processor, a digitized speech from the A/D converter; identifying, using the hardware processor, a plurality of boundaries between a plurality of phonemes in the digitized speech; extracting, using the hardware processor, a plurality of formant-based feature vectors from one or more phonemes of the plurality of phonemes delineated by the plurality of boundaries, based on a formant position, a formant bandwidth, and a formant dispersion, wherein the formant dispersion is a geometric mean of the formant spacing; comparing, using the hardware processor, the plurality of formant-based feature vectors with the age determinant formant-based feature vectors of the age estimation database; estimating, using the hardware processor, the age of the individual when the comparison finds a match in the age estimation database; and communicating, using the hardware processor, an age-appropriate response to the individual based on the estimated age of the individual.

10. The method of claim 9, wherein estimating the age of the individual includes a weighted combination of two or more age determinant formant-based feature vectors of the plurality of age determinant formant-based feature vectors.

11. The method of claim 9, wherein the input speech is one of a predetermined sequence of phonemes and natural speech.

12. The method of claim 9, wherein the age determinant formant-based feature vectors of the age estimation database include a plurality of formant-based feature vectors corresponding to a plurality of most predictive phonemes, wherein each of the plurality of most predictive phonemes corresponds to a different age.

13. The method of claim 9, wherein the digitized speech includes at least one of a silence and a filled pause.

14. The method of claim 9, wherein the input speech includes a plurality of formants where each formant of the plurality of formants is a resonance of a vocal tract of the individual.

15. The method of claim 9, wherein the input speech is one of English and a language that is not English.

16. The method of claim 9, wherein the age of the individual is estimated probabilistically.

17. The system of claim 1, wherein prior to the extracting of the plurality of formant-based feature vectors, the hardware processor executes the executable code to identify a segment of one or more phonemes of the plurality of phonemes delineated by the plurality of boundaries, and wherein the extracting extracts the plurality of formant-based feature vectors from the identified segment of the one or more phonemes of the plurality of phonemes delineated by the plurality of boundaries.

18. The method of claim 9, wherein prior to the extracting of the plurality of formant-based feature vectors, the method further comprises: identifying a segment of one or more phonemes of the plurality of phonemes delineated by the plurality of boundaries; wherein the extracting extracts the plurality of formant-based feature vectors from the identified segment of the one or more phonemes of the plurality of phonemes delineated by the plurality of boundaries.
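Two ideas in the claims lend themselves to a short illustration: formant dispersion as a geometric mean of the formant spacing (claims 1 and 9), and a weighted, probabilistic age estimate (claims 2, 8, 10, and 16). The sketch below is an illustration under stated assumptions, not the patented implementation. It takes "formant spacing" to mean the difference between consecutive formant frequencies, and it weights database entries with a softmax over negative distances, which the claims do not specify. The function names and the `scale` parameter are hypothetical.

```python
import math


def formant_dispersion(formants_hz):
    """Geometric mean of the spacings between consecutive formants.

    Assumes formants_hz is sorted in strictly increasing order
    (F1 < F2 < ...), so every spacing is positive.
    """
    if len(formants_hz) < 2:
        raise ValueError("need at least two formants")
    spacings = [hi - lo for lo, hi in zip(formants_hz, formants_hz[1:])]
    if any(s <= 0 for s in spacings):
        raise ValueError("formants must be strictly increasing")
    # Geometric mean computed in log space for numerical stability.
    return math.exp(sum(math.log(s) for s in spacings) / len(spacings))


def weighted_age_estimate(features, references, scale=100.0):
    """Combine several reference entries into one probabilistic estimate.

    references is a list of (age, feature_vector) pairs. Each entry gets a
    weight that decays with its distance from `features` (a softmax over
    negative distances; this weighting is an assumption of the sketch).
    Returns (expected_age, {age: probability}).
    """
    dists = [
        math.sqrt(sum((x - y) ** 2 for x, y in zip(features, vec)))
        for _, vec in references
    ]
    nearest = min(dists)
    # Subtracting the smallest distance keeps the largest weight at 1.0.
    weights = [math.exp(-(d - nearest) / scale) for d in dists]
    total = sum(weights)
    probs = {}
    for (age, _), w in zip(references, weights):
        probs[age] = probs.get(age, 0.0) + w / total
    expected_age = sum(age * p for age, p in probs.items())
    return expected_age, probs
```

For formants at 500, 600, and 1000 Hz the spacings are 100 and 400 Hz, so the dispersion is their geometric mean, 200 Hz. A feature vector equidistant from a 10-year and a 20-year reference entry gives each age a probability of 0.5 and an expected age of 15.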

Assignees

Disney Entpr Inc

Inventors

Classifications

  • Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title

  • using distance or distortion measures between unknown speech and reference templates · CPC title

  • G10L17/26 · Primary

    Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices · CPC title

  • Speech classification or search · CPC title

  • for comparison or discrimination · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10269356B2 cover?
There is provided a system comprising a microphone, configured to receive an input speech from an individual, an analog-to-digital (A/D) converter to convert the input speech to digital form and generate a digitized speech, a memory storing an executable code and an age estimation database, a hardware processor executing the executable code to receive the digitized speech, identify a plurality …
Who is the assignee on this patent?
Disney Entpr Inc
What technology area does this patent fall under?
Primary CPC classification G10L17/26. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 23, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).