Voice search device, voice search method, and non-transitory recording medium

US9431007B2 · US · B2

Patent metadata
Publication number: US-9431007-B2
Application number: US-201514597958-A
Country: US
Kind code: B2
Filing date: Jan 15, 2015
Priority date: Mar 5, 2014
Publication date: Aug 30, 2016
Grant date: Aug 30, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In a voice search device, a processor acquires a search word, converts the search word into a phoneme sequence, acquires, for each frame, an output probability of a feature quantity of a target voice signal being output from each phoneme included in the phoneme sequence, and executes relative calculation of the output probability acquired from each phoneme, based on an output probability acquired from another phoneme included in the phoneme sequence. In addition, the processor successively designates likelihood acquisition zones, acquires a likelihood indicating how likely a designated likelihood acquisition zone is a zone in which voice corresponding to the search word is spoken, and identifies from the target voice signal an estimated zone for which the voice corresponding to the search word is estimated to be spoken, based on the acquired likelihood.
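The two steps in the abstract (relative calculation of per-frame output probabilities, then scoring successive candidate zones) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent text: the names `relative_scores`, `zone_likelihood`, and `best_zone` are hypothetical, the input is a plain list of per-frame log probabilities with one column per phoneme of the search word, the base value for a frame is simply the best score among those phonemes, and each zone is split evenly among the phonemes instead of being aligned properly.

```python
def relative_scores(log_probs):
    """log_probs[t][p] is the log output probability of frame t under the
    p-th phoneme of the search word (assumed input layout).
    Each score is re-expressed relative to the best-scoring phoneme of the
    same frame, so the best phoneme gets 0 and all others are negative."""
    rel = []
    for frame in log_probs:
        base = max(frame)  # assumption: base phoneme = per-frame maximum
        rel.append([lp - base for lp in frame])
    return rel


def zone_likelihood(rel, start, length, n_phonemes):
    """Crude zone score: assign the frames of the zone to the phonemes in
    order, in equal shares, and sum the relative scores."""
    total = 0.0
    for i in range(length):
        p = min(i * n_phonemes // length, n_phonemes - 1)
        total += rel[start + i][p]
    return total


def best_zone(log_probs, length):
    """Slide a fixed-length zone over the signal one frame at a time and
    return (start_frame, likelihood) of the highest-scoring zone."""
    rel = relative_scores(log_probs)
    n_phonemes = len(log_probs[0])
    scores = [
        zone_likelihood(rel, s, length, n_phonemes)
        for s in range(len(rel) - length + 1)
    ]
    start = max(range(len(scores)), key=scores.__getitem__)
    return start, scores[start]


# Toy signal: 6 frames, search word with 2 phonemes.
toy = [[-5, -5], [-1, -4], [-1, -4], [-4, -1], [-4, -1], [-5, -5]]
print(best_zone(toy, 4))
```

In this toy input the zone starting at frame 1 covers two frames that favor the first phoneme followed by two that favor the second, so it scores highest. The even split is only a stand-in; the claims describe a dynamic-programming search for the frame-to-phoneme correspondence.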

First claim

Opening claim text (preview).

What is claimed is:

1. A voice search device comprising: a processor; and a memory storing instructions that, when executed by the processor, control the processor to: convert a search word into a phoneme sequence; acquire, for each of frames in a target voice signal, a plurality of relative values between (i) a base phoneme selected for the frame from among a plurality of base phonemes each of which is selected for a respective different one of the frames in the target voice signal, and (ii) phonemes included in the phoneme sequence, wherein each of the frames has a time length; designate a plurality of zones in the target voice signal, each of the zones having a time length; acquire, using the plurality of relative values, a plurality of likelihoods each indicating how likely a respective zone from among the plurality of zones is a zone in which voice corresponding to the search word is spoken; and specify a zone corresponding to the search word from among the plurality of zones, based on the plurality of likelihoods.

2. The voice search device according to claim 1, wherein the instructions, when executed by the processor, further control the processor to: acquire, for each frame, an output probability of a feature quantity of the target voice signal being output from each phoneme included in the phoneme sequence; and wherein, for each frame in the target voice signal, the relative values are calculated based on (i) value based on the output probability in each frame obtained from each phoneme included in the phoneme sequence and (ii) a value based on the output probability in each frame obtained from the base phoneme.

3. The voice search device according to claim 2, wherein the instructions, when executed by the processor, further control the processor to: acquire, for each frame, an output probability of a feature quantity of the target voice signal being output from a silent phoneme, and wherein each of the selected base phonemes is a phoneme with a maximum output probability in each frame from among the phonemes included in the phoneme sequence and the silent phoneme.

4. The voice search device according to claim 1, wherein the instructions, when executed by the processor, further control the processor to: search, based on the plurality of relative values, a correspondence between each frame in a respective one of the plurality of zones and each phoneme included in the phoneme sequence by dynamic programming; and wherein the plurality of likelihoods are acquired based on a result of the search.

5. The voice search device according to claim 4, wherein the instructions, when executed by the processor, further control the processor to: execute a normalizing calculation, based on a number of frames corresponded with each phoneme, to each of the plurality of likelihoods, thereby calculating a normalized likelihood that normalizes each of the plurality of likelihoods; and wherein the zone is specified based on the normalized likelihood.

6. The voice search device according to claim 5, wherein the normalized likelihood is calculated by taking the relative values, normalizing each relative value using the number of frames corresponded with each phoneme, and summing normalized values.

7. The voice search device according to claim 1, wherein the instructions, when executed by the processor, further control the processor to: select the plurality of base phonemes, the base phonemes being selected from among the phonemes included in the phoneme sequence.

8. A voice search method comprising: converting a search word into a phoneme sequence; acquiring, for each of frames in a target voice signal, a plurality of relative values between (i) a base phoneme selected for the frame from among a plurality of base phonemes each of which is selected for a respective different one of the frames in the target voice signal, and (ii) phonemes included in the phoneme sequence, wherein each of the frames has a time length; designating a plurality of zones in the target voice signal, each of the zones having a time length; acquiring, using the plurality of relative values, a plurality of likelihoods each indicating how likely a respective zone from among the plurality of zones is a zone in which voice corresponding to the search word is spoken; and specifying a zone corresponding to the search word from among the plurality of zones, based on the plurality of likelihoods.

9. The voice search method according to claim 8, further comprising: acquiring, for each frame, an output probability of a feature quantity of the target voice signal being output from each phoneme included in the phoneme sequence, wherein, for each frame in the target voice signal, the relative values are calculated based on (i) a value based on the output probability in each frame obtained from each phoneme included in the phoneme sequence and (ii) a value based on the output probability in each frame obtained from the base phoneme.

10. The voice search method according to claim 9, further comprising: acquiring, for each frame, an output probability of a feature quantity of the target voice signal being output from a silent phoneme, wherein each of the selected base phonemes is a phoneme with a maximum output probability in each frame from among the phonemes included in the phoneme sequence and the silent phoneme.

11. The voice search method according to claim 8, further comprising: searching, based on the plurality of relative values, a correspondence between each frame in a respective one of the plurality of zones and each phoneme included in the phoneme sequence by dynamic programming, wherein the plurality of likelihoods are acquired based on a result of the searching.

12. The voice search method according to claim 11, further comprising: executing a normalizing calculation, based on a number of frames corresponded with each phoneme, to each of the plurality of likelihoods, thereby calculating a normalized likelihood that normalizes each of the plurality of likelihoods; wherein the zone is specified based on the normalized likelihood.

13. The voice search method according to claim 12, wherein the normalized likelihood is calculated by taking the relative values, normalizing each relative value using the number of frames corresponded with each phoneme, and summing normalized values.

14. The voice search method according to claim 8, further comprising: selecting the plurality of base phonemes, the base phonemes being selected from among the phonemes included in the phoneme sequence.

15. A non-transitory recording medium having a program recorded thereon that is executable to control a computer to: convert a search word into a phoneme sequence; acquire, for each of frames in a target voice signal, a plurality of relative values between (i) a base phoneme selected for the frame from among a plurality of base phonemes each of which is selected for a respective different one of the frames in the target voice signal, and (ii) phonemes included in the phoneme sequence, wherein each of the frames has a time length; designate a plurality of zones in the target voice signal, each of the zones having a time length; acquire, using the plurality of relative values, a plurality of likelihoods each indicating how likely a respective zone from among the plurality of zones is a zone in which voice corresponding to the search word is spoken; and specify a zone corresponding to the search word from among the plurality of zones, based on the plurality of likelihoods.

16. The non-transitory recording medium
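
The dependent claims on dynamic programming and normalization (claims 4 to 6 and their method counterparts) can be illustrated with a short sketch. This is one possible reading under stated assumptions, not the patented implementation: the function names `align_zone` and `normalized_likelihood` are hypothetical, the alignment is assumed to be strictly left to right (each frame either stays on the current phoneme or advances to the next one), and the input `rel[t][p]` is assumed to hold the relative value of frame `t` for the `p`-th phoneme of the search word.

```python
def align_zone(rel):
    """Find, by dynamic programming, the frame-to-phoneme correspondence
    that maximizes the summed relative values inside one zone.
    Assumed constraints: the first frame maps to the first phoneme, the
    last frame to the last phoneme, and the phoneme index never decreases
    and never skips. Returns the phoneme index chosen for each frame."""
    n_frames, n_phonemes = len(rel), len(rel[0])
    if n_frames < n_phonemes:
        raise ValueError("zone is shorter than the phoneme sequence")
    neg_inf = float("-inf")
    score = [[neg_inf] * n_phonemes for _ in range(n_frames)]
    back = [[0] * n_phonemes for _ in range(n_frames)]
    score[0][0] = rel[0][0]
    for t in range(1, n_frames):
        for p in range(n_phonemes):
            stay = score[t - 1][p]
            move = score[t - 1][p - 1] if p > 0 else neg_inf
            if move > stay:
                best, back[t][p] = move, p - 1
            else:
                best, back[t][p] = stay, p
            if best > neg_inf:
                score[t][p] = best + rel[t][p]
    # Trace the best path back from the last phoneme at the last frame.
    path = [n_phonemes - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def normalized_likelihood(rel, path):
    """Average the relative values over the frames assigned to each
    phoneme, then sum the per-phoneme averages, so that a phoneme held
    for many frames does not dominate the zone score."""
    n_phonemes = len(rel[0])
    sums = [0.0] * n_phonemes
    counts = [0] * n_phonemes
    for t, p in enumerate(path):
        sums[p] += rel[t][p]
        counts[p] += 1
    return sum(s / c for s, c in zip(sums, counts))


# Toy zone: 4 frames, search word with 2 phonemes.
zone = [[0, -8], [-3, 0], [0, -8], [-7, 0]]
path = align_zone(zone)
print(path, normalized_likelihood(zone, path))
```

For the toy zone the best correspondence keeps the first three frames on the first phoneme and the last frame on the second. The unnormalized sum along that path is -3, while the normalized score is -1, because the three frames of the first phoneme are averaged before the per-phoneme values are summed.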

Assignees

Casio Computer Co Ltd

Inventors

Classifications

  • Phonemes, fenemes or fenones being the recognition units · CPC title

  • Query formulation · CPC title

  • Word spotting · CPC title

  • G10L25/87 (Primary)

    Detection of discrete points within a voice signal · CPC title

  • using natural language modelling · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9431007B2 cover?
In a voice search device, a processor acquires a search word, converts the search word into a phoneme sequence, acquires, for each frame, an output probability of a feature quantity of a target voice signal being output from each phoneme included in the phoneme sequence, and executes relative calculation of the output probability acquired from each phoneme, based on an output probability acquir…
Who is the assignee on this patent?
Casio Computer Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L25/87. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 30, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).