US9431007B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9431007-B2 |
| Application number | US-201514597958-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 15, 2015 |
| Priority date | Mar 5, 2014 |
| Publication date | Aug 30, 2016 |
| Grant date | Aug 30, 2016 |
Official abstract text for this publication.
In a voice search device, a processor acquires a search word, converts the search word into a phoneme sequence, acquires, for each frame, an output probability of a feature quantity of a target voice signal being output from each phoneme included in the phoneme sequence, and executes relative calculation of the output probability acquired from each phoneme, based on an output probability acquired from another phoneme included in the phoneme sequence. In addition, the processor successively designates likelihood acquisition zones, acquires a likelihood indicating how likely a designated likelihood acquisition zone is a zone in which voice corresponding to the search word is spoken, and identifies from the target voice signal an estimated zone for which the voice corresponding to the search word is estimated to be spoken, based on the acquired likelihood.
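The "relative calculation" in the abstract can be illustrated with a minimal sketch. It assumes that each frame already has a positive output probability for every phoneme of the search word plus a silent phoneme, that the base phoneme of a frame is the one with the highest probability in that frame (as in claims 3 and 10), and that the relative value is a difference of log probabilities. The log-difference form and the function name `relative_values` are illustrative assumptions, not details taken from the publication.

```python
import math


def relative_values(frame_probs):
    """Compute per-frame relative values against a per-frame base phoneme.

    frame_probs: list with one dict per frame, mapping phoneme -> output
    probability (assumed to be strictly positive).
    Returns a list of dicts mapping phoneme -> log(p) - log(p_base), where
    the base phoneme is the most probable phoneme in that frame.
    The log-difference is an assumed form of the "relative calculation".
    """
    result = []
    for probs in frame_probs:
        # Base phoneme: highest output probability in this frame.
        base = max(probs, key=probs.get)
        log_base = math.log(probs[base])
        # Every value is <= 0; the base phoneme itself scores exactly 0.
        result.append({ph: math.log(p) - log_base for ph, p in probs.items()})
    return result


if __name__ == "__main__":
    frames = [
        {"k": 0.5, "a": 0.25, "sil": 0.125},
        {"k": 0.1, "a": 0.6, "sil": 0.2},
    ]
    for frame_index, values in enumerate(relative_values(frames)):
        print(frame_index, values)
```

Because each frame is scored against its own best phoneme, a frame in which every phoneme fits poorly (for example, a noisy frame) does not dominate the score of a zone on its own. This is one plausible motivation for the relative calculation under the assumptions above.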
Opening claim text (preview).
What is claimed is:
1. A voice search device comprising: a processor; and a memory storing instructions that, when executed by the processor, control the processor to: convert a search word into a phoneme sequence; acquire, for each of frames in a target voice signal, a plurality of relative values between (i) a base phoneme selected for the frame from among a plurality of base phonemes each of which is selected for a respective different one of the frames in the target voice signal, and (ii) phonemes included in the phoneme sequence, wherein each of the frames has a time length; designate a plurality of zones in the target voice signal, each of the zones having a time length; acquire, using the plurality of relative values, a plurality of likelihoods each indicating how likely a respective zone from among the plurality of zones is a zone in which voice corresponding to the search word is spoken; and specify a zone corresponding to the search word from among the plurality of zones, based on the plurality of likelihoods.
2. The voice search device according to claim 1, wherein the instructions, when executed by the processor, further control the processor to: acquire, for each frame, an output probability of a feature quantity of the target voice signal being output from each phoneme included in the phoneme sequence; and wherein, for each frame in the target voice signal, the relative values are calculated based on (i) value based on the output probability in each frame obtained from each phoneme included in the phoneme sequence and (ii) a value based on the output probability in each frame obtained from the base phoneme.
3. The voice search device according to claim 2, wherein the instructions, when executed by the processor, further control the processor to: acquire, for each frame, an output probability of a feature quantity of the target voice signal being output from a silent phoneme, and wherein each of the selected base phonemes is a phoneme with a maximum output probability in each frame from among the phonemes included in the phoneme sequence and the silent phoneme.
4. The voice search device according to claim 1, wherein the instructions, when executed by the processor, further control the processor to: search, based on the plurality of relative values, a correspondence between each frame in a respective one of the plurality of zones and each phoneme included in the phoneme sequence by dynamic programming; and wherein the plurality of likelihoods are acquired based on a result of the search.
5. The voice search device according to claim 4, wherein the instructions, when executed by the processor, further control the processor to: execute a normalizing calculation, based on a number of frames corresponded with each phoneme, to each of the plurality of likelihoods, thereby calculating a normalized likelihood that normalizes each of the plurality of likelihoods; and wherein the zone is specified based on the normalized likelihood.
6. The voice search device according to claim 5, wherein the normalized likelihood is calculated by taking the relative values, normalizing each relative value using the number of frames corresponded with each phoneme, and summing normalized values.
7. The voice search device according to claim 1, wherein the instructions, when executed by the processor, further control the processor to: select the plurality of base phonemes, the base phonemes being selected from among the phonemes included in the phoneme sequence.
8. A voice search method comprising: converting a search word into a phoneme sequence; acquiring, for each of frames in a target voice signal, a plurality of relative values between (i) a base phoneme selected for the frame from among a plurality of base phonemes each of which is selected for a respective different one of the frames in the target voice signal, and (ii) phonemes included in the phoneme sequence, wherein each of the frames has a time length; designating a plurality of zones in the target voice signal, each of the zones having a time length; acquiring, using the plurality of relative values, a plurality of likelihoods each indicating how likely a respective zone from among the plurality of zones is a zone in which voice corresponding to the search word is spoken; and specifying a zone corresponding to the search word from among the plurality of zones, based on the plurality of likelihoods.
9. The voice search method according to claim 8, further comprising: acquiring, for each frame, an output probability of a feature quantity of the target voice signal being output from each phoneme included in the phoneme sequence, wherein, for each frame in the target voice signal, the relative values are calculated based on (i) a value based on the output probability in each frame obtained from each phoneme included in the phoneme sequence and (ii) a value based on the output probability in each frame obtained from the base phoneme.
10. The voice search method according to claim 9, further comprising: acquiring, for each frame, an output probability of a feature quantity of the target voice signal being output from a silent phoneme, wherein each of the selected base phonemes is a phoneme with a maximum output probability in each frame from among the phonemes included in the phoneme sequence and the silent phoneme.
11. The voice search method according to claim 8, further comprising: searching, based on the plurality of relative values, a correspondence between each frame in a respective one of the plurality of zones and each phoneme included in the phoneme sequence by dynamic programming, wherein the plurality of likelihoods are acquired based on a result of the searching.
12. The voice search method according to claim 11, further comprising: executing a normalizing calculation, based on a number of frames corresponded with each phoneme, to each of the plurality of likelihoods, thereby calculating a normalized likelihood that normalizes each of the plurality of likelihoods; wherein the zone is specified based on the normalized likelihood.
13. The voice search method according to claim 12, wherein the normalized likelihood is calculated by taking the relative values, normalizing each relative value using the number of frames corresponded with each phoneme, and summing normalized values.
14. The voice search method according to claim 8, further comprising: selecting the plurality of base phonemes, the base phonemes being selected from among the phonemes included in the phoneme sequence.
15. A non-transitory recording medium having a program recorded thereon that is executable to control a computer to: convert a search word into a phoneme sequence; acquire, for each of frames in a target voice signal, a plurality of relative values between (i) a base phoneme selected for the frame from among a plurality of base phonemes each of which is selected for a respective different one of the frames in the target voice signal, and (ii) phonemes included in the phoneme sequence, wherein each of the frames has a time length; designate a plurality of zones in the target voice signal, each of the zones having a time length; acquire, using the plurality of relative values, a plurality of likelihoods each indicating how likely a respective zone from among the plurality of zones is a zone in which voice corresponding to the search word is spoken; and specify a zone corresponding to the search word from among the plurality of zones, based on the plurality of likelihoods.
16. The non-transitory recording medium
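The zone scoring described in claims 4 to 6 (and 11 to 13) can be sketched as follows. This is a hedged illustration, not the patented implementation. It assumes the relative values are already available as one dict per frame (higher is better, 0 for the base phoneme), that every candidate zone has the same fixed length and is shifted by one frame at a time, and that the dynamic programming step is a simple monotonic alignment in which each phoneme covers at least one frame. The names `align_zone`, `normalized_likelihood`, and `search` are hypothetical.

```python
def align_zone(rel, seq, start, end):
    """Align frames [start, end) to the phonemes in seq by dynamic programming.

    rel: list of dicts (one per frame), phoneme -> relative value.
    Returns one phoneme index per frame, non-decreasing, covering all of seq.
    """
    n, m = end - start, len(seq)
    if n < m:
        raise ValueError("zone must contain at least one frame per phoneme")
    neg = float("-inf")
    score = [[neg] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    score[0][0] = rel[start][seq[0]]
    for t in range(1, n):
        for j in range(min(t + 1, m)):
            stay = score[t - 1][j]                      # remain on phoneme j
            move = score[t - 1][j - 1] if j > 0 else neg  # advance from j-1
            if move > stay:
                best, back[t][j] = move, j - 1
            else:
                best, back[t][j] = stay, j
            if best > neg:
                score[t][j] = best + rel[start + t][seq[j]]
    # Backtrack from the last frame, which must sit on the last phoneme.
    j = m - 1
    path = [j]
    for t in range(n - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path


def normalized_likelihood(rel, seq, start, end):
    """Average the relative values per phoneme, then sum over phonemes."""
    path = align_zone(rel, seq, start, end)
    sums = [0.0] * len(seq)
    counts = [0] * len(seq)
    for offset, j in enumerate(path):
        sums[j] += rel[start + offset][seq[j]]
        counts[j] += 1
    return sum(s / c for s, c in zip(sums, counts))


def search(rel, seq, zone_len):
    """Score every zone of zone_len frames and return the best (zone, score)."""
    best_zone, best_score = None, float("-inf")
    for start in range(len(rel) - zone_len + 1):
        value = normalized_likelihood(rel, seq, start, start + zone_len)
        if value > best_score:
            best_zone, best_score = (start, start + zone_len), value
    return best_zone, best_score


if __name__ == "__main__":
    rel = [
        {"a": -5, "b": -5},
        {"a": 0, "b": -3},
        {"a": 0, "b": -3},
        {"a": -3, "b": 0},
        {"a": -5, "b": -5},
    ]
    print(search(rel, ["a", "b"], zone_len=3))
```

Dividing each phoneme's accumulated value by the number of frames aligned to it keeps a long phoneme from outweighing a short one, which is the effect the normalizing calculation in claims 5 and 6 appears to target. In this sketch the alignment maximizes the unnormalized sum first and the normalization is applied afterwards.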
Phonemes, fenemes or fenones being the recognition units · CPC title
Query formulation · CPC title
Word spotting · CPC title
Detection of discrete points within a voice signal · CPC title
using natural language modelling · CPC title