Speech recognition system, acoustic processing method, and non-temporary computer-readable medium

US12482459B2 · US · B2

Patent metadata
Publication number: US-12482459-B2
Application number: US-202217897352-A
Country: US
Kind code: B2
Filing date: Aug 29, 2022
Priority date: Aug 29, 2022
Publication date: Nov 25, 2025
Grant date: Nov 25, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The speech recognition that is disclosed analyzes an acoustic feature for each subframe of an audio signal; provides a first model configured to determine a hidden state for each frame consisting of multiple subframes on the basis of the acoustic feature; provides a second model configured to determine a hidden state for each frame consisting of multiple subframes on the basis of the acoustic feature; and provides a third model configured to determine an utterance content on the basis of a sequence of the hidden states of each block consisting of multiple frames belonging to a voice segment.
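The subframe, frame, and block hierarchy named in the abstract can be illustrated with a minimal Python sketch. The sizes used here (2 subframes per frame, 4 frames per block) and the helper name `group` are assumptions for illustration only; this page does not state the actual sizes, and the models that turn acoustic features into hidden states are not shown.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group(items: Sequence[T], size: int) -> List[List[T]]:
    """Split items into consecutive chunks of `size` (the last chunk may be shorter)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


# Hypothetical sizes, chosen only to make the hierarchy visible.
SUBFRAMES_PER_FRAME = 2
FRAMES_PER_BLOCK = 4

# Integers stand in for per-subframe acoustic features.
subframes = list(range(24))
frames = group(subframes, SUBFRAMES_PER_FRAME)   # each frame holds 2 subframes
blocks = group(frames, FRAMES_PER_BLOCK)         # each block holds 4 frames

print(len(frames), len(blocks))  # 12 3
```

In this toy layout each block spans 8 subframes. In the claimed system the per-frame values would be hidden states produced by the first model rather than raw subframe indices.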

First claim

Opening claim text (preview).

What is claimed is:

1. A speech recognition system comprising: a processor and a memory, the processor coupled to the memory, the processor is configured to: input an audio signal; calculate an acoustic feature for each subframe of the audio signal; calculate, by using a first model, a hidden state series for each frame consisting of multiple subframes on the basis of the acoustic feature; specify, by using a second model, whether a voice segment or a non-voice segment for each block on the basis of the hidden state series, the block consisting of a plurality of frames; calculate, by using a third model, a probability for an utterance content candidate on the basis of a sequence of the hidden state provided series for each block having a single voice segment to specify an utterance content; and train the third model to calculate the probability for the utterance content candidate on the basis of hidden state series; wherein the processor is configured to: specify a first frame subsequent to the non-voice segment as a beginning of the voice segment, specify a second frame prior to a succeeding non-voice segment as an end of the voice segment, adjust block arrangement of the audio signal, by concatenating one or more frames up to the end of the voice segment in a first block with the end of the voice segment to a second block proceeding to the first block, concatenating one or more frames from the beginning of the voice segment in a third block with the beginning of the voice segment to a fourth block subsequent to the third block, search for recognition results indicating an utterance content for each block arrangement of the audio signal based on the probability for the utterance content candidate calculated by using the third model, and output the recognition results to an external device.

2. The speech recognition system according to claim 1, wherein the processor is configure to divide, by using the second model, block comprising two or more voice segments into two or more blocks, the two or more blocks containing respective voice segments.

3. The speech recognition system according to claim 2, wherein the processor is configured to calculate for each frame a probability that the frame belongs to a voice segment on the basis of the hidden state, the probability being as a voice segment probability; specify, a segment having consecutive inactive frames in which the number of inactive frames is more than a predetermined threshold frame number as the non-voice segment, each of the inactive frames having voice segment probability equals to or less than a predetermined probability threshold; and specify a segment having the consecutive inactive frames as voice segments.

4. The speech recognition system according to claim 1, wherein the first model comprises a first-stage model and a second-stage model, the first-stage model being designed for converting an acoustic feature for each subframe to a frame feature for each frame, and the second-stage model being designed for specifying the hidden state series on the basis of the frame feature.

5. The speech recognition system according to claim 1, wherein the third model is designed for calculating an estimated probability for each candidate of the utterance content corresponding to a hidden state series up to the latest block forming a voice segment, and specifying the utterance content with the highest estimated probability.

6. A non-transitory computer-readable medium storing instructions at a speech recognition system, the instructions executed by a processor cause the speech recognition system to: input an audio signal; calculate an acoustic feature for each subframe of the audio signal; calculate, by using a first model, a hidden state series for each frame consisting of multiple subframes on the basis of the acoustic feature; specify, by using a second model, whether a voice segment or a non-voice segment for each block on the basis of the hidden state series, the block consisting of a plurality of frames; calculate, by using a third model, a probability for an utterance content candidate on the basis of a sequence of the hidden state series provided for each block having a single voice segment to specify an utterance content; and train the third model to calculate the probability for the utterance content candidate on the basis of hidden state series; wherein the instructions cause the speech recognition system to: specify a first frame subsequent to the non-voice segment as a beginning of the voice segment, specify a second frame prior to a succeeding non-voice segment as an end of the voice segment, adjust block arrangement of the audio signal, by concatenating one or more frames up to the end of the voice segment in a first block with the end of the voice segment to a second block preceding to the first block, concatenating one or more frames from the beginning of the voice segment in a third block with the beginning of the voice segment to a fourth block subsequent to the third block, search for recognition results indicating an utterance content for each block arrangement of the audio signal based on the probability for the utterance content candidate calculated by using the third model, and output the recognition results to an external device.

7. A method for speech recognition, comprising the steps of: inputting an audio signal; calculating an acoustic feature for each subframe of the audio signal; calculating, by using a first model, a hidden state series for each frame consisting of multiple subframes on the basis of the acoustic feature; specifying, by using a second model, whether a voice segment or a non-voice segment for each block on the basis of the hidden state series, the block consisting of a plurality of frames; calculating, by using a third model, a probability for an utterance content candidate on the basis of a sequence of the hidden state series provided for each block having a single voice segment to specify an utterance content; and training the third model to calculate the probability for the utterance content candidate on the basis of hidden state series; further comprising the steps of: specifying a first frame subsequent to the non-voice segment as a beginning of the voice segment, specifying a second frame prior to a succeeding non-voice segment as an end if the voice segment, adjusting block arrangement of the audio signal, by concatenating one or more frames up to the end of the voice segment in a first block with the end of the voice segment to a second block preceding to the first block, concatenating one or more frames from the beginning of the voice segment in a third block with the beginning of the voice segment to a fourth block subsequent to the third block, searching for recognition results indicating an utterance content for each of arranged blocks of the audio signal based on the probability calculated by using the third model, and outputting the recognition results to an external device.
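The segment detection of claim 3 and the block rearrangement of the independent claims can be sketched as follows. This is one possible reading of the claim language, not the patentee's implementation. The function names, the threshold values, and the fixed block size are hypothetical. Two assumptions are made: any frame outside a sufficiently long run of inactive frames counts as voice, and a partial block at either edge of a voice segment is merged into its neighbouring block inside the segment.

```python
from typing import List, Tuple


def find_voice_segments(
    voice_probs: List[float],
    prob_threshold: float = 0.5,   # hypothetical probability threshold
    min_inactive_frames: int = 3,  # hypothetical threshold frame number
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame indices of voice segments."""
    n = len(voice_probs)
    # A frame is "inactive" when its voice probability is at or below the threshold.
    inactive = [p <= prob_threshold for p in voice_probs]
    non_voice = [False] * n
    i = 0
    while i < n:
        if not inactive[i]:
            i += 1
            continue
        j = i
        while j < n and inactive[j]:
            j += 1
        # Only runs longer than the threshold frame number become non-voice.
        if j - i > min_inactive_frames:
            for k in range(i, j):
                non_voice[k] = True
        i = j
    segments: List[Tuple[int, int]] = []
    start = None
    for t in range(n):
        if not non_voice[t] and start is None:
            start = t                        # first frame after a non-voice segment
        elif non_voice[t] and start is not None:
            segments.append((start, t - 1))  # last frame before the next non-voice segment
            start = None
    if start is not None:
        segments.append((start, n - 1))
    return segments


def arrange_blocks(start: int, end: int, block_size: int) -> List[List[int]]:
    """Group the frames of one voice segment into blocks, merging partial edge blocks."""
    first_b, last_b = start // block_size, end // block_size
    blocks = []
    for b in range(first_b, last_b + 1):
        lo = max(start, b * block_size)
        hi = min(end, (b + 1) * block_size - 1)
        blocks.append(list(range(lo, hi + 1)))
    # Frames from the segment beginning join the subsequent block.
    if len(blocks) > 1 and len(blocks[0]) < block_size:
        blocks[1] = blocks[0] + blocks[1]
        blocks.pop(0)
    # Frames up to the segment end join the preceding block.
    if len(blocks) > 1 and len(blocks[-1]) < block_size:
        blocks[-2] = blocks[-2] + blocks[-1]
        blocks.pop()
    return blocks


probs = [0.1, 0.1, 0.1, 0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9]
print(find_voice_segments(probs))  # [(4, 7), (13, 13)]
print(arrange_blocks(6, 17, 4))    # [[6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17]]
```

The single low-probability frame at index 6 stays inside the first voice segment because its inactive run is shorter than the threshold, which is the behaviour the run-length condition in claim 3 appears designed to produce.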

Assignees

Honda Motor Co Ltd

Inventors

Classifications

  • G10L15/04 · Primary

    Segmentation; Word boundary detection · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title

  • using artificial neural networks · CPC title

  • G10L15/197 · Primary

    Probabilistic grammars, e.g. word n-grams · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12482459B2 cover?
The speech recognition that is disclosed analyzes an acoustic feature for each subframe of an audio signal; provides a first model configured to determine a hidden state for each frame consisting of multiple subframes on the basis of the acoustic feature; provides a second model configured to determine a hidden state for each frame consisting of multiple subframes on the basis of the acoustic f…
Who is the assignee on this patent?
Honda Motor Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 25, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).