What technology area does this patent fall under?

Primary CPC classification G10L15/26. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 17, 2026 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Speech recognition device, speech recognition method, and non-transitory computer-readable medium

US12579983B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12579983-B2
Application number	US-202318453338-A
Country	US
Kind code	B2
Filing date	Aug 22, 2023
Priority date	Sep 20, 2022
Publication date	Mar 17, 2026
Grant date	Mar 17, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A speech recognition device includes: an acquisition part, acquiring a speech signal; a speech feature amount calculation part, calculating a speech feature amount; a first speech recognition part, based on the speech feature amount, performing speech recognition using a learned first E2E model, attaching a first tag to a vocabulary portion of a specific class in text that is a recognition result, and outputting the same; a second speech recognition part, based on the speech feature amount, performing speech recognition using a learned second E2E model, attaching a second tag to a vocabulary portion of a specific class in a phoneme that is a recognition result, and outputting the same; a phoneme replacement part, replacing a vocabulary with the first tag with a phoneme with the second tag; and an output part, converting the phoneme with the second tag into text and outputting the same.
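The replacement step in the abstract can be illustrated with a minimal Python sketch. It assumes both recognizers emit plain strings and mark the specific-class span with XML-like tags; this tag syntax, the function name `replace_tagged_span`, and the sample strings are all hypothetical, since the page does not say how tags are encoded.

```python
import re

# Hypothetical tag syntax: <cls>...</cls>. The page does not specify the real format.
TAG = re.compile(r"<(?P<cls>\w+)>(?P<body>.*?)</(?P=cls)>")


def replace_tagged_span(text_hyp: str, phoneme_hyp: str) -> str:
    """Swap the first tagged span of the text hypothesis for the tagged phoneme span.

    text_hyp    -- output of the first (text) recognizer, with a first tag
    phoneme_hyp -- output of the second (phoneme) recognizer, with a second tag
    """
    phoneme_match = TAG.search(phoneme_hyp)
    if phoneme_match is None:
        # Nothing tagged in the phoneme hypothesis: keep the text as recognized.
        return text_hyp
    # A function replacement avoids backslash-escape handling in re.sub;
    # count=1 touches only the first tagged span in the text.
    return TAG.sub(lambda _m: phoneme_match.group(0), text_hyp, count=1)


if __name__ == "__main__":
    text = "call <name>Sato</name> now"
    phonemes = "k o r u <name>s a i t o</name> n a u"
    print(replace_tagged_span(text, phonemes))
```

The result still holds phonemes inside the tag; turning them back into text is the job of the output part.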

First claim

Opening claim text (preview).

What is claimed is:

1. A speech recognition device, comprising: an acquisition part, acquiring a speech signal; a speech feature amount calculation part, calculating a speech feature amount of the acquired speech signal; a first speech recognition part, based on the speech feature amount, performing speech recognition using a first end-to-end model that has been learned, attaching a first tag to a vocabulary portion of a specific class in a text that is a recognition result, and outputting the vocabulary portion with the first tag; a second speech recognition part, based on the speech feature amount, performing speech recognition using a second end-to-end model that has been learned, attaching a second tag to a vocabulary portion of a specific class in a phoneme that is a recognition result, and outputting the phoneme with the second tag; a phoneme replacement part, replacing the vocabulary portion with the first tag in the text recognized by the first speech recognition part with the phoneme with the second tag; and an output part, converting the phoneme with the second tag obtained by replacement by the phoneme replacement part into a text and outputting the converted text, wherein, in response to a text with a highest similarity in a language model in which text and phonemes are associated has the similarity greater than a threshold, the output part converts and outputs the phoneme with the second tag obtained by replacement by the phoneme replacement part; and in response to the text with highest similarity in a language model stored in a language model storage part has the similarity equal to or less than a threshold, the output part outputs the phoneme with the second tag obtained by replacement by the phoneme replacement part as the text with the first tag recognized by the first speech recognition part as it is.

2. The speech recognition device according to claim 1, wherein the output part converts the phoneme with the second tag obtained by replacement by the phoneme replacement part into a text with a highest similarity in a language model in which text and phonemes are associated.

3. The speech recognition device according to claim 1, wherein the first end-to-end model is learned using a speech signal and text data for each utterance unit; and the second end-to-end model is learned using a speech signal and phoneme data for each utterance unit.

4. The speech recognition device according to claim 2, wherein the first end-to-end model is learned using a speech signal and text data for each utterance unit; and the second end-to-end model is learned using a speech signal and phoneme data for each utterance unit.

5. The speech recognition device according to claim 1, wherein, in response to there being a plurality of vocabulary portions of the specific class with the first tag in the text outputted by the first speech recognition part, the phoneme replacement part replaces a first vocabulary portion of the specific class with the first tag with the phoneme with the second tag.

6. The speech recognition device according to claim 2, wherein, in response to there being a plurality of vocabulary portions of the specific class with the first tag in the text outputted by the first speech recognition part, the phoneme replacement part replaces the first vocabulary portion of the specific class with the first tag with a phoneme with the second tag.

7. The speech recognition device according to claim 1, wherein the vocabulary portion of the specific class is at least one proper noun of a person's name, a department name, a product name, a model name, a part name, and a place name.

8. The speech recognition device according to claim 2, wherein the vocabulary portion of the specific class is at least one proper noun of a person's name, a department name, a product name, a model name, a part name, and a place name.

9. A speech recognition method, comprising: by an acquisition part, acquiring a speech signal; by a speech feature amount calculation part, calculating a speech feature amount of the acquired speech signal; by a first speech recognition part, based on the speech feature amount, performing speech recognition using a first end-to-end model that has been learned, attaching a first tag to a vocabulary portion of a specific class in a text that is a recognition result, and outputting the vocabulary portion with the first tag; by a second speech recognition part, based on the speech feature amount, performing speech recognition using a second end-to-end model that has been learned, attaching a second tag to a vocabulary portion of a specific class in a phoneme that is a recognition result, and outputting the phoneme with the second tag; by a phoneme replacement part, replacing the vocabulary portion with the first tag in the text recognized by the first speech recognition part with a phoneme with the second tag; and by an output part, converting the phoneme with the second tag obtained by replacement by the phoneme replacement part into a text and outputting the converted text, wherein, in response to a text with a highest similarity in a language model in which text and phonemes are associated has the similarity greater than a threshold, the output part converts and outputs the phoneme with the second tag obtained by replacement by the phoneme replacement part; and in response to the text with highest similarity in a language model stored in a language model storage part has the similarity equal to or less than a threshold, the output part outputs the phoneme with the second tag obtained by replacement by the phoneme replacement part as the text with the first tag recognized by the first speech recognition part as it is.

10. A non-transitory computer-readable medium storing a program, the program causing a computer to: acquire a speech signal; calculate a speech feature amount of the acquired speech signal; based on the speech feature amount, perform speech recognition using a first end-to-end model that has been learned, attach a first tag to a vocabulary portion of a specific class in a text that is a recognition result, and output the vocabulary portion with the first tag; based on the speech feature amount, perform speech recognition using a second end-to-end model that has been learned, attach a second tag to a vocabulary portion of a specific class in a phoneme that is a recognition result, and output the phoneme with the second tag; replace the vocabulary portion with the first tag in the text recognized using the first end-to-end model with the phoneme with the second tag; and convert the phoneme with the second tag obtained by replacement into a text and output the converted text, wherein, in response to a text with a highest similarity in a language model in which text and phonemes are associated has the similarity greater than a threshold, the computer converts and outputs the phoneme with the second tag obtained by replacement by the phoneme replacement part; and in response to the text with highest similarity in a language model stored in a language model storage part has the similarity equal to or less than a threshold, the computer outputs the phoneme with the second tag obtained by replacement by the phoneme replacement part as the text with the first tag recognized by the first speech recognition part as it is.
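The threshold condition that closes claims 1, 9, and 10 can be read as a simple fallback rule, sketched below in Python under stated assumptions. The language model is stood in for by a plain dictionary from text to phoneme sequence, and similarity is computed with `difflib.SequenceMatcher`. The claims name neither a similarity measure nor a threshold value, so the names `resolve_span` and `lexicon` and the default of 0.8 are purely illustrative.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def resolve_span(
    phonemes: List[str],
    first_pass_text: str,
    lexicon: Dict[str, List[str]],
    threshold: float = 0.8,  # illustrative value, not taken from the patent
) -> str:
    """Choose the final text for one tagged span.

    phonemes        -- phoneme sequence from the second recognizer
    first_pass_text -- text for the same span from the first recognizer
    lexicon         -- stand-in for the language model: text -> phoneme sequence
    """
    best_text, best_score = first_pass_text, 0.0
    for text, entry_phonemes in lexicon.items():
        # Assumed similarity measure: SequenceMatcher ratio in [0, 1].
        score = SequenceMatcher(None, phonemes, entry_phonemes).ratio()
        if score > best_score:
            best_text, best_score = text, score
    # Greater than the threshold: output the converted text.
    # Equal or less: keep the first recognizer's text as it is.
    return best_text if best_score > threshold else first_pass_text


if __name__ == "__main__":
    lexicon = {"Saito": ["s", "a", "i", "t", "o"], "Sato": ["s", "a", "t", "o"]}
    print(resolve_span(["s", "a", "i", "t", "o"], "Sato", lexicon))
    print(resolve_span(["k", "u", "m", "a"], "Kuma", lexicon))
```

In the first call the phonemes match a lexicon entry exactly, so the converted text replaces the first-pass result; in the second call no entry is similar enough, so the first-pass text is kept unchanged.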

Assignees

Honda Motor Co Ltd

Inventors

Classifications

G06F40/166
Editing, e.g. inserting or deleting · CPC title
G10L2015/025
Phonemes, fenemes or fenones being the recognition units · CPC title
G10L15/193
Formal grammars, e.g. finite state automata, context free grammars or word networks · CPC title
G10L15/26 · Primary
Speech to text systems (G10L15/08 takes precedence) · CPC title
G10L15/32 · Primary
Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title

Patent family

Related publications grouped by family.

View patent family 90244233

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12579983B2 cover?: A speech recognition device includes: an acquisition part, acquiring a speech signal; a speech feature amount calculation part, calculating a speech feature amount; a first speech recognition part, based on the speech feature amount, performing speech recognition using a learned first E2E model, attaching a first tag to a vocabulary portion of a specific class in text that is a recognition resu…
Who is the assignee on this patent?: Honda Motor Co Ltd