Method and device with speech processing

US2025149027A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2025149027-A1
Application number: US-202418903676-A
Country: US
Kind code: A1
Filing date: Oct 1, 2024
Priority date: Nov 7, 2023
Publication date: May 8, 2025
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of processing speech includes: obtaining a speech input; obtaining an instruction related to the speech input; obtaining a speech representation corresponding to the speech input; obtaining an adapter that includes speech information by fusing a pre-trained adapter with the speech representation; and obtaining a response corresponding to the instruction by inputting both the adapter that includes the speech information and the instruction to a language model, the language model generating the response based on the adapter that includes the speech model and the speech information.
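The order of steps in the abstract can be sketched as a small pipeline. This is only an illustration of the data flow, not the patented implementation: the class `SpeechPipeline` and the stand-ins `toy_encode`, `toy_fuse`, and `toy_generate` are hypothetical names, and each stand-in replaces a real component (speech encoder, fusion step, language model) with trivial arithmetic so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List

Matrix = List[List[float]]


@dataclass
class SpeechPipeline:
    """Wires together the components named in the abstract (all hypothetical)."""
    encode: Callable[[List[float]], Matrix]    # speech input -> speech representation
    fuse: Callable[[Matrix, Matrix], Matrix]   # (pre-trained adapter, representation) -> adapter with speech info
    generate: Callable[[Matrix, str], str]     # (adapter with speech info, instruction) -> response
    pretrained_adapter: Matrix

    def respond(self, speech_input: List[float], instruction: str) -> str:
        representation = self.encode(speech_input)
        speech_adapter = self.fuse(self.pretrained_adapter, representation)
        return self.generate(speech_adapter, instruction)


def toy_encode(samples: List[float]) -> Matrix:
    """Stand-in encoder: one 2-dim vector per frame of up to two samples."""
    frames = [samples[i:i + 2] for i in range(0, len(samples), 2)]
    return [[sum(f) / len(f), max(f)] for f in frames]


def toy_fuse(adapter: Matrix, representation: Matrix) -> Matrix:
    """Stand-in fusion (NOT the claimed attention): add the mean frame to each adapter row."""
    mean = [sum(col) / len(col) for col in zip(*representation)]
    return [[a + m for a, m in zip(row, mean)] for row in adapter]


def toy_generate(adapter: Matrix, instruction: str) -> str:
    """Stand-in language model: reports what it received instead of generating text."""
    return f"[{len(adapter)} adapter vectors] {instruction}"


pipeline = SpeechPipeline(
    encode=toy_encode,
    fuse=toy_fuse,
    generate=toy_generate,
    pretrained_adapter=[[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]],
)
response = pipeline.respond([0.1, 0.2, 0.3, 0.4, 0.5], "Transcribe the audio.")
print(response)
```

The point of the sketch is the interface: the language model receives two inputs, the speech-informed adapter and the instruction, and the adapter handed to it has the same number of rows as the pre-trained adapter regardless of how long the speech input is.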

First claim

Opening claim text (preview).

What is claimed is:

1. A method of processing speech, the method comprising: obtaining a speech input; obtaining an instruction related to the speech input; obtaining a speech representation corresponding to the speech input; obtaining an adapter that includes speech information by fusing a pre-trained adapter with the speech representation; and obtaining a response corresponding to the instruction by inputting both the adapter that includes the speech information and the instruction to a language model.

2. The method of claim 1, wherein the adapter that includes the speech information and the pre-trained adapter have a same length.

3. The method of claim 1, wherein the obtaining of the adapter that includes the speech information comprises: inputting the pre-trained adapter as a query of a multi-head attention; inputting the speech representation as a key-value to the multi-head attention; and determining an output of the multi-head attention to be the adapter that includes the speech information.

4. The method of claim 1, wherein the pre-trained adapter has a fixed-length, and the speech representation has a variable length.

5. The method of claim 1, wherein the speech representation is obtained by inputting the speech input to a speech encoder.

6. The method of claim 1, wherein the response comprises, with respect to the speech input, a speech recognition, a speech emotion recognition, a speaker recognition, a speech translation, or colloquial language understanding related to the speech input.

7. A training method comprising: obtaining a speech input; obtaining an instruction generated based on the speech input and a labeled response; obtaining a speech representation corresponding to the speech input; obtaining an adapter that includes speech information by fusing an adapter and the speech representation; obtaining a response corresponding to the instruction by inputting the adapter that includes the speech information and the instruction to a language model that generates the response corresponding to the instruction; and training the adapter based on the labeled response and the response corresponding to the instruction.

8. The training method of claim 7, wherein the speech representation is obtained by inputting the speech input to a speech encoder that generates the speech representation, the speech representation representing features of the speech input.

9. The training method of claim 8, wherein the language model and the speech encoder are pre-trained models.

10. The training method of claim 7, wherein the adapter that includes the speech information and the adapter fused with the speech representation have a same length.

11. The training method of claim 7, wherein the obtaining of the adapter that includes the speech information comprises: inputting the adapter as a query of multi-head attention; inputting the speech representation as a key-value of the multi-head attention; and determining an output of the multi-head attention to serve as the adapter that includes the speech information.

12. The training method of claim 7, wherein the adapter has a fixed-length, and the speech representation has a variable length.

13. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.

14. A device for processing speech, the device comprising: a speech encoder configured to receive a speech input and configured to encode the speech input as a speech representation corresponding to the speech input; a fusion model configured to output an adapter that includes speech information by fusing a pre-trained adapter and the speech representation; and a language model configured to receive an instruction related to the speech input and the adapter including the speech information, the language model configured to output a response corresponding to the instruction.

15. The device of claim 14, wherein the adapter including the speech information and the pre-trained adapter have a same length.

16. The device of claim 14, wherein the fusion model is configured to: receive the pre-trained adapter as a query of a multi-head attention; receive the speech representation as a key-value of the multi-head attention; and output an output of the multi-head attention to the adapter comprising the speech information.

17. The device of claim 14, wherein the pre-trained adapter has a fixed-length, and the speech representation has a variable length.
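The fusion step described in claims 3, 11, and 16 (adapter as query, speech representation as key-value of a multi-head attention) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the applicant's implementation: it assumes the standard scaled dot-product form of multi-head attention with learned query, key, value, and output projections, none of which the claims spell out. The function names (`fuse`, `matmul`, `softmax`, `random_matrix`), the model width of 8, the 2 heads, and the random weights are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned or encoded values."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def fuse(adapter, speech, weights, num_heads):
    """Cross-attention: adapter rows are queries, speech frames are keys and values."""
    w_q, w_k, w_v, w_o = weights
    q = matmul(adapter, w_q)  # (adapter_len, d_model)
    k = matmul(speech, w_k)   # (num_frames, d_model)
    v = matmul(speech, w_v)   # (num_frames, d_model)
    d_head = len(q[0]) // num_heads
    heads_out = [[] for _ in q]  # one output row per adapter position
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        k_h_t = [list(col) for col in zip(*k_h)]  # (d_head, num_frames)
        scores = matmul(q_h, k_h_t)               # (adapter_len, num_frames)
        attn = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        context = matmul(attn, v_h)               # (adapter_len, d_head)
        for i, row in enumerate(context):
            heads_out[i].extend(row)              # concatenate heads per position
    return matmul(heads_out, w_o)                 # (adapter_len, d_model)


rng = random.Random(0)
D_MODEL, NUM_HEADS, ADAPTER_LEN = 8, 2, 4  # illustrative sizes only
weights = tuple(random_matrix(D_MODEL, D_MODEL, rng) for _ in range(4))
adapter = random_matrix(ADAPTER_LEN, D_MODEL, rng)  # fixed length
short_speech = random_matrix(5, D_MODEL, rng)       # 5 frames
long_speech = random_matrix(23, D_MODEL, rng)       # 23 frames

fused_short = fuse(adapter, short_speech, weights, NUM_HEADS)
fused_long = fuse(adapter, long_speech, weights, NUM_HEADS)
print(len(fused_short), len(fused_long))
```

Because the queries come from the adapter, the attention output always has one row per adapter position. That is why the fused adapter keeps the same length as the pre-trained adapter (claims 2, 10, 15) even though the speech representation varies in length (claims 4, 12, 17).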

Assignees

Samsung Electronics Co Ltd, Seoul Nat Univ R&Db Foundation

Inventors

Classifications

  • Feedback of the input speech · CPC title

  • Execution procedure of a spoken command · CPC title

  • Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

  • Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title

  • for estimating an emotional state · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025149027A1 cover?
A method of processing speech includes: obtaining a speech input; obtaining an instruction related to the speech input; obtaining a speech representation corresponding to the speech input; obtaining an adapter that includes speech information by fusing a pre-trained adapter with the speech representation; and obtaining a response corresponding to the instruction by inputting both the adapter th…
Who is the assignee on this patent?
Samsung Electronics Co Ltd, Seoul Nat Univ R&Db Foundation
What technology area does this patent fall under?
Primary CPC classification G10L15/183. Mapped technology areas include Physics.
When was this patent published?
Publication date May 8, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).