Language models using spoken language modeling
US-2024386885-A1 · Nov 21, 2024 · US
US11862143B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11862143-B2 |
| Application number | US-202016996961-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 19, 2020 |
| Priority date | Jul 27, 2020 |
| Publication date | Jan 2, 2024 |
| Grant date | Jan 2, 2024 |
Official abstract text for this publication.
The present disclosure is related to systems and methods for processing speech dialogue. The method includes obtaining target speech dialogue data. The method includes obtaining a text vector representation sequence, a phonetic symbol vector representation sequence, and a role vector representation sequence by performing a vector transformation on the target speech dialogue data based on a text embedding model, a phonetic symbol embedding model, and a role embedding model, respectively. The method includes determining a representation vector corresponding to the target speech dialogue data by inputting the text vector representation sequence, the phonetic symbol vector representation sequence, and the role vector representation sequence into a trained speech dialogue coding model. The method includes determining a summary of the target speech dialogue data by inputting the representation vector into a classification model.
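The four steps in the abstract (embed, encode, classify, output) can be sketched as a toy pipeline. This is only an illustration under loud assumptions: the random lookup tables stand in for the text, phonetic symbol, and role embedding models; position-wise summation plus mean pooling stands in for the trained speech dialogue coding model; and a small linear scorer stands in for the classification model. All names, dimensions, sample tokens, and labels below are hypothetical and do not come from the patent.

```python
import random

DIM = 8  # hypothetical embedding width


def make_embedding(vocab, seed):
    """Toy stand-in for an embedding model: one fixed random vector per symbol."""
    rng = random.Random(seed)
    return {sym: [rng.uniform(-1, 1) for _ in range(DIM)] for sym in vocab}


def embed(seq, table):
    """Turn a sequence of symbols into a vector representation sequence."""
    return [table[s] for s in seq]


def encode(text_vecs, phon_vecs, role_vecs):
    """Stand-in for the coding model (assumption): add the three sequences
    position by position, then mean-pool into one representation vector."""
    fused = [
        [t + p + r for t, p, r in zip(tv, pv, rv)]
        for tv, pv, rv in zip(text_vecs, phon_vecs, role_vecs)
    ]
    n = len(fused)
    return [sum(col) / n for col in zip(*fused)]


def classify(vec, weights):
    """Stand-in linear classifier: index of the highest-scoring label."""
    scores = [sum(w * x for w, x in zip(row, vec)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical dialogue: one token, one phonetic symbol, one role per position.
tokens = ["where", "are", "you", "at", "the", "gate"]
phonetics = ["wer", "ar", "ju", "aet", "dh", "geit"]
roles = ["driver", "driver", "driver", "rider", "rider", "rider"]

text_table = make_embedding(set(tokens), seed=1)
phon_table = make_embedding(set(phonetics), seed=2)
role_table = make_embedding(set(roles), seed=3)

# Hypothetical candidate summaries scored by the classifier.
LABELS = ["pickup location confirmed", "fare dispute"]
rng = random.Random(4)
WEIGHTS = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in LABELS]

representation = encode(
    embed(tokens, text_table),
    embed(phonetics, phon_table),
    embed(roles, role_table),
)
summary = LABELS[classify(representation, WEIGHTS)]
print(summary)
```

Summation is used here only because it keeps the three sequences aligned per position; the abstract does not say how the coding model combines its inputs.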
Opening claim text (preview).
We claim:
1. A method for processing speech dialogue implemented on a computing device having at least one processor and at least one storage device, the method comprising: obtaining target speech dialogue data; obtaining a text vector representation sequence, a phonetic symbol vector representation sequence, and a role vector representation sequence by performing a vector transformation on the target speech dialogue data based on a text embedding model, a phonetic symbol embedding model, and a role embedding model, respectively; determining a representation vector corresponding to the target speech dialogue data by inputting the text vector representation sequence, the phonetic symbol vector representation sequence, and the role vector representation sequence into a trained speech dialogue coding model; determining a summary of the target speech dialogue data by inputting the representation vector into a classification model; and generating an output utilizing the determined summary of the target speech dialogue data.
2. The method of claim 1, further comprising: obtaining a sentence text of the summary of the target speech dialogue data; and performing a grammatical correction operation on the sentence text.
3. The method of claim 1, wherein the text embedding model includes at least one of: a word embedding sub-model configured to determine a word vector representation sequence of the target speech dialogue data; a position embedding sub-model configured to determine a position vector representation sequence of the target speech dialogue data; and a paragraph embedding sub-model configured to determine a paragraph vector representation sequence of the target speech dialogue data.
4. The method of claim 1, wherein the determining the representation vector corresponding to the target speech dialogue data includes: obtaining at least one of a dialect vector representation sequence, an emotion vector representation sequence, or a background text vector representation sequence corresponding to the target speech dialogue data, wherein the dialect vector representation sequence is determined by performing a vector transformation on the target speech dialogue data based on a dialect embedding model; the emotion vector representation sequence is determined by performing a vector transformation on the target speech dialogue data based on an emotion embedding model; and the background text vector representation sequence is determined by performing a vector transformation on a background text of the target speech dialogue data based on a background text embedding model; and determining the representation vector corresponding to the target speech dialogue data by inputting the text vector representation sequence, the phonetic symbol vector representation sequence, the role vector representation sequence, and at least one of the dialect vector representation sequence, the emotion vector representation sequence, or the background text vector representation sequence into the trained speech dialogue coding model.
5. The method of claim 1, wherein the speech dialogue coding model is determined according to a training process, the training process including: obtaining sample speech dialogue data; obtaining a text vector representation sequence, a phonetic symbol vector representation sequence, and a role vector representation sequence by performing a vector transformation on the sample speech dialogue data based on a text embedding model, a phonetic symbol embedding model, and a role embedding model, respectively; and obtaining a pre-trained speech dialogue coding model by pre-training the speech dialogue coding model in a self-supervised learning manner based on the text vector representation sequence, the phonetic symbol vector representation sequence, and the role vector representation sequence.
6. The method of claim 5, wherein the training process further includes: jointly pre-training the speech dialogue coding model and at least one of the text embedding model, the phonetic symbol embedding model, or the role embedding model.
7. The method of claim 5, wherein the pre-training the speech dialogue coding model in the self-supervised learning manner includes: designating at least portion of at least one of the text vector representation sequence, the phonetic symbol vector representation sequence, or the role vector representation sequence as an annotation, the annotation including at least portion of elements in the role vector representation sequence.
8. The method of claim 7, wherein the annotation further includes one or more keywords in the text vector representation sequence.
9. The method of claim 7, wherein the annotation further includes an order of sentences embodied in the text vector representation sequence.
10. The method of claim 5, wherein the obtaining the pre-trained speech dialogue coding model by pre-training the speech dialogue coding model includes: obtaining at least one of a dialect vector representation sequence, an emotion vector representation sequence, or a background text vector representation sequence corresponding to the sample speech dialogue data, wherein the dialect vector representation sequence is determined by performing a vector transformation on the sample speech dialogue data based on a dialect embedding model; the emotion vector representation sequence is determined by performing a vector transformation on the sample speech dialogue data based on an emotion embedding model; and the background text vector representation sequence is determined by performing a vector transformation on a background text of the sample speech dialogue data based on a background text embedding model; and obtaining the pre-trained speech dialogue coding model by pre-training the speech dialogue coding model in the self-supervised learning manner based on the text vector representation sequence, the phonetic symbol vector representation sequence, the role vector representation sequence, and at least one of the dialect vector representation sequence, the emotion vector representation sequence, or the background text vector representation sequence.
11. A system for processing speech dialogue, comprising: at least one storage device including a set of instructions; and at least one processor in communication with the at least one storage device, wherein when executing the set of instructions, the at least one processor is directed to cause the system to: obtain target speech dialogue data; obtain a text vector representation sequence, a phonetic symbol vector representation sequence, and a role vector representation sequence by performing a vector transformation on the target speech dialogue data based on a text embedding model, a phonetic symbol embedding model, and a role embedding model, respectively; determine a representation vector corresponding to the target speech dialogue data by inputting the text vector representation sequence, the phonetic symbol vector representation sequence, and the role vector representation sequence into a trained speech dialogue coding model; determine a summary of the target speech dialogue data by inputting the representation vector into a classification model; and generate an output utilizing the determined summary of the target speech dialogue data.
12. The system of claim 11, wherein the at least one processor is further directed to cause the system to: obtain a sentence text of the summary of the target speech dialogue data; and perform a grammatical correction operation on the sentence text.
13. The system of claim 11, wherein the text embedding model includ
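Claims 5 to 9 describe self-supervised pre-training in which part of the input is designated as the annotation (the training target): some role elements, optionally keywords, and optionally the sentence order. A minimal sketch of how such targets could be derived from unlabeled dialogue is shown here. It works on symbols rather than on vector representation sequences, and the `[MASK]` placeholder, the masking ratio, and the function names are assumptions for illustration, not details taken from the claims.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder symbol


def mask_roles(roles, ratio=0.3, seed=0):
    """Hide a portion of the role sequence; the hidden values become the annotation."""
    rng = random.Random(seed)
    k = max(1, round(len(roles) * ratio))
    positions = sorted(rng.sample(range(len(roles)), k))
    masked = list(roles)
    annotation = {}
    for i in positions:
        annotation[i] = masked[i]  # target the model would have to predict
        masked[i] = MASK
    return masked, annotation


def shuffle_sentences(sentences, seed=0):
    """Shuffle sentences; the original order becomes the annotation.

    order[j] is the original index of the sentence now at position j.
    """
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)
    return [sentences[i] for i in order], order


# Hypothetical sample dialogue data.
roles = ["driver", "rider", "driver", "rider", "driver", "rider"]
sentences = ["Where are you?", "At the gate.", "I will be there soon."]

masked_roles, role_targets = mask_roles(roles)
shuffled, order_targets = shuffle_sentences(sentences)
print(masked_roles, role_targets)
print(shuffled, order_targets)
```

Because the targets come from the data itself, no manual labeling is needed, which is the sense in which the pre-training is self-supervised.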
Training · CPC title
Classification techniques · CPC title
Grammatical analysis; Style critique · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title