Method and electronic device for processing audio, and non-transitory storage medium

US11355100B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11355100-B2
Application number  US-202017033715-A
Country             US
Kind code           B2
Filing date         Sep 26, 2020
Priority date       Apr 15, 2020
Publication date    Jun 7, 2022
Grant date          Jun 7, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for processing information includes that: a current audio is acquired, and a current text corresponding to the current audio is acquired; feature extraction is performed on the current audio through a speech feature extraction portion in a semantic analysis model, to obtain a speech feature of the current audio; feature extraction is performed on the current text through a text feature extraction portion in the semantic analysis model, to obtain a text feature of the current text; semantic classification is performed on the speech feature and the text feature through a classification portion in the semantic analysis model, to obtain a classification result; and recognition of the current audio is rejected in response to the classification result indicating that the current audio is to be rejected for recognition.
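
The decision flow in the abstract (extract two feature vectors, combine them, classify, then either reject or continue) can be sketched as follows. This is a minimal illustration, not the patented implementation: the logistic-regression classifier, the `weights`, `bias`, and `threshold` parameters, and all function names are assumptions introduced here. The abstract says only that a classification portion produces a result indicating whether the audio is to be rejected. Combining the vectors by concatenation follows the "splicing" step recited in claim 1.

```python
import math
from typing import List, Sequence


def splice(speech_vec: Sequence[float], text_vec: Sequence[float]) -> List[float]:
    """Concatenate the speech and text feature vectors into one classifier input."""
    return list(speech_vec) + list(text_vec)


def reject_probability(spliced: Sequence[float],
                       weights: Sequence[float],
                       bias: float) -> float:
    """Hypothetical classification portion: logistic regression over the spliced vector."""
    if len(spliced) != len(weights):
        raise ValueError("weights must have the same length as the spliced vector")
    z = sum(w * x for w, x in zip(weights, spliced)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def handle_utterance(speech_vec: Sequence[float],
                     text_vec: Sequence[float],
                     weights: Sequence[float],
                     bias: float,
                     threshold: float = 0.5) -> str:
    """Return "reject" to skip recognition, or "analyze" to continue to semantic analysis."""
    spliced = splice(speech_vec, text_vec)
    if reject_probability(spliced, weights, bias) >= threshold:
        return "reject"
    return "analyze"
```

In this sketch, an utterance labelled "reject" would receive no response from the device, while "analyze" would hand the audio on to semantic analysis and response output.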

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for processing information, implemented by an electronic device and applied to audio interaction scenarios, the method comprising: acquiring, by an audio acquisition component in the electronic device, an audio, and acquiring, by a processor in the electronic device, a text corresponding to the audio; performing, by the processor, feature extraction on the audio through a speech feature extraction portion in a semantic analysis model, to obtain a speech feature of the audio; performing, by the processor, feature extraction on the text through a text feature extraction portion in the semantic analysis model, to obtain a text feature of the text; performing, by the processor, semantic classification on the speech feature and the text feature through a classification portion in the semantic analysis model, to obtain a classification result; and rejecting, by the processor, to recognize the audio in response to the classification result indicating that the audio is to be rejected for recognition, and performing, by the processor, semantic analysis on the audio to obtain an analysis result in response to the classification result indicating that the audio is not to be rejected for recognition, and outputting, by an audio output component in the electronic device, response information corresponding to the analysis result, wherein the performing semantic classification on the speech feature and the text feature through the classification portion in the semantic analysis model, to obtain the classification result comprises: splicing a speech feature vector for charactering the speech feature and a text feature vector for charactering the text feature, to obtain a spliced feature vector that is to be input into the classification portion; and performing the semantic classification on the spliced feature vector through the classification portion, to obtain the classification result.

2. The method of claim 1, wherein the speech feature comprises at least one of: a tone feature, an intonation feature, or a speech rate feature.

3. The method of claim 1, further comprising: obtaining a speech feature vector for charactering the speech feature according to a vector transformation mechanism in the speech feature extraction portion; performing convolution calculation between a convolution kernel of the speech feature extraction portion and the speech feature vector, to obtain a convolution operation value; and extracting a feature vector of the speech feature by processing the convolution operation value through a pooling layer of the speech feature extraction portion.

4. The method of claim 1, wherein the text feature comprises: a literal meaning feature and a context feature of the text; the performing feature extraction on the text through the text feature extraction portion in the semantic analysis model, to obtain the text feature of the text comprises: performing semantic analysis on each word in the text through the semantic analysis model to obtain a literal meaning feature of the word; and obtaining the context feature by extracting a feature from an adjacent text of the text through the text feature extraction portion.

5. The method of claim 4, further comprising: obtaining a knowledge data feature by determining knowledge data associated with the text from a knowledge graph based on the text; and performing semantic classification on the speech feature, the text feature and the knowledge data feature through the classification portion, to obtain the classification result.

6. A smart electronic device implementing the method of claim 1, wherein the current text corresponding to the current audio is obtained according to the acquired current audio prior to interactions between the smart electronic and the user; and the semantic analysis model comprises three independent portions, including a speech feature extraction portion, a text feature extraction portion, and a classification portion, such that the speech features and the text features are extracted in parallel based on the independent speech feature extraction portion and the text feature extraction portion, thereby improving a data processing speed of the semantic analysis model.

7. The smart electronic device of claim 6, wherein the smart electronic device is configured to analyze the speech features and the text features at a same time, to thereby more accurately determine what the current audio wants to express by combining results of speech analysis with results of text analysis, improve accuracy of extracted features and the classification result, and reduce probability of false response and unnecessary semantic analysis process.

8. An electronic device for processing information, comprising: a processor; and memory storing instructions executable by the processor, wherein the processor is configured to: acquire, through an audio acquisition component, an audio, and acquire a text corresponding to the audio; perform feature extraction on the audio through a speech feature extraction portion in a semantic analysis model, to obtain a speech feature of the audio; perform feature extraction on the text through a text feature extraction portion in the semantic analysis model, to obtain a text feature of the text; perform semantic classification on the speech feature and the text feature through a classification portion in the semantic analysis model, to obtain a classification result; and reject to recognize the audio in response to the classification result indicating that the audio is to be rejected for recognition, and perform semantic analysis on the audio to obtain an analysis result in response to the classification result indicating that the audio is not to be rejected for recognition, and output, through an audio output component, response information corresponding to the analysis result, wherein the processor is further configured to: splice a speech feature vector for characterizing the speech feature and a text feature vector for characterizing the text feature, to obtain a spliced feature vector that is to be input into the classification portion; and perform the semantic classification on the spliced feature vector through the classification portion, to obtain the classification result.

9. The device of claim 8, wherein the speech feature comprises at least one of: a tone feature, an intonation feature, or a speech rate feature.

10. The device of claim 8, wherein the processor is further configured to: obtain a speech feature vector for characterizing the speech feature according to a vector transformation mechanism in the speech feature extraction portion; perform convolution calculation between a convolution kernel of the speech feature extraction portion and the speech feature vector, to obtain a convolution operation value; and extract a feature vector of the speech feature by processing the convolution operation value through a pooling layer of the speech feature extraction portion.

11. The device of claim 8, wherein the text feature comprises: a literal meaning feature and a context feature of the text; and the processor is further configured to: perform semantic analysis on each word in the text through the semantic analysis model, to obtain a literal meaning feature of the word; and obtain the context feature by extracting a feature from an adjacent text of the text through the text feature extraction portion.

12. The device of claim 11, wherein the processor is further configured to: determine a knowledge data feature by determining knowledge data associated with the text from a knowledge graph based on th
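
Claims 3 and 10 recite a convolution between a kernel and the speech feature vector, followed by a pooling layer. The sketch below illustrates that two-step pattern in one dimension. It rests on several assumptions not stated in the claims: the kernel slides with stride 1 and no padding, the kernel is not flipped (the cross-correlation convention common in neural-network libraries), and the pooling layer is non-overlapping max pooling. The function names are hypothetical.

```python
from typing import List, Sequence


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide the kernel over the signal with stride 1 and no padding.

    The kernel is applied without flipping (cross-correlation).
    """
    n, k = len(signal), len(kernel)
    if k == 0 or k > n:
        raise ValueError("kernel must be non-empty and no longer than the signal")
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


def max_pool1d(values: Sequence[float], window: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        max(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]


def extract_speech_feature(speech_vec: Sequence[float],
                           kernel: Sequence[float],
                           window: int) -> List[float]:
    """Convolve, then pool, to obtain a shorter feature vector."""
    return max_pool1d(conv1d_valid(speech_vec, kernel), window)
```

Under these assumptions, pooling shortens the convolved vector while keeping the strongest response in each window, which is one common reason to place a pooling layer after a convolution.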

Assignees

Beijing Xiaomi Pinecone Electronics Co Ltd

Inventors

Classifications

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • G10L15/02 · Primary

    Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • Discourse or dialogue representation · CPC title

  • Speech classification or search · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11355100B2 cover?
A method for processing information includes that: a current audio is acquired, and a current text corresponding to the current audio is acquired; feature extraction is performed on the current audio through a speech feature extraction portion in a semantic analysis model, to obtain a speech feature of the current audio; feature extraction is performed on the current text through a text feature…
Who is the assignee on this patent?
Beijing Xiaomi Pinecone Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/02. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 7, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).