Electronic device for recognizing speech

US11074910B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11074910-B2
Application number: US-201815866072-A
Country: US
Kind code: B2
Filing date: Jan 9, 2018
Priority date: Jan 9, 2017
Publication date: Jul 27, 2021
Grant date: Jul 27, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An electronic device includes a microphone obtaining an audio signal, a memory in which a speaker model is stored, and at least one processor. The at least one processor is configured to obtain a voice signal from the audio signal, to compare the voice signal with the speaker model to verify a user, and, if a verification result indicates that the user corresponds to a pre-enrolled speaker, to perform an operation corresponding to the obtained voice signal.
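The verify-then-act flow in the abstract can be illustrated with a minimal sketch. The abstract does not say how the voice signal is compared with the speaker model, so the cosine similarity over fixed-length feature vectors, the 0.8 threshold, and the names `cosine_similarity`, `verify_speaker`, and `handle_voice` are all assumptions made for illustration, not the patented method.

```python
import math
from typing import Callable, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def verify_speaker(
    voice_features: Sequence[float],
    speaker_model: Sequence[float],
    threshold: float = 0.8,  # assumed value, not taken from the patent
) -> bool:
    """Return True when the voice features are close enough to the stored model."""
    return cosine_similarity(voice_features, speaker_model) >= threshold


def handle_voice(
    voice_features: Sequence[float],
    speaker_model: Sequence[float],
    operation: Callable[[], str],
) -> Optional[str]:
    """Run the operation only if the speaker is verified; otherwise do nothing."""
    if verify_speaker(voice_features, speaker_model):
        return operation()
    return None
```

In this sketch the stored speaker model is a single vector. A real system would more likely hold a statistical model, such as the GMM or i-vector options the claims list.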

First claim

Opening claim text (preview).

What is claimed is:

1. An electronic device comprising: a microphone configured to obtain an audio signal; a scene classifier; a sensor; a memory in which a speaker model is stored; and at least one processor, wherein the at least one processor is configured to: classify, by the scene classifier, the audio signal as user speech or noise, based on distribution of the audio signal; in response to classifying the audio signal as noise, control not to receive the audio signal through the microphone; in response to classifying the audio signal as user speech, obtain a voice signal from the audio signal and compare the voice signal with the speaker model to verify a user; based on a verification result indicating that the user corresponds to a pre-enrolled speaker, perform an operation corresponding to the obtained voice signal; and verify the user based on a similarity between the speaker model and a talk model based on talk contents between the pre-enrolled speaker and another speaker, wherein, when a movement of the electronic device is sensed by the sensor, a buffering signal is transmitted to the microphone such that the audio signal is obtained after a preset point in time when the movement is sensed, and wherein, while transmitting the buffering signal, a state of the processor is changed from a sleep state to an activation state such that the processor recognizes a command from the obtained voice signal after the buffering signal is transmitted.

2. The electronic device of claim 1, wherein the at least one processor includes a digital signal processor (DSP) electrically connected to the microphone and an application processor (AP) electrically connected to the DSP, wherein the DSP performs an operation of verifying the user and changes the state of the AP from the sleep state to the activation state based on the verification result indicating the user is the pre-enrolled speaker, and wherein the AP recognizes the command from the obtained voice signal and performs an operation associated with the command.

3. The electronic device of claim 1, wherein the at least one processor is further configured to: determine that a signal having energy, a magnitude of which is greater than or equal to a critical value, in the audio signal is the voice signal; and determine that a signal having energy, the magnitude of which is less than the critical value, is noise.

4. The electronic device of claim 1, wherein the at least one processor is further configured to: obtain the voice signal based on a zero crossing rate of the audio signal.

5. The electronic device of claim 1, wherein the at least one processor is further configured to: obtain the voice signal based on a signal to noise ratio (SNR).

6. The electronic device of claim 1, wherein the at least one processor is further configured to: obtain the voice signal based on a distribution of the audio signal.

7. The electronic device of claim 1, wherein the at least one processor is further configured to: compare a feature value of the voice signal with a feature value of the speaker model to verify the user.

8. The electronic device of claim 7, wherein at least one of the feature value of the voice signal and the feature value of the speaker model includes at least one of linear prediction coding (LPC) and mel-frequency cepstral coefficients (MFCC).

9. The electronic device of claim 1, wherein the at least one processor is further configured to: verify the user by using at least one of a hidden Markov model (HMM), a Gaussian mixture model (GMM), a support vector machine (SVM), i-vector, probabilistic linear discriminant analysis (PLDA), and a deep neural network (DNN).

10. The electronic device of claim 1, wherein the at least one processor is further configured to: verify the user based on a similarity between the speaker model and a universal background model (UBM).

11. The electronic device of claim 1, wherein the at least one processor is further configured to: obtain the voice signal through the microphone under a specified condition; and normalize a feature value of the obtained voice signal to generate the speaker model.

12. The electronic device of claim 11, wherein the at least one processor is further configured to: based on the electronic device transmitting a call to an external device, obtain the voice signal through the microphone.

13. The electronic device of claim 11, wherein the at least one processor is further configured to: based on a recording application being executed, obtain the voice signal through the microphone.

14. The electronic device of claim 1, wherein the at least one processor is further configured to: output information about whether the speaker model is generated, through a display.

15. A wearable electronic device comprising: a sensor configured to sense movement of a user; a microphone configured to obtain an audio signal based on the movement being sensed; a memory in which a speaker model is stored; and at least one processor including a digital signal processor (DSP) electrically connected to the microphone and an application processor (AP) electrically connected to the DSP, wherein the at least one processor is configured to: obtain a voice signal from the audio signal; compare the voice signal with the speaker model to verify a user; based on a verification result indicating that the user corresponds to a pre-enrolled speaker, perform an operation corresponding to the obtained voice signal, and when the movement is sensed, transmit a buffering signal to the microphone such that the audio signal is obtained after a preset point in time when the movement is sensed, and while transmitting the buffering signal, change a state of the AP from a sleep state to an activation state such that the AP recognizes a command from the obtained voice signal after the buffering signal is transmitted.

16. The wearable electronic device of claim 15, wherein the microphone obtains the audio signal after a preset time from a point in time when the movement is sensed.

17. The wearable electronic device of claim 15, wherein the sensor includes at least one of an acceleration sensor, a gyro sensor, a gravity sensor, and a geomagnetic sensor.

18. The wearable electronic device of claim 15, wherein the at least one processor is further configured to: normalize a feature value of the obtained voice signal to generate the speaker model; and output information about whether the speaker model is generated, to a display.
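Claims 3 and 4 separate voice from noise using signal energy and the zero crossing rate. The sketch below shows one common way to compute both quantities per frame. The mean-squared-amplitude definition of energy, the default critical value of 0.01, and the names `frame_energy`, `zero_crossing_rate`, and `classify_by_energy` are assumptions for illustration. The claims do not specify how the zero crossing rate is turned into a decision, so this example only computes it.

```python
from typing import Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (an assumed definition of 'energy')."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


def classify_by_energy(frame: Sequence[float], critical_value: float = 0.01) -> str:
    """Label a frame as in claim 3: energy >= critical value is voice, else noise.

    The default critical value is a placeholder, not a figure from the patent.
    """
    return "voice" if frame_energy(frame) >= critical_value else "noise"
```

A practical detector would typically tune the critical value to the microphone and environment, and might combine energy with the zero crossing rate or SNR (claim 5) rather than rely on a single measure.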

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • the extracted parameters being zero crossing rates · CPC title

  • using predictive techniques · CPC title

  • G10L17/04 · Primary

    Training, enrolment or model building · CPC title

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • User authentication · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11074910B2 cover?
An electronic device includes a microphone obtaining an audio signal, a memory in which a speaker model is stored, and at least one processor. The at least one processor is configured to obtain a voice signal from the audio signal, to compare the voice signal with the speaker model to verify a user, and, if a verification result indicates that the user corresponds to a pre-enrolled speaker, to …
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L17/04. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 27, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).