Related publication: System and method for enhancing speech activity detection using facial feature detection (US-2016189733-A1 · Jun 30, 2016 · US)
US11074910B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11074910-B2 |
| Application number | US-201815866072-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 9, 2018 |
| Priority date | Jan 9, 2017 |
| Publication date | Jul 27, 2021 |
| Grant date | Jul 27, 2021 |
Abstract
An electronic device includes a microphone obtaining an audio signal, a memory in which a speaker model is stored, and at least one processor. The at least one processor is configured to obtain a voice signal from the audio signal, to compare the voice signal with the speaker model to verify a user, and, if a verification result indicates that the user corresponds to a pre-enrolled speaker, to perform an operation corresponding to the obtained voice signal.
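The abstract describes a gate: a voice command is acted on only after the voice signal has been matched against the stored speaker model. The following is a minimal sketch of that gate. It assumes the voice signal has already been reduced to a fixed-length feature vector (for example averaged MFCCs, one of the options claim 8 names). The cosine-similarity scoring, the unit-length normalization, the 0.8 threshold, and every function name are illustrative assumptions. The claims instead list HMM, GMM, SVM, i-vector, PLDA, and DNN back ends.

```python
import math
from typing import Callable, List, Optional, Sequence

# Hypothetical decision threshold; the patent text gives no number.
SIMILARITY_THRESHOLD = 0.8


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit length (one possible way to 'normalize a feature value')."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def enroll(feature_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Toy speaker model (assumption): the unit-length mean of enrollment feature vectors."""
    dim = len(feature_vectors[0])
    count = len(feature_vectors)
    mean = [sum(v[i] for v in feature_vectors) / count for i in range(dim)]
    return l2_normalize(mean)


def verify(voice_features: Sequence[float], speaker_model: Sequence[float],
           threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Cosine similarity between the normalized probe and the unit-length stored model."""
    probe = l2_normalize(voice_features)
    score = sum(a * b for a, b in zip(probe, speaker_model))
    return score >= threshold


def handle_voice(voice_features: Sequence[float], speaker_model: Sequence[float],
                 operation: Callable[[], str]) -> Optional[str]:
    """Run the requested operation only if the speaker matches the enrolled model."""
    if verify(voice_features, speaker_model):
        return operation()
    return None


if __name__ == "__main__":
    model = enroll([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]])
    print(handle_voice([1.0, 0.05, 0.05], model, lambda: "command executed"))  # command executed
    print(handle_voice([0.0, 0.0, 1.0], model, lambda: "command executed"))    # None
```

Keeping the model at unit length makes the dot product in `verify` equal to cosine similarity. Any of the classifiers named in claim 9 could replace this scoring step without changing the surrounding gate.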
Claims
What is claimed is:
1. An electronic device comprising: a microphone configured to obtain an audio signal; a scene classifier; a sensor; a memory in which a speaker model is stored; and at least one processor, wherein the at least one processor is configured to: classify, by the scene classifier, the audio signal as user speech or noise, based on distribution of the audio signal; in response to classifying the audio signal as noise, control not to receive the audio signal through the microphone; in response to classifying the audio signal as user speech, obtain a voice signal from the audio signal and compare the voice signal with the speaker model to verify a user; based on a verification result indicating that the user corresponds to a pre-enrolled speaker, perform an operation corresponding to the obtained voice signal; and verify the user based on a similarity between the speaker model and a talk model based on talk contents between the pre-enrolled speaker and another speaker, wherein, when a movement of the electronic device is sensed by the sensor, a buffering signal is transmitted to the microphone such that the audio signal is obtained after a preset point in time when the movement is sensed, and wherein, while transmitting the buffering signal, a state of the processor is changed from a sleep state to an activation state such that the processor recognizes a command from the obtained voice signal after the buffering signal is transmitted.
2. The electronic device of claim 1, wherein the at least one processor includes a digital signal processor (DSP) electrically connected to the microphone and an application processor (AP) electrically connected to the DSP, wherein the DSP performs an operation of verifying the user and changes the state of the AP from the sleep state to the activation state based on the verification result indicating the user is the pre-enrolled speaker, and wherein the AP recognizes the command from the obtained voice signal and performs an operation associated with the command.
3. The electronic device of claim 1, wherein the at least one processor is further configured to: determine that a signal having energy, a magnitude of which is greater than or equal to a critical value, in the audio signal is the voice signal; and determine that a signal having energy, the magnitude of which is less than the critical value, is noise.
4. The electronic device of claim 1, wherein the at least one processor is further configured to: obtain the voice signal based on a zero crossing rate of the audio signal.
5. The electronic device of claim 1, wherein the at least one processor is further configured to: obtain the voice signal based on a signal to noise ratio (SNR).
6. The electronic device of claim 1, wherein the at least one processor is further configured to: obtain the voice signal based on a distribution of the audio signal.
7. The electronic device of claim 1, wherein the at least one processor is further configured to: compare a feature value of the voice signal with a feature value of the speaker model to verify the user.
8. The electronic device of claim 7, wherein at least one of the feature value of the voice signal and the feature value of the speaker model includes at least one of linear prediction coding (LPC) and mel-frequency cepstral coefficients (MFCC).
9. The electronic device of claim 1, wherein the at least one processor is further configured to: verify the user by using at least one of a hidden Markov model (HMM), a Gaussian mixture model (GMM), a support vector machine (SVM), i-vector, probabilistic linear discriminant analysis (PLDA), and a deep neural network (DNN).
10. The electronic device of claim 1, wherein the at least one processor is further configured to: verify the user based on a similarity between the speaker model and a universal background model (UBM).
11. The electronic device of claim 1, wherein the at least one processor is further configured to: obtain the voice signal through the microphone under a specified condition; and normalize a feature value of the obtained voice signal to generate the speaker model.
12. The electronic device of claim 11, wherein the at least one processor is further configured to: based on the electronic device transmitting a call to an external device, obtain the voice signal through the microphone.
13. The electronic device of claim 11, wherein the at least one processor is further configured to: based on a recording application being executed, obtain the voice signal through the microphone.
14. The electronic device of claim 1, wherein the at least one processor is further configured to: output information about whether the speaker model is generated, through a display.
15. A wearable electronic device comprising: a sensor configured to sense movement of a user; a microphone configured to obtain an audio signal based on the movement being sensed; a memory in which a speaker model is stored; and at least one processor including a digital signal processor (DSP) electrically connected to the microphone and an application processor (AP) electrically connected to the DSP, wherein the at least one processor is configured to: obtain a voice signal from the audio signal; compare the voice signal with the speaker model to verify a user; based on a verification result indicating that the user corresponds to a pre-enrolled speaker, perform an operation corresponding to the obtained voice signal, and when the movement is sensed, transmit a buffering signal to the microphone such that the audio signal is obtained after a preset point in time when the movement is sensed, and while transmitting the buffering signal, change a state of the AP from a sleep state to an activation state such that the AP recognizes a command from the obtained voice signal after the buffering signal is transmitted.
16. The wearable electronic device of claim 15, wherein the microphone obtains the audio signal after a preset time from a point in time when the movement is sensed.
17. The wearable electronic device of claim 15, wherein the sensor includes at least one of an acceleration sensor, a gyro sensor, a gravity sensor, and a geomagnetic sensor.
18. The wearable electronic device of claim 15, wherein the at least one processor is further configured to: normalize a feature value of the obtained voice signal to generate the speaker model; and output information about whether the speaker model is generated, to a display.
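Claims 3 and 4 separate voice from noise in two ways: by comparing frame energy against a critical value, and by the zero crossing rate. The following is a minimal sketch of one way to combine them. It assumes non-overlapping frames of float samples. The critical value, the ZCR ceiling, and the rule that a high ZCR marks a frame as noise are hypothetical choices, not values or rules taken from the patent.

```python
import math
from typing import List, Sequence

# Hypothetical parameters; claim 3 names a "critical value" but gives no number.
CRITICAL_ENERGY = 0.01
MAX_ZCR = 0.5


def frames(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Split a signal into non-overlapping frames, dropping a trailing partial frame."""
    return [list(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counted as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame: Sequence[float], critical_energy: float = CRITICAL_ENERGY,
                   max_zcr: float = MAX_ZCR) -> str:
    """Label a frame 'voice' or 'noise'.

    Energy rule (claim 3): energy >= critical value means voice, otherwise noise.
    ZCR rule (assumption): a very high crossing rate suggests broadband noise.
    """
    if short_time_energy(frame) < critical_energy:
        return "noise"
    if zero_crossing_rate(frame) > max_zcr:
        return "noise"
    return "voice"


if __name__ == "__main__":
    rate = 8000
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]  # 200 Hz tone
    quiet = [0.001 * (-1) ** n for n in range(400)]                              # low-energy signal
    print([classify_frame(f) for f in frames(tone + quiet, 200)])
    # ['voice', 'voice', 'noise', 'noise']
```

The energy test alone already implements claim 3. The ZCR test is layered on top only to illustrate how claim 4's feature could refine the decision. A real detector would also tune the frame length to the sampling rate.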
CPC classification titles:
the extracted parameters being zero crossing rates
using predictive techniques
Training, enrolment or model building
Procedures used during a speech recognition process, e.g. man-machine dialogue
User authentication