Isolated word training and detection

US2016189706A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2016189706-A1
Application number  US-201514606588-A
Country             US
Kind code           A1
Filing date         Jan 27, 2015
Priority date       Dec 30, 2014
Publication date    Jun 30, 2016
Grant date          (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatuses are described for isolated word training and detection. Isolated word training devices and systems are provided in which a user may provide a wake-up phrase from 1 to 3 times to train the device or system. A concatenated phoneme model of the user-provided wake-up phrase may be generated based on the provided wake-up phrase and a pre-trained phoneme model database. A word model of the wake-up phrase may be subsequently generated from the concatenated phoneme model and the provided wake-up phrase. Once trained, the user-provided wake-up phrase may be used to unlock the device or system and/or to wake up the device or system from a standby mode of operation. The word model of the user-provided wake-up phrase may be further adapted based on additional provisioning of the wake-up phrase.
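The concatenation step described in the abstract can be illustrated with a minimal sketch. It assumes that each pre-trained phoneme model is a small left-to-right HMM described only by its per-state self-loop probabilities; the database contents, the phoneme identifiers, and the function name `concatenate_phoneme_models` are hypothetical and are not taken from the patent. The sketch covers only the chaining of phoneme models, not the later adaptation into a word model.

```python
from typing import Dict, List

# Hypothetical pre-trained phoneme model database (assumed values): each
# phoneme identifier maps to the self-loop probability of each state of a
# left-to-right HMM.
PHONEME_DB: Dict[str, List[float]] = {
    "HH": [0.6, 0.7, 0.5],
    "EH": [0.8, 0.8, 0.7],
    "L": [0.7, 0.6, 0.6],
    "OW": [0.8, 0.9, 0.7],
}


def concatenate_phoneme_models(
    transcription: List[str], db: Dict[str, List[float]]
) -> List[List[float]]:
    """Chain per-phoneme left-to-right HMMs into one transition matrix."""
    # Lay the states of every phoneme in the transcription end to end.
    self_loops = [p for phoneme in transcription for p in db[phoneme]]
    n = len(self_loops)
    matrix = [[0.0] * n for _ in range(n)]
    for i, stay in enumerate(self_loops):
        if i + 1 < n:
            matrix[i][i] = stay                # remain in the current state
            matrix[i][i + 1] = 1.0 - stay      # advance to the next state
        else:
            # Simplification: the final state absorbs. A fuller model would
            # keep an exit probability instead.
            matrix[i][i] = 1.0
    return matrix


word_chain = concatenate_phoneme_models(["HH", "EH", "L", "OW"], PHONEME_DB)
print(len(word_chain), "states in the concatenated model")
```

The last state of one phoneme hands over directly to the first state of the next, which is one common way to build a word-level chain from sub-word units; the patent may structure the hand-over differently.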

First claim

Opening claim text (preview).

What is claimed is:

1. An isolated word training system that comprises: an input component configured to receive at least one audio input representation; a recognition component configured to generate a phoneme concatenation model of the at least one audio input representation based on a phoneme transcription; an adaptation component configured to generate a first word model of the at least one audio input based on the phoneme concatenation model.

2. The isolated word training system of claim 1, wherein the adaptation component is further configured to generate a second word model based on the first word model and at least one additional audio input representation.

3. The isolated word training system of claim 1, that further comprises: a phoneme model database that includes the one or more stored phoneme models, each of the one or more stored phoneme models being a pre-trained Hidden Markov Model.

4. The isolated word training system of claim 1, wherein the recognition component is configured to generate the phoneme concatenation model by: decoding feature vectors of the at least one audio input using the one or more stored phoneme models to generate the phoneme transcription that comprises one or more phoneme identifiers; selecting a subset of phoneme identifiers from the phoneme transcription; and selecting corresponding phoneme models from a phoneme model database for each phoneme identifier in the subset as the phoneme concatenation model.

5. The isolated word training system of claim 4, that further comprises: a feature extraction component configured to: derive speech features of speech of the at least one audio input; and generate the feature vectors based on the speech features.

6. The isolated word training system of claim 5, that further comprises: a voice activity detection component configured to: detect an onset of the speech; determine a termination of the speech; and provide the speech to the feature extraction component.

7. The isolated word training system of claim 1, wherein the adaptation component is configured to generate the first word model by at least one of: removing one or more phonemes from the phoneme concatenation model; combining one or more phonemes of the phoneme concatenation model; adapting one or more state transition probabilities of the phoneme concatenation model; adapting an observation symbol probability distribution of the phoneme concatenation model; or adapting the phoneme concatenation model as a whole.

8. An electronic device that comprises: a first processing component configured to: receive a user-specified wake-up phrase, generate a phoneme concatenation model of the user-specified wake-up phrase, and generate a word model of the user-specified wake-up phrase based on the phoneme concatenation model; and a second processing component configured to: detect audio activity of the user, and determine if the user-specified wake-up phrase is present within the audio activity based on the word model.

9. The electronic device of claim 8, wherein the first processing component is further configured to operate in a training mode of the electronic device, and wherein the second processing component is further configured to: operate in a stand-by mode of the electronic device, and provide, in the stand-by mode, an indication to the electronic device to operate in a normal operating mode subsequent to a determination that the user-specified wake-up phrase is present in the audio activity.

10. The electronic device of claim 9, that further comprises: a memory component configured to: buffer the audio activity and the user-specified wake-up phrase of the user; and provide the audio activity or the user-specified wake-up phrase to at least one of a voice activity detector or an automatic speech recognition component.

11. The electronic device of claim 9, wherein the first processing component, in the training mode, is further configured to: decode feature vectors of the at least one audio input using the one or more stored phoneme models to generate a phoneme transcription that comprises one or more phoneme identifiers; select a subset of phoneme identifiers from the phoneme transcription; and select corresponding phoneme models from a phoneme model database for each phoneme identifier in the subset as the phoneme concatenation model.

12. The electronic device of claim 11, wherein the first processing component, in the training mode, is further configured to: derive speech features of the user-specified wake-up phrase; and generate the feature vectors based on the speech features.

13. The electronic device of claim 9, wherein the second processing component, in the stand-by mode, is further configured to: derive speech features of the audio activity; generate feature vectors based on the speech features; and determine if the user-specified wake-up phrase is present by comparing the feature vectors to the word model.

14. The electronic device of claim 8, wherein the first processing component is further configured to: generate the first word model by at least one of: removing one or more phonemes from the phoneme concatenation model; combining one or more phonemes of the phoneme concatenation model; adapting one or more state transition probabilities of the phoneme concatenation model; adapting an observation symbol probability distribution of the phoneme concatenation model; or adapting the phoneme concatenation model as a whole.

15. The electronic device of claim 8, wherein the first processing component is further configured to update the word model based on the user-specified wake-up phrase in a subsequent audio input.

16. The electronic device of claim 8, that further comprises: a phoneme model database that includes one or more stored phoneme models, each of the one or more stored phoneme models being a pre-trained Hidden Markov Model.

17. A computer-readable storage medium having program instructions recorded thereon that, when executed by an electronic device, perform a method for utilizing a user-specified wake-up phrase in the electronic device, the method comprising: training the electronic device using the user-specified wake-up phrase, received from a user, to generate, at the electronic device, a user-dependent word model of the user-specified wake-up phrase; and transitioning the electronic device from a stand-by mode to a normal operating mode subsequent to a detection, that is based on the word model, of the user-specified wake-up phrase.

18. The computer-readable storage medium of claim 17, wherein training the electronic device using the user-specified wake-up phrase includes receiving the user-specified wake-up phrase from the user no more than three times.

19. The computer-readable storage medium of claim 18, wherein the electronic device generates the word model based on a phoneme concatenation model of the user-specified wake-up phrase.

20. The computer-readable storage medium of claim 19, wherein the training further comprises: generating a phoneme transcription of the user-specified wake-up phrase that includes one or more phoneme identifiers; and generating the phoneme concatenation model of the user-specified wake-up phrase based on stored phoneme models that correspond to the one or more phoneme identifiers.
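The mode transitions recited in claims 9, 17, and 18 (training mode, stand-by mode, normal operating mode, with the wake-up phrase received no more than three times) can be sketched as a small state machine. This is an illustrative sketch only: the class `WakeUpDevice`, its method names, and `toy_detector` are hypothetical, and the detector is a placeholder that stands in for the word-model comparison rather than implementing HMM scoring.

```python
from enum import Enum
from typing import Callable, List, Sequence


class Mode(Enum):
    TRAINING = "training"
    STANDBY = "stand-by"
    NORMAL = "normal"


MAX_TRAINING_UTTERANCES = 3  # claim 18: phrase received no more than three times

Detector = Callable[[Sequence[float], List[Sequence[float]]], bool]


class WakeUpDevice:
    """Hypothetical device that trains on a phrase, then waits for it."""

    def __init__(self, detector: Detector) -> None:
        self.mode = Mode.TRAINING
        self.training_utterances: List[Sequence[float]] = []
        self._detector = detector  # assumed stand-in for word-model matching

    def train(self, utterance: Sequence[float]) -> None:
        if self.mode is not Mode.TRAINING:
            raise RuntimeError("device is not in training mode")
        if len(self.training_utterances) >= MAX_TRAINING_UTTERANCES:
            raise ValueError("wake-up phrase already provided three times")
        self.training_utterances.append(utterance)

    def finish_training(self) -> None:
        if not self.training_utterances:
            raise ValueError("at least one training utterance is required")
        self.mode = Mode.STANDBY

    def on_audio(self, audio: Sequence[float]) -> Mode:
        # Only stand-by mode listens for the wake-up phrase.
        if self.mode is Mode.STANDBY and self._detector(audio, self.training_utterances):
            self.mode = Mode.NORMAL
        return self.mode


def toy_detector(audio: Sequence[float], templates: List[Sequence[float]]) -> bool:
    """Placeholder: exact match against a stored utterance (not HMM scoring)."""
    return any(list(audio) == list(t) for t in templates)
```

A short usage example under the same assumptions: create `WakeUpDevice(toy_detector)`, call `train([1.0, 2.0])` and `finish_training()`, and then `on_audio([1.0, 2.0])` moves the device from stand-by to the normal operating mode, while non-matching audio leaves it in stand-by.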

Assignees

Broadcom Corp

Inventors

Classifications

  • G10L15/063 (Primary)

    Training · CPC title

  • G10L15/02 (Primary)

    Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • Phonemes, fenemes or fenones being the recognition units · CPC title

  • Word boundary detection · CPC title

  • Training of HMMs · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016189706A1 cover?
Methods, systems, and apparatuses are described for isolated word training and detection. Isolated word training devices and systems are provided in which a user may provide a wake-up phrase from 1 to 3 times to train the device or system. A concatenated phoneme model of the user-provided wake-up phrase may be generated based on the provided wake-up phrase and a pre-trained phoneme model database.
Who is the assignee on this patent?
Broadcom Corp
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 30, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).