US2016336006A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016336006-A1 |
| Application number | US-201514711447-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 13, 2015 |
| Priority date | May 13, 2015 |
| Publication date | Nov 17, 2016 |
| Grant date | — |
Abstract
A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
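The filtering and training steps in the abstract can be illustrated with a minimal Python sketch. It assumes a two-feature vector built from the features named in claim 4 (item length in characters and percentage of vowels), Euclidean distance as the similarity measure, and logistic regression fit by batch gradient descent as the classifier. The publication does not fix any of these choices, and the names `feature_vector`, `build_unspeakable_corpus`, and `train_logistic_regression` are hypothetical.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence

VOWELS = set("aeiouAEIOU")


def feature_vector(item: str) -> tuple[float, float]:
    """Assumed features (claim 4): length in characters and the
    percentage of alphabetic characters that are vowels."""
    letters = [c for c in item if c.isalpha()]
    if not letters:
        return (float(len(item)), 0.0)
    vowel_pct = 100.0 * sum(c in VOWELS for c in letters) / len(letters)
    return (float(len(item)), vowel_pct)


def build_unspeakable_corpus(
    spoken: Sequence[str], typed: Sequence[str], threshold: float
) -> list[str]:
    """Drop every typed item whose feature vector lies within `threshold`
    (Euclidean distance, an assumed metric) of any spoken-corpus vector."""
    spoken_vecs = [feature_vector(s) for s in spoken]
    return [
        item
        for item in typed
        if all(math.dist(feature_vector(item), v) > threshold for v in spoken_vecs)
    ]


def _sigmoid(score: float) -> float:
    # Numerically stable logistic function.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def train_logistic_regression(
    speakable: Sequence[str], unspeakable: Sequence[str],
    lr: float = 0.1, epochs: int = 2000,
) -> Callable[[str], float]:
    """Assumed classifier: logistic regression on standardized features.
    Label 1 = spoken corpus, 0 = unspeakable corpus.
    Returns a function giving P(speakable | item)."""
    xs = [feature_vector(t) for t in speakable] + [feature_vector(t) for t in unspeakable]
    ys = [1.0] * len(speakable) + [0.0] * len(unspeakable)
    n, dims = len(xs), len(xs[0])
    means = [sum(x[d] for x in xs) / n for d in range(dims)]
    # Population standard deviation; fall back to 1.0 for a constant feature.
    stds = [math.sqrt(sum((x[d] - means[d]) ** 2 for x in xs) / n) or 1.0
            for d in range(dims)]

    def standardize(x):
        return [(x[d] - means[d]) / stds[d] for d in range(dims)]

    zs = [standardize(x) for x in xs]
    w, b = [0.0] * dims, 0.0
    for _ in range(epochs):
        # Gradient of the mean logistic loss with respect to w and b.
        grad_w, grad_b = [0.0] * dims, 0.0
        for z, y in zip(zs, ys):
            err = _sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - y
            for d in range(dims):
                grad_w[d] += err * z[d] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b

    def predict_proba(item: str) -> float:
        z = standardize(feature_vector(item))
        return _sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)

    return predict_proba


if __name__ == "__main__":
    spoken = ["turn on the lights", "call mom", "what is the weather today"]
    typed = ["turn off the lights", "lol #tbt @bff http://t.co/x", "brb", "xD <3 <3"]
    unspeakable = build_unspeakable_corpus(spoken, typed, threshold=5.0)
    print(unspeakable)  # ['lol #tbt @bff http://t.co/x', 'brb', 'xD <3 <3']
    predict = train_logistic_regression(spoken, unspeakable)
    print(predict("call dad"), predict("#tbt #ff"))
```

In this toy run, "turn off the lights" lies within the threshold of "turn on the lights" and is removed, so only the short, vowel-poor items reach the unspeakable corpus. With just two coarse features the classifier is crude; the sketch shows the data flow, not realistic accuracy.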
Claims
What is claimed is:

1. A computer system for language modeling, the computer system comprising: a processor configured to execute computer-executable instructions; and memory storing computer-executable instructions configured to: collect training data from one or more information sources; generate a spoken corpus containing text of transcribed speech; generate a typed corpus containing typed text; derive feature vectors from the spoken corpus; analyze the typed corpus to determine feature vectors representing items of typed text; generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus; derive feature vectors from the unspeakable corpus; and train a classifier based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.

2. The computer system of claim 1, wherein common features are expressed by the feature vectors derived from the spoken corpus, the feature vectors representing items of typed text, and the feature vectors derived from the unspeakable corpus.

3. The computer system of claim 1, wherein the typed corpus contains typed text generated by users of a social networking service.

4. The computer system of claim 1, wherein the feature vector derived from the spoken corpus presents features including item length and percentage of vowels.

5. The computer system of claim 1, wherein the classifier is trained to predict whether an item of text is speakable enough to be used as training data for language modeling.

6. The computer system of claim 1, wherein the memory further stores computer-executable instructions configured to: collect new typed text from one or more of the information sources; determine feature vectors representing items of new typed text; employ the classifier to predict whether each item of new typed text is speakable based on a feature vector representing the item of new typed text; generate a speakable corpus containing only items of new typed text that are predicted to be speakable; and train a language model based on the speakable corpus.

7. The computer system of claim 6, wherein the memory further stores computer-executable instructions configured to: train the language model based on the spoken corpus.

8. The computer system of claim 6, wherein the language model is a statistical language model for determining a conditional probability of an item given one or more previous items.

9. The computer system of claim 6, wherein the memory further stores computer-executable instructions configured to: perform speech recognition based on the language model.

10. A computer-implemented method for language modeling performed by a computer system including one or more computing devices, the computer-implemented method comprising: collecting training data from one or more information sources; generating a spoken corpus containing text of transcribed speech; generating a typed corpus containing typed text; deriving feature vectors from the spoken corpus; generating an unspeakable corpus by filtering the typed corpus to remove each item of typed text that is within a similarity threshold of one or more items in the spoken corpus; deriving feature vectors from the unspeakable corpus; and training a classifier based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.

11. The computer-implemented method of claim 10, wherein the unspeakable corpus is generated by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a threshold distance of a feature vector derived from the spoken corpus.

12. The computer-implemented method of claim 11, wherein common features are expressed by the feature vectors derived from the spoken corpus, feature vectors representing items of typed text, and the feature vectors derived from the unspeakable corpus.

13. The computer-implemented method of claim 10, wherein the typed corpus contains typed text generated by users of a social networking service.

14. The computer-implemented method of claim 10, further comprising: collecting new typed text from one or more of the information sources; determining a feature vector representing an item of new typed text; and employing the classifier to predict whether the item of new typed text is speakable based on the feature vector representing the item of new typed text.

15. The computer-implemented method of claim 14, further comprising: generating a speakable corpus containing only items of new typed text that are predicted to be speakable; and training a language model based on the speakable corpus.

16. The computer-implemented method of claim 15, further comprising training the language model based on the spoken corpus.

17. The computer-implemented method of claim 15, further comprising performing speech recognition based on the language model.

18. A computer-readable storage medium storing computer-executable instructions that, when executed by a computing device, cause the computing device to implement: a training data collection component configured to generate a spoken corpus containing text of transcribed speech and a typed corpus containing typed text; a filtering component configured to generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus; and a classifier training component configured to train a classifier based on feature vectors derived from the spoken corpus and feature vectors derived from the unspeakable corpus.

19. The computer-readable storage medium of claim 18, further storing computer-executable instructions that, when executed by a computing device, cause the computing device to implement: a feature extraction component configured to determine feature vectors representing items of new typed text and employ the classifier to predict whether each item of new typed text is speakable based on a feature vector representing the item of new typed text.

20. The computer-readable storage medium of claim 19, further storing computer-executable instructions that, when executed by a computing device, cause the computing device to implement: a language model training component configured to train a language model based on a speakable corpus containing only items of new typed text that are predicted to be speakable.
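Claims 6 to 8 describe the downstream use of the classifier: keep only new typed text predicted to be speakable, then train a statistical language model on that speakable corpus together with the spoken corpus. The sketch below assumes a maximum-likelihood bigram model (one previous item of context, the simplest case of claim 8) without smoothing, and accepts the classifier as any Python callable. `build_speakable_corpus`, `train_bigram_lm`, and the `<s>`/`</s>` boundary tokens are illustrative assumptions, and the `"#" not in s` predicate in the usage is a hypothetical stand-in for a trained classifier.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Iterable


def build_speakable_corpus(
    new_typed: Iterable[str], is_speakable: Callable[[str], bool]
) -> list[str]:
    """Keep only the new typed items the classifier predicts to be speakable."""
    return [item for item in new_typed if is_speakable(item)]


def train_bigram_lm(sentences: Iterable[str]) -> Callable[[str, str], float]:
    """Maximum-likelihood bigram model (assumed; no smoothing):
    P(word | prev) = count(prev, word) / count(prev as a history)."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sentence in sentences:
        # <s> and </s> are assumed sentence-boundary tokens.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(prev: str, word: str) -> float:
        if history_counts[prev] == 0:
            return 0.0  # unseen history: no estimate without smoothing
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob


if __name__ == "__main__":
    spoken = ["turn on the lights", "call mom"]
    new_typed = ["turn on the radio", "lol #tbt"]
    # Hypothetical stand-in for the trained speakability classifier.
    speakable = build_speakable_corpus(new_typed, lambda s: "#" not in s)
    print(speakable)  # ['turn on the radio']
    # Claim 7: the language model is also trained on the spoken corpus.
    prob = train_bigram_lm(spoken + speakable)
    print(prob("the", "radio"))  # 0.5
    print(prob("turn", "on"))  # 1.0
```

A practical model would normally add smoothing so that unseen bigrams do not receive zero probability.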
CPC classification titles:
- Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- using distance or distortion measures between unknown speech and reference templates
- using natural language modelling
- updating or merging of old and new templates; Mean values; Weighting
- using lexical or orthographic knowledge sources