Learning dialogue state tracking with limited labeled data
US-2021174798-A1 · Jun 10, 2021 · US
US11216619B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11216619-B2 |
| Application number | US-202016860565-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 28, 2020 |
| Priority date | Apr 28, 2020 |
| Publication date | Jan 4, 2022 |
| Grant date | Jan 4, 2022 |
Official abstract text for this publication.
A mechanism is provided to implement a text classifier training augmentation mechanism for incorporating unlabeled data into the generation of a text classifier. For each term of a plurality of terms in each document of a plurality of documents in a set of unlabeled data, a term frequency value is determined. The term is normalized by dividing the term frequency value by a total number of terms in the document. An inverse document frequency (idf) value is determined for each term based on the term frequency value. A subset of terms is filtered from the plurality of terms based on the determined idf values. The idf values for the remaining terms are transformed into feature weights. Terms from a set of labeled data are re-weighted based on the feature weights determined from the set of unlabeled data. The text classifier is then generated using the re-weighted labeled data.
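The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the `min_idf` filter threshold, the scaling of idf values into the range [0, 1], and the `default` weight for terms absent from the feature weights are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter


def idf_from_unlabeled(docs):
    """Compute IDF(t) = ln(total documents / documents containing t)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def feature_weights(idf, min_idf=0.1):
    """Filter low-idf terms, then scale the remaining idf values to [0, 1].

    Both the threshold and the scaling are hypothetical choices.
    """
    kept = {term: value for term, value in idf.items() if value >= min_idf}
    if not kept:
        return {}
    top = max(kept.values())
    return {term: value / top for term, value in kept.items()}


def reweight(doc, weights, default=0.0):
    """Re-weight a labeled document: normalized term frequency times weight.

    Terms without a feature weight get `default` (an assumption).
    """
    counts = Counter(doc)
    total = len(doc)
    return {term: (count / total) * weights.get(term, default)
            for term, count in counts.items()}


unlabeled = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
idf = idf_from_unlabeled(unlabeled)
weights = feature_weights(idf)
labeled_features = reweight(["the", "dog", "sat"], weights)
```

In this toy corpus "the" occurs in every document, so its idf is zero and the filter removes it, while the rarer terms "dog" and "sat" keep the largest weights.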
Opening claim text (preview).
What is claimed is:

1. A method, in a data processing system, comprising at least one processor and at least one memory, wherein the at least one memory comprises instructions that are executed by the at least one processor to configure the at least one processor to implement a text classifier training augmentation mechanism for incorporating unlabeled data in addition to labeled data into the generation of a text classifier, the method comprising: determining, by the text classifier training augmentation mechanism, an inverse document frequency (idf) value for each term in a plurality of terms in a set of unlabeled data; re-weighting, by the text classifier training augmentation mechanism, terms from a set of labeled data based on the idf values for the plurality of terms in the set of unlabeled data; generating, by the text classifier training augmentation mechanism, a set of normalized sample reweights based on a similarity between each sentence in the set of labeled data and each sentence in the set of unlabeled data; generating, by the text classifier training augmentation mechanism, a set of augmented sentences based on the plurality of sentences in the set of unlabeled data; performing, by the text classifier training augmentation mechanism, an inter-sample agreement check to identify a consistency loss value between the plurality of sentences in the set of unlabeled data and the set of augmented sentences; and generating, by a machine learning mechanism, the text classifier using the re-weighted labeled data, the set of normalized sample reweights, and the consistency loss value, wherein the machine learning mechanism generates the text classifier using the plurality of sentences in the set of unlabeled data, the set of augmented sentences from the set of unlabeled data, and the consistency loss value using the following loss function:

Loss(original example) + alpha * Loss(weighted example) + gamma * Consistency_loss(unlabeled samples)

where alpha and gamma are hyperparameters that are user configurable.

2. The method of claim 1, further comprising: weighing down, by the text classifier training augmentation mechanism, frequent terms in the plurality of terms while scaling up rare terms in the plurality of terms by computing the idf value for the term using the following equation:

$$\mathrm{IDF}(t) = \log_e\left(\frac{\text{Total number of documents}}{\text{Number of documents with term } t}\right).$$

3. The method of claim 1, wherein generating the set of normalized sample reweights further comprises: generating, by the text classifier training augmentation mechanism, a sentence representation for each sentence of a plurality of sentences in the set of unlabeled data; computing, by the text classifier training augmentation mechanism, a cosine similarity between each sentence representation of a plurality of sentences in the set of labeled data and each sentence representation of the set of unlabeled data; determining, by the text classifier training augmentation mechanism, a weighted sum of the similarities for each sentence in the set of labeled data; and normalizing, by the text classifier training augmentation mechanism, the weighted sums over all the plurality of sentences in the labeled data thereby producing the set of normalized sample reweights.

4. The method of claim 3, wherein the Loss(weighted example) is equal to a learned-weight multiplied by the Loss(original example).

5. The method of claim 1, wherein performing the inter-sample agreement check further comprises: generating, by the text classifier training augmentation mechanism, a sentence representation for each sentence in the set of unlabeled data thereby generating the set of augmented sentences; and identifying, by the text classifier training augmentation mechanism, a prediction distribution between the plurality of sentences in the set of unlabeled data and the set of augmented sentences from the set of unlabeled data.

6. The method of claim 5, wherein the Consistency_loss is an inverse to a similarity between at least one generated sentence and at least one associated unlabeled sentence.

7. The method of claim 1, wherein determining the inverse document frequency (idf) value for each term in the plurality of terms comprises: for each term of the plurality of terms in the set of unlabeled data, determining a term frequency value; normalizing the term by dividing the term frequency value by a total number of terms in the document.

8. The method of claim 7, further comprising: filtering a subset of terms from the plurality of terms based on the determined idf values; and transforming the idf values for the remaining terms into feature weights.

9. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a data processing system, causes the data processing system to implement a text classifier training augmentation mechanism for incorporating unlabeled data in addition to labeled data into the generation of a text classifier, and further causes the data processing system to: determine an inverse document frequency (idf) value for each term in a plurality of terms in
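The sample reweighting of claim 3 and the loss function of claim 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation. The function names are hypothetical, the "weighted sum" of similarities uses uniform weights, the consistency loss is taken as one minus a cosine similarity between two prediction distributions (one reading of "an inverse to a similarity" in claim 6), and averaging the per-example losses is an aggregation choice the claims do not state.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sample_reweights(labeled_vecs, unlabeled_vecs):
    """One normalized weight per labeled sentence (claim 3).

    Assumption: similarities are summed with uniform weights and the
    sentence representations give non-negative similarities.
    """
    sums = [sum(cosine(lab, unl) for unl in unlabeled_vecs)
            for lab in labeled_vecs]
    total = sum(sums)
    if total == 0:
        return [1.0 / len(sums)] * len(sums)
    return [s / total for s in sums]


def consistency_loss(pred_original, pred_augmented):
    """Assumed form: 1 - cosine similarity of two prediction distributions."""
    return 1.0 - cosine(pred_original, pred_augmented)


def total_loss(original_losses, reweights, consistency_losses,
               alpha=1.0, gamma=1.0):
    """Loss(original) + alpha * Loss(weighted) + gamma * Consistency_loss."""
    loss_original = sum(original_losses) / len(original_losses)
    # Claim 4: the weighted loss is a weight multiplied by the original loss.
    loss_weighted = sum(w * l for w, l in zip(reweights, original_losses))
    loss_consistency = sum(consistency_losses) / len(consistency_losses)
    return loss_original + alpha * loss_weighted + gamma * loss_consistency


labeled = [[1.0, 0.0], [0.0, 1.0]]
unlabeled = [[1.0, 0.0], [1.0, 0.0]]
reweights = sample_reweights(labeled, unlabeled)
agreement = consistency_loss([0.5, 0.5], [0.5, 0.5])
loss = total_loss([2.0, 4.0], reweights, [agreement], alpha=0.5, gamma=2.0)
```

Here the first labeled sentence matches both unlabeled sentences and receives all of the weight, so only its loss contributes to the alpha term. In a real training loop the per-example losses would come from the classifier, and alpha and gamma would be tuned as the user-configurable hyperparameters named in claim 1.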
Combinations of networks · CPC title
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
Supervised learning · CPC title
Learning methods · CPC title
Semantic analysis · CPC title