Contrastive meta-learning for zero-shot learning
US-2022382979-A1 · Dec 1, 2022 · US
US12321841B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12321841-B2 |
| Application number | US-202217972167-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 24, 2022 |
| Priority date | Oct 24, 2022 |
| Publication date | Jun 3, 2025 |
| Grant date | Jun 3, 2025 |
Official abstract text for this publication.
Unsupervised cross-domain data augmentation techniques for long-text document based prediction and explanation are provided. In one aspect, a system for long-document based prediction includes: an encoder for creating embeddings of long-document texts with hierarchical sparse self-attention, and making predictions using the embeddings of the long-document texts; and a multi-source counterfactual augmentation module for generating perturbed long-document texts using unlabeled sentences from at least one external source to train the encoder. A method for long-document based prediction is also provided.
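The perturbation idea in the abstract can be illustrated with a minimal sketch: swap one sentence of a document for a sentence from an external source, then measure how much the prediction moves. Everything named here is hypothetical and not taken from the patent: `toy_predict` is a keyword-counting stand-in for the encoder, and `perturb` and `prediction_shift` are illustrative helper names. The example sentences are invented for the demonstration.

```python
from typing import Callable, List, Sequence


def toy_predict(sentences: Sequence[str]) -> float:
    """Hypothetical stand-in for the encoder: a score in [0, 1) from keyword hits."""
    risky = {"loss", "decline", "lawsuit"}  # assumed vocabulary, for illustration only
    hits = sum(
        1
        for sentence in sentences
        for word in sentence.lower().split()
        if word.strip(".,") in risky
    )
    return hits / (hits + 1)


def perturb(document: Sequence[str], index: int, replacement: str) -> List[str]:
    """Return a copy of the document with one sentence replaced."""
    perturbed = list(document)
    perturbed[index] = replacement
    return perturbed


def prediction_shift(
    predict: Callable[[Sequence[str]], float],
    document: Sequence[str],
    index: int,
    replacement: str,
) -> float:
    """Absolute change in the prediction caused by the replacement."""
    return abs(predict(perturb(document, index, replacement)) - predict(document))


document = [
    "Revenue grew in the third quarter.",
    "We expect a decline in hardware sales.",
    "Operating costs were flat.",
]
external_sentence = "Analysts report steady demand for hardware."

shift = prediction_shift(toy_predict, document, 1, external_sentence)
print(f"prediction shift: {shift:.2f}")
```

A large shift suggests the replaced sentence mattered to the prediction. The claims describe scoring this effect with influence functions and example-based explanation rather than the direct difference used in this sketch.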
Opening claim text (preview).
What is claimed is:
1. A system for long-document based prediction, comprising: an encoder for creating embeddings of long-document texts with hierarchical sparse self-attention, and making predictions using the embeddings of the long-document texts, wherein the hierarchical sparse self-attention comprises multiple stacked layers of multi-head sparse self attention and one-dimensional convolutional filters with parameterized activation functions to capture long-range sentence-level dependencies; and wherein the encoder implements a sparsity matrix to filter out trivial attention weights and enable focus on attentively important sentences; and a multi-source counterfactual augmentation module for generating perturbed long-document texts using unlabeled sentences from at least one external source to train the encoder, wherein the multi-source counterfactual augmentation module enforces both semantic alignment through topic classification and task alignment through influence function scoring, and wherein a semi-supervised training protocol is used that alternates between supervised learning of the encoder and augmentation using multi-source data through multiple rounds until convergence, with each round comprising forty epochs of supervised encoder training followed by forty epochs of augmentation training, and wherein a bidirectional Kullback-Leibler (KL) regularization component is introduced to reduce model overfitting by enforcing consistency between output distributions of different sub-models generated by dropout, with a hyperparameter a controlling the weight of the KL divergence terms.
2. The system of claim 1, wherein the long-document texts comprise more than 500 sentences.
3. The system of claim 1, wherein the long-document texts comprise earnings call transcripts, and wherein the embeddings comprise embeddings of the earnings call transcripts.
4. The system of claim 3, wherein the encoder comprises a predictor with a fully-connected layer for predicting a significance level of market volatility over n-days following an earnings call using the embeddings of the earnings call transcripts.
5. The system of claim 3, wherein the at least one external source comprises financial news.
6. The system of claim 1, wherein the multi-source counterfactual augmentation module comprises a topic classifier for identifying salient sentences in the long-document texts for perturbation; and linking the salient sentences in the long-document texts to the unlabeled sentences from the at least one other source through topics.
7. The system of claim 6, wherein the multi-source counterfactual augmentation module comprises an unsupervised counterfactual augmentation module for replacing one of the salient sentences of the long-document texts with one of the unlabeled sentences from the at least one external source as a perturbation, and determining a degree by which the replacing changes the predictions.
8. The system of claim 7, wherein the determining is performed using example-based model explanation.
9. A method for long-document based prediction, the method comprising: creating, by an encoder, embeddings of long-document texts with hierarchical sparse self-attention, wherein the hierarchical sparse self-attention comprises multiple stacked layers of multi-head sparse self attention and one-dimensional convolutional filters with parameterized activation functions to capture long-range sentence-level dependencies; and wherein the encoder implements a sparsity matrix to filter out trivial attention weights and enable focus on attentively important sentences; training the encoder using perturbed long-document texts generated by counterfactual augmentation with unlabeled sentences from at least one external source, wherein the multi-source counterfactual augmentation module enforces both semantic alignment through topic classification and task alignment through influence function scoring, and wherein a semi-supervised training protocol is used that alternates between supervised learning of the encoder and augmentation using multi-source data through multiple rounds until convergence, with each round comprising forty epochs of supervised encoder training followed by forty epochs of augmentation training, and wherein a bidirectional Kullback-Leibler (KL) regularization component is introduced to reduce model overfitting by enforcing consistency between output distributions of different sub-models generated by dropout, with a hyperparameter a controlling the weight of the KL divergence terms; and making predictions, by the encoder, using the embeddings of the long-document texts.
10. The method of claim 9, wherein the long-document texts comprise more than 500 sentences.
11. The method of claim 9, wherein the long-document texts comprise earnings call transcripts, and wherein the embeddings comprise embeddings of the earnings call transcripts.
12. The method of claim 11, further comprising: predicting a significance level of market volatility over n-days following an earnings call using the embeddings of the earnings call transcripts.
13. The method of claim 11, wherein the at least one external source comprises financial news.
14. The method of claim 9, further comprising: identifying salient sentences in the long-document texts for perturbation; and linking the salient sentences in the long-document texts to the unlabeled sentences from the at least one other source through topics.
15. The method of claim 14, further comprising: replacing one of the salient sentences of the long-document texts with one of the unlabeled sentences from at least one other source as a perturbation.
16. The method of claim 15, further comprising: determining a degree by which the replacing changes the predictions.
17. The method of claim 16, wherein the determining is performed using example-based model explanation.
18. A computer program product for long-document based prediction, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform: creating, by an encoder, embeddings of long-document texts with hierarchical sparse self-attention, wherein the hierarchical sparse self-attention comprises multiple stacked layers of multi-head sparse self attention and one-dimensional convolutional filters with parameterized activation functions to capture long-range sentence-level dependencies; and wherein the encoder implements a sparsity matrix to filter out trivial attention weights and enable focus on attentively important sentences; training the encoder using perturbed long-document texts generated by counterfactual augmentation with unlabeled sentences from at least one external source, wherein the multi-source counterfactual augmentation module enforces both semantic alignment through topic classification and task alignment through influence function scoring, and wherein a semi-supervised training protocol is used that alternates between supervised learning of the encoder and augmentation using multi-source data through multiple rounds until convergence, with each round comprising forty epochs of supervised encoder training followed by forty epochs of augmentation training, and wherein a bidirectional Kullback-Leibler (KL) regularization component is introduced to reduce model overfitting by enforcing consistency between output distributions of different sub-models generated by dropout, with a hyperparameter a controlling the weight
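The bidirectional KL regularization recited in the independent claims can be sketched as follows. This is only an illustration under stated assumptions: the two logit vectors stand for outputs of two dropout-perturbed forward passes on the same input, the task loss is taken to be cross-entropy, and both the task loss and the two KL directions are averaged with a factor of 0.5. None of those choices is confirmed by the claim text, which says only that a hyperparameter weights the KL divergence terms. That hyperparameter is called `alpha` here, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions, with a small eps for stability."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def bidirectional_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Average of KL(p || q) and KL(q || p); the 0.5 factor is an assumption."""
    return 0.5 * (kl(p, q) + kl(q, p))


def cross_entropy(p: Sequence[float], label: int, eps: float = 1e-12) -> float:
    """Negative log-probability assigned to the true label."""
    return -math.log(p[label] + eps)


def regularized_loss(
    logits_a: Sequence[float],
    logits_b: Sequence[float],
    label: int,
    alpha: float,
) -> float:
    """Task loss averaged over both passes, plus alpha times the consistency term."""
    p, q = softmax(logits_a), softmax(logits_b)
    task = 0.5 * (cross_entropy(p, label) + cross_entropy(q, label))
    return task + alpha * bidirectional_kl(p, q)


# Illustrative logits for one example seen through two different dropout masks.
logits_a = [2.0, 0.5, -1.0]
logits_b = [1.5, 0.7, -0.8]

loss_plain = regularized_loss(logits_a, logits_b, label=0, alpha=0.0)
loss_reg = regularized_loss(logits_a, logits_b, label=0, alpha=1.0)
print(f"without KL term: {loss_plain:.4f}, with KL term: {loss_reg:.4f}")
```

When the two passes agree exactly, the consistency term is zero and the loss reduces to the task loss. Larger values of `alpha` penalize disagreement between the dropout sub-models more strongly.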
Vector coding (for television signals, see H04N19/94) · CPC title
Character encoding · CPC title
Semantic analysis · CPC title
Editing, e.g. inserting or deleting · CPC title
Market predictions or forecasting for commercial activities · CPC title