Learning method and information processing apparatus
US-2023259717-A1 · Aug 17, 2023 · US
US2021365633A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021365633-A1 |
| Application number | US-202016881995-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 22, 2020 |
| Priority date | May 22, 2020 |
| Publication date | Nov 25, 2021 |
| Grant date | — |
Abstract
Embodiments of the present disclosure include systems and methods for packing tokens to train sequence models. In some embodiments, a plurality of datasets for training a sequence model is received. Each dataset in the plurality of datasets includes a sequence of correlated tokens. A set of training data is generated that includes a subset of a sequence of tokens from a first dataset in the plurality of datasets and a subset of a sequence of tokens from a second, different dataset in the plurality of datasets. The sequence model is trained using the set of training data.
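The core idea of the abstract is that one training example mixes a subset of tokens from one dataset with a subset from a second, different dataset. The sketch below illustrates this idea only and is not the patent's implementation. The function name `mix_example`, the rule that the first dataset contributes at most half of `max_len`, and the per-token segment ids recording each token's source are all hypothetical choices.

```python
from typing import List, Tuple


def mix_example(first: List[int], second: List[int], max_len: int) -> Tuple[List[int], List[int]]:
    """Build one training example from subsets of two different token sequences.

    Hypothetical sketch: the first sequence contributes at most half of
    max_len tokens, the second sequence fills the remaining space, and a
    parallel list of segment ids (0 or 1) records which dataset each token
    came from.
    """
    if max_len < 2:
        raise ValueError("max_len must leave room for both datasets")
    first_part = first[: max_len // 2]                 # subset of dataset 1
    second_part = second[: max_len - len(first_part)]  # subset of dataset 2
    tokens = first_part + second_part
    segment_ids = [0] * len(first_part) + [1] * len(second_part)
    return tokens, segment_ids


tokens, segments = mix_example([101, 102, 103, 104], [201, 202, 203], max_len=6)
print(tokens)    # [101, 102, 103, 201, 202, 203]
print(segments)  # [0, 0, 0, 1, 1, 1]
```

The segment ids are an assumed bookkeeping aid. They let later steps tell which tokens were originally correlated.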
Claims (preview)
What is claimed is:

1. A system comprising: a set of processing units; and a non-transitory machine-readable medium storing instructions that when executed by at least one processing unit in the set of processing units cause the at least one processing unit to: receive a plurality of datasets for training a sequence model, each dataset in the plurality of datasets comprising a sequence of correlated tokens; generate a set of training data comprising a subset of a sequence of tokens from a first dataset in the plurality of datasets and a subset of a sequence of tokens from a second, different dataset in the plurality of datasets; and train the sequence model using the set of training data.

2. The system of claim 1, wherein generating the set of training data comprises: determining, for the sequence of tokens of each dataset in the plurality of datasets, a set of groups of tokens in the sequence of tokens; and determining, for each group of tokens in the set of groups of tokens of each sequence of tokens, a length of the group of tokens, wherein generating the set of training data is based on the lengths of the groups of tokens.

3. The system of claim 2, wherein the set of training data comprises a data structure having a defined length, wherein generating the set of training data comprises: identifying a group of tokens in the plurality of datasets having the longest length equal to or less than the defined length of the data structure; and packing the data structure with the identified group of tokens.

4. The system of claim 3, wherein generating the set of training data further comprises iteratively packing the data structure with remaining groups of tokens in the plurality of datasets having the longest length that is equal to or less than a remaining length in the data structure.

5. The system of claim 3, wherein the data structure is a first data structure, wherein the set of training data further comprises a second data structure having the defined length, wherein generating the set of training data comprises: identifying a remaining group of tokens in the plurality of datasets having the longest length equal to or less than the defined length of the data structure; and packing the second data structure with the identified group of tokens.

6. The system of claim 1, wherein generating the set of training data comprises adding label data to the set of training data indicating that the subset of the sequence of tokens from the first dataset and the subset of the sequence of tokens from the second, different dataset are not correlated.

7. The method of claim 1, wherein the sequence of correlated tokens in the first dataset is a first set of sentences from a first paragraph of text, wherein the sequence of correlated tokens in the second dataset is a second set of sentences from a second paragraph of text.

8. A system comprising: a set of processing units; and a non-transitory machine-readable medium storing instructions that when executed by at least one processing unit in the set of processing units cause the at least one processing unit to: receive a set of input data for training a sequence model, the input data comprising a sequence of tokens; group the sequence of tokens into a set of groups of tokens; generate a set of training data comprising the set of groups of tokens and copies of at least a portion of a group of tokens in the set of groups of tokens; and train the sequence model using the set of training data.

9. The system of claim 8, wherein the copies of at least the portion of the group of tokens are copies of at least the portion of a first group of tokens, wherein the set of training data further comprises copies of at least a portion of a second group of tokens in the set of groups of tokens.

10. The system of claim 9, wherein generating the set of training data comprises: generating a data structure having a defined length; packing a first row of the data structure with the first group of tokens and the copies of at least the portion of the first group of tokens until the length of the first row of the data structure is filled up with tokens from the first group of tokens; and packing the second row of the data structure with the second group of tokens and the copies of at least the portion of the second group of tokens until the length of the second row of the data structure is filled up with tokens from the second group of tokens.

11. The system of claim 10, wherein the instructions further cause the at least one processing unit to: determine a first set of embeddings comprising an embedding for each token in the first group of tokens and the copies of at least the portion of the first group of tokens; determine a second set of embeddings comprising an embedding for each token in the second group of tokens and the copies of at least the portion of the second group of tokens; and add the first set of embeddings to the second set of embeddings.

12. The system of claim 9, wherein generating the set of training data comprises repeating the copies of at least the portion of the first group of tokens and the copies of at least the portion of the second group of tokens.

13. The system of claim 8, wherein generating the set of training data comprises: generating a data structure having a defined length; and packing the data structure with the set of groups of tokens and the copies of the at least one portion of the group of tokens in the set of groups of tokens so that a total number of tokens packed into the data structure is equal to the defined length.

14. A method comprising: receiving a plurality of datasets for training a sequence model, each dataset in the plurality of datasets comprising a sequence of correlated tokens; generating a set of training data comprising a subset of a sequence of tokens from a first dataset in the plurality of datasets and a subset of a sequence of tokens from a second, different dataset in the plurality of datasets; and training the sequence model using the set of training data.

15. The method of claim 14, wherein generating the set of training data comprises: determining, for the sequence of tokens of each dataset in the plurality of datasets, a set of groups of tokens in the sequence of tokens; determining, for each group of tokens in the set of groups of tokens of each sequence of tokens, a length of the group of tokens, wherein generating the set of training data is based on the lengths of the groups of tokens.

16. The method of claim 15, wherein the set of training data comprises a data structure having a defined length, wherein generating the set of training data comprises: identifying a group of tokens in the plurality of datasets having the longest length equal to or less than the defined length of the data structure; and packing the data structure with the identified group of tokens.

17. The method of claim 16, wherein generating the set of training data further comprises iteratively packing the data structure with remaining groups of tokens in the plurality of datasets having the longest length that is equal to or less than a remaining length in the data structure.

18. The method of claim 16, wherein the data structure is a first data structure, wherein the set of training data further comprises a second data structure having the defined length, wherein generating the set of training data comprises: identifying a remaining group of tokens in the plurality of datasets having the longest …
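Claims 2 to 5, along with their method counterparts in claims 15 to 18, describe filling a fixed-length data structure by repeatedly choosing the longest remaining group of tokens that still fits. When nothing else fits, packing moves on to a second data structure. The following is a minimal Python sketch of that longest-fit loop. Several parts of it are assumptions for illustration and are not the patent's code: the names `pack_longest_fit` and `row_is_uncorrelated`, the `(source_id, tokens)` tuple representation, and the rule of deriving a claim 6 style "not correlated" label from whether a row mixes sources.

```python
from typing import List, Tuple

# Hypothetical representation: (source dataset id, tokens of one group).
Group = Tuple[str, List[int]]


def pack_longest_fit(groups: List[Group], row_len: int) -> List[List[Group]]:
    """Pack groups into rows holding at most row_len tokens each.

    For every row, repeatedly take the longest remaining group whose length
    fits the row's remaining space; start a new row when nothing else fits.
    """
    # Sorting longest-first means the first fitting group found is the longest fit.
    remaining = sorted(groups, key=lambda g: len(g[1]), reverse=True)
    if any(len(tokens) > row_len for _, tokens in remaining):
        raise ValueError("every group must fit in an empty row")
    rows: List[List[Group]] = []
    while remaining:
        row: List[Group] = []
        space = row_len
        i = 0
        while i < len(remaining):
            if len(remaining[i][1]) <= space:
                group = remaining.pop(i)  # the next group shifts into index i
                row.append(group)
                space -= len(group[1])
            else:
                i += 1  # too long for the space left; try a shorter group
        rows.append(row)
    return rows


def row_is_uncorrelated(row: List[Group]) -> bool:
    """Claim 6 style label (assumed rule): True if the row mixes datasets."""
    return len({source for source, _ in row}) > 1


groups = [("a", [1, 2, 3]), ("a", [4, 5, 6, 7]), ("b", [8, 9]), ("b", [10])]
rows = pack_longest_fit(groups, row_len=6)
for row in rows:
    tokens = [t for _, toks in row for t in toks]
    print(tokens, row_is_uncorrelated(row))
# [4, 5, 6, 7, 8, 9] True
# [1, 2, 3, 10] True
```

The second row leaves two positions unused. The claims shown here do not say how such leftover space is padded or masked.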
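Claims 8 to 10, 12, and 13 describe a different strategy. Each row of the data structure holds one group of tokens, followed by copies of that group (or of a portion of it), until the row reaches its defined length. The sketch below shows one plausible reading. It cycles through the group and truncates the last copy. The helper name `fill_row_with_copies` and the choice to reject groups longer than the row are hypothetical.

```python
from itertools import cycle, islice
from typing import List


def fill_row_with_copies(group: List[int], row_len: int) -> List[int]:
    """Return exactly row_len tokens: the group, then repeated copies of it.

    The final copy is cut short when row_len is not a multiple of the
    group length, so it contributes only a portion of the group.
    """
    if not group:
        raise ValueError("group must contain at least one token")
    if len(group) > row_len:
        raise ValueError("group is longer than the row")
    return list(islice(cycle(group), row_len))


groups = [[1, 2, 3], [4, 5]]
rows = [fill_row_with_copies(g, row_len=7) for g in groups]
print(rows)  # [[1, 2, 3, 1, 2, 3, 1], [4, 5, 4, 5, 4, 5, 4]]
```

Every row ends up with exactly `row_len` tokens. This matches the requirement in claim 13 that the total number of packed tokens equals the defined length, so no padding is needed.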
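Claim 11 computes one set of embeddings for each packed row and adds the first set to the second. The claim text shown does not say how the two sets are aligned. The sketch below assumes that two rows of equal length are added elementwise, position by position. The helper names and the toy two-dimensional table `EMBEDDINGS` are invented for the example.

```python
from typing import Dict, List

Vector = List[float]

# Toy embedding table (hypothetical values, 2-dimensional for readability).
EMBEDDINGS: Dict[int, Vector] = {
    1: [1.0, 0.0],
    2: [0.0, 1.0],
    4: [2.0, 2.0],
    5: [3.0, 0.0],
}


def embed_row(row: List[int], table: Dict[int, Vector]) -> List[Vector]:
    """Look up one embedding per token in a packed row."""
    return [table[token] for token in row]


def add_embedding_sets(first: List[Vector], second: List[Vector]) -> List[Vector]:
    """Add two sets of embeddings position by position (assumed alignment)."""
    if len(first) != len(second):
        raise ValueError("rows must share the same defined length")
    return [[a + b for a, b in zip(u, v)] for u, v in zip(first, second)]


first_row = [1, 2, 1]   # group [1, 2] plus a partial copy
second_row = [4, 5, 4]  # group [4, 5] plus a partial copy
combined = add_embedding_sets(embed_row(first_row, EMBEDDINGS), embed_row(second_row, EMBEDDINGS))
print(combined)  # [[3.0, 2.0], [3.0, 1.0], [3.0, 2.0]]
```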
CPC classification titles:
- Combinations of networks
- Recurrent networks, e.g. Hopfield networks
- Backpropagation, e.g. using gradient descent
- Supervised learning
- Lexical analysis, e.g. tokenisation or collocates