Token Packing for Sequence Models

US2021365633A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2021365633-A1
Application number  US-202016881995-A
Country             US
Kind code           A1
Filing date         May 22, 2020
Priority date       May 22, 2020
Publication date    Nov 25, 2021
Grant date          (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure include systems and methods for packing tokens to train sequence models. In some embodiments, a plurality of datasets for training a sequence model is received. Each dataset in the plurality of datasets includes a sequence of correlated tokens. A set of training data is generated that includes a subset of a sequence of tokens from a first dataset in the plurality of datasets and a subset of a sequence of tokens from a second, different dataset in the plurality of datasets. The sequence model is trained using the set of training data.
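The packing idea in the abstract can be illustrated with a small sketch. It follows the longest-fit strategy described in the dependent claims (pick the longest remaining group that still fits, then repeat), but it is only one possible reading and not the patent's implementation. The function name `pack_groups`, the `[PAD]` filler, and the integer source ids are all assumptions made for illustration; the source ids are a loose stand-in for label data marking which tokens come from different, uncorrelated datasets.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (source-dataset id, tokens in one group).
Group = Tuple[int, Sequence[str]]


def pack_groups(
    groups: Sequence[Group], row_length: int, pad: str = "[PAD]"
) -> List[Tuple[List[str], List[int]]]:
    """Greedily pack token groups into rows of a fixed length.

    Returns one (tokens, source_ids) pair per row. Padding positions
    get the source id -1 (an assumption of this sketch).
    """
    # Longest groups first. Empty groups and groups longer than a row
    # are skipped here; a real pipeline would need to split them.
    remaining = sorted(
        (g for g in groups if 0 < len(g[1]) <= row_length),
        key=lambda g: len(g[1]),
        reverse=True,
    )
    rows = []
    while remaining:
        tokens: List[str] = []
        sources: List[int] = []
        space = row_length
        i = 0
        # Because the list is sorted longest first, the first group that
        # fits is the longest remaining group that fits in the free space.
        while i < len(remaining) and space > 0:
            source, group = remaining[i]
            if len(group) <= space:
                tokens.extend(group)
                sources.extend([source] * len(group))
                space -= len(group)
                remaining.pop(i)
            else:
                i += 1
        # Fill whatever is left of the row with padding.
        tokens.extend([pad] * space)
        sources.extend([-1] * space)
        rows.append((tokens, sources))
    return rows


example = [
    (0, ["a1", "a2", "a3"]),
    (0, ["a4"]),
    (1, ["b1", "b2", "b3", "b4"]),
    (1, ["b5", "b6"]),
]
packed = pack_groups(example, row_length=5)
```

With the example input, both rows end up holding tokens from dataset 0 and dataset 1 side by side, and no padding is needed. The source ids let a downstream step tell the two origins apart.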

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a set of processing units; and a non-transitory machine-readable medium storing instructions that when executed by at least one processing unit in the set of processing units cause the at least one processing unit to: receive a plurality of datasets for training a sequence model, each dataset in the plurality of datasets comprising a sequence of correlated tokens; generate a set of training data comprising a subset of a sequence of tokens from a first dataset in the plurality of datasets and a subset of a sequence of tokens from a second, different dataset in the plurality of datasets; and train the sequence model using the set of training data.

2. The system of claim 1, wherein generating the set of training data comprises: determining, for the sequence of tokens of each dataset in the plurality of datasets, a set of groups of tokens in the sequence of tokens; and determining, for each group of tokens in the set of groups of tokens of each sequence of tokens, a length of the group of tokens, wherein generating the set of training data is based on the lengths of the groups of tokens.

3. The system of claim 2, wherein the set of training data comprises a data structure having a defined length, wherein generating the set of training data comprises: identifying a group of tokens in the plurality of datasets having the longest length equal to or less than the defined length of the data structure; and packing the data structure with the identified group of tokens.

4. The system of claim 3, wherein generating the set of training data further comprises iteratively packing the data structure with remaining groups of tokens in the plurality of datasets having the longest length that is equal to or less than a remaining length in the data structure.

5. The system of claim 3, wherein the data structure is a first data structure, wherein the set of training data further comprises a second data structure having the defined length, wherein generating the set of training data comprises: identifying a remaining group of tokens in the plurality of datasets having the longest length equal to or less than the defined length of the data structure; and packing the second data structure with the identified group of tokens.

6. The system of claim 1, wherein generating the set of training data comprises adding label data to the set of training data indicating that the subset of the sequence of tokens from the first dataset and the subset of the sequence of tokens from the second, different dataset are not correlated.

7. The method of claim 1, wherein the sequence of correlated tokens in the first dataset is a first set of sentences from a first paragraph of text, wherein the sequence of correlated tokens in the second dataset is a second set of sentences from a second paragraph of text.

8. A system comprising: a set of processing units; and a non-transitory machine-readable medium storing instructions that when executed by at least one processing unit in the set of processing units cause the at least one processing unit to: receive a set of input data for training a sequence model, the input data comprising a sequence of tokens; group the sequence of tokens into a set of groups of tokens; generate a set of training data comprising the set of groups of tokens and copies of at least a portion of a group of tokens in the set of groups of tokens; and train the sequence model using the set of training data.

9. The system of claim 8, wherein the copies of at least the portion of the group of tokens are copies of at least the portion of a first group of tokens, wherein the set of training data further comprises copies of at least a portion of a second group of tokens in the set of groups of tokens.

10. The system of claim 9, wherein generating the set of training data comprises: generating a data structure having a defined length; packing a first row of the data structure with the first group of tokens and the copies of at least the portion of the first group of tokens until the length of the first row of the data structure is filled up with tokens from the first group of tokens; and packing the second row of the data structure with the second group of tokens and the copies of at least the portion of the second group of tokens until the length of the second row of the data structure is filled up with tokens from the second group of tokens.

11. The system of claim 10, wherein the instructions further cause the at least one processing unit to: determine a first set of embeddings comprising an embedding for each token in the first group of tokens and the copies of at least the portion of the first group of tokens; determine a second set of embeddings comprising an embedding for each token in the second group of tokens and the copies of at least the portion of the second group of tokens; and add the first set of embeddings to the second set of embeddings.

12. The system of claim 9, wherein generating the set of training data comprises repeating the copies of at least the portion of the first group of tokens and the copies of at least the portion of the second group of tokens.

13. The system of claim 8, wherein generating the set of training data comprises: generating a data structure having a defined length; and packing the data structure with the set of groups of tokens and the copies of the at least one portion of the group of tokens in the set of groups of tokens so that a total number of tokens packed into the data structure is equal to the defined length.

14. A method comprising: receiving a plurality of datasets for training a sequence model, each dataset in the plurality of datasets comprising a sequence of correlated tokens; generating a set of training data comprising a subset of a sequence of tokens from a first dataset in the plurality of datasets and a subset of a sequence of tokens from a second, different dataset in the plurality of datasets; and training the sequence model using the set of training data.

15. The method of claim 14, wherein generating the set of training data comprises: determining, for the sequence of tokens of each dataset in the plurality of datasets, a set of groups of tokens in the sequence of tokens; determining, for each group of tokens in the set of groups of tokens of each sequence of tokens, a length of the group of tokens, wherein generating the set of training data is based on the lengths of the groups of tokens.

16. The method of claim 15, wherein the set of training data comprises a data structure having a defined length, wherein generating the set of training data comprises: identifying a group of tokens in the plurality of datasets having the longest length equal to or less than the defined length of the data structure; and packing the data structure with the identified group of tokens.

17. The method of claim 16, wherein generating the set of training data further comprises iteratively packing the data structure with remaining groups of tokens in the plurality of datasets having the longest length that is equal to or less than a remaining length in the data structure.

18. The method of claim 16, wherein the data structure is a first data structure, wherein the set of training data further comprises a second data structure having the defined length, wherein generating the set of training data comprises: identifying a remaining group of tokens in the plurality of datasets having the longest
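Claims 8 through 13 describe a second way to fill a fixed-length data structure: each row holds one group of tokens followed by copies of that group (or a portion of it) until the row is full. A minimal sketch of that idea follows. The function names `fill_row_by_repetition` and `pack_with_repetition` are hypothetical, and truncating a group that is longer than the row is an assumption of this sketch, not something the claims specify.

```python
from itertools import cycle, islice
from typing import List, Sequence


def fill_row_by_repetition(group: Sequence[str], row_length: int) -> List[str]:
    """Fill one row with a group and repeated copies of it.

    The last copy is cut off where the row ends, so it may be only a
    portion of the group.
    """
    if not group:
        raise ValueError("cannot fill a row from an empty group")
    # cycle() repeats the group indefinitely; islice() stops at row_length.
    return list(islice(cycle(group), row_length))


def pack_with_repetition(
    groups: Sequence[Sequence[str]], row_length: int
) -> List[List[str]]:
    """Build one full row per group, so no padding tokens are needed."""
    return [fill_row_by_repetition(group, row_length) for group in groups]


rows = pack_with_repetition([["the", "cat", "sat"], ["hi"]], row_length=5)
```

In this example the first row holds one full copy of the three-token group plus a partial second copy, and the second row holds five copies of the single-token group. Every position in each row carries a real token.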

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Combinations of networks

  • Recurrent networks, e.g. Hopfield networks

  • G06N3/084 (Primary): Backpropagation, e.g. using gradient descent

  • Supervised learning

  • G06F40/284 (Primary): Lexical analysis, e.g. tokenisation or collocates

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021365633A1 cover?
It covers systems and methods for packing tokens to train sequence models. Training data is generated by combining a subset of a token sequence from one dataset with a subset of a token sequence from a second, different dataset, and the sequence model is trained on that data.
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 25, 2021 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).