Who is the assignee on this patent?

Microsoft Technology Licensing Llc

What technology area does this patent fall under?

Primary CPC classification G06N3/084. Mapped technology areas include Physics.

When was this patent published?

Publication date Nov 25, 2021 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or other publications sharing the same primary CPC).

Token Packing for Sequence Models

US2021365633A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2021365633-A1
Application number	US-202016881995-A
Country	US
Kind code	A1
Filing date	May 22, 2020
Priority date	May 22, 2020
Publication date	Nov 25, 2021
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure include systems and methods for packing tokens to train sequence models. In some embodiments, a plurality of datasets for training a sequence model is received. Each dataset in the plurality of datasets includes a sequence of correlated tokens. A set of training data is generated that includes a subset of a sequence of tokens from a first dataset in the plurality of datasets and a subset of a sequence of tokens from a second, different dataset in the plurality of datasets. The sequence model is trained using the set of training data.
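The abstract, read together with dependent claims 2 to 5 and 15 to 18, points to a greedy packing step: fill a fixed-length row with the longest group of tokens that still fits, then keep adding the longest remaining group that fits the leftover space, and start a new row when nothing fits. The following is a minimal Python sketch of that reading, not the applicant's implementation. It assumes each dataset is already split into groups of tokens (for example, the sentences of one paragraph, as in claim 7) and that unused space is filled with a hypothetical `[PAD]` token. The function name `pack_longest_fit` and the per-row `mixed` flag (a stand-in for the "not correlated" label data of claim 6) are illustrative only.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical representation: a dataset is a list of token groups (e.g. the
# sentences of one paragraph); each group is a list of string tokens.
Group = Tuple[int, List[str]]  # (dataset index, tokens)


def pack_longest_fit(
    datasets: Sequence[Sequence[List[str]]],
    row_length: int,
    pad: str = "[PAD]",
) -> List[Tuple[List[str], bool]]:
    """Greedily pack token groups from several datasets into fixed-length rows.

    Returns (tokens, mixed) pairs; `mixed` is True when a row contains groups
    from more than one dataset, i.e. content that is not correlated.
    """
    # Flatten every group, remembering which dataset it came from.
    remaining: List[Group] = [
        (ds_id, list(group))
        for ds_id, groups in enumerate(datasets)
        for group in groups
    ]
    # Empty groups and groups longer than a row cannot be packed; drop them.
    remaining = [g for g in remaining if 0 < len(g[1]) <= row_length]

    rows: List[Tuple[List[str], bool]] = []
    while remaining:
        row: List[str] = []
        sources: Set[int] = set()
        space = row_length
        while True:
            # Candidates are the groups that still fit in the open row.
            fitting = [g for g in remaining if len(g[1]) <= space]
            if not fitting:
                break
            # Take the longest fitting group (the first one wins on ties).
            best = max(fitting, key=lambda g: len(g[1]))
            remaining.remove(best)
            row.extend(best[1])
            sources.add(best[0])
            space -= len(best[1])
        row.extend([pad] * space)  # pad the unused tail of the row
        rows.append((row, len(sources) > 1))
    return rows
```

For example, with `row_length=5`, a 4-token group from one paragraph leaves room for a 1-token group from another paragraph, so that row is flagged as mixed. The greedy choice tends to keep padding low, but it is not guaranteed to produce the fewest possible rows.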

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a set of processing units; and a non-transitory machine-readable medium storing instructions that when executed by at least one processing unit in the set of processing units cause the at least one processing unit to: receive a plurality of datasets for training a sequence model, each dataset in the plurality of datasets comprising a sequence of correlated tokens; generate a set of training data comprising a subset of a sequence of tokens from a first dataset in the plurality of datasets and a subset of a sequence of tokens from a second, different dataset in the plurality of datasets; and train the sequence model using the set of training data.

2. The system of claim 1, wherein generating the set of training data comprises: determining, for the sequence of tokens of each dataset in the plurality of datasets, a set of groups of tokens in the sequence of tokens; and determining, for each group of tokens in the set of groups of tokens of each sequence of tokens, a length of the group of tokens, wherein generating the set of training data is based on the lengths of the groups of tokens.

3. The system of claim 2, wherein the set of training data comprises a data structure having a defined length, wherein generating the set of training data comprises: identifying a group of tokens in the plurality of datasets having the longest length equal to or less than the defined length of the data structure; and packing the data structure with the identified group of tokens.

4. The system of claim 3, wherein generating the set of training data further comprises iteratively packing the data structure with remaining groups of tokens in the plurality of datasets having the longest length that is equal to or less than a remaining length in the data structure.

5. The system of claim 3, wherein the data structure is a first data structure, wherein the set of training data further comprises a second data structure having the defined length, wherein generating the set of training data comprises: identifying a remaining group of tokens in the plurality of datasets having the longest length equal to or less than the defined length of the data structure; and packing the second data structure with the identified group of tokens.

6. The system of claim 1, wherein generating the set of training data comprises adding label data to the set of training data indicating that the subset of the sequence of tokens from the first dataset and the subset of the sequence of tokens from the second, different dataset are not correlated.

7. The method of claim 1, wherein the sequence of correlated tokens in the first dataset is a first set of sentences from a first paragraph of text, wherein the sequence of correlated tokens in the second dataset is a second set of sentences from a second paragraph of text.

8. A system comprising: a set of processing units; and a non-transitory machine-readable medium storing instructions that when executed by at least one processing unit in the set of processing units cause the at least one processing unit to: receive a set of input data for training a sequence model, the input data comprising a sequence of tokens; group the sequence of tokens into a set of groups of tokens; generate a set of training data comprising the set of groups of tokens and copies of at least a portion of a group of tokens in the set of groups of tokens; and train the sequence model using the set of training data.

9. The system of claim 8, wherein the copies of at least the portion of the group of tokens are copies of at least the portion of a first group of tokens, wherein the set of training data further comprises copies of at least a portion of a second group of tokens in the set of groups of tokens.

10. The system of claim 9, wherein generating the set of training data comprises: generating a data structure having a defined length; packing a first row of the data structure with the first group of tokens and the copies of at least the portion of the first group of tokens until the length of the first row of the data structure is filled up with tokens from the first group of tokens; and packing the second row of the data structure with the second group of tokens and the copies of at least the portion of the second group of tokens until the length of the second row of the data structure is filled up with tokens from the second group of tokens.

11. The system of claim 10, wherein the instructions further cause the at least one processing unit to: determine a first set of embeddings comprising an embedding for each token in the first group of tokens and the copies of at least the portion of the first group of tokens; determine a second set of embeddings comprising an embedding for each token in the second group of tokens and the copies of at least the portion of the second group of tokens; and add the first set of embeddings to the second set of embeddings.

12. The system of claim 9, wherein generating the set of training data comprises repeating the copies of at least the portion of the first group of tokens and the copies of at least the portion of the second group of tokens.

13. The system of claim 8, wherein generating the set of training data comprises: generating a data structure having a defined length; and packing the data structure with the set of groups of tokens and the copies of the at least one portion of the group of tokens in the set of groups of tokens so that a total number of tokens packed into the data structure is equal to the defined length.

14. A method comprising: receiving a plurality of datasets for training a sequence model, each dataset in the plurality of datasets comprising a sequence of correlated tokens; generating a set of training data comprising a subset of a sequence of tokens from a first dataset in the plurality of datasets and a subset of a sequence of tokens from a second, different dataset in the plurality of datasets; and training the sequence model using the set of training data.

15. The method of claim 14, wherein generating the set of training data comprises: determining, for the sequence of tokens of each dataset in the plurality of datasets, a set of groups of tokens in the sequence of tokens; determining, for each group of tokens in the set of groups of tokens of each sequence of tokens, a length of the group of tokens, wherein generating the set of training data is based on the lengths of the groups of tokens.

16. The method of claim 15, wherein the set of training data comprises a data structure having a defined length, wherein generating the set of training data comprises: identifying a group of tokens in the plurality of datasets having the longest length equal to or less than the defined length of the data structure; and packing the data structure with the identified group of tokens.

17. The method of claim 16, wherein generating the set of training data further comprises iteratively packing the data structure with remaining groups of tokens in the plurality of datasets having the longest length that is equal to or less than a remaining length in the data structure.

18. The method of claim 16, wherein the data structure is a first data structure, wherein the set of training data further comprises a second data structure having the defined length, wherein generating the set of training data comprises: identifying a remaining group of tokens in the plurality of datasets having the longest
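Claims 8 to 10 and 13 describe a second packing scheme: each row of a fixed-length data structure holds one group of tokens followed by copies of (at least a portion of) that same group until the row is full. A minimal Python sketch of that reading follows, assuming one row per group and allowing the last copy to be cut short so the row length is met exactly. The name `pack_with_copies` is hypothetical, and the embedding addition of claim 11 is not modelled.

```python
from typing import List, Sequence


def pack_with_copies(groups: Sequence[List[str]], row_length: int) -> List[List[str]]:
    """Fill one fixed-length row per group by repeating that group's tokens.

    Each row starts with the group itself and is topped up with copies of the
    group, truncating the final copy so the row holds exactly `row_length`
    tokens. A group longer than `row_length` is simply truncated.
    """
    rows: List[List[str]] = []
    for group in groups:
        if not group:
            continue  # nothing to repeat; empty groups are skipped here
        row: List[str] = []
        while len(row) < row_length:
            # Append a full copy, or a partial one if that is all that fits.
            row.extend(group[: row_length - len(row)])
        rows.append(row)
    return rows
```

Unlike the longest-fit approach, this scheme never mixes groups within a row, so every row contains only correlated tokens and needs no padding.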

Assignees

Microsoft Technology Licensing Llc

Inventors

Classifications

G06N3/045
Combinations of networks · CPC title
G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G06N3/084 (Primary)
Backpropagation, e.g. using gradient descent · CPC title
G06N3/09
Supervised learning · CPC title
G06F40/284 (Primary)
Lexical analysis, e.g. tokenisation or collocates · CPC title

Patent family

Related publications grouped by family.

Patent family 76197572



Related patents

Learning method and information processing apparatus

Position Masking for Transformer Models

Token-position handling for sequence based neural networks

Training of model for processing sequence data

Address information feature extraction method based on deep neural network model
