Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06N3/04. Mapped technology areas include Physics.

When was this patent published?

Published Jul 6, 2023 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Transformer-based autoregressive language model selection

US2023214629A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2023214629-A1
Application number	US-202117566375-A
Country	US
Kind code	A1
Filing date	Dec 30, 2021
Priority date	Dec 30, 2021
Publication date	Jul 6, 2023
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Generally discussed herein are devices, systems, and methods for improving architecture search and identification with constraints. A method can include receiving, at a compute device, a request for a transformer-based autoregressive language model (TBALM), the request specifying a maximum latency, identifying TBALM architectures that satisfy the maximum latency, identifying a TBALM architecture of the identified TBALM architectures that has a greatest number of decoder parameters resulting in an identified TBALM architecture, and providing the identified TBALM architecture.
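The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Candidate` record, the `select_architecture` function, and the sample numbers are all hypothetical, and latency is assumed to have been measured already on the target device.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate TBALM architecture."""
    name: str
    decoder_params: int   # total decoder parameter count
    latency_ms: float     # latency assumed to be measured on the target device


def select_architecture(
    candidates: Iterable[Candidate], max_latency_ms: float
) -> Optional[Candidate]:
    """Return the feasible candidate with the most decoder parameters."""
    # Step 1: keep only architectures that satisfy the latency constraint.
    feasible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not feasible:
        return None  # no architecture satisfies the request
    # Step 2: among those, pick the one with the most decoder parameters.
    return max(feasible, key=lambda c: c.decoder_params)


# Made-up candidates, for illustration only.
candidates = [
    Candidate("small", 10_000_000, 12.0),
    Candidate("medium", 40_000_000, 35.0),
    Candidate("large", 120_000_000, 90.0),
]
chosen = select_architecture(candidates, max_latency_ms=50.0)
print(chosen.name if chosen else "no feasible architecture")
```

With a 50 ms budget the sketch returns the largest candidate that still fits; when no candidate fits, it returns `None` rather than relaxing the constraint.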

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, at a compute device, a request for a transformer-based autoregressive language model (TBALM), the request specifying a maximum latency; identifying TBALM architectures that satisfy the maximum latency; identifying a TBALM architecture of the identified TBALM architectures that has a greatest number of decoder parameters resulting in an identified TBALM architecture; and providing the identified TBALM architecture.

2. The method of claim 1, wherein: the request further specifies a maximum amount of memory consumed by the TBALM; and identifying the TBALM architecture includes identifying the TBALM architecture of the respective architectures that (i) satisfies the maximum latency, (ii) satisfies the maximum amount of memory consumed, and (iii) has a greatest number of decoder parameters for the architectures that satisfy both the maximum latency and maximum amount of memory consumed resulting in the identified TBALM architecture.

3. The method of claim 1, further comprising using a total number of decoder parameters of the architecture as a proxy for architecture accuracy.

4. The method of claim 3, wherein the total number of decoder parameters includes weights and biases of only the decoder of the TBALM architecture.

5. The method of claim 4, wherein the decoder parameters include, of the identified TBALM architecture: weights of attention heads; model dimensions; inner dimension of a feed forward network (FFN); and number of decoder layers.

6. The method of claim 3, wherein: the compute device is a client device; and the method further comprises generating, by the compute device, a pareto curve of number of decoder parameters versus latency for a variety of TBALM architectures, based on a processor of the compute device resulting in a generated pareto curve.

7. The method of claim 6, wherein identifying the TBALM architecture of the respective architectures that (i) satisfies the maximum latency and (ii) has a greatest number of decoder parameters for the architectures that satisfy the maximum latency includes selecting the TBALM corresponding to a point at a boundary of the generated pareto curve.

8. A device comprising: processing circuitry; a memory including instructions that, when executed by the processing circuitry, cause the processing circuitry to perform operations comprising: receiving a request for a transformer-based autoregressive language model (TBALM), the request specifying a maximum latency; identifying TBALM architectures that satisfy the maximum latency; identifying a TBALM architecture of the identified TBALM architectures that has a greatest number of decoder parameters resulting in an identified TBALM architecture; and providing the identified TBALM architecture.

9. The system of claim 8, wherein: the request further specifies a maximum amount of memory consumed by the TBALM; and identifying the TBALM architecture includes identifying the TBALM architecture of the respective architectures that (i) satisfies the maximum latency, (ii) satisfies the maximum amount of memory consumed, and (iii) has a greatest number of decoder parameters for the architectures that satisfy both the maximum latency and maximum amount of memory consumed resulting in the identified TBALM architecture.

10. The system of claim 8, wherein the operations further comprise using a total number of decoder parameters of the architecture as a proxy for architecture accuracy.

11. The system of claim 10, wherein the total number of decoder parameters includes weights and biases of only the decoder of the TBALM architecture.

12. The system of claim 11, wherein the decoder parameters include, of the identified TBALM architecture: weights of attention heads; model dimensions; inner dimension of a feed forward network (FFN); and number of decoder layers.

13. The system of claim 10, wherein: the processing circuitry is part of a client device; and the operations further comprise generating, by the compute device, a pareto curve of number of decoder parameters versus latency for a variety of TBALM architectures, based on a processor of the compute device resulting in a generated pareto curve.

14. The system of claim 13, wherein identifying the TBALM architecture of the respective architectures that (i) satisfies the maximum latency and (ii) has a greatest number of decoder parameters for the architectures that satisfy the maximum latency includes selecting the TBALM corresponding to a point at a boundary of the generated pareto curve.

15. A machine-readable medium including instructions that, when executed by a machine, cause the machine to perform operations comprising: receiving a request for a transformer-based autoregressive language model (TBALM), the request specifying a maximum latency; identifying TBALM architectures that satisfy the maximum latency; identifying a TBALM architecture of the identified TBALM architectures that has a greatest number of decoder parameters resulting in an identified TBALM architecture; and providing the identified TBALM architecture.

16. The machine-readable medium of claim 15, wherein: the request further specifies a maximum amount of memory consumed by the TBALM; and identifying the TBALM architecture includes identifying the TBALM architecture of the respective architectures that (i) satisfies the maximum latency, (ii) satisfies the maximum amount of memory consumed, and (iii) has a greatest number of decoder parameters for the architectures that satisfy both the maximum latency and maximum amount of memory consumed resulting in the identified TBALM architecture.

17. The machine-readable medium of claim 15, wherein the operations further comprise using a total number of decoder parameters of the architecture as a proxy for architecture accuracy.

18. The machine-readable medium of claim 17, wherein the total number of decoder parameters includes weights and biases of only the decoder of the TBALM architecture.

19. The machine-readable medium of claim 18, wherein the decoder parameters include, of the identified TBALM architecture: weights of attention heads; model dimensions; inner dimension of a feed forward network (FFN); and number of decoder layers.

20. The machine-readable medium of claim 17, wherein: the machine is a client device; and the operations further comprise generating, by the compute device, a pareto curve of number of decoder parameters versus latency for a variety of TBALM architectures, based on a processor of the compute device resulting in a generated pareto curve.
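The memory constraint and the pareto-curve selection in the dependent claims can be sketched in Python as follows. This is one possible reading, not the claimed implementation: the `Arch` tuple, the function names, and the sample values are hypothetical, and "a point at a boundary of the generated pareto curve" is interpreted here as the non-dominated architecture with the highest latency still within the budget.

```python
from typing import List, NamedTuple, Optional


class Arch(NamedTuple):
    """Hypothetical measurements for one candidate TBALM architecture."""
    name: str
    decoder_params: int
    latency_ms: float
    memory_mb: float


def pareto_frontier(archs: List[Arch]) -> List[Arch]:
    """Keep architectures that no other one beats on both latency and size."""
    # Sort by latency ascending; on ties, put the larger model first.
    ordered = sorted(archs, key=lambda a: (a.latency_ms, -a.decoder_params))
    frontier: List[Arch] = []
    best_params = -1
    for a in ordered:
        # A slower architecture is kept only if it has strictly more parameters.
        if a.decoder_params > best_params:
            frontier.append(a)
            best_params = a.decoder_params
    return frontier


def select(
    archs: List[Arch],
    max_latency_ms: float,
    max_memory_mb: Optional[float] = None,
) -> Optional[Arch]:
    """Pick the largest frontier point that meets the requested constraints."""
    # Apply the optional memory limit before building the frontier, so a
    # memory-hungry point cannot hide a feasible one that it dominates.
    if max_memory_mb is not None:
        archs = [a for a in archs if a.memory_mb <= max_memory_mb]
    feasible = [a for a in pareto_frontier(archs) if a.latency_ms <= max_latency_ms]
    # The frontier is ordered by latency with increasing parameter counts,
    # so the last feasible point has the most decoder parameters.
    return feasible[-1] if feasible else None


# Made-up measurements, for illustration only.
archs = [
    Arch("a", 10, 10.0, 100.0),
    Arch("b", 8, 12.0, 90.0),    # dominated by "a"
    Arch("c", 30, 30.0, 300.0),
    Arch("d", 50, 60.0, 800.0),
    Arch("e", 45, 55.0, 400.0),
]
print([a.name for a in pareto_frontier(archs)])
print(select(archs, max_latency_ms=60.0).name)
print(select(archs, max_latency_ms=60.0, max_memory_mb=500.0).name)
```

Filtering on memory before computing the frontier is a design choice of this sketch; the claims do not specify the order in which the two constraints are applied.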

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

G06N3/04 · Primary
Architecture, e.g. interconnection topology · CPC title
G06K9/6262
Physics · mapped topic
G06F40/274
Converting codes to words; Guess-ahead of partial word inputs · CPC title
G06N3/0455 · Primary
Auto-encoder networks; Encoder-decoder networks · CPC title
G06N3/082
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

Patent family

Related publications grouped by family.

View patent family 85036518

External sources

Related patents

Neural architecture search system and search method

Device and computer-implemented method for a neural architecture search

Electronic device and method for controlling the electronic device thereof

Systems and methods for auto machine learning and neural architecture search
