What technology area does this patent fall under?

Primary CPC classification G06F18/213. Mapped technology areas include Physics.

When was this patent published?

Publication date May 14, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Lightweight transformer for high resolution images

US11983239B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11983239-B2
Application number	US-202117342483-A
Country	US
Kind code	B2
Filing date	Jun 8, 2021
Priority date	Jun 8, 2021
Publication date	May 14, 2024
Grant date	May 14, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for obtaining attention features are described. Some examples may include: receiving, at a projector of a transformer, a plurality of tokens associated with image features of a first dimensional space; generating, at the projector of the transformer, projected features by concatenating the plurality of tokens with a positional map, the projected features having a second dimensional space that is less than the first dimensional space; receiving, at an encoder of the transformer, the projected features and generating encoded representations of the projected features using self-attention; decoding, at a decoder of the transformer, the encoded representations and obtaining a decoded output; and projecting the decoded output to the first dimensional space and adding the image features of the first dimensional space to obtain attention features associated with the image features.
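The data flow in the abstract (project down, attend, project back up, add the input) can be sketched in plain Python. This is a minimal illustration, not the patented implementation. It assumes that "dimensional space" refers to the spatial resolution of the feature map, that the positional map holds normalized row and column coordinates, and that projecting back amounts to dropping the positional channels and resizing. The names `bilinear_resize`, `positional_map`, `project`, and `attention_features` are hypothetical, and the encoder-decoder is passed in as an opaque `transformer` callable.

```python
def bilinear_resize(grid, out_h, out_w):
    """Resize an H x W x C nested-list feature map by bilinear interpolation."""
    in_h, in_w, channels = len(grid), len(grid[0]), len(grid[0][0])

    def source(index, out_size, in_size):
        # Align-corners mapping: the first and last cells line up exactly.
        pos = index * (in_size - 1) / (out_size - 1) if out_size > 1 else 0.0
        lo = int(pos)
        return lo, min(lo + 1, in_size - 1), pos - lo

    out = []
    for i in range(out_h):
        y0, y1, fy = source(i, out_h, in_h)
        row = []
        for j in range(out_w):
            x0, x1, fx = source(j, out_w, in_w)
            row.append([
                (1 - fy) * ((1 - fx) * grid[y0][x0][c] + fx * grid[y0][x1][c])
                + fy * ((1 - fx) * grid[y1][x0][c] + fx * grid[y1][x1][c])
                for c in range(channels)
            ])
        out.append(row)
    return out


def positional_map(h, w):
    """Two-channel map of normalized (row, column) coordinates in [0, 1]."""
    return [[[i / max(h - 1, 1), j / max(w - 1, 1)] for j in range(w)]
            for i in range(h)]


def project(features, out_h, out_w):
    """Downsample the features, then concatenate the positional map per token."""
    small = bilinear_resize(features, out_h, out_w)
    pos = positional_map(out_h, out_w)
    return [small[i][j] + pos[i][j] for i in range(out_h) for j in range(out_w)]


def attention_features(features, out_h, out_w, transformer):
    """Project down, run the transformer, project back up, add the input."""
    in_h, in_w, channels = len(features), len(features[0]), len(features[0][0])
    decoded = transformer(project(features, out_h, out_w))
    # Assumption: dropping the positional channels stands in for a learned
    # projection back to the original channel width.
    grid = [[decoded[i * out_w + j][:channels] for j in range(out_w)]
            for i in range(out_h)]
    up = bilinear_resize(grid, in_h, in_w)
    return [[[up[i][j][c] + features[i][j][c] for c in range(channels)]
             for j in range(in_w)] for i in range(in_h)]
```

Passing an identity function as `transformer` is a quick way to check the shapes: a 4 x 4 x 1 input projected to 2 x 2 yields four tokens of width 3 (one feature channel plus two positional channels), and the output has the same shape as the input.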

First claim

Opening claim text (preview).

What is claimed is:

1. A method of obtaining attention features, the method comprising: receiving, at a projector of a transformer, a plurality of tokens associated with image features of a first dimensional space; generating, at the projector of the transformer, projected features by concatenating the plurality of tokens with a positional map, the projected features having a second dimensional space that is less than the first dimensional space; receiving, at an encoder of the transformer, the projected features and generating encoded representations of the projected features using self-attention; decoding, at a decoder of the transformer, the encoded representations of the projected features and obtaining a decoded output; and projecting the decoded output to the first dimensional space and adding the image features of the first dimensional space to obtain attention features associated with the image features.

2. The method of claim 1, further comprising: applying, at the encoder of the transformer, self-attention to the projected features using a multi-head self-attention configuration, the multi-head self-attention configuration receiving the projected features as keys, values, and queries from the projector.

3. The method of claim 2, further comprising: combining a result of applying the self-attention to the projected features with the keys, values, and queries from the projector to generate encoder self-attention residential output; and processing the encoder self-attention residual output to generate the encoded representations of the projected features.

4. The method of claim 2, further comprising: applying, at the decoder of the transformer, self-attention to the encoded representations of the projected features using a multi-head self-attention configuration, the multi-head self-attention configuration receiving as input, keys and values from the encoder and one or more semantic embeddings as queries.

5. The method of claim 4, further comprising: combining a result of applying the self-attention to the encoded representations of the projected features with the keys and values from the encoder and one or more semantic embeddings to generate decoder self-attention residual output; and processing the decoder self-attention residual output to generate the decoded output, wherein the decoded output is at the second dimensional space.

6. The method of claim 1, wherein the projected features are obtained using a bilinear interpolation.

7. The method of claim 1, wherein the positional map includes a two-dimensional positional map.

8. A system, comprising: one or more storage devices storing instructions that when executed by one or more hardware processors, cause the one or more hardware processors to implement a neural network for generating image attention features by processing image features combined with a two-dimensional position map, the neural network comprising: a projector of a transformer configured to receive a plurality of tokens associated with image features of a first dimensional space and generate projected features by concatenating the plurality of tokens with the two-dimensional positional map, the projected features having a second dimensional space that is less than the first dimensional space; an encoder of the transformer configured to receive projected features and generate encoded representations of the projected features using self-attention; and a decoder configured to decode the encoded representations of the projected features and obtain a decoded output, wherein the decoded output is projected to the first dimensional space and combined with the image features of the first dimensional space to obtain the attention features.

9. The system of claim 8, wherein the encoder is configured to apply, at the encoder of the transformer, self-attention to the projected features using a multi-head self-attention configuration, the multi-head self-attention configuration receiving the projected features as keys, values, and queries from the projector.

10. The system of claim 9, wherein the encoder is configured to: combine a result of applying the self-attention to the projected features with the keys, values, and queries from the projector to generate encoder self-attention residential output; and process the encoder self-attention residual output to generate the encoded representations of the projected features.

11. The system of claim 9, wherein the decoder of the transformer is configured to apply self-attention to the encoded representations of the projected features using a multi-head self-attention configuration, the multi-head self-attention configuration receiving as input, keys and values from the encoder and one or more semantic embeddings as queries.

12. The system of claim 11, wherein the decoder is configured to: combine a result of applying the self-attention to the encoded representations of the projected features with the keys and values from the encoder and one or more semantic embeddings to generate decoder self-attention residential output; and process the decoder self-attention residual output to generate the decoded output, wherein the decoded output is at the second dimensional space.

13. The system of claim 8, wherein the projected features are obtained using a bilinear interpolation.

14. A non-transitory computer-readable storage medium comprising instructions being executable by one or more processors to perform a method, the method comprising: receiving, at a projector of a transformer, a plurality of tokens associated with image features of a first dimensional space; generating, at the projector of the transformer, projected features by concatenating the plurality of tokens with a positional map, the projected features having a second dimensional space that is less than the first dimensional space; receiving, at an encoder of the transformer, the projected features and generating encoded representations of the projected features using self-attention; decoding, at a decoder of the transformer, the encoded representations of the projected features and obtaining a decoded output; and projecting the decoded output to the first dimensional space and adding the image features of the first dimensional space to obtain attention features associated with the image features.

15. The computer-readable storage medium of claim 14, wherein the method further includes applying, at the encoder of the transformer, self-attention to the projected features using a multi-head self-attention configuration, the multi-head self-attention configuration receiving the projected features as keys, values, and queries from the projector.

16. The computer-readable storage medium of claim 15, wherein the method further includes: combining a result of applying the self-attention to the projected features with the keys, values, and queries from the projector to generate encoder self-attention residential output; and processing the encoder self-attention residual output to generate the encoded representations of the projected features.

17. The computer-readable storage medium of claim 15, wherein the method further includes applying, at the decoder of the transformer, self-attention to the encoded representations of the projected features using a multi-head self-attention configuration, the multi-head self-attention configuration receiving as input, keys and values from the encoder and one or more semantic embeddings as queries.

18. The computer-readable storage medium of claim 17, wherein the
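The attention steps named in the dependent claims (encoder self-attention over the projected features, decoder attention with semantic embeddings as queries, each followed by a residual combination) can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions: heads are formed by slicing the channel width rather than by learned projections, the residual is a plain element-wise sum, and the "processing" step that follows each residual (for example normalization or a feed-forward layer) is left out because the claims do not specify it. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-width vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def multi_head_attention(queries, keys, values, num_heads):
    """Split the channel width into equal slices, attend per slice, concatenate.

    Assumption: no learned per-head projections, to keep the sketch short.
    """
    width = len(queries[0])
    assert width % num_heads == 0, "width must be divisible by num_heads"
    step = width // num_heads
    heads = []
    for h in range(num_heads):
        part = slice(h * step, (h + 1) * step)
        heads.append(attention([q[part] for q in queries],
                               [k[part] for k in keys],
                               [v[part] for v in values]))
    return [sum((head[i] for head in heads), []) for i in range(len(queries))]


def residual(a, b):
    """Element-wise sum of two equally shaped token lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def encoder_layer(tokens, num_heads):
    # The projected tokens act as queries, keys, and values at once.
    return residual(multi_head_attention(tokens, tokens, tokens, num_heads), tokens)


def decoder_layer(encoded, semantic_queries, num_heads):
    # Keys and values come from the encoder; queries are semantic embeddings.
    attended = multi_head_attention(semantic_queries, encoded, encoded, num_heads)
    return residual(attended, semantic_queries)
```

In this sketch the decoder output has one row per semantic query, so its length follows the number of queries rather than the number of encoded tokens.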

Assignees

Lemon Inc

Inventors

Classifications

G06N3/0495 · Quantised networks; Sparse networks; Compressed networks
G06N3/0985 · Hyperparameter optimisation; Meta-learning; Learning-to-learn
G06N3/082 · modifying the architecture, e.g. adding, deleting or silencing nodes or connections
G06N3/0455 · Auto-encoder networks; Encoder-decoder networks
G06N3/0464 · Convolutional networks [CNN, ConvNet]

Patent family

Related publications grouped by family.

View patent family 84284632

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11983239B2 cover?: Systems and methods for obtaining attention features are described. Some examples may include: receiving, at a projector of a transformer, a plurality of tokens associated with image features of a first dimensional space; generating, at the projector of the transformer, projected features by concatenating the plurality of tokens with a positional map, the projected features having a second dime…
Who is the assignee on this patent?: Lemon Inc