Who is the assignee on this patent?

IBM, Massachusetts Inst Technology

What technology area does this patent fall under?

Primary CPC classification G06V10/26. Mapped technology areas include Physics.

When was this patent published?

Publication date Nov 26, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page: publications linked by citation in our corpus, or others sharing the same primary CPC classification.

Interpretability-aware redundancy reduction for vision transformers

US12154307B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12154307-B2
Application number	US-202117559053-A
Country	US
Kind code	B2
Filing date	Dec 22, 2021
Priority date	Dec 22, 2021
Publication date	Nov 26, 2024
Grant date	Nov 26, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A sequence of patch tokens representing an image can be received. A network can be trained to learn informative patch tokens and uninformative patch tokens in the sequence of patch tokens, in learning to recognize an object in the image. The sequence of patch tokens can be reduced by removing the uninformative patch tokens from the sequence of patch tokens. The reduced sequence of patch tokens can be input to an attention-based deep learning neural network. The attention-based deep learning neural network can be fine-tuned to recognize the object in the image using the reduced sequence of patch tokens.
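The abstract's pipeline (score each patch token, keep the informative ones, drop the rest before the transformer) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the "policy token" is modeled as a hypothetical weight vector `policy` whose dot product with each token, passed through a sigmoid and thresholded at 0.5, yields the binary keep vector. `keep_mask` and `reduce_tokens` are illustrative names, not from the patent.

```python
import math
from typing import List, Sequence

Token = List[float]  # one patch token (embedding); positional embedding assumed already added


def sigmoid(x: float) -> float:
    """Numerically stable logistic activation."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def keep_mask(tokens: Sequence[Token], policy: Sequence[float],
              threshold: float = 0.5) -> List[int]:
    """Return a binary vector: 1 = informative (keep), 0 = uninformative (drop).

    Assumption: the policy token acts as the weight vector of a linear layer
    applied to each patch token, and the activation is a sigmoid.
    """
    mask = []
    for tok in tokens:
        score = sum(w * x for w, x in zip(policy, tok))
        mask.append(1 if sigmoid(score) >= threshold else 0)
    return mask


def reduce_tokens(tokens: Sequence[Token], mask: Sequence[int]) -> List[Token]:
    """Remove tokens whose mask entry is 0, preserving the original order."""
    return [list(tok) for tok, keep in zip(tokens, mask) if keep]


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
    policy = [1.0, -1.0]  # hypothetical learned values
    mask = keep_mask(patches, policy)
    print(mask)                          # [1, 0, 1]
    print(reduce_tokens(patches, mask))  # [[1.0, 0.0], [2.0, -1.0]]
```

The reduced list is what would then be fed to the attention-based network for fine-tuning. Because self-attention cost grows quadratically with sequence length, every dropped token shrinks that cost.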

First claim

Claim text (preview; the last claim is cut off).

What is claimed is:

1. A computer-implemented method comprising: receiving a sequence of patch tokens representing an image; training a network to learn informative patch tokens and uninformative patch tokens in the sequence of patch tokens, in learning to recognize an object in the image, wherein the network includes a linear layer, wherein the sequence of patch tokens is input to the linear layer and the linear layer outputs a binary vector by applying a policy token to the sequence of patch tokens and feeding result of applying the policy token to the patch tokens into an activation function, the binary vector indicating whether the sequence of patch tokens input to the linear layer is activated or deactivated, wherein a value of the policy token is learned based on reinforcement learning during the training of the network; reducing the sequence of patch tokens by removing the uninformative patch tokens from the sequence of patch tokens; and inputting the reduced sequence of patch tokens to an attention-based deep learning neural network; fine-tuning the attention-based deep learning neural network to recognize the object in the image using the reduced sequence of patch tokens, wherein the attention-based deep learning neural network is divided into D number of groups, each group in the D number of groups including the network's linear layer and L blocks of multi-head self-attention layer and feed-forward network.

2. The method of claim 1, wherein the network includes a multi-headed module connected to the attention-based deep learning neural network.

3. The method of claim 1, wherein the attention-based deep learning neural network includes a vision transformer.

4. The method of claim 1, wherein the network includes a linear layer, wherein the sequence of patch tokens input to the linear layer is activated or deactivated based on applying an activation function and a policy token.

5. The method of claim 1, wherein training of the network and fine-tuning the attention-based deep learning neural network are performed together, wherein parameters learned by the network is used in fine-tuning the attention-based deep learning neural network.

6. The method of claim 1, wherein the sequence of patch tokens have positional embeddings.

7. The method of claim 1, wherein the network is optimized using reinforcement learning based on a prediction of the attention-based deep learning neural network.

8. A system comprising: a processor; and a memory device coupled with the processor; the processor configured to at least: receive a sequence of patch tokens representing an image; train a network to learn informative patch tokens and uninformative patch tokens in the sequence of patch tokens, in learning to recognize an object in the image, wherein the network includes a linear layer, wherein the sequence of patch tokens is input to the linear layer and the linear layer outputs a binary vector by applying a policy token to the sequence of patch tokens and feeding result of applying the policy token to the patch tokens into an activation function, the binary vector indicating whether the sequence of patch tokens input to the linear layer is activated or deactivated, wherein a value of the policy token is learned based on reinforcement learning during the training of the network; reduce the sequence of patch tokens by removing the uninformative patch tokens from the sequence of patch tokens; and input the reduced sequence of patch tokens to an attention-based deep learning neural network; fine-tune the attention-based deep learning neural network to recognize the object in the image using the reduced sequence of patch tokens, wherein the attention-based deep learning neural network is divided into D number of groups, each group in the D number of groups including the network's linear layer and L blocks of multi-head self-attention layer and feed-forward network.

9. The system of claim 8, wherein the network includes a multi-headed module connected to the attention-based deep learning neural network.

10. The system of claim 8, wherein the attention-based deep learning neural network includes a vision transformer.

11. The system of claim 8, wherein the network includes a linear layer, wherein the sequence of patch tokens input to the linear layer is activated or deactivated based on applying an activation function and a policy token.

12. The system of claim 8, wherein the processor is configured to train the network and fine-tune the attention-based deep learning neural network together, wherein parameters learned by the network is used in fine-tuning the attention-based deep learning neural network.

13. The system of claim 8, wherein the sequence of patch tokens have positional embeddings.

14. The system of claim 8, wherein the network is optimized using reinforcement learning based on a reward determined based on a prediction of the attention-based deep learning neural network.

15. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions readable by a device to cause the device to: receive a sequence of patch tokens representing an image; train a network to learn informative patch tokens and uninformative patch tokens in the sequence of patch tokens, in learning to recognize an object in the image, wherein the network includes a linear layer, wherein the sequence of patch tokens is input to the linear layer and the linear layer outputs a binary vector by applying a policy token to the sequence of patch tokens and feeding result of applying the policy token to the patch tokens into an activation function, the binary vector indicating whether the sequence of patch tokens input to the linear layer is activated or deactivated, wherein a value of the policy token is learned based on reinforcement learning during the training of the network; reduce the sequence of patch tokens by removing the uninformative patch tokens from the sequence of patch tokens; and input the reduced sequence of patch tokens to an attention-based deep learning neural network; fine-tune the attention-based deep learning neural network to recognize the object in the image using the reduced sequence of patch tokens, wherein the attention-based deep learning neural network is divided into D number of groups, each group in the D number of groups including the network's linear layer and L blocks of multi-head self-attention layer and feed-forward network.

16. The computer program product of claim 15, wherein the network includes a multi-headed module connected to the attention-based deep learning neural network.

17. The computer program product of claim 15, wherein the attention-based deep learning neural network includes a vision transformer.

18. The computer program product of claim 15, wherein the network includes a linear layer, wherein the sequence of patch tokens input to the linear layer is activated or deactivated based on applying an activation function and a policy token.

19. The computer program product of claim 15, wherein the device is caused to train the network and fine-tune the attention-based deep learning neural network together, wherein parameters learned by the network is used in fine-tuning the attention-based deep learning neural network.

20. The computer program product of claim 15, wherein the device is caused to optimize the network using reinforcement learning based on a reward determined based on a prediction of the attent…
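Claim 1 adds two structural details: the policy is learned with reinforcement learning (claims 7, 14 and 20 tie it to the transformer's prediction), and the attention network is split into D groups, each with its own policy linear layer followed by L blocks of multi-head self-attention and feed-forward layers. The sketch below illustrates both under loudly stated assumptions: REINFORCE is used as one standard policy-gradient method (the claims only say "reinforcement learning"), keep decisions are sampled as independent Bernoulli variables, the scalar `reward` is supplied by the caller rather than computed from a real prediction, and the transformer blocks are stand-in callables. All function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Token = List[float]
Block = Callable[[List[Token]], List[Token]]  # stand-in for one MHSA + FFN block


def sigmoid(x: float) -> float:
    """Numerically stable logistic activation."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def score(policy: Sequence[float], tok: Sequence[float]) -> float:
    """Linear-layer output for one token (assumed: dot product with the policy token)."""
    return sum(w * x for w, x in zip(policy, tok))


def sample_mask(tokens: Sequence[Token], policy: Sequence[float],
                rng: random.Random) -> Tuple[List[int], List[float]]:
    """Training-time keep/drop decisions sampled from Bernoulli(sigmoid(score))."""
    probs = [sigmoid(score(policy, t)) for t in tokens]
    mask = [1 if rng.random() < p else 0 for p in probs]
    return mask, probs


def reinforce_step(tokens: Sequence[Token], policy: Sequence[float],
                   mask: Sequence[int], probs: Sequence[float],
                   reward: float, lr: float = 0.1) -> List[float]:
    """One REINFORCE ascent step on the policy weights.

    For Bernoulli(p) with p = sigmoid(policy . x), the gradient of the
    log-probability of decision m with respect to the policy is (m - p) * x.
    """
    new_policy = list(policy)
    for tok, m, p in zip(tokens, mask, probs):
        for i, x in enumerate(tok):
            new_policy[i] += lr * reward * (m - p) * x
    return new_policy


def grouped_forward(tokens: List[Token], policies: Sequence[Sequence[float]],
                    groups: Sequence[Sequence[Block]]) -> List[Token]:
    """Inference through D groups: prune with the group's policy, then run its L blocks."""
    for policy, blocks in zip(policies, groups):
        # sigmoid(s) >= 0.5 exactly when s >= 0, so threshold the raw score.
        kept = [t for t in tokens if score(policy, t) >= 0.0]
        tokens = kept if kept else tokens[:1]  # assumption: never drop every token
        for block in blocks:
            tokens = block(tokens)
    return tokens


def identity(ts: List[Token]) -> List[Token]:
    """Placeholder block; a real model would apply self-attention and an FFN here."""
    return ts


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    # D = 2 groups, each with L = 2 placeholder blocks.
    out = grouped_forward(x, [[1.0, 1.0], [1.0, -1.0]],
                          [[identity, identity], [identity, identity]])
    print(out)  # [[1.0, 0.0]]

    rng = random.Random(0)
    mask, probs = sample_mask(x, [0.0, 0.0], rng)
    print(reinforce_step(x, [0.0, 0.0], mask, probs, reward=1.0))
```

The sketch samples decisions during training but thresholds deterministically at inference, one common choice for such policies; the claims do not say which the patent uses. Pruning inside each group means later groups see progressively shorter sequences.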

Assignees

IBM, Massachusetts Inst Technology

Inventors

Not shown on this page.

Classifications

G06T7/136 · involving thresholding
G06T2207/20081 · Training; Learning
G06T7/11 · Region-based segmentation
G06T2207/20084 · Artificial neural networks [ANN]
G06N3/08 · Learning methods

Patent family

Related publications grouped by family.

Patent family ID: 86768595

Related patents

Adaptive Token Sampling for Efficient Transformer

Automatic estimation of tumor cellularity using a dpi ai platform

End-to-End Attention Pooling-Based Classification Method for Histopathology Images

Method and system for detecting actions in videos

Filtering methods for visual object detection

Target object classification using three-dimensional geometric filtering

Boosting object detection performance in videos