Model interpretation method, image processing method, electronic device, and storage medium

US12530879B2 · US · B2

Patent metadata
Publication number: US-12530879-B2
Application number: US-202318099551-A
Country: US
Kind code: B2
Filing date: Jan 20, 2023
Priority date: Sep 15, 2022
Publication date: Jan 20, 2026
Grant date: Jan 20, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided is a model interpretation method, an image processing method, an electronic device and a storage medium, relating to the field of artificial intelligence, in particular to the field of deep learning. The model interpretation method includes: obtaining a token vector corresponding to an image feature input to a first model; obtaining a model prediction result output by the first model; and determining, according to a combination of an attention weight and a gradient, an association relation between the token vector input to the first model and the model prediction result output by the first model, where the association relation is used to characterize interpretability of the first model.
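
As a rough illustration of the combination described in the abstract, the sketch below multiplies each attention weight by the gradient of a prediction score with respect to that weight, giving one relevance value per token. Everything in it is an assumption made for illustration: the function names (`numerical_gradient`, `attention_times_gradient`, `toy_score`), the finite-difference gradient, and the toy numbers do not come from the patent text.

```python
from typing import Callable, List

Vector = List[float]


def numerical_gradient(f: Callable[[Vector], float], x: Vector, eps: float = 1e-6) -> Vector:
    """Central finite-difference estimate of df/dx_i (a stand-in for autograd)."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def attention_times_gradient(attn: Vector, score_fn: Callable[[Vector], float]) -> Vector:
    """Element-wise product of attention weights and their gradients."""
    grad = numerical_gradient(score_fn, attn)
    return [a * g for a, g in zip(attn, grad)]


# Hypothetical toy model: the prediction score is an attention-weighted sum
# of per-token contributions (values chosen only for this example).
token_values = [2.0, -1.0, 0.5]


def toy_score(attn: Vector) -> float:
    return sum(a * v for a, v in zip(attn, token_values))


attn = [0.7, 0.2, 0.1]
relevance = attention_times_gradient(attn, toy_score)
print([round(r, 4) for r in relevance])
```

In this toy setting a token gets a large relevance only when it is both attended to (large weight) and influential on the score (large gradient), which is the intuition behind combining the two signals.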

First claim

Opening claim text (preview).

What is claimed is:

1. A model interpretation method, comprising: obtaining a token vector corresponding to an image feature input to a first model, wherein the token vector corresponding to the image feature is a token-level vector, in which an image is divided into fixed-size patches without overlapping, each patch of the fixed-size patches is pulled into a one-dimensional vector, and all one-dimensional vectors of the fixed-size patches are recorded as a Classification Token (CLS) sequence; obtaining a model prediction result output by the first model; performing, according to an attention weight, perception of model interpretation, to obtain a first interpretation result, comprising one of: adopting an estimation method based on a feature token, to obtain the first interpretation result, and adopting an estimation method based on an attention head, to obtain the first interpretation result; solving an integral gradient from the attention weight, to obtain a gradient of the attention weight; performing, according to the gradient of the attention weight, decision-making of the model interpretation, to obtain a second interpretation result; and performing point-multiplication on the first interpretation result and the second interpretation result, to obtain an association relation between the token vector input to the first model and the model prediction result output by the first model, wherein the association relation is used to characterize interpretability of the first model.

2. The method of claim 1, wherein in a case where the estimation method based on the feature token is adopted, performing, according to the attention weight, the perception of the model interpretation, to obtain the first interpretation result, comprises: weighting, for a self-attention module in the first model, the token vector with a first attention weight, to obtain an association relation based on the token vector, wherein the first attention weight is weights for different token vectors; and performing, according to the association relation based on the token vector, the perception of the model interpretation, to obtain the first interpretation result.

3. The method of claim 1, wherein in a case where the estimation method based on the attention head is adopted, performing, according to the attention weight, the perception of the model interpretation, to obtain the first interpretation result, comprises: weighting, for a self-attention module in the first model, the token vector with a second attention weight, to obtain an association relation based on the attention head, wherein the second attention weight is weights for different attention heads; and performing, according to the association relation based on the attention head, the perception of the model interpretation, to obtain the first interpretation result.

4. The method of claim 1, wherein the first model is a trained model, or a model to be trained.

5. An image processing method, comprising: inputting a token vector corresponding to an image feature to be processed to a first model, to execute an image processing including at least one of image classification, image recognition, or image segmentation, wherein the first model obtains an association relation between the token vector input to the first model and a model prediction result output by the first model, according to the model interpretation method of claim 1, and the association relation is used to characterize interpretability of the first model; and executing at least one of following processing by adopting the association relation: performing, according to the association relation, compensatory processing on the model prediction result output by the first model; performing, according to the association relation, reliability assessment processing on the first model; or performing, according to the association relation, traceability processing on the first model.

6. An electronic device, comprising: at least one processor; and a memory connected in communication with the at least one processor; wherein the memory stores an instruction executable by the at least one processor, and the instruction, when executed by the at least one processor, enables the at least one processor to execute operations, comprising: inputting a token vector corresponding to an image feature to be processed to a first model, to execute an image processing including at least one of image classification, image recognition, or image segmentation, wherein the first model obtains an association relation between the token vector input to the first model and a model prediction result output by the first model, according to the model interpretation method of claim 1, and the association relation is used to characterize interpretability of the first model; and executing at least one of following processing by adopting the association relation: performing, according to the association relation, compensatory processing on the model prediction result output by the first model; performing, according to the association relation, reliability assessment processing on the first model; or performing, according to the association relation, traceability processing on the first model.

7. A non-transitory computer-readable storage medium storing a computer instruction thereon, wherein the computer instruction is used to cause a computer to execute operations, comprising: inputting a token vector corresponding to an image feature to be processed to a first model, to execute an image processing including at least one of image classification, image recognition, or image segmentation, wherein the first model obtains an association relation between the token vector input to the first model and a model prediction result output by the first model, according to the model interpretation method of claim 1, and the association relation is used to characterize interpretability of the first model; and executing at least one of following processing by adopting the association relation: performing, according to the association relation, compensatory processing on the model prediction result output by the first model; performing, according to the association relation, reliability assessment processing on the first model; or performing, according to the association relation, traceability processing on the first model.

8. An electronic device, comprising: at least one processor; and a memory connected in communication with the at least one processor; wherein the memory stores an instruction executable by the at least one processor, and the instruction, when executed by the at least one processor, enables the at least one processor to execute operations, comprising: obtaining a token vector corresponding to an image feature input to a first model, wherein the token vector corresponding to the image feature is a token-level vector, in which an image is divided into fixed-size patches without overlapping, each patch of the fixed-size patches is pulled into a one-dimensional vector, and all one-dimensional vectors of the fixed-size patches are recorded as a Classification Token (CLS) sequence; obtaining a model prediction result output by the first model; performing, according to an attention weight, perception of model interpretation, to obtain a first interpretation result, by one of: adopting an estimation method based on a feature token, to obtain the first interpretation result, and adopting an estimation method based on an attention head, to obtain the first interpretation result; solving an integral gradient from the attention weight, to obtain a gradient of the attention weight; performing, according to the gradient of the attent…
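
The steps recited in claim 1 can be sketched roughly as follows: cut an image into non-overlapping fixed-size patches, flatten each patch into a one-dimensional vector, approximate an integrated gradient of the prediction score with respect to the attention weights, and multiply that element-wise with an attention-based result. This is a minimal sketch under stated assumptions, not the patented implementation: the helper names, the zero baseline, the midpoint Riemann sum, the finite-difference gradient, the toy score function, and the use of the raw attention weights as the first interpretation result are all hypothetical choices made for the example.

```python
from typing import Callable, List, Optional

Vector = List[float]


def split_into_patches(image: List[List[float]], patch: int) -> List[Vector]:
    """Cut an H x W image into non-overlapping patch x patch blocks, each flattened to 1-D."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tokens.append([image[r + dr][c + dc]
                           for dr in range(patch) for dc in range(patch)])
    return tokens


def numerical_gradient(f: Callable[[Vector], float], x: Vector, eps: float = 1e-6) -> Vector:
    """Central finite-difference gradient (a stand-in for autograd)."""
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def integrated_gradients(f: Callable[[Vector], float], x: Vector,
                         baseline: Optional[Vector] = None, steps: int = 20) -> Vector:
    """Midpoint Riemann approximation of (x - b) * integral of grad f along b -> x."""
    if baseline is None:
        baseline = [0.0] * len(x)  # assumed all-zero attention baseline
    delta = [xi - bi for xi, bi in zip(x, baseline)]
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * di for bi, di in zip(baseline, delta)]
        grad = numerical_gradient(f, point)
        total = [t + g for t, g in zip(total, grad)]
    return [di * t / steps for di, t in zip(delta, total)]


# A 4 x 4 toy image split into four 2 x 2 patches (one token per patch).
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tokens = split_into_patches(image, 2)

# Hypothetical per-token contributions and attention weights for a toy score.
token_values = [1.0, -0.5, 0.25, 2.0]
attn = [0.4, 0.3, 0.2, 0.1]


def toy_score(a: Vector) -> float:
    # Nonlinear in the attention weights, so the path integral is non-trivial.
    return sum(ai * vi for ai, vi in zip(a, token_values)) ** 2


first_result = attn                                   # attention-based result (assumed)
second_result = integrated_gradients(toy_score, attn)  # gradient-based result
association = [p * q for p, q in zip(first_result, second_result)]
print(len(tokens), [round(v, 4) for v in association])
```

A useful sanity check on this kind of sketch is the completeness property of integrated gradients: the attributions in `second_result` should sum to the difference between the score at `attn` and the score at the baseline.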

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • G06V10/776 (Primary)

    Validation; Performance evaluation · CPC title

  • using electronic means · CPC title

  • Learning methods · CPC title

  • G06V10/82 (Primary)

    using neural networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12530879B2 cover?
Provided is a model interpretation method, an image processing method, an electronic device and a storage medium, relating to the field of artificial intelligence, in particular to the field of deep learning. The model interpretation method includes: obtaining a token vector corresponding to an image feature input to a first model; obtaining a model prediction result output by the first model; …
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/776. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 20, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).