Who is the assignee on this patent?

Huaneng Lancang River Hydropower Inc, Univ Hohai

What technology area does this patent fall under?

Primary CPC classification G06V10/7715. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 27, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Time-series image description method for dam defects based on local self-attention

Patent metadata
Field	Value
Publication number	US-12536775-B2
Application number	US-202318337409-A
Country	US
Kind code	B2
Filing date	Jun 19, 2023
Priority date	May 11, 2022
Publication date	Jan 27, 2026
Grant date	Jan 27, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A time-series image description method for dam defects based on local self-attention mechanism is provided, including: performing frame sampling on an input time-series image of dam defect, extracting a feature sequence using a convolutional neural network and using the sequence as an input to a self-attention encoder, where the encoder includes a Transformer network based on a variable self-attention mechanism that dynamically establishes contextual feature relations for each frame; generating description text using a long short term memory (LSTM) network based on a local attention mechanism to enable each word predicted to be feature related to an image frame, improving text generation accuracy by establishing a contextual dependency between image and text. A dynamic mechanism is added to the present application for calculating the global self-attention of image frames, and LSTM networks with added local attention directly establish the correspondence between image and text modal data.
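As a rough illustration of the frame-sampling step named in the abstract, the sketch below follows the scheme spelled out in claim 2: split the sequence into equal-length, non-overlapping segments and draw one random frame from each. The function name `sample_frames`, the `seed` parameter, and the choice to ignore leftover frames when the length is not evenly divisible are assumptions made for illustration; they are not taken from the patent text.

```python
import random


def sample_frames(num_frames, num_segments, seed=None):
    """Return one randomly chosen frame index per equal-length segment.

    Hypothetical helper: the patent only states that the input is divided
    into non-overlapping segments of equal length and one frame is picked
    at random from each.
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    rng = random.Random(seed)
    # Assumption: frames beyond num_segments * seg_len are never sampled.
    seg_len = num_frames // num_segments
    return [
        rng.randrange(i * seg_len, (i + 1) * seg_len)
        for i in range(num_segments)
    ]


indices = sample_frames(num_frames=100, num_segments=5, seed=0)
print(indices)  # five indices, one from each block of 20 frames
```

Each sampled frame would then be passed through the convolutional network to produce the feature sequence that the encoder consumes.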

First claim

Opening claim text (preview).

The invention claimed is:

1. A time-series image description method for dam defects based on local self-attention mechanism, comprising following steps:
(1) performing frame sampling on an input time-series image and extracting a feature sequence using a convolutional neural network;
(2) encoding the feature sequence of the time-series image using a Transformer network based on a variable self-attention mechanism to dynamically establish a contextual relation for each frame; and
(3) generating description text using a long short term memory (LSTM) network based on a local attention mechanism to enable each word predicted to focus on a corresponding image frame;
wherein in step (2), specific steps of encoding the feature sequence of the time-series image using a Transformer network based on a variable self-attention mechanism comprise:
(2.1) obtaining a query vector q, a keyword vector k and a value vector v corresponding to each sampled frame using a linear fully-connected layer:

$$q = \mathrm{Linear}(X) = W_Q X, \qquad k = \mathrm{Linear}(X) = W_K X, \qquad v = \mathrm{Linear}(X) = W_V X,$$

where vector q directs a current feature image to selectively focus on contextual features in a time-based dimension; vector k is used to calculate attention weights of a current feature map and other feature maps; and vector v is used to add information from the current feature map to a self-attention weight;
(2.2) adding a dot product of vector q and vector k to a current image block to obtain the attention weight as follows:

$$\mathrm{Attention}(q, k, v) = \mathrm{softmax}\left(\frac{q k^{T}}{\sqrt{d_k}}\right) v,$$

where $d_k$ is input vector dimension, obtained by dividing an input sequence dimension by a number of self-attention heads; vector q and vector k are dot-produced to obtain similarity scores of respective sequence elements, divided by $\sqrt{d_k}$ for normalisation to ensure a stability of gradient propagation in the convolutional neural network;
(2.3) introducing a multi-headed deformable coding structure into the Transformer network, enabling a model to sample and calculate attention weight for only a set of key frames around a current frame, namely assigning a certain number of keyword vectors k to the query vector q for each element in the sequence:

$$\mathrm{Atten}(z_q, p_q, X_t) = \sum_{m=1}^{K} W_m \left[ \sum_{k=1}^{K} A_{mqk} \cdot W'_m \, x_v\!\left(p_q + \Delta p_{mqk}\right) \right],$$

where $P_q$ is a position reference point of the current frame, $W_m$ and $W'_m$ are weighted learnable feature matrices, $\Delta p_{mqk}$ and $A_{mpk}$ represent a sampling offset and a self-attention weight of a k-th sampling point in a m-th self-attention head, respectively, and are capable of being normalized as $\sum_{k \in \Omega} A_{mpk} = 1$, and are obtained by training through a fully-connected network and finally linearly projected into the query vector to obtain a sampled frame feature map $\hat{x}_t$ containing contextual information.

2. The time-series image description method for dam defects based on local self-attention mechanism according to claim 1, wherein in step (1), specific steps of performing frame sampling on an input time-series image and extracting a feature sequence using a convolutional neural network comprise:
(1.1) dividing the input time-series image into $T'$ segments of equal length without overlap, randomly selecting a frame $x_t$ from each segment to form a set of $[x_1, x_2, \ldots, x_T]$; and
(1.2) using the convolutional neural network to process each sampled image frame, extracting a feature map as input to the self-attention encoder and recording as $F_t = [X_1, X_2, \ldots, X_T]$, where $X_t$ is a feature representation of each sampled image frame.

3. The time-series image description method for dam defects based on local self-attention mechanism according to claim 1, wherein in step (3), specific s
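
Steps (2.1) and (2.2) of claim 1 correspond to ordinary scaled dot-product self-attention over the sequence of frame features. The following is a minimal single-head sketch in plain Python, assuming bias-free projections and tiny hand-written weight matrices; the helper names `linear`, `softmax`, and `self_attention` are illustrative and do not come from the patent. The deformable key-frame sampling of step (2.3) is not covered here.

```python
import math


def linear(W, x):
    """Bias-free linear layer: returns the matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, W_Q, W_K, W_V):
    """Single-head scaled dot-product attention over frame features X.

    X is a list of T feature vectors (one per sampled frame).
    Returns (outputs, weights); weights[i][j] is how strongly frame i
    attends to frame j.
    """
    q = [linear(W_Q, x) for x in X]
    k = [linear(W_K, x) for x in X]
    v = [linear(W_V, x) for x in X]
    d_k = len(q[0])  # single head assumed, so d_k is the projection size
    outputs, weights = [], []
    for q_i in q:
        scores = [
            sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k)
            for k_j in k
        ]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w_j * v_j[c] for w_j, v_j in zip(w, v))
             for c in range(len(v[0]))]
        )
    return outputs, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(frames, identity, identity, identity)
print(weights[0])  # attention of frame 0 over all three frames
```

With identity projections, frame 0 attends equally to frames 0 and 2 (both share its first component) and less to frame 1, and every row of `weights` sums to 1.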

Classifications

CPC code	CPC title
G06T2207/30184	Infrastructure
G06T2207/30132	Masonry; Concrete
G06V10/82	using neural networks
G06V10/774	Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06T7/174	involving the use of two or more images

Patent family

Related publications grouped by family.

View patent family 88699305


Related patents

Apparatus for processing labeled data to be used in learning of discriminator, method of controlling the apparatus, and non-transitory computer-readable recording medium

Information processing device, information processing method, and storage medium

Dense Video Captioning
