Apparatus for processing labeled data to be used in learning of discriminator, method of controlling the apparatus, and non-transitory computer-readable recording medium
US-2020279132-A1 · Sep 3, 2020 · US
US12536775B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12536775-B2 |
| Application number | US-202318337409-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 19, 2023 |
| Priority date | May 11, 2022 |
| Publication date | Jan 27, 2026 |
| Grant date | Jan 27, 2026 |
Abstract
A time-series image description method for dam defects based on a local self-attention mechanism is provided. The method includes: performing frame sampling on an input time-series image of a dam defect; extracting a feature sequence using a convolutional neural network and using the sequence as the input to a self-attention encoder, where the encoder includes a Transformer network based on a variable self-attention mechanism that dynamically establishes contextual feature relations for each frame; and generating description text using a long short-term memory (LSTM) network based on a local attention mechanism, so that each predicted word is related to the features of an image frame, which improves text generation accuracy by establishing a contextual dependency between image and text. The present application adds a dynamic mechanism for calculating the global self-attention of image frames, and the LSTM network with added local attention directly establishes the correspondence between image and text modal data.
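The local-attention decoder described in the abstract can be illustrated with a single word-decoding step. Because the claim text describing step (3) is truncated on this page, the sketch below assumes Luong et al.'s "local-p" attention (a predicted window centre plus a Gaussian-weighted softmax) as a stand-in; the patent's actual formulation may differ. The function name and the parameters `W_p`, `v_p`, `W_a` and `D` are hypothetical, all weights are random placeholders, and the LSTM that produces the hidden state `h_t` is omitted.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def local_attention_step(h_t, frames, W_p, v_p, W_a, D=2):
    """One word-decoding step of Luong-style local-p attention (assumed stand-in).

    h_t    : current LSTM decoder hidden state, shape (H,)
    frames : encoded frame features, shape (T, H)
    W_p, v_p, W_a : hypothetical learnable parameters, shapes (H, H), (H,), (H, H)
    D      : half-width of the attention window, in frames
    """
    T = frames.shape[0]
    # Predict the aligned frame position p_t; scaled by T - 1 so it is a valid index.
    p_t = (T - 1) / (1.0 + np.exp(-(v_p @ np.tanh(W_p @ h_t))))
    # Keep only the frames whose index lies in a window around p_t.
    centre = int(np.floor(p_t))
    lo, hi = max(0, centre - D), min(T, centre + D + 1)
    positions = np.arange(lo, hi)
    window = frames[lo:hi]
    # "General" alignment score h_t^T W_a h_s, normalised over the window.
    align = softmax(window @ (W_a.T @ h_t))
    # Down-weight frames far from p_t with a Gaussian (sigma = D / 2, as in Luong et al.).
    sigma = D / 2.0
    weights = align * np.exp(-((positions - p_t) ** 2) / (2.0 * sigma ** 2))
    # Context vector; in a full model it is combined with h_t to predict the next word.
    context = weights @ window
    return context, weights, positions, p_t

# Usage with random placeholder data (H = hidden size, T = number of sampled frames).
rng = np.random.default_rng(0)
H, T = 8, 10
frames = rng.normal(size=(T, H))
h_t = rng.normal(size=H)
W_p, W_a = rng.normal(size=(H, H)), rng.normal(size=(H, H))
v_p = rng.normal(size=H)
context, weights, positions, p_t = local_attention_step(h_t, frames, W_p, v_p, W_a)
```

Restricting each word to a small window of frames is what ties a predicted word to a specific part of the defect sequence; a global-attention decoder would instead score every frame at every step.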
Claims (opening text, preview)
The invention claimed is:

1. A time-series image description method for dam defects based on a local self-attention mechanism, comprising the following steps: (1) performing frame sampling on an input time-series image and extracting a feature sequence using a convolutional neural network; (2) encoding the feature sequence of the time-series image using a Transformer network based on a variable self-attention mechanism to dynamically establish a contextual relation for each frame; and (3) generating description text using a long short-term memory (LSTM) network based on a local attention mechanism so that each predicted word focuses on a corresponding image frame; wherein, in step (2), the specific steps of encoding the feature sequence of the time-series image using a Transformer network based on a variable self-attention mechanism comprise:

(2.1) obtaining a query vector $q$, a keyword vector $k$ and a value vector $v$ corresponding to each sampled frame using a linear fully-connected layer:

$$q = \mathrm{Linear}(X) = W_Q X, \qquad k = \mathrm{Linear}(X) = W_K X, \qquad v = \mathrm{Linear}(X) = W_V X,$$

where vector $q$ directs the current feature image to selectively focus on contextual features in the time dimension; vector $k$ is used to calculate the attention weights between the current feature map and other feature maps; and vector $v$ is used to add information from the current feature map to the self-attention weight;

(2.2) adding the dot product of vector $q$ and vector $k$ to the current image block to obtain the attention weight as follows:

$$\mathrm{Attention}(q, k, v) = \mathrm{softmax}\!\left(\frac{q k^{T}}{\sqrt{d_k}}\right) v,$$

where $d_k$ is the input vector dimension, obtained by dividing the input sequence dimension by the number of self-attention heads; vectors $q$ and $k$ are dot-multiplied to obtain similarity scores of the respective sequence elements, which are divided by $\sqrt{d_k}$ for normalisation to ensure stable gradient propagation in the convolutional neural network;

(2.3) introducing a multi-headed deformable coding structure into the Transformer network, enabling the model to sample and calculate attention weights for only a set of key frames around the current frame, namely assigning a certain number of keyword vectors $k$ to the query vector $q$ for each element in the sequence:

$$\mathrm{Atten}(z_q, p_q, X_t) = \sum_{m=1}^{K} W_m \left[ \sum_{k=1}^{K} A_{mqk} \cdot W'_m \, x_v\!\left(p_q + \Delta p_{mqk}\right) \right],$$

where $p_q$ is the position reference point of the current frame; $W_m$ and $W'_m$ are weighted learnable feature matrices; $\Delta p_{mqk}$ and $A_{mqk}$ represent the sampling offset and the self-attention weight of the $k$-th sampling point in the $m$-th self-attention head, respectively, with the weights normalized as $\sum_{k \in \Omega} A_{mqk} = 1$; both are obtained by training through a fully-connected network, and the result is finally linearly projected into the query vector to obtain a sampled frame feature map $\hat{x}_t$ containing contextual information.

2. The time-series image description method for dam defects based on a local self-attention mechanism according to claim 1, wherein, in step (1), the specific steps of performing frame sampling on an input time-series image and extracting a feature sequence using a convolutional neural network comprise: (1.1) dividing the input time-series image into $T'$ segments of equal length without overlap, and randomly selecting a frame $x_t$ from each segment to form the set $[x_1, x_2, \ldots, x_T]$; and (1.2) using the convolutional neural network to process each sampled image frame, extracting a feature map as the input to the self-attention encoder, recorded as $F_t = [X_1, X_2, \ldots, X_T]$, where $X_t$ is the feature representation of each sampled image frame.

3. The time-series image description method for dam defects based on a local self-attention mechanism according to claim 1, wherein in step (3), specific s
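The encoder steps of claims 1 and 2 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patentee's implementation: a fixed random projection stands in for the CNN backbone, all weights are random rather than trained, vectors are stored as rows (so $q = W_Q X$ becomes `X @ W_Q`), and the deformable step adapts the claim's formula (the same form as Deformable DETR's deformable attention) from 2-D spatial to 1-D temporal sampling, using linear interpolation between neighbouring frames at fractional positions $p_q + \Delta p_{mqk}$. The names `sample_frames`, `multi_head_attention`, `interp_time`, `deformable_temporal_attention`, `W_off`, `W_att`, `W_val` and `W_out` are hypothetical; `M` counts heads and `K` sampling points per head.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sample_frames(num_frames, T, rng):
    """Step (1.1): split frame indices into T equal, non-overlapping segments and
    draw one random index per segment (any trailing remainder is dropped)."""
    seg_len = num_frames // T
    return np.array([t * seg_len + rng.integers(seg_len) for t in range(T)])

def multi_head_attention(X, W_Q, W_K, W_V, n_heads):
    """Steps (2.1)-(2.2): dense scaled dot-product self-attention over T frames."""
    T, d = X.shape
    d_k = d // n_heads  # input dimension divided by the number of heads

    def split(a):  # (T, d) -> (n_heads, T, d_k)
        return a.reshape(T, n_heads, d_k).transpose(1, 0, 2)

    # Row-vector form of q = W_Q X, k = W_K X, v = W_V X.
    q, k, v = split(X @ W_Q), split(X @ W_K), split(X @ W_V)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_k))  # (n_heads, T, T)
    return (weights @ v).transpose(1, 0, 2).reshape(T, d)

def interp_time(values, pos):
    """Linearly interpolate rows of `values` (T, c) at fractional positions, clamped to [0, T-1]."""
    T = values.shape[0]
    pos = np.clip(pos, 0.0, T - 1.0)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    frac = (pos - lo)[..., None]
    return (1 - frac) * values[lo] + frac * values[hi]

def deformable_temporal_attention(X, params, M, K):
    """Step (2.3): each frame q attends only to K sampled positions p_q + dp_mqk per
    head m, weighted by A_mqk normalised over k with a softmax."""
    T, d = X.shape
    out = np.zeros((T, d))
    for m in range(M):
        offsets = X @ params["W_off"][m]              # offsets dp_mqk from z_q, shape (T, K)
        A = softmax(X @ params["W_att"][m], axis=-1)  # A_mqk, each row sums to 1
        values = X @ params["W_val"][m]               # W'_m x, shape (T, d // M)
        pos = np.arange(T)[:, None] + offsets         # p_q + dp_mqk, shape (T, K)
        sampled = interp_time(values, pos)            # (T, K, d // M)
        head = (A[..., None] * sampled).sum(axis=1)   # sum over k of A_mqk * W'_m x(.)
        out += head @ params["W_out"][m]              # W_m [...], summed over heads
    return out

# Usage with placeholder data.
rng = np.random.default_rng(0)
num_frames, T, d, M, K = 120, 8, 16, 2, 3

idx = sample_frames(num_frames, T, rng)
video = rng.normal(size=(num_frames, 32, 32))  # placeholder grayscale frames
W_cnn = rng.normal(size=(32 * 32, d)) / 32.0   # stand-in for the CNN backbone
X = video[idx].reshape(T, -1) @ W_cnn          # F_t = [X_1, ..., X_T], shape (T, d)

W_Q, W_K, W_V = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
dense = multi_head_attention(X, W_Q, W_K, W_V, n_heads=M)

params = {
    "W_off": 0.1 * rng.normal(size=(M, d, K)),  # predicts sampling offsets
    "W_att": rng.normal(size=(M, d, K)),        # predicts attention weights
    "W_val": rng.normal(size=(M, d, d // M)),   # plays the role of W'_m
    "W_out": rng.normal(size=(M, d // M, d)),   # plays the role of W_m
}
x_hat = deformable_temporal_attention(X, params, M, K)  # one row per sampled frame
```

The dense attention costs $O(T^2)$ score computations per head, while the deformable variant evaluates only $K$ sampled positions per frame and head, which is the motivation the claim gives for attending to "only a set of key frames around a current frame".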
Infrastructure · CPC title
Masonry; Concrete · CPC title
using neural networks · CPC title
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
involving the use of two or more images · CPC title