Processing images using self-attention based neural networks
US-12125247-B2 · Oct 22, 2024 · US
US2024013521A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024013521-A1 |
| Application number | US-202118033305-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 15, 2021 |
| Priority date | Dec 15, 2020 |
| Publication date | Jan 11, 2024 |
| Grant date | — |
Abstract
A computer-implemented method for restoring a sequence for a dataset with frame dropping includes receiving an input sequence. A set of features is extracted from the input sequence. A frequency distribution is determined for the input sequence based on the extracted features. Time domain information for the sequence is restored and, in turn, data for the input sequence is augmented based on the restored time domain information. Additionally, noise is removed from the input sequence.
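The abstract, together with claims 1 to 3 and 7, describes restoring time domain information from a frequency distribution through an inverse FFT, with the full length tied to an average sample dropping ratio. The sketch below shows one classical way to do this. It assumes that each received frame has already been reduced to a feature vector and that the surviving frames are roughly evenly spaced in time. Under those assumptions it applies Fourier (band-limited) resampling: take the FFT of the observed frames, zero-pad the spectrum to a target length derived from the drop ratio, and invert. This signal-processing technique is a stand-in and is not the patent's method. It does not model the learned decoder or the classifier recited in claim 1. The names `restored_length`, `restore_sequence`, and `dominant_cycle_length` are hypothetical.

```python
import numpy as np


def restored_length(observed_len: int, drop_ratio: float) -> int:
    """Estimate the full sequence length from the average fraction of dropped frames."""
    if not 0.0 <= drop_ratio < 1.0:
        raise ValueError("drop_ratio must be in [0, 1)")
    return int(round(observed_len / (1.0 - drop_ratio)))


def restore_sequence(features: np.ndarray, target_len: int) -> np.ndarray:
    """Band-limited (Fourier) resampling of per-frame features along axis 0.

    features: array of shape (T,) or (T, D), one entry per received frame.
    Assumes the received frames are approximately evenly spaced in time.
    """
    x = np.asarray(features, dtype=float)
    n = x.shape[0]
    if target_len < n:
        raise ValueError("target_len must be >= the observed length")
    # Frequency distribution of the observed frames (one-sided spectrum per feature).
    spec = np.fft.rfft(x, axis=0)
    # Zero-pad the spectrum: the added high-frequency bins carry no energy.
    padded = np.zeros((target_len // 2 + 1,) + x.shape[1:], dtype=complex)
    padded[: spec.shape[0]] = spec
    if target_len > n and n % 2 == 0:
        # Split the old Nyquist bin between +/- frequencies so interpolation stays exact.
        padded[n // 2] *= 0.5
    # Inverse FFT back to the time domain, rescaled for the longer length.
    return np.fft.irfft(padded, n=target_len, axis=0) * (target_len / n)


def dominant_cycle_length(features: np.ndarray) -> float:
    """Cycle length, in frames, of the strongest non-DC frequency component."""
    x = np.asarray(features, dtype=float)
    x = x.reshape(x.shape[0], -1)  # (T,) -> (T, 1); (T, ...) -> (T, D)
    if x.shape[0] < 2:
        raise ValueError("need at least two frames")
    # Magnitude spectrum of the mean-removed features, averaged over feature dimensions.
    power = np.abs(np.fft.rfft(x - x.mean(axis=0), axis=0)).mean(axis=1)
    k = int(np.argmax(power[1:])) + 1  # skip the DC bin
    return x.shape[0] / k
```

Zero-padding in the frequency domain interpolates without introducing new frequency content. As a result, when the length is doubled (a drop ratio of 0.5), every second restored frame reproduces an original frame. Real frame dropping is usually irregular, and under those conditions this uniform-spacing assumption is only an approximation.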
Claims
What is claimed is:
1. A computer-implemented method comprising: receiving an input sequence; extracting a set of features from the input sequence; determining a frequency distribution for the input sequence based on the extracted features; restoring time domain information for the input sequence by performing an inverse fast Fourier transformation on the frequency distribution; augmenting data for the input sequence by decoding the restored time domain information; and classifying the input sequence based on the augmented data.
2. The computer-implemented method of claim 1, in which a full input sequence is restored.
3. The computer-implemented method of claim 2, in which the full input sequence is restored based at least in part on an average sample dropping ratio for the input sequence.
4. The computer-implemented method of claim 1, further comprising restoring an order of the input sequence.
5. The computer-implemented method of claim 1, in which the input sequence comprises a sequence of range-Doppler images.
6. The computer-implemented method of claim 5, in which the range-Doppler images correspond to one or more hand gestures.
7. The computer-implemented method of claim 1, further comprising determining a length of a cycle of the input sequence.
8. The computer-implemented method of claim 1, further comprising extracting at least one noise portion from the input sequence.
9. A computer-implemented method, comprising: receiving a sequence including one or more motion portions and one or more noise portions; extracting features representing the sequence; identifying one or more of the noise portions via an artificial neural network (ANN), the ANN trained to identify noise based on the extracted features; and removing the identified noise portions of the sequence.
10. The computer-implemented method of claim 9, further comprising: segmenting the sequence into multiple sequence segments; and determining a prediction of whether each sequence segment includes the noise.
11. The computer-implemented method of claim 10, in which the multiple sequence segments are defined according to a sliding window having a predefined length.
12. The computer-implemented method of claim 10, in which the predefined length is proportional to one or more of a continuous duration of a gesture or the sampling rate of the input sequence.
13. The computer-implemented method of claim 10, in which for an overlapped portion with adjacent windows with a different prediction, a boundary determination is based on a half portion of the overlapped portion.
14. The computer-implemented method of claim 10, in which, for an overlapped portion with adjacent windows with a different prediction, the overlapped portion is identified as noise.
15. An apparatus comprising: a memory; and at least one processor coupled to the memory, the at least one processor being configured: to receive an input sequence; to extract a set of features from the input sequence; to determine a frequency distribution for the input sequence based on the extracted features; to restore time domain information for the input sequence by performing an inverse fast Fourier transformation on the frequency distribution; to augment data for the input sequence by decoding the restored time domain information; and to classify the input sequence based on the augmented data.
16. The apparatus of claim 15, in which the at least one processor is further configured to restore a full input sequence.
17. The apparatus of claim 16, in which the at least one processor is further configured to restore a full input sequence based at least in part on an average sample dropping ratio for the input sequence.
18. The apparatus of claim 15, in which the at least one processor is further configured to restore an order of the input sequence.
19. The apparatus of claim 15, in which the input sequence comprises a sequence of range-Doppler images.
20. The apparatus of claim 19, in which the range-Doppler images correspond to one or more hand gestures.
21. The apparatus of claim 15, in which the at least one processor is further configured to determine a length of a cycle of the input sequence.
22. The apparatus of claim 15, in which the at least one processor is further configured to extract at least one noise portion from the input sequence.
23. An apparatus comprising: a memory; and at least one processor coupled to the memory, the at least one processor being configured: to receive a sequence including one or more motion portions and one or more noise portions; to extract features representing the sequence; to identify one or more of the noise portions via an artificial neural network (ANN), the ANN trained to identify noise based on the extracted features; and to remove the identified noise portions of the sequence.
24. The apparatus of claim 23, in which the at least one processor is further configured: to segment the sequence into multiple sequence segments; and to determine a prediction of whether each sequence segment includes the noise.
25. The apparatus of claim 24, in which the at least one processor is further configured to define the multiple sequence segments according to a sliding window having a predefined length.
26. The apparatus of claim 24, in which the predefined length is proportional to one or more of a continuous duration of a gesture or the sampling rate of the input sequence.
27. The apparatus of claim 24, in which the at least one processor is further configured to determine, for an overlapped portion with adjacent windows with a different prediction, a boundary based on a half portion of the overlapped portion.
28. The apparatus of claim 24, in which the at least one processor is further configured to identify, for an overlapped portion with adjacent windows with a different prediction, the overlapped portion as noise.
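Claims 10 to 14 and 24 to 28 describe splitting a sequence into overlapping sliding windows and predicting noise for each window. Where adjacent windows disagree, the claims resolve the overlap either by placing the boundary at half of the overlapped portion or by treating the whole overlap as noise. The sketch below turns per-window predictions into a per-frame mask under both policies. It assumes the predictions are already available: in the claims they come from a trained ANN, which is not modeled here. It also assumes that `win / 2 <= stride <= win`, so that each frame falls in at most two windows. The names `window_starts`, `frame_noise_mask`, and `remove_noise`, and the handling of trailing frames, are hypothetical choices.

```python
import numpy as np
from typing import Sequence


def window_starts(num_frames: int, win: int, stride: int) -> list[int]:
    """Start indices of full-length sliding windows over the sequence."""
    return list(range(0, num_frames - win + 1, stride))


def frame_noise_mask(
    num_frames: int,
    win: int,
    stride: int,
    window_is_noise: Sequence[bool],
    policy: str = "half",
) -> np.ndarray:
    """Convert per-window noise predictions into a per-frame mask (True = noise)."""
    # Overlapping windows, with each frame covered by at most two windows.
    if not (win / 2 <= stride <= win):
        raise ValueError("stride must satisfy win/2 <= stride <= win")
    starts = window_starts(num_frames, win, stride)
    if len(starts) != len(window_is_noise):
        raise ValueError("need exactly one prediction per window")
    mask = np.zeros(num_frames, dtype=bool)
    # Paint each window's prediction; a later window overwrites its overlap with the previous one.
    for s, noisy in zip(starts, window_is_noise):
        mask[s : s + win] = noisy
    if starts:
        # Assumption: trailing frames not covered by a full window inherit the last prediction.
        mask[starts[-1] + win :] = window_is_noise[-1]
    # Resolve overlaps between adjacent windows whose predictions disagree.
    for i in range(len(starts) - 1):
        left, right = window_is_noise[i], window_is_noise[i + 1]
        ov_start, ov_end = starts[i + 1], starts[i] + win
        if left == right or ov_end <= ov_start:
            continue
        if policy == "half":
            # Boundary at the middle of the overlapped portion (cf. claims 13 and 27).
            mid = ov_start + (ov_end - ov_start) // 2
            mask[ov_start:mid] = left
            mask[mid:ov_end] = right
        elif policy == "noise":
            # Entire overlapped portion treated as noise (cf. claims 14 and 28).
            mask[ov_start:ov_end] = True
        else:
            raise ValueError(f"unknown policy: {policy}")
    return mask


def remove_noise(sequence: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the frames not flagged as noise."""
    return np.asarray(sequence)[~mask]
```

Consider 10 frames, `win=4`, `stride=2`, and window predictions `[False, False, True, True]`. The `"half"` policy marks frames 5 to 9 as noise, while the `"noise"` policy also marks frame 4, because it sits in the disputed overlap. Claim 12 ties the window length to gesture duration or sampling rate. In that spirit, `win` could be chosen as roughly the gesture duration multiplied by the frame rate, though that exact formula is an assumption.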
Auto-encoder networks; Encoder-decoder networks · CPC title
Supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Recognition of hand or arm movements, e.g. recognition of deaf sign language (static hand signs G06V40/113) · CPC title
using neural networks · CPC title