Attention Based Feature Compression and Localization for Autonomous Devices
US-2020160117-A1 · May 21, 2020 · US
US11449713B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11449713-B2 |
| Application number | US-201916598629-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 10, 2019 |
| Priority date | Nov 16, 2018 |
| Publication date | Sep 20, 2022 |
| Grant date | Sep 20, 2022 |
Official abstract text for this publication.
Systems, methods, tangible non-transitory computer-readable media, and devices associated with object localization and generation of compressed feature representations are provided. For example, a computing system can access training data including a target feature representation and a source feature representation. An attention feature representation can be generated based on the target feature representation and a machine-learned attention model. An attended target feature representation can be generated based on masking the target feature representation with the attention feature representation. A matching score for the source feature representation and the target feature representation can be determined. A loss associated with the matching score and a ground-truth matching score for the source feature representation and the target feature representation can be determined. Furthermore, parameters of the machine-learned attention model can be adjusted based on the loss.
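The training procedure summarized in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the attention model is reduced to a hypothetical two-parameter sigmoid gate (`w`, `b`), the matching function is assumed to be a dot product, the loss is assumed to be a squared error, and all function names are invented for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention(target, w, b):
    # Hypothetical attention model: one sigmoid gate per feature,
    # sharing the two learnable parameters w and b.
    return [sigmoid(w * t + b) for t in target]


def attend(target, mask):
    # Mask the target features with the attention weights.
    return [t * m for t, m in zip(target, mask)]


def matching_score(source, attended):
    # Assumed matching function: a plain dot product.
    return sum(s * a for s, a in zip(source, attended))


def loss_and_grads(source, target, gt_score, w, b):
    mask = attention(target, w, b)
    err = matching_score(source, attend(target, mask)) - gt_score
    dw = db = 0.0
    for s, t, m in zip(source, target, mask):
        dgate = m * (1.0 - m)  # derivative of the sigmoid gate
        dw += 2.0 * err * s * t * dgate * t
        db += 2.0 * err * s * t * dgate
    return err * err, dw, db  # squared-error loss and its gradients


def train(source, target, gt_score, steps=300, lr=0.05):
    # Adjust the attention parameters by gradient descent on the loss.
    w, b, history = 0.0, 0.0, []
    for _ in range(steps):
        loss, dw, db = loss_and_grads(source, target, gt_score, w, b)
        history.append(loss)
        w -= lr * dw
        b -= lr * db
    return w, b, history


if __name__ == "__main__":
    w, b, history = train([1.0, 0.0, 1.0, 0.0], [1.0, 2.0, 1.0, -1.0], 1.5)
    print("first loss:", history[0], "last loss:", history[-1])
```

In this toy setup only the attention parameters are updated, which mirrors the abstract's statement that the parameters of the attention model are adjusted based on the loss. A real system would presumably use a neural network and an automatic differentiation framework rather than hand-written gradients.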
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for training machine-learned models, the computer-implemented method comprising: accessing training data comprising a target feature representation and a source feature representation; generating an attention feature representation based at least in part on the target feature representation and a machine-learned attention model; generating an attended target feature representation based at least in part on masking the target feature representation with the attention feature representation; determining a matching score based at least in part on application of a matching function to the source feature representation and the attended target feature representation; determining a loss associated with the matching score and a ground-truth matching score for the source feature representation and the target feature representation; and adjusting one or more parameters of the machine-learned attention model based at least in part on the loss.

2. The computer-implemented method of claim 1, further comprising: generating the training data comprising at least one of the source feature representation and the target feature representation based at least in part on one or more machine-learned feature extraction models.

3. The computer-implemented method of claim 1, wherein the generating the attended target feature representation based at least in part on masking the target feature representation with the attention feature representation comprises: performing one or more content-aware band pass filtering operations that mask one or more portions of the attended target feature representation based at least in part on attention to specific bands in a frequency domain.

4. The computer-implemented method of claim 1, wherein the determining the matching score based at least in part on application of a matching function to the attended target feature representation and the source feature representation comprises: determining an estimated position of a source object in an environment based at least in part on one or more comparisons of the source feature representation to the attended target feature representation.

5. The computer-implemented method of claim 4, wherein the determining the loss associated with the matching score and the ground-truth matching score for the source feature representation and the target feature representation comprises: determining the loss based at least in part on one or more comparisons of the estimated position of the source object relative to a ground-truth position of the source object.

6. A computing system comprising: one or more processors; and one or more tangible non-transitory computer-readable media storing computer-readable instructions that are executable by the one or more processors to cause the one or more processors to perform operations, the operations comprising: accessing target data comprising a target feature representation of an environment; accessing a machine-learned attention model configured to generate an attention feature representation of the target feature representation of the environment based at least in part on evaluation of a loss associated with a matching score for a source feature representation and an attended target feature representation relative to a ground-truth matching score for the source feature representation and the target feature representation; generating the attention feature representation based at least in part on the target feature representation and the machine-learned attention model; and generating the attended target feature representation based at least in part on masking the target feature representation with the attention feature representation.

7. The computing system of claim 6, wherein generating the attended target feature representation based at least in part on masking the target feature representation with the attention feature representation comprises: performing one or more hard attention operations to increase sparsity of the attended target feature representation.

8. The computing system of claim 7, wherein the performing the one or more hard attention operations on the target feature representation to increase sparsity of the attended target feature representation comprises determining the sparsity of the attended target feature representation based at least in part on evaluation of the attended target feature representation with respect to a sparsity threshold.

9. The computing system of claim 8, wherein the sparsity threshold is based in part on at least one of a predetermined accuracy of the attended target feature representation with respect to the target feature representation and a predetermined data size of the attended target feature representation.

10. The computing system of claim 6, wherein the generating the attended target feature representation based at least in part on masking the target feature representation with the attention feature representation comprises: performing one or more compression operations on the attended target feature representation.

11. The computing system of claim 10, wherein the one or more compression operations comprise a plurality of lossless binary compression operations that reconstruct the attended target feature representation without loss of information encoded in the attended target feature representation.

12. The computing system of claim 10, wherein the one or more compression operations comprise one or more Huffman encoding operations performed prior to one or more Run-Length-Encoding operations.

13. The computing system of claim 6, wherein the machine-learned attention model is a convolutional neural network that is trained end-to-end.

14. The computing system of claim 6, further comprising: storing the attended target feature representation in a storage device of an autonomous vehicle associated with the computing system.

15. The computing system of claim 6, further comprising: operating, based at least in part on the attended target feature representation, one or more vehicle localization systems or one or more mapping systems, wherein the attended target feature representation is used to determine a location in an environment based at least in part on one or more comparisons to another representation of the environment.

16. A vehicle comprising: one or more processors; a memory comprising one or more computer-readable media, the memory storing computer-readable instructions that are executable by the one or more processors to cause the one or more processors to perform operations comprising: accessing target data comprising a target feature representation of an environment; generating an attention feature representation of the target feature representation based at least in part on a machine-learned attention model that is trained by evaluating a loss associated with a matching score for the attention feature representation and a source feature representation compared to a ground-truth matching score for the target feature representation and the source feature representation, wherein the loss is based at least in part on at least one of a matching loss and a sparsity-inducing loss, the sparsity-inducing loss associated with increasing a sparsity of the attention feature representation; and generating an attended feature representation based at least in part on masking the target feature representation with the attention feature representation.

17. The vehicle of claim 16, further comprising: storing the attended feature representation in the
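To make the hard attention, sparsity threshold, and lossless compression language of claims 7 through 12 more concrete, here is a minimal Python sketch. It rests on several assumptions that are not in the claims: hard attention is modeled as zeroing features whose attention weight falls below a cutoff, sparsity is measured as the fraction of zero entries, and only a run-length stage is shown (the Huffman stage named in claim 12 is omitted). All names and the example values are hypothetical.

```python
def hard_attention(target, attention, cutoff):
    # Assumed hard attention: keep a feature only if its attention
    # weight reaches the cutoff, otherwise zero it out.
    return [t if a >= cutoff else 0.0 for t, a in zip(target, attention)]


def sparsity(features):
    # Fraction of entries that are exactly zero.
    if not features:
        return 0.0
    return sum(1 for f in features if f == 0.0) / len(features)


def rle_encode(values):
    # Run-length encode a sequence into (value, run_length) pairs.
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    # Invert rle_encode; the round trip loses no information.
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


if __name__ == "__main__":
    target = [0.9, 0.1, 0.2, 0.05, 0.7, 0.3, 0.0, 0.8]
    attention = [0.95, 0.1, 0.2, 0.05, 0.9, 0.3, 0.1, 0.85]
    attended = hard_attention(target, attention, cutoff=0.5)
    runs = rle_encode(attended)
    print("sparsity:", sparsity(attended))
    print("meets assumed sparsity threshold of 0.6:", sparsity(attended) >= 0.6)
    print("runs:", runs)
    print("lossless round trip:", rle_decode(runs) == attended)
```

The sparser the attended representation, the longer its zero runs and the fewer pairs the run-length stage has to store, which is one plausible reason for pairing hard attention with this kind of compression.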
Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title
using neural networks · CPC title
Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries · CPC title
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
Validation; Performance evaluation; Active pattern learning techniques · CPC title