US12198358B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12198358-B2 |
| Application number | US-202217962624-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 10, 2022 |
| Priority date | Nov 16, 2018 |
| Publication date | Jan 14, 2025 |
| Grant date | Jan 14, 2025 |
Official abstract text for this publication.
Systems, methods, tangible non-transitory computer-readable media, and devices associated with motion flow estimation are provided. For example, scene data including representations of an environment over a first set of time intervals can be accessed. Extracted visual cues can be generated based on the representations and machine-learned feature extraction models. At least one of the machine-learned feature extraction models can be configured to generate a portion of the extracted visual cues based on a first set of the representations of the environment from a first perspective and a second set of the representations of the environment from a second perspective. The extracted visual cues can be encoded using energy functions. Three-dimensional motion estimates of object instances at time intervals subsequent to the first set of time intervals can be determined based on the energy functions and machine-learned inference models.
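The abstract combines views from two perspectives over time into three-dimensional motion estimates. As a rough illustration of the geometry involved, the sketch below lifts pixels to 3D from stereo disparity and differences the lifted points across two time steps using optical flow. It assumes a rectified pinhole stereo rig; the function names and the parameters `focal_px`, `baseline_m`, `cx`, and `cy` are hypothetical and do not come from the patent, which does not specify this computation.

```python
import numpy as np


def backproject(u, v, disparity, focal_px, baseline_m, cx, cy):
    """Lift pixels (u, v) with stereo disparity to 3D camera-frame points.

    Assumes a rectified pinhole stereo pair (an assumption, not stated in
    the source): depth = focal length * baseline / disparity.
    """
    z = focal_px * baseline_m / disparity
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return np.stack([x, y, z], axis=-1)


def motion_from_cues(u, v, disp_t0, disp_t1, flow_u, flow_v,
                     focal_px, baseline_m, cx, cy):
    """Per-pixel 3D displacement between two time steps.

    disp_t1 is assumed to be the disparity sampled at the flow-displaced
    pixel location in the second stereo pair.
    """
    p0 = backproject(u, v, disp_t0, focal_px, baseline_m, cx, cy)
    p1 = backproject(u + flow_u, v + flow_v, disp_t1,
                     focal_px, baseline_m, cx, cy)
    return p1 - p0
```

Per-pixel displacements like these are noisy, which is one reason to fit a single rigid motion per object instance rather than trust each pixel independently.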
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: accessing a first pair of stereo images and a second pair of stereo images of an environment of an autonomous vehicle from a pair of stereo cameras, wherein the first pair of stereo images comprises a first image of the environment from a first perspective at a first time and a second image of the environment from a second perspective at the first time, wherein the second pair of stereo images comprise a first image of the environment from the first perspective at a second time and a second image of the environment from the second perspective at the second time; generating, in less than 10 seconds, a three-dimensional motion estimate for an object in the environment using a machine-learned motion flow model by: determining, using the machine-learned motion flow model, a plurality of extracted features from the first pair of stereo images and the second pair of stereo images, wherein at least one feature of the plurality of features describes an object instance associated with an object in the environment; and processing, using the machine-learned motion flow model, the plurality of extracted features, wherein the machine-learned motion flow model is configured to solve an energy optimization function for characterizing three-dimensional rigid motion of the object instance; and controlling a motion of the autonomous vehicle based on the three-dimensional motion estimate.
2. The computer-implemented method of claim 1, wherein the three-dimensional motion estimate is generated in less than 1 second.
3. The computer-implemented method of claim 2, wherein processing, using the machine-learned motion flow model, the plurality of extracted features comprises: executing a solver to optimize, over a plurality of steps, one or more energy terms associated with motion of the object instance; wherein the solver executes in less than 1 second.
4. The computer-implemented method of claim 1, wherein the machine-learned motion flow model comprises a machine-learned segmentation model.
5. The computer-implemented method of claim 4, wherein the machine-learned segmentation model is configured to associate the object instance with a portion of at least one of the first pair of stereo images and the second pair of stereo images.
6. The computer-implemented method of claim 1, wherein the machine-learned motion flow model is trained end-to-end.
7. The computer-implemented method of claim 1, wherein the plurality of extracted features from the first pair of stereo images and the second pair of stereo images comprise one or more visual cues, the one or more visual cues comprising at least one of an instance segmentation cue, an optical flow cue, or a stereo cue.
8. The computer-implemented method of claim 1, wherein the machine-learned motion flow model is configured to solve the energy optimization function using a Gaussian-Newton (GN) algorithm implemented as layers in a neural network.
9. The computer-implemented method of claim 1, the method further comprising: removing uncertain pixels from the plurality of extracted features before processing the plurality of extracted features with the machine-learned motion flow model.
10. A computing system comprising: one or more processors; and one or more tangible non-transitory computer readable media storing computer-readable instructions that are executable by the one or more processors to cause the one or more processors to perform operations, the operations comprising: accessing a first pair of stereo images and a second pair of stereo images of an environment of an autonomous vehicle from a pair of stereo cameras, wherein the first pair of stereo images comprises a first image of the environment from a first perspective at a first time and a second image of the environment from a second perspective at the first time, wherein the second pair of stereo images comprise a first image of the environment from the first perspective at a second time and a second image of the environment from the second perspective at the second time; generating, in less than 10 seconds, a three-dimensional motion estimate for an object in the environment using a machine-learned motion flow model by: determining, using the machine-learned motion flow model, a plurality of extracted features from the first pair of stereo images and the second pair of stereo images, wherein at least one feature of the plurality of features describes an object instance associated with an object in the environment; and processing, using the machine-learned motion flow model, the plurality of extracted features, wherein the machine-learned motion flow model is configured to solve an energy optimization function for characterizing three-dimensional rigid motion of the object instance; and controlling a motion of the autonomous vehicle based on the three-dimensional motion estimate.
11. The computing system of claim 10, wherein the three-dimensional motion estimate is generated in less than 1 second.
12. The computing system of claim 10, wherein the plurality of extracted features from the first pair of stereo images and the second pair of stereo images comprise one or more visual cues, the one or more visual cues comprising at least one of an instance segmentation cue, an optical flow cue, or a stereo cue.
13. The computing system of claim 10, wherein processing, using the machine-learned motion flow model, the plurality of extracted features comprises: executing a solver to optimize, over a plurality of steps, one or more energy terms associated with motion of the object instance; wherein the solver executes in less than 1 second.
14. The computing system of claim 13, wherein the plurality of steps are implemented as a plurality of layers of a neural network.
15. The computing system of claim 14, comprising: a graphical processing unit (GPU); wherein the operations comprise: executing the layers of the neural network on the GPU.
16. The computing system of claim 10, wherein the machine-learned motion flow model is trained end-to-end.
17. The computing system of claim 10, wherein the pair of stereo cameras is configured to obtain the first pair of stereo images and the second pair of stereo images of an environment associated with an augmented reality system.
18. One or more tangible non-transitory computer readable media storing computer-readable instructions that are executable by one or more processors to cause the one or more processors to perform operations, the operations comprising: accessing a first pair of stereo images and a second pair of stereo images of an environment of an autonomous vehicle from a pair of stereo cameras, wherein the first pair of stereo images comprises a first image of the environment from a first perspective at a first time and a second image of the environment from a second perspective at the first time, wherein the second pair of stereo images comprise a first image of the environment from the first perspective at a second time and a second image of the environment from the second perspective at the second time; generating, in less than 10 seconds, a three-dimensional motion estimate for an object in the environment using a machine-learned motion flow model by: determining, using the machine-learned motion flow model, a plurality of extracted features from the first pair of stereo images and the second pair of stereo images, wherein at least one feature of the plurality of features describes an object instance associated with an object in t
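The claims describe a solver that optimizes energy terms for the rigid motion of an object instance over a plurality of steps, optionally after removing uncertain pixels. The sketch below is one way to picture that idea and is not the claimed model. It assumes the energy reduces to a weighted sum of squared 3D point-to-point residuals, E(R, t) = Σ wᵢ ‖R pᵢ + t − qᵢ‖², and runs a fixed number of Gauss-Newton updates, loosely mirroring a solver unrolled into a fixed stack of layers. All function names, the binary weighting scheme, and the default step count are hypothetical.

```python
import numpy as np


def skew(a):
    """Cross-product matrix: skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])


def exp_so3(w):
    """Rodrigues formula: rotation matrix for axis-angle vector w."""
    theta = np.linalg.norm(w)
    K = skew(w)
    if theta < 1e-12:
        return np.eye(3) + K  # first-order approximation near zero
    return (np.eye(3)
            + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))


def gauss_newton_rigid(p, q, weights, num_steps=10):
    """Estimate R, t minimizing sum_i w_i * ||R p_i + t - q_i||^2.

    p, q: (N, 3) points of one object instance at two time steps.
    weights: (N,) confidences; 0 drops an uncertain point entirely.
    num_steps is fixed, so the loop could be unrolled into layers.
    """
    R, t = np.eye(3), np.zeros(3)
    for _ in range(num_steps):
        Rp = p @ R.T
        residuals = Rp + t - q
        H = np.zeros((6, 6))  # Gauss-Newton approximation J^T W J
        g = np.zeros(6)       # gradient term J^T W r
        for rp_i, r_i, w_i in zip(Rp, residuals, weights):
            # Jacobian w.r.t. a left rotation perturbation and translation.
            J = np.hstack([-skew(rp_i), np.eye(3)])
            H += w_i * (J.T @ J)
            g += w_i * (J.T @ r_i)
        delta = np.linalg.solve(H, -g)
        R = exp_so3(delta[:3]) @ R
        t = t + delta[3:]
    return R, t
```

At least three non-collinear points with nonzero weight are needed for the 6×6 system to be solvable. A learned model would presumably derive the residuals and weights from the extracted cues rather than from known correspondences as assumed here.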
exterior to a vehicle by using sensors mounted on the vehicle · CPC title
using neural networks · CPC title
Extraction of image or video features · CPC title
Determining parameters from multiple pictures (depth or shape recovery from multiple images G06T7/55; stereo camera calibration G06T7/85) · CPC title
Stereo images · CPC title