System and method for detecting interaction
US-2019188866-A1 · Jun 20, 2019 · US
US11475351B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11475351-B2 |
| Application number | US-201816124966-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 7, 2018 |
| Priority date | Nov 15, 2017 |
| Publication date | Oct 18, 2022 |
| Grant date | Oct 18, 2022 |
Abstract
Systems, methods, tangible non-transitory computer-readable media, and devices for object detection, tracking, and motion prediction are provided. For example, the disclosed technology can include receiving sensor data including information based on sensor outputs associated with detection of objects in an environment over one or more time intervals by one or more sensors. The operations can include generating, based on the sensor data, an input representation of the objects. The input representation can include a temporal dimension and spatial dimensions. The operations can include determining, based on the input representation and a machine-learned model, detected object classes of the objects, locations of the objects over the one or more time intervals, or predicted paths of the objects. Furthermore, the operations can include generating, based on the input representation and the machine-learned model, an output including bounding shapes corresponding to the objects.
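The abstract does not say how the input representation is built. Below is a minimal sketch that assumes LIDAR-style 3D points for each time interval and a fixed cubic voxel size. The function name `voxelize_sweeps`, its parameters, and the binary-occupancy encoding are illustrative assumptions, not details taken from the patent. It turns a sequence of sweeps into a tensor with one temporal axis and three spatial axes. Height is kept as the innermost axis so it can serve as the input channel dimension, as claims 2 and 3 describe.

```python
from typing import List, Sequence, Tuple

# Hypothetical point format: (x, y, z) in metres in a common (ego) frame.
Point = Tuple[float, float, float]

def voxelize_sweeps(
    sweeps: Sequence[Sequence[Point]],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    z_range: Tuple[float, float],
    voxel_size: float,
) -> List[List[List[List[int]]]]:
    """Return a binary occupancy tensor indexed [t][ix][iy][iz].

    t is the temporal dimension (one entry per sweep / time interval),
    ix and iy are the bird's-eye-view spatial dimensions, and iz (height)
    is the innermost axis so it can be treated as input channels.
    """
    nx = int(round((x_range[1] - x_range[0]) / voxel_size))
    ny = int(round((y_range[1] - y_range[0]) / voxel_size))
    nz = int(round((z_range[1] - z_range[0]) / voxel_size))
    grid = [[[[0] * nz for _ in range(ny)] for _ in range(nx)] for _ in sweeps]
    for t, points in enumerate(sweeps):
        for x, y, z in points:
            ix = int((x - x_range[0]) // voxel_size)
            iy = int((y - y_range[0]) // voxel_size)
            iz = int((z - z_range[0]) // voxel_size)
            # Drop points that fall outside the region of interest.
            if 0 <= ix < nx and 0 <= iy < ny and 0 <= iz < nz:
                grid[t][ix][iy][iz] = 1
    return grid

sweeps = [
    [(1.2, 0.4, 0.5)],                   # t = 0
    [(2.3, 0.4, 0.5), (9.9, 9.9, 9.9)],  # t = 1; the second point is out of range
]
occ = voxelize_sweeps(sweeps, (0.0, 4.0), (0.0, 4.0), (0.0, 2.0), voxel_size=1.0)
print(occ[0][1][0][0], occ[1][2][0][0], occ[1][1][0][0])  # 1 1 0
```

In this toy input, the single in-range point moves from x index 1 to x index 2 between the two sweeps. That motion is the kind of cue the temporal dimension is meant to carry.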
Claims (preview)
What is claimed is:
1. A computer-implemented method of object detection, the computer-implemented method comprising: receiving sensor data comprising information based at least in part on one or more sensor outputs associated with detection of an environment over one or more time intervals by one or more sensors, wherein the environment comprises one or more objects; generating based at least in part on the sensor data, an input representation of the one or more objects, wherein the input representation comprises a temporal dimension and one or more spatial dimensions; determining, based at least in part on one or more fusion criteria provided in data associated with the temporal dimension of the input representation, whether to aggregate temporal information associated with the temporal dimension at a first convolution layer of a plurality of convolution layers of a machine-learned model or to aggregate the temporal information associated with the temporal dimension over two or more convolution layers of the plurality of convolution layers of the machine-learned model; determining based at least in part on the input representation and the machine-learned model, at least one of: (i) one or more detected object classes of the one or more objects, (ii) one or more locations of the one or more objects over the one or more time intervals, or (iii) one or more predicted paths of the one or more objects, wherein the machine-learned model aggregates the temporal information associated with the temporal dimension at the first convolution layer or aggregates the temporal information associated with the temporal dimension over two or more convolution layers of the machine-learned model, wherein aggregating the temporal information comprises reducing the one or more time intervals of the temporal dimension to one time interval in a manner that is determined based at least in part on the one or more fusion criteria; and generating, based at least in part on the input representation and the machine-learned model, output data comprising one or more bounding shapes corresponding to the one or more objects.
2. The computer-implemented method of claim 1, further comprising: generating, based at least in part on the sensor data, a plurality of voxels corresponding to the environment comprising the one or more objects, wherein a height dimension of the plurality of voxels is used as an input channel of the input representation, and wherein the input representation is based at least in part on the plurality of voxels corresponding to one or more portions of the environment occupied by the one or more objects.
3. The computer-implemented method of claim 1, wherein the input representation comprises a tensor associated with a plurality of dimensions comprising the temporal dimension and the one or more spatial dimensions, the temporal dimension of the tensor associated with the one or more time intervals, and the one or more spatial dimensions of the tensor comprising a width dimension, a depth dimension, or a height dimension that is used as an input channel for the machine-learned model.
4. The computer-implemented method of claim 3, wherein the input representation is input to the first convolution layer of the plurality of convolution layers of the machine-learned model, and wherein weights of a plurality of feature maps for the plurality of convolution layers are shared between the plurality of convolution layers.
5. The computer-implemented method of claim 4, further comprising: aggregating the temporal information to the tensor subsequent to aggregating spatial information associated with the one or more spatial dimensions to the tensor, wherein the temporal information is aggregated as the input representation is processed by the plurality of convolution layers, and wherein the temporal information is associated with the temporal dimension of the tensor.
6. The computer-implemented method of claim 1, wherein the fusion criteria provided in data associated with the temporal dimension of the input representation comprises a flag signaling whether to aggregate temporal information associated with the temporal dimension at the first convolution layer of the plurality of convolution layers of the machine-learned model or to aggregate the temporal information associated with the temporal dimension over the two or more convolution layers of the plurality of convolution layers of the machine-learned model.
7. The computer-implemented method of claim 6, wherein aggregating the temporal information comprises: reducing the one or more time intervals of the temporal dimension to one time interval by performing a one-dimensional convolution on the temporal information associated with the temporal dimension.
8. The computer-implemented method of claim 1, wherein aggregating the temporal information comprises: reducing the one or more time intervals of the temporal dimension to one time interval by performing a two-dimensional convolution on the temporal information associated with the temporal dimension.
9. The computer-implemented method of claim 1, further comprising: activating, based at least in part on the output data, one or more systems comprising mechanical systems, one or more electromechanical systems, or one or more electronic systems, associated with operation of a manually operated vehicle, an autonomous vehicle, or one or more robotic systems.
10. The computer-implemented method of claim 1, further comprising: determining one or more travelled paths of the one or more objects based at least in part on one or more locations of the one or more objects over a sequence of the one or more time intervals comprising a last time interval associated with a current time and the one or more time intervals prior to the current time, wherein the one or more predicted paths of the one or more objects is based at least in part on the one or more travelled paths.
11. The computer-implemented method of claim 10, further comprising: detecting an object of the one or more objects that is at least partly occluded; and determining, based at least in part on the one or more travelled paths of the one or more objects, a time associated with the object of the one or more objects that is at least partly occluded being detected.
12. The computer-implemented method of claim 1, wherein the one or more sensor outputs comprise one or more three-dimensional points corresponding to a plurality of surfaces of the one or more objects detected by the one or more sensors.
13. The computer-implemented method of claim 1, wherein the sensor data is associated with a birds eye view vantage point, the one or more sensors comprising one or more light detection and ranging devices (LIDAR), one or more cameras, one or more radar devices, one or more sonar devices, or one or more thermal sensors.
14. One or more tangible non-transitory computer-readable media storing computer-readable instructions that are executable by one or more processors to cause the one or more processors to perform operations, the operations comprising: receiving sensor data comprising information based at least in part on one or more sensor outputs associated with detection of an environment over one or more time intervals by one or more sensors, wherein the environment comprises one or more objects; generating, based at least in part on the sensor data, an input representation of the one or more objects, wherein the input representation comprises a temporal dimension and one or more spatial dimensions; determining, based at least in part on one or more fusion criteria provided in data associated w
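Claims 1, 6, and 7 describe a flag that selects between two ways of collapsing the temporal dimension to one interval. Either it happens in the first convolution layer, or it happens gradually over two or more layers, in both cases using a one-dimensional convolution along time. The sketch below shows that choice on plain Python lists. It uses fixed averaging kernels in place of learned weights, and it omits the spatial convolutions a real network would place between the temporal ones. `early_fusion`, `temporal_conv1d`, and `aggregate_temporal` are hypothetical names. Claim 8 recites a two-dimensional convolution alternative, which is not shown.

```python
from typing import List, Sequence

FeatureMap = List[List[float]]  # one [row][col] map per time interval

def temporal_conv1d(frames: Sequence[FeatureMap], weights: Sequence[float]) -> List[FeatureMap]:
    """'Valid' 1-D convolution along time, applied independently at each (row, col)."""
    k = len(weights)
    if k > len(frames):
        raise ValueError("kernel is longer than the temporal dimension")
    rows, cols = len(frames[0]), len(frames[0][0])
    out = []
    for start in range(len(frames) - k + 1):
        out.append([[sum(weights[j] * frames[start + j][r][c] for j in range(k))
                     for c in range(cols)] for r in range(rows)])
    return out

def aggregate_temporal(frames: Sequence[FeatureMap], early_fusion: bool,
                       late_kernel: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> FeatureMap:
    """Reduce T time intervals to one, as selected by the hypothetical early_fusion flag."""
    if early_fusion:
        # Early fusion: one kernel spanning all T intervals collapses time in a single layer.
        return temporal_conv1d(frames, [1.0 / len(frames)] * len(frames))[0]
    k = len(late_kernel)
    if k < 2 or (len(frames) - 1) % (k - 1) != 0:
        raise ValueError("late_kernel cannot reduce T to exactly one interval")
    # Late fusion: each layer shrinks T by k - 1 until a single interval remains.
    frames = list(frames)
    while len(frames) > 1:
        frames = temporal_conv1d(frames, late_kernel)
    return frames[0]

frames = [[[float(t), 2.0 * t]] for t in range(5)]  # T = 5 intervals of a 1x2 map
print(aggregate_temporal(frames, early_fusion=True))   # approximately [[2.0, 4.0]]
print(aggregate_temporal(frames, early_fusion=False))  # approximately [[2.0, 4.0]], via T = 5 -> 3 -> 1
```

This toy input is linear in time and the kernels are uniform averages, so both paths yield the same fused map. With learned weights and interleaved spatial layers, the two paths would generally produce different results.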
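Claim 10 builds a travelled path from an object's locations over past intervals up to the current time, and bases the predicted path on it. In the patent, the prediction comes from the machine-learned model. The sketch below substitutes plain constant-velocity extrapolation only to make the data flow concrete. The `predict_path` helper and the bird's-eye-view `(x, y)` location format are assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical BEV (x, y) centre of one object at one time interval.
Location = Tuple[float, float]

def predict_path(travelled: Sequence[Location], horizon: int) -> List[Location]:
    """Extrapolate `horizon` future locations at the mean per-interval velocity.

    `travelled` is ordered oldest to newest; travelled[-1] is the current time.
    Constant-velocity extrapolation stands in for the learned predictor.
    """
    if len(travelled) < 2:
        raise ValueError("need at least two past locations to estimate velocity")
    (x_first, y_first), (x_last, y_last) = travelled[0], travelled[-1]
    steps = len(travelled) - 1
    # Mean displacement per time interval over the whole travelled path.
    vx, vy = (x_last - x_first) / steps, (y_last - y_first) / steps
    return [(x_last + vx * k, y_last + vy * k) for k in range(1, horizon + 1)]

travelled = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
print(predict_path(travelled, horizon=2))  # [(3.0, 1.5), (4.0, 2.0)]
```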
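Claim 1 ends with output data comprising bounding shapes. A common way to represent such a shape in a bird's-eye view is an oriented rectangle given by a centre, a length, a width, and a heading. The claims do not specify the box encoding, so this parameterization and the `bev_box_corners` helper below are assumptions for illustration.

```python
import math
from typing import List, Tuple

def bev_box_corners(cx: float, cy: float, length: float, width: float,
                    yaw: float) -> List[Tuple[float, float]]:
    """Corners of an oriented BEV rectangle, counter-clockwise from front-left.

    `yaw` is the heading in radians, measured counter-clockwise from the +x axis.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    half_l, half_w = length / 2.0, width / 2.0
    local = [(half_l, half_w), (-half_l, half_w), (-half_l, -half_w), (half_l, -half_w)]
    # Rotate each box-frame corner by yaw, then translate it to the box centre.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in local]

print(bev_box_corners(0.0, 0.0, 4.0, 2.0, 0.0))
# [(2.0, 1.0), (-2.0, 1.0), (-2.0, -1.0), (2.0, -1.0)]
```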