Hierarchical machine-learning network architecture
US-2020210721-A1 · Jul 2, 2020 · US
US11155259B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11155259-B2 |
| Application number | US-201916386964-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 17, 2019 |
| Priority date | Sep 13, 2018 |
| Publication date | Oct 26, 2021 |
| Grant date | Oct 26, 2021 |
Abstract
A system and method for egocentric-vision based future vehicle localization that include receiving at least one egocentric first person view image of a surrounding environment of a vehicle. The system and method also include encoding at least one past bounding box trajectory associated with at least one traffic participant that is captured within the at least one egocentric first person view image and encoding a dense optical flow of the egocentric first person view image associated with the at least one traffic participant. The system and method further include decoding at least one future bounding box associated with the at least one traffic participant based on a final hidden state of the at least one past bounding box trajectory encoding and the final hidden state of the dense optical flow encoding.
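The encode, fuse, and decode flow in the abstract can be illustrated with a minimal, untrained sketch in pure Python. It is not the patented implementation. It assumes gated recurrent unit (GRU) encoders (claim 7 refers to gated recurrent unit models), bounding boxes given as (x, y, w, h) and normalized by the image width and height, an element-wise mean as the fusion step (the claims do not specify the fusion operator), and a decoder that feeds each predicted box back as its next input. The function names, hidden size, horizon, image size, and random weights are all hypothetical, and the future ego-motion input described in claim 9 is omitted.

```python
import math
import random

def sigmoid(x):
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def matvec(w, v):
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """Textbook GRU cell with small random weights and no biases (untrained)."""
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        self.wz = rand_matrix(hidden_size, n, rng)  # update gate
        self.wr = rand_matrix(hidden_size, n, rng)  # reset gate
        self.wh = rand_matrix(hidden_size, n, rng)  # candidate state

    def step(self, x, h):
        xh = list(x) + list(h)
        z = [sigmoid(s) for s in matvec(self.wz, xh)]
        r = [sigmoid(s) for s in matvec(self.wr, xh)]
        xrh = list(x) + [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(s) for s in matvec(self.wh, xrh)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def final_hidden_state(cell, sequence):
    """Run the GRU over a sequence and keep only the last hidden state."""
    h = [0.0] * cell.hidden_size
    for x in sequence:
        h = cell.step(x, h)
    return h

def predict_future_boxes(past_boxes, flow_feats, horizon=5, hidden=8, seed=0):
    """Hypothetical helper: encode boxes and flow, fuse, then decode future boxes."""
    rng = random.Random(seed)
    box_enc = GRUCell(4, hidden, rng)
    flow_enc = GRUCell(len(flow_feats[0]), hidden, rng)
    dec = GRUCell(4, hidden, rng)
    out = rand_matrix(4, hidden, rng)  # maps hidden state to a box offset

    h_box = final_hidden_state(box_enc, past_boxes)
    h_flow = final_hidden_state(flow_enc, flow_feats)
    # ASSUMPTION: fuse the two final hidden states by element-wise mean
    h = [(a + b) / 2 for a, b in zip(h_box, h_flow)]

    box, future = list(past_boxes[-1]), []
    for _ in range(horizon):
        h = dec.step(box, h)
        box = [b + d for b, d in zip(box, matvec(out, h))]
        future.append(box)
    return future

W, H = 1280, 720  # hypothetical image size in pixels
pixel_boxes = [(600 + 5 * t, 300, 80, 160) for t in range(10)]  # (x, y, w, h)
past = [(x / W, y / H, w / W, h / H) for x, y, w, h in pixel_boxes]
flow = [(0.5, 0.0)] * 10  # hypothetical per-frame pooled flow features
future = predict_future_boxes(past, flow)
print(len(future), len(future[0]))  # 5 4
```

Because the weights are random, the predicted boxes carry no meaning; the sketch only shows where each final hidden state enters and how the fused state seeds the decoder.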
Claims (preview)
The invention claimed is:

1. A computer-implemented method for egocentric-vision based future vehicle localization, comprising: receiving at least one egocentric first person view image of a surrounding environment of a vehicle; encoding at least one past bounding box trajectory associated with at least one traffic participant that is captured within the at least one egocentric first person view image, wherein a final hidden state of the at least one past bounding box trajectory encoding is output; encoding a dense optical flow of the egocentric first person view image associated with the at least one traffic participant, wherein a final hidden state of the dense optical flow encoding is output; decoding at least one future bounding box associated with the at least one traffic participant based on a fusion of the final hidden state of the at least one past bounding box trajectory encoding and the final hidden state of the dense optical flow encoding; and controlling the vehicle to be autonomously driven based on the at least one future bounding box associated with the at least one traffic participant.

2. The computer-implemented method of claim 1, wherein receiving the at least one egocentric first person view image includes extracting at least one spatial-temporal feature that pertains to an object and classifying the object as the at least one traffic participant based on a comparison of pixel locations and scale of the object against at least one traffic participant model.

3. The computer-implemented method of claim 2, wherein encoding the at least one past bounding box trajectory includes computing at least one bounding box around the at least one traffic participant as classified, wherein at least one past trajectory is computed based on the at least one past bounding box.

4. The computer-implemented method of claim 3, wherein encoding the at least one past bounding box trajectory includes encoding a past location, position, and trajectory the at least one traffic participant based on a pixel location and scale as specified by pixel coordinates of the at least one traffic participant bounding box at a time together with a width and height in pixels of the at least one egocentric first person view image of the at least one traffic participant.

5. The computer-implemented method of claim 1, wherein encoding the dense optical flow of the egocentric first person view image includes evaluating pixel level information with respect to each of the pixels of past image frames to determine the dense optical flow of past image frames, wherein a pattern of an apparent motion change of the at least one traffic participant between two consecutive image frames is caused by the movement of the at least one traffic participant.

6. The computer-implemented method of claim 5, wherein encoding the dense optical flow of the egocentric first person view image includes completing region of interest pooling of optical flow fields and the past bounding box trajectory, wherein a region of interest may be expanded from a bounding box to extract features associated with the at least one traffic participant.

7. The computer-implemented method of claim 1, further including fusing the final hidden state of the at least one past bounding box trajectory encoding and the final hidden state of the dense optical flow encoding, wherein a final fused hidden state is outputted as hidden state vectors of gated recurrent unit models at a particular time.

8. The computer-implemented method of claim 7, further including estimating a future ego-motion of the vehicle, wherein the future ego-motion of the vehicle is determined by an autonomous driving plan that is based on at least one of: an intended destination of the vehicle, a lane in which the vehicle is traveling, a status of a traffic signal, a traffic pattern, and a traffic regulation.

9. The computer-implemented method of claim 8, wherein decoding at least one future bounding box associated with the at least one traffic participant includes inputting the final fused hidden state and the future ego-motion of the vehicle to a future localization decoder to decode the at least one future bounding box associated with the at least one traffic participant.

10. A system for egocentric-vision based future vehicle localization, comprising: a memory storing instructions when executed by a processor cause the processor to: receive at least one egocentric first person view image of a surrounding environment of a vehicle; encode at least one past bounding box trajectory associated with at least one traffic participant that is captured within the at least one egocentric first person view image, wherein a final hidden state of the at least one past bounding box trajectory encoding is output; encode a dense optical flow of the egocentric first person view image associated with the at least one traffic participant, wherein a final hidden state of the dense optical flow encoding is output; decode at least one future bounding box associated with the at least one traffic participant based on a fusion of the final hidden state of the at least one past bounding box trajectory encoding and the final hidden state of the dense optical flow encoding; and control the vehicle to be autonomously driven based on the at least one future bounding box associated with the at least one traffic participant.

11. The system of claim 10, wherein receiving the at least one egocentric first person view image includes extracting at least one spatial-temporal feature that pertains to an object and classifying the object as the at least one traffic participant based on a comparison of pixel locations and scale of the object against at least one traffic participant model.

12. The system of claim 11, wherein encoding the at least one past bounding box trajectory includes computing at least one bounding box around the at least one traffic participant as classified, wherein at least one past trajectory is computed based on the at least one past bounding box.

13. The system of claim 12, wherein encoding the at least one past bounding box trajectory includes encoding a past location, position, and trajectory the at least one traffic participant based on a pixel location and scale as specified by pixel coordinates of the at least one traffic participant bounding box at a time together with a width and height in pixels of the at least one egocentric first person view image of the at least one traffic participant.

14. The system of claim 10, wherein encoding the dense optical flow of the egocentric first person view image includes evaluating pixel level information with respect to each of the pixels of past image frames to determine the dense optical flow of past image frames, wherein a pattern of an apparent motion change of the at least one traffic participant between two consecutive image frames is caused by the movement of the at least one traffic participant.

15. The system of claim 14, wherein encoding the dense optical flow of the egocentric first person view image includes completing region of interest pooling of optical flow fields and the past bounding box trajectory, wherein a region of interest may be expanded from a bounding box to extract features associated with the at least one traffic participant.

16. The system of claim 10, further including fusing the final hidden state of the at least one past bounding box trajectory encoding and the final hidden state of the dense optical flow encoding, wherein a final fused hidden state is outputted as hidden state vectors of gated recurrent unit models at a particular time.
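Claims 6 and 15 describe region of interest pooling of optical flow fields, with the region optionally expanded beyond the bounding box. A minimal sketch of that idea follows, assuming the dense flow is stored as a row-major grid of (u, v) vectors, boxes are given as (x1, y1, x2, y2) pixel corners, the expansion factor defaults to a hypothetical 1.2, and each region is average-pooled into a 2x2 grid of bins. The function names and parameters are illustrative only and are not taken from the patent; the output is a fixed-length feature vector per frame of the kind a flow encoder could consume.

```python
import math

def expand_box(box, scale, img_w, img_h):
    """Scale an (x1, y1, x2, y2) pixel box about its center and clip it to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return (max(0, int(cx - half_w)), max(0, int(cy - half_h)),
            min(img_w, math.ceil(cx + half_w)), min(img_h, math.ceil(cy + half_h)))

def roi_pool_flow(flow, box, bins=2, scale=1.2):
    """Average-pool a dense flow field flow[y][x] = (u, v) over a bins x bins grid
    laid on the expanded box; returns the [u, v] means per bin, row by row."""
    img_h, img_w = len(flow), len(flow[0])
    x1, y1, x2, y2 = expand_box(box, scale, img_w, img_h)
    features = []
    for by in range(bins):
        ys = y1 + (y2 - y1) * by // bins
        ye = max(ys + 1, y1 + (y2 - y1) * (by + 1) // bins)  # at least one row
        for bx in range(bins):
            xs = x1 + (x2 - x1) * bx // bins
            xe = max(xs + 1, x1 + (x2 - x1) * (bx + 1) // bins)  # at least one column
            cells = [flow[y][x] for y in range(ys, ye) for x in range(xs, xe)]
            features.append(sum(u for u, _ in cells) / len(cells))
            features.append(sum(v for _, v in cells) / len(cells))
    return features

# Hypothetical 10 x 20 flow field whose horizontal component grows left to right.
flow = [[(float(x), 0.0) for x in range(20)] for _ in range(10)]
print(roi_pool_flow(flow, (0, 0, 20, 10), scale=1.0))
# [4.5, 0.0, 14.5, 0.0, 4.5, 0.0, 14.5, 0.0]
```

A larger expansion factor pulls in more surrounding motion, which may help separate a participant's own motion from background motion at the cost of admitting more clutter; the value here is arbitrary.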
exterior to a vehicle by using sensors mounted on the vehicle · CPC title
by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition · CPC title
Training; Learning · CPC title
using feature-based methods, e.g. the tracking of corners or segments · CPC title
using trajectory prediction for other traffic participants · CPC title