Making object-level predictions of the future state of a physical system
US-2020092565-A1 · Mar 19, 2020 · US
US11260872B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11260872-B2 |
| Application number | US-201816159194-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 12, 2018 |
| Priority date | Oct 12, 2018 |
| Publication date | Mar 1, 2022 |
| Grant date | Mar 1, 2022 |
Official abstract text for this publication.
A system and method for utilizing a temporal recurrent network for online action detection that include receiving image data that is based on at least one image captured by a vehicle camera system. The system and method also include analyzing the image data to determine a plurality of image frames and outputting at least one goal-oriented action as determined during a current image frame. The system and method further include controlling a vehicle to be autonomously driven based on a naturalistic driving behavior data set that includes the at least one goal-oriented action.
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method for utilizing a temporal recurrent network for online action detection, comprising: receiving image data that is based on at least one image captured by a vehicle camera system; analyzing the image data to determine a plurality of image frames, wherein the plurality of image frames include at least one past image frame, and a current image frame; completing feature representation of pseudo future information to output at least one predicted action based on sequentially running an encoder and a decoder of the temporal recurrent network with respect to spatial-temporal features included within the plurality of image frames; outputting at least one goal-oriented action as determined during the current image frame, wherein the at least one goal-oriented action is based on the at least one past image frame, the current image frame, and the at least one predicted action; and controlling a vehicle to be autonomously driven based on a naturalistic driving behavior data set that includes the at least one goal-oriented action, wherein the naturalistic driving behavior data set is stored and pre-trained with annotations associated with a plurality of goal-oriented actions.
2. The computer-implemented method of claim 1, wherein analyzing the image data to determine the plurality of image frames includes down sampling the image data, wherein the down sampled image data is converted into the plurality of image frames.
3. The computer-implemented method of claim 1, wherein analyzing the image data to determine the plurality of image frames includes performing spatial-temporal feature extraction that pertains to object recognition and scene recognition on the at least one past image frame and the current image frame.
4. The computer-implemented method of claim 3, wherein at least one feature vector is extracted from the at least one past image frame and the current image frame that is associated with at least one spatial-temporal feature within the at least one past image frame and the current image frame, wherein the at least one feature vector is classified from a target frame of the plurality of image frames into a predefined behavioral event that is determined from a list of predetermined driving behaviors stored upon the naturalistic driving behavior data set.
5. The computer-implemented method of claim 3, wherein outputting the at least one goal-oriented action includes decoding to output the at least one predicted action based on a feature representation that is predicted to occur at an immediate future point in time.
6. The computer-implemented method of claim 5, wherein a future representation of the at least one predicted action is obtained by average pooling hidden states based on the feature vectors associated with the spatial-temporal features extracted from the at least one past image frame and the current image frame, wherein the at least one predicted action is output based on the future representation of the at least one predicted action.
7. The computer-implemented method of claim 6, wherein outputting the at least one goal-oriented action includes the encoder of the temporal recurrent network concatenating at least one feature vector that is extracted for the at least one past image frame, the current image frame, and a future feature associated with the future representation of the predicted action.
8. The computer-implemented method of claim 7, wherein outputting the at least one goal-oriented action includes outputting at least one action determined during a current frame based on the concatenation completed by the encoder of the temporal recurrent network, wherein a driving scene is evaluated to determine at least one driver action that is conducted absent any external stimuli that is presented within a surrounding environment of the vehicle.
9. The computer-implemented method of claim 1, further including classifying at least one stimulus-driven action based on evaluating at least one behavioral event and the image data, wherein an external stimuli is determined to be a cause of the at least one behavioral event, wherein controlling the vehicle to be autonomously driven is based on the naturalistic driving behavior data set that includes the at least one stimulus-driven action.
10. A system for utilizing a temporal recurrent network for online action detection, comprising: a memory storing instructions when executed by a processor cause the processor to: receive image data that is based on at least one image captured by a vehicle camera system; analyze the image data to determine a plurality of image frames, wherein the plurality of image frames include at least one past image frame, and a current image frame; complete feature representation of pseudo future information to output at least one predicted action based on sequentially running an encoder and a decoder of the temporal recurrent network with respect to spatial-temporal features included within the plurality of image frames; output at least one goal-oriented action as determined during the current image frame, wherein the at least one goal-oriented action is based on the at least one past image frame, the current image frame, and at least one predicted action; and control a vehicle to be autonomously driven based on a naturalistic driving behavior data set that includes the at least one goal-oriented action, wherein the naturalistic driving behavior data set is stored and pre-trained with annotations associated with a plurality of goal-oriented actions.
11. The system of claim 10, wherein analyzing the image data to determine the plurality of image frames includes down sampling the image data, wherein the down sampled image data is converted into the plurality of image frames.
12. The system of claim 10, wherein analyzing the image data to determine the plurality of image frames includes performing spatial-temporal feature extraction that pertains to object recognition and scene recognition on the at least one past image frame and the current image frame.
13. The system of claim 12, wherein at least one feature vector is extracted from the at least one past image frame and the current image frame that is associated with at least one spatial-temporal feature within the at least one past image frame and the current image frame, wherein the at least one feature vector is classified from a target frame of the plurality of image frames into a predefined behavioral event that is determined from a list of predetermined driving behaviors stored upon the naturalistic driving behavior data set.
14. The system of claim 12, wherein outputting the at least one goal-oriented action includes decoding to output the at least one predicted action based on a feature representation that is predicted to occur at an immediate future point in time.
15. The system of claim 14, wherein the future representation of the at least one predicted action is obtained by average pooling hidden states based on the feature vectors associated with the spatial-temporal features extracted from the at least one past image frame and the current image frame, wherein the at least one predicted action is output based on the future representation of the at least one predicted action.
16. The system of claim 15, wherein outputting the at least one goal-oriented action includes the encoder of the temporal recurrent network concatenating at least one feature vector that is extracted for the at least one past image frame, the current image frame, and a future feature associated with the future representati
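
Illustrative sketch (not part of the patent text). Claims 5 to 7 describe a decoder that produces hidden states for the immediate future, average pooling of those states into a future representation, and an encoder that concatenates that future feature with the frame feature. The Python below is a minimal toy of that data flow only. The cell type, the scalar weights, the number of decoder steps, and every function name (`rnn_cell`, `average_pool`, `trn_step`, `run_online`) are assumptions made for illustration; the claim text shown here does not specify them, and nothing is trained.

```python
import math
from typing import List, Tuple

Vector = List[float]


def rnn_cell(x: Vector, h: Vector, w_x: float, w_h: float) -> Vector:
    """Toy recurrent cell (assumption): h' = tanh(w_h * h + w_x * mean(x))."""
    x_mean = sum(x) / len(x)
    return [math.tanh(w_h * h_i + w_x * x_mean) for h_i in h]


def average_pool(states: List[Vector]) -> Vector:
    """Element-wise mean over a list of equally sized hidden states."""
    n = len(states)
    return [sum(s[i] for s in states) / n for i in range(len(states[0]))]


def trn_step(frame_feature: Vector, hidden: Vector,
             decoder_steps: int = 3) -> Tuple[Vector, Vector]:
    """One online step: run the decoder, pool, concatenate, run the encoder."""
    # Decoder: roll forward from the current hidden state to obtain
    # hidden states standing in for "pseudo future" information.
    dec_h = hidden
    dec_in = [0.0] * len(hidden)
    future_states = []
    for _ in range(decoder_steps):
        dec_h = rnn_cell(dec_in, dec_h, w_x=0.5, w_h=0.9)  # arbitrary weights
        future_states.append(dec_h)
        dec_in = dec_h

    # Average pooling of the decoder hidden states gives the future feature.
    future_feature = average_pool(future_states)

    # Encoder: concatenate the frame feature with the future feature,
    # then update the hidden state that summarizes past and current frames.
    encoder_input = frame_feature + future_feature
    new_hidden = rnn_cell(encoder_input, hidden, w_x=0.7, w_h=0.8)
    return new_hidden, future_feature


def run_online(frame_features: List[Vector], hidden_size: int = 4) -> List[Vector]:
    """Process frames one at a time, as an online detector would."""
    hidden = [0.0] * hidden_size
    outputs = []
    for feature in frame_features:
        hidden, _ = trn_step(feature, hidden)
        outputs.append(hidden)
    return outputs
```

In a real system the hidden state returned at each step would feed a classifier over the predefined driving behaviors; that classifier is omitted here because the preview gives no detail about it.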
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title
Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title
using neural networks · CPC title
Learning methods · CPC title
using classification, e.g. of video objects · CPC title