System and method for utilizing a temporal recurrent network for online action detection

US11260872B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11260872-B2
Application number  US-201816159194-A
Country             US
Kind code           B2
Filing date         Oct 12, 2018
Priority date       Oct 12, 2018
Publication date    Mar 1, 2022
Grant date          Mar 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for utilizing a temporal recurrent network for online action detection that include receiving image data that is based on at least one image captured by a vehicle camera system. The system and method also include analyzing the image data to determine a plurality of image frames and outputting at least one goal-oriented action as determined during a current image frame. The system and method further include controlling a vehicle to be autonomously driven based on a naturalistic driving behavior data set that includes the at least one goal-oriented action.
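
As a rough illustration of "analyzing the image data to determine a plurality of image frames", the sketch below thins a frame stream and splits it into past frames and the current frame. It is only a sketch under stated assumptions: claim 2 mentions down sampling without saying whether it is temporal or spatial, so temporal striding is assumed here, and the names `downsample`, `split_past_current`, and `stride` are hypothetical and do not come from the patent.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def downsample(frames: Sequence[Frame], stride: int) -> List[Frame]:
    """Keep every `stride`-th frame, counting back from the newest frame.

    Assumption: down sampling is temporal. Counting from the end guarantees
    that the most recent (current) frame is never dropped.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    kept = list(frames[::-1][::stride])  # newest first, then thin out
    kept.reverse()                       # restore chronological order
    return kept


def split_past_current(frames: Sequence[Frame]) -> Tuple[List[Frame], Frame]:
    """Split a chronological window into (past frames, current frame)."""
    if not frames:
        raise ValueError("at least one frame is required")
    return list(frames[:-1]), frames[-1]


# Hypothetical usage: strings stand in for decoded camera images.
raw = [f"frame_{i}" for i in range(10)]
sampled = downsample(raw, stride=3)
past, current = split_past_current(sampled)
```

With ten input frames and a stride of 3, the window keeps `frame_0`, `frame_3`, `frame_6`, and `frame_9`, and `frame_9` is treated as the current frame.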

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for utilizing a temporal recurrent network for online action detection, comprising: receiving image data that is based on at least one image captured by a vehicle camera system; analyzing the image data to determine a plurality of image frames, wherein the plurality of image frames include at least one past image frame, and a current image frame; completing feature representation of pseudo future information to output at least one predicted action based on sequentially running an encoder and a decoder of the temporal recurrent network with respect to spatial-temporal features included within the plurality of image frames; outputting at least one goal-oriented action as determined during the current image frame, wherein the at least one goal-oriented action is based on the at least one past image frame, the current image frame, and the at least one predicted action; and controlling a vehicle to be autonomously driven based on a naturalistic driving behavior data set that includes the at least one goal-oriented action, wherein the naturalistic driving behavior data set is stored and pre-trained with annotations associated with a plurality of goal-oriented actions.

2. The computer-implemented method of claim 1, wherein analyzing the image data to determine the plurality of image frames includes down sampling the image data, wherein the down sampled image data is converted into the plurality of image frames.

3. The computer-implemented method of claim 1, wherein analyzing the image data to determine the plurality of image frames includes performing spatial-temporal feature extraction that pertains to object recognition and scene recognition on the at least one past image frame and the current image frame.

4. The computer-implemented method of claim 3, wherein at least one feature vector is extracted from the at least one past image frame and the current image frame that is associated with at least one spatial-temporal feature within the at least one past image frame and the current image frame, wherein the at least one feature vector is classified from a target frame of the plurality of image frames into a predefined behavioral event that is determined from a list of predetermined driving behaviors stored upon the naturalistic driving behavior data set.

5. The computer-implemented method of claim 3, wherein outputting the at least one goal-oriented action includes decoding to output the at least one predicted action based on a feature representation that is predicted to occur at an immediate future point in time.

6. The computer-implemented method of claim 5, wherein a future representation of the at least one predicted action is obtained by average pooling hidden states based on the feature vectors associated with the spatial-temporal features extracted from the at least one past image frame and the current image frame, wherein the at least one predicted action is output based on the future representation of the at least one predicted action.

7. The computer-implemented method of claim 6, wherein outputting the at least one goal-oriented action includes the encoder of the temporal recurrent network concatenating at least one feature vector that is extracted for the at least one past image frame, the current image frame, and a future feature associated with the future representation of the predicted action.

8. The computer-implemented method of claim 7, wherein outputting the at least one goal-oriented action includes outputting at least one action determined during a current frame based on the concatenation completed by the encoder of the temporal recurrent network, wherein a driving scene is evaluated to determine at least one driver action that is conducted absent any external stimuli that is presented within a surrounding environment of the vehicle.

9. The computer-implemented method of claim 1, further including classifying at least one stimulus-driven action based on evaluating at least one behavioral event and the image data, wherein an external stimuli is determined to be a cause of the at least one behavioral event, wherein controlling the vehicle to be autonomously driven is based on the naturalistic driving behavior data set that includes the at least one stimulus-driven action.

10. A system for utilizing a temporal recurrent network for online action detection, comprising: a memory storing instructions when executed by a processor cause the processor to: receive image data that is based on at least one image captured by a vehicle camera system; analyze the image data to determine a plurality of image frames, wherein the plurality of image frames include at least one past image frame, and a current image frame; complete feature representation of pseudo future information to output at least one predicted action based on sequentially running an encoder and a decoder of the temporal recurrent network with respect to spatial-temporal features included within the plurality of image frames; output at least one goal-oriented action as determined during the current image frame, wherein the at least one goal-oriented action is based on the at least one past image frame, the current image frame, and at least one predicted action; and control a vehicle to be autonomously driven based on a naturalistic driving behavior data set that includes the at least one goal-oriented action, wherein the naturalistic driving behavior data set is stored and pre-trained with annotations associated with a plurality of goal-oriented actions.

11. The system of claim 10, wherein analyzing the image data to determine the plurality of image frames includes down sampling the image data, wherein the down sampled image data is converted into the plurality of image frames.

12. The system of claim 10, wherein analyzing the image data to determine the plurality of image frames includes performing spatial-temporal feature extraction that pertains to object recognition and scene recognition on the at least one past image frame and the current image frame.

13. The system of claim 12, wherein at least one feature vector is extracted from the at least one past image frame and the current image frame that is associated with at least one spatial-temporal feature within the at least one past image frame and the current image frame, wherein the at least one feature vector is classified from a target frame of the plurality of image frames into a predefined behavioral event that is determined from a list of predetermined driving behaviors stored upon the naturalistic driving behavior data set.

14. The system of claim 12, wherein outputting the at least one goal-oriented action includes decoding to output the at least one predicted action based on a feature representation that is predicted to occur at an immediate future point in time.

15. The system of claim 14, wherein the future representation of the at least one predicted action is obtained by average pooling hidden states based on the feature vectors associated with the spatial-temporal features extracted from the at least one past image frame and the current image frame, wherein the at least one predicted action is output based on the future representation of the at least one predicted action.

16. The system of claim 15, wherein outputting the at least one goal-oriented action includes the encoder of the temporal recurrent network concatenating at least one feature vector that is extracted for the at least one past image frame, the current image frame, and a future feature associated with the future representati…
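
The encoder/decoder flow described in claims 1 and 5 to 7 can be pictured with the toy sketch below: a decoder rolls the previous hidden state forward to produce predicted actions, its hidden states are average pooled into a "future feature", and an encoder concatenates that with the current frame's feature vector before choosing the current action. This is an illustrative assumption, not the patented implementation: the plain tanh recurrence, the random untrained weights, and every name (`ToyTRNCell`, `w_dec`, `future_steps`, and so on) are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def tanh_vec(v: Vector) -> Vector:
    return [math.tanh(x) for x in v]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Element-wise average of equally sized vectors (average pooling)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def argmax(v: Vector) -> int:
    return max(range(len(v)), key=v.__getitem__)


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyTRNCell:
    """Hypothetical single-step cell; weights are random, not trained."""

    def __init__(self, feat_dim: int, hidden_dim: int, num_actions: int,
                 future_steps: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.future_steps = future_steps
        self.w_dec = rand_matrix(hidden_dim, hidden_dim, rng)      # decoder recurrence
        self.w_dec_out = rand_matrix(num_actions, hidden_dim, rng)  # predicted-action scores
        self.w_in = rand_matrix(hidden_dim, feat_dim + hidden_dim, rng)
        self.w_rec = rand_matrix(hidden_dim, hidden_dim, rng)      # encoder recurrence
        self.w_out = rand_matrix(num_actions, hidden_dim, rng)     # current-action scores

    def step(self, feature: Vector, hidden: Vector) -> Tuple[int, List[int], Vector]:
        # Decoder: roll the previous hidden state forward to imagine the future.
        dec_hidden, dec_states, predicted = hidden, [], []
        for _ in range(self.future_steps):
            dec_hidden = tanh_vec(matvec(self.w_dec, dec_hidden))
            dec_states.append(dec_hidden)
            predicted.append(argmax(matvec(self.w_dec_out, dec_hidden)))
        # Average pool the decoder hidden states into one future feature.
        future = mean_pool(dec_states)
        # Encoder: concatenate current feature with the future feature, then update.
        enc_in = feature + future
        pre = [a + b for a, b in
               zip(matvec(self.w_in, enc_in), matvec(self.w_rec, hidden))]
        new_hidden = tanh_vec(pre)
        return argmax(matvec(self.w_out, new_hidden)), predicted, new_hidden


# Hypothetical usage: three frames of 4-dimensional features, oldest first.
cell = ToyTRNCell(feat_dim=4, hidden_dim=3, num_actions=5, future_steps=2)
hidden = [0.0, 0.0, 0.0]
for feature in ([0.1, 0.2, 0.3, 0.4], [0.2, 0.1, 0.0, 0.5], [0.9, 0.8, 0.1, 0.3]):
    action, predicted, hidden = cell.step(feature, hidden)
```

After the loop, `action` is the label index chosen at the current frame and `predicted` holds the decoder's guesses for the next two steps. In a trained system the action indices would map to annotated goal-oriented actions; here they are arbitrary because the weights are random.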

Assignees

  • Honda Motor Co Ltd

Inventors

Classifications

  • Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title

  • Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title

  • using neural networks · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • using classification, e.g. of video objects · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11260872B2 cover?
A system and method for utilizing a temporal recurrent network for online action detection that include receiving image data that is based on at least one image captured by a vehicle camera system. The system and method also include analyzing the image data to determine a plurality of image frames and outputting at least one goal-oriented action as determined during a current image frame. The s…
Who is the assignee on this patent?
Honda Motor Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 1, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).