US11625646B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11625646-B2 |
| Application number | US-202016841227-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 6, 2020 |
| Priority date | Apr 6, 2020 |
| Publication date | Apr 11, 2023 |
| Grant date | Apr 11, 2023 |
Official abstract text for this publication.
A method, processing system and processor-readable medium for classifying human behavior based on a sequence of frames of a digital video. A 2D convolutional neural network is used to identify key points on a human body, such as human body joints, visible within each frame. An encoded representation of the key points is created for each video frame. The sequence of encoded representations corresponding to the sequence of frames is processed by a 3D CNN trained to identify human behaviors based on key point positions varying over time.
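To make the last step of the abstract concrete, the following is a minimal sketch of how per-frame encoded representations might be stacked into a single volume before being handed to a 3D CNN. The abstract does not specify a memory layout; the channel-first arrangement `[channel][time][row][col]`, the `EncodedFrame` type, and the helper names `stack_encoded_frames` and `volume_shape` are all assumptions made for illustration, and no actual network is included.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
# Hypothetical type: one frame's encoded representation as (X matrix, Y matrix).
EncodedFrame = Tuple[Matrix, Matrix]


def stack_encoded_frames(frames: Sequence[EncodedFrame]) -> List[List[Matrix]]:
    """Arrange per-frame (X, Y) matrices as [channel][time][row][col].

    Channel 0 holds the X matrices and channel 1 the Y matrices, in frame
    order. This layout is an assumption, not taken from the patent text.
    """
    if not frames:
        raise ValueError("expected at least one encoded frame")
    height = len(frames[0][0])
    width = len(frames[0][0][0])
    volume: List[List[Matrix]] = [[], []]
    for x_matrix, y_matrix in frames:
        for channel, matrix in enumerate((x_matrix, y_matrix)):
            # Every frame must share one matrix shape to form a regular volume.
            if len(matrix) != height or any(len(row) != width for row in matrix):
                raise ValueError("all encoded frames must share one shape")
            volume[channel].append([list(row) for row in matrix])
    return volume


def volume_shape(volume: List[List[Matrix]]) -> Tuple[int, int, int, int]:
    """Return (channels, frames, rows, cols) of a stacked volume."""
    return (len(volume), len(volume[0]), len(volume[0][0]), len(volume[0][0][0]))


# Two frames, each encoded as a 1x2 X matrix and a 1x2 Y matrix.
example_frames = [
    ([[1.0, 2.0]], [[3.0, 4.0]]),
    ([[5.0, 6.0]], [[7.0, 8.0]]),
]
example_volume = stack_encoded_frames(example_frames)
```

Stacking along a time axis is what lets a 3D convolution see how key point positions change between frames, rather than only their arrangement within one frame.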
Opening claim text (preview).
The invention claimed is:
1. A method, carried out by a processor executing computer program instructions, comprising: receiving at least one key point position set for a frame of a sequence of frames, the at least one key point position set including a key point position for each key point of a human body detected in the frame, each key point position corresponding to a location of a joint of the human body; generating an encoded representation for each key point position set of the at least one key point position set for the frame, each encoded representation comprising: an X matrix having a plurality of X pixel coordinates for the plurality of key point positions in the key point position set, a first X pixel coordinate and second X pixel coordinate being positioned within the matrix relative to each other based on a proximity relationship or movement relationship between a first joint of the human body and a second joint of the human body corresponding to the first X pixel coordinate and second X pixel coordinate respectively; and a Y matrix having a plurality of Y pixel coordinates for the plurality of key point positions in the key point position set, a first Y pixel coordinate and second Y pixel coordinate being positioned within the matrix relative to each other based on a proximity relationship or movement relationship between a first joint of the human body and a second joint of the human body corresponding to the first Y pixel coordinate and second Y pixel coordinate respectively; and providing the encoded representation for each of the at least one key point position set for the frame to a human behaviour classifier that includes a machine learned model that is configured to identify a behaviour of the human body based on the encoded representation for each key point position set and output the identified behavior of the human body.
2. The method of claim 1, further comprising: receiving a plurality of key point position sets, each key point position set corresponding to one frame in the sequence of frames; and generating an encoded representation for each key point position set of the plurality of key point position sets; and providing the encoded representation to the human behaviour classifier that includes the machine learned model that is configured to identify a human behaviour based on the plurality of encoded representations and output the identified behavior of the human body.
3. The method of claim 2, further comprising: receiving the sequence of frames; and processing each respective frame in the sequence of frames to generate the key point position set corresponding to the respective frame.
4. The method of claim 3, wherein the key point position set is generated using a key points identifier configured to receive a bounding box for the human body comprising one or more pixel values of a plurality of pixels of the respective frame, process the bounding box to identify key points within the bounding box and generate a key point position for each key point, and generate the key point position set that includes the key point position for each key point identified in the frame.
5. The method of claim 1, wherein each encoded representation further comprises: a Z matrix having a plurality of Z depth coordinates for the plurality of key point positions in the key point position set, a first Z depth coordinate and second Z coordinate being positioned within the matrix relative to each other based on a proximity relationship or movement relationship between a first joint of the human body and a second joint of the human body corresponding to the first Z coordinate and second Z coordinate respectively.
6. A processing system, comprising: a processor; and a memory having stored thereon executable instructions that, when executed by the processor, cause the device to: receive at least one key point position set for a frame of a sequence of frames, the at least one key point position set including a key point position for each key point of a human body detected in the frame, each key point position corresponding to a location of the key point on the human body; generate an encoded representation for each key point position set of the at least one key point position set for the frame, each encoded representation comprising: an X matrix having a plurality of X pixel coordinates for the plurality of key point positions in the key point position set, a first X pixel coordinate and second X pixel coordinate being positioned within the matrix relative to each other based on a proximity relationship or movement relationship between a first joint of the human body and a second joint of the human body corresponding to the first X pixel coordinate and second X pixel coordinate respectively; and a Y matrix having a plurality of Y pixel coordinates for the plurality of key point positions in the key point position set, a first Y pixel coordinate and second Y pixel coordinate being positioned within the matrix relative to each other based on a proximity relationship or movement relationship between a first joint of the human body and a second joint of the human body corresponding to the first Y pixel coordinate and second Y pixel coordinate respectively; and provide the encoded representation for each of the at least one key point position set for the frame to a human behaviour classifier that includes a machine learned model that is configured to identify a behaviour of the human body based on the encoded representation for each key point position set and output the identified behavior of the human body.
7. The processing system of claim 6, wherein the executable instructions, when executed by the processor, further cause the device to: receive a plurality of key point position sets, each key point position set corresponding to one frame in the sequence of frames; and generate an encoded representation for each key point position set of the plurality of key point position sets; and provide the encoded representation to the human behaviour classifier that includes the machine learned model that is configured to identify a human behaviour based on the plurality of encoded representations and output the identified behavior of the human body.
8. The processing system of claim 7, wherein the executable instructions, when executed by the processor, further cause the device to: receive the sequence of frames; and for each frame of the sequence of frames, generate the key point position set corresponding to the frame.
9. The processing system of claim 6, wherein the encoded representation is a matrix representation and wherein the machine learned model is a matrix machine learned model, and wherein each key point position corresponds to a joint of the human body.
10. The processing system of claim 6, wherein each encoded representation further comprises: a Z matrix having a plurality of Z depth coordinates for the plurality of key point positions in the key point position set, a first Z depth coordinate and second Z coordinate being positioned within the matrix relative to each other based on a proximity relationship or movement relationship between a first joint of the human body and a second joint of the human body corresponding to the first Z coordinate and second Z coordinate respectively.
11. A non-transitory processor-readable medium containing instructions which, when executed by a processor of a processing system cause the processing system to: receive at least one key point position set for a frame of a sequence of frames, the at least one key point position set including a key point position for each key point of a human body detected in the frame, each key point positio
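The X and Y matrices recited in claims 1 and 6 can be illustrated with a small sketch. The claims only require that coordinates be positioned relative to each other based on a proximity or movement relationship between joints; the specific 2x6 grid in `JOINT_LAYOUT`, the joint names, the `MISSING` placeholder for undetected joints, and the function name `encode_key_points` are hypothetical choices for this example, not the arrangement used in the patent.

```python
from typing import Dict, List, Tuple

KeyPoints = Dict[str, Tuple[float, float]]  # joint name -> (x, y) in pixels
Matrix = List[List[float]]

# Hypothetical layout: joints that are neighbours on the body sit in
# neighbouring cells (wrist next to elbow, elbow next to shoulder, ...).
JOINT_LAYOUT: List[List[str]] = [
    ["left_wrist", "left_elbow", "left_shoulder",
     "right_shoulder", "right_elbow", "right_wrist"],
    ["left_ankle", "left_knee", "left_hip",
     "right_hip", "right_knee", "right_ankle"],
]

MISSING = 0.0  # assumed fill value for joints not detected in the frame


def encode_key_points(key_points: KeyPoints) -> Tuple[Matrix, Matrix]:
    """Return (X matrix, Y matrix) for one key point position set."""
    x_matrix: Matrix = []
    y_matrix: Matrix = []
    for layout_row in JOINT_LAYOUT:
        x_row: List[float] = []
        y_row: List[float] = []
        for joint in layout_row:
            x, y = key_points.get(joint, (MISSING, MISSING))
            x_row.append(float(x))
            y_row.append(float(y))
        x_matrix.append(x_row)
        y_matrix.append(y_row)
    return x_matrix, y_matrix


# One frame in which only part of the left arm and one shoulder are visible.
frame_key_points: KeyPoints = {
    "left_wrist": (105, 175),
    "left_elbow": (110, 130),
    "left_shoulder": (120, 80),
    "right_shoulder": (180, 82),
}
x_matrix, y_matrix = encode_key_points(frame_key_points)
```

Because the cell positions are fixed by the layout rather than by the detected coordinates, the same joint always lands in the same cell across frames, so a convolutional filter covering adjacent cells sees physically related joints together.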
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title
using neural networks · CPC title
Extraction of image or video features · CPC title