Pedestrian re-identification methods and apparatuses, electronic devices, and storage media

US11301687B2 · US · B2

Patent metadata
Publication number: US-11301687-B2
Application number: US-201916726878-A
Country: US
Kind code: B2
Filing date: Dec 25, 2019
Priority date: Feb 12, 2018
Publication date: Apr 12, 2022
Grant date: Apr 12, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A pedestrian re-identification method includes: obtaining a target video containing a target pedestrian and at least one candidate video; encoding each target video segment in the target video and each candidate video segment in the at least one candidate segment separately; determining a score of similarity between the each target video segment and the each candidate video segment according to encoding results, the score of similarity being used for representing a degree of similarity between pedestrian features in the target video segment and the candidate video segment; and performing pedestrian re-identification on the at least one candidate video according to the score of similarity.

First claim

Opening claim text (preview).

The invention claimed is:

1. A pedestrian re-identification method, comprising: obtaining at least one candidate video and a target video containing a target pedestrian; encoding each target video segment in the target video and each candidate video segment in the at least one candidate segment respectively; determining, according to encoding results, a score of similarity between the each target video segment and the each candidate video segment, the score of similarity being used for representing a degree of similarity between pedestrian features in the target video segment and pedestrian features in the candidate video segment; and performing, according to the score of similarity, pedestrian re-identification on the at least one candidate video, wherein the determining, according to encoding results, a score of similarity between the each target video segment and the each candidate video segment comprises: performing a subtraction operation on an encoding result of the each target video segment and an encoding result of the each candidate video segment in sequence; performing, in each dimension, a square operation on a result of the subtraction operation; performing a full connection operation on a feature vector obtained by the square operation to obtain a two-dimensional feature vector; performing a normalization operation on the two-dimensional feature vector; and obtaining the score of similarity between the each target video segment and the each candidate video segment.

2. The method according to claim 1, wherein the encoding each target video segment in the target video and each candidate video segment in the at least one candidate segment respectively comprises: obtaining a first target feature vector and a second target feature vector of each target video frame in the each target video segment as well as an index feature vector of the each target video segment; obtaining a first candidate feature vector and a second candidate feature vector of each candidate video frame in the each candidate video segment; generating, according to the index feature vector, the first target feature vector, and the first candidate feature vector, an attention weight vector; and obtaining, according to the attention weight vector, the second target feature vector, and the second candidate feature vector, an encoding result of the each target video segment and an encoding result of the each candidate video segment.

3. The method according to claim 2, wherein the obtaining a first target feature vector and a second target feature vector of each target video frame in the each target video segment as well as an index feature vector of the each target video segment, and obtaining a first candidate feature vector and a second candidate feature vector of each candidate video frame in the each candidate video segment comprises: extracting an image feature vector of the each target video frame and an image feature vector of the each candidate video frame respectively; generating, according to the image feature vector of the each target video frame, the first target feature vector and the second target feature vector of the each target video frame as well as the index feature vector of the each target video frame; and generating, according to the image feature vector of the each candidate video frame, the first candidate feature vector and the second candidate feature vector of the each candidate video frame.

4. The method according to claim 2, wherein the attention weight vector comprises a target attention weight vector and a candidate attention weight vector, wherein the generating, according to the index feature vector, the first target feature vector, and the first candidate feature vector, an attention weight vector comprises: generating, according to the index feature vector and the first target feature vector, a target attention weight vector of the each target video frame; and generating, according to the index feature vector and the first candidate feature vector, a candidate attention weight vector of the each candidate video frame.

5. The method according to claim 4, wherein the generating, according to the index feature vector and the first target feature vector, a target attention weight vector of the each target video frame comprises: generating, according to the index feature vector and the first target feature vector of the each target video frame, a target heat map of the each target video frame; and performing normalization processing on the target heat map to obtain the target attention weight vector of the each target video frame; and/or the generating, according to the index feature vector and the first candidate feature vector, a candidate attention weight vector of the each candidate video frame comprises: generating, according to the index feature vector and the first candidate feature vector of the each candidate video frame, a candidate heat map of the each candidate video frame; and performing normalization processing on the candidate heat map to obtain the candidate attention weight vector of the each candidate video frame, wherein a heat map is formed by performing an inner product operation on a key feature vector of the each target video frame or the each candidate video frame and the index feature vector of the each target video segment, and the heat map is used for reflecting a correction between each feature in the target video frame or the candidate video frame and global information.

6. The method according to claim 2, wherein the obtaining, according to the attention weight vector, the second target feature vector, and the second candidate feature vector, an encoding result of the each target video segment and an encoding result of the each candidate video segment comprises: obtaining, according to the target attention weight vector and the second target feature vector of the target video frame, the encoding result of the each target video segment; and obtaining, according to the candidate attention weight vector and the second candidate feature vector of the candidate video frame, the encoding result of the each candidate video segment.

7. The method according to claim 6, wherein the obtaining, according to the target attention weight vector and the second target feature vector of the each target video frame, the encoding result of the each target video segment comprises: multiplying the target attention weight vector of the each target video frame by the second target feature vector of the each target video frame; adding, in time dimension, multiplication result of the each target video frame; and obtain the encoding result of the each target video segment; and/or the obtaining, according to the candidate attention weight vector and the second candidate feature vector of the each candidate video frame, the encoding result of the each candidate video segment comprises: multiplying the candidate attention weight vector of the each candidate video frame by the second candidate feature vector of the each candidate video frame; adding, in time dimension, multiplication result of the each candidate video frame; and obtaining the encoding result of the each candidate video segment.

8. The method according to claim 1, wherein the performing, according to the score of similarity, pedestrian re-identification on the at least one candidate video comprises: for each candidate video in the at least one candidate video, taking a sum of a preset proportion of top scores in scores of similarity between each candidate video segment of the candidate video and the each target video segment as a score of similarity of the candidate video; ranking the score of similarity of the each candidate video in a
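
The operations recited in claims 1, 5, 7, and 8 can be illustrated with a minimal NumPy sketch. It is not the patented implementation and makes several simplifying assumptions that do not come from the claims: each frame is reduced to one "first" (key) vector and one "second" (value) vector, so the attention weight is a single number per frame; the normalization steps are taken to be softmax; component 1 of the two-dimensional output is treated as the "same pedestrian" score; the fully connected weights are random stand-ins for trained parameters; and the index feature vector is drawn at random rather than derived from frame features. The function names, the camera labels, and the `proportion=0.2` default are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def encode_segment(keys, values, index_vec):
    """Attention-weighted encoding of one segment (claims 5 and 7, simplified).

    keys, values: (T, D) arrays, one key and one value vector per frame.
    index_vec:    (D,) index feature vector of the target segment.
    """
    heat = keys @ index_vec      # inner product per frame ("heat map"), shape (T,)
    weights = softmax(heat)      # normalization assumed to be softmax
    return weights @ values      # weight each frame's values, sum over time, shape (D,)


def similarity_score(target_enc, candidate_enc, fc_weight, fc_bias):
    """Subtract, square per dimension, fully connect to 2-D, normalize (claim 1)."""
    squared = (target_enc - candidate_enc) ** 2
    logits = fc_weight @ squared + fc_bias   # fc_weight: (2, D), fc_bias: (2,)
    probs = softmax(logits)
    return float(probs[1])       # assumption: index 1 means "same pedestrian"


def video_score(segment_scores, proportion=0.2):
    """Sum of the top `proportion` of segment-pair scores (claim 8)."""
    flat = np.sort(np.ravel(segment_scores))[::-1]        # descending order
    k = max(1, int(np.ceil(proportion * flat.size)))      # keep at least one score
    return float(flat[:k].sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D = 4, 8                                  # frames per segment, feature size

    def random_segment():
        # Stand-in for per-frame key/value features from a trained network.
        return rng.normal(size=(T, D)), rng.normal(size=(T, D))

    fc_w, fc_b = rng.normal(size=(2, D)), np.zeros(2)
    # Each target segment carries its own (here random) index feature vector.
    targets = [(random_segment(), rng.normal(size=D)) for _ in range(2)]
    candidates = {name: [random_segment() for _ in range(3)]
                  for name in ("cam_a", "cam_b")}

    ranking = []
    for name, segments in candidates.items():
        scores = []
        for (t_keys, t_vals), index_vec in targets:
            t_enc = encode_segment(t_keys, t_vals, index_vec)
            for c_keys, c_vals in segments:
                # The candidate is attended with the target segment's index vector.
                c_enc = encode_segment(c_keys, c_vals, index_vec)
                scores.append(similarity_score(t_enc, c_enc, fc_w, fc_b))
        ranking.append((video_score(scores), name))

    for score, name in sorted(ranking, reverse=True):
        print(f"{name}: {score:.3f}")
```

Because the candidate encoding is computed with the target segment's index vector, each candidate segment is re-encoded for every target segment it is compared against, which is why `encode_segment` is called inside the inner loop of the demo.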

Assignees

Beijing Sensetime Tech Development Co Ltd

Inventors

Classifications

  • G06V20/46 (Primary)

    Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title

  • Contour-based spatial representations, e.g. vector-coding · CPC title

  • Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69) · CPC title

  • G06V20/48 (Primary)

    Matching video sequences · CPC title

  • the region being a picture, frame or field · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11301687B2 cover?
A pedestrian re-identification method includes: obtaining a target video containing a target pedestrian and at least one candidate video; encoding each target video segment in the target video and each candidate video segment in the at least one candidate segment separately; determining a score of similarity between the each target video segment and the each candidate video segment according to…
Who is the assignee on this patent?
Beijing Sensetime Tech Development Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V20/46. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 12, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).