System and method for appearance search
US-2018157939-A1 · Jun 7, 2018 · US
US11301687B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11301687-B2 |
| Application number | US-201916726878-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 25, 2019 |
| Priority date | Feb 12, 2018 |
| Publication date | Apr 12, 2022 |
| Grant date | Apr 12, 2022 |
Official abstract text for this publication.
A pedestrian re-identification method includes: obtaining a target video containing a target pedestrian and at least one candidate video; encoding each target video segment in the target video and each candidate video segment in the at least one candidate segment separately; determining a score of similarity between the each target video segment and the each candidate video segment according to encoding results, the score of similarity being used for representing a degree of similarity between pedestrian features in the target video segment and the candidate video segment; and performing pedestrian re-identification on the at least one candidate video according to the score of similarity.
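The final step of the abstract, re-identification from segment-level similarity scores, can be illustrated with a small ranking sketch. It assumes the per-video aggregation described in claim 8 (summing a preset proportion of the top segment scores). The function names, the `top_fraction` parameter, the sample scores, and the best-first sort order are illustrative assumptions, not taken from the patent.

```python
import math


def video_score(segment_scores, top_fraction=0.5):
    """Sum the highest `top_fraction` of segment-pair scores for one candidate video.

    `top_fraction` stands in for the "preset proportion"; its value is assumed.
    """
    if not segment_scores:
        raise ValueError("segment_scores must not be empty")
    # Keep at least one score, even for very small fractions.
    k = max(1, math.ceil(top_fraction * len(segment_scores)))
    return sum(sorted(segment_scores, reverse=True)[:k])


def rank_candidates(scores_by_video, top_fraction=0.5):
    """Return candidate video ids ordered best first (ordering is an assumption)."""
    totals = {
        video_id: video_score(scores, top_fraction)
        for video_id, scores in scores_by_video.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical similarity scores between target segments and candidate segments.
scores_by_video = {
    "cam_a": [0.9, 0.8, 0.1, 0.2],
    "cam_b": [0.5, 0.5, 0.5, 0.5],
    "cam_c": [0.1, 0.2, 0.1, 0.3],
}
ranking = rank_candidates(scores_by_video, top_fraction=0.5)
print(ranking)
```

Summing only the top scores lets a few strongly matching segment pairs dominate, so a candidate video is not penalized for segments in which the pedestrian is occluded or absent.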
Opening claim text (preview).
The invention claimed is:
1. A pedestrian re-identification method, comprising: obtaining at least one candidate video and a target video containing a target pedestrian; encoding each target video segment in the target video and each candidate video segment in the at least one candidate segment respectively; determining, according to encoding results, a score of similarity between the each target video segment and the each candidate video segment, the score of similarity being used for representing a degree of similarity between pedestrian features in the target video segment and pedestrian features in the candidate video segment; and performing, according to the score of similarity, pedestrian re-identification on the at least one candidate video, wherein the determining, according to encoding results, a score of similarity between the each target video segment and the each candidate video segment comprises: performing a subtraction operation on an encoding result of the each target video segment and an encoding result of the each candidate video segment in sequence; performing, in each dimension, a square operation on a result of the subtraction operation; performing a full connection operation on a feature vector obtained by the square operation to obtain a two-dimensional feature vector; performing a normalization operation on the two-dimensional feature vector; and obtaining the score of similarity between the each target video segment and the each candidate video segment.
2. The method according to claim 1, wherein the encoding each target video segment in the target video and each candidate video segment in the at least one candidate segment respectively comprises: obtaining a first target feature vector and a second target feature vector of each target video frame in the each target video segment as well as an index feature vector of the each target video segment; obtaining a first candidate feature vector and a second candidate feature vector of each candidate video frame in the each candidate video segment; generating, according to the index feature vector, the first target feature vector, and the first candidate feature vector, an attention weight vector; and obtaining, according to the attention weight vector, the second target feature vector, and the second candidate feature vector, an encoding result of the each target video segment and an encoding result of the each candidate video segment.
3. The method according to claim 2, wherein the obtaining a first target feature vector and a second target feature vector of each target video frame in the each target video segment as well as an index feature vector of the each target video segment, and obtaining a first candidate feature vector and a second candidate feature vector of each candidate video frame in the each candidate video segment comprises: extracting an image feature vector of the each target video frame and an image feature vector of the each candidate video frame respectively; generating, according to the image feature vector of the each target video frame, the first target feature vector and the second target feature vector of the each target video frame as well as the index feature vector of the each target video frame; and generating, according to the image feature vector of the each candidate video frame, the first candidate feature vector and the second candidate feature vector of the each candidate video frame.
4. The method according to claim 2, wherein the attention weight vector comprises a target attention weight vector and a candidate attention weight vector, wherein the generating, according to the index feature vector, the first target feature vector, and the first candidate feature vector, an attention weight vector comprises: generating, according to the index feature vector and the first target feature vector, a target attention weight vector of the each target video frame; and generating, according to the index feature vector and the first candidate feature vector, a candidate attention weight vector of the each candidate video frame.
5. The method according to claim 4, wherein the generating, according to the index feature vector and the first target feature vector, a target attention weight vector of the each target video frame comprises: generating, according to the index feature vector and the first target feature vector of the each target video frame, a target heat map of the each target video frame; and performing normalization processing on the target heat map to obtain the target attention weight vector of the each target video frame; and/or the generating, according to the index feature vector and the first candidate feature vector, a candidate attention weight vector of the each candidate video frame comprises: generating, according to the index feature vector and the first candidate feature vector of the each candidate video frame, a candidate heat map of the each candidate video frame; and performing normalization processing on the candidate heat map to obtain the candidate attention weight vector of the each candidate video frame, wherein a heat map is formed by performing an inner product operation on a key feature vector of the each target video frame or the each candidate video frame and the index feature vector of the each target video segment, and the heat map is used for reflecting a correction between each feature in the target video frame or the candidate video frame and global information.
6. The method according to claim 2, wherein the obtaining, according to the attention weight vector, the second target feature vector, and the second candidate feature vector, an encoding result of the each target video segment and an encoding result of the each candidate video segment comprises: obtaining, according to the target attention weight vector and the second target feature vector of the target video frame, the encoding result of the each target video segment; and obtaining, according to the candidate attention weight vector and the second candidate feature vector of the candidate video frame, the encoding result of the each candidate video segment.
7. The method according to claim 6, wherein the obtaining, according to the target attention weight vector and the second target feature vector of the each target video frame, the encoding result of the each target video segment comprises: multiplying the target attention weight vector of the each target video frame by the second target feature vector of the each target video frame; adding, in time dimension, multiplication result of the each target video frame; and obtain the encoding result of the each target video segment; and/or the obtaining, according to the candidate attention weight vector and the second candidate feature vector of the each candidate video frame, the encoding result of the each candidate video segment comprises: multiplying the candidate attention weight vector of the each candidate video frame by the second candidate feature vector of the each candidate video frame; adding, in time dimension, multiplication result of the each candidate video frame; and obtaining the encoding result of the each candidate video segment.
8. The method according to claim 1, wherein the performing, according to the score of similarity, pedestrian re-identification on the at least one candidate video comprises: for each candidate video in the at least one candidate video, taking a sum of a preset proportion of top scores in scores of similarity between each candidate video segment of the candidate video and the each target video segment as a score of similarity of the candidate video; ranking the score of similarity of the each candidate video in a
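The encoding and scoring steps recited in claims 1 and 5 to 7 can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the patented implementation: each frame is reduced to a single feature vector, so it receives one scalar attention weight (the claims speak of a weight vector per frame); softmax stands in for both "normalization" steps; index 1 of the two-dimensional output is assumed to mean "same pedestrian"; and the fully connected weights are untrained placeholders. All function and variable names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax (assumed form of the 'normalization' steps)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def encode_segment(first_vecs, second_vecs, index_vec):
    """Attention-pool the per-frame 'second' feature vectors into one encoding.

    first_vecs:  per-frame 'first' (key) feature vectors
    second_vecs: per-frame 'second' feature vectors
    index_vec:   index feature vector of the target segment
    """
    heat_map = [dot(k, index_vec) for k in first_vecs]   # inner product per frame
    weights = softmax(heat_map)                          # attention weights
    dim = len(second_vecs[0])
    # Multiply each frame's vector by its weight, then add along the time axis.
    return [sum(w * v[d] for w, v in zip(weights, second_vecs)) for d in range(dim)]


def similarity(target_enc, candidate_enc, fc_weights, fc_bias):
    """Subtract, square per dimension, apply a 2-output linear layer, normalize."""
    diff_sq = [(t - c) ** 2 for t, c in zip(target_enc, candidate_enc)]
    logits = [dot(row, diff_sq) + b for row, b in zip(fc_weights, fc_bias)]
    return softmax(logits)[1]  # assumed: index 1 is the "same pedestrian" class


# Toy data: two frames, two-dimensional features, placeholder layer weights.
enc = encode_segment([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 4.0]], [0.0, 0.0])
fc_weights = [[1.0, 1.0], [-1.0, -1.0]]   # hypothetical, not learned
fc_bias = [0.0, 0.0]
same = similarity(enc, enc, fc_weights, fc_bias)
different = similarity(enc, [3.0, 2.0], fc_weights, fc_bias)
print(enc, same, different)
```

Both the target and candidate segments are pooled against the same index vector, which is taken from the target segment, so the candidate frames are weighted by how well they correlate with the target's global information.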
Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title
Contour-based spatial representations, e.g. vector-coding · CPC title
Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69) · CPC title
Matching video sequences · CPC title
the region being a picture, frame or field · CPC title