Target tracking method and apparatus, medium, and device

US2021019627A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2021019627-A1
Application number  US-202017063997-A
Country             US
Kind code           A1
Filing date         Oct 6, 2020
Priority date       Sep 14, 2018
Publication date    Jan 21, 2021
Grant date          (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of this application disclose a target tracking method performed at an electronic device. The electronic device obtains a first video stream and detects candidate regions within a current video frame in the first video stream. The electronic device then extracts, from the candidate regions, a deep feature corresponding to each candidate region and calculates a feature similarity for each candidate region and a deep feature of a target detected in a previous video frame. Finally, the electronic device determines, based on the feature similarity corresponding to the candidate region, that the target is detected in the current video frame. Target detection is performed in a range of video frames by using a target detection model, and target tracking is performed based on the deep feature, so that occurrence of cases such as a target tracking drift or loss can be effectively prevented, to ensure the accuracy of target tracking.
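The matching step described in the abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: the abstract does not say which similarity measure is used, so cosine similarity is an assumption, as are the names `cosine_similarity`, `match_target`, and the `min_similarity` cutoff. The deep features are stood in for by short hand-written vectors; in practice they would come from the feature extraction model.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction; treat as no match
    return dot / (norm_a * norm_b)


def match_target(
    target_feature: Sequence[float],
    candidate_features: Sequence[Sequence[float]],
    min_similarity: float = 0.5,  # hypothetical cutoff, not from the patent
) -> Optional[int]:
    """Return the index of the candidate region most similar to the target
    seen in the previous frame, or None if no candidate is similar enough."""
    scores = [cosine_similarity(target_feature, f) for f in candidate_features]
    if not scores:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= min_similarity else None


# Toy feature vectors standing in for the output of a feature extraction model.
previous_target = [0.9, 0.1, 0.3]
candidates = [
    [0.1, 0.8, 0.2],     # a different object
    [0.88, 0.12, 0.31],  # very close to the previous target
    [0.5, 0.5, 0.5],     # somewhat similar
]
best_index = match_target(previous_target, candidates)
```

Picking the single highest-scoring candidate corresponds to the simplest selection rule in the claims; the other claimed variants additionally compare motion direction or physical location among candidates that pass a threshold.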

First claim

Opening claim text (preview).

What is claimed is:

1. A target tracking method, performed by an electronic device having a processor and memory connected to the processor and storing processor-executable instructions, the method comprising: obtaining, by the electronic device, a first video stream; detecting, by the electronic device, according to a target detection model and within a current video frame in the first video stream, candidate regions in the current video frame; extracting, by the electronic device, according to a feature extraction model and from the candidate regions, deep features corresponding to the candidate regions, the feature extraction model being an end-to-end neural network model that uses an image as an input and uses a deep feature of a movable body in the image as an output; calculating, by the electronic device, feature similarities between the deep features corresponding to the candidate regions in the current video frame and deep features corresponding to a target detected in a previous video frame; and determining, by the electronic device, that the target is detected in the current video frame according to the feature similarities.

2. The method according to claim 1, wherein the determining, by the electronic device that the target is detected in the current video frame according to the feature similarities comprises: selecting a candidate region having a highest feature similarity among the feature similarities corresponding to the candidate regions respectively, as a target region of the current video frame; and determining that the target is detected within the target region in the current video frame.

3. The method according to claim 1, wherein the determining, by the electronic device that the target is detected in the current video frame according to the feature similarities comprises: selecting a plurality of candidate regions whose associated feature similarities exceed a threshold according to the feature similarities corresponding to the candidate regions respectively, each of the plurality of candidate regions having an associated move direction; selecting, from the plurality of candidate regions, a candidate region whose associated motion direction most matches a motion direction of the target detected in the previous video frame, as a target region in the current video frame; and determining that the target is detected within the target region in the current video frame.

4. The method according to claim 1, wherein the determining, by the electronic device that the target is detected in the current video frame according to the feature similarities comprises: selecting a plurality of candidate regions whose associated feature similarities exceed a threshold according to the feature similarities corresponding to the candidate regions respectively, each of the plurality of candidate regions having an associated distance from a physical location of the target detected in the previous video frame; selecting, from the plurality of candidate regions, a candidate region having a smallest distance from the physical location of the target detected in the previous video frame, as a target region in the current video frame; and determining that the target is detected within the target region in the current video frame.

5. The method according to claim 1, wherein the determining, by the electronic device that the target is detected in the current video frame according to the feature similarities comprises: selecting a plurality of candidate regions whose associated feature similarities exceed a threshold according to the feature similarities corresponding to the candidate regions respectively, each of the plurality of candidate regions having an associated physical location and a move direction; selecting, from the plurality of candidate regions, a candidate region whose associated physical location and move direction most match a physical location and a motion direction of the target detected in the previous video frame respectively, as a target region of the current video frame; and determining that the target is detected within the target region in the current video frame.

6. The method according to claim 1, wherein the first video stream is shot by a first camera, and the method further comprises: after the tracking target disappears from the first camera: selecting, by the electronic device, a second camera from cameras adjacent to the first camera according to a region in which the target is last detected in the first video stream, the second camera configured to perform cross-screen target tracking; and obtaining, by the electronic device, a second video stream captured by the second camera, and tracking the target in the second video stream.

7. The method according to claim 1, wherein a physical location of a candidate region within the current video frame is determined by a location coordinate mapping model.

8. The method according to claim 7, wherein the location coordinate mapping model is generated by: calculating, by using a perspective transformation formula, a coordinate mapping matrix according to location coordinates of at least four location points on a preset calibration image and physical location coordinates of the at least four location points on a physical world ground; and generating the location coordinate mapping model according to the coordinate mapping matrix.

9. The method according to claim 1, further comprising: drawing, by the electronic device, a motion trajectory of the target on a map according to a physical location of the target detected in the first video stream.

10. The method according to claim 1, wherein the target detection model comprises: a basic network and an auxiliary network, wherein the basic network uses a lightweight convolutional neural network mobilenet, and the auxiliary network uses a detection layer formed by a convolution kernel, an input of the auxiliary network being a feature map outputted by different convolutional layers of the basic network.

11. The method according to claim 1, further comprising: obtaining, by the electronic device, an image sample, the image sample comprising a human body image and an image tag; and constructing, by the electronic device, a deep convolutional neural network initial model, and training the deep convolutional neural network initial model by using the image sample, to obtain a deep convolutional neural network model meeting a training ending condition as the feature extraction model.

12. The method according to claim 1, wherein the current video frame is a video frame located after a first video frame in the first video stream, and the deep feature of the target detected in the previous video frame is a deep feature of a target that is tracked in the first video frame and that is extracted by using the feature extraction model.

13. An electronic device, comprising a processor, memory connected to the processor, and processor-executable instructions stored in the memory that, when executed by the processor, cause the electronic device to perform a plurality of operations including: obtaining a first video stream; detecting according to a target detection model and within a current video frame in the first video stream, candidate regions in the current video frame; extracting according to a feature extraction model and from the candidate regions, deep features corresponding to the candidate regions, the feature extraction model being an end-to-end neural network model that uses an image as an input and uses a deep feature of a movable body in the image as an output; calculating f
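Claim 8 builds the location coordinate mapping model from a perspective transformation fitted to calibration points. A minimal sketch of that idea follows, assuming exactly four point pairs and the standard 3x3 homography form with the bottom-right entry fixed to 1. The claim allows more than four points, which would call for a least-squares fit instead. All function names and the calibration coordinates are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("calibration points are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def coordinate_mapping_matrix(
    image_pts: Sequence[Point], ground_pts: Sequence[Point]
) -> List[List[float]]:
    """Fit the 3x3 perspective matrix that maps image points to ground points."""
    if len(image_pts) != 4 or len(ground_pts) != 4:
        raise ValueError("this sketch expects exactly four point pairs")
    rows: List[List[float]] = []
    rhs: List[float] = []
    for (x, y), (u, v) in zip(image_pts, ground_pts):
        # Two linear equations per point pair, eight unknowns in total.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear_system(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def image_to_ground(matrix: List[List[float]], point: Point) -> Point:
    """Map a pixel coordinate to a ground coordinate with the fitted matrix."""
    x, y = point
    w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    u = (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]) / w
    v = (matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]) / w
    return (u, v)


# Hypothetical calibration: a 4 m x 10 m ground rectangle that appears as a
# trapezoid in the image (pixel coordinates).
image_points = [(100.0, 400.0), (540.0, 400.0), (420.0, 200.0), (220.0, 200.0)]
ground_points = [(0.0, 0.0), (4.0, 0.0), (4.0, 10.0), (0.0, 10.0)]
mapping = coordinate_mapping_matrix(image_points, ground_points)
```

With a matrix like `mapping`, the bottom-center pixel of a candidate region could be passed to `image_to_ground` to estimate where the target stands on the ground, which is the kind of physical location that claims 4, 5, and 9 rely on.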

Assignees

  • Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

  • Matching criteria, e.g. proximity measures · CPC title

  • G06T7/246 · Primary

    using feature-based methods, e.g. the tracking of corners or segments · CPC title

  • G06V20/46 · Primary

    Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021019627A1 cover?
Embodiments of this application disclose a target tracking method performed at an electronic device. The electronic device obtains a first video stream and detects candidate regions within a current video frame in the first video stream. The electronic device then extracts, from the candidate regions, a deep feature corresponding to each candidate region and calculates a feature similarity for …
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06T7/246. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 21, 2021 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).