Target detection method and device

US2019347485A1 · US · A1

Patent metadata

Field                Value
Publication number   US-2019347485-A1
Application number   US-201716347626-A
Country              US
Kind code            A1
Filing date          Nov 7, 2017
Priority date        Nov 8, 2016
Publication date     Nov 14, 2019
Grant date           (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present application disclose a target detection method and device, and relate to the technical field of video processing. The method comprises: obtaining an image sequence to be detected from a video to be detected according to an image sequence determining algorithm based on video timing (S101); extracting a first CNN feature of the image sequence to be detected based on a pre-trained CNN model, and performing feature fusion on the first CNN feature based on a second CNN feature to obtain a first fused CNN feature of the image sequence to be detected (S102); inputting the first fused CNN feature into the first-level classifier, and obtaining first candidate target regions of the image sequence to be detected from an output of the first-level classifier (S103); determining a first input region of the second-level classifier based on the first candidate target regions (S104); obtaining a third CNN feature of the first input region based on the first fused CNN feature (S105); and inputting the third CNN feature into the second-level classifier, and obtaining a target detection result for the image sequence to be detected based on the output of the second-level classifier (S106).
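Step S101 is spelled out in claim 2 as taking sequences of a preset number of images, with a preset "image repeatability" (the number of images shared by two adjacent sequences). A minimal sketch of that windowing follows. The function name `split_into_sequences`, the parameter names `seq_len` and `overlap`, and the choice to drop trailing frames that do not fill a whole sequence are assumptions made for illustration, not details from the patent.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_sequences(frames: Sequence[T], seq_len: int, overlap: int) -> List[List[T]]:
    """Split frames (already in video order) into overlapping sequences.

    seq_len: preset number of images per sequence.
    overlap: number of images shared by two adjacent sequences
             (the "image repeatability" of claim 2).
    Assumption: trailing frames that cannot fill a full sequence are dropped.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    if not 0 <= overlap < seq_len:
        raise ValueError("overlap must satisfy 0 <= overlap < seq_len")

    step = seq_len - overlap  # how far the window advances each time
    sequences = []
    for start in range(0, len(frames) - seq_len + 1, step):
        sequences.append(list(frames[start:start + seq_len]))
    return sequences


if __name__ == "__main__":
    # Frame indices stand in for decoded video frames.
    for seq in split_into_sequences(list(range(10)), seq_len=4, overlap=2):
        print(seq)
```

With ten frames, a sequence length of 4, and an overlap of 2, each sequence starts two frames after the previous one, so adjacent sequences share exactly two frames.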

First claim

Opening claim text (preview).

1. A target detection method, comprising: obtaining, from a video to be detected, an image sequence to be detected according to an image sequence determining algorithm based on video timing; extracting a first CNN feature of the image sequence to be detected based on a pre-trained Convolutional Neural Network CNN model, and performing feature fusion on the first CNN feature based on a second CNN feature to obtain a first fused CNN feature of the image sequence to be detected, wherein, the second CNN feature is a CNN feature of a detected image sequence in the video to be detected, and the CNN model comprises a first-level classifier and a second-level classifier, wherein, the first-level classifier is a classifier obtained by training a CNN based on a second fused CNN feature of a sample image sequence and a labeled region in the sample image sequence where a target is located, the second-level classifier is a classifier obtained by training the CNN based on the second fused CNN feature, the labeled region, and an output of the first-level classifier, and the sample image sequence is an image sequence obtained from a sample video according to the image sequence determining algorithm; inputting the first fused CNN feature into the first-level classifier, and obtaining, from the output of the first-level classifier, first candidate target regions in the image sequence to be detected; determining a first input region of the second-level classifier based on the first candidate target regions; obtaining a third CNN feature of the first input region based on the first fused CNN feature; inputting the third CNN feature into the second-level classifier, and obtaining a target detection result for the image sequence to be detected based on an output of the second-level classifier.

2. The method of claim 1, wherein, the step of obtaining, from a video to be detected, an image sequence to be detected according to an image sequence determining algorithm based on video timing comprises: obtaining, from the video to be detected, an image sequence to be detected comprising a preset number of images based on a preset image repeatability according to the video timing, wherein, the image repeatability represents the number of repeated images common to two adjacent image sequences obtained from the video to be detected.

3. The method of claim 1, wherein, the step of performing feature fusion on the first CNN feature based on a second CNN feature to obtain a first fused CNN feature of the image sequence to be detected comprises: obtaining a third fused CNN feature of a first detected image sequence, wherein, the first detected image sequence is an image sequence that has been detected and is adjacent to the image sequence to be detected according to the video timing, and the third fused CNN feature is determined based on a CNN feature of an image sequence that has been detected before the first detected image sequence; performing feature fusion on the first CNN feature using the third fused CNN feature to obtain the first fused CNN feature of the image sequence to be detected.

4. The method of claim 3, wherein, the step of performing feature fusion on the first CNN feature using the third fused CNN feature to obtain the first fused CNN feature of the image sequence to be detected comprises: performing feature fusion on the first CNN feature and the third fused CNN feature to obtain the first fused CNN feature of the image sequence to be detected based on a pre-trained Recurrent Neural Network RNN model, wherein, the RNN model is obtained by training an RNN based on a fused CNN feature of a first sample image sequence and a CNN feature of a second sample sequence, and the first sample image sequence is a sample image sequence adjacent to and before the second sample image sequence according to the video timing.

5. The method of claim 1, wherein, the first-level classifier is obtained by: determining the labeled region in the sample image sequence; obtaining the second fused CNN feature; determining initial sample regions in the sample image sequence based on the labeled region, wherein, for each labeled region, there is at least one sample region in the initial sample regions in which a coincidence between the at least one sample region and the labeled region is larger than a preset threshold; performing a first training on the CNN using the second fused CNN feature, the labeled region and the initial sample regions to obtain the first-level classifier and a result of the first training.

6. The method of claim 5, wherein, the result of the first training comprises second candidate target regions; the second-level classifier is obtained by: determining a second input region for the second-level classifier based on the second candidate target regions; obtaining a fourth CNN feature of the second input region based on the second fused CNN feature; performing a second training on the CNN based on the fourth CNN feature and the labeled region to obtain the second-level classifier.

7. The method of claim 6, wherein, the result of the first training further comprises first probabilities of the second candidate target regions containing the target; the step of determining a second input region for the second-level classifier based on the second candidate target regions comprises: selecting, from the second candidate sample regions, the second input region for the second-level classifier based on a preset non-maximum suppression algorithm and the first probabilities.

8. The method of claim 7, wherein, the output of the first-level classifier comprises second probabilities of the first candidate target regions containing the target; the step of determining a first input region for the second-level classifier based on the first candidate target regions comprises: selecting, from the first candidate sample regions, the first input region for the second-level classifier based on the non-maximum suppression algorithm and the second probabilities.

9. A target detection device, comprising: a sequence obtaining module, configured for obtaining, from a video to be detected, an image sequence to be detected according to an image sequence determining algorithm based on video timing; a feature extracting module, configured for extracting a first CNN feature of the image sequence to be detected based on a pre-trained Convolutional Neural Network CNN model, wherein the CNN model comprises a first-level classifier and a second-level classifier, wherein, the first-level classifier is a classifier obtained by training a CNN based on a second fused CNN feature of a sample image sequence and a labeled region in the sample image sequence where a target is located, the second-level classifier is a classifier obtained by training the CNN based on the second fused CNN feature, the labeled region, and an output of the first-level classifier, and the sample image sequence is an image sequence obtained from a sample video according to the image sequence determining algorithm; a first feature obtaining module, configured for performing feature fusion on the first CNN feature based on a second CNN feature to obtain a first fused CNN feature of the image sequence to be detected, wherein, the second CNN feature is a CNN feature of a detected image sequence in the video to be detected; a region obtaining module, configured for inputting the first fused CNN feature into the first-level classifier, and obtaining, from the output of the first-level classifier, first candidate target regions in the image sequence to be detected; a region determining module, configured for determining a first input region of the second-level classifier based on the first can…
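Claims 7 and 8 select the second-level classifier's input regions from the candidate regions using a non-maximum suppression (NMS) algorithm and the per-region probabilities. The claims do not say which NMS variant is used, so the following is only a sketch of the common greedy form. The `(x1, y1, x2, y2)` box format, the use of intersection-over-union as the overlap measure, the 0.5 threshold, and the names `iou` and `select_input_regions` are all assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_input_regions(
    boxes: Sequence[Box],
    probs: Sequence[float],
    iou_threshold: float = 0.5,
    max_regions: Optional[int] = None,
) -> List[int]:
    """Greedy NMS: return indices of kept candidate regions, best first.

    A candidate is kept only if it does not overlap an already kept,
    higher-probability candidate by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
            if max_regions is not None and len(kept) >= max_regions:
                break
    return kept


if __name__ == "__main__":
    candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(select_input_regions(candidates, scores))
```

In the usage example the second candidate overlaps the first heavily and has a lower probability, so it is suppressed, while the third candidate does not overlap either and is kept. The same `iou` helper could also serve as one possible reading of the "coincidence" test in claim 5, although the claim does not define that measure.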

Classifications

  • of extracted features

  • Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69)

  • G06V10/82 (Primary): using neural networks

  • using classification, e.g. of video objects

  • Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019347485A1 cover?
Embodiments of the present application disclose a target detection method and device, and relate to the technical field of video processing. The method comprises: obtaining an image sequence to be detected from a video to be detected according to an image sequence determining algorithm based on video timing (S101), extracting a first CNN feature of the image sequence to be detected based on a p…
Who is the assignee on this patent?
Hangzhou Hikvision Digital Tec
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 14, 2019 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).