Enhanced siamese trackers

US2018129934A1 · US · A1

Patent metadata
Publication number: US-2018129934-A1
Application number: US-201715621741-A
Country: US
Kind code: A1
Filing date: Jun 13, 2017
Priority date: Nov 7, 2016
Publication date: May 10, 2018
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one configuration, a visual object tracking apparatus is provided that receives a position of an object in a first frame of a video, and determines a current position of the object in subsequent frames of the video using a Siamese neural network. To facilitate determining the current position of the object, the apparatus may adjust a spatial resolution of an image, adjust a size of a probe region, and/or adjust a scale of a plurality of sampled images. In one configuration, a visual object tracking using a Siamese neural network is provided. The apparatus feeds outputs from a plurality of subnetworks of the Siamese neural network to a comparison layer. In addition, the apparatus compares, at the comparison layer, inputs from the plurality of subnetworks to generate a comparison result. Further, the apparatus combines comparison results based on weights to obtain a final comparison result.
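
The last three sentences of the abstract (compare per-subnetwork outputs, then combine the comparison results by weight) can be illustrated with a minimal sketch. This is not the patent's implementation: the function names are hypothetical, cosine similarity is an assumed stand-in for the comparison layer, and the weights are simply passed in rather than generated by a trained network as the claims describe.

```python
import math


def cosine_similarity(a, b):
    """Assumed comparison function: cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combine_layer_comparisons(query_feats, probe_feats, weights, average=False):
    """Compare matching layers of the two subnetworks, then combine by weight.

    query_feats / probe_feats: one feature vector per layer (hypothetical inputs).
    weights: one weight per layer; supplied by the caller in this sketch.
    average: False gives a weighted sum, True gives a weighted average.
    """
    if not (len(query_feats) == len(probe_feats) == len(weights)):
        raise ValueError("need one weight and one feature pair per layer")
    # One comparison result per layer.
    results = [cosine_similarity(q, p) for q, p in zip(query_feats, probe_feats)]
    total = sum(w * r for w, r in zip(weights, results))
    if average:
        weight_sum = sum(weights)
        if weight_sum == 0:
            raise ValueError("weights must not sum to zero")
        total /= weight_sum
    return total, results


if __name__ == "__main__":
    query = [[1.0, 0.0], [1.0, 1.0]]   # two layers of query features
    probe = [[1.0, 0.0], [1.0, -1.0]]  # matching layers of probe features
    print(combine_layer_comparisons(query, probe, [3.0, 1.0]))
    print(combine_layer_comparisons(query, probe, [3.0, 1.0], average=True))
```

The `average` flag switches between the weighted-sum and weighted-average variants that the dependent claims list as alternatives.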

First claim

Opening claim text (preview).

What is claimed is:

1. A method of visual object tracking, comprising: receiving a position of an object in a first frame of a video; and determining a current position of the object in subsequent frames of the video using a Siamese neural network, wherein the determining the current position of the object comprises one or more of: adjusting a spatial resolution of a first image from the first frame of the video and a second image sampled from a current frame under processing, the first image and the second image being inputs to the Siamese neural network; adjusting a size of a probe region on the current frame under processing based on a metric of movement of the object from one frame to another; or adjusting a scale of a plurality of images sampled from the current frame under processing, the plurality of images being inputs to the Siamese neural network.

2. The method of claim 1, wherein the spatial resolution of the first image and the second image is adjusted based on a size of the object and an amount of spatial reduction caused by the Siamese neural network.

3. The method of claim 1, wherein the adjusting the spatial resolution of the first image and the second image comprises up-sampling or down-sampling a first image region on the first frame and a second image region on the current frame.

4. The method of claim 1, wherein the scale of the plurality of images comprises sizes and number of the plurality of images.

5. The method of claim 1, wherein the scale of the plurality of images is adjusted based on an estimated scale in a frame immediately before the current frame.

6. A method of visual object tracking using a Siamese neural network, comprising: feeding outputs from a plurality of layers of a first subnetwork of the Siamese neural network and a second subnetwork of the Siamese neural network to a comparison layer; comparing, at the comparison layer for each layer of the plurality of layers, a first input from the layer in the first subnetwork with a second input from the layer in the second subnetwork to obtain a comparison result for the layer; and combining comparison results for the plurality of layers based on weights dynamically generated for the plurality of layers to obtain a final comparison result.

7. The method of claim 6, wherein the first subnetwork and the second subnetwork are identical.

8. The method of claim 6, wherein the plurality of layers are penultimate layers of the first subnetwork and the second subnetwork.

9. The method of claim 6, wherein the weights are generated by a neural network that is trained concurrently with the Siamese neural network.

10. The method of claim 6, wherein the final comparison result is a weighted sum of the comparison results for the plurality of layers.

11. The method of claim 6, wherein the final comparison result is a weighted average of the comparison results for the plurality of layers.

12. The method of claim 6, further comprising: inputting, of an initial frame, a query region including a target into the layers of the first subnetwork of the Siamese neural network; inputting, of a current frame, at least a portion of a probe region into the layers of the second subnetwork of the Siamese neural network; and determining based on the final comparison result whether the at least the portion of the probe region includes the target.

13. The method of claim 12, further comprising: comparing a first plurality of subregions of a subregion of the probe region of the current frame with a second plurality of subregions of the query region of the initial frame; determining a similarity score for each of the first plurality of subregions based on the comparison; determining that a first set of subregions of the first plurality of subregions is occluded when the similarity score for each subregion of the first set of subregions is less than a first threshold and when the similarity score for each subregion of a second set of subregions is greater than a second threshold, the second threshold being greater than the first threshold, wherein the inputted at least the portion of the probe region comprises the second set of subregions of the first plurality of subregions.

14. An apparatus for visual object tracking, comprising: a memory; and at least one processor coupled to the memory and configured to: receive a position of an object in a first frame of a video; and determine a current position of the object in subsequent frames of the video using a Siamese neural network, wherein, to determine the current position of the object, the at least one processor is configured to perform one or more of: adjusting a spatial resolution of a first image from the first frame of the video and a second image sampled from a current frame under processing, the first image and the second image being inputs to the Siamese neural network; adjusting a size of a probe region on the current frame under processing based on a metric of movement of the object from one frame to another; or adjusting a scale of a plurality of images sampled from the current frame under processing, the plurality of images being inputs to the Siamese neural network.

15. The apparatus of claim 14, wherein the spatial resolution of the first image and the second image is adjusted based on a size of the object and an amount of spatial reduction caused by the Siamese neural network.

16. The apparatus of claim 14, wherein, to adjust the spatial resolution of the first image and the second image, the at least one processor is configured to up-sample or down-sample a first image region on the first frame and a second image region on the current frame.

17. The apparatus of claim 14, wherein the scale of the plurality of images comprises sizes and number of the plurality of images.

18. The apparatus of claim 14, wherein the scale of the plurality of images is adjusted based on an estimated scale in a frame immediately before the current frame.

19. An apparatus for visual object tracking using a Siamese neural network, comprising: a memory; and at least one processor coupled to the memory and configured to: feed outputs from a plurality of layers of a first subnetwork of the Siamese neural network and a second subnetwork of the Siamese neural network to a comparison layer; compare, at the comparison layer for each layer of the plurality of layers, a first input from the layer in the first subnetwork with a second input from the layer in the second subnetwork to obtain a comparison result for the layer; and combine comparison results for the plurality of layers based on weights dynamically generated for the plurality of layers to obtain a final comparison result.

20. The apparatus of claim 19, wherein the first subnetwork and the second subnetwork are identical.

21. The apparatus of claim 19, wherein the plurality of layers are penultimate layers of the first subnetwork and the second subnetwork.

22. The apparatus of claim 19, wherein the weights are generated by a neural network that is trained concurrently with the Siamese neural network.

23. The apparatus of claim 19, wherein the final comparison result is a weighted sum of the comparison results for the plurality of layers.

24. The apparatus of claim 19, wherein the final comparison result is a weighted average of the comparison results for the plurality of
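
The two-threshold occlusion test recited in claim 13 can be read as the following minimal sketch. It is one interpretation rather than the patent's implementation: the function name, the threshold values, and the fallback of keeping every subregion when no occlusion is detected are all assumptions made for illustration.

```python
def split_occluded(scores, low_threshold, high_threshold):
    """Return (occluded_indices, kept_indices) for per-subregion similarity scores.

    A set of subregions is treated as occluded only when its scores fall below
    low_threshold while another set scores above high_threshold, mirroring the
    "first threshold" / "second threshold" wording of claim 13.
    """
    if not high_threshold > low_threshold:
        raise ValueError("high_threshold must be greater than low_threshold")
    low = [i for i, s in enumerate(scores) if s < low_threshold]
    high = [i for i, s in enumerate(scores) if s > high_threshold]
    if low and high:
        # Occlusion detected: only the high-scoring subregions are passed on
        # as the portion of the probe region fed to the second subnetwork.
        return low, high
    # Assumption: with no detected occlusion, every subregion is kept.
    return [], list(range(len(scores)))


if __name__ == "__main__":
    # Subregion 1 scores low while subregions 0 and 2 score high.
    print(split_occluded([0.9, 0.1, 0.8, 0.5], 0.3, 0.7))
```

Requiring both a low-scoring and a high-scoring set separates partial occlusion from a probe region that simply does not contain the target, where every subregion would score low.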

Assignees

Qualcomm Inc

Inventors

Classifications

  • G06N3/084 (Primary)

    Backpropagation, e.g. using gradient descent · CPC title

  • Combinations of networks · CPC title

  • Detecting or recognising potential candidate objects based on visual cues, e.g. shapes · CPC title

  • G06V10/82 (Primary)

    using neural networks · CPC title

  • Supervised learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018129934A1 cover?
In one configuration, a visual object tracking apparatus is provided that receives a position of an object in a first frame of a video, and determines a current position of the object in subsequent frames of the video using a Siamese neural network. To facilitate determining the current position of the object, the apparatus may adjust a spatial resolution of an image, adjust a size of a probe re…
Who is the assignee on this patent?
Qualcomm Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 10, 2018 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).