Deep Image-to-Image Network Learning for Medical Image Analysis
US-2017200067-A1 · Jul 13, 2017 · US
US10339408B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10339408-B2 |
| Application number | US-201615388039-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 22, 2016 |
| Priority date | Dec 22, 2016 |
| Publication date | Jul 2, 2019 |
| Grant date | Jul 2, 2019 |
Official abstract text for this publication.
The present disclosure provides a method and device for visual appearance based person identity inference. The method may include obtaining a plurality of input images. The input images include a gallery set of images containing persons-of-interest and a probe set of images containing person detections, and one input image corresponds to one person. The method may further include extracting N feature maps from the input images using a Deep Neural Network, N being a natural number; constructing N structure samples of the N feature maps using conditional random field (CRF) graphical models; learning the N structure samples from an implicit common latent feature space embedded in the N structure samples; and according to the learned structures, identifying one or more images from the probe set containing a same person-of-interest as an image in the gallery set.
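As a rough illustration of the graph-construction step mentioned in the abstract, the sketch below links each image to its nearest neighbours in a single feature space. It is not the patented method: the function name `knn_graph`, the use of Euclidean distance as the similarity measure, and the toy 2-D feature vectors are all assumptions made for the example.

```python
import math

def knn_graph(features, k):
    """Return undirected edges (i, j), i < j, linking each node to its k nearest neighbours."""
    edges = set()
    for i, fi in enumerate(features):
        # Distance from node i to every other node (Euclidean distance is an assumption).
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

# Hypothetical 2-D feature vectors for four images forming two tight pairs.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
edges = knn_graph(features, k=1)
print(sorted(edges))
```

In this toy setting each node stands for one person image, and an edge suggests that two images may show the same person. Running the same construction in each of the N feature spaces would give N initial graphs.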
Opening claim text (preview).
What is claimed is:

1. A method for visual appearance based person identity inference, comprising: obtaining a plurality of input images, wherein the input images include a gallery set of images containing persons-of-interest and a probe set of images containing person detections, and one input image corresponds to one person; extracting N feature maps from the input images using a Deep Neural Network (DNN), N being a natural number; constructing N structure samples of the N feature maps using conditional random field (CRF) graphical models, comprising: for a feature map, constructing an initial graph structure by K Nearest Neighbor (KNN) based on feature similarity in a feature space corresponding to the feature map, the graph model including nodes and edges, a node representing one person; performing structure permutations by a plurality of iterations of KNN computation in N feature spaces with a Quasi-Gibbs Structure Sampling (QGSS) process; assigning labels to the nodes that minimize a conditional random field (CRF) energy function over all possible labels, wherein the all possible labels represent all different persons-of-interest in the gallery set; and deriving the N structure samples from the plurality of iterations and the assigned labels; learning the N structure samples from an implicit common latent feature space embedded in the N structure samples; and according to the learned structures, identifying one or more images from the probe set containing a same person-of-interest as an image in the gallery set.

2. The method according to claim 1, wherein: the plurality of iterations include first a iterations and later b iterations, wherein a and b are natural numbers; results of the first a iterations are discarded, and the N structure samples are derived from the later b iterations.

3. The method according to claim 1, wherein: a node in the graph model has m possible states, m representing a quantity of different persons-of-interest in the gallery set.
4. The method according to claim 1, wherein: the labels are assigned to the nodes according to the graph structure after the plurality of iterations are finished.

5. The method according to claim 1, wherein: a graph of a CRF model representing a re-identification structure is learned through the N structure samples; and an energy minimization with sparse approach is performed to cut the graph into a plurality of clusters, each cluster containing images corresponding to one of the persons-of-interest.

6. The method according to claim 1, wherein N different kernels are used in the DNN for convolutions with the images in the gallery set and the probe set; and the N feature maps are produced by a last couple of convolution layers in the DNN.

7. The method according to claim 5, wherein the CRF model with pairwise potentials is:

$$p(Y \mid X) = \frac{1}{Z(X)} \prod_{\langle i,j \rangle} \psi_{ij}(y_i, y_j, X) \prod_{i} \psi_i(y_i, X)$$

wherein: $\prod_{\langle i,j \rangle}$ is a product over all edges in the graph, $\psi_i$ is a node potential and $\psi_{ij}$ is an edge potential, X denotes the common latent features derived from the N structure samples implicitly; and Y denotes the labeling of person-of-interest candidates.

8. A device for visual appearance based person identity inference, comprising one or more processors configured to: obtain a plurality of input images, wherein the input images include a gallery set of images containing persons-of-interest and a probe set of images containing person detections, and one input image corresponds to one person; extract N feature maps from the input images using a Deep Neural Network (DNN), N being a natural number; construct N structure samples of the N feature maps using conditional random field (CRF) graphical models, comprising: for a feature map, constructing an initial graph structure by K Nearest Neighbor (KNN) based on feature similarity in a feature space corresponding to the feature map, the graph model including nodes and edges, a node representing one person; performing structure permutations by a plurality of iterations of KNN computation in N feature spaces with a Quasi-Gibbs Structure Sampling (QGSS) process; assigning labels to the nodes that minimize a conditional random field (CRF) energy function over all possible labels, wherein the all possible labels represent all different persons-of-interest in the gallery set; and deriving the N structure samples from the plurality of iterations and the assigned labels; learn the N structure samples from an implicit common latent feature space embedded in the N structure samples; and accor
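
To make the pairwise CRF model of claim 7 concrete, the sketch below evaluates p(Y | X) by brute-force enumeration on a tiny graph and picks the most probable labeling, which is the labeling with minimum energy if energy is taken as the negative log of the unnormalized score. This is only a minimal illustration under stated assumptions: the potential values, the helper names `unnormalized_score` and `crf_posterior`, and the exhaustive search are hypothetical and do not come from the patent, which describes a sampling-based procedure instead.

```python
import itertools

def unnormalized_score(labels, node_pot, edge_pot, edges):
    """Product of node potentials and edge potentials for one labeling Y."""
    score = 1.0
    for i, y in enumerate(labels):
        score *= node_pot[i][y]                    # psi_i(y_i, X)
    for i, j in edges:
        score *= edge_pot(labels[i], labels[j])    # psi_ij(y_i, y_j, X)
    return score

def crf_posterior(node_pot, edge_pot, edges, num_labels):
    """Enumerate every labeling and return {labeling: p(Y | X)}."""
    n = len(node_pot)
    scores = {
        labels: unnormalized_score(labels, node_pot, edge_pot, edges)
        for labels in itertools.product(range(num_labels), repeat=n)
    }
    z = sum(scores.values())  # partition function Z(X)
    return {labels: s / z for labels, s in scores.items()}

# Hypothetical potentials: 3 nodes (images), 2 persons-of-interest.
node_pot = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
edges = [(0, 1)]  # nodes 0 and 1 are neighbours in the graph

def edge_pot(a, b):
    # Assumed smoothness prior: neighbouring nodes prefer the same label.
    return 2.0 if a == b else 1.0

posterior = crf_posterior(node_pot, edge_pot, edges, num_labels=2)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

Exhaustive enumeration grows as m^n for n nodes and m labels, so it only serves to check the formula on toy inputs.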
using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks · CPC title
using neural networks · CPC title
using classification, e.g. of video objects · CPC title
Graphical models, e.g. Bayesian networks · CPC title
Distances to closest patterns, e.g. nearest neighbour classification · CPC title