In-video product annotation with web information mining

US9355330B2 · US · B2

Patent metadata
Publication number: US-9355330-B2
Application number: US-201214111149-A
Country: US
Kind code: B2
Filing date: Apr 11, 2012
Priority date: Apr 12, 2011
Publication date: May 31, 2016
Grant date: May 31, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system provides product annotation in a video to one or more users. The system receives a video from a user, where the video includes multiple video frames. The system extracts multiple key frames from the video and generates a visual representation of the key frame. The system compares the visual representation of the key frame with a plurality of product visual signatures, where each visual signature identifies a product. Based on the comparison of the visual representation of the key frame and a product visual signature, the system determines whether the key frame contains the product identified by the visual signature of the product. To generate the plurality of product visual signatures, the system collects multiple training images comprising multiple expert product images obtained from an expert product repository, each of which is associated with multiple product images obtained from multiple web resources.
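The matching flow in the abstract (sample key frames, represent each one as a vector, compare it against product signatures) can be sketched roughly as below. This is only an illustration under stated assumptions: fixed-interval sampling, cosine similarity, the `threshold` value, and all function names are hypothetical choices, not details taken from the patent.

```python
import math


def sample_key_frames(num_frames, step):
    """Return indices of key frames taken every `step` frames (assumed policy)."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(0, num_frames, step))


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def annotate_key_frame(frame_repr, signatures, threshold=0.8):
    """Return the product names whose signature is similar enough to the frame.

    `signatures` maps a product name to its signature vector. The threshold
    is an assumed decision rule, not one stated in the source.
    """
    return [
        name
        for name, signature in signatures.items()
        if cosine_similarity(frame_repr, signature) >= threshold
    ]
```

For example, `sample_key_frames(10, 4)` selects frames 0, 4, and 8, and a frame whose representation points in the same direction as one signature is annotated with that product only.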

First claim

Opening claim text (preview).

We claim:
1. A computer method for providing product annotation in a video to one or more users, the method comprising: generating a product visual signature for a product by at least: collecting an unannotated expert product image of the product from an expert product repository, searching for a plurality of unannotated product images from a plurality of web resources different from the expert product repository, the plurality of unannotated product images related to the unannotated expert product image, selecting a subset of the plurality of unannotated product images by filtering the plurality of unannotated product images based on a similarity measure to the unannotated expert product image, and generating the product visual signature from the unannotated expert product image and the subset of the plurality of unannotated product images; receiving a video for product annotation, the video comprising a plurality of video frames; extracting a plurality of key frames from the video frames; and for each key frame: generating a visual representation of the key frame; comparing the visual representation with a plurality of product visual signatures including the product visual signature; and determining, based on the comparison, that the key frame contains the product identified by the product visual signature.
2. The method of claim 1, wherein extracting a plurality of key frames from the video comprises: extracting each of the plurality of key frames at a fixed point of the video.
3. The method of claim 1, wherein generating the visual signature of a key frame comprises: extracting a plurality of visual features from the key frame; grouping the plurality of visual features into a plurality of clusters; and generating multi-dimensional bag visual words histogram as the visual signature of the key frame.
4. The method of claim 3, wherein the plurality of visual features of a key frame are scale invariance feature transform (SIFT) descriptors of the key frame.
5. The method of claim 1, wherein generating the subset of the plurality of unannotated product images represent a set of training images for generating the product visual signature.
6. The method of claim 1, wherein generating the product visual signature further comprises: applying a collective sparsification scheme to the subset of the plurality of unannotated product images, wherein information unrelated to the product contained in a related product image is reduced in generating the product visual signature.
7. The method of claim 1, wherein generating the product visual signature further comprises: iteratively updating the product visual signature through a pre-determined number of iterations, wherein each of the iterations computes a respective similarity measure.
8. The method of claim 1, further comprising: collecting a plurality of unannotated expert product images of the product at different views of the product, wherein the subset of the subset of the plurality of unannotated product images comprise unannotated product images corresponding to the plurality of unannotated expert product images.
9. The method of claim 1, wherein determining that the key frame contains the product identified comprises: estimating product relevance between the visual representation of the key frame with each product visual signature of the plurality of the product visual signatures; and determining that the key frame contains the product identified by the product visual signature based on the estimated product relevance.
10. A non-transitory computer-readable storage medium storing executable computer program instructions for providing on-demand digital assets hosting services to one or more users, the computer program instructions when executed by a processor cause a system to perform operations comprising: generating a product visual signature for a product by at least: collecting an unannotated expert product image of the product from an expert product repository, searching for a plurality of unannotated product images from a plurality of web resources different from the expert product repository, the plurality of unannotated product images related to the unannotated expert product image, selecting a subset of the plurality of unannotated product images by filtering the plurality of unannotated product images based on a similarity measure to the unannotated expert product image, and generating the product visual signature from the unannotated expert product image and the subset of the plurality of unannotated product images; receiving a video from a user for product annotation, the video comprising a plurality of video frames; extracting a plurality of key frames from the video; and for each key frame: extracting a plurality of visual features from the key frame; grouping the plurality of visual features into a plurality of clusters; and generating a multi-dimensional bag visual words histogram as a visual representation of the key frame; comparing the visual representation with a plurality of product visual signatures comprising the product visual signature; determining, based on the comparison, whether the key frame contains the product identified by the product visual signature.
11. The computer-readable storage medium of claim 10, wherein the operations further comprise: extracting each of the plurality of key frames at a fixed point of the video.
12. The computer-readable storage medium of claim 10, wherein the plurality of visual features of a key frame are scale invariance feature transform (SIFT) descriptors of the key frame.
13. The computer-readable storage medium of claim 10, wherein generating the subset of the plurality of unannotated product images represent a set of training images for generating the product visual signature.
14. The computer-readable storage medium of claim 10, wherein generating the product visual signature further comprises: applying a collective sparsification scheme to the subset of the plurality of unannotated product images, wherein information unrelated to the product contained in a related product image is reduced in generating the product visual signature.
15. The computer-readable storage medium of claim 10, wherein generating the product visual signature further comprises: iteratively updating the product visual signature through a pre-determined number of iterations, wherein each of the iterations computes a respective similarity measure.
16. The computer-readable storage medium of claim 10, wherein the operations further comprise: collecting a plurality of unannotated expert product images of the product at different views of the product, wherein the subset of the subset of the plurality of unannotated product images comprise unannotated product images corresponding to the plurality of unannotated expert product images.
17. The computer-readable storage medium of claim 10, wherein determining whether the key frame contains the product comprises: estimating product relevance between the visual representation of the key frame with each product visual signature of the plurality of the product visual signatures; and determining that the key frame contains the product identified by the product visual signature based on the estimated product relevance.
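Claims 3 and 10 describe turning a key frame's visual features into a bag-of-visual-words histogram, and claim 1 describes filtering web images by their similarity to the expert image. A minimal sketch of both steps follows. It assumes the feature descriptors are already extracted and that the cluster centers (`centers`) were learned beforehand, for instance by k-means. Histogram intersection and the `min_similarity` cutoff are illustrative stand-ins; the claims do not name a specific similarity measure, and every function name here is hypothetical.

```python
def nearest_center(descriptor, centers):
    """Index of the cluster center closest to `descriptor` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for i, center in enumerate(centers):
        dist = sum((x - y) ** 2 for x, y in zip(descriptor, center))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def bovw_histogram(descriptors, centers):
    """Count descriptors per visual word and normalize the counts to sum to 1."""
    hist = [0.0] * len(centers)
    for descriptor in descriptors:
        hist[nearest_center(descriptor, centers)] += 1.0
    total = sum(hist)
    if total > 0:
        hist = [count / total for count in hist]
    return hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms (assumed measure)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def filter_by_similarity(expert_hist, web_hists, min_similarity=0.5):
    """Keep only the web-image histograms that are similar enough to the expert image."""
    return [
        hist
        for hist in web_hists
        if histogram_intersection(expert_hist, hist) >= min_similarity
    ]
```

In practice the descriptors would be high-dimensional (claims 4 and 12 name SIFT) and the vocabulary much larger; the two-dimensional toy inputs used to exercise this sketch only show the data flow.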

Assignees

Inventors

Classifications

  • G06V20/46 · Primary

    Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title

  • Pattern recognition · CPC title

  • G06K9/4642 · Primary

    Physics · mapped topic


Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9355330B2 cover?
A system provides product annotation in a video to one or more users. The system receives a video from a user, where the video includes multiple video frames. The system extracts multiple key frames from the video and generates a visual representation of the key frame. The system compares the visual representation of the key frame with a plurality of product visual signatures, where each visual…
Who is the assignee on this patent?
Chua Tat Seng, Li Guangda, Lu Zheng, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06V20/46. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 31, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).