Method and System for Retrieving Video Temporal Segments
US-2021004605-A1 · Jan 7, 2021 · US
US11893792B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11893792-B2 |
| Application number | US-202117212687-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 25, 2021 |
| Priority date | Mar 25, 2021 |
| Publication date | Feb 6, 2024 |
| Grant date | Feb 6, 2024 |
Official abstract text for this publication.
Techniques are disclosed for identifying and presenting video content that demonstrates features of a target product. The video content can be accessed, for example, from a media database of user-generated videos that demonstrate one or more features of the target product so that a user can see and hear the product in operation via a product webpage before making a purchasing decision. The product functioning videos supplement any static images of the target product and the textual product description to provide the user with additional context for each of the product's features, depending on the textual product description. The user can quickly and easily interact with the product webpage to access and playback the product functioning video to see and/or hear the product in operation.
Opening claim text (preview).
What is claimed is:
1. A method for identifying and presenting a product video, the method comprising: identifying, by a product identification module and using a neural network trained to identify a set of one or more products, a target product in a keyframe of one or more user-generated content videos to produce at least one candidate video; labeling, by a feature labeling module and using at least one keyframe of the at least one candidate video, the at least one candidate video with one or more feature keywords extracted from a product description of the target product to produce at least one candidate video labeled with features; selecting, by a video quality module and using a deep learning-based image aesthetics predictor model, at least one best quality candidate video labeled with features from the at least one candidate video labeled with features; and providing access to at least a portion of the at least one best quality candidate video labeled with features via a product webpage having the one or more feature keywords.
2. The method of claim 1, further comprising extracting, by a pre-processing module and using a natural language processor, one or more descriptive words from the product description of the target product to produce an extracted product description, wherein the one or more feature keywords are based on the extracted product description.
3. The method of claim 2, further comprising extracting, by a feature extraction module and using a part-of-speech tagger, one or more product features from the extracted product description to produce the one or more feature keywords.
4. The method of claim 1, wherein the target product is identified in the keyframe of the one or more user-generated content videos using a region based convolutional neural network (R-CNN) trained to identify the set of products.
5. The method of claim 1, further comprising dividing the at least one best quality candidate video labeled with features into at least one segment, wherein the portion of the at least one best quality candidate video labeled with features includes at least one segment of the at least one best quality candidate video labeled with features including the keyframe.
6. The method of claim 1, wherein the labeling comprises labeling the at least one candidate video as including: a motion feature if (x, y) pixel coordinates of the target product in two or more adjacent keyframes of the at least one candidate video changes by more than a threshold value; an audio feature if a sound localization technique identifies audio within a same region of a keyframe of the at least one candidate video as the target product; and/or an appearance feature if one or more feature vectors of the target product in the two or more adjacent keyframes of the at least one candidate video change by more than a threshold value.
7. The method of claim 6, further comprising dividing the at least one candidate video labeled with features into at least one segment, wherein the at least a portion of the at least one best quality candidate video labeled with features includes the at least one segment of the at least one candidate video labeled as including a motion feature, an audio feature, and/or an appearance feature.
8. The method of claim 1, wherein providing access to at least a portion of the at least one best quality candidate video labeled with features via the product webpage includes adding a hyperlink from the one or more feature keywords in the product description to the at least a portion of the at least one best quality candidate video labeled with features, and user selection of the hyperlink causes playback of the at least a portion of the at least one best quality candidate video labeled with features within the product webpage.
9. A system for identifying and presenting a product video, the system comprising: a pre-processing module configured to extract one or more descriptive words from a product description of a target product to produce an extracted product description; a feature extraction module configured to extract one or more product features from the extracted product description to produce one or more feature keywords; a product identification module configured to identify the target product in a keyframe of one or more user-generated content videos to produce at least one candidate video; a feature labeling module configured to label the at least one candidate video with the one or more feature keywords to produce at least one candidate video labeled with features; a video quality module configured to select at least one best quality candidate video labeled with features from the at least one candidate video labeled with features using a deep learning-based image aesthetics predictor model; and an output module configured to provide access to at least a portion of the at least one best quality candidate video labeled with features via an interactive element of a product webpage having the one or more feature keywords.
10. The system of claim 9, wherein the one or more descriptive words are extracted from the product description using a natural language processor and/or using a part-of-speech tagger.
11. The system of claim 9, wherein the target product is identified in the keyframe of the one or more user-generated content videos using a region based convolutional neural network (R-CNN) trained to identify a set of products for sale in an e-commerce environment.
12. The system of claim 9, further comprising dividing the at least one candidate video labeled with features into at least one segment having a user-configurable length, wherein the labeling comprises labeling the at least one candidate video as including: a motion feature if (x, y) pixel coordinates of the target product in two or more adjacent keyframes of the at least one candidate video changes by more than a threshold value; an audio feature if a sound localization technique identifies audio within a same region of a keyframe of the at least one candidate video as the target product; and an appearance feature if one or more feature vectors of the target product in the two or more adjacent keyframes of the at least one candidate video change by more than a threshold value, and wherein the at least a portion of the at least one best quality candidate video labeled with features includes the at least one segment of the at least one candidate video labeled as including a motion feature, an audio feature, and/or an appearance feature.
13. The system of claim 12, wherein providing access to at least a portion of the at least one best quality candidate video labeled with features includes generating a hyperlink in the product webpage from at least one portion of the product description including the one or more feature keywords describing the motion feature, the audio feature, and/or the sound feature to the at least a portion of the at least one best quality candidate video labeled with features.
14. The system of claim 13, wherein the at least a portion of the at least one best quality candidate video labeled with features is played in response to a user input selecting the hyperlink, and wherein the selecting includes clicking on the hyperlink or hovering over the hyperlink.
15. A computer program product including one or more non-transitory machine-readable mediums encoded with instructions that when executed by one or more processors cause a process to be carried out for identifying and presenting a product video, the process comprising: extracting one or more descriptive words from a product description of a target product t
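The motion and appearance tests recited in claims 6 and 12 can be illustrated with a minimal sketch. This is not the patent's implementation: the claims only require that the product's (x, y) pixel coordinates or its feature vectors change "by more than a threshold value" between adjacent keyframes, so the Euclidean distance metric, the threshold values, the `label_features` function, and the `center`/`features` keyframe fields below are all assumptions made for illustration. The audio test (sound localization) is omitted.

```python
from math import dist  # Euclidean distance, Python 3.8+


def label_features(keyframes, motion_threshold=20.0, appearance_threshold=0.5):
    """Return the set of feature labels for one candidate video.

    keyframes: list of dicts, one per keyframe, each holding
      "center":   (x, y) pixel coordinates of the detected product
      "features": feature vector describing the product's appearance
    Both field names and both default thresholds are hypothetical.
    """
    labels = set()
    # Compare each keyframe with the one that follows it.
    for prev, curr in zip(keyframes, keyframes[1:]):
        # Motion: the product's position moved by more than the threshold.
        if dist(prev["center"], curr["center"]) > motion_threshold:
            labels.add("motion")
        # Appearance: the feature vector changed by more than the threshold.
        if dist(prev["features"], curr["features"]) > appearance_threshold:
            labels.add("appearance")
    return labels


example = [
    {"center": (100, 100), "features": [0.10, 0.20]},
    {"center": (130, 140), "features": [0.10, 0.25]},
]
print(label_features(example))
```

In this example the product moves 50 pixels between the two keyframes while its feature vector barely changes, so only the motion label is assigned under the assumed thresholds.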
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title
using motion, e.g. object motion or camera motion · CPC title
using audio features · CPC title