Enhancing review videos
US-2022115043-A1 · Apr 14, 2022 · US
US12417487B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12417487-B2 |
| Application number | US-202117562423-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 27, 2021 |
| Priority date | Dec 27, 2021 |
| Publication date | Sep 16, 2025 |
| Grant date | Sep 16, 2025 |
Official abstract text for this publication.
A system for assisting users in listing items for sale in an electronic marketplace is disclosed. A video is received from a user device associated with a user, the video including a video stream depicting a plurality of items to be listed for sale in the electronic marketplace. Respective images depicting respective items among the plurality of items are obtained from the video stream, and respective attributes of the respective items among the plurality of items are extracted from the video. Respective listings for sale of the respective items are generated based at least in part on the respective attributes of the respective items among the plurality of items, and the respective listings for sale of the respective items are displayed to the user.
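The last two steps of the abstract (extracting attributes for each item and generating a listing from them) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `Listing` type, the `ATTRIBUTE_PATTERNS` table, and the sample transcripts are hypothetical, and the regular expressions are only a rule-based stand-in for the trained extraction model the patent describes.

```python
import re
from dataclasses import dataclass, field

# Hypothetical category -> pattern table. A rule-based stand-in for the
# trained attribute-extraction model; not the patent's actual method.
ATTRIBUTE_PATTERNS = {
    "color": re.compile(r"\b(red|blue|green|black|white)\b"),
    "size": re.compile(r"\bsize\s+(\w+)\b"),
    "brand": re.compile(r"\bby\s+([A-Z][a-zA-Z]+)\b"),
}


@dataclass
class Listing:
    """One listing: an image reference plus attribute fields."""
    image_ref: str
    attributes: dict = field(default_factory=dict)


def extract_attributes(text):
    """Return {category: value} pairs found in one item's transcript."""
    found = {}
    for category, pattern in ATTRIBUTE_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[category] = match.group(1)
    return found


def generate_listings(segments):
    """Build one Listing per (image_ref, transcript_text) segment."""
    return [
        Listing(image_ref=image_ref, attributes=extract_attributes(text))
        for image_ref, text in segments
    ]


# Hypothetical per-item segments, as if already cut from the video.
segments = [
    ("frame_0042.jpg", "This is a red jacket by Acme, size medium"),
    ("frame_0117.jpg", "Next, a blue mug"),
]
listings = generate_listings(segments)
```

Each extracted category and value pair populates the matching field of that item's listing; categories that are never mentioned are simply left out rather than guessed.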
Opening claim text (preview).
What is claimed is:

1. A system comprising: a processor; and memory including instructions which, when executed by the processor, cause the processor to:
receive a video from a user device associated with a user, the video including a video stream and an audio stream, the video stream depicting a plurality of items to be listed in an electronic marketplace, the plurality of items being separate items and including a first item and a second item;
process the audio stream of the video using a first trained machine learning model to convert at least a portion of the audio stream to text;
determine a first portion of the video associated with the first item by: performing motion detection analysis of the video stream to identify a first hovering of a camera at a first portion of the video stream corresponding to the first portion of the video, and analyzing the text from a first portion of the audio stream to identify the first item, the first portion of the audio stream corresponding to the first portion of the video;
determine a second portion of the video associated with the second item by: performing motion detection analysis of the video stream to identify a second hovering of the camera at a second portion of the video stream corresponding to the second portion of the video, and analyzing the text from a second portion of the audio stream to identify the second item, the second portion of the audio stream corresponding to the second portion of the video;
responsive to determining the first portion of the video and the second portion of the video, generate a first listing for the first item and a second listing for the second item by: extracting a first frame from the first portion of the video stream and a second frame from the second portion of the video stream, the first frame providing a first image of the first item and the second frame providing a second image of the second item, analyzing the text using a second trained machine learning model to extract a first attribute category and value pair of an attribute of the first item from text associated with the first portion of the video, and a second attribute category and value pair of an attribute of the second item from text associated with the second portion of the video, and generating the first listing for the first item and the second listing for the second item, the first listing generated to include the first image and using the first attribute category and value pair to populate a corresponding field of the first listing, and the second listing generated to include the second image and using the second attribute category and value pair to populate a corresponding field of the second listing; and
generating an electronic store, the electronic store including the first listing and the second listing, and causing the electronic store to be displayed to at least one potential buyer.

2. The system of claim 1, wherein the first portion of the video associated with the first item is determined by: determining, based on the audio stream, a timestamp identifying a time, in the video stream, that depicts the first item among the plurality of items, and wherein the first image of the first item is extracted from the video stream based on the timestamp.

3. The system of claim 1, wherein the instructions, when executed by the processor, cause the processor to convert the audio stream of the video to text using at least one selected from the following: i) a general purpose speech recognition model, ii) an electronic commerce language aware speech recognition model, and iii) a model trained to boost hot words associated with a product category.

4. The system of claim 3, wherein the instructions, when executed by the processor, cause the processor to extract the first attribute category and value pair of the attribute of the first item at least by analyzing, using a named entity recognition model, the text associated with the first portion of the video corresponding to the first item.

5. The system of claim 1, wherein the instructions, when executed by the processor, further cause the processor to generate, based on a plurality of modalities descriptive of the first item, a vector representing the first item.

6. The system of claim 5, wherein the instructions, when executed by the processor, cause the processor to generate the vector by applying a trained multimodal model to: i) the first image of the first item; and ii) the first attribute category and value pair of the attribute of the first item.

7. The system of claim 6, wherein the instructions, when executed by the processor, cause the processor to: search, using the vector representing the first item, a product catalogue to find one or more similar items listed in the electronic marketplace, and wherein the first listing for the first item is generated by: extracting one or more attributes of the one or more similar items; and populating one or more corresponding fields of the first listing using the one or more attributes.

8. The system of claim 1, wherein the instructions, when executed by the processor, further cause the processor to generate an electronic marketplace store, the electronic marketplace store including the first listing and the second listing.

9. A method comprising:
receiving a video from a user device associated with a user, the video including a video stream and an audio stream, the video stream depicting a plurality of items to be listed in an electronic marketplace, the plurality of items being separate items and including a first item and a second item;
processing the audio stream of the video using a first trained machine learning model to convert at least a portion of the audio stream to text;
determining a first portion of the video associated with the first item by: performing motion detection analysis of the video stream to identify a first hovering of a camera at a first portion of the video stream corresponding to the first portion of the video, and analyzing the text from a first portion of the audio stream to identify the first item, the first portion of the audio stream corresponding to the first portion of the video;
determining a second portion of the video associated with the second item by: performing motion detection analysis of the video stream to identify a second hovering of the camera at a second portion of the video stream corresponding to the second portion of the video, and analyzing the text from a second portion of the audio stream to identify the second item, the second portion of the audio stream corresponding to the second portion of the video;
responsive to determining the first portion of the video and the second portion of the video, generating a first listing for the first item and a second listing for the second item by: extracting a first frame from the first portion of the video stream and a second frame from the second portion of the video stream, the first frame providing a first image of the first item and the second frame providing a second image of the second item, analyzing the text using a second trained machine learning model to extract a first attribute category and value pair of an attribute of the first item from text associated with the first portion of the video, and a second attribute category and value pair of an attribute of the second item from text associated with the second portion of the video, and generating the first listing for the first item and the second listing for the second item, the first listing generated to include the first image and using the first attribute category and value pair to populate a corresponding field of the first listing, and the second listing generated to include the second image
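
The claims recite motion detection analysis that identifies a "hovering" of the camera and then pairs each hover with the matching portion of the transcribed audio. The sketch below shows one way that step might look, under stated assumptions: the claims do not specify a motion detection technique, so simple frame differencing is swapped in here; frames are modelled as flat lists of grayscale pixel values; and the function names, threshold, and timestamped words are all hypothetical.

```python
def motion_scores(frames):
    """Mean absolute pixel difference between each pair of consecutive frames."""
    scores = []
    for prev, curr in zip(frames, frames[1:]):
        total = sum(abs(a - b) for a, b in zip(prev, curr))
        scores.append(total / len(curr))
    return scores


def find_hovers(scores, threshold, min_run):
    """Return inclusive (first_frame, last_frame) ranges where motion stays
    below `threshold` for at least `min_run` consecutive scores.

    Score i compares frame i with frame i + 1, so a run of low scores
    start..end-1 means frames start..end are nearly identical.
    """
    hovers = []
    start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                hovers.append((start, i))
            start = None
    if start is not None and len(scores) - start >= min_run:
        hovers.append((start, len(scores)))
    return hovers


def text_for_hover(hover, words, fps):
    """Join transcript words whose timestamps fall inside the hover interval."""
    start_t, end_t = hover[0] / fps, hover[1] / fps
    return " ".join(word for t, word in words if start_t <= t <= end_t)


# Toy video: camera rests on one item, pans, then rests on another.
frames = (
    [[10, 10, 10, 10]] * 4
    + [[90, 90, 90, 90], [170, 170, 170, 170]]
    + [[200, 200, 200, 200]] * 4
)
# Hypothetical (timestamp_seconds, word) output of a speech-to-text step.
words = [(0.5, "red"), (1.5, "jacket"), (4.2, "and"), (6.5, "blue"), (7.5, "mug")]

scores = motion_scores(frames)
hovers = find_hovers(scores, threshold=5, min_run=3)
segments = [text_for_hover(h, words, fps=1) for h in hovers]
keyframes = [(first + last) // 2 for first, last in hovers]
```

Here the middle frame of each hover is taken as the item's image, which is only one plausible choice; the claims require only that a frame be extracted from the corresponding portion of the video stream.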
using intermediate agents · CPC title
Extraction of image or video features · CPC title
Named entity recognition · CPC title
Advertisement creation · CPC title
graphically representing goods, e.g. 3D product representation · CPC title