US2021287718A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021287718-A1 |
| Application number | US-202016814056-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 10, 2020 |
| Priority date | Mar 10, 2020 |
| Publication date | Sep 16, 2021 |
| Grant date | — |
Official abstract text for this publication.
Implementations generally provide a user interface for video annotation tools. In some implementations, a method includes obtaining at least one video of at least one object performing at least one action and displaying one or more portions of the at least one video in a user interface. The method further includes displaying a plurality of annotation tracks in the user interface, where each annotation track of the plurality of annotation tracks is associated with one or more of the at least one object and the at least one action in the at least one video. The method further includes obtaining one or more annotations associated with the at least one video based on the plurality of annotation tracks.
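The abstract's notion of annotation tracks can be pictured as a small data model. The following is a minimal sketch only, not the implementation disclosed in the publication: the class names `Annotation` and `AnnotationTrack`, the time-span representation in seconds, and the helper `collect_annotations` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Annotation:
    """One labeled time span on a track (hypothetical representation)."""
    start_s: float  # span start, in seconds
    end_s: float    # span end, in seconds
    label: str


@dataclass
class AnnotationTrack:
    """A track tied to one object or one action in the video (assumed model)."""
    subject: str  # name of the object or action the track is associated with
    kind: str     # "object" or "action"
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, start_s: float, end_s: float, label: str) -> None:
        # Reject empty or reversed spans so every annotation covers real time.
        if end_s <= start_s:
            raise ValueError("end_s must be greater than start_s")
        self.annotations.append(Annotation(start_s, end_s, label))


def collect_annotations(
    tracks: List[AnnotationTrack],
) -> List[Tuple[float, float, str, str, str]]:
    """Gather annotations from all tracks, ordered by start time."""
    rows = [
        (a.start_s, a.end_s, t.kind, t.subject, a.label)
        for t in tracks
        for a in t.annotations
    ]
    return sorted(rows)
```

Keeping one track per object or action mirrors the abstract's statement that each track is associated with an object or an action; the flat, time-ordered output is just one convenient way to "obtain" the annotations from the tracks.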
Opening claim text (preview).
1. A system comprising: one or more processors; and logic encoded in one or more non-transitory computer-readable storage media for execution by the one or more processors and when executed operable to cause the one or more processors to perform operations comprising: obtaining at least one video of at least one object performing at least one action; displaying one or more portions of the at least one video in a user interface; displaying a plurality of annotation tracks in the user interface, wherein each annotation track of the plurality of annotation tracks shows one or more annotations that describe one or more of the at least one object and the at least one action in the at least one video, and wherein the plurality of annotation tracks are displayed in the user interface separately from the one or more portions of the at least one video; and obtaining the one or more annotations based on the plurality of annotation tracks.
2. The system of claim 1, wherein the logic when executed is further operable to cause the one or more processors to perform operations comprising: parsing the at least one video into a plurality of segments; identifying one or more segments for the at least one object; and associating each of the one or more segments with the at least one object.
3. The system of claim 1, wherein the logic when executed is further operable to cause the one or more processors to perform operations comprising: parsing the at least one video into a plurality of segments; identifying one or more segments for the at least one action; and associating each of the one or more segments with the at least one action.
4. The system of claim 1, wherein the logic when executed is further operable to cause the one or more processors to perform operations comprising enabling a user to selectively annotate one or more of the at least one object and the at least one action in the at least one video based on at least one corresponding annotation track of the plurality of annotation tracks.
5. The system of claim 1, wherein the at least one video comprises a plurality of videos, and wherein the logic when executed is further operable to cause the one or more processors to perform operations comprising enabling a user to annotate a plurality of videos of a same object to provide the one or more annotations.
6. The system of claim 1, wherein the logic when executed is further operable to cause the one or more processors to perform operations comprising generating training data from the at least one video and the one or more annotations.
7. The system of claim 1, wherein the one or more annotations include one or more of object information, localization information, and action information.
8. A non-transitory computer-readable storage medium with program instructions stored thereon, the program instructions when executed by one or more processors are operable to cause the one or more processors to perform operations comprising: obtaining at least one video of at least one object performing at least one action; displaying one or more portions of the at least one video in a user interface; displaying a plurality of annotation tracks in the user interface, wherein each annotation track of the plurality of annotation tracks shows one or more annotations that describe one or more of the at least one object and the at least one action in the at least one video, and wherein the plurality of annotation tracks are displayed in the user interface separately from the one or more portions of the at least one video; and obtaining the one or more annotations based on the plurality of annotation tracks.
9. The computer-readable storage medium of claim 8, wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising: parsing the at least one video into a plurality of segments; identifying one or more segments for the at least one object; and associating each of the one or more segments with the at least one object.
10. The computer-readable storage medium of claim 8, wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising: parsing the at least one video into a plurality of segments; identifying one or more segments for the at least one action; and associating each of the one or more segments with the at least one action.
11. The computer-readable storage medium of claim 8, wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising enabling a user to selectively annotate one or more of the at least one object and the at least one action in the at least one video based on at least one corresponding annotation track of the plurality of annotation tracks.
12. The computer-readable storage medium of claim 8, wherein the at least one video comprises a plurality of videos, and wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising enabling a user to annotate a plurality of videos of a same object to provide the one or more annotations.
13. The computer-readable storage medium of claim 8, wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising generating training data from the at least one video and the one or more annotations.
14. The computer-readable storage medium of claim 8, wherein the one or more annotations include one or more of object information, localization information, and action information.
15. A computer-implemented method comprising: obtaining at least one video of at least one object performing at least one action; displaying one or more portions of the at least one video in a user interface; displaying a plurality of annotation tracks in the user interface, wherein each annotation track of the plurality of annotation tracks shows one or more annotations that describe one or more of the at least one object and the at least one action in the at least one video, and wherein the plurality of annotation tracks are displayed in the user interface separately from the one or more portions of the at least one video; and obtaining the one or more annotations based on the plurality of annotation tracks.
16. The method of claim 15, further comprising: parsing the at least one video into a plurality of segments; identifying one or more segments for the at least one object; and associating each of the one or more segments with the at least one object.
17. The method of claim 15, further comprising: parsing the at least one video into a plurality of segments; identifying one or more segments for the at least one action; and associating each of the one or more segments with the at least one action.
18. The method of claim 15, further comprising enabling a user to selectively annotate one or more of the at least one object and the at least one action in the at least one video based on at least one corresponding annotation track of the plurality of annotation tracks.
19. The method of claim 15, wherein the at least one video comprises a plurality of videos, and wherein the method further comprises enabling a user to annotate a plurality of videos of a same object to provide the one or more annotations.
20. The method of claim 15, further comprising generating training data from the at least one vide
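
The dependent claims recite parsing a video into segments, associating segments with an object or action, and generating training data. The sketch below shows one way those steps could fit together; it is illustrative only. The claims do not say how segments are formed, so fixed-length frame windows are an assumption, as are the function names `parse_into_segments`, `associate`, and `training_examples` and the half-open `(first_frame, end_frame)` span format.

```python
from typing import Dict, List, Tuple

# Half-open frame range: (first_frame, end_frame_exclusive). Assumed format.
Segment = Tuple[int, int]


def parse_into_segments(num_frames: int, segment_len: int) -> List[Segment]:
    """Split a video of num_frames frames into fixed-length segments.

    Fixed-length windows are an assumption; the claims leave the method open.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [
        (start, min(start + segment_len, num_frames))
        for start in range(0, num_frames, segment_len)
    ]


def associate(
    segments: List[Segment], spans: Dict[str, List[Segment]]
) -> Dict[str, List[Segment]]:
    """Map each object/action label to the segments overlapping its spans."""
    result: Dict[str, List[Segment]] = {}
    for label, label_spans in spans.items():
        result[label] = [
            seg
            for seg in segments
            # Two half-open ranges overlap when each starts before the other ends.
            if any(seg[0] < end and start < seg[1] for start, end in label_spans)
        ]
    return result


def training_examples(
    associations: Dict[str, List[Segment]]
) -> List[Tuple[Segment, str]]:
    """Flatten associations into sorted (segment, label) training pairs."""
    return sorted(
        (seg, label) for label, segs in associations.items() for seg in segs
    )
```

For instance, a 10-frame video split into 4-frame segments yields three segments, and a hypothetical label annotated over frames 0 to 5 would be associated with the first two of them.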
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Indicating arrangements {(indicating means incorporated in magazine or cassette G11B23/046 and G11B23/0875; indicating measured values in general G01D)} · CPC title
Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums · CPC title
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title