Providing a user interface for video annotation tools

US2021287718A1 · US · A1

Patent metadata
Publication number: US-2021287718-A1
Application number: US-202016814056-A
Country: US
Kind code: A1
Filing date: Mar 10, 2020
Priority date: Mar 10, 2020
Publication date: Sep 16, 2021
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Implementations generally provide a user interface for video annotation tools. In some implementations, a method includes obtaining at least one video of at least one object performing at least one action, and displaying one or more portions of the at least one video in a user interface. The method further includes displaying a plurality of annotation tracks in the user interface, where each annotation track of the plurality of annotation tracks is associated with one or more of the at least one object and the at least one action in the at least one video. The method further includes obtaining one or more annotations associated with the at least one video based on the plurality of annotation tracks.
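
To make the abstract more concrete, here is a minimal sketch of how annotation tracks might be modeled in code. The patent does not specify a data model, so the names `Annotation`, `AnnotationTrack`, and `collect_annotations`, and the choice of time spans in seconds, are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Annotation:
    """One labeled time span in a video (hypothetical structure)."""
    start_s: float  # span start, in seconds
    end_s: float    # span end, in seconds
    label: str      # free-text description, e.g. "pouring"


@dataclass
class AnnotationTrack:
    """A track tied to one object or one action in the video (assumed model)."""
    target: str  # name of the object or action
    kind: str    # "object" or "action"
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, start_s: float, end_s: float, label: str) -> None:
        """Append an annotation after checking that the span is non-empty."""
        if end_s <= start_s:
            raise ValueError("annotation must have positive duration")
        self.annotations.append(Annotation(start_s, end_s, label))


def collect_annotations(
    tracks: List[AnnotationTrack],
) -> List[Tuple[float, float, str, str, str]]:
    """Gather annotations from every track, sorted by start time.

    Each row is (start_s, end_s, kind, target, label).
    """
    rows = [
        (a.start_s, a.end_s, t.kind, t.target, a.label)
        for t in tracks
        for a in t.annotations
    ]
    return sorted(rows)
```

In this sketch, each track holds the annotations for a single object or action, and "obtaining annotations based on the tracks" is reduced to flattening the tracks into one time-ordered list.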

First claim

Opening claim text (preview).

1. A system comprising: one or more processors; and logic encoded in one or more non-transitory computer-readable storage media for execution by the one or more processors and when executed operable to cause the one or more processors to perform operations comprising: obtaining at least one video of at least one object performing at least one action; displaying one or more portions of the at least one video in a user interface; displaying a plurality of annotation tracks in the user interface, wherein each annotation track of the plurality of annotation tracks shows one or more annotations that describe one or more of the at least one object and the at least one action in the at least one video, and wherein the plurality of annotation tracks are displayed in the user interface separately from the one or more portions of the at least one video; and obtaining the one or more annotations based on the plurality of annotation tracks.

2. The system of claim 1, wherein the logic when executed is further operable to cause the one or more processors to perform operations comprising: parsing the at least one video into a plurality of segments; identifying one or more segments for the at least one object; and associating each of the one or more segments with the at least one object.

3. The system of claim 1, wherein the logic when executed is further operable to cause the one or more processors to perform operations comprising: parsing the at least one video into a plurality of segments; identifying one or more segments for the at least one action; and associating each of the one or more segments with the at least one action.

4. The system of claim 1, wherein the logic when executed is further operable to cause the one or more processors to perform operations comprising enabling a user to selectively annotate one or more of the at least one object and the at least one action in the at least one video based on at least one corresponding annotation track of the plurality of annotation tracks.

5. The system of claim 1, wherein the at least one video comprises a plurality of videos, and wherein the logic when executed are further operable to cause the one or more processors to perform operations comprising enabling a user to annotate a plurality of videos of a same object to provide the one or more annotations.

6. The system of claim 1, wherein the logic when executed is further operable to cause the one or more processors to perform operations comprising generating training data from the at least one video and the one or more annotations.

7. The system of claim 1, wherein the one or more annotations include one or more of object information, localization information, and action information.

8. A non-transitory computer-readable storage medium with program instructions stored thereon, the program instructions when executed by one or more processors are operable to cause the one or more processors to perform operations comprising: obtaining at least one video of at least one object performing at least one action; displaying one or more portions of the at least one video in a user interface; displaying a plurality of annotation tracks in the user interface, wherein each annotation track of the plurality of annotation tracks shows one or more annotations that describe one or more of the at least one object and the at least one action in the at least one video, and wherein the plurality of annotation tracks are displayed in the user interface separately from the one or more portions of the at least one video; and obtaining the one or more annotations based on the plurality of annotation tracks.

9. The computer-readable storage medium of claim 8, wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising: parsing the at least one video into a plurality of segments; identifying one or more segments for the at least one object; and associating each of the one or more segments with the at least one object.

10. The computer-readable storage medium of claim 8, wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising: parsing the at least one video into a plurality of segments; identifying one or more segments for the at least one action; and associating each of the one or more segments with the at least one action.

11. The computer-readable storage medium of claim 8, wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising enabling a user to selectively annotate one or more of the at least one object and the at least one action in the at least one video based on at least one corresponding annotation track of the plurality of annotation tracks.

12. The computer-readable storage medium of claim 8, wherein the at least one video comprises a plurality of videos, and wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising enabling a user to annotate a plurality of videos of a same object to provide the one or more annotations.

13. The computer-readable storage medium of claim 8, wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising generating training data from the at least one video and the one or more annotations.

14. The computer-readable storage medium of claim 8, wherein the one or more annotations include one or more of object information, localization information, and action information.

15. A computer-implemented method comprising: obtaining at least one video of at least one object performing at least one action; displaying one or more portions of the at least one video in a user interface; displaying a plurality of annotation tracks in the user interface, wherein each annotation track of the plurality of annotation tracks shows one or more annotations that describe one or more of the at least one object and the at least one action in the at least one video, and wherein the plurality of annotation tracks are displayed in the user interface separately from the one or more portions of the at least one video; and obtaining the one or more annotations based on the plurality of annotation tracks.

16. The method of claim 15, further comprising: parsing the at least one video into a plurality of segments; identifying one or more segments for the at least one object; and associating each of the one or more segments with the at least one object.

17. The method of claim 15, further comprising: parsing the at least one video into a plurality of segments; identifying one or more segments for the at least one action; and associating each of the one or more segments with the at least one action.

18. The method of claim 15, further comprising enabling a user to selectively annotate one or more of the at least one object and the at least one action in the at least one video based on at least one corresponding annotation track of the plurality of annotation tracks.

19. The method of claim 15, wherein the at least one video comprises a plurality of videos, and wherein the method further comprises enabling a user to annotate a plurality of videos of a same object to provide the one or more annotations.

20. The method of claim 15, further comprising generating training data from the a least one vide…
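
The dependent claims mention parsing a video into segments, associating segments with an object or action, and generating training data from the video and its annotations. The sketch below illustrates one way those steps could fit together. It is not the patent's implementation: the fixed-length segmentation, the overlap test used to identify segments, and the function names `parse_into_segments`, `associate_segments`, and `to_training_rows` are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s), in seconds


def parse_into_segments(duration_s: float, segment_s: float) -> List[Segment]:
    """Split [0, duration_s) into fixed-length segments (an assumed strategy)."""
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("duration_s and segment_s must be positive")
    segments: List[Segment] = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)  # last segment may be shorter
        segments.append((start, end))
        start = end
    return segments


def associate_segments(
    segments: List[Segment],
    spans: Dict[str, List[Segment]],
) -> Dict[str, List[Segment]]:
    """Map each object or action name to the segments that overlap its spans.

    `spans` holds, per name, the time spans where a (hypothetical) annotator
    marked that object or action as present.
    """
    result: Dict[str, List[Segment]] = {}
    for name, name_spans in spans.items():
        result[name] = [
            seg
            for seg in segments
            if any(seg[0] < end and start < seg[1] for start, end in name_spans)
        ]
    return result


def to_training_rows(
    video_id: str,
    associations: Dict[str, List[Segment]],
) -> List[Tuple[str, float, float, str]]:
    """Flatten associations into (video_id, start_s, end_s, label) rows."""
    return [
        (video_id, start, end, name)
        for name, segs in sorted(associations.items())
        for start, end in segs
    ]
```

For example, a 5-second video split into 2-second segments yields three segments, and a "cup" span from 1.0 s to 2.5 s would be associated with the first two of them. A real system would likely segment on shot or scene boundaries rather than fixed lengths.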

Assignees

Sony Corp

Inventors

Classifications

  • Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title

  • G06V10/774 (Primary)

    Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • G11B27/34 (Primary)

    Indicating arrangements {(indicating means incorporated in magazine or cassette G11B23/046 and G11B23/0875; indicating measured values in general G01D)} · CPC title

  • Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums · CPC title

  • Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021287718A1 cover?
Implementations generally provide a user interface for video annotation tools. In some implementations, a method includes obtaining at least one video of at least one object performing at least one action, and displaying one or more portions of the at least one video in a user interface. The method further includes displaying a plurality of annotation tracks in the user interface, where each annotat…
Who is the assignee on this patent?
Sony Corp
What technology area does this patent fall under?
Primary CPC classification G06V10/774. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 16, 2021 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).