Automated Caption Generation from a Dataset

US2022147708A1 · US · A1

Patent metadata
Publication number: US-2022147708-A1
Application number: US-202017094435-A
Country: US
Kind code: A1
Filing date: Nov 10, 2020
Priority date: Nov 10, 2020
Publication date: May 12, 2022
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A dataset captioning system is described that generates captions of text to describe insights identified from a dataset, automatically and without user intervention. To do so, given an input of a dataset the dataset captioning system determines which data insights are likely to support potential visualizations of the dataset, generates text based on these insights, orders the text, processes the ordered text for readability, and then outputs the text as a caption. These techniques also include adjustments made to the complexity of the text, globalization of the text, inclusion of links to outside sources of information, translation of the text, and so on as part of generating the caption.
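The abstract's core pipeline (identify insights, generate text for them, order the text, output a caption) can be illustrated with a minimal Python sketch. It assumes the dataset is a single quantitative series with one label per entry. It detects only extremes (one of the insight types listed in claim 6) and an overall percentage change, and it orders sentences with a hypothetical fixed priority. `Insight`, `find_insights`, `PRIORITY`, and `caption` are illustrative names, not the patent owner's implementation.

```python
from dataclasses import dataclass


@dataclass
class Insight:
    kind: str  # hypothetical insight label, e.g. "extreme" or "change"
    text: str  # generated sentence describing the insight


def find_insights(labels, values, measure):
    """Detect a few simple insights in one labeled quantitative series."""
    insights = []
    hi = max(range(len(values)), key=values.__getitem__)
    lo = min(range(len(values)), key=values.__getitem__)
    insights.append(Insight("extreme", f"{measure} peaked at {values[hi]} in {labels[hi]}."))
    insights.append(Insight("extreme", f"{measure} was lowest at {values[lo]} in {labels[lo]}."))
    if values[0] != 0:  # percentage change is undefined for a zero baseline
        change = (values[-1] - values[0]) / values[0] * 100
        direction = "rose" if change >= 0 else "fell"
        insights.append(Insight(
            "change",
            f"{measure} {direction} {abs(change):.0f}% from {labels[0]} to {labels[-1]}.",
        ))
    return insights


# Hypothetical ordering rule: overall change first, then extremes.
PRIORITY = {"change": 0, "extreme": 1}


def caption(labels, values, measure):
    """Generate insight text, order it, and join it into one caption."""
    insights = find_insights(labels, values, measure)
    # sorted() is stable, so insights of the same kind keep detection order.
    ordered = sorted(insights, key=lambda i: PRIORITY[i.kind])
    return " ".join(i.text for i in ordered)
```

For example, `caption(["Jan", "Feb", "Mar", "Apr"], [120, 90, 150, 180], "Revenue")` returns "Revenue rose 50% from Jan to Apr. Revenue peaked at 180 in Apr. Revenue was lowest at 90 in Feb."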

First claim

Opening claim text (preview).

1. In a digital medium automated caption generation environment, a method implemented by a computing device, the method comprising: generating, by the computing device automatically and without user intervention, a caption that textually describes a dataset having a plurality of data entries organized as a plurality of data subsets, the generating including: determining which datatypes are included in the plurality of data subsets, respectively; identifying a composition of the dataset based [on] the datatypes; determining which data insights correspond to the composition; generating text, based on the determined data insights, from the plurality of data entries of the dataset; forming the caption based at least in part on the text.

2. The method as described in claim 1, wherein the forming includes: generating scores based on the text generated for the data insights; and ranking the text generated for the data insights based on the scores.

3. The method as described in claim 2, wherein the forming of the caption includes ordering the text based on the ranking.

4. The method as described in claim 2, wherein the scores quantify the text corresponding to the data insights based on degrees of specificity.

5. The method as described in claim 1, wherein the plurality of datatypes includes quantitative, nominal, ordinal, temporal, or semantic.

6. The method as described in claim 1, wherein the data insights include anomaly, cyclic pattern, derived value, relative value, threshold amount of change, or extremes based on a minimum amount or a maximum amount.

7. The method as described in claim 1, wherein the forming of the caption includes adjusting language complexity of the text.

8. The method as described in claim 1, wherein the forming of the caption includes editing text generated for a first said data insight based on text generated for a second said data insight as part of the caption.

9. The method as described in claim 1, wherein the forming of the caption includes generating a link included as part of the caption, the link generated based on at least a portion of the text and is user selectable to navigate to a network address.

10. The method as described in claim 1, wherein the identifying of the composition is based on which combination of the datatypes is included in the dataset.

11. The method as described in claim 10, wherein the composition is: temporal based on inclusion of a temporal datatype and a quantitative datatype as part of the datatypes of the plurality of data subsets; or segment comparison based on inclusion of a quantitative datatype and a quantitative datatype as part of the datatypes of the plurality of data subsets.

12. The method as described in claim 1, further comprising receiving a user input specifying the dataset via a user interface, the dataset including a portion of a table of a larger dataset in a user interface and the data subsets are configured as rows or columns of the table.

13. In a digital medium automated caption generation environment, a system comprising: a dataset input module implemented at least partially in hardware of a computing device to receive a dataset having a plurality of data entries: a text generation module implemented at least partially in hardware of the computing device to generate text based on a plurality of data insights from the plurality of data entries of the dataset; and a caption formation module implemented at least partially in hardware of the computing device to generate a caption based on the text, the caption formation module including: a score generation module to generate scores corresponding to the data insights, respectively; a ranking module configured to rank the text based on the scores corresponding to respective said data insights; and a text ordering module configured to order the text as part of the caption based on respective said scores.

14. The system as described in claim 13, wherein the scores quantify the text based on degrees of specificity.

15. The system as described in claim 13, wherein the caption formation module further comprises a complexity adjustment module configured to adjust language complexity of the text as part of the caption.

16. The system as described in claim 13, wherein the caption formation module further comprises a readability module to edit the text generated for a first said data insight based on text generated for a second said data insight.

17. The system as described in claim 13, wherein the caption formation module further comprises a readability module to edit the text for safety.

18. The system as described in claim 13, wherein the caption formation module further comprises: a link generation module configured to generate a link as part of the caption, the link generated based on at least a portion of the text and is user selectable to navigate to a network address; and a translation module configured to translate the text.

19. In a digital medium automated caption generation environment, a system comprising: means for generating, automatically and without user intervention, a caption that textually describes a dataset having a plurality of data entries, the generating means including: means for receiving a dataset having a plurality of data entries: means for generating text based on a plurality of data insights from the plurality of data entries of the dataset; means for ordering the text based on a ranking; and means for editing the ordered text for readability such that text generated for a first said data insight is edited based on text generated for a second said data insight.

20. The system as described in claim 19, further comprising: means for adjusting language complexity of the text as part of the caption; means for checking safety of the text as part of the caption; means for translating the text as part of the caption; or means for generating a link included as part of the caption, the link generated based on at least a portion of the text and is user selectable to navigate to a network address.
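Claims 1, 5, 10, and 11 describe classifying each data subset by datatype, identifying the dataset's composition from the combination of datatypes, and choosing data insights that correspond to that composition. The minimal Python sketch below assumes each column is a Python list and recognizes only temporal, quantitative, and nominal columns (claim 5 also lists ordinal and semantic). It implements the segment-comparison rule using claim 11's literal wording of two quantitative datatypes. `infer_datatype`, `composition`, `candidate_insights`, and the `INSIGHTS_FOR` mapping are hypothetical; only the insight labels are taken from claim 6.

```python
from datetime import date


def infer_datatype(column):
    """Classify a column by the Python types of its values (hypothetical heuristic)."""
    if not column:
        return "nominal"
    if all(isinstance(v, date) for v in column):  # datetime is a subclass of date
        return "temporal"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in column):
        return "quantitative"
    return "nominal"


def composition(dataset):
    """Map the combination of column datatypes to a composition label."""
    types = [infer_datatype(col) for col in dataset.values()]
    if "temporal" in types and "quantitative" in types:
        return "temporal"
    if types.count("quantitative") >= 2:  # follows claim 11's literal wording
        return "segment comparison"
    return "unknown"


# Hypothetical mapping from composition to candidate insights (labels from claim 6).
INSIGHTS_FOR = {
    "temporal": ["extremes", "threshold amount of change", "cyclic pattern", "anomaly"],
    "segment comparison": ["relative value", "derived value", "extremes"],
    "unknown": ["extremes"],
}


def candidate_insights(dataset):
    """Return the insight types worth computing for this dataset's composition."""
    return INSIGHTS_FOR[composition(dataset)]
```

With this sketch, a dataset holding a column of `date` values and a column of numbers is classified as "temporal", so trend-style insights such as cyclic patterns become candidates.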
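Claims 2 to 4, 13, and 14 generate scores for the insight text by degree of specificity, rank the text by those scores, and order the caption accordingly. The claims do not say how specificity is measured, so the Python sketch below uses a hypothetical heuristic: two points per number and one point per capitalized word after the first. `specificity_score` and `order_caption` are illustrative names, not part of the patent.

```python
import re


def specificity_score(text):
    """Hypothetical specificity heuristic: concrete numbers and proper-noun-like
    words (capitalized, excluding the sentence-initial word) raise the score."""
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    words = text.split()
    capitalized = [w for w in words[1:] if w[:1].isupper()]
    return 2 * len(numbers) + len(capitalized)


def order_caption(texts):
    """Score each text, rank by score (highest first), and return the ordered texts."""
    scored = [(specificity_score(t), t) for t in texts]          # score generation
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)     # ranking; ties keep input order
    return [t for _, t in ranked]                                 # text ordering
```

For example, "Sales rose 50% from January to April." scores 4, "Sales peaked at 180 in April." scores 3, and "Sales were high." scores 0, so they are ordered in that sequence. Python's `sorted` stays stable with `reverse=True`, which keeps equally specific sentences in their generated order.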
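Claims 8, 16, and 19 edit the text generated for one insight based on the text generated for another. One simple form such an edit could take is replacing a repeated sentence subject with a pronoun when adjacent sentences open with it. The Python sketch below shows that rule; `smooth` is an illustrative name, and the rule is an assumption rather than the editing method disclosed in the full description.

```python
def smooth(sentences, subject):
    """Hypothetical readability pass: when a sentence opens with the same
    subject as the sentence before it, replace the repeated subject with "It"."""
    prefix = subject + " "
    out = []
    for i, s in enumerate(sentences):
        # Compare against the previous *original* sentence, so a run of
        # sentences sharing one subject becomes "Subject ... It ... It ...".
        if i > 0 and s.startswith(prefix) and sentences[i - 1].startswith(prefix):
            s = "It " + s[len(prefix):]
        out.append(s)
    return " ".join(out)
```

Applied to `["Revenue rose 50% from Jan to Apr.", "Revenue peaked at 180 in Apr."]` with subject `"Revenue"`, it yields "Revenue rose 50% from Jan to Apr. It peaked at 180 in Apr." The fixed pronoun only suits singular, non-personal subjects.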

Assignees

Adobe Inc

Inventors

Classifications

  • Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

  • Named entity recognition

  • Annotation, e.g. comment data or footnotes

  • G06F40/56 (primary): Natural language generation

  • G06F40/216 (primary): using statistical methods

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022147708A1 cover?
A dataset captioning system that automatically, and without user intervention, generates text captions describing insights identified in a dataset. It determines which data insights are likely to support potential visualizations of the dataset, generates text from those insights, orders the text, processes it for readability, and outputs the result as a caption.
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/56. Mapped technology areas include Physics.
When was this patent published?
Published May 12, 2022 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).