Descriptive insight generation and presentation system
US-2021350068-A1 · Nov 11, 2021 · US
US2022147708A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022147708-A1 |
| Application number | US-202017094435-A |
| Country | US |
| Kind code | A1 |
| Filing date | Nov 10, 2020 |
| Priority date | Nov 10, 2020 |
| Publication date | May 12, 2022 |
| Grant date | — |
Official abstract text for this publication.
A dataset captioning system is described that generates captions of text to describe insights identified from a dataset, automatically and without user intervention. To do so, given an input of a dataset the dataset captioning system determines which data insights are likely to support potential visualizations of the dataset, generates text based on these insights, orders the text, processes the ordered text for readability, and then outputs the text as a caption. These techniques also include adjustments made to the complexity of the text, globalization of the text, inclusion of links to outside sources of information, translation of the text, and so on as part of generating the caption.
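The abstract's flow (generate text per insight, order it, edit it for readability, output a caption) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the publication: the `InsightText` structure, the numeric scores, and the pronoun-substitution rule used as a stand-in for the readability step are assumptions chosen only to make the steps concrete.

```python
from dataclasses import dataclass


@dataclass
class InsightText:
    """One sentence generated for one data insight (hypothetical structure)."""
    insight: str   # insight type, e.g. "extreme" or "derived value"
    text: str      # sentence generated from the data entries
    score: float   # assumption: a higher score means the text should come first


def order_text(items):
    """Rank by score, highest first. sorted() is stable, so ties keep input order."""
    return sorted(items, key=lambda item: item.score, reverse=True)


def edit_for_readability(sentences, subject):
    """Toy readability pass (an assumption, not the patented method):
    after the first sentence, replace a repeated leading subject with "It"."""
    edited = []
    for index, sentence in enumerate(sentences):
        if index > 0 and sentence.startswith(subject + " "):
            sentence = "It" + sentence[len(subject):]
        edited.append(sentence)
    return edited


def form_caption(items, subject):
    """Order the generated text, edit it, and join it into a single caption."""
    ordered = [item.text for item in order_text(items)]
    return " ".join(edit_for_readability(ordered, subject))


# Illustrative input; the sentences and scores are made up for this example.
items = [
    InsightText("derived value", "Revenue averaged 120 per month.", 0.4),
    InsightText("extreme", "Revenue peaked at 200 in June.", 0.9),
]
caption = form_caption(items, "Revenue")
print(caption)  # Revenue peaked at 200 in June. It averaged 120 per month.
```

The second sentence is edited based on the first, which loosely mirrors the idea of editing text for one insight in light of text generated for another. A real system would presumably use a far richer language model for this step.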
Opening claim text (preview).
1. In a digital medium automated caption generation environment, a method implemented by a computing device, the method comprising: generating, by the computing device automatically and without user intervention, a caption that textually describes a dataset having a plurality of data entries organized as a plurality of data subsets, the generating including: determining which datatypes are included in the plurality of data subsets, respectively; identifying a composition of the dataset based the datatypes; determining which data insights correspond to the composition; generating text, based on the determined data insights, from the plurality of data entries of the dataset; forming the caption based at least in part on the text.
2. The method as described in claim 1, wherein the forming includes: generating scores based on the text generated for the data insights; and ranking the text generated for the data insights based on the scores.
3. The method as described in claim 2, wherein the forming of the caption includes ordering the text based on the ranking.
4. The method as described in claim 2, wherein the scores quantify the text corresponding to the data insights based on degrees of specificity.
5. The method as described in claim 1, wherein the plurality of datatypes includes quantitative, nominal, ordinal, temporal, or semantic.
6. The method as described in claim 1, wherein the data insights include anomaly, cyclic pattern, derived value, relative value, threshold amount of change, or extremes based on a minimum amount or a maximum amount.
7. The method as described in claim 1, wherein the forming of the caption includes adjusting language complexity of the text.
8. The method as described in claim 1, wherein the forming of the caption includes editing text generated for a first said data insight based on text generated for a second said data insight as part of the caption.
9. The method as described in claim 1, wherein the forming of the caption includes generating a link included as part of the caption, the link generated based on at least a portion of the text and is user selectable to navigate to a network address.
10. The method as described in claim 1, wherein the identifying of the composition is based on which combination of the datatypes is included in the dataset.
11. The method as described in claim 10, wherein the composition is: temporal based on inclusion of a temporal datatype and a quantitative datatype as part of the datatypes of the plurality of data subsets; or segment comparison based on inclusion of a quantitative datatype and a quantitative datatype as part of the datatypes of the plurality of data subsets.
12. The method as described in claim 1, further comprising receiving a user input specifying the dataset via a user interface, the dataset including a portion of a table of a larger dataset in a user interface and the data subsets are configured as rows or columns of the table.
13. In a digital medium automated caption generation environment, a system comprising: a dataset input module implemented at least partially in hardware of a computing device to receive a dataset having a plurality of data entries: a text generation module implemented at least partially in hardware of the computing device to generate text based on a plurality of data insights from the plurality of data entries of the dataset; and a caption formation module implemented at least partially in hardware of the computing device to generate a caption based on the text, the caption formation module including: a score generation module to generate scores corresponding to the data insights, respectively; a ranking module configured to rank the text based on the scores corresponding to respective said data insights; and a text ordering module configured to order the text as part of the caption based on respective said scores.
14. The system as described in claim 13, wherein the scores quantify the text based on degrees of specificity.
15. The system as described in claim 13, wherein the caption formation module further comprises a complexity adjustment module configured to adjust language complexity of the text as part of the caption.
16. The system as described in claim 13, wherein the caption formation module further comprises a readability module to edit the text generated for a first said data insight based on text generated for a second said data insight.
17. The system as described in claim 13, wherein the caption formation module further comprises a readability module to edit the text for safety.
18. The system as described in claim 13, wherein the caption formation module further comprises: a link generation module configured to generate a link as part of the caption, the link generated based on at least a portion of the text and is user selectable to navigate to a network address; and a translation module configured to translate the text.
19. In a digital medium automated caption generation environment, a system comprising: means for generating, automatically and without user intervention, a caption that textually describes a dataset having a plurality of data entries, the generating means including: means for receiving a dataset having a plurality of data entries: means for generating text based on a plurality of data insights from the plurality of data entries of the dataset; means for ordering the text based on a ranking; and means for editing the ordered text for readability such that text generated for a first said data insight is edited based on text generated for a second said data insight.
20. The system as described in claim 19, further comprising: means for adjusting language complexity of the text as part of the caption; means for checking safety of the text as part of the caption; means for translating the text as part of the caption; or means for generating a link included as part of the caption, the link generated based on at least a portion of the text and is user selectable to navigate to a network address.
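The first steps of claim 1 (determine datatypes per data subset, identify a composition from their combination, then look up which insights correspond to it) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the type-inference heuristic, the function names, and in particular the mapping from composition to insight types are hypothetical. The two composition rules follow the wording of claim 11 as published.

```python
from datetime import date


def infer_datatype(values):
    """Crude heuristic (an assumption): dates are temporal, numbers are
    quantitative, everything else is treated as nominal."""
    if values and all(isinstance(v, date) for v in values):
        return "temporal"
    if values and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    ):
        return "quantitative"
    return "nominal"


# Composition keyed by the sorted combination of datatypes, per claim 11's wording.
COMPOSITIONS = {
    ("quantitative", "temporal"): "temporal",
    ("quantitative", "quantitative"): "segment comparison",
}

# Hypothetical mapping: the claims list insight types but do not state
# which insights correspond to which composition.
INSIGHTS_BY_COMPOSITION = {
    "temporal": ["cyclic pattern", "threshold amount of change", "extreme"],
    "segment comparison": ["relative value", "extreme"],
}


def identify_composition(datatypes):
    """Return the composition for a combination of datatypes, or None if unknown."""
    return COMPOSITIONS.get(tuple(sorted(datatypes)))


# Illustrative dataset with two data subsets (columns); the values are made up.
dataset = {
    "month": [date(2020, 1, 1), date(2020, 2, 1), date(2020, 3, 1)],
    "sales": [10, 25, 15],
}
datatypes = {name: infer_datatype(column) for name, column in dataset.items()}
composition = identify_composition(datatypes.values())
insights = INSIGHTS_BY_COMPOSITION.get(composition, [])
print(datatypes, composition, insights)
```

The lookup only handles exactly two data subsets. A dataset with more columns would need a rule for choosing which combination of datatypes drives the composition, which this sketch does not attempt.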
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
Named entity recognition · CPC title
Annotation, e.g. comment data or footnotes · CPC title
Natural language generation · CPC title
using statistical methods · CPC title