US11620304B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11620304-B2 |
| Application number | US-201615299329-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 20, 2016 |
| Priority date | Oct 20, 2016 |
| Publication date | Apr 4, 2023 |
| Grant date | Apr 4, 2023 |
Official abstract text for this publication.
A method for transforming strings includes identifying one or more candidate example input strings from a database including a set of input strings. The candidate example input strings are presented for example transformation. For one or more of the candidate example input strings, an example output string corresponding to that example input string is received, where each example input string and its corresponding example output string define a transformation example in an example set. A string transformation program is generated based on transformation examples in the example set.
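The final step of the abstract, generating a program from the example set, can be illustrated with a toy enumerative search. This is only a sketch under stated assumptions: the names `CANDIDATES`, `consistent_programs`, and `generate_program` are hypothetical, and the five-operation program space stands in for whatever search space a real synthesizer would use. The publication does not specify any of this.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A transformation example pairs an example input string with its example output string.
Example = Tuple[str, str]
Program = Callable[[str], str]

# Hypothetical, deliberately tiny program space (an assumption for illustration only).
CANDIDATES: Dict[str, Program] = {
    "upper": str.upper,
    "lower": str.lower,
    "first_word": lambda s: s.split()[0] if s.split() else "",
    "last_word": lambda s: s.split()[-1] if s.split() else "",
    "initials": lambda s: "".join(w[0].upper() for w in s.split()),
}


def consistent_programs(example_set: List[Example]) -> List[str]:
    """Return the names of all candidate programs that reproduce every example."""
    return [
        name
        for name, prog in CANDIDATES.items()
        if all(prog(inp) == out for inp, out in example_set)
    ]


def generate_program(example_set: List[Example]) -> Optional[Program]:
    """Pick the first consistent candidate, or None if no candidate fits."""
    names = consistent_programs(example_set)
    return CANDIDATES[names[0]] if names else None
```

With the single example `("ada", "ada")`, three of the toy candidates remain consistent; adding `("grace hopper", "grace")` leaves only `first_word`. That narrowing is why the choice of examples matters in this kind of system.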
Opening claim text (preview).
The invention claimed is:

1. At a computing device, a method for improving training of a string transformation program based on identification of transformation examples, the method comprising:
from a dataset including a set of input strings, automatically selecting a plurality of input string examples for inclusion in an example set by identifying a plurality of string clusters in the dataset corresponding to different string formats represented in the dataset, and selecting one or more input strings from each cluster as input string examples for inclusion in the example set, wherein each of the plurality of input string examples in the example set are paired with a corresponding plurality of output string examples to define transformation examples in the example set;
based at least in part on the transformation examples in the example set, generating first and second potential string transformation programs;
identifying a plurality of ambiguous input string examples for inclusion in the example set, the plurality of ambiguous input string examples automatically identified by applying, to each of two or more input strings in the dataset, the first and second potential string transformation programs to the two or more input strings to transform the two or more input strings into first and second output strings for each of the two or more input strings, and identifying as ambiguous input string examples any of the two or more input strings for which content of the first output string and content of the second output string are different;
receiving one or more disambiguating example output strings corresponding to one or more of the ambiguous input string examples, where each ambiguous input string example and its corresponding disambiguating example output string define a transformation example in the example set; and
generating a string transformation program for transforming the set of input strings based on the transformation examples in the example set.

2. The method of claim 1, further comprising applying the string transformation program to each of the set of input strings to transform the set of input strings into a corresponding set of output strings.

3. The method of claim 2, further comprising, based on receiving an indication that one or more input strings were incorrectly transformed by the string transformation program, receiving additional transformation examples, and modifying the string transformation program based on the additional transformation examples.

4. The method of claim 1, where selecting the one or more input strings from each cluster includes randomly selecting one input string from each cluster.

5. The method of claim 1, where disambiguating example output strings corresponding to ambiguous input string examples are input by a user.

6. The method of claim 1, where disambiguating example output strings corresponding to ambiguous input string examples are predicted based on a user input of a desired string transformation program.

7. The method of claim 1, where the transformation examples in the example set include input strings selected by a user from among the set of input strings in the dataset and not identified as ambiguous input string examples.

8. The method of claim 1, where the transformation examples in the example set include synthetic ambiguous input string examples provided by a user and not present in the set of input strings in the dataset.

9. The method of claim 1, where the set of input strings are arrayed in one or more columns in a spreadsheet.

10. The method of claim 1, where the transformation examples in the example set are viewable and manipulable separately from strings in the dataset.

11. A computing system for improving training of a string transformation program based on identification of transformation examples, the computing system comprising:
a hardware processor; and
a physical storage device holding instructions that are executable by the hardware processor to:
automatically select, from a dataset including a plurality of input strings, a plurality of input string examples for inclusion in an example set by identifying a plurality of string clusters in the dataset corresponding to different string formats represented in the dataset, and selecting one or more input strings from each cluster as input string examples for inclusion in the example set, wherein each of the plurality of input string examples in the example set are paired with a corresponding plurality of output string examples to define transformation examples in the example set;
generate first and second potential string transformation programs based at least in part on the transformation examples in the example set;
identify a plurality of ambiguous input string examples for inclusion in the example set, the plurality of ambiguous input string examples automatically identified by applying, to two or more input strings in the dataset, the first and second potential string transformation programs to the two or more input strings to transform the two or more input strings into first and second output strings for each of the two or more input strings, and identifying as ambiguous input string examples any of the two or more input strings for which content of the first output string and content of the second output string are different;
receive one or more disambiguating example output strings corresponding to one or more of the ambiguous input string examples, where each ambiguous input string example and its corresponding disambiguating example output string define a transformation example in the example set; and
generate a string transformation program for transforming the set of input strings based on the transformation examples in the example set.

12. The computing system of claim 11, the instructions being further executable by the hardware processor to apply the string transformation program to each of the set of input strings to transform the set of input strings into a corresponding set of output strings.

13. The computing system of claim 11, the instructions being further executable by the hardware processor to receive additional transformation examples based on receiving an indication that one or more input strings were incorrectly transformed by the string transformation program, and modifying the string transformation program based on the additional transformation examples.

14. The computing system of claim 11, where selecting the one or more input strings from each cluster includes randomly selecting one input string from each cluster.

15. The computing system of claim 11, where the transformation examples in the example set include input strings selected by a user from among the set of input strings in the dataset and not identified as ambiguous input string examples.

16. The computing system of claim 11, where the transformation examples in the example set include synthetic ambiguous input string examples provided by a user and not present in the set of input strings in the dataset.
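Two steps recited in the independent claims lend themselves to a short illustration: grouping input strings into clusters by string format and picking one example per cluster, and flagging as ambiguous any input on which two candidate programs produce different outputs. The sketch below is an assumption-laden toy, not the claimed implementation. The signature scheme (letter runs become `A`, digit runs become `D`), the sample `dataset`, and the two candidate programs `last_four` and `after_last_hyphen` are all hypothetical.

```python
import random
import re
from collections import defaultdict
from typing import Callable, Dict, List


def format_signature(s: str) -> str:
    """Coarse format key: each letter run becomes 'A', each digit run becomes 'D'."""
    return re.sub(
        r"[A-Za-z]+|[0-9]+",
        lambda m: "A" if m.group()[0].isalpha() else "D",
        s,
    )


def cluster_by_format(strings: List[str]) -> Dict[str, List[str]]:
    """Group strings that share the same format signature."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for s in strings:
        clusters[format_signature(s)].append(s)
    return dict(clusters)


def select_examples(clusters: Dict[str, List[str]], rng: random.Random) -> List[str]:
    """Randomly pick one input string from each cluster (compare claims 4 and 14)."""
    return [rng.choice(members) for members in clusters.values()]


def find_ambiguous(
    strings: List[str],
    prog_a: Callable[[str], str],
    prog_b: Callable[[str], str],
) -> List[str]:
    """Return the inputs on which the two candidate programs disagree."""
    return [s for s in strings if prog_a(s) != prog_b(s)]


# Hypothetical data and candidate programs, for illustration only.
dataset = ["555-1234", "555-9876", "(425) 555-0100", "425.555.0199"]


def last_four(s: str) -> str:
    return s[-4:]


def after_last_hyphen(s: str) -> str:
    return s.rsplit("-", 1)[-1]


clusters = cluster_by_format(dataset)             # three format clusters
examples = select_examples(clusters, random.Random(0))
ambiguous = find_ambiguous(dataset, last_four, after_last_hyphen)
# ambiguous == ["425.555.0199"]
```

Both toy programs agree with the example `"555-1234"` to `"1234"`, yet they diverge on the dotted format. A user-supplied output for that one input would settle which program is intended, which is the role the claims give to disambiguating example output strings.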
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Data format conversion from or to a database · CPC title