Example management for string transformation

US11620304B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11620304-B2
Application number: US-201615299329-A
Country: US
Kind code: B2
Filing date: Oct 20, 2016
Priority date: Oct 20, 2016
Publication date: Apr 4, 2023
Grant date: Apr 4, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for transforming strings includes identifying one or more candidate example input strings from a database including a set of input strings. The candidate example input strings are presented for example transformation. For one or more of the candidate example input strings, an example output string corresponding to that example input string is received, where each example input string and its corresponding example output string define a transformation example in an example set. A string transformation program is generated based on transformation examples in the example set.
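
As a rough illustration of the flow the abstract describes, the sketch below treats the example set as a list of (input, output) pairs and "generates" a program by filtering a small hand-written list of candidate transformations down to those that reproduce every example. The candidate list and the names `CANDIDATES` and `generate_programs` are assumptions made for illustration; the abstract does not say how programs are actually synthesized.

```python
from typing import Callable, List, Tuple

Program = Callable[[str], str]

# Hypothetical candidate space (an assumption for this sketch);
# a real synthesizer would search a far larger space of programs.
CANDIDATES: List[Tuple[str, Program]] = [
    ("first_word", lambda s: s.split()[0]),
    ("last_word", lambda s: s.split()[-1]),
    ("upper", lambda s: s.upper()),
    ("initials", lambda s: "".join(w[0] for w in s.split())),
]


def generate_programs(examples: List[Tuple[str, str]]) -> List[str]:
    """Return the names of candidates that reproduce every example output."""
    return [
        name
        for name, prog in CANDIDATES
        if all(prog(inp) == out for inp, out in examples)
    ]


# Each (input, output) pair is one transformation example in the example set.
example_set = [("Ada Lovelace", "Ada"), ("Alan Turing", "Alan")]
print(generate_programs(example_set))  # ['first_word']
```

With a single one-word example such as ("Ada", "Ada"), more than one candidate stays consistent, which is the kind of under-specification that further examples are meant to resolve.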

First claim

Opening claim text (preview).

The invention claimed is:

1. At a computing device, a method for improving training of a string transformation program based on identification of transformation examples, the method comprising:
  from a dataset including a set of input strings, automatically selecting a plurality of input string examples for inclusion in an example set by identifying a plurality of string clusters in the dataset corresponding to different string formats represented in the dataset, and selecting one or more input strings from each cluster as input string examples for inclusion in the example set, wherein each of the plurality of input string examples in the example set are paired with a corresponding plurality of output string examples to define transformation examples in the example set;
  based at least in part on the transformation examples in the example set, generating first and second potential string transformation programs;
  identifying a plurality of ambiguous input string examples for inclusion in the example set, the plurality of ambiguous input string examples automatically identified by applying, to each of two or more input strings in the dataset, the first and second potential string transformation programs to the two or more input strings to transform the two or more input strings into first and second output strings for each of the two or more input strings, and identifying as ambiguous input string examples any of the two or more input strings for which content of the first output string and content of the second output string are different;
  receiving one or more disambiguating example output strings corresponding to one or more of the ambiguous input string examples, where each ambiguous input string example and its corresponding disambiguating example output string define a transformation example in the example set; and
  generating a string transformation program for transforming the set of input strings based on the transformation examples in the example set.

2. The method of claim 1, further comprising applying the string transformation program to each of the set of input strings to transform the set of input strings into a corresponding set of output strings.

3. The method of claim 2, further comprising, based on receiving an indication that one or more input strings were incorrectly transformed by the string transformation program, receiving additional transformation examples, and modifying the string transformation program based on the additional transformation examples.

4. The method of claim 1, where selecting the one or more input strings from each cluster includes randomly selecting one input string from each cluster.

5. The method of claim 1, where disambiguating example output strings corresponding to ambiguous input string examples are input by a user.

6. The method of claim 1, where disambiguating example output strings corresponding to ambiguous input string examples are predicted based on a user input of a desired string transformation program.

7. The method of claim 1, where the transformation examples in the example set include input strings selected by a user from among the set of input strings in the dataset and not identified as ambiguous input string examples.

8. The method of claim 1, where the transformation examples in the example set include synthetic ambiguous input string examples provided by a user and not present in the set of input strings in the dataset.

9. The method of claim 1, where the set of input strings are arrayed in one or more columns in a spreadsheet.

10. The method of claim 1, where the transformation examples in the example set are viewable and manipulable separately from strings in the dataset.

11. A computing system for improving training of a string transformation program based on identification of transformation examples, the computing system comprising:
  a hardware processor; and
  a physical storage device holding instructions that are executable by the hardware processor to:
  automatically select, from a dataset including a plurality of input strings, a plurality of input string examples for inclusion in an example set by identifying a plurality of string clusters in the dataset corresponding to different string formats represented in the dataset, and selecting one or more input strings from each cluster as input string examples for inclusion in the example set, wherein each of the plurality of input string examples in the example set are paired with a corresponding plurality of output string examples to define transformation examples in the example set;
  generate first and second potential string transformation programs based at least in part on the transformation examples in the example set;
  identify a plurality of ambiguous input string examples for inclusion in the example set, the plurality of ambiguous input string examples automatically identified by applying, to two or more input strings in the dataset, the first and second potential string transformation programs to the two or more input strings to transform the two or more input strings into first and second output strings for each of the two or more input strings, and identifying as ambiguous input string examples any of the two or more input strings for which content of the first output string and content of the second output string are different;
  receive one or more disambiguating example output strings corresponding to one or more of the ambiguous input string examples, where each ambiguous input string example and its corresponding disambiguating example output string define a transformation example in the example set; and
  generate a string transformation program for transforming the set of input strings based on the transformation examples in the example set.

12. The computing system of claim 11, the instructions being further executable by the hardware processor to apply the string transformation program to each of the set of input strings to transform the set of input strings into a corresponding set of output strings.

13. The computing system of claim 11, the instructions being further executable by the hardware processor to receive additional transformation examples based on receiving an indication that one or more input strings were incorrectly transformed by the string transformation program, and modifying the string transformation program based on the additional transformation examples.

14. The computing system of claim 11, where selecting the one or more input strings from each cluster includes randomly selecting one input string from each cluster.

15. The computing system of claim 11, where the transformation examples in the example set include input strings selected by a user from among the set of input strings in the dataset and not identified as ambiguous input string examples.

16. The computing system of claim 11, where the transformation examples in the example set include synthetic ambiguous input string examples provided by a user and not present in the set of input strings in the dataset.
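
Two steps recited in the independent claims, selecting examples from format clusters and flagging inputs on which two candidate programs disagree, can be sketched as follows. This is only an illustrative reading: the digit-run/letter-run signature used for clustering, the two candidate programs, and all function names are assumptions, and the claims do not prescribe any of them. The sketch takes the first member of each cluster for reproducibility, whereas claims 4 and 14 mention random selection as one option.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List


def format_signature(s: str) -> str:
    """Coarse format key: digit runs become 'D', letter runs become 'A'."""
    return re.sub(
        r"[0-9]+|[A-Za-z]+",
        lambda m: "D" if m.group()[0].isdigit() else "A",
        s,
    )


def cluster_by_format(strings: List[str]) -> Dict[str, List[str]]:
    """Group strings that share the same format signature."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for s in strings:
        clusters[format_signature(s)].append(s)
    return dict(clusters)


def pick_examples(clusters: Dict[str, List[str]]) -> List[str]:
    """Take one input string per cluster (first member, for determinism)."""
    return [members[0] for members in clusters.values()]


def find_ambiguous(
    strings: List[str],
    prog_a: Callable[[str], str],
    prog_b: Callable[[str], str],
) -> List[str]:
    """Inputs on which the two candidate programs produce different output."""
    return [s for s in strings if prog_a(s) != prog_b(s)]


dataset = ["2016-10-20", "2023-04-04", "10/20/2016", "Oct 20, 2016"]
clusters = cluster_by_format(dataset)
examples = pick_examples(clusters)

# Two hypothetical programs, both consistent with "2016-10-20" -> "2016".
first_four = lambda s: s[:4]
before_dash = lambda s: s.split("-")[0]

ambiguous = find_ambiguous(dataset, first_four, before_dash)
print(examples)   # ['2016-10-20', '10/20/2016', 'Oct 20, 2016']
print(ambiguous)  # ['10/20/2016', 'Oct 20, 2016']
```

The flagged inputs are the ones for which a disambiguating example output would be requested before a final program is generated.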

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • G06F16/258 · Primary

    Data format conversion from or to a database · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11620304B2 cover?
A method for transforming strings includes identifying one or more candidate example input strings from a database including a set of input strings. The candidate example input strings are presented for example transformation. For one or more of the candidate example input strings, an example output string corresponding to that example input string is received, where each example input string a…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/258. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 4, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).