Generating tables based upon data extracted from tree-structured documents

US10691655B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10691655-B2
Application number: US-201615299312-A
Country: US
Kind code: B2
Filing date: Oct 20, 2016
Priority date: Oct 20, 2016
Publication date: Jun 23, 2020
Grant date: Jun 23, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various technologies pertaining to extracting data encoded in a tree-structured document and generating a table based upon the extracted data are described herein. In a first embodiment, the table is generated without requiring input from a data cleaner. In a second embodiment, the table is generated based upon examples set forth by a data cleaner.
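
As a rough illustration of the first embodiment (a table produced without input from a data cleaner), the sketch below flattens nested JSON records into rows whose columns are dotted paths. The function names `flatten_record` and `to_table`, the dotted-path column naming, and the sample document are assumptions made for illustration only; the patent does not prescribe them.

```python
import json

def flatten_record(node, prefix=""):
    """Flatten nested dicts into one dict keyed by dotted paths (hypothetical helper)."""
    row = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested objects and carry the path along
            row.update(flatten_record(value, path))
        else:
            # Scalars (and lists) are stored as-is; list handling is out of scope here
            row[path] = value
    return row

def to_table(document):
    """Return (columns, rows) for a JSON array of objects."""
    records = [flatten_record(r) for r in json.loads(document)]
    columns = []
    for rec in records:
        for col in rec:
            if col not in columns:
                columns.append(col)  # keep first-seen column order
    # Records that lack a column get None in that cell
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows

# Made-up sample input
doc = '[{"id": 1, "name": {"first": "Ada"}}, {"id": 2, "name": {"first": "Alan", "last": "Turing"}}]'
columns, rows = to_table(doc)
```

Here the column set is simply the union of all paths seen in the document, which is one of many possible conversion schemes.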

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system comprising: at least one processor; and memory that stores a data cleaning tool, wherein the data cleaning tool, when executed by the at least one processor, is configured to: load a tree-structured document into the memory; receive a request to generate tabular data based upon the tree-structured document; responsive to receiving the request, select a conversion scheme from amongst a plurality of potential conversion schemes, the selected conversion scheme is configured to generate the tabular data when the tree-structured document is received as input to the conversion scheme, wherein the conversion scheme is selected from amongst the plurality of potential conversion schemes based upon historic structure of tabular data in an enterprise division of a user who initiated the request; and generate the tabular data based upon the selected conversion scheme.

2. The computing system of claim 1, wherein the conversion scheme is selected from amongst the plurality of potential conversion schemes based upon a computer-implemented model of user behavior with respect to generation of tabular data from tree-structured documents.

3. The computing system of claim 1, the data cleaning tool is further configured to: prior to selecting the conversion scheme from amongst the plurality of potential conversion schemes, construct a schema based upon a structure of the tree-structured document; and select the conversion scheme from amongst the plurality of potential conversion schemes based upon the constructed schema.

4. The computing system of claim 1, wherein the tree-structured document comprises a first record and a second record, the first record includes a first field and the second record includes a second field, the first field includes a first list and the second field includes a second list of the same length as the first list, and further wherein the selected conversion scheme is configured to merge items in the first list with items in the second list such that a row-based entry in the tabular data includes a first item from the first list and a second item from the second list.

5. The computing system of claim 4, wherein the selected conversion scheme, when applied to the tree-structured document, is configured to merge items from the first list with items the second list that are at the same level in a hierarchy of the tree-structured document.

6. The computing system of claim 1, wherein the tree-structured document comprises a first record and a second record, the first record includes a first field and the second record includes a second field, the first field includes a first list and the second field includes a second list, and further wherein the selected conversion scheme is configured to generate a cross product of the first list and the second list, such that a column in the tabular data includes the cross product of the first list and the second list.

7. The computing system of claim 6, wherein the selected conversion scheme is configured to generate the cross product of the first list and the second list only if the first list and the second list are at a same depth in the tree-structured document.

8. The computing system of claim 1, wherein the tree-structured document is one of a JSON document or an XML document.

9. The computing system of claim 1, the data cleaning tool is further configured to: prior to selecting the conversion scheme from the plurality of potential conversion schemes, receive, from a second user, a selection of a portion of the tree-structured document; and responsive to receiving the selection of the portion of the tree-structured document and based upon the portion of the tree-structured document, select the conversion scheme from the plurality of potential conversion schemes.

10. A computer-readable storage medium comprising instructions that, when executed by a processor, cause the processor to perform acts comprising: loading a JSON document into memory; receiving a request to generate tabular data based upon the JSON document; responsive to receiving the request, learning a schema for the JSON document based upon a structure of the JSON document; using the schema, selecting a conversion scheme from amongst a plurality of possible conversion schemes, wherein the conversion scheme, when receiving the JSON document as input, generates tabular data based upon at least a portion of the JSON document, wherein the conversion scheme is selected from amongst the plurality of potential conversion schemes based upon historic structure of tabular data in an enterprise division of a user who initiated the request; and generating tabular data based upon the selected conversion scheme.

11. The computer-readable storage medium of claim 10, wherein the conversion scheme is selected from amongst the plurality of potential conversion schemes based upon a computer-implemented model of user behavior with respect to generation of tabular data from tree-structured documents.

12. A method executed by a processor of a computing system, the method comprising: loading a tree-structured document into memory of the computing system; receiving a request to generate tabular data based upon the tree-structured document; responsive to receiving the request, selecting a conversion scheme from amongst a plurality of potential conversion schemes, the selected conversion scheme is configured to generate the tabular data when the tree-structured document is received as input to the conversion scheme, wherein the conversion scheme is selected from amongst the plurality of potential conversion schemes based upon historic structure of tabular data in an enterprise division of a user who initiated the request; and generating the tabular data based upon the selected conversion scheme.

13. The method of claim 12, wherein the conversion scheme is selected from amongst the plurality of potential conversion schemes based upon a computer-implemented model of user behavior with respect to generation of tabular data from tree-structured documents.

14. The method of claim 12, further comprising: prior to selecting the conversion scheme from amongst the plurality of potential conversion schemes, constructing a schema based upon a structure of the tree-structured document; and selecting the conversion scheme from amongst the plurality of potential conversion schemes based upon the constructed schema.

15. The method of claim 12, wherein the tree-structured document comprises a first record and a second record, the first record includes a first field and the second record includes a second field, the first field includes a first list and the second field includes a second list of the same length as the first list, and further wherein the selected conversion scheme is configured to merge items in the first list with items in the second list such that a row-based entry in the tabular data includes a first item from the first list and a second item from the second list.

16. The method of claim 15, wherein the selected conversion scheme, when applied to the tree-structured document, is configured to merge items from the first list with items the second list that are at the same level in a hierarchy of the tree-structured document.

17. The method of claim 12, wherein the tree-structured document comprises a first record and a second record, the first record includes a first field and the second record includes a second field, the first field includes a first list and the second field includes a second list, and further wherein th
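
Claims 4 through 7 describe two ways of combining lists held in different records: an item-by-item merge of equal-length lists, and a cross product that is generated only when both lists sit at the same depth in the tree. The sketch below is one possible reading of those two schemes, not the patented implementation. The names `merge_lists`, `cross_lists`, `find_depth`, and `cross_if_same_depth`, and the sample document, are hypothetical.

```python
from itertools import product

def merge_lists(first, second):
    """Pair items position by position; each pair becomes one row (cf. claim 4)."""
    if len(first) != len(second):
        raise ValueError("merge assumes lists of the same length")
    return list(zip(first, second))

def cross_lists(first, second):
    """Every combination of an item from each list (cf. claim 6)."""
    return list(product(first, second))

def find_depth(node, key, depth=0):
    """Depth of the first dict that contains `key`, or None if absent."""
    if isinstance(node, dict):
        if key in node:
            return depth
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_depth(child, key, depth + 1)
        if found is not None:
            return found
    return None

def cross_if_same_depth(doc, key_a, list_a, key_b, list_b):
    """Generate the cross product only when both fields share a depth (cf. claim 7)."""
    depth_a = find_depth(doc, key_a)
    depth_b = find_depth(doc, key_b)
    if depth_a is None or depth_a != depth_b:
        return None
    return cross_lists(list_a, list_b)

# Made-up sample: two records, each with one list-valued field
doc = {"records": [{"sizes": ["S", "M"]}, {"colors": ["red", "blue"]}]}
sizes = doc["records"][0]["sizes"]
colors = doc["records"][1]["colors"]

merged = merge_lists(sizes, colors)
crossed = cross_if_same_depth(doc, "sizes", sizes, "colors", colors)
```

The merge yields one row per list position, while the cross product yields one row per combination; which of the two a tool applies is the kind of choice the claims attribute to the selected conversion scheme.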

Assignees

  • Microsoft Technology Licensing, LLC

Inventors

Classifications

  • G06F16/215 · Primary

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • Data format conversion from or to a database · CPC title

  • Tablespace storage structures; Management thereof · CPC title

  • Querying · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10691655B2 cover?
It covers technologies for extracting data encoded in a tree-structured document and generating a table from the extracted data. In a first embodiment, the table is generated without requiring input from a data cleaner. In a second embodiment, the table is generated based upon examples set forth by a data cleaner.
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 23, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).