US10268672B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10268672-B2 |
| Application number | US-201615051698-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 24, 2016 |
| Priority date | Mar 30, 2015 |
| Publication date | Apr 23, 2019 |
| Grant date | Apr 23, 2019 |
Official abstract text for this publication.
Parsing XML (extensible markup language) data by performing the following operations: (i) dividing the piece of markup language into a plurality of pre-parsing segments; (ii) assigning the pre-parsing segment to a pre-parsing processor thread of a plurality of pre-parsing processor threads; (iii) determining any parsing division point(s) occurring in the pre-parsing segment so that data corresponding to a single tabular record is between each consecutive pair of parsing division points; (iv) dividing the piece of language into a plurality of parsing segments defined by the parsing division points so that each parsing segment corresponds to a single tabular record; (v) assigning the parsing segment to a parsing processor thread of a plurality of parsing processor threads; and (vi) parsing to generate a parsed tabular record corresponding to the parsing segment.
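The two-phase flow in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record name `CUSTOMER` is borrowed from claim 3, start tags are assumed to carry no attributes, segments are cut by character offset rather than by line, and the helper names (`find_division_points`, `parse_record`, `parse_records`) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Assumptions: record name from claim 3; start tags have no attributes.
START_MARKER = "<CUSTOMER>"
END_MARKER = "</CUSTOMER>"
END_RE = re.compile(re.escape(END_MARKER))


def find_division_points(xml_text, start, end):
    """Pre-parse one segment: offsets just past each end marker starting in [start, end)."""
    # Widen the window so a marker straddling the segment boundary is still found,
    # while a marker starting at or after `end` is left to the next segment.
    window_end = min(len(xml_text), end + len(END_MARKER) - 1)
    return [m.end() for m in END_RE.finditer(xml_text, start, window_end)]


def parse_record(segment):
    """Parse one single-record segment into a row (a dict of field name -> value)."""
    # Drop anything before the record's start tag, e.g. the document's root tag.
    record_xml = segment[segment.index(START_MARKER):]
    elem = ET.fromstring(record_xml)
    return {child.tag: child.text for child in elem}


def parse_records(xml_text, n_segments=4, workers=4):
    """Pre-parse segments in a thread pool, then parse one record per task."""
    seg_len = max(1, -(-len(xml_text) // n_segments))  # ceiling division
    bounds = [(s, min(s + seg_len, len(xml_text)))
              for s in range(0, len(xml_text), seg_len)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: each pre-parsing segment reports its division points.
        per_segment = pool.map(lambda b: find_division_points(xml_text, *b), bounds)
        points = [p for seg in per_segment for p in seg]
        # Phase 2: each span between consecutive division points holds one record.
        starts = [0] + points[:-1]
        records = [xml_text[a:b] for a, b in zip(starts, points)]
        return list(pool.map(parse_record, records))
```

Because the segments are cut without regard to record boundaries, the pre-parsing step only has to find end markers; the second phase then receives spans that are each a complete record. In CPython, threads do not run pure-Python code on several cores at once, so a process pool would be closer to the multi-core behavior the patent describes.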
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method (CIM) comprising:

receiving an XML data file that includes a plurality of data records, with each data record including a start of record marker in XML format, a plurality of field values delimited in XML format, and an end of record marker in XML format;

dividing the XML data file into N un-pre-parsed data portions, where N is greater than two and at least N-1 of the N un-pre-parsed data portions have an equal number of lines of XML code;

for each given un-pre-parsed data portion of the N un-pre-parsed data portions, selecting a selected pre-parsing processor core, from a plurality of processor cores, to use to perform pre-parsing;

performing parallel pre-parsing on the N un-pre-parsed data portions, with each un-pre-parsed data portion being pre-parsed by its associated selected pre-parsing processor core, with the pre-parsing including determining the locations of the end of record marker(s) in each un-pre-parsed data portion, and with the pre-parsing being parallel in the sense that more than one data portion is pre-parsed simultaneously during at least some portions of the pre-parsing;

dividing the XML data file into M pre-parsed data portions, where M is equal to a number of records in the XML data file, based upon the determination of the locations of the end of record marker(s) in each of the un-pre-parsed data portions such that each pre-parsed data portion includes XML data corresponding to a single record;

for each given pre-parsed data portion of the M pre-parsed data portions, selecting a selected parsing processor core, from the plurality of processor cores, to use to perform parsing;

performing parallel parsing on the M pre-parsed data portions, with each pre-parsed data portion being parsed by its associated selected parsing processor core, with parsing including determining rows of a table respectively corresponding to records of the XML data file, and with the parsing being parallel in the sense that more than one pre-parsed data portion is parsed simultaneously during at least some portions of the parsing; and

outputting a table data file including the parsed rows obtained by the parsing of the M pre-parsed data portions;

wherein the CIM improves the operation of a computer by allowing both a pre-parsing operation and a parsing operation to be performed in parallel by multiple cores of a processor having multiple cores to avoid a bottleneck when obtaining a table data file from an XML data file having multiple records and to ensure scalability when a number of CPU cores.

2. The CIM of claim 1 wherein each end of record marker in the XML file has the following syntax: </name-of-record> with name-of-record corresponding to a predetermined alphanumeric string.

3. The CIM of claim 2 wherein the alphanumeric string corresponding to the name-of-record is CUSTOMER.

4. The CIM of claim 1 wherein: all of the N un-pre-parsed data portions except a last un-pre-parsed data portion of the N un-pre-parsed data portions have an equal number of lines of XML code; and the last un-pre-parsed data portion has fewer lines of code than the other un-pre-parsed data portions.
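The line-count arithmetic in claims 1 and 4, together with the end-of-record syntax of claims 2 and 3, can be illustrated with a short Python sketch. It rests on assumptions the claims do not state: the file is already split into lines, each end marker sits on one line, and the function names are hypothetical. Using a ceiling division means the function may return fewer than N portions when the line count divides unevenly, so it is one possible reading rather than the claimed method.

```python
import math


def split_lines_into_portions(lines, n):
    """Split lines into at most n portions; all but the last have equal line counts."""
    per_portion = max(1, math.ceil(len(lines) / n))
    return [lines[i:i + per_portion] for i in range(0, len(lines), per_portion)]


def locate_end_markers(lines, n, record_name="CUSTOMER"):
    """Return the 0-based line numbers holding an end-of-record marker."""
    marker = f"</{record_name}>"  # syntax from claim 2: </name-of-record>
    found = []
    first_line_no = 0
    for portion in split_lines_into_portions(lines, n):
        # In the claimed method each portion would be scanned by its own core;
        # here the portions are scanned one after another for clarity.
        found.extend(first_line_no + i
                     for i, line in enumerate(portion) if marker in line)
        first_line_no += len(portion)
    return found
```

For an 11-line file and N = 4, the portions hold 3, 3, 3, and 2 lines, which matches the shape described in claim 4: equal-sized portions followed by a shorter last one.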
Parsing markup language streams (streaming G06F40/149) · CPC title
Selection or weighting of terms from queries, including natural language queries · CPC title
Physics · mapped topic