Parallel parsing of markup language data

US10268672B2 · US · B2

Patent metadata
Field · Value
Publication number · US-10268672-B2
Application number · US-201615051698-A
Country · US
Kind code · B2
Filing date · Feb 24, 2016
Priority date · Mar 30, 2015
Publication date · Apr 23, 2019
Grant date · Apr 23, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Parsing XML (extensible markup language) data by performing the following operations: (i) dividing the piece of markup language into a plurality of pre-parsing segments; (ii) assigning the pre-parsing segment to a pre-parsing processor thread of a plurality of pre-parsing processor threads; (iii) determining any parsing division point(s) occurring in the pre-parsing segment so that data corresponding to a single tabular record is between each consecutive pair of parsing division points; (iv) dividing the piece of language into a plurality of parsing segments defined by the parsing division points so that each parsing segment corresponds to a single tabular record; (v) assigning the parsing segment to a parsing processor thread of a plurality of parsing processor threads; and (vi) parsing to generate a parsed tabular record corresponding to the parsing segment.
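
The two phases in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the record markers `<CUSTOMER>` and `</CUSTOMER>`, the function names, the character-offset segmentation, and the use of a thread pool are all assumptions made for illustration. Phase one scans each pre-parsing segment for end-of-record markers (the division points); phase two parses each single-record segment into a row.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Assumed record markers; the abstract does not name them.
START_MARKER = "<CUSTOMER>"
END_MARKER = "</CUSTOMER>"


def find_division_points(text, start, stop):
    """Pre-parse one segment: offsets just past each end marker starting in [start, stop)."""
    points = []
    # Read slightly past `stop` so a marker straddling the boundary is still found here.
    window_end = min(len(text), stop + len(END_MARKER) - 1)
    pos = text.find(END_MARKER, start, window_end)
    while pos != -1:
        points.append(pos + len(END_MARKER))
        pos = text.find(END_MARKER, pos + 1, window_end)
    return points


def parse_record(fragment):
    """Parse one single-record segment into a row (dict of field name -> value)."""
    # Skip anything before the record itself, such as the root element's open tag.
    record = ET.fromstring(fragment[fragment.index(START_MARKER):])
    return {field.tag: field.text for field in record}


def parse_parallel(text, n_segments=4, workers=4):
    size = max(1, -(-len(text) // n_segments))  # ceiling division
    bounds = [(i, min(i + size, len(text))) for i in range(0, len(text), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: each pre-parsing segment is scanned by a pool thread.
        per_segment = pool.map(lambda b: find_division_points(text, *b), bounds)
        points = [p for segment in per_segment for p in segment]
        # Phase 2: cut at the division points, one record per parsing segment.
        starts = [0] + points[:-1]
        fragments = [text[s:e] for s, e in zip(starts, points)]
        return list(pool.map(parse_record, fragments))


sample = (
    "<CUSTOMERS>\n"
    "<CUSTOMER><ID>1</ID><NAME>Ann</NAME></CUSTOMER>\n"
    "<CUSTOMER><ID>2</ID><NAME>Bob</NAME></CUSTOMER>\n"
    "</CUSTOMERS>"
)
rows = parse_parallel(sample)
```

In CPython, threads mainly show the structure of the technique; CPU-bound parsing would likely need a process pool to use multiple cores.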

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method (CIM) comprising:
    receiving an XML data file that includes a plurality of data records, with each data record including a start of record marker in XML format, a plurality of field values delimited in XML format, and an end of record marker in XML format;
    dividing the XML data file into N un-pre-parsed portions, where N is greater than two and at least N-1 of the N un-pre-parsed data portions have an equal number of lines of XML code;
    for each given un-pre-parsed data portion of the N un-pre-parsed data portions, selecting a selected pre-parsing processor core, from a plurality of processor cores, to use to perform pre-parsing;
    performing parallel pre-parsing on the N un-pre-parsed data portions, with each un-pre-parsed data portion being pre-parsed by its associated selected pre-parsing processor core, with the pre-parsing including determining the locations of the end of record marker(s) in each un-pre-parsed data portion, and with the pre-parsing being parallel in the sense that more than one data portion is pre-parsed simultaneously during at least some portions of the pre-parsing;
    dividing the XML data file into M pre-parsed data portions, where M is equal to a number of records in the XML data file, based upon the determination of the locations of the end of record marker(s) in each of the un-pre-parsed data portions such that each pre-parsed data portion includes XML data corresponding to a single record;
    for each given pre-parsed data portion of the M pre-parsed data portions, selecting a selected parsing processor core, from the plurality of processor cores, to use to perform parsing;
    performing parallel parsing on the M pre-parsed data portions, with each pre-parsed data portion being parsed by its associated selected parsing processor core, with parsing including determining rows of a table respectively corresponding to records of the XML data file, and with the parsing being parallel in the sense that more than one pre-parsed data portion is parsed simultaneously during at least some portions of the parsing; and
    outputting a table data file including the parsed rows obtained by the parsing of the M pre-parsed data portions;
    wherein the CIM improves the operation of a computer by allowing both a pre-parsing operation and a parsing operation to be performed in parallel by multiple cores of a processor having multiple cores to avoid a bottleneck when obtaining a table data file from an XML data file having multiple records and to ensure scalability when a number of CPU cores.

2. The CIM of claim 1 wherein each end of record marker in the XML file has the following syntax: </name-of-record> with name-of-record corresponding to a predetermined alphanumeric string.

3. The CIM of claim 2 wherein the alphanumeric string corresponding to the name-of-record being CUSTOMER.

4. The CIM of claim 1 wherein: all of the N un-pre-parsed data portions except a last un-pre-parsed data portion of the N pre-parsed data portion has an equal number of lines of XML code; and the last un-pre-parsed data portion has fewer lines of code than the other un-pre-parsed data portions.
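
The line-based division in claims 1 and 4 and the end of record marker syntax in claims 2 and 3 can be sketched as follows. This is a hedged illustration only: the helper names are hypothetical, the caller picks the lines per portion (so N is whatever number of portions results), and the sketch assumes at most one end of record marker per line.

```python
import re


def split_by_line_count(lines, lines_per_portion):
    """Cut the file's lines into portions of equal length; only the last may be shorter."""
    return [
        lines[i:i + lines_per_portion]
        for i in range(0, len(lines), lines_per_portion)
    ]


def end_marker_pattern(record_name):
    """Regex for the claim 2 syntax </name-of-record>, allowing XML's optional space before '>'."""
    return re.compile(r"</" + re.escape(record_name) + r"\s*>")


def end_marker_lines(portion, first_line_no, pattern):
    """Pre-parse one portion: file-wide line numbers that contain an end of record marker."""
    return [
        first_line_no + k
        for k, line in enumerate(portion)
        if pattern.search(line)
    ]


lines = [
    "<CUSTOMERS>",
    "<CUSTOMER>",
    "<ID>1</ID>",
    "</CUSTOMER>",
    "<CUSTOMER>",
    "<ID>2</ID>",
    "</CUSTOMER>",
    "</CUSTOMERS>",
]
portions = split_by_line_count(lines, 3)
pattern = end_marker_pattern("CUSTOMER")  # record name taken from claim 3

# Each portion could be handed to a separate core; here they are scanned in turn.
marker_lines = []
for index, portion in enumerate(portions):
    marker_lines += end_marker_lines(portion, index * 3, pattern)
```

Because each portion reports file-wide line numbers, the per-portion results can be concatenated in portion order to give the record boundaries for the whole file.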

Assignees

Inventors

Classifications

  • Parsing markup language streams (streaming G06F40/149) · CPC title

  • Selection or weighting of terms from queries, including natural language queries · CPC title

  • Physics · mapped topic

  • G06F17/272 · Primary

    Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10268672B2 cover?
Parsing XML (extensible markup language) data by performing the following operations: (i) dividing the piece of markup language into a plurality of pre-parsing segments; (ii) assigning the pre-parsing segment to a pre-parsing processor thread of a plurality of pre-parsing processor threads; (iii) determining any parsing division point(s) occurring in the pre-parsing segment so that data corresp…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/3334. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 23, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).