What technology area does this patent fall under?

Primary CPC classification G06F16/3334. Mapped technology areas include Physics.

When was this patent published?

Publication date Apr 23, 2019 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Parallel parsing of markup language data

US10268672B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10268672-B2
Application number	US-201615051698-A
Country	US
Kind code	B2
Filing date	Feb 24, 2016
Priority date	Mar 30, 2015
Publication date	Apr 23, 2019
Grant date	Apr 23, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Parsing XML (extensible markup language) data by performing the following operations: (i) dividing the piece of markup language into a plurality of pre-parsing segments; (ii) assigning the pre-parsing segment to a pre-parsing processor thread of a plurality of pre-parsing processor threads; (iii) determining any parsing division point(s) occurring in the pre-parsing segment so that data corresponding to a single tabular record is between each consecutive pair of parsing division points; (iv) dividing the piece of language into a plurality of parsing segments defined by the parsing division points so that each parsing segment corresponds to a single tabular record; (v) assigning the parsing segment to a parsing processor thread of a plurality of parsing processor threads; and (vi) parsing to generate a parsed tabular record corresponding to the parsing segment.
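
The six operations in the abstract can be sketched as a two-phase pipeline. The Python below is only an illustrative sketch, not the patented implementation: the function names, the `CUSTOMER` record name (borrowed from claim 3), the line-based segmentation, and the assumption that a record marker never spans a line break are all choices made for this example. A thread pool is used to show how segments are handed to workers; in CPython, threads do not give true CPU parallelism for this kind of work, so a process pool would be the closer analogue of multiple cores.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Hypothetical record markers, chosen for this example only.
RECORD_START = "<CUSTOMER>"
RECORD_END = "</CUSTOMER>"

SAMPLE = (
    "<CUSTOMERS>\n"
    "<CUSTOMER>\n<NAME>Ann</NAME>\n<CITY>Oslo</CITY>\n</CUSTOMER>\n"
    "<CUSTOMER>\n<NAME>Bob</NAME>\n<CITY>Rome</CITY>\n</CUSTOMER>\n"
    "</CUSTOMERS>\n"
)


def split_by_lines(xml_text, n_segments):
    """Step (i): cut the text into pre-parsing segments on line boundaries.

    Returns (offset, text) pairs so each worker can report global positions.
    """
    lines = xml_text.splitlines(keepends=True)
    per_segment = max(1, -(-len(lines) // n_segments))  # ceiling division
    segments, offset = [], 0
    for i in range(0, len(lines), per_segment):
        chunk = "".join(lines[i:i + per_segment])
        segments.append((offset, chunk))
        offset += len(chunk)
    return segments


def find_division_points(segment):
    """Step (iii): locate the offset just past each end-of-record marker."""
    offset, text = segment
    points = []
    pos = text.find(RECORD_END)
    while pos != -1:
        points.append(offset + pos + len(RECORD_END))
        pos = text.find(RECORD_END, pos + 1)
    return points


def parse_record(fragment):
    """Step (vi): turn one record's XML into a flat row (tag -> text)."""
    start = fragment.find(RECORD_START)  # skip anything before the record
    element = ET.fromstring(fragment[start:])
    return {child.tag: child.text for child in element}


def parse_xml_to_rows(xml_text, n_segments=4, workers=4):
    segments = split_by_lines(xml_text, n_segments)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Steps (ii)-(iii): each segment is pre-parsed by a pool worker.
        per_segment = pool.map(find_division_points, segments)
        points = sorted(p for pts in per_segment for p in pts)
        # Step (iv): one parsing segment per record, between division points.
        starts = [0] + points[:-1]
        fragments = [xml_text[s:e] for s, e in zip(starts, points)]
        # Steps (v)-(vi): each record is parsed by a pool worker.
        return list(pool.map(parse_record, fragments))
```

Calling `parse_xml_to_rows(SAMPLE, n_segments=3, workers=2)` on the sample yields one dictionary per `CUSTOMER` record. Note that the pre-parsing segments do not need to align with record boundaries: a record may start in one segment and end in the next, because only the positions of the end markers are collected in the first phase.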

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method (CIM) comprising: receiving an XML data file that includes a plurality of data record, which each data record including a start of record marker in XML format, a plurality of field values delimited in XML format, and an end of record marker in XML format; dividing the XML data file into N un-pre-parsed portions, where N is greater than two and at least N-1 of the N un-pre-parsed data portions have an equal number of lines of XML code; for each given un-pre-parsed data portion of the N un-pre-parsed data portions, selecting a selected pre-parsing processor cores, from a plurality of processor core, to use to perform preparsing; performing parallel pre-parsing on the N un-pre-parsed data portions, with each un-pre-parsed data portion being pre-parsed by its associated selected pre-parsing processor core, with the pre-parsing including determining the locations of the end of record marker(s) are located in each un-pre-parsed data potion, and with the pre-parsing being parallel in the sense that more than one data portion is pre-parsed simultaneously during at least some portions of the preparsing; dividing the XML data file into M pre-parsed data portions, where M is equal to a number of records in the XML data file, based upon the determination of the locations of the end of record marker(s) in each of the un-pre-parsed data portions such that each pre-parsed data portion includes XML data corresponding to a single record; for each given pre-parsed data portion of the M pre-parsed data portions, selecting a selected parsing processor core, from the plurality of processor cores, to use to perform parsing; performing parallel parsing on the M pre-parsed data portions, with each pre-parsed data portion being parsed by its associated selected parsing processor core, with parsing including determining rows of a table respectively corresponding to records of the XML data file, and with the parsing being parallel in the sense that more than one pre-parsed data portion is parsed simultaneously during at least some portions of the parsing; and outputting a table data file including the parsed rows obtained by the parsing of the M pre-parsed data portions; wherein the CIM improves the operation of a computer by allowing both a pre-parsing operation and a parsing operation to be performed in parallel by multiple cores of a processor having multiple cores to avoid a bottleneck when obtaining a table data file from an XML data file having multiple records and to ensure scalability when a number of CPU cores.

2. The CIM of claim 1 wherein each end of record marker in the XML file has the following syntax: </name-of-record> with name-of-record corresponding to a predetermined alphanumeric string.

3. The CIM of claim 2 wherein the alphanumeric string corresponding to the name-or-record being CUSTOMER.

4. The CIM of claim 1 wherein: all of the N un-pre-parsed data portions except a last un-pre-parsed data portion of the N pre-parsed data portion has an equal number of lines of XML code; and the last un-pre-parsed data portion has fewer lines of code than the other un-pre-parsed data portions.
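
To make the claim language more concrete, the following Python sketch illustrates two details: splitting a file into portions with an equal number of lines except possibly the last (claims 1 and 4), and recognizing an end-of-record marker of the form `</name-of-record>` (claims 2 and 3). The helper names and the sample data are hypothetical and chosen for this illustration; the sketch scans the portions one after another for clarity, whereas the claim assigns each portion to its own processor core.

```python
import re

# Hypothetical sample file, one XML tag per line.
SAMPLE_LINES = [
    "<CUSTOMERS>",
    "<CUSTOMER>",
    "<NAME>Ann</NAME>",
    "<CITY>Oslo</CITY>",
    "</CUSTOMER>",
    "<CUSTOMER>",
    "<NAME>Bob</NAME>",
    "<CITY>Rome</CITY>",
    "</CUSTOMER>",
    "</CUSTOMERS>",
]


def split_equal_lines(lines, lines_per_portion):
    """Cut lines into portions of equal length; only the last may be shorter."""
    if lines_per_portion < 1:
        raise ValueError("lines_per_portion must be at least 1")
    return [
        lines[i:i + lines_per_portion]
        for i in range(0, len(lines), lines_per_portion)
    ]


def end_marker(name):
    """Build a pattern for the end-of-record marker </name>."""
    return re.compile("</" + re.escape(name) + ">")


def find_record_ends(portions, name):
    """Return the file-wide line index of every line holding an end marker."""
    pattern = end_marker(name)
    ends = []
    first_line = 0  # index of the current portion's first line in the file
    for portion in portions:
        ends.extend(
            first_line + i
            for i, line in enumerate(portion)
            if pattern.search(line)
        )
        first_line += len(portion)
    return ends
```

With `split_equal_lines(SAMPLE_LINES, 4)` the ten sample lines fall into three portions, and the number of end markers found by `find_record_ends` corresponds to M in the claim, the number of records in the file.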

Assignees

IBM

Inventors

Classifications

G06F40/221
Parsing markup language streams (streaming G06F40/149) · CPC title
G06F16/3334 · Primary
Selection or weighting of terms from queries, including natural language queries · CPC title
G06F17/30663
Physics · mapped topic
G06F17/272 · Primary
Physics · mapped topic
G06F17/2247
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 57015282

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10268672B2 cover?: Parsing XML (extensible markup language) data by performing the following operations: (i) dividing the piece of markup language into a plurality of pre-parsing segments; (ii) assigning the pre-parsing segment to a pre-parsing processor thread of a plurality of pre-parsing processor threads; (iii) determining any parsing division point(s) occurring in the pre-parsing segment so that data corresp…
Who is the assignee on this patent?: IBM