Who is the assignee on this patent?

Agarwal Manoj K, Bar-Or Amir, Bhide Manish Anand, and 3 more

What technology area does this patent fall under?

Primary CPC classification G06F17/2705. Mapped technology areas include Physics.

When was this patent published?

Publication date: October 25, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Finding partition boundaries for parallel processing of markup language documents

US9477651B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9477651-B2
Application number	US-89324810-A
Country	US
Kind code	B2
Filing date	Sep 29, 2010
Priority date	Sep 29, 2010
Publication date	Oct 25, 2016
Grant date	Oct 25, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, a computer program product and a system identify partition locations within an extended markup language (XML) document without parsing so as to process portions of said document in parallel. The XML document includes sections required to remain continuous. The document is scanned for continuous sections without parsing, and boundaries of the initial partitions are adjusted to reside outside the continuous sections to determine resulting partitions for the document. The resulting partitions may be processed in parallel to provide the document information for storage.
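
The boundary adjustment described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names `initial_boundaries` and `adjust_boundaries` are hypothetical, the only "continuous sections" it recognizes are CDATA sections and comments, offsets are character positions rather than bytes, and it always moves a boundary to the end of the section (the claims also allow moving it before the section). It does not protect against a boundary landing inside a tag or a nested node.

```python
import re

# Assumed set of "continuous sections": CDATA sections and comments.
# A left-to-right scan means a "<!--" inside CDATA (or the reverse) is
# swallowed by whichever section opened first.
CONTINUOUS = re.compile(r"<!\[CDATA\[.*?\]\]>|<!--.*?-->", re.DOTALL)


def initial_boundaries(text, n_parts):
    """Return n_parts - 1 evenly spaced split offsets (hypothetical helper)."""
    size = len(text) // n_parts
    return [i * size for i in range(1, n_parts)]


def adjust_boundaries(text, boundaries):
    """Move any boundary that falls inside a continuous section to its end.

    The text is scanned with a regular expression only; no XML parser runs.
    """
    spans = [m.span() for m in CONTINUOUS.finditer(text)]
    adjusted = []
    for b in boundaries:
        for start, end in spans:
            if start < b < end:
                b = end  # place the split just after the section
                break
        adjusted.append(b)
    # Two boundaries inside one section collapse to the same offset.
    return sorted(set(adjusted))


if __name__ == "__main__":
    doc = "<r><a><![CDATA[xxxxxxxxxx]]></a><b>1</b></r>"
    cuts = adjust_boundaries(doc, initial_boundaries(doc, 2))
    print(cuts, repr(doc[cuts[0]:]))
```

In this sketch the midpoint of the sample document falls inside the CDATA section, so the split is pushed to the character immediately after `]]>`.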

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of identifying partition locations within an XML document and performing parallel processing of the XML document, said method comprising: determining, by a processor, a partition node XPath in the XML based upon extract, transfer and load (ETL) job requirements and a schema of the XML document wherein the partition node XPath is a path occurring multiple times within the XML document at a main body portion; identifying, by the processor, a header context of the XML document by parsing the XML document from a start of the XML document to a point in the XML document before a first occurrence of a partition node in the partition node XPath; marking, by the processor, said XML document at a location prior to the first occurrence of said partition node with an indication of an end point of said header context; identifying, by the processor, a footer context of the XML document by reverse parsing of the XML document from an end of the XML document until a first occurrence of a close of a partition in the partition node XPath; marking, by the processor, said XML document at a location after the first occurrence of the close of said partition node with an indication of a start point of said footer context; and merging, by the processor, the header context and the footer context within said XML document, wherein the merging comprises moving values of the footer context to a marked location at an end of the header context while maintaining sequencing of level information within the header context and the footer context, and each resulting partition is processed with said merged header and said footer context; before parsing the main body portion of the XML document: determining, by the processor, locations within said XML document to form initial partitions, scanning without parsing, by the processor, said XML document to identify sections required to remain continuous based on the ETL job requirements and the schema of the XML document, adjusting, without parsing by the processor, boundaries of said initial partitions to reside outside said continuous sections to determine resulting partitions for said XML document; and performing parsing via parallel processing of the XML document, by a plurality of processors, using the adjusted boundaries of the resulting partitions.

2. The method of claim 1, further comprising: processing said resulting partitions in parallel to provide document information for storage.

3. The method of claim 1, wherein the adjusting boundaries of said initial partitions is performed to maintain at least one of a character data section, a comment section, and a nested node definition within a single continuous section.

4. The method of claim 1, wherein scanning said document for said continuous sections without parsing and adjusting, without parsing, boundaries of said initial partitions to reside outside said continuous sections to determine resulting partitions for said document comprises: a) scanning said document from a start point of said XML document to a first partition point to determine whether the first partition point is located within a continuous section; b) in response to a determination that the first partition point is within a continuous section, moving the first partition point to a location within said XML document that is prior or subsequent to an occurrence of the continuous section; c) repeating steps a) and b) with subsequent partition points until reaching an end of said document, wherein said scanning occurs from an immediate prior partition point to a next partition point in said document.

5. The method of claim 1, wherein said XML document has a memory size of at least 1 GB.

6. A computer program product for identifying partition locations within an XML document and performing parallel processing of the XML document, the computer program product comprising: a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code configured to: determine, by a processor, a partition node XPath in the XML based upon extract, transfer and load (ETL) job requirements and a schema of the XML document wherein the partition node XPath is a path occurring multiple times within the XML document at a main body portion; identify, by the processor, a header context of the XML document by parsing the XML document from a start of the XML document to a point in the XML document before a first occurrence of a partition node in the partition node XPath; mark, by the processor, said XML document at a location prior to the first occurrence of said partition node with an indication of an end point of said header context; identify, by the processor, a footer context of the XML document by reverse parsing of the XML document from an end of the XML document until a first occurrence of a close of a partition in the partition node XPath; mark, by the processor, said XML document at a location after the first occurrence of the close of said partition node with an indication of a start point of said footer context; and merge, by the processor, the header context and the footer context within said XML document, wherein the merging comprises moving values of the footer context to a marked location at an end of the header context while maintaining sequencing of level information within the header context and the footer context, and each resulting partition is processed with said merged header and said footer context; before parsing the main body portion of the XML document, the computer readable program code is further configured to: determine, by the processor, locations within said XML document to form initial partitions, scan, without parsing, said XML document to identify sections required to remain continuous based on the ETL job requirements and the schema of the XML document, and adjust, without parsing, boundaries of said initial partitions to reside outside said continuous sections to determine resulting partitions for said document; and perform parsing via parallel processing of the XML document, by a plurality of processors, using the adjusted boundaries of the resulting partitions.

7. The computer program product of claim 6, wherein said computer readable program code is further configured to: process said resulting partitions in parallel to provide document information for storage.

8. The computer program product of claim 6, wherein the computer readable program code is further configured to adjust boundaries of said initial partitions so as to maintain at least one of a character data section, a comment section, and a nested node definition within a single continuous section.

9. The computer program product of claim 6, wherein said computer readable program code is configured to scan said XML document for said continuous sections without parsing and adjust boundaries of said initial partitions to reside outside said continuous sections to determine resulting partitions for said document by: a) scanning said XML document from a start point of said document to a first partition point to determine whether the first partition point is located within a continuous section; b) in response to a determination that the first partition point is within a continuous section, moving the first partition point to a location within said document that is prior or subsequent to an occurrence of the continuous section; c) repeating steps a) and b) with subsequent partition points until reaching an end of said XML document, wherein said scanning occurs from an immediate prior partition point to a next partition point in said document.

10. A system for identifying part…
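
The header and footer contexts recited in the claims can be pictured with a simplified sketch. It rests on several assumptions: the partition node is identified by a bare tag name rather than a full XPath, plain string search stands in for the forward and reverse parsing the claims describe, the last partition node is not self-closing, and the tag name does not appear inside comments or CDATA. The names `split_contexts` and `wrap_partition` are hypothetical, and wrapping a partition with the header and footer is a much simpler stand-in for the claimed merge step.

```python
import re


def split_contexts(text, tag):
    """Split a document into (header, body, footer) around a repeated node.

    header: everything before the first opening tag of the partition node.
    footer: everything after the last closing tag of the partition node.
    """
    # Require whitespace, '>' or '/' after the name so that a search for
    # "order" does not match "<orders>".
    first_open = re.search(r"<%s[\s>/]" % re.escape(tag), text)
    close = "</%s>" % tag
    last_close = text.rfind(close)  # reverse search from the end
    if first_open is None or last_close == -1:
        raise ValueError("partition node %r not found" % tag)
    header_end = first_open.start()
    footer_start = last_close + len(close)
    return text[:header_end], text[header_end:footer_start], text[footer_start:]


def wrap_partition(header, partition, footer):
    """Give a body partition the shared context so it can stand alone."""
    return header + partition + footer


if __name__ == "__main__":
    doc = "<orders><meta>x</meta><order>1</order><order>2</order></orders>"
    header, body, footer = split_contexts(doc, "order")
    print(wrap_partition(header, "<order>2</order>", footer))
```

Each wrapped partition is then a small well-formed document in its own right, which is what lets separate workers parse their partitions independently.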

Classifications

G06F40/131
Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces · CPC title
G06F8/456
Parallelism detection · CPC title
G06F17/2229
Physics · mapped topic
G06F17/2247
Physics · mapped topic
G06F17/2705 · Primary
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 44645693

Related patents

Efficient loading of data in databases

Parallel processing of ETL jobs involving extensible markup language documents

Flexible database schema
