Efficient loading of data in databases
US-9195729-B2 · Nov 24, 2015 · US
US9477651B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9477651-B2 |
| Application number | US-89324810-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 29, 2010 |
| Priority date | Sep 29, 2010 |
| Publication date | Oct 25, 2016 |
| Grant date | Oct 25, 2016 |
Abstract
A method, a computer program product and a system identify partition locations within an Extensible Markup Language (XML) document without parsing so as to process portions of said document in parallel. The XML document includes sections required to remain continuous. The document is scanned for continuous sections without parsing, and boundaries of the initial partitions are adjusted to reside outside the continuous sections to determine resulting partitions for the document. The resulting partitions may be processed in parallel to provide the document information for storage.
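The abstract's core idea, moving precomputed partition points so that none falls inside a section that must stay contiguous, can be sketched in Python. This is a minimal, hypothetical illustration, not the patented implementation. The function names `find_continuous_sections` and `adjust_partitions` are illustrative. The following choices are also assumptions:

- comments, CDATA sections, and whole `<tag>` elements count as the "continuous sections";
- a boundary that lands inside one is moved to just after it.

The scan is lexical (regular expressions and string searches) rather than a real XML parse. The sketch also assumes the input is the main body only, with header and footer already removed, and that no `<tag>` element is nested inside another of the same name.

```python
import re

def _skip_markup(text: str, start: int, token: str) -> int:
    """Return the index just past a comment or CDATA section starting at `start`."""
    terminator = "-->" if token == "<!--" else "]]>"
    end = text.find(terminator, start + len(token))
    if end == -1:
        raise ValueError(f"unterminated {token!r} at offset {start}")
    return end + len(terminator)

def find_continuous_sections(body: str, tag: str) -> list[tuple[int, int]]:
    """Lexically locate (start, end) spans that must not be split.

    Assumption: comments, CDATA sections and whole <tag>...</tag> elements
    are the continuous sections; same-named elements are not nested.
    """
    t = re.escape(tag)
    opener = re.compile(rf"<!--|<!\[CDATA\[|<{t}[\s>]")
    closer = re.compile(rf"<!--|<!\[CDATA\[|</{t}\s*>")
    spans, pos = [], 0
    while (m := opener.search(body, pos)) is not None:
        start = m.start()
        if m.group() in ("<!--", "<![CDATA["):
            end = _skip_markup(body, start, m.group())
        else:
            # Walk to the matching close tag, skipping comments/CDATA that
            # may contain a look-alike "</tag>" string.
            p = m.end()
            while True:
                c = closer.search(body, p)
                if c is None:
                    raise ValueError(f"unclosed <{tag}> at offset {start}")
                if c.group() in ("<!--", "<![CDATA["):
                    p = _skip_markup(body, c.start(), c.group())
                else:
                    end = c.end()
                    break
        spans.append((start, end))
        pos = end
    return spans

def adjust_partitions(body: str, tag: str, n_parts: int) -> list[str]:
    """Cut `body` into at most n_parts pieces whose boundaries avoid continuous sections."""
    size = len(body)
    initial = [size * k // n_parts for k in range(1, n_parts)]  # equal-size first guess
    spans = find_continuous_sections(body, tag)
    cuts = [0]
    for b in initial:
        for start, end in spans:
            if start < b < end:   # boundary falls inside a continuous section
                b = end           # hypothetical policy: move it just past the section
                break
        if cuts[-1] < b < size:   # drop boundaries that collapsed onto earlier ones
            cuts.append(b)
    cuts.append(size)
    return [body[i:j] for i, j in zip(cuts, cuts[1:])]
```

For example, `adjust_partitions(body, "record", 8)` would return up to eight string pieces, each of which could be handed to a separate worker. Moving a boundary forward rather than backward is an arbitrary choice in this sketch; claim 4 allows moving it to a location either prior or subsequent to the continuous section.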
Claims (excerpt)
What is claimed is:

1. A computer-implemented method of identifying partition locations within an XML document and performing parallel processing of the XML document, said method comprising: determining, by a processor, a partition node XPath in the XML based upon extract, transfer and load (ETL) job requirements and a schema of the XML document wherein the partition node XPath is a path occurring multiple times within the XML document at a main body portion; identifying, by the processor, a header context of the XML document by parsing the XML document from a start of the XML document to a point in the XML document before a first occurrence of a partition node in the partition node XPath; marking, by the processor, said XML document at a location prior to the first occurrence of said partition node with an indication of an end point of said header context; identifying, by the processor, a footer context of the XML document by reverse parsing of the XML document from an end of the XML document until a first occurrence of a close of a partition in the partition node XPath; marking, by the processor, said XML document at a location after the first occurrence of the close of said partition node with an indication of a start point of said footer context; and merging, by the processor, the header context and the footer context within said XML document, wherein the merging comprises moving values of the footer context to a marked location at an end of the header context while maintaining sequencing of level information within the header context and the footer context, and each resulting partition is processed with said merged header and said footer context; before parsing the main body portion of the XML document: determining, by the processor, locations within said XML document to form initial partitions, scanning without parsing, by the processor, said XML document to identify sections required to remain continuous based on the ETL job requirements and the schema of the XML document, adjusting, without parsing by the processor, boundaries of said initial partitions to reside outside said continuous sections to determine resulting partitions for said XML document; and performing parsing via parallel processing of the XML document, by a plurality of processors, using the adjusted boundaries of the resulting partitions.

2. The method of claim 1, further comprising: processing said resulting partitions in parallel to provide document information for storage.

3. The method of claim 1, wherein the adjusting boundaries of said initial partitions is performed to maintain at least one of a character data section, a comment section, and a nested node definition within a single continuous section.

4. The method of claim 1, wherein scanning said document for said continuous sections without parsing and adjusting, without parsing, boundaries of said initial partitions to reside outside said continuous sections to determine resulting partitions for said document comprises: a) scanning said document from a start point of said XML document to a first partition point to determine whether the first partition point is located within a continuous section; b) in response to a determination that the first partition point is within a continuous section, moving the first partition point to a location within said XML document that is prior or subsequent to an occurrence of the continuous section; c) repeating steps a) and b) with subsequent partition points until reaching an end of said document, wherein said scanning occurs from an immediate prior partition point to a next partition point in said document.

5. The method of claim 1, wherein said XML document has a memory size of at least 1 GB.

6. A computer program product for identifying partition locations within an XML document and performing parallel processing of the XML document, the computer program product comprising: a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code configured to: determine, by a processor, a partition node XPath in the XML based upon extract, transfer and load (ETL) job requirements and a schema of the XML document wherein the partition node XPath is a path occurring multiple times within the XML document at a main body portion; identify, by the processor, a header context of the XML document by parsing the XML document from a start of the XML document to a point in the XML document before a first occurrence of a partition node in the partition node XPath; mark, by the processor, said XML document at a location prior to the first occurrence of said partition node with an indication of an end point of said header context; identify, by the processor, a footer context of the XML document by reverse parsing of the XML document from an end of the XML document until a first occurrence of a close of a partition in the partition node XPath; mark, by the processor, said XML document at a location after the first occurrence of the close of said partition node with an indication of a start point of said footer context; and merge, by the processor, the header context and the footer context within said XML document, wherein the merging comprises moving values of the footer context to a marked location at an end of the header context while maintaining sequencing of level information within the header context and the footer context, and each resulting partition is processed with said merged header and said footer context; before parsing the main body portion of the XML document, the computer readable program code is further configured to: determine, by the processor, locations within said XML document to form initial partitions, scan, without parsing, said XML document to identify sections required to remain continuous based on the ETL job requirements and the schema of the XML document, and adjust, without parsing, boundaries of said initial partitions to reside outside said continuous sections to determine resulting partitions for said document; and perform parsing via parallel processing of the XML document, by a plurality of processors, using the adjusted boundaries of the resulting partitions.

7. The computer program product of claim 6, wherein said computer readable program code is further configured to: process said resulting partitions in parallel to provide document information for storage.

8. The computer program product of claim 6, wherein the computer readable program code is further configured to adjust boundaries of said initial partitions so as to maintain at least one of a character data section, a comment section, and a nested node definition within a single continuous section.

9. The computer program product of claim 6, wherein said computer readable program code is configured to scan said XML document for said continuous sections without parsing and adjust boundaries of said initial partitions to reside outside said continuous sections to determine resulting partitions for said document by: a) scanning said XML document from a start point of said document to a first partition point to determine whether the first partition point is located within a continuous section; b) in response to a determination that the first partition point is within a continuous section, moving the first partition point to a location within said document that is prior or subsequent to an occurrence of the continuous section; c) repeating steps a) and b) with subsequent partition points until reaching an end of said XML document, wherein said scanning occurs from an immediate prior partition point to a next partition point in said document.

10. A system for identifying part…
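The header/footer step in claim 1 can be illustrated the same way. This hypothetical Python sketch works as follows:

- the header context is everything before the first opening partition node, found by a forward scan;
- the footer context is everything after the last closing partition node, found by a reverse `rfind`;
- each partition is wrapped as header + partition + footer, so it parses as a standalone document in a worker thread.

The names `split_context`, `split_body` and `parse_partitions` are illustrative. The plain wrapping is an assumption standing in for the claimed merging of header and footer context; it does not reproduce the claimed level-sequencing step. `split_body` also assumes the body holds only whole `<tag>` elements, with no comments or CDATA.

```python
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

def split_context(doc: str, tag: str) -> tuple[str, str, str]:
    """Return (header, body, footer) around the repeated <tag> elements."""
    first = re.search(rf"<{re.escape(tag)}[\s>]", doc)  # forward scan: first partition node
    close = f"</{tag}>"
    last = doc.rfind(close)                              # reverse scan: last partition close
    if first is None or last == -1:
        raise ValueError(f"no <{tag}> elements found")
    header_end = first.start()          # marked end point of the header context
    footer_start = last + len(close)    # marked start point of the footer context
    return doc[:header_end], doc[header_end:footer_start], doc[footer_start:]

def split_body(body: str, tag: str, n_parts: int) -> list[str]:
    """Group whole <tag> elements into at most n_parts chunks.

    Hypothetical helper: assumes no comments or CDATA inside the body.
    """
    close = f"</{tag}>"
    records, pos = [], 0
    while (i := body.find(close, pos)) != -1:
        records.append(body[pos:i + len(close)])
        pos = i + len(close)
    per_chunk = max(1, -(-len(records) // n_parts))  # ceiling division
    return ["".join(records[k:k + per_chunk]) for k in range(0, len(records), per_chunk)]

def parse_partitions(doc: str, tag: str, n_parts: int = 4) -> list[list[ET.Element]]:
    """Parse each partition in parallel, each wrapped in the header/footer context."""
    header, body, footer = split_context(doc, tag)
    chunks = split_body(body, tag, n_parts)

    def parse(chunk: str) -> list[ET.Element]:
        # Wrapping the chunk makes it a well-formed document on its own.
        root = ET.fromstring(header + chunk + footer)
        return root.findall(f".//{tag}")

    with ThreadPoolExecutor() as pool:
        return list(pool.map(parse, chunks))  # map() preserves chunk order
```

Threads keep the sketch short. Because parsing is CPU-bound, a `ProcessPoolExecutor` is likely to scale better in CPython. That change would require moving the worker function to module level so it can be pickled.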
Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces · CPC title
Parallelism detection · CPC title
Physics · mapped topic