Parallel processing database tree structure
US-2015379078-A1 · Dec 31, 2015 · US
US10262000B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10262000-B1 |
| Application number | US-201313921657-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 19, 2013 |
| Priority date | Jun 19, 2013 |
| Publication date | Apr 16, 2019 |
| Grant date | Apr 16, 2019 |
Official abstract text for this publication.
Techniques are provided for globally appending data from a group of distributed processes to a shared file using a log-structured file system. Data generated by a plurality of processes in a parallel computing system are appended to a shared file by storing the data to the shared file using a log-structured file system (such as a Parallel Log-Structured File System (PLFS)); and generating an index entry for the data, the index entry comprising a logical offset entry and a timestamp entry indicating a time of the storage, wherein the logical offset entry is resolved at read time. The logical offset entry can be populated with an append placeholder that is resolved when the shared file is read. At read time, a plurality of the index entries associated with the shared file can be sorted using the timestamp entry to deliver the requested shared file to a requesting application.
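The index-entry mechanism in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation or the PLFS API: the `IndexEntry` fields, the `APPEND` sentinel, and the `resolve_offsets` helper are hypothetical names. Ties between equal timestamps are broken by writer ID, which is also an assumption.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Hypothetical sentinel: the logical offset is unknown at write time.
APPEND = None


@dataclass(frozen=True)
class IndexEntry:
    writer_id: int            # which process wrote the chunk (assumed field)
    physical_offset: int      # position in that writer's own data log
    length: int               # chunk size in bytes
    timestamp: float          # time the chunk was stored
    logical_offset: Optional[int] = APPEND  # placeholder until read time


def resolve_offsets(entries: List[IndexEntry]) -> List[IndexEntry]:
    """Sort entries by timestamp and assign each a concrete logical offset."""
    ordered = sorted(entries, key=lambda e: (e.timestamp, e.writer_id))
    resolved = []
    cursor = 0  # running logical end of the shared file
    for entry in ordered:
        resolved.append(replace(entry, logical_offset=cursor))
        cursor += entry.length
    return resolved


# Entries as they might be recorded at write time, in no particular order.
entries = [
    IndexEntry(writer_id=1, physical_offset=0, length=4, timestamp=2.0),
    IndexEntry(writer_id=0, physical_offset=0, length=3, timestamp=1.0),
    IndexEntry(writer_id=1, physical_offset=4, length=5, timestamp=3.0),
]
resolved = resolve_offsets(entries)
offsets = [e.logical_offset for e in resolved]  # [0, 3, 7]
```

Because offsets are assigned only in `resolve_offsets`, writers in this sketch never need to coordinate on where the end of the file is.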
Opening claim text (preview).
What is claimed is:
1. A method for appending data generated by a plurality of processes in a parallel computing system to a shared file, comprising the steps of: storing, using at least one processing device, said data from said plurality of processes to a non-deterministic logical end of said shared file in a storage medium using a log-structured file system; generating, using at least one processing device, an index entry for said data, said index entry comprising a logical offset entry and a timestamp entry indicating a time of said storage into said shared file in said storage medium; and constructing a view of the shared file at read time by (i) sorting, at said read time, a plurality of said timestamp entries for said shared file indicating said time of said storage of said data from said plurality of processes into said shared file in said storage medium, and (ii) determining, at said read time, a deterministic location for each of a plurality of data chunks in the shared file based on the sorted timestamp entries, wherein the shared file is shared by said plurality of processes.
2. The method of claim 1, further comprising the step of populating said logical offset entry with an append placeholder that is resolved when said shared file is read.
3. The method of claim 1, wherein said sorting further comprises the step of reconstructing multiple write streams from said plurality of processes to a single logical file in a single read stream.
4. The method of claim 1, wherein said sorting defers a mapping of the deterministic location for each of a plurality of data chunks in said shared file until a reading application opens said shared file.
5. The method of claim 1, wherein said log-structured file system comprises a Parallel Log-Structured File System.
6. The method of claim 1, wherein said storing step further comprises the step of storing said data at a logical end of said shared file.
7. The method of claim 1, wherein said storing step creates a write stream for each of said plurality of processes.
8. The method of claim 7, wherein said write streams for said plurality of processes are reassembled into a single read stream at read time.
9. The method of claim 1, wherein said plurality of processes are running on a plurality of compute nodes.
10. The method of claim 1, wherein shared file is provided to a middleware virtual file system for storage.
11. The method of claim 1, wherein said shared file is stored on a parallel file system comprised of one or more disks.
12. A computer program product comprising a tangible machine-readable recordable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by the processor of the processing device implement the steps of the method of claim 1.
13. An apparatus for appending data generated by a plurality of processes in a parallel computing system to a shared file, comprising: a memory; and at least one processing device operatively coupled to the memory and configured to: store, using at least one processing device, said data from said plurality of processes to a non-deterministic logical end of said shared file in a storage medium using a log-structured file system; generate, using at least one processing device, an index entry for said data, said index entry comprising a logical offset entry and a timestamp entry indicating a time of said storage into said shared file in said storage medium; and construct a view of the shared file at read time by (i) sorting, at said read time, a plurality of said timestamp entries for said shared file indicating said time of said storage of said data from said plurality of processes into said shared file in said storage medium, and (ii) determining, at said read time, a deterministic location for each of a plurality of data chunks in the shared file based on the sorted timestamp entries, wherein the shared file is shared by said plurality of processes.
14. The apparatus of claim 13, wherein said at least one hardware device is further configured to populate said logical offset entry with an append placeholder that is resolved when said shared file is read.
15. The apparatus of claim 13, wherein said sorting further comprises reconstructing multiple write streams from said plurality of processes to a single logical file in a single read stream.
16. The apparatus of claim 13, wherein said sorting defers a mapping of the deterministic location for each of a plurality of data chunks in said shared file until a reading application opens said shared file.
17. The apparatus of claim 13, wherein said log-structured file system comprises a Parallel Log-Structured File System.
18. The apparatus of claim 13, wherein said data is stored at a logical end of said shared file.
19. The apparatus of claim 13, wherein a write stream is created for each of said plurality of processes.
20. The apparatus of claim 13, wherein said plurality of processes are running on a plurality of compute nodes.
21. The apparatus of claim 13, wherein shared file is stored on one or more of a middleware virtual file system one or more disks of a parallel file system.
22. A data storage system for appending data generated by a plurality of processes in a parallel computing system to a shared file, comprising: a storage medium for storing said shared file and an index entry; and a hardware processing unit for (i) storing said data from said plurality of processes to a non-deterministic logical end of said shared file using a log-structured file system; and generating, using said hardware processing unit, said index entry for said data, said index entry comprising a logical offset entry and a timestamp entry indicating a time of said storage into said shared file in said storage medium, and (ii) constructing a view of the shared file at read time by (a) sorting, at said read time, a plurality of said timestamp entries for said shared file indicating said time of said storage of said data from said plurality of processes into said shared file in said storage medium, and (b) determining, at said read time, a deterministic location for each of a plurality of data chunks in the shared file based on the sorted timestamp entries, wherein the shared file is shared by said plurality of processes.
23. The data storage system of claim 22, wherein said sorting further comprises reconstructing multiple write streams from said plurality of processes to a single logical file in a single read stream.
24. The data storage system of claim 22, wherein said sorting defers a mapping of the deterministic location for each of a plurality of data chunks in said shared file until a reading application opens said shared file.
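The claims covering per-process write streams reassembled into a single read stream could be illustrated roughly as below. This is a sketch under stated assumptions, not the claimed system: `WriteStream` and `read_shared_file` are hypothetical names, the logs live in memory instead of on a parallel file system, and a shared counter stands in for the storage timestamp. It also assumes each process appends with non-decreasing timestamps, which `heapq.merge` requires of its inputs.

```python
import heapq
import itertools


class WriteStream:
    """Hypothetical per-process log: raw data plus timestamped index records."""

    def __init__(self, process_id: int):
        self.process_id = process_id
        self.data = bytearray()
        # Each record: (timestamp, process_id, physical_offset, length)
        self.index = []

    def append(self, payload: bytes, timestamp: int) -> None:
        # No logical offset is recorded; only where the chunk sits in this log.
        self.index.append((timestamp, self.process_id, len(self.data), len(payload)))
        self.data += payload


def read_shared_file(streams) -> bytes:
    """Merge all index records by timestamp and concatenate the chunks."""
    by_id = {s.process_id: s for s in streams}
    merged = heapq.merge(*(s.index for s in streams))  # inputs already sorted
    out = bytearray()
    for _ts, pid, offset, length in merged:
        out += by_id[pid].data[offset:offset + length]
    return bytes(out)


# Assumption: a shared logical clock replaces wall-clock timestamps.
clock = itertools.count()
a, b = WriteStream(0), WriteStream(1)
a.append(b"A1 ", next(clock))
b.append(b"B1 ", next(clock))
a.append(b"A2 ", next(clock))
b.append(b"B2", next(clock))

result = read_shared_file([a, b])  # b"A1 B1 A2 B2"
```

Each process writes only to its own log, so appends never contend; the interleaving is decided once, when a reader asks for the file.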
File system administration, e.g. details of archiving or snapshots (error detection or correction of the data by redundancy in operations G06F11/14) · CPC title
Physics · mapped topic