US8949255B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-8949255-B1 |
| Application number | US-201213536384-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 28, 2012 |
| Priority date | Jun 28, 2012 |
| Publication date | Feb 3, 2015 |
| Grant date | Feb 3, 2015 |
Official abstract text for this publication.
Techniques are provided for storing files in a parallel computing system using sub-files with semantically meaningful boundaries. A method is provided for storing at least one file generated by a distributed application in a parallel computing system. The file comprises one or more of a complete file and a plurality of sub-files. The method comprises the steps of obtaining a user specification of semantic information related to the file; providing the semantic information as a data structure description to a data formatting library write function; and storing the semantic information related to the file with one or more of the sub-files in one or more storage nodes of the parallel computing system. The semantic information provides a description of data in the file. The sub-files can be replicated based on semantically meaningful boundaries.
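The steps in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the record layout in `SEMANTIC_INFO`, the helper `split_on_record_boundaries`, the `formatted_write` stand-in for a data formatting library write function, the round-robin placement, and the in-memory dictionaries that play the role of storage nodes. The point is only that sub-file boundaries fall where a whole record ends, and that the semantic description travels with each sub-file.

```python
import json
import struct

# Hypothetical user specification of semantic information: each record is
# a (timestep, temperature) pair packed as int32 + float64, little-endian.
SEMANTIC_INFO = {
    "record_format": "<id",
    "fields": ["timestep", "temperature"],
}


def split_on_record_boundaries(data, semantic_info, records_per_subfile):
    """Cut the file only where a whole record ends, never mid-record."""
    record_size = struct.calcsize(semantic_info["record_format"])
    if len(data) % record_size:
        raise ValueError("file length is not a whole number of records")
    step = record_size * records_per_subfile
    return [data[i:i + step] for i in range(0, len(data), step)]


def formatted_write(node, name, payload, semantic_info):
    """Hypothetical stand-in for a data formatting library write function."""
    node[name] = {"semantic_info": json.dumps(semantic_info), "payload": payload}


def store_file(records, semantic_info, nodes, records_per_subfile=2):
    """Pack records, split them into sub-files, and store each with its description."""
    fmt = semantic_info["record_format"]
    data = b"".join(struct.pack(fmt, *r) for r in records)
    subfiles = split_on_record_boundaries(data, semantic_info, records_per_subfile)
    for index, subfile in enumerate(subfiles):
        node = nodes[index % len(nodes)]  # round-robin placement (an assumption)
        formatted_write(node, f"subfile-{index}", subfile, semantic_info)
    return len(subfiles)


# Two in-memory dictionaries stand in for storage nodes.
nodes = [{}, {}]
count = store_file([(0, 20.5), (1, 21.0), (2, 19.75)], SEMANTIC_INFO, nodes)
```

Because the split size is a multiple of the record size, any sub-file can be decoded on its own using the description stored next to it.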
Opening claim text (preview).
What is claimed is:
1. A method for storing at least one file generated by a distributed application in a parallel computing system, wherein said file comprises a plurality of sub-files, said method comprising the steps of: obtaining a user specification of semantic information related to said file; determining semantically meaningful sub-file boundaries for said plurality of sub-files based on said semantic information; providing said semantic information as a data structure description to a data formatting library write function, wherein said semantic information is dependent upon a content of said file; and storing said semantic information related to said file with additional metadata related to said file and with one or more of said sub-files using said determined semantically meaningful sub-file boundaries in one or more storage nodes of said parallel computing system.
2. The method of claim 1, wherein said semantic information provides a description of data in said file.
3. The method of claim 1, wherein said sub-files are replicated based on said semantically meaningful boundaries.
4. The method of claim 1, further comprising the step of processing a query using said semantic information.
5. The method of claim 1, further comprising the step of performing an analysis of said file using said semantic information.
6. The method of claim 1, wherein a replication strategy can be specified for each of said plurality of said sub-files.
7. The method of claim 1, wherein said one or more storage nodes reside on one or more tiers of a multi-tier storage system.
8. The method of claim 1, wherein said semantic information related to a given sub-file is stored with said corresponding sub-file.
9. An apparatus for storing at least one file generated by a distributed application in a parallel computing system, wherein said file comprises a plurality of sub-files, comprising: a memory; and at least one hardware device operatively coupled to the memory and configured to: obtain a user specification of semantic information related to said file; determine semantically meaningful sub-file boundaries for said plurality of sub-files based on said semantic information; provide said semantic information as a data structure description to a data formatting library write function, wherein said semantic information is dependent upon a content of said file; and store said semantic information related to said file with additional metadata related to said file and with one or more of said sub-files using said determined semantically meaningful sub-file boundaries in one or more storage nodes of said parallel computing system.
10. The apparatus of claim 9, wherein said semantic information provides a description of data in said file.
11. The apparatus of claim 9, wherein said sub-files are replicated based on said semantically meaningful boundaries.
12. The apparatus of claim 9, wherein said at least one hardware device is further configured to process a query using said semantic information.
13. The apparatus of claim 9, wherein said at least one hardware device is further configured to perform an analysis of said file using said semantic information.
14. The apparatus of claim 9, wherein said one or more storage nodes reside on one or more tiers of a multi-tier storage system.
15. The apparatus of claim 9, wherein said semantic information related to a given sub-file is stored with said corresponding sub-file.
16. A data storage system for storing at least one file generated by a distributed application in a parallel computing system, wherein said file comprises a plurality of sub-files, comprising: a hardware processing device for obtaining a user specification of semantic information related to said file; determining semantically meaningful sub-file boundaries for said plurality of sub-files based on said semantic information; and for providing said semantic information as a data structure description to a data formatting library write function, wherein said semantic information is dependent upon a content of said file; and a memory for storing said semantic information related to said file with additional metadata related to said file and with one or more of said sub-files using said determined semantically meaningful sub-file boundaries in one or more storage nodes of said parallel computing system.
17. The data storage system of claim 16, wherein said semantic information provides a description of data in said file.
18. The data storage system of claim 16, wherein said sub-files are replicated based on said semantically meaningful boundaries.
19. The data storage system of claim 16, wherein said processing device is further configured to process a query using said semantic information.
20. The data storage system of claim 16, wherein said processing device is further configured to perform an analysis of said file using said semantic information.
21. The data storage system of claim 16, wherein said one or more storage nodes reside on one or more tiers of a multi-tier storage system.
22. The data storage system of claim 16, wherein said semantic information related to a given sub-file is stored with said corresponding sub-file.
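The dependent claims on query processing (claims 4, 12, 19), per-sub-file replication (claim 6), and storing semantic information with its sub-file (claims 8, 15, 22) can be sketched as follows. This is an illustrative reading, not the patented implementation: the `SubFile` class, the `timestep_range` entry in the semantic information, and the `query_timesteps` helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class SubFile:
    name: str
    rows: list           # decoded records: (timestep, temperature)
    semantic_info: dict  # kept alongside the sub-file it describes
    replicas: int = 1    # replication strategy chosen per sub-file


def make_subfile(name, rows, replicas=1):
    # Hypothetical semantic information: field names plus the timestep range.
    steps = [r[0] for r in rows]
    info = {
        "fields": ["timestep", "temperature"],
        "timestep_range": (min(steps), max(steps)),
    }
    return SubFile(name, rows, info, replicas)


def query_timesteps(subfiles, lo, hi):
    """Return rows with lo <= timestep <= hi, opening only sub-files that can match."""
    hits, opened = [], []
    for sf in subfiles:
        first, last = sf.semantic_info["timestep_range"]
        if last < lo or first > hi:
            continue  # the semantic info rules this sub-file out without reading it
        opened.append(sf.name)
        hits.extend(r for r in sf.rows if lo <= r[0] <= hi)
    return hits, opened


subfiles = [
    make_subfile("subfile-0", [(0, 20.5), (1, 21.0)]),
    make_subfile("subfile-1", [(2, 19.75), (3, 22.25)], replicas=3),
]
hits, opened = query_timesteps(subfiles, 2, 3)
```

In this sketch the query touches only `subfile-1`, since the range recorded for `subfile-0` shows it cannot contain a match.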
Details of searching files based on file metadata · CPC title