Methods and apparatus for capture and storage of semantic information with sub-files in a parallel computing system

US8949255B1 · US · B1

Patent metadata
Publication number: US-8949255-B1
Application number: US-201213536384-A
Country: US
Kind code: B1
Filing date: Jun 28, 2012
Priority date: Jun 28, 2012
Publication date: Feb 3, 2015
Grant date: Feb 3, 2015

Abstract

Official abstract text for this publication.

Techniques are provided for storing files in a parallel computing system using sub-files with semantically meaningful boundaries. A method is provided for storing at least one file generated by a distributed application in a parallel computing system. The file comprises one or more of a complete file and a plurality of sub-files. The method comprises the steps of obtaining a user specification of semantic information related to the file; providing the semantic information as a data structure description to a data formatting library write function; and storing the semantic information related to the file with one or more of the sub-files in one or more storage nodes of the parallel computing system. The semantic information provides a description of data in the file. The sub-files can be replicated based on semantically meaningful boundaries.
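The storage steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the record layout in `SEMANTICS`, the helper names, and the use of Python's `struct` module as a stand-in for the "data formatting library write function" are all assumptions made for illustration. The sketch shows only the core idea, namely that sub-files are cut at record boundaries derived from the user's description, and that the description is stored with each sub-file.

```python
import json
import struct

# Hypothetical user specification of semantic information: every record is
# (particle_id: int32, x: float64, y: float64), little-endian, no padding.
SEMANTICS = {
    "byte_order": "<",
    "record": [["particle_id", "i"], ["x", "d"], ["y", "d"]],
}


def record_struct(semantics):
    """Turn the semantic description into a struct layout (the stand-in
    for a data structure description handed to a formatting library)."""
    codes = "".join(code for _, code in semantics["record"])
    return struct.Struct(semantics["byte_order"] + codes)


def split_into_subfiles(records, semantics, records_per_subfile):
    """Pack records into sub-files that never split a record.

    Each sub-file carries a JSON header holding the semantic description
    plus a little extra metadata (first record index and record count).
    """
    fmt = record_struct(semantics)
    subfiles = []
    for start in range(0, len(records), records_per_subfile):
        chunk = records[start:start + records_per_subfile]
        header = {
            "semantics": semantics,
            "first_record": start,
            "record_count": len(chunk),
        }
        subfiles.append({
            "header": json.dumps(header),
            "payload": b"".join(fmt.pack(*r) for r in chunk),
        })
    return subfiles
```

Because every payload length is a whole multiple of the record size, any single sub-file can be decoded on its own from its header, which is what makes a boundary "semantically meaningful" in this sketch. A byte-offset split, by contrast, could cut a record in half.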

First claim

Opening claim text (preview).

What is claimed is:

1. A method for storing at least one file generated by a distributed application in a parallel computing system, wherein said file comprises a plurality of sub-files, said method comprising the steps of: obtaining a user specification of semantic information related to said file; determining semantically meaningful sub-file boundaries for said plurality of sub-files based on said semantic information; providing said semantic information as a data structure description to a data formatting library write function, wherein said semantic information is dependent upon a content of said file; and storing said semantic information related to said file with additional metadata related to said file and with one or more of said sub-files using said determined semantically meaningful sub-file boundaries in one or more storage nodes of said parallel computing system.

2. The method of claim 1, wherein said semantic information provides a description of data in said file.

3. The method of claim 1, wherein said sub-files are replicated based on said semantically meaningful boundaries.

4. The method of claim 1, further comprising the step of processing a query using said semantic information.

5. The method of claim 1, further comprising the step of performing an analysis of said file using said semantic information.

6. The method of claim 1, wherein a replication strategy can be specified for each of said plurality of said sub-files.

7. The method of claim 1, wherein said one or more storage nodes reside on one or more tiers of a multi-tier storage system.

8. The method of claim 1, wherein said semantic information related to a given sub-file is stored with said corresponding sub-file.

9. An apparatus for storing at least one file generated by a distributed application in a parallel computing system, wherein said file comprises a plurality of sub-files, comprising: a memory; and at least one hardware device operatively coupled to the memory and configured to: obtain a user specification of semantic information related to said file; determine semantically meaningful sub-file boundaries for said plurality of sub-files based on said semantic information; provide said semantic information as a data structure description to a data formatting library write function, wherein said semantic information is dependent upon a content of said file; and store said semantic information related to said file with additional metadata related to said file and with one or more of said sub-files using said determined semantically meaningful sub-file boundaries in one or more storage nodes of said parallel computing system.

10. The apparatus of claim 9, wherein said semantic information provides a description of data in said file.

11. The apparatus of claim 9, wherein said sub-files are replicated based on said semantically meaningful boundaries.

12. The apparatus of claim 9, wherein said at least one hardware device is further configured to process a query using said semantic information.

13. The apparatus of claim 9, wherein said at least one hardware device is further configured to perform an analysis of said file using said semantic information.

14. The apparatus of claim 9, wherein said one or more storage nodes reside on one or more tiers of a multi-tier storage system.

15. The apparatus of claim 9, wherein said semantic information related to a given sub-file is stored with said corresponding sub-file.

16. A data storage system for storing at least one file generated by a distributed application in a parallel computing system, wherein said file comprises a plurality of sub-files, comprising: a hardware processing device for obtaining a user specification of semantic information related to said file; determining semantically meaningful sub-file boundaries for said plurality of sub-files based on said semantic information; and for providing said semantic information as a data structure description to a data formatting library write function, wherein said semantic information is dependent upon a content of said file; and a memory for storing said semantic information related to said file with additional metadata related to said file and with one or more of said sub-files using said determined semantically meaningful sub-file boundaries in one or more storage nodes of said parallel computing system.

17. The data storage system of claim 16, wherein said semantic information provides a description of data in said file.

18. The data storage system of claim 16, wherein said sub-files are replicated based on said semantically meaningful boundaries.

19. The data storage system of claim 16, wherein said processing device is further configured to process a query using said semantic information.

20. The data storage system of claim 16, wherein said processing device is further configured to perform an analysis of said file using said semantic information.

21. The data storage system of claim 16, wherein said one or more storage nodes reside on one or more tiers of a multi-tier storage system.

22. The data storage system of claim 16, wherein said semantic information related to a given sub-file is stored with said corresponding sub-file.
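
The dependent claims on query processing (claims 4, 12, 19), per-sub-file replication (claim 6), and storing the semantic information with its own sub-file (claims 8, 15, 22) can be pictured with a small sketch. Everything here is hypothetical: the header layout, the `replicas` setting, and the function names are illustrative assumptions, and Python's `struct` and `json` modules stand in for whatever formatting library and metadata store a real system would use.

```python
import json
import struct


def make_subfile(records, fields, replicas):
    """Bundle packed records with the description needed to decode them.

    `fields` is a list of (name, struct_code) pairs; `replicas` is a
    hypothetical per-sub-file replication setting.
    """
    fmt = struct.Struct("<" + "".join(code for _, code in fields))
    header = {"fields": fields, "replicas": replicas}
    return {
        "header": json.dumps(header),
        "payload": b"".join(fmt.pack(*r) for r in records),
    }


def query_subfile(subfile, field, predicate):
    """Return the records whose `field` satisfies `predicate`, decoded
    using only the description stored with this sub-file."""
    header = json.loads(subfile["header"])
    names = [name for name, _ in header["fields"]]
    fmt = struct.Struct("<" + "".join(code for _, code in header["fields"]))
    column = names.index(field)
    return [r for r in fmt.iter_unpack(subfile["payload"]) if predicate(r[column])]


def replica_targets(subfile, nodes):
    """Pick storage nodes for this sub-file from its own replication setting."""
    replicas = json.loads(subfile["header"])["replicas"]
    return nodes[:replicas]
```

In this sketch a query such as `query_subfile(sf, "x", lambda v: v > 1.0)` needs no external schema, because each sub-file is self-describing, and `replica_targets` lets one sub-file be copied to three nodes while another is kept on one.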

Classifications

  • G06F16/14 (Primary)

    Details of searching files based on file metadata · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US8949255B1 cover?
Techniques are provided for storing files in a parallel computing system using sub-files with semantically meaningful boundaries. A method is provided for storing at least one file generated by a distributed application in a parallel computing system. The file comprises one or more of a complete file and a plurality of sub-files. The method comprises the steps of obtaining a user specification …
Who is the assignee on this patent?
Faibish Sorin, Bent John M, Tzelnic Percy, and 4 more
What technology area does this patent fall under?
Primary CPC classification G06F16/14. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 3, 2015 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).