Storing data files in a file system

US9355108B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9355108-B2
Application number  US-201314019014-A
Country             US
Kind code           B2
Filing date         Sep 5, 2013
Priority date       Nov 7, 2012
Publication date    May 31, 2016
Grant date          May 31, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A mechanism is provided for storing data files in a file system. The file system provides a plurality of reference data files, where each reference data file in the plurality of data files represents a group of similar data files. The mechanism creates a new data file and associates the new data file with one reference data file in the plurality of data files, thus defining an associated reference data file of the plurality of reference data files. The mechanism informs the file system about the association of the new data file with the associated reference data file. The mechanism compresses the new data file using the associated reference data file, thereby forming a compressed data file. The mechanism stores the compressed data file together with information about the association of the new data file with the associated reference data file.
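The flow in the abstract (associate a new file with a reference file, compress against it, store the result with the association) can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes zlib's preset-dictionary feature as a stand-in for compressing "using the associated reference data file", and the `REFERENCES` store, the `ref-a` identifier, and the record layout are hypothetical names chosen for illustration.

```python
import random
import zlib

# Hypothetical reference store: identifier -> content shared by a group of similar files.
_rng = random.Random(0)
REFERENCES = {"ref-a": bytes(_rng.randrange(256) for _ in range(2000))}


def compress_with_reference(data: bytes, reference: bytes) -> bytes:
    """Compress data, using the reference content as a zlib preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=reference)
    return compressor.compress(data) + compressor.flush()


def decompress_with_reference(payload: bytes, reference: bytes) -> bytes:
    """Invert compress_with_reference; needs the same reference content."""
    decompressor = zlib.decompressobj(zdict=reference)
    return decompressor.decompress(payload) + decompressor.flush()


def store(data: bytes, ref_id: str) -> dict:
    """Return a record holding the compressed payload plus its association."""
    payload = compress_with_reference(data, REFERENCES[ref_id])
    return {"reference": ref_id, "payload": payload}


def load(record: dict) -> bytes:
    """Look up the associated reference and restore the original content."""
    reference = REFERENCES[record["reference"]]
    return decompress_with_reference(record["payload"], reference)


# A new file that shares most of its content with the reference.
new_file = REFERENCES["ref-a"][:1500] + b"local edits"
record = store(new_file, "ref-a")
```

Because the record keeps the reference identifier next to the payload, `load(record)` can restore the file without any other context. The sketch only works while the reference content stays unchanged, which is why the association has to be stored with the compressed file.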

First claim

Opening claim text (preview).

The invention claimed is:

1. A method, in a data processing system, for storing data files in a file system, wherein the file system provides a plurality of reference data files, wherein each reference data file in the plurality of reference data files represents a group of similar data files, the method comprising:
    creating a new data file that comprises a plurality of smaller data files;
    determining whether there is an association of each of the plurality of smaller data files in the new data file with a reference data file in the plurality of reference data files and, responsive to identifying a reference data file associated with a smaller data file, defining an associated reference data file of the plurality of reference data files to the smaller data file of the new data file;
    informing the file system about each association of a smaller data file in the new data file to an associated reference data file in the plurality of reference data files;
    compressing the new data file thereby forming a compressed data file that is smaller in size due to each one of the one or more smaller data files in the new data file associated with the one or more associated reference data files being removed and replaced by information about the association of the one of the one or more smaller data files that are associated with a respective reference data file in the plurality of reference data files, wherein the new data file is compressed by the method comprising:
        receiving the new data file from an application;
        if information in the new data file indicates that one or more smaller data files in the new data file can be associated with one or more existing reference data files in the plurality of reference data files, for each smaller data file, determining the existing reference data file as the associated reference data file;
        if the information in the new data file indicates that one or more smaller data files in the new data file cannot be associated with an existing reference data file, associating each of the one or more smaller data files in the new data file that cannot be associated with an existing reference data file with a default reference data file as one or more new associated reference data files;
        compressing the new data file thereby forming a further compressed data file that is smaller in size due to each one of the one or more smaller data files in the new data file associated with the one or more associated reference data files being removed and replaced by information about the association of the one of the one or more smaller data files that are associated with a respective reference data file in the plurality of reference data files;
        providing a comparison result about a size of the further compressed data file to a size of the uncompressed data file; and
        deciding on storing the further compressed data file or the uncompressed data file depending on the comparison result; and
    storing the compressed data file together with the information about the association of the one of the one or more smaller data files in the new data file that are associated with a respective reference data file in the plurality of reference data files.

2. The method according to claim 1, wherein the reference data file is created by the method comprising: comparing a plurality of data files concerning at least one of a part of a file content, a file type, an origin of the plurality of data files; determining a part of contents of the plurality of data files being common to the plurality of data files; and storing the part of contents in the reference data file.

3. The method according to claim 1, wherein the reference data file is created by the method comprising: determining similar data files in the file system by a text analysis of the data files or an analysis of a file structure of the data files; determining a part of contents of the data files being common to the data files; and storing the part of contents in the reference data file.

4. The method according to claim 1, wherein the reference data file is created by the method comprising: determining similar data files in the file system by determining a similarity in file names of the data files stored in a directory of the file system; determining a part of contents of the data files being common to the data files; and storing the part of contents in the reference data file.

5. The method according to claim 1, wherein the new data is associated with the reference data file by the method comprising: responsive to no reference data file existing with respect to the smaller data files in the new data file, generating the associated reference data file for one or more of the smaller data files of the new data file to be stored in the file system; responsive to the reference data file existing with respect to one of the smaller data file in the new data file, selecting the reference data file as the associated reference data file for the smaller data file in the new data file to be stored in the file system; storing the associated reference data file and the smaller data file of the new data file in the file system; and informing the file system about the association of the smaller data file of the new data file with the associated reference data file.

6. The method according to claim 1, wherein defining the association of the smaller data file of the new data file with the reference data file is performed by the method comprising: associating the associated reference data file with a sub-tree of the file system; and storing the new data file in the sub-tree.

7. The method according to claim 1, wherein informing the file system about each association of a smaller data file in the new data file to an associated reference data file in the plurality of reference data files is performed by a file link command.

8. The method according to claim 1, wherein compressing the new data file is performed by a delta compressing method.

9. The method according to claim 1, wherein transliteration of the compressed data file associated with a previous associated reference data file with a new associated reference data file is performed by the method comprising: decompressing the compressed data file using the previous associated reference data file thereby creating a previous data file; and compressing the previous data file using the new associated reference data file thereby forming a new compressed data file; and storing the information about the new associated reference data file with the new compressed data file.

10. The method according to claim 9, further comprising: identifying a set of compressed data files in the file system being compressed with the previous associated reference data file; decompressing the set of compressed data files using the previous associated reference data file thereby creating, for each compressed data file in the set of compressed data files, a previous data file thereby forming a set of previous data files; compressing each of the set of previous data files using the new associated reference data file thereby forming a set of new compressed data files; providing, for each of the new compressed data files, a comparison result about a size of the previous data file being compressed with the new associated reference data file to a size of the previous data file being compressed with the previous associated reference data file; and deciding for each of the new compressed data files on storing being compressed with the new associated reference data file or with the previous associated reference data file.

11. The method according to claim 1, wherein the plurality of reference data files are
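Three steps recited in the claims lend themselves to a short illustration: the fallback to a default reference and the size comparison in claim 1, and the re-compression against a new reference in claims 9 and 10. The Python sketch below is one possible reading under stated assumptions, not the claimed method itself. It again uses zlib preset dictionaries as a stand-in compressor, and `DEFAULT_REF`, `resolve_reference`, `choose_storage`, `re_reference`, and `sample_bytes` are hypothetical names.

```python
import hashlib
import zlib

DEFAULT_REF = "default"  # hypothetical id of the default reference data file


def compress_with(data: bytes, reference: bytes) -> bytes:
    """Compress data with the reference content as a zlib preset dictionary."""
    c = zlib.compressobj(level=9, zdict=reference)
    return c.compress(data) + c.flush()


def decompress_with(payload: bytes, reference: bytes) -> bytes:
    """Restore data that was compressed against the same reference content."""
    d = zlib.decompressobj(zdict=reference)
    return d.decompress(payload) + d.flush()


def resolve_reference(ref_hint: str, references: dict) -> str:
    """Use the hinted reference if it exists, otherwise fall back to the default."""
    return ref_hint if ref_hint in references else DEFAULT_REF


def choose_storage(data: bytes, ref_hint: str, references: dict) -> dict:
    """Compress against the resolved reference; keep the result only if smaller."""
    ref_id = resolve_reference(ref_hint, references)
    payload = compress_with(data, references[ref_id])
    if len(payload) < len(data):
        return {"reference": ref_id, "payload": payload}
    return {"reference": None, "payload": data}  # stored uncompressed


def re_reference(record: dict, new_ref_id: str, references: dict) -> dict:
    """Recompress a stored record against a new reference if that is smaller."""
    if record["reference"] is None:
        original = record["payload"]
    else:
        original = decompress_with(record["payload"], references[record["reference"]])
    candidate = compress_with(original, references[new_ref_id])
    if len(candidate) < len(record["payload"]):
        return {"reference": new_ref_id, "payload": candidate}
    return record  # the previous reference still gives the smaller file


def sample_bytes(start: int, stop: int) -> bytes:
    """Deterministic, hard-to-compress sample content for the demo below."""
    return b"".join(hashlib.sha256(bytes([i])).digest() for i in range(start, stop))


references = {
    DEFAULT_REF: sample_bytes(200, 210),
    "old": sample_bytes(0, 30),
    "new": sample_bytes(0, 60),
}
data = sample_bytes(0, 50) + b"x"
stored = choose_storage(data, "old", references)    # size check as in claim 1
migrated = re_reference(stored, "new", references)  # reference swap as in claims 9 and 10
```

In this demo the "new" reference covers more of the file than the "old" one, so `re_reference` switches the record over. Running the same function over every record compressed with the old reference would correspond to the batch step in claim 10.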

Assignees

IBM

Inventors

Classifications

  • Indexing; Data structures therefor; Storage structures

  • based on delta files

  • using compression, e.g. sparse files

  • ICT programming tools or database systems specially adapted for bioinformatics

  • File or folder operations, e.g. details of user interfaces specifically adapted to file systems

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9355108B2 cover?
A mechanism is provided for storing data files in a file system. The file system provides a plurality of reference data files, where each reference data file in the plurality of data files represents a group of similar data files. The mechanism creates a new data file and associates the new data file with one reference data file in the plurality of data files thus defining an associated referen…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/1756. Mapped technology areas include Physics.
When was this patent published?
Publication date May 31, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).