Determining trusted file awareness via loosely connected events and file attributes
US-2024364713-A1 · Oct 31, 2024 · US
US2016306799A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016306799-A1 |
| Application number | US-201615198345-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 30, 2016 |
| Priority date | Aug 30, 2012 |
| Publication date | Oct 20, 2016 |
| Grant date | — |
Augmenting data files in a repository of an append-only file system includes maintaining a companion metadata file for each corresponding data file in a map-reduce system using the append-only file system. Each companion metadata file tracks a logical end-of-file (EOF) for each data file. Global versioning of each companion metadata file is maintained. A map-reduce append job is performed for a set of data files using a current global version number for the companion metadata file. The map-reduce job includes multiple append tasks. For each successful append job, a logical EOF for each appended file is incremented to a new physical EOF. For each failed append task of the append job, a logical EOF is maintained for each failed append task by not incrementing the logical EOF for each failed append task.
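The mechanism in the abstract can be illustrated with a minimal in-memory sketch. It assumes a `bytearray` per file as a stand-in for an append-only file (for example on HDFS), a dictionary of versioned records as the companion metadata files, and a payload-halving step to simulate the partial bytes a failed task leaves behind. The class and method names (`AppendOnlyRepo`, `CompanionMetadata`, `run_append_job`) are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class CompanionMetadata:
    """Hypothetical companion metadata record for one data file."""
    version: int
    logical_eof: int


class AppendOnlyRepo:
    """In-memory stand-in for data files on an append-only file system."""

    def __init__(self) -> None:
        self.data: dict[str, bytearray] = {}  # physical bytes, only ever extended
        self.meta: dict[str, dict[int, CompanionMetadata]] = {}  # versions per file
        self.global_version = 0

    def create(self, name: str) -> None:
        self.data[name] = bytearray()
        self.meta[name] = {0: CompanionMetadata(version=0, logical_eof=0)}

    def current_meta(self, name: str) -> CompanionMetadata:
        # The valid metadata is the highest version written for this file.
        versions = self.meta[name]
        return versions[max(versions)]

    def read(self, name: str) -> bytes:
        # Readers stop at the logical EOF, never the physical one.
        return bytes(self.data[name][: self.current_meta(name).logical_eof])

    def run_append_job(self, tasks: dict[str, tuple[bytes, bool]]) -> int:
        """Run one append job; each task maps a file to (payload, succeeds?)."""
        new_version = self.global_version + 1
        for name, (payload, succeeds) in tasks.items():
            if succeeds:
                self.data[name].extend(payload)
                physical_eof = len(self.data[name])
                # Success: publish a new metadata version at the new physical EOF.
                self.meta[name][new_version] = CompanionMetadata(new_version, physical_eof)
            else:
                # Failure (simulated): half the payload reaches disk, but no
                # metadata is written, so the logical EOF is not incremented.
                self.data[name].extend(payload[: len(payload) // 2])
        self.global_version = new_version
        return new_version


repo = AppendOnlyRepo()
repo.create("a")
repo.create("b")
repo.run_append_job({"a": (b"hello", True), "b": (b"world", False)})
print(repo.read("a"))  # b'hello'
print(repo.read("b"))  # b'' (partial bytes exist physically but are hidden)
```

In this sketch a failed task needs no cleanup: because readers trust only the latest metadata version, the stray bytes past the logical EOF stay invisible until a later step handles them.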
What is claimed is:
1. A method of augmenting data files in a repository of an append-only file system, comprising: maintaining a companion metadata file for each corresponding data file in a map-reduce system using the append-only file system, wherein each companion metadata file tracks a logical end-of-file (EOF) for each data file; maintaining global versioning of each companion metadata; performing a map-reduce append job for a set of data files using a current global version number for the companion metadata file, wherein the map-reduce job including multiple append tasks; for each successful append job, incrementing a logical EOF for each appended file to a new physical EOF; and for each failed append task of the append job, maintaining a logical EOF for each failed append task by not incrementing the logical EOF for each failed append task.
2. The method of claim 1, wherein global versioning is used to increment a valid companion metadata file version for each data file appended, and said valid companion metadata file version indicates the logical EOF corresponding to the new physical EOF for each of the data files appended.
3. The method of claim 2, wherein subsequent append tasks that read a data file for retrying failed append tasks use metadata to stop reading upon reaching the logical EOF for the failed append task even when a current physical EOF is not reached.
4. The method of claim 1, further comprising: for a failed data file append task, maintaining a current companion metadata file version for the data file, wherein partially appended bytes are ignored.
5. The method of claim 1, further comprising: for a failed append task: in a next successful append task updating the companion metadata file to skip a region corresponding to a failed append task; and in subsequent tasks, referring to the skipped region as an invalid region; and after a failed append task, in a subsequent append task, incrementing the logical EOF to a new physical EOF.
6. The method of claim 4, further comprising: using a single writer for write instructions to avoid concurrent writers; upon a determination that an existing metadata file exists with a version value set to a new version value, deleting the metadata file and creating a new metadata file on completion of a write instruction; and creating a new metadata file with the version set to the new version value.
7. The method of claim 4, further comprising: for each data file being read: setting a local version value of a file to a maximum metadata version value; reading a metadata file having the local version value; configuring a record reader for each record, split with invalid regions and the logical EOF; and reading data up to the logical EOF while skipping over invalid regions.
8. The method of claim 4, further comprising: performing periodic garbage collection comprising rewriting a data file, omitting invalid regions, updating the metadata file to purge all of the invalid regions, and pointing to the new logical EOF.
9. The method of claim 8, wherein garbage collection is performed while all other read instructions are stopped.
10. The method of claim 1, wherein the file system comprises a Hadoop Distributed File System (HDFS).
11. A computer program product for augmenting data files in a repository of an append-only file system, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, wherein the program instructions executable by a computer to cause the computer to: maintain, by the computer, a companion metadata file for each corresponding data file in a map-reduce system using the append-only file system, wherein each companion metadata file tracks a logical end-of-file (EOF) for each data file; maintain, by the computer, global versioning of each companion metadata file; perform, by the computer, a map-reduce append job for a set of data files a current global version number for the companion metadata file, wherein map-reduce job including multiple append tasks; for each successful append job, increment, by the computer, a logical EOF for each appended file to a new physical EOF; and for each failed append task of the append job, maintain, by the computer, a logical EOF for each failed append task by not incrementing the logical EOF for each failed append task.
12. The computer program product of claim 11, wherein the program instructions further cause the computer to: perform, by the computer, periodic garbage collection comprising rewriting a data file, omitting invalid regions, updating the metadata file to purge all of the invalid regions, and pointing to the new logical EOF.
13. The computer program product of claim 12, wherein garbage collection is performed while all other read instructions are stopped.
14. A storage device comprising: a memory storing instructions; and a processor configured to execute the instructions including: maintaining a companion metadata file in the memory for each corresponding data file in an append-only file system, wherein each companion metadata file tracks a logical end-of-file (EOF) for each data file; maintaining global versioning of each companion metadata in the memory; performing a map-reduce append job for a set of data files using a current global version number for the companion metadata file, wherein the map-reduce job including multiple append tasks; for each successful append job, incrementing a logical EOF for each appended file to a new physical EOF; and for each failed append task of the append job, maintaining a logical EOF for each failed append task by not incrementing the logical EOF for each failed append task.
15. The storage device of claim 14, wherein: the processor uses global versioning to increment a valid companion metadata file version for each data file appended; said valid companion metadata file version indicates the logical EOF corresponding to the new physical EOF for each of the data files appended; and subsequent append tasks that read a data file for retrying failed append tasks use metadata to stop reading upon reaching the logical EOF for the failed append task even when a current physical EOF is not reached.
16. The storage device of claim 14, wherein the processor is further configured to perform further instructions including: for a failed append task: in a next successful append task updating the companion metadata file to skip a region corresponding to a failed append task; and in subsequent tasks, referring to said region as an invalid region; and after a failed append task, in a subsequent append task, incrementing the logical EOF to a new physical EOF.
17. The storage device of claim 14, wherein the processor is further configured to perform further instructions including: causing only a single writer to perform write instructions to avoid concurrent writers performing write instructions; upon a determination that an existing metadata file exists with a version value set to a new version value, deleting the metadata file and creating a new metadata file in the memory on completion of a write instruction; and creating a new metadata file in the memory with the version set to the new version value.
18. The storage device of claim 14, wherein the processor is further configured to perform further instructions including: for each data file being read: setting a local version value of a data file to a maximum
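Claims 5, 7, 8 and 9 describe what happens to the bytes a failed task leaves behind: the next successful append records them as an invalid region, readers use the maximum metadata version, skip invalid regions and stop at the logical EOF, and periodic garbage collection rewrites the file without those regions. The sketch below walks through that lifecycle under simplifying assumptions: an in-memory `bytearray` stands in for the data file, the caller supplies each new global version number, a failed task is simulated by writing half its payload, and the names `DataFile`, `Meta`, `successful_append`, `failed_append` and `garbage_collect` are hypothetical illustrations rather than the patented implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Meta:
    """Hypothetical companion metadata version for one data file."""
    version: int
    logical_eof: int
    invalid: list[tuple[int, int]] = field(default_factory=list)  # [start, end) byte ranges


class DataFile:
    def __init__(self) -> None:
        self.physical = bytearray()  # bytes on "disk", only extended between GCs
        self.versions: dict[int, Meta] = {0: Meta(0, 0)}

    def latest(self) -> Meta:
        # Readers and writers use the maximum metadata version (cf. claim 7).
        return self.versions[max(self.versions)]

    def failed_append(self, payload: bytes) -> None:
        # Simulated failure: partial bytes land, but no metadata version is written.
        self.physical.extend(payload[: len(payload) // 2])

    def successful_append(self, payload: bytes, version: int) -> None:
        prev = self.latest()
        invalid = list(prev.invalid)
        if len(self.physical) > prev.logical_eof:
            # Bytes past the old logical EOF came from failed tasks: mark them invalid.
            invalid.append((prev.logical_eof, len(self.physical)))
        self.physical.extend(payload)
        # Advance the logical EOF to the new physical EOF under the new version.
        self.versions[version] = Meta(version, len(self.physical), invalid)

    def read(self) -> bytes:
        meta = self.latest()
        out = bytearray()
        pos = 0
        for start, end in sorted(meta.invalid):
            out += self.physical[pos:start]  # keep valid bytes before the region
            pos = end  # jump over the invalid region
        out += self.physical[pos:meta.logical_eof]  # stop at the logical EOF
        return bytes(out)

    def garbage_collect(self, version: int) -> None:
        # Rewrite the file without invalid regions and purge them from the
        # metadata; assumes all readers are paused meanwhile (cf. claim 9).
        compacted = self.read()
        self.physical = bytearray(compacted)
        self.versions = {version: Meta(version, len(compacted))}


f = DataFile()
f.successful_append(b"abc", version=1)
f.failed_append(b"XXXX")  # leaves b"XX" physically, at bytes 3..5
f.successful_append(b"def", version=2)
print(f.read())  # b'abcdef'
f.garbage_collect(version=3)
print(bytes(f.physical))  # b'abcdef'
```

Recording the invalid region lazily, at the next successful append, keeps the failure path free of metadata writes; the cost is that readers must skip regions until garbage collection compacts the file.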
Management specifically adapted to NAS (management of storage area networks [SAN] G06F3/067) · CPC title
Versioning file systems, temporal file systems, e.g. file system supporting different historic versions of files · CPC title
File meta data generation · CPC title
Append-only file systems, e.g. using logs or journals to store data · CPC title
Physics · mapped topic