Atomic incremental load for map-reduce systems on append-only file systems

US9424271B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9424271-B2
Application number: US-201213600181-A
Country: US
Kind code: B2
Filing date: Aug 30, 2012
Priority date: Aug 30, 2012
Publication date: Aug 23, 2016
Grant date: Aug 23, 2016

Abstract

Official abstract text for this publication.

Augmenting data files in a repository of an append-only file system comprises maintaining metadata corresponding to each data file for tracking a logical end-of-file (EOF) for each data file for appending. A global versioning mechanism for the metadata allows selecting the current version of the metadata to read for performing an append job for a set of data files. Each append job comprises multiple append tasks. For each successful append job, the global versioning mechanism increments a valid metadata version to use for each data file appended. Said valid metadata version indicates the logical EOF corresponding to a new physical EOF for each of the data files appended.
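The commit behavior described in the abstract can be illustrated with a small in-memory sketch. This is not the patented implementation: the `Repo` class, its fields, and the `should_fail` flag are hypothetical names introduced only for illustration, and a Python dictionary stands in for the companion metadata files. The sketch assumes that readers stop at the logical EOF recorded under the newest committed metadata version, and that the global version advances only when every task of the job succeeds.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy in-memory stand-in for an append-only repository (hypothetical)."""
    files: dict = field(default_factory=dict)     # name -> bytearray of physical bytes
    metadata: dict = field(default_factory=dict)  # (name, version) -> logical EOF
    global_version: int = 0                       # last committed metadata version

    def logical_eof(self, name):
        # Use the newest metadata version that is not newer than the global version.
        for version in range(self.global_version, 0, -1):
            if (name, version) in self.metadata:
                return self.metadata[(name, version)]
        return 0

    def read(self, name):
        # Modified read: stop at the logical EOF even if more bytes exist physically.
        return bytes(self.files.get(name, bytearray())[: self.logical_eof(name)])

    def append_job(self, tasks):
        """tasks: list of (file name, payload, should_fail). Returns True on commit."""
        next_version = self.global_version + 1
        written = []
        ok = True
        for name, payload, should_fail in tasks:
            buf = self.files.setdefault(name, bytearray())
            if should_fail:
                buf += payload[: len(payload) // 2]  # partial bytes stay in the file
                ok = False
                continue
            buf += payload
            self.metadata[(name, next_version)] = len(buf)  # logical EOF = new physical EOF
            written.append((name, next_version))
        if ok:
            self.global_version = next_version  # one step publishes the whole job
        else:
            # Assumption of this sketch: drop uncommitted metadata so that a later
            # job cannot publish it by accident.
            for key in written:
                del self.metadata[key]
        return ok


repo = Repo()
repo.append_job([("a", b"abc", False)])                         # commits version 1
repo.append_job([("a", b"def", False), ("b", b"wxyz", True)])   # one task fails
print(repo.global_version, repo.read("a"), repo.read("b"))
```

After the second job, readers still see only the bytes committed by the first job, although file `a` has grown physically. The sketch deliberately leaves out one thing: a later successful append would also have to skip the leftover bytes from the failed job, which the claims handle through invalid regions.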

First claim

Opening claim text (preview).

What is claimed is:

1. A method of augmenting data files in a repository of an append-only file system, comprising: maintaining a companion metadata file for each corresponding data file in a map-reduce system using the append-only file system, wherein each companion metadata file tracks a logical end-of-file (EOF) for each data file; maintaining global versioning of each companion metadata file for selecting a current version of EOF metadata to read for a corresponding data file; performing an append job for a set of data files using a modified read protocol for each reading task of the repository using a current global version number for the companion metadata file, wherein the append job comprises a map-reduce job including multiple append tasks; and for each successful append job, incrementing a logical EOF for each appended file to a new physical EOF, wherein the global versioning is used to increment a valid companion metadata file version for each data file appended, and said valid companion metadata file version indicates the logical EOF corresponding to the new physical EOF for each of the data files appended; and for each failed append task of the append job, maintaining a logical EOF for each failed append task by not incrementing the logical EOF for each failed append task, wherein subsequent append tasks that read a data file for retrying failed append tasks use metadata to stop reading upon reaching the logical EOF for the failed append task even when a current physical EOF is not reached.

2. The method of claim 1, further comprising: for a failed data file append task, maintaining a current companion metadata file version for the data file, wherein partially appended bytes are ignored.

3. The method of claim 1, further comprising: for a failed append task, in a next successful append task updating the companion metadata file to skip a region corresponding to a failed append task.

4. The method of claim 3, further comprising: for a failed append task, in subsequent tasks, referring to said region as an invalid region.

5. The method of claim 4, further comprising: after a failed append task, in a subsequent append task, incrementing the logical EOF to a new physical EOF.

6. The method of claim 5, further comprising: for subsequent successful append tasks, updating the companion metadata file for skipping the invalid regions corresponding to a failed append task.

7. The method of claim 6, further comprising: updating a global version of a companion metadata file when the append job comprising multiple append tasks succeeds, wherein a modified write protocol is used for writing to the repository, the modified write protocol augments data files with metadata files, and the global version number for each current metadata file is stored in the repository in a separate file.

8. The method of claim 7, further comprising: not updating the global version of the companion metadata file if the append job fails even if one or more of the constituent tasks of the job succeeded.

9. The method of claim 1, wherein the file system comprises an HDFS file system.

10. A method of data storage, comprising: augmenting data files in a repository of an append-only file system as a map-reduce job in a map-reduce system by atomic incremental load, including: maintaining a separate end-of-file (EOF) metadata file for each corresponding data file, wherein each EOF metadata file tracks a logical EOF for each data file; maintaining global versioning of the EOF metadata files for selecting the current version of an EOF metadata file to read, wherein different versions of EOF metadata files replace a previous versioned EOF metadata file; performing an append task of an append job for a data file using a modified read protocol for each reading task of the repository using a current global version number for the EOF metadata files, wherein the append job comprises a map-reduce job including multiple append tasks; and for each successful append job, incrementing a logical EOF for each appended file to a new physical EOF, wherein the global versioning is used to increment a valid companion metadata file version for each data file appended, and the valid companion metadata file version indicates the logical EOF corresponding to the new physical EOF for each data file appended; and for each failed append task of the append job, maintaining a logical EOF for each failed append task by not incrementing the logical EOF for each failed append task, wherein subsequent append tasks that read a data file for retrying failed append tasks use metadata to stop reading upon reaching the logical EOF for the failed append task even when a current physical EOF is not reached.

11. A computer program product for augmenting data files in a repository of an append-only file system, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, wherein the program instructions executable by a computer to cause the computer to perform a method comprising: maintaining a companion metadata file for each corresponding data file in a map-reduce system using the append-only file system, wherein each companion metadata file tracks a logical end-of-file (EOF) for each data file; maintaining global versioning of each companion metadata file for selecting a current version of a companion metadata file to read EOF metadata, wherein different versioned companion metadata files replace previous versioned companion metadata files; performing an append job for a set of data files using a modified read protocol for each reading task of the repository using a current global version number for the companion metadata file, wherein the append job comprises a map-reduce job including multiple append tasks; and for each successful append job, incrementing a logical EOF for each appended file to a new physical EOF, wherein the global versioning is used to increment a valid companion metadata file version to use for each data file appended, and the valid companion metadata file version indicates the logical EOF corresponding to the new physical EOF for each of the data files appended; and for each failed append task of the append job, maintaining a logical EOF for each failed append task by not incrementing the logical EOF for each failed append task, wherein subsequent append tasks that read a data file for retrying failed append tasks use metadata to stop reading upon reaching the logical EOF for the failed append task even when a current physical EOF is not reached.

12. The computer program product of claim 11, further comprising: for a failed append task, in a next successful append task updating the companion metadata file to skip a region corresponding to a failed append task.

13. The computer program product of claim 12, further comprising: for a failed append task, in subsequent tasks, referring to said region as an invalid region.

14. The computer program product of claim 13, further comprising: after a failed append task, in a subsequent append task, incrementing the logical EOF to a new physical EOF.

15. The computer program product of claim 14, further comprising: for subsequent successful append tasks, updating the companion metadata file for skipping the invalid regions corresponding to a failed append task.

16. The computer program product of claim 15, further comprising: updating a global version for a companion metadata file when the append job comprising multiple append tasks succeeds, wherein a modified write protocol is
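
The invalid-region handling in the dependent claims (skipping the bytes left by a failed append task once a later task succeeds) might look roughly like the following. This is a minimal sketch under stated assumptions, not the claimed implementation: `FileMeta`, `read_valid`, `append`, and the `fail` flag are hypothetical names, and a failed task is simulated by writing only half of its payload.

```python
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    """Hypothetical companion metadata for one data file."""
    logical_eof: int = 0
    invalid: list = field(default_factory=list)  # (start, end) byte ranges to skip


def read_valid(data, meta):
    """Return bytes up to the logical EOF, skipping regions marked invalid."""
    out = bytearray()
    pos = 0
    for start, end in sorted(meta.invalid):
        out += data[pos:start]
        pos = end
    out += data[pos:meta.logical_eof]
    return bytes(out)


def append(data, meta, payload, fail=False):
    """Append payload to data; return the metadata that is valid afterwards."""
    if fail:
        data += payload[: len(payload) // 2]  # partial bytes reach the file
        return meta                           # logical EOF is not advanced
    physical_eof = len(data)
    invalid = list(meta.invalid)
    if physical_eof > meta.logical_eof:
        # Bytes between the old logical EOF and the physical EOF were left
        # behind by a failed task, so record them as an invalid region.
        invalid.append((meta.logical_eof, physical_eof))
    data += payload
    return FileMeta(logical_eof=len(data), invalid=invalid)


data, meta = bytearray(), FileMeta()
meta = append(data, meta, b"AAAA")             # succeeds
meta = append(data, meta, b"BBBB", fail=True)  # fails after a partial write
meta = append(data, meta, b"BBBB")             # retry succeeds
print(read_valid(data, meta), meta.invalid)
```

In this sketch the retry writes after the partial bytes, since an append-only file cannot be truncated, and the new metadata both advances the logical EOF to the new physical EOF and records the gap as a region for readers to skip.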

Assignees

Inventors

Classifications

  • G06F16/164 · Primary

    File meta data generation · CPC title

  • Versioning file systems, temporal file systems, e.g. file system supporting different historic versions of files · CPC title

  • Management specifically adapted to NAS (management of storage area networks [SAN] G06F3/067) · CPC title

  • Append-only file systems, e.g. using logs or journals to store data · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9424271B2 cover?
Augmenting data files in a repository of an append-only file system comprises maintaining metadata corresponding to each data file for tracking a logical end-of-file (EOF) for each data file for appending. A global versioning mechanism for the metadata allows selecting the current version of the metadata to read for performing an append job for a set of data files. Each append job comprises mul…
Who is the assignee on this patent?
Tata Sandeep, IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/164. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 23, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).