Deduplicating data integrity checks across systems

US11983147B2 · US · B2

Patent metadata
Publication number: US-11983147-B2
Application number: US-202117337086-A
Country: US
Kind code: B2
Filing date: Jun 2, 2021
Priority date: Jun 2, 2021
Publication date: May 14, 2024
Grant date: May 14, 2024

Abstract

Official abstract text for this publication.

A computer-implemented method, according to one embodiment, includes: receiving, at a clustered filesystem from a formatted filesystem, a request to perform a data integrity check for a portion of data. A determination is made as to whether the request includes a filesystem type of the portion of data, and in response to determining that the request includes a filesystem type of the portion of data, another determination is made as to whether the clustered filesystem supports the data integrity check for the filesystem type. In response to determining the clustered filesystem supports the data integrity check, another determination is made as to whether the portion of data is currently available. Furthermore, the computer-implemented method includes causing the data integrity check to be performed in response to determining that the portion of data is currently available. Results of performing the data integrity check are also sent to the formatted filesystem.
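The sequence of determinations in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `CheckRequest`, `CheckReply`, `Outcome`, `handle_request`, and the callbacks `is_available` and `run_check` are hypothetical names. The abstract does not say what is returned when a determination fails, so the early-exit outcomes below are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Set


class Outcome(Enum):
    # The three early exits are assumed; the abstract only describes the success path.
    MISSING_TYPE = "request does not include a filesystem type"
    UNSUPPORTED = "check not supported for this filesystem type"
    UNAVAILABLE = "portion of data is not currently available"
    CHECKED = "check performed on the clustered filesystem side"


@dataclass
class CheckRequest:
    data_id: str                    # hypothetical identifier for the portion of data
    fs_type: Optional[str] = None   # None models a request that carries no filesystem type


@dataclass
class CheckReply:
    outcome: Outcome
    results: Optional[dict] = None  # results sent back to the formatted filesystem


def handle_request(
    request: CheckRequest,
    supported_types: Set[str],
    is_available: Callable[[str], bool],
    run_check: Callable[[str], dict],
) -> CheckReply:
    """Apply the three determinations in order, then run the check."""
    # 1. Does the request include a filesystem type?
    if request.fs_type is None:
        return CheckReply(Outcome.MISSING_TYPE)
    # 2. Does the clustered filesystem support the check for that type?
    if request.fs_type not in supported_types:
        return CheckReply(Outcome.UNSUPPORTED)
    # 3. Is the portion of data currently available?
    if not is_available(request.data_id):
        return CheckReply(Outcome.UNAVAILABLE)
    # 4. Cause the check to be performed and return its results.
    return CheckReply(Outcome.CHECKED, run_check(request.data_id))


reply = handle_request(
    CheckRequest("volume-1", "ext4"),
    supported_types={"ext4", "xfs"},
    is_available=lambda data_id: True,
    run_check=lambda data_id: {"data_id": data_id, "errors_found": 0},
)
print(reply.outcome.name, reply.results)
```

Passing the availability test and the check itself as callbacks keeps the ordering of the determinations separate from how any particular filesystem would implement them.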

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: receiving, at a clustered filesystem from a formatted filesystem, a first request to perform a data integrity check for a portion of data; determining whether the first request includes a filesystem type of the portion of data, wherein determining whether the first request includes the filesystem type of the portion of data includes determining whether the portion of data has a metadata tag which corresponds to the portion of data; in response to a determination that the portion of data does not have a metadata tag, sending instructions to a FSCK daemon to add a metadata tag on a file that includes the portion of data; receiving a second request to perform the data integrity check for the portion of data, wherein the second request includes the metadata tag appended thereto, wherein the metadata tag appended to the second request is in a form of a file header; extracting the metadata tag appended to the second request from the second request; using the metadata tag extracted from the second request and the metadata tag added to the file for determining whether a second request includes the filesystem type of the portion of data, in response to a determination that the second request includes the filesystem type of the portion of data, determining whether the clustered filesystem supports the data integrity check for the filesystem type of the portion of data; in response to a determination the clustered filesystem supports the data integrity check for the filesystem type of the portion of data, determining whether the portion of data is currently available; in response to a determination that the portion of data is currently available, causing the data integrity check to be performed by the clustered filesystem on the portion of data, wherein the data integrity check is not performed by the formatted filesystem; and sending results of performing the data integrity check to the formatted filesystem.

2. The computer-implemented method of claim 1, wherein the data integrity check is a FSCK operation, wherein the FSCK operation includes: updating an allocation map to indicate blocks as free that have been incorrectly allocated, creating directory entries for files and/or directories that have inodes allocated but for which no directory entries exist, removing directory entries that point to directory entries which include metadata that contradicts metadata stored in the respective inodes, and updating link counts on files and/or directories to reflect accurate numbers.

3. The computer-implemented method of claim 2, comprising: establishing a communication channel between a FSCK process in the formatted filesystem and a FSCK process in the clustered filesystem, wherein the first request is received from the formatted filesystem along the communication channel, wherein causing the data integrity check to be performed by the clustered filesystem on the portion of data includes: sending one or more instructions to the FSCK daemon to perform the data integrity check.

4. The computer-implemented method of claim 1, wherein the results of performing the data integrity check that are sent to the formatted filesystem include FSCK timestamps.

5. The computer-implemented method of claim 1, wherein determining whether the portion of data is currently available includes: determining, from the metadata tag added to the file, whether the portion of data has been exported from the clustered filesystem and/or is currently used as local storage by the formatted filesystem.

6. The computer-implemented method of claim 1, wherein the portion of data is determined to have a metadata tag which corresponds to the portion of data, wherein the determined metadata tag which corresponds to the portion of data includes information associated with the portion of data, wherein the information includes application container details.

7. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions are readable and/or executable by a processor to cause the processor to: receive, by the processor at a backend clustered filesystem from a formatted filesystem, a request to perform a data integrity check for a portion of data; determine, by the processor, whether an extended attribute appended to the request includes a filesystem type of the portion of data, wherein the extended attribute is in a form of a file header; in response to a determination that the extended attribute appended to the request includes a filesystem type of the portion of data, determine, by the processor, whether the backend clustered filesystem supports the data integrity check for the filesystem type of the portion of data; in response to a determination that the backend clustered filesystem supports the data integrity check for the filesystem type of the portion of data, determine, by the processor, whether the portion of data is currently available; in response to a determination that the portion of data is currently available, cause, by the processor, the data integrity check to be performed on the portion of data using the backend clustered filesystem, and in order to prevent operational duplicity, preventing the data integrity check from being performed in the formatted filesystem by not promoting the performance of the data integrity check in the formatted filesystem, wherein the data integrity check is a filesystem consistency check (FSCK) operation performed on a file that includes the portion of data; and send, by the processor, results of performing the data integrity check to the formatted filesystem.

8. The computer program product of claim 7, wherein the program instructions are readable and/or executable by the processor to cause the processor to: establish, by the processor, a communication channel between a FSCK process in the formatted filesystem and a FSCK process in the backend clustered filesystem, wherein the request is received from the formatted filesystem along the communication channel, wherein causing the data integrity check to be performed on the portion of data includes: sending one or more instructions to a FSCK daemon associated with the FSCK process in the backend clustered filesystem to perform the data integrity check.

9. The computer program product of claim 8, wherein causing the data integrity check to be performed on the portion of data includes: sending one or more instructions to an FSCK daemon to perform the FSCK operation.

10. The computer program product of claim 7, wherein the results of performing the data integrity check include FSCK timestamps.

11. The computer program product of claim 7, wherein determining whether the portion of data is currently available includes: determining whether the portion of data has been exported from the backend clustered filesystem and/or is currently used as local storage by the formatted filesystem.

12. The computer program product of claim 7, wherein the portion of data has a metadata tag which corresponds to the portion of data, wherein the metadata tag includes information associated with the portion of data, wherein the information includes filesystem type.

13. A system, comprising: a processor; and logic integrated with the processor, executable by the processor, or integrated with and executable by the processor, the logic being configured to: receive, by the processor at a clustered filesystem of a first environment from a formatted filesystem of a second environment having data storage devices with data physically stored therein, a …
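Two of the FSCK repairs listed in claim 2 (creating directory entries for allocated inodes that have none, and updating link counts) can be illustrated with a toy in-memory model. The sketch below is not the patent's implementation or any real filesystem's on-disk logic. `Inode`, `repair`, and the `lost+found/#N` naming are assumptions made for illustration, and the model ignores details such as the extra links that real directories carry.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Inode:
    number: int
    link_count: int  # value stored in the inode, possibly stale


def repair(inodes: List[Inode], entries: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Fix link counts in place and return directory entries created for orphans.

    `entries` is a flat list of (name, inode_number) directory entries.
    """
    # Count how many directory entries actually point at each inode.
    referenced = Counter(number for _name, number in entries)
    created = []
    for inode in inodes:
        if referenced[inode.number] == 0:
            # Allocated inode with no directory entry: create one (assumed naming).
            created.append((f"lost+found/#{inode.number}", inode.number))
            referenced[inode.number] = 1
        # Update the stored link count to the number of entries found.
        inode.link_count = referenced[inode.number]
    return created


inodes = [Inode(1, 1), Inode(2, 5), Inode(3, 1)]
entries = [("a.txt", 1), ("b.txt", 2), ("b-link.txt", 2)]
created = repair(inodes, entries)
print(created)                          # [('lost+found/#3', 3)]
print([i.link_count for i in inodes])   # [1, 2, 1]
```

In this example inode 2 stored a link count of 5 but only two entries point at it, so the count is corrected to 2. Inode 3 has no entry at all, so one is created for it.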

Assignees

IBM

Classifications

  • G06F16/174 · Primary

    Redundancy elimination performed by the file system (error detection or correction of the data by redundancy in operations G06F11/14) · CPC title

  • Saving storage space on storage systems · CPC title

  • in relation to data integrity, e.g. data losses, bit errors · CPC title

  • De-duplication techniques · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11983147B2 cover?
It covers a computer-implemented method in which a clustered filesystem receives, from a formatted filesystem, a request to perform a data integrity check for a portion of data. The clustered filesystem determines whether the request includes a filesystem type for that data, whether it supports the check for that filesystem type, and whether the data is currently available. If all three hold, it causes the check to be performed and sends the results to the formatted filesystem.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/174. Mapped technology areas include Physics.
When was this patent published?
It was published on May 14, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).