Grouping of objects into clusters in an object-based storage environment
US-2020134082-A1 · Apr 30, 2020 · US
US12001411B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12001411-B2 |
| Application number | US-202318161592-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 30, 2023 |
| Priority date | Apr 10, 2019 |
| Publication date | Jun 4, 2024 |
| Grant date | Jun 4, 2024 |
Official abstract text for this publication.
Methods, computer program products, and computer systems for the management of data references in an efficient and effective manner are disclosed. Such methods, computer program products, and computer systems include receiving a change tracking stream at the computer system, identifying a data object group, and performing a deduplication management operation on the data object group. The change tracking stream is received from a client computing system. The change tracking stream identifies one or more changes made to a plurality of data objects of the client computing system. The identifying is based, at least in part, on at least a portion of the change tracking stream. The data object group represents the plurality of data objects.
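The three steps named in the abstract (receive a change tracking stream, identify a data object group from it, operate on the group as a unit) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `ChangeEntry` fields, the function names, and the use of a plain list as the reference list are hypothetical and are not taken from the patent text. Group deletion is used as the example management operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEntry:
    """One entry of a change tracking stream (field names are hypothetical)."""
    object_id: str
    change: str


def identify_group(stream):
    """Return the distinct object ids named in the stream, in first-seen order."""
    return list(dict.fromkeys(entry.object_id for entry in stream))


def delete_group(groups, reference_list, group_id):
    """Delete a whole group and drop its single group reference."""
    members = groups.pop(group_id)
    reference_list.remove(group_id)  # one dereference, not one per object
    return members


# Hypothetical stream: two objects, one of them changed twice.
stream = [
    ChangeEntry("obj-a", "write"),
    ChangeEntry("obj-b", "write"),
    ChangeEntry("obj-a", "truncate"),
]
groups = {"group-1": identify_group(stream)}
reference_list = ["group-1"]
removed = delete_group(groups, reference_list, "group-1")
```

In this sketch the reference list holds one entry per group, so removing the group costs a single dereference regardless of how many objects the group contains.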
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, implemented in a computer system, comprising: grouping a plurality of data objects into a plurality of data object groups, wherein the grouping is performed based, at least in part, on one or more characteristics of the plurality of data object groups and one or more thresholds, the plurality of data objects is or will be stored in a deduplicated file system, the deduplicated file system is stored in a storage unit of the computer system, the deduplicated file system is configured to be configured for access as a user-space file system by another computer system, and one or more units of data by a deduplication service of a deduplication system, one or more data objects of the plurality of data objects are in each data object group of the plurality of data object groups, and the grouping the plurality of data objects comprises for each data object of the plurality of data objects, identifying an identified data object group of the plurality of data object groups, based, at least in part, on one or more characteristics of one or more of the plurality of data object groups, recording a reference in a group record of a plurality of group records in a catalog for the deduplicated file system, wherein the reference identifies the each data object, the group record is for the identified data object group; for the each data object group of the plurality of data object groups, adding a group reference for the each data object group to a reference list, when the group reference for the each data object group is not already added to the reference list, wherein the group reference represents the one or more data objects of the each data object group by identifying the each data object group; and for another data object, determining whether the another data object can be added to a data object group of the plurality of data object groups, in response to a determination that the another data object can be added to the data object group, adding the another data object to the data object group, and in response to a determination that the another data object cannot be added to the data object group, creating a new data object group, and adding the another data object to the new data object group.
2. The method of claim 1, wherein the plurality of data objects are listed in a change tracking list generated by a change tracker that records write operations to the change tracking list, the group reference is one of a plurality of group references, each group reference of the plurality of group references is a data object group identifier of a plurality of data object group identifiers, and each data object group identifier of the plurality of data object group identifiers corresponds to a data group of the plurality of data object groups.
3. The method of claim 1, further comprising: detecting a backup operation, wherein the backup operation is performed on the deduplicated file system, and in response to detection of the backup operation, performing the grouping and recording.
4. The method of claim 1, wherein the grouping further comprises: receiving a change tracking stream, wherein the change tracking stream identifies one or more changes made to one or more data objects of the plurality of data objects; and for each data object of the one or more data objects of the plurality of data objects, adding the each data object to one of the plurality of data object groups.
5. The method of claim 4, wherein the adding the each data object to the one of the plurality of data object groups comprises: associating the one or more data objects of the plurality of data objects with the each data object group, wherein the one or more data objects of the plurality of data objects are associated with the each data object group of the plurality of data object groups by the recording the reference in metadata for the one or more data objects.
6. The method of claim 4, wherein the determining whether the each data object can be added to the one of the plurality of data object groups is based, at least in part, on at least one threshold.
7. The method of claim 6, wherein the at least one threshold comprises at least one of a number of data objects that can be included in the one of the plurality of data object groups, or an amount of data of data objects of the one of the plurality of data object groups that can be included in the one of the plurality of data object groups.
8. The method of claim 6, wherein the creating comprises creating a new data object group record for the new data object group, and associating the each data object with the new data object group, comprising storing a data object identifier for the each data object in the new data object group record, and storing a data object group identifier in metadata for the each data object.
9. The method of claim 8, wherein the creating the new data object group record comprises generating the data object group identifier, and the data object identifier is retrieved from an entry of the change tracking stream corresponding to the each data object.
10. The method of claim 1, further comprising: receiving a change tracking stream, wherein the change tracking stream identifies one or more changes made to one or more data objects of the plurality of data objects, and each entry of the change tracking stream comprises a data object identifier of a plurality of data object identifiers, the data object identifier identifying a corresponding one of the one or more data objects of the plurality of data objects, and information identifying a change to the corresponding one of the one or more data objects of the plurality of data objects.
11. The method of claim 1, further comprising: performing, for one of the plurality of data object groups, at least one of a data object write operation for the one of the plurality of data object groups, a container reference update operation for the one of the plurality of data object groups, or a path object update operation for the one of the plurality of data object groups.
12. The method of claim 1, further comprising: performing a deduplication management operation on one of the plurality of data object groups, wherein the deduplication management operation is one of a group deletion operation, a group update operation, or a group merge operation, and the deduplication management operation is performed on the one of the plurality of data object groups, rather than on one or more data objects of the one of the plurality of data object groups.
13. The method of claim 12, wherein the group reference is one of a plurality of group references, the deduplication management operation is the group deletion operation, which comprises identifying a first data object group to be deleted, deleting the first data object group, and performing a dereference operation on a first group reference of the plurality of group references, and the first group reference refers to the first data object group.
14. The method of claim 12, wherein the deduplication management operation is the group update operation, which comprises creating another data object group, deleting a first data object from a first data object group of the plurality of data object groups, and deleting a second data object from a second data object group of the plurality of data object groups.
15. The method of claim 12, wherein the deduplication management operation is a group merge operation, comprising cr
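The threshold-based grouping recited in claims 1 and 6 to 8 can be pictured with a short Python sketch. It is one possible reading, not the patented implementation: the `Catalog` and `GroupRecord` classes, the first-fit placement rule, the `group-N` identifier scheme, and the specific threshold values are all assumptions chosen for illustration. The two thresholds correspond to the object-count and data-amount limits named in claim 7.

```python
from dataclasses import dataclass, field


@dataclass
class GroupRecord:
    """Group record in the catalog (hypothetical layout)."""
    group_id: str
    object_ids: list = field(default_factory=list)
    total_bytes: int = 0


class Catalog:
    """Hypothetical catalog that groups objects under two thresholds."""

    def __init__(self, max_objects, max_bytes):
        self.max_objects = max_objects  # threshold: objects per group
        self.max_bytes = max_bytes      # threshold: data per group
        self.records = []               # group records
        self.reference_list = []        # one group reference per group
        self.object_metadata = {}       # object id -> group id

    def _fits(self, record, size):
        # Can this object be added without crossing either threshold?
        return (len(record.object_ids) < self.max_objects
                and record.total_bytes + size <= self.max_bytes)

    def add(self, object_id, size):
        # First-fit search over existing groups (an assumed policy).
        record = next((r for r in self.records if self._fits(r, size)), None)
        if record is None:
            # No group can take the object: create a new group record.
            record = GroupRecord(group_id=f"group-{len(self.records) + 1}")
            self.records.append(record)
        record.object_ids.append(object_id)
        record.total_bytes += size
        self.object_metadata[object_id] = record.group_id
        # Add the group reference only if it is not already listed.
        if record.group_id not in self.reference_list:
            self.reference_list.append(record.group_id)
        return record.group_id


catalog = Catalog(max_objects=2, max_bytes=100)
assigned = [catalog.add(oid, size) for oid, size in
            [("obj-a", 40), ("obj-b", 50), ("obj-c", 30), ("obj-d", 90)]]
```

With these assumed thresholds, the third object opens a second group because the first has reached its object count, and the fourth opens a third group because it would exceed the byte limit of the second. The reference list ends up with three entries for four objects.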
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Change logging, detection, and notification (replication G06F16/27) · CPC title
using de-duplication of the data · CPC title