Backup server selection based on data commonality
US-10496322-B2 · Dec 3, 2019 · US
US11226935B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11226935-B2 |
| Application number | US-201916359445-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 20, 2019 |
| Priority date | Apr 28, 2018 |
| Publication date | Jan 18, 2022 |
| Grant date | Jan 18, 2022 |
Abstract
Techniques determine (or detect) duplicated data. The techniques involve: in response to determining that data at a first position in input data is the same as predetermined data, determining a feature value of a selected portion of the input data; determining whether the feature value matches a pre-stored duplicated data pattern in a duplicated data pattern list; and in response to determining that the feature value matches the duplicated data pattern, determining an association of the input data with reference data that is associated with the matched pattern.
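The layered check described in the abstract can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation. The following are all hypothetical:

* the 8-byte portion length,
* the four pattern-aligned sample offsets,
* the use of CRC-32 (`zlib.crc32`) as the feature value,
* the `dict` standing in for the duplicated data pattern list,
* the names `sample_offsets` and `detect_duplicate`.

Reference data is assumed to be the matched pattern repeated to the block length.

```python
import zlib
from typing import Dict, List, Optional

# Hypothetical parameters; the patent text does not fix these values.
PATTERN_LEN = 8   # length of the "first data portion" in bytes
NUM_SAMPLES = 4   # number of predetermined sample locations

def sample_offsets(block_len: int) -> List[int]:
    """Pattern-aligned offsets spread roughly evenly across the block."""
    step = block_len // (NUM_SAMPLES + 1)
    return [(i * step) // PATTERN_LEN * PATTERN_LEN for i in range(1, NUM_SAMPLES + 1)]

def detect_duplicate(block: bytes, patterns: Dict[int, bytes]) -> Optional[bytes]:
    """Return the matched (non-empty) pattern if `block` is a known duplicate, else None."""
    head = block[:PATTERN_LEN]
    # Layer 1: cheap check that the head reappears at the sampled offsets.
    if any(block[o:o + PATTERN_LEN] != head for o in sample_offsets(len(block))):
        return None
    # Layer 2: look up the head's feature value (CRC-32 here) in the pattern list.
    ref = patterns.get(zlib.crc32(head))
    if ref is None:
        return None
    # Layer 3: compare the whole block with reference data (the pattern repeated).
    reference = (ref * (len(block) // len(ref) + 1))[:len(block)]
    return ref if block == reference else None
```

The layers run from cheapest to most expensive. Most unique blocks are therefore rejected after a few slice comparisons. The full-block comparison runs only when the head already matches a known pattern, and it also guards against CRC collisions. A `None` result means the block is treated as unique and its data is stored as-is.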
Claims
We claim:
1. A method of determining duplicated data, comprising: in a first layer of comparison, determining that a first data portion of input data is the same as data from a plurality of predetermined locations of the input data; in response to determining that the first data portion of the input data is the same as the data from the plurality of predetermined locations of the input data, determining a feature value of the first data portion of the input data; in a second layer of comparison, determining that the feature value of the first data portion of the input data is matched with a pre-stored duplicated data pattern in a duplicated data pattern list; in response to determining that the feature value of the first data portion of the input data is matched with the pre-stored duplicated data pattern in the duplicated data pattern list, determining an association of the first data portion with corresponding reference data associated with the pre-stored duplicated data pattern; in a third layer of comparison, determining that the association indicates that the first data portion is not associated with the corresponding reference data; and in response to determining that the association indicates that the first data portion is not associated with the corresponding reference data, storing the input data.
2. The method according to claim 1, wherein the plurality of predetermined locations are adjacent to a plurality of equidistant locations between a starting portion and an ending portion of the input data, respectively.
3. The method according to claim 1, wherein the data from the plurality of predetermined locations are based on combining data in a second number of bytes at each of a first number of the plurality of predetermined locations of the input data.
4. The method according to claim 1, wherein determining the association of the first data portion with the corresponding reference data comprises: determining, from the first data portion, a plurality of data portions having a predetermined length; determining, based on the reference data, a plurality of second data portions having a predetermined length; and determining the association based on a comparison of the plurality of first data portions determined from the first data portion and the plurality of second data portions determined based on the reference data.
5. An apparatus for determining duplicated data, comprising: a memory configured to store one or more programs; a processing unit coupled to the memory and configured to execute the one or more programs to cause the apparatus to perform acts comprising: in a first layer of comparison, determining that a first data portion of input data is the same as data from a plurality of predetermined locations of the input data; in response to determining that the first data portion of the input data is the same as the data from the plurality of predetermined locations of the input data, determining a feature value of the first data portion of the input data; in a second layer of comparison, determining that the feature value of the first data portion of the input data is matched with a pre-stored duplicated data pattern in a duplicated data pattern list; in response to determining that the feature value of the first data portion of the input data is matched with the pre-stored duplicated data pattern in the duplicated data pattern list, determining an association of the first data portion with corresponding reference data associated with the pre-stored duplicated data pattern; in a third layer of comparison, determining that the association indicates that the first data portion is not associated with the corresponding reference data; and in response to determining that the association indicates that the first data portion is not associated with the corresponding reference data, storing the input data.
6. The apparatus according to claim 5, wherein the plurality of predetermined locations are adjacent to a plurality of equidistant locations between a starting portion and an ending portion of the input data, respectively.
7. The apparatus according to claim 5, wherein the data from the plurality of predetermined locations are based on combining data in a second number of bytes at each of a first number of the plurality of predetermined locations of the input data.
8. The apparatus according to claim 5, wherein determining the association of the first data portion with the corresponding reference data comprises: determining, from the first data portion, a plurality of data portions having a predetermined length; determining, based on the reference data, a plurality of second data portions having a predetermined length; and determining the association based on a comparison of the plurality of data portions determined from the first data portion and the plurality of second data portions determined based on the reference data.
9. A computer program product having a non-transitory computer readable medium that stores a set of instructions to detect duplicated data received by a data storage array; the set of instructions, when carried out by the data storage array, causing the data storage array to perform a method of: in a first layer of comparison, determining that a first data portion of input data is the same as the data from a plurality of predetermined locations of the input data; in response to determining that the first data portion of the input data is the same as the data from the plurality of predetermined locations of the input data, determining a feature value of the first data portion of the input data; in a second layer of comparison, determining that the feature value of the first data portion of the input data is matched with a pre-stored duplicated data pattern in a duplicated data pattern list; in response to determining that the feature value of the first data portion of the input data is matched with the pre-stored duplicated data pattern in the duplicated data pattern list, determining an association of the first data portion with corresponding reference data associated with the pre-stored duplicated data pattern; in a third layer of comparison, determining that the association indicates that the first data portion is not associated with the corresponding reference data; and in response to determining that the association indicates that the first data portion is not associated with the corresponding reference data, storing the input data.
10. The computer program product of claim 9, further comprising: in response to determining that the feature value of the first data portion of the input data is matched with the pre-stored duplicated data pattern in the duplicated data pattern list, applying reclaimed processing cycles to provide other computerized services.
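Claims 2 to 4 narrow how the first and third comparison layers gather and compare data. The Python sketch below shows one possible reading, under the following assumptions:

* Two bytes are sampled at each of four locations ("second number" = 2, "first number" = 4). The combined sample therefore has the same 8-byte length as the first data portion.
* Each location is snapped to a pattern-aligned offset adjacent to an equidistant point.
* Portions are compared in 64-byte units.

The function names `combined_sample`, `chunks`, `is_associated`, and `should_store` and all default values are hypothetical and do not come from the patent.

```python
from typing import List

def combined_sample(block: bytes, first_number: int = 4, second_number: int = 2) -> bytes:
    """Concatenate `second_number` bytes taken next to each of `first_number`
    equidistant points in the block (one reading of claims 2 and 3)."""
    portion_len = first_number * second_number
    step = len(block) // (first_number + 1)
    parts: List[bytes] = []
    for i in range(first_number):
        point = (i + 1) * step
        # Snap to the adjacent offset whose phase matches slice i of the head.
        offset = point - point % portion_len + i * second_number
        parts.append(block[offset:offset + second_number])
    return b"".join(parts)

def chunks(data: bytes, length: int) -> List[bytes]:
    """Split data into consecutive portions of a predetermined length."""
    return [data[i:i + length] for i in range(0, len(data), length)]

def is_associated(block: bytes, pattern: bytes, chunk_len: int = 64) -> bool:
    """Compare fixed-length portions of the block with portions built from a
    non-empty reference pattern (one reading of claim 4)."""
    if chunk_len % len(pattern):
        raise ValueError("chunk_len must be a multiple of the pattern length")
    ref_chunk = pattern * (chunk_len // len(pattern))
    # The last chunk may be shorter, so compare it with a prefix of ref_chunk.
    return all(c == ref_chunk[:len(c)] for c in chunks(block, chunk_len))

def should_store(block: bytes, pattern: bytes) -> bool:
    """Third layer: store the input data only if it is not associated with the reference."""
    return not is_associated(block, pattern)
```

For a block made of a repeated 8-byte pattern, each snapped offset lands on the matching slice of the pattern, so `combined_sample(block) == block[:8]` holds. A block that differs only in a byte beyond the sampled offsets still passes this first layer. It then fails `is_associated`, so `should_store` returns `True` and the input data is kept.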
Design, administration or maintenance of databases · CPC title
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
Matching criteria, e.g. proximity measures · CPC title
by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination · CPC title
Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system · CPC title