Method, device and computer program product for determining duplicated data

US11226935B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11226935-B2
Application number	US-201916359445-A
Country	US
Kind code	B2
Filing date	Mar 20, 2019
Priority date	Apr 28, 2018
Publication date	Jan 18, 2022
Grant date	Jan 18, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques determine (or detect) duplicated data. The techniques involve: in response to determining that data at a first position in input data is the same as predetermined data, determining a feature value of a selected portion of input data; determining whether the feature value matches with a pre-stored duplicated data pattern in a duplicated data pattern list; and in response to determining that the feature value matches with the duplicated data pattern, determining an association of the input data with reference data which is associated with the matched pattern.
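
The flow in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the block size, the 64-byte "selected portion", the use of SHA-256 as the feature value, the two reference blocks, and every name (`find_reference`, `PATTERN_LIST`, and so on) are hypothetical choices that do not come from the patent text.

```python
import hashlib
from typing import Dict, Optional

BLOCK_SIZE = 4096   # assumed block size
PROBE_LEN = 64      # assumed length of the "selected portion"


def feature_value(portion: bytes) -> str:
    """Feature value of a selected portion; SHA-256 is a stand-in choice."""
    return hashlib.sha256(portion).hexdigest()


# Hypothetical reference blocks: all-zero and all-0xFF data.
REFERENCES = [bytes(BLOCK_SIZE), b"\xff" * BLOCK_SIZE]

# Duplicated data pattern list: feature value -> associated reference data.
PATTERN_LIST: Dict[str, bytes] = {
    feature_value(ref[:PROBE_LEN]): ref for ref in REFERENCES
}

# Predetermined data expected at the first position (byte 0 in this sketch).
PREDETERMINED_FIRST_BYTES = {ref[:1] for ref in REFERENCES}


def find_reference(block: bytes) -> Optional[bytes]:
    """Return the matching reference block, or None if the block looks unique."""
    # Cheap gate: skip hashing unless the first byte is a known value.
    if block[:1] not in PREDETERMINED_FIRST_BYTES:
        return None
    # Look the feature value up in the duplicated data pattern list.
    reference = PATTERN_LIST.get(feature_value(block[:PROBE_LEN]))
    if reference is None:
        return None
    # Determine the association; a full byte comparison is assumed here.
    return reference if block == reference else None
```

In this sketch most non-duplicate blocks are rejected by the one-byte check, so the hash and the full comparison run only for likely candidates.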

First claim

Opening claim text (preview).

We claim:

1. A method of determining duplicated data, comprising: in a first layer of comparison, determining that a first data portion of input data is the same as data from a plurality of predetermined locations of the input data; in response to determining that the first data portion of the input data is the same as the data from the plurality of predetermined locations of the input data, determining a feature value of the first data portion of the input data; in a second layer of comparison, determining that the feature value of the first data portion of the input data is matched with a pre-stored duplicated data pattern in a duplicated data pattern list; in response to determining that the feature value of the first data portion of the input data is matched with the pre-stored duplicated data pattern in the duplicated data pattern list, determining an association of the first data portion with corresponding reference data associated with the pre-stored duplicated data pattern; in a third layer of comparison, determining that the association indicates that the first data portion is not associated with the corresponding reference data; and in response to determining that the association indicates that the first data portion is not associated with the corresponding reference data, storing the input data.

2. The method according to claim 1, wherein the plurality of predetermined locations are adjacent to a plurality of equidistant locations between a starting portion and an ending portion of the input data, respectively.

3. The method according to claim 1, wherein the data from the plurality of predetermined locations are based on combining data in a second number of bytes at each of a first number of the plurality of predetermined locations of the input data.

4. The method according to claim 1, wherein determining the association of the first data portion with the corresponding reference data comprises: determining, from the first data portion, a plurality of data portions having a predetermined length; determining, based on the reference data, a plurality of second data portions having a predetermined length; and determining the association based on a comparison of the plurality of first data portions determined from the first data portion and the plurality of second data portions determined based on the reference data.

5. An apparatus for determining duplicated data, comprising: a memory configured to store one or more programs; a processing unit coupled to the memory and configured to execute the one or more programs to cause the apparatus to perform acts comprising: in a first layer of comparison, determining that a first data portion of input data is the same as data from a plurality of predetermined locations of the input data; in response to determining that the first data portion of the input data is the same as the data from the plurality of predetermined locations of the input data, determining a feature value of the first data portion of the input data; in a second layer of comparison, determining that the feature value of the first data portion of the input data is matched with a pre-stored duplicated data pattern in a duplicated data pattern list; in response to determining that the feature value of the first data portion of the input data is matched with the pre-stored duplicated data pattern in the duplicated data pattern list, determining an association of the first data portion with corresponding reference data associated with the pre-stored duplicated data pattern; in a third layer of comparison, determining that the association indicates that the first data portion is not associated with the corresponding reference data; and in response to determining that the association indicates that the first data portion is not associated with the corresponding reference data, storing the input data.

6. The apparatus according to claim 5, wherein the plurality of predetermined locations are adjacent to a plurality of equidistant locations between a starting portion and an ending portion of the input data, respectively.

7. The apparatus according to claim 5, wherein the data from the plurality of predetermined locations are based on combining data in a second number of bytes at each of a first number of the plurality of predetermined locations of the input data.

8. The apparatus according to claim 5, wherein determining the association of the first data portion with the corresponding reference data comprises: determining, from the first data portion, a plurality of data portions having a predetermined length; determining, based on the reference data, a plurality of second data portions having a predetermined length; and determining the association based on a comparison of the plurality of data portions determined from the first data portion and the plurality of second data portions determined based on the reference data.

9. A computer program product having a non-transitory computer readable medium that stores a set of instructions to detect duplicated data received by a data storage array; the set of instructions, when carried out by the data storage array, causing the data storage array to perform a method of: in a first layer of comparison, determining that a first data portion of input data is the same as from a plurality of predetermined locations of the input data; in response to determining that the first data portion of the input data is the same as the data from the plurality of predetermined locations of the input data, determining a feature value of the first data portion of the input data; in a second layer of comparison, determining that the feature value of the first data portion of the input data is matched with a pre-stored duplicated data pattern in a duplicated data pattern list; in response to determining that the feature value of the first data portion of the input data is matched with the pre-stored duplicated data pattern in the duplicated data pattern list, determining an association of the first data portion with corresponding reference data associated with the pre-stored duplicated data pattern; in a third layer of comparison, determining that the association indicates that the first data portion is not associated with the corresponding reference data; and in response to determining that the association indicates that the first data portion is not associated with the corresponding reference data, storing the input data.

10. The computer program product of claim 9, further comprising: in response to determining that the feature value of the first data portion of the input data is matched with the pre-stored duplicated data pattern in the duplicated data pattern list, applying reclaimed processing cycles to provide other computerized services.
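
One way to read the three layers of comparison in claims 1 to 4 is sketched below in Python. It is an illustrative interpretation, not the claimed method itself. The constants `WIDTH`, `COUNT`, and `CHUNK`, the choice of SHA-256 as the feature value, and all function names are assumptions made for this example. Storing the input when an earlier layer fails is also an assumption, since claim 1 only recites storing after the third layer finds no association.

```python
import hashlib
from typing import Dict, List

WIDTH = 8    # assumed bytes read at each location (the "second number")
COUNT = 4    # assumed number of sampled locations (the "first number")
CHUNK = 512  # assumed "predetermined length" used in the third layer


def sample_offsets(length: int) -> List[int]:
    """Equidistant offsets between the start and end of the input (claim 2)."""
    step = (length - WIDTH) // (COUNT - 1)
    return [i * step for i in range(COUNT)]


def first_layer(data: bytes) -> bool:
    """Compare the leading bytes with the combined sampled bytes (claim 3)."""
    if len(data) < WIDTH:
        return False
    head = data[:WIDTH]
    sampled = b"".join(data[o:o + WIDTH] for o in sample_offsets(len(data)))
    return sampled == head * COUNT


def feature(portion: bytes) -> str:
    """Feature value of the first data portion; SHA-256 is a stand-in."""
    return hashlib.sha256(portion).hexdigest()


def split(data: bytes) -> List[bytes]:
    """Cut data into portions of the predetermined length."""
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]


def third_layer(data: bytes, reference: bytes) -> bool:
    """Portion-by-portion comparison with the reference data (claim 4)."""
    return split(data) == split(reference)


def handle_block(data: bytes, patterns: Dict[str, bytes], store: List[bytes]) -> str:
    """Run the three layers; store the input unless it matches a reference."""
    if first_layer(data):
        # Second layer: look the feature value up in the pattern list.
        reference = patterns.get(feature(data[:WIDTH]))
        if reference is not None and third_layer(data, reference):
            return "duplicate"
    store.append(data)
    return "stored"
```

As a usage example, a pattern list built as `{feature(bytes(8)): bytes(4096)}` treats an all-zero 4096-byte block as a duplicate. A block of zeros with a single non-zero byte at offset 100 passes the first two layers but is stored after the third.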

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

G06F16/21 (Primary): Design, administration or maintenance of databases
G06F16/1748 (Primary): De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641)
G06F18/22: Matching criteria, e.g. proximity measures
G06F18/2115: by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
G06F18/285: Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system

Patent family

Related publications grouped by family.

Patent family 68292561

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11226935B2 cover? Techniques determine (or detect) duplicated data. The techniques involve: in response to determining that data at a first position in input data is the same as predetermined data, determining a feature value of a selected portion of input data; determining whether the feature value matches with a pre-stored duplicated data pattern in a duplicated data pattern list; and in response to determining…
Who is the assignee on this patent? EMC IP Holding Co LLC
What technology area does this patent fall under? Primary CPC classification G06F16/21. Mapped technology areas include Physics.
When was this patent published? Publication date Jan 18, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Backup server selection based on data commonality

Automated charge backup modelling

Method for replicating data in a backup storage system using a cost function

Method for replicating data in a backup storage system using a cost function

Methods and apparatus for data processing using data compression, linked lists and de-duplication techniques
