Method and device for deduplication
US-2019034449-A1 · Jan 31, 2019 · US
US12105973B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12105973-B2 |
| Application number | US-202016875981-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 15, 2020 |
| Priority date | Mar 25, 2020 |
| Publication date | Oct 1, 2024 |
| Grant date | Oct 1, 2024 |
Official abstract text for this publication.
A storage device may include storage for data. A host interface may receive a write request from a host at the storage device. The write request may include a data chunk and a data identifier (ID). A class ID determiner circuitry may determine a class ID for the data chunk. A mapping table may map the data ID to the class ID.
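The write path described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `LossyStore` class, the byte-wise `difference` function, and the numeric `accuracy` threshold are assumptions chosen only to show how a data ID might be mapped to a class ID rather than to the chunk itself. The patent leaves the choice of classifier open (a similarity function, difference function, classifier, or neural network).

```python
from dataclasses import dataclass, field


def difference(a: bytes, b: bytes) -> float:
    """Fraction of byte positions that differ (a hypothetical difference function)."""
    if len(a) != len(b):
        return 1.0
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


@dataclass
class LossyStore:
    # class ID -> representative data chunk (stands in for the device's storage)
    representatives: dict = field(default_factory=dict)
    # mapping table: data ID -> class ID
    data_to_class: dict = field(default_factory=dict)
    next_class_id: int = 0

    def determine_class_id(self, chunk: bytes, accuracy: float) -> int:
        """Return an existing class whose representative is similar enough,
        otherwise create a new class with this chunk as its representative."""
        for class_id, rep in self.representatives.items():
            if 1.0 - difference(chunk, rep) >= accuracy:
                return class_id
        class_id = self.next_class_id
        self.next_class_id += 1
        self.representatives[class_id] = chunk
        return class_id

    def write(self, data_id: str, chunk: bytes, accuracy: float = 0.8) -> int:
        """Handle a write request: classify the chunk, then record data ID -> class ID."""
        class_id = self.determine_class_id(chunk, accuracy)
        self.data_to_class[data_id] = class_id
        return class_id
```

In this sketch, two chunks that differ in one byte out of ten fall into the same class at an accuracy of 0.8, so only the first chunk is kept as the representative. That is the sense in which such a device would be lossy.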
Opening claim text (preview).
What is claimed is:
1. A lossy storage device, comprising: storage for a data; a host interface to receive a write request from a host at the storage device, the write request including a data chunk and a data identifier (ID); class ID determiner circuitry to generate a class ID value identifying a class ID from the data chunk, wherein the class ID represents the data chunk; and a mapping table to map the data ID to the class ID, wherein the class ID is different from the data ID and a physical address in the storage where the data chunk is stored.
2. The lossy storage device according to claim 1, wherein: the write request includes a write data; and the storage device further comprises a data chunk circuitry to divide the write data into the data chunk and a second data chunk using a chunk size.
3. The lossy storage device according to claim 2, wherein the chunk size is associated with one of at least a block size of a block-based storage device or a sector size of a hard disk drive.
4. The lossy storage device according to claim 1, wherein: an accuracy level is associated with the write request by a machine, the accuracy level representing how similar the data chunk and a representative data chunk for the class ID should be for the data chunk to be assigned to the class ID; and the class ID determiner circuitry is configured to generate the class ID value identifying the class ID from the data chunk within the accuracy level associated with the write request.
5. The lossy storage device according to claim 1, wherein: the class ID determiner circuitry determines a first confidence level for the class ID; and the storage device further comprises a second class ID determiner to generate a second class ID value identifying a second class ID and a second confidence level from the data chunk, wherein the first confidence level represents a first degree of certainty in the class ID by the class ID determiner circuitry, and wherein the second confidence level represents a second degree of certainty in the second class ID by the second class ID determiner circuitry.
6. The lossy storage device according to claim 5, further comprising a class ID selector circuitry to select between the class ID and the second class ID using the first confidence level and the second confidence level.
7. The lossy storage device according to claim 1, further comprising a second mapping table to map the class ID to the physical address in the storage, where the data chunk is stored at the physical address in the storage of the storage device.
8. The lossy storage device according to claim 1, wherein: the storage includes a representative data chunk assigned to the class ID stored at the physical address; and the storage device further comprises: a persistence policy; and an update circuitry to update the representative data chunk to a second representative data chunk assigned to the class ID stored at the physical address using the persistence policy.
9. The storage device according to claim 1, wherein the class ID determiner circuitry is configured to generate a second class ID value identifying the class ID from a second data chunk, wherein the second data chunk is different from the data chunk.
10. The lossy storage device according to claim 1, wherein for each class ID, only one data chunk is stored in the storage.
11. A method, comprising: receiving a write request from a host at a lossy storage device, the write request including a data chunk; generating a class identifier (ID) value identifying a class ID from the data chunk; and storing a mapping from a data ID for the data chunk to the class ID in the lossy storage device, wherein the class ID represents the data chunk and may be a logical representation of where the data is stored on the lossy storage device, wherein the class ID is different from the data ID and a physical address in the storage where the data chunk is stored.
12. The method according to claim 11, wherein generating the class ID value identifying the class ID from the data chunk includes generating the class ID value identifying the class ID from the data chunk using an accuracy level associated with the write request, the write request including the accuracy level, wherein the accuracy level represents how similar the data chunk and a representative data chunk for the class ID should be for the data chunk to be assigned to the class ID.
13. The method according to claim 11, wherein generating the class ID value identifying the class ID from the data chunk includes: generating a first class ID value identifying a first class ID from the data chunk using a first classification approach; and generating a second class ID value identifying a second class ID from the data chunk using a second classification approach.
14. The method according to claim 13, wherein: generating the first class ID value identifying the first class ID from the data chunk using a first classification approach includes determining a first confidence level for the first class ID; and generating the second class ID value identifying the second class ID from the data chunk using a second classification approach includes determining a second confidence level for the second class ID, wherein the first confidence level represents a first degree of certainty in the class ID by the class ID determiner circuitry, and wherein the second confidence level represents a second degree of certainty in the second class ID by the second class ID determiner circuitry.
15. The method according to claim 14, wherein generating the class ID value identifying the class ID from the data chunk further includes selecting the first class ID based on the first confidence level being greater than the second confidence level.
16. The method according to claim 11, further comprising: storing the data chunk at the physical address in the lossy storage device; and storing a second mapping from the class ID to the physical address.
17. The method according to claim 11, wherein generating the class ID value identifying the class ID from the data chunk includes generating the class ID value identifying the class ID from the data chunk using a class ID determiner circuitry, the class ID determiner circuitry including at least one of a similarity function, a difference function, a classifier, or a neural network.
18. The method according to claim 11, further comprising: receiving a read request from the host at the lossy storage device, the read request including the data ID; mapping the data ID to the class ID; mapping the class ID to the physical address; reading a data at the physical address; and returning the data to the host from the lossy storage device.
19. The method according to claim 11, further comprising: receiving a delete request from the host at the lossy storage device, the delete request including the data ID; and deleting a mapping from the data ID to the class ID.
20. The method according to claim 19, further comprising: determining that there is no mapping from a second data ID to the class ID; deleting a mapping from the class ID to the physical address on the lossy storage device; and deleting a data at the physical address on the lossy storage device.
21. An article, comprising a non-transitory storage medium, the non-transitory storage medium having stored thereon instructions that, when executed by a machine, result in: receiving a write request from a h
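The two-level mapping and the read and delete steps recited in claims 16 and 18 to 20, together with the confidence-based selection of claims 6 and 15, can be sketched as follows. This is an illustrative model only: `TwoLevelStore`, `select_class_id`, the integer addresses, and the in-memory dictionaries are hypothetical names and simplifications, and the class ID is assumed to have been determined already by some classifier.

```python
def select_class_id(candidates):
    """Pick the class ID with the highest confidence level.

    `candidates` is a list of (class_id, confidence) pairs, one per
    classification approach (an assumed representation).
    """
    return max(candidates, key=lambda pair: pair[1])[0]


class TwoLevelStore:
    def __init__(self):
        self.data_to_class = {}   # first mapping table: data ID -> class ID
        self.class_to_addr = {}   # second mapping table: class ID -> physical address
        self.blocks = {}          # physical address -> stored data chunk
        self._next_addr = 0

    def write(self, data_id, class_id, chunk):
        """Store the chunk only if the class is new, then map data ID -> class ID."""
        if class_id not in self.class_to_addr:
            addr = self._next_addr
            self._next_addr += 1
            self.blocks[addr] = chunk
            self.class_to_addr[class_id] = addr
        self.data_to_class[data_id] = class_id

    def read(self, data_id):
        """Data ID -> class ID -> physical address -> stored data."""
        class_id = self.data_to_class[data_id]
        addr = self.class_to_addr[class_id]
        return self.blocks[addr]

    def delete(self, data_id):
        """Drop the data ID mapping; free the class when no data ID refers to it."""
        class_id = self.data_to_class.pop(data_id)
        if class_id not in self.data_to_class.values():
            addr = self.class_to_addr.pop(class_id)
            del self.blocks[addr]
```

Scanning `data_to_class.values()` on every delete keeps the sketch short; a per-class reference count would be one alternative that avoids the scan, though nothing in the claim text above specifies either approach.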
Single storage device · CPC title
Saving storage space on storage systems · CPC title
Machine learning · CPC title
for multiple virtual address spaces, e.g. segmentation (G06F12/1036 takes precedence) · CPC title
in block erasable memory, e.g. flash memory · CPC title