Efficient Reference Counting in Content Addressable Storage

US2016170987A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2016170987-A1
Application numberUS-201615051612-A
CountryUS
Kind codeA1
Filing dateFeb 23, 2016
Priority dateJul 26, 2013
Publication dateJun 16, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A process manages database storage. The process receives a first object comprising one or more content chunks. The first object is identified by a unique object ID and each content chunk has a unique offset within the first object. For each chunk, the process inserts a record into a reference table. The record includes a content hash and the object ID. The process stores each of the chunks in content storage. Later, the process obtains a request to delete a first chunk from storage. The first chunk has a corresponding first content hash. The process determines whether the reference table includes a reference record corresponding to the first content hash. When the reference table does not include any reference records corresponding to the first content hash, the process deletes the first chunk. When the reference table includes a corresponding reference record, the process does not delete the first chunk.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method of managing database storage, comprising: at a database storage system having one or more processors and memory: receiving for storage a first object comprising one or more content chunks, wherein the first object is identified by a unique object ID and each content chunk of the one or more content chunks has a unique offset within the first object; for each respective content chunk of the one or more content chunks, inserting a respective reference record into a reference table, wherein the respective reference record includes a respective content hash and the unique object ID; storing each of the one or more content chunks in content storage within the database storage system; obtaining a request to delete a first content chunk from the content storage, wherein the first content chunk has a corresponding first content hash; determining whether the reference table includes at least one reference record corresponding to the first content hash; in accordance with a determination that the reference table does not include at least one reference record corresponding to the first content hash, deleting the first content chunk from the content storage; and in accordance with a determination that the reference table includes at least one reference record corresponding to the first content hash, forgoing deleting object content corresponding to the first content chunk. 2 . The method of claim 1 , wherein the reference table is partitioned into a plurality of distinct shards, and wherein inserting a respective reference record into the reference table comprises: computing a respective prefix that is based, at least in part, on the object ID; and inserting the respective reference record into a reference table shard corresponding to the respective prefix. 3 . The method of claim 2 , wherein the object ID is an integer, the number of distinct shards is a positive integer N, and the respective prefix is object ID (mod N). 4 . The method of claim 1 , wherein inserting the respective reference record into the reference table further comprises computing a respective prefix that is based, at least in part, on the object ID; and wherein the respective reference record includes the respective prefix. 5 . The method of claim 4 , wherein the respective reference record comprises a concatenation of the respective prefix, the respective content hash, and the respective object ID, in that order. 6 . The method of claim 1 , wherein each respective content hash is computed using a hash function whose output is a fixed size integer. 7 . The method of claim 1 , wherein each respective content hash includes a sampling of content from the respective content chunk. 8 . The method of claim 1 , wherein each respective reference record includes the offset of the respective content chunk within the first object when the offset is greater than 0. 9 . The method of claim 1 , wherein the database storage system is a distributed database that comprises a plurality of instances, at least some of which are at distinct geographic locations, and each instance has its own distinct local content storage. 10 . The method of claim 9 , wherein each instance has its own distinct local content index and local reference table. 11 . The method of claim 1 , further comprising: modifying the set of one or more locations where a first content chunk is stored in the content storage; and updating an entry in the content index that uniquely corresponds to the first content chunk, thereby specifying the modified set of one or more locations where the first content chunk is stored in the content storage. 12 . A computer system for managing database storage, comprising: one or more processors; memory; content storage and a reference table both stored in the memory, wherein the reference table stores references to each content chunk; and one or more programs stored in the memory, the one or more programs comprising instructions executable by the one or more processors for: receiving for storage a first object comprising one or more content chunks, wherein the first object is identified by a unique object ID and each content chunk of the one or more content chunks has a unique offset within the first object; for each respective content chunk of the one or more content chunks, inserting a respective reference record into a reference table, wherein the respective reference record includes a respective content hash and the unique object ID; storing each of the one or more content chunks in content storage within the database storage system; obtaining a request to delete a first content chunk from the content storage, wherein the first content chunk has a corresponding first content hash; determining whether the reference table includes at least one reference record corresponding to the first content hash; in accordance with a determination that the reference table does not include at least one reference record corresponding to the first content hash, deleting the first content chunk from the content storage; and in accordance with a determination that the reference table includes at least one reference record corresponding to the first content hash, forgoing deleting object content corresponding to the first content chunk. 13 . The computer system of claim 12 , wherein the reference table is partitioned into a plurality of distinct shards, and wherein the instructions for inserting a respective reference record into the reference table further comprise instructions for: computing a respective prefix that is based, at least in part, on the object ID; and inserting the respective reference record into a reference table shard corresponding to the respective prefix. 14 . The computer system of claim 12 , wherein the instructions for inserting the respective reference record into the reference table further comprise instructions for computing a respective prefix that is based, at least in part, on the object ID; and wherein the respective reference record includes the respective prefix. 15 . The computer system of claim 14 , wherein the respective reference record comprises a concatenation of the respective prefix, the respective content hash, and the respective object ID, in that order. 16 . The computer system of claim 12 , wherein each respective reference record includes the offset of the respective content chunk within the first object when the offset is greater than 0. 17 . A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer system to manage database storage in a database storage system, wherein the database storage system has content storage, a content index that identifies content chunks, and a reference table that stores references to each content chunk, and wherein the one or more programs comprise instructions for: receiving for storage a first object comprising one or more content chunks, wherein the first object is identified by a unique object ID and each content chunk of the one or more content chunks has a unique offset within the first object; for each respective content chunk of the one or more content chunks, inserting a respective reference record into a reference table, wherein the respective reference record includes a respective content hash and the unique object ID; storing each of the one or more content chunks in content storage within the database storage system; obtaining a request to delete a first content chunk from the c

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016170987A1 cover?
A process manages database storage. The process receives a first object comprising one or more content chunks. The first object is identified by a unique object ID and each content chunk has a unique offset within the first object. For each chunk, the process inserts a record into a reference table. The record includes a content hash and the object ID. The process stores each of the chunks in c…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/3033. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Jun 16 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).