Content-addressable data storage

US9830345B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-9830345-B1
Application numberUS-201615275681-A
CountryUS
Kind codeB1
Filing dateSep 26, 2016
Priority dateSep 26, 2016
Publication dateNov 28, 2017
Grant dateNov 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for storing data in a version control system. One of the methods includes maintaining, in a data store, a tree-structured index of files in which each leaf node stores an entry for each file in a plurality of files for a snapshot that includes a unique file identifier for the respective file; receiving a request for a particular file; generating a hash of a particular file path for the particular file; identifying, using the hash of the particular file path, a leaf node in the tree-structured index that includes an entry for the particular file; identifying, in the leaf node, an entry for the particular file path; obtaining, from the entry, the unique file identifier for the particular file in the data store; and using the unique file identifier for the particular file in response to the request.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: maintaining, in a content-addressable data store, a tree-structured index that maps file paths of files in a directory structure to unique file identifiers of files in the content-addressable data store, wherein each node in the tree-structured index is a file in the content-addressable data store, wherein each file of a project snapshot has a respective unique file path in the directory structure of a file system separate from the content-addressable data store and the tree-structured index includes an identifier for each of a plurality of files in the project snapshot, wherein each content-addressable file is stored in the content-addressable data store at a location identified by a unique file identifier that is generated based on contents of the file, wherein the tree-structured index comprises a root node representing a root level of the tree-structured index, a plurality of parent nodes representing N−1 intermediate levels of the tree-structured index, and a plurality of leaf nodes representing a leaf level of the tree-structured index, wherein each non-leaf node stores a plurality of unique file identifiers of nodes descendant from the non-leaf node in the tree-structured index, the plurality of unique file identifiers being stored at corresponding offsets within the non-leaf node, wherein each leaf node stores an entry for each file of one or more files of the project snapshot, wherein each entry for each file comprises a unique file identifier for retrieving the file from the content-addressable data store, and a file path hash comprising N hash characters of a hash of the file path of the file, and wherein each leaf node stores respective entries for files having different file paths and N identical characters in respective file path hashes for the files; receiving a request to retrieve a unique file identifier corresponding to a file path of a particular file in the directory structure of the file system separate from the content-addressable data store; generating a file path hash of the file path; obtaining at least N hash characters from the file path hash of the file path; obtaining, for the project snapshot, a unique file identifier for the root node of the tree-structured index; retrieving, from the content-addressable data store, the root node for the tree-structured index using the unique file identifier for the root node; setting the root node as a current node; performing the following operations, beginning with the root node in the tree-structured index, until retrieving a leaf node in the tree-structured index: obtaining, from the N hash characters of the file path hash, a next hash character of the file path hash, using the obtained next hash character of the file path hash as an offset into the current node to obtain a unique file identifier of a subsequent node at a subsequent level of the tree-structured index, retrieving, from the content-addressable data store, the subsequent node using the unique file identifier, and setting the subsequent node as the current node; identifying, in the leaf node, an entry corresponding to the file path; obtaining, from the entry, a unique file identifier for the particular file in the content-addressable data store; and providing the unique file identifier for the particular file in response to the request. 2. The method of claim 1 , wherein obtaining the at least N hash characters from the file path hash of the file path comprises obtaining N leftmost hash characters from the file path hash of the file path. 3. The method of claim 1 , wherein: maintaining, in the content-addressable data store, the tree-structured index that maps file paths of files in the directory structure to unique file identifiers of files in the content-addressable data store comprises maintaining, in the content-addressable data store, the tree-structured index that maps file paths of files in the directory structure to unique file identifiers of files in the content-addressable data store, wherein each non-leaf node stores a list of a predetermined number of identifiers of nodes descendant from the non-leaf node in the tree-structured index; obtaining the at least N hash characters from the hash of the file path comprises obtaining the at least N hash characters each of which are for a numeral system with a radix comprising the predetermined number; and using the obtained next hash character of the file path hash as an offset into the current node to obtain the unique file identifier of the subsequent node at the subsequent level of the tree-structured index comprises using the obtained next hash character of the file path hash as an offset into the list of the predetermined number of identifiers stored in the current node to obtain the unique file identifier of the subsequent node at the subsequent level of the tree-structured index. 4. The method of claim 1 , wherein each entry in a leaf node comprises the hash of the file path of the file for the file system separate from the content-addressable data store, the file path of the file for the file system separate from the content-addressable data store, and a unique file identifier for the file. 5. The method of claim 1 , wherein each entry in a leaf node comprises only the hash of the file path of the file for the file system separate from the content-addressable data store, and a unique file identifier for the file. 6. The method of claim 1 , wherein receiving the request to retrieve the unique file identifier corresponding to a file path of the particular file comprises receiving the request that identifies i) the unique file identifier for the root node of the tree-structured index for the project snapshot and ii) the file path of the particular file. 7. The method of claim 1 , comprising: generating, for a file from the files of the project snapshot, a file path hash of a file path that identifies a unique location of the file in the file system separate from the content-addressable data store; determining a path through the tree-structured index that identifies a leaf node in the tree-structured index at which to store a unique file identifier for the file using the file path hash, the path identifying one or more non-leaf nodes including the root node in which to store identifiers for subsequent nodes in the path and a leaf node in which to store the unique file identifier for the file; generating the leaf node that includes the unique file identifier for the file; storing, in the content-addressable data store, data representing the contents of the leaf node including the unique file identifier for the file; receiving a leaf node identifier for the leaf node; performing the following operations for each of the non-leaf nodes in the path through the tree-structured index until receiving a root node identifier for the root node in the tree-structured index: generating a subsequent node in the path through the tree-structured index, the subsequent node including a node identifier for the previous node; storing, in the content-addressable data store, data representing the contents of the subsequent node; and receiving a node identifier for the subsequent node; and providing the root node identifier to be used for retrieving the file with the root node identifier and the file path. 8. The method of claim 7 , wherein determining the path through the tree-structured index that identifies a leaf node in the tree-structured index at which to store the unique file identifier for the file using the file path hash comprises selecting N hash characters from the file path hash to use to determine the path through the tree-structured index that identifies a leaf node in the tree-structured index at

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9830345B1 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for storing data in a version control system. One of the methods includes maintaining, in a data store, a tree-structured index of files in which each leaf node stores an entry for each file in a plurality of files for a snapshot that includes a unique file identifier for the respective file; receivi…
Who is the assignee on this patent?
Semmle Ltd
What technology area does this patent fall under?
Primary CPC classification G06F17/3033. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 28 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).