Fast object fingerprints
US-9223840-B2 · Dec 29, 2015 · US
US9830345B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9830345-B1 |
| Application number | US-201615275681-A |
| Country | US |
| Kind code | B1 |
| Filing date | Sep 26, 2016 |
| Priority date | Sep 26, 2016 |
| Publication date | Nov 28, 2017 |
| Grant date | Nov 28, 2017 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for storing data in a version control system. One of the methods includes maintaining, in a data store, a tree-structured index of files in which each leaf node stores an entry for each file in a plurality of files for a snapshot that includes a unique file identifier for the respective file; receiving a request for a particular file; generating a hash of a particular file path for the particular file; identifying, using the hash of the particular file path, a leaf node in the tree-structured index that includes an entry for the particular file; identifying, in the leaf node, an entry for the particular file path; obtaining, from the entry, the unique file identifier for the particular file in the data store; and using the unique file identifier for the particular file in response to the request.
Opening claim text (preview).
What is claimed is: 1. A method comprising: maintaining, in a content-addressable data store, a tree-structured index that maps file paths of files in a directory structure to unique file identifiers of files in the content-addressable data store, wherein each node in the tree-structured index is a file in the content-addressable data store, wherein each file of a project snapshot has a respective unique file path in the directory structure of a file system separate from the content-addressable data store and the tree-structured index includes an identifier for each of a plurality of files in the project snapshot, wherein each content-addressable file is stored in the content-addressable data store at a location identified by a unique file identifier that is generated based on contents of the file, wherein the tree-structured index comprises a root node representing a root level of the tree-structured index, a plurality of parent nodes representing N−1 intermediate levels of the tree-structured index, and a plurality of leaf nodes representing a leaf level of the tree-structured index, wherein each non-leaf node stores a plurality of unique file identifiers of nodes descendant from the non-leaf node in the tree-structured index, the plurality of unique file identifiers being stored at corresponding offsets within the non-leaf node, wherein each leaf node stores an entry for each file of one or more files of the project snapshot, wherein each entry for each file comprises a unique file identifier for retrieving the file from the content-addressable data store, and a file path hash comprising N hash characters of a hash of the file path of the file, and wherein each leaf node stores respective entries for files having different file paths and N identical characters in respective file path hashes for the files; receiving a request to retrieve a unique file identifier corresponding to a file path of a particular file in the directory structure of the file system separate from the content-addressable data store; generating a file path hash of the file path; obtaining at least N hash characters from the file path hash of the file path; obtaining, for the project snapshot, a unique file identifier for the root node of the tree-structured index; retrieving, from the content-addressable data store, the root node for the tree-structured index using the unique file identifier for the root node; setting the root node as a current node; performing the following operations, beginning with the root node in the tree-structured index, until retrieving a leaf node in the tree-structured index: obtaining, from the N hash characters of the file path hash, a next hash character of the file path hash, using the obtained next hash character of the file path hash as an offset into the current node to obtain a unique file identifier of a subsequent node at a subsequent level of the tree-structured index, retrieving, from the content-addressable data store, the subsequent node using the unique file identifier, and setting the subsequent node as the current node; identifying, in the leaf node, an entry corresponding to the file path; obtaining, from the entry, a unique file identifier for the particular file in the content-addressable data store; and providing the unique file identifier for the particular file in response to the request. 2. The method of claim 1 , wherein obtaining the at least N hash characters from the file path hash of the file path comprises obtaining N leftmost hash characters from the file path hash of the file path. 3. The method of claim 1 , wherein: maintaining, in the content-addressable data store, the tree-structured index that maps file paths of files in the directory structure to unique file identifiers of files in the content-addressable data store comprises maintaining, in the content-addressable data store, the tree-structured index that maps file paths of files in the directory structure to unique file identifiers of files in the content-addressable data store, wherein each non-leaf node stores a list of a predetermined number of identifiers of nodes descendant from the non-leaf node in the tree-structured index; obtaining the at least N hash characters from the hash of the file path comprises obtaining the at least N hash characters each of which are for a numeral system with a radix comprising the predetermined number; and using the obtained next hash character of the file path hash as an offset into the current node to obtain the unique file identifier of the subsequent node at the subsequent level of the tree-structured index comprises using the obtained next hash character of the file path hash as an offset into the list of the predetermined number of identifiers stored in the current node to obtain the unique file identifier of the subsequent node at the subsequent level of the tree-structured index. 4. The method of claim 1 , wherein each entry in a leaf node comprises the hash of the file path of the file for the file system separate from the content-addressable data store, the file path of the file for the file system separate from the content-addressable data store, and a unique file identifier for the file. 5. The method of claim 1 , wherein each entry in a leaf node comprises only the hash of the file path of the file for the file system separate from the content-addressable data store, and a unique file identifier for the file. 6. The method of claim 1 , wherein receiving the request to retrieve the unique file identifier corresponding to a file path of the particular file comprises receiving the request that identifies i) the unique file identifier for the root node of the tree-structured index for the project snapshot and ii) the file path of the particular file. 7. The method of claim 1 , comprising: generating, for a file from the files of the project snapshot, a file path hash of a file path that identifies a unique location of the file in the file system separate from the content-addressable data store; determining a path through the tree-structured index that identifies a leaf node in the tree-structured index at which to store a unique file identifier for the file using the file path hash, the path identifying one or more non-leaf nodes including the root node in which to store identifiers for subsequent nodes in the path and a leaf node in which to store the unique file identifier for the file; generating the leaf node that includes the unique file identifier for the file; storing, in the content-addressable data store, data representing the contents of the leaf node including the unique file identifier for the file; receiving a leaf node identifier for the leaf node; performing the following operations for each of the non-leaf nodes in the path through the tree-structured index until receiving a root node identifier for the root node in the tree-structured index: generating a subsequent node in the path through the tree-structured index, the subsequent node including a node identifier for the previous node; storing, in the content-addressable data store, data representing the contents of the subsequent node; and receiving a node identifier for the subsequent node; and providing the root node identifier to be used for retrieving the file with the root node identifier and the file path. 8. The method of claim 7 , wherein determining the path through the tree-structured index that identifies a leaf node in the tree-structured index at which to store the unique file identifier for the file using the file path hash comprises selecting N hash characters from the file path hash to use to determine the path through the tree-structured index that identifies a leaf node in the tree-structured index at
Physics · mapped topic
by selection of backup contents · CPC title
Physics · mapped topic
Physics · mapped topic
Physics · mapped topic
Related publications grouped by family.
Answers are generated from the same data shown on this page.