Robust data transfer to cloud and on-premise storage systems
US-2025036515-A1 · Jan 30, 2025 · US
US12393634B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-12393634-B1 |
| Application number | US-202418759330-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 28, 2024 |
| Priority date | Jun 28, 2024 |
| Publication date | Aug 19, 2025 |
| Grant date | Aug 19, 2025 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Systems and methods are provided for generating an index of a set of vectors as a distributed proximity-based graph data structure comprising representative vectors corresponding to subsets of the set of vectors, storing the set of vectors across a plurality of storage devices of a distributed storage system based on the distributed proximity-based graph data structure, and loading, in response to a vector query, a plurality of subsets of the set of vectors from the distributed storage system based on the distributed proximity-based graph data structure.
Opening claim text (preview).
What is claimed is: 1. A vector management system comprising: a distributed object storage service comprising a plurality of object storage devices; a processing service comprising one or more processors configured to execute specific instructions, the processing service configured to generate a set of vectors from a set of source data; an index service comprising a set of one or more processors configured to execute specific instructions, the index service configured to: generate an index of the set of vectors as a distributed hierarchical proximity-based graph data structure, wherein the distributed hierarchical proximity-based graph data structure comprises: a different fine proximity-based graph data structures for each subset of a plurality of subsets of vectors of the set of vectors; and a course proximity-based graph data structure comprising at least one representative vector for each subset of the plurality of subsets of vectors, wherein the course proximity-based graph data structure links the different fine proximity-based graph data structures to form the distributed hierarchical proximity-based graph data structure; and store the set of vectors across the plurality of object storage devices of the distributed object storage service based on the distributed hierarchical proximity-based graph data structure, wherein a subset of the set of vectors is stored as a leaf object comprising a serialized version of a fine proximity-based graph data structure; and a query service comprising a set of one or more processors configured to execute specific instructions, the query service configured to, in response to receiving a vector query: load a plurality of subsets of the set of vectors from the distributed object storage service based on the distributed hierarchical proximity-based graph data structure; determine, based on evaluation of similarities between a query vector and at least a portion of vectors in the distributed hierarchical proximity-based graph data structure, a plurality of objects to be obtained from the distributed object storage service; and load a plurality of subsets of vectors from the distributed object storage service by loading the plurality of objects from the distributed object storage service at least partially in parallel, wherein a first object of the plurality of objects is obtained from a first subset of storage devices of the distributed object storage service and a second object of the plurality of objects is obtained from a second subset of storage devices of the distributed object storage service. 2. The vector management system of claim 1 , wherein index service is further configured to: cluster the set of vectors into the plurality of subsets of vectors based on evaluation of similarities between vectors of the set of vectors; and generate the different fine proximity-based graph data structures for each subset of the plurality of subsets of vectors. 3. The vector management system of claim 1 , wherein the index service is further configured to: determine a common traversal path through the distributed hierarchical proximity-based graph data structure based on processing of a plurality of vector queries; and generate a storage object comprising vectors in the common traversal path. 4. The vector management system of claim 3 , wherein the index service is further configured to: determine a second common traversal path through the distributed hierarchical proximity-based graph data structure based on processing of the plurality of vector queries; and generate a second storage object comprising vectors in the common traversal path, wherein the second storage object is stored on a different object subset of storage devices of the distributed object storage service than the storage object. 5. A computer-implemented method comprising: under control of a computing system comprising a computer-readable memory and a processor configured to execute specific instructions: clustering a set of vectors into subsets of the set of vectors based on evaluation of similarities between vectors of the set of vectors; generating based on the clustering of the set of vectors, a different bottom-level proximity-based graph data structure for each subset of the subsets of the set of vectors; generating an index of the set of vectors as a distributed proximity-based graph data structure using the different bottom-level proximity-based graph data structures and a top-level proximity-based graph data structure, wherein the distributed proximity-based graph data structure comprises representative vectors corresponding to subsets of the set of vectors, and wherein the top-level proximity-based graph data structure links the different bottom-level proximity-based graph data structures to form a distributed hierarchical proximity-based graph data structure; storing the set of vectors across a plurality of storage devices of a distributed storage system based on the distributed proximity-based graph data structure; and in response to receiving a vector query, loading different subsets of the subsets of the set of vectors from different storage devices of the distributed storage system at least partially in parallel based on the distributed proximity-based graph data structure. 6. The computer-implemented method of claim 5 , further comprising selecting a first representative vector for a first subset of the set of vectors based on a distance of the first representative vector from a centroid or a vector of the first subset of the set of vectors. 7. The computer-implemented method of claim 5 , further comprising generating a first representative vector for a first subset of the set of vectors as one of a centroid or a vector of the first subset of the set of vectors. 8. The computer-implemented method of claim 5 , wherein evaluation of similarities of between vectors comprises determining one of: a cosine similarity, or a Euclidian distance. 9. The computer-implemented method of claim 5 , further comprising: determining a common traversal path through the distributed proximity-based graph data structure based on processing of a plurality of vector queries; and generating a storage object comprising vectors in the common traversal path. 10. The computer-implemented method of claim 9 , further comprising: determining a second common traversal path through the distributed proximity-based graph data structure based on processing of the plurality of vector queries; and generating a second storage object comprising vectors in the common traversal path, wherein the second storage object is stored on a different subset of storage devices of the distributed storage system than the storage object. 11. The computer-implemented method of claim 5 , further comprising: determining, based on evaluation of similarities between a query vector and at least a portion of the representative vectors in the distributed proximity-based graph data structure, a plurality of objects to be obtained from the distributed storage system, wherein loading the plurality of subsets of the set of vectors from the distributed storage system comprises loading the plurality of objects from the distributed storage system at least partly in parallel, and wherein a first object of the plurality of objects is obtained from a first subset of storage devices of a plurality of storage devices of the distributed storage system and a second object of the plurality of objects is obtained from a second subset of storage devices of the plurality of storage devices of the distributed storage system. 12. The computer-implemented method of claim 11 , further comprising: deserializing
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title
Vectors, bitmaps or matrices · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.