Sparsity-driven matrix representation to optimize operational and storage efficiency
US-2015113031-A1 · Apr 23, 2015 · US
US10067909B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10067909-B2 |
| Application number | US-201414314750-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 25, 2014 |
| Priority date | Jun 25, 2014 |
| Publication date | Sep 4, 2018 |
| Grant date | Sep 4, 2018 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Embodiments relate to storing sparse matrices in an in-memory column-oriented database system. Specifically, recent hardware shifts of primary storage from disc into memory, allow execution of linear algebra queries directly in the database engine. Dynamic matrix manipulation operations (like online insertion or deletion of elements) are not covered by most linear algebra frameworks. Therefore a hybrid architecture comprises a read-optimized main structure, and a write-optimized delta structure. The resulting system layout derived from the Compressed Sparse Row (CSR) representation, integrates well with a columnar database design. Moreover, the resulting architecture is amenable to a wide range of non-numerical use cases when dictionary encoding is used. Performance in specific examples is evaluated for dynamic sparse matrix workloads, by applying work flows of nuclear science and network graphs. Embodiments allow performing linear algebra operations on large, sparse matrices commonly associated with scientific computations and analytical business applications.
Opening claim text (preview).
What is claimed is: 1. A computer-implemented method comprising: causing an engine to store an updatable column representation of data including a table in a main structure and in a delta structure of an in-memory database, the data comprising a matrix defined by an extension of structured query language (SQL) by a matrix stack outside of the engine to include a matrix data type storing matrix dimensions as metadata recognized by a data manipulation primitive of the extension, the data manipulation primitive specifying a shrinking of the matrix dimensions; causing the engine to merge the delta structure into the main structure when a delta exceeds a threshold comprising a fixed percentage of the main structure; causing the engine to perform a reorganization to, improve the compression ratio by reordering rows of the table, and sort columns of the updatable column representation according to values of the table; causing the engine to derive an index according to the sorted columns; causing the engine to store the index within the engine; causing the matrix stack to reference the index to perform an algebraic operation comprising iterative matrix-vector multiplication in an inclusive breadth-first search; causing the engine to store a result of the algebraic operation; and causing the engine to update the delta structure by appending a triple utilizing the extension of SQL. 2. A method as in claim 1 wherein the column representation comprises a Compressed Sparse Row (CSR) representation and a value comprises a row pointer. 3. A method as in claim 1 wherein the matrix is fetched using a topological reference according to an access pattern. 4. A method as in claim 1 wherein the column representation utilizes dictionary encoding. 5. A method as in claim 1 further comprising updating the delta structure utilizing a validity control vector. 6. A non-transitory computer readable storage medium embodying a computer program for performing a method, said method comprising: causing an engine to store an updatable column representation of data including a table in a main structure and in a delta structure of an in-memory database, the data comprising a matrix defined by an extension of structured query language (SQL) by a matrix stack outside of the engine to include a matrix data type storing matrix dimensions as metadata recognized by a data manipulation primitive of the extension, the data manipulation primitive specifying a shrinking of the matrix dimensions; causing the engine to merge the delta structure into the main structure when a delta exceeds a threshold comprising a fixed percentage of the main structure; causing the engine to perform a reorganization to, improve the compression ratio by reordering rows of the table, and sort columns of the updatable column representation according to values of the table; causing the engine to derive an index according to the sorted columns; causing the engine to store the index within the engine; causing the matrix stack to reference the index to perform an algebraic operation comprising iterative matrix-vector multiplication in an inclusive breadth-first search; causing the engine to store a result of the algebraic operation; and causing the engine to update the delta structure by appending a triple utilizing the extension of SQL. 7. A non-transitory computer readable storage medium as in claim 6 wherein the column representation comprises a Compressed Sparse Row (CSR) representation and a value comprises a row pointer. 8. A non-transitory computer readable storage medium as in claim 6 wherein the matrix is fetched using a topological reference according to an access pattern. 9. A non-transitory computer readable storage medium as in claim 6 wherein the column representation utilizes dictionary encoding. 10. A non-transitory computer readable storage medium as in claim 6 further comprising updating the delta structure utilizing a validity control vector. 11. A computer system comprising: one or more processors; a software program, executable on said computer system, the software program configured to: cause an engine to store an updatable column representation of data including a table in a main structure and in a delta structure of an in-memory database, the data comprising a matrix defined by an extension of structured query language (SQL) by a matrix stack outside of the engine to include a matrix data type storing matrix dimensions as metadata recognized by a data manipulation primitive of the extension, the data manipulation primitive specifying a shrinking of the matrix dimensions; cause the engine to merge the delta structure into the main structure when a delta exceeds a threshold comprising a fixed percentage of the main structure; cause the engine to perform a reorganization to, improve the compression ratio by reordering rows of the table, and sort columns of the updatable column representation according to values of the table; cause the engine to derive an index according to the sorted columns; cause the engine to store the index within the engine; cause the engine to reference the matrix stack to perform an algebraic operation; cause the engine to store a result of the algebraic operation comprising iterative matrix-vector multiplication in an inclusive breadth-first search; and cause the engine to update the delta structure by appending a triple utilizing the extension of SQL. 12. A computer system as in claim 11 wherein the column representation comprises a Compressed Sparse Row (CSR) representation and a value comprises a row pointer. 13. A computer system as in claim 11 wherein the matrix is fetched using a topological reference according to an access pattern. 14. A computer system as in claim 11 wherein the column representation utilizes dictionary encoding. 15. A computer system as in claim 11 wherein the engine is further caused to update the delta structure utilizing a validity control vector. 16. A computer system as in claim 11 wherein the engine references a transposed structure of the matrix stack to perform the algebraic operation. 17. A method as in claim 1 wherein the engine references a transposed structure of the matrix stack to perform the algebraic operation. 18. A non-transitory computer readable storage medium as in claim 6 wherein wherein the engine references a transposed structure of the matrix stack to perform the algebraic operation.
Column-oriented storage; Management thereof · CPC title
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
using compression, e.g. sparse files · CPC title
Physics · mapped topic
Related publications grouped by family.
Answers are generated from the same data shown on this page.