Systems and methods for object to relational mapping extensions
US-8954461-B2 · Feb 10, 2015 · US
US9916313B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9916313-B2 |
| Application number | US-201414339391-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 23, 2014 |
| Priority date | Feb 14, 2014 |
| Publication date | Mar 13, 2018 |
| Grant date | Mar 13, 2018 |
Official abstract text for this publication.
Data including a text file is received. The text file is arranged in an extensible format and includes a plurality of metadata lines, a header line, and a plurality of content lines. Metadata from the metadata lines is mapped to a plurality of metadata tables in a database that are formed according to a relational database schema using prefix parameters from each metadata line. Content from the content lines is mapped to a plurality of content tables in the database that are formed according to the relational database schema using the header line. A first subset of the content tables have a static structure and a second subset of the content tables have a dynamic structure. Related apparatus, systems, techniques and articles are also described.
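The first step described in the abstract, separating metadata lines, the header line, and content lines and then routing metadata by its prefix, can be illustrated with a minimal sketch. It assumes the variant call format conventions (metadata lines start with `##`, the header line with a single `#`, fields are tab-separated), which the abstract itself does not state. The names `split_lines`, `map_metadata`, and `PAIR` are hypothetical, and plain Python lists and dictionaries stand in for database tables.

```python
import re
from collections import defaultdict

# Matches key=value pairs; a quoted value may contain commas.
PAIR = re.compile(r'(\w+)=("[^"]*"|[^,>]*)')

def split_lines(text):
    """Separate a VCF-like text into metadata lines, header fields, content rows."""
    meta, header, content = [], None, []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("##"):          # assumed metadata marker
            meta.append(line[2:])
        elif line.startswith("#"):         # assumed header marker
            header = line[1:].split("\t")
        else:
            content.append(line.split("\t"))
    return meta, header, content

def map_metadata(meta_lines):
    """Route each metadata line to a table chosen by its prefix.

    Lines with one key-value pair go to a shared two-column table.
    Lines with several pairs go to a table named after the prefix,
    with one column per key.
    """
    single = []                  # rows of (key, value)
    multi = defaultdict(list)    # prefix -> list of row dicts
    for line in meta_lines:
        prefix, _, value = line.partition("=")
        if value.startswith("<") and value.endswith(">"):
            row = {k: v.strip('"') for k, v in PAIR.findall(value[1:-1])}
            multi[prefix].append(row)
        else:
            single.append((prefix, value))
    return single, dict(multi)

sample = (
    '##fileformat=VCFv4.2\n'
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth, raw">\n'
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    '20\t14370\trs6054257\tG\tA,T\t29\tPASS\tDP=14\n'
)
meta, header, content = split_lines(sample)
single, multi = map_metadata(meta)
```

In this sketch the prefix (`fileformat`, `INFO`) decides which table a line lands in, and the keys inside the angle brackets become column names. A real implementation would also derive column data types, which is omitted here.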
Opening claim text (preview).
What is claimed is:
1. A computer implemented method comprising: receiving data comprising a text file, the text file being arranged in an extensible format and comprising a plurality of metadata lines, a header line, and a plurality of content lines; generating, based at least on metadata from the metadata lines, a plurality of metadata tables in a database, the plurality of metadata tables being formed according to a relational database schema, and the metadata from the plurality of metadata lines being mapped, based at least on prefix parameters from each metadata line, to the plurality of metadata tables; generating, based at least on content from the content lines, a plurality of content tables in the database, the plurality of content tables being formed according to the relational database schema, the content from the plurality of content lines being mapped, based at least on the header line, to the plurality of content tables, a first table of the plurality of content tables having a static structure that includes a fixed number of columns for storing content lines mapped to the first table, a second table of the plurality of content tables having a dynamic structure that enables an addition of at least one new column during runtime, the at least one new column accommodating an additional field in data received subsequent to the generation of the plurality of content tables; and performing, based at least on the plurality of metadata tables and/or the plurality of content tables, a database operation with respect to the data comprising the text file, the performance of the database operation comprising adding, to the second table, the at least one new column, the performance of the database operation further comprising generating a corresponding entry in a database log, the database log being used during a recovery to replay one or more operations performed on the data comprising the text file since a last savepoint.
2. The method of claim 1, wherein the mapping of the content from the content lines to the plurality of content tables comprises mapping gene sequence variations to corresponding reference genome position where the gene sequence variations occur.
3. The method of claim 1, wherein the text file is arranged according to the genomic variant call format.
4. The method of claim 1, wherein the generating of the plurality of metadata tables comprises the prefix parameters determining at least one of a metadata table name, number of columns and data types for the columns.
5. The method of claim 1, wherein the mapping of the metadata from the metadata lines to the plurality of metadata tables comprises: identifying key-value pairs from each metadata line; storing contents from metadata lines that contain only one key-value pair in single key-value pair metadata tables comprising a single column for the keys from all the metadata lines and a corresponding column for values from all the metadata lines; and storing contents from metadata lines that contain multiple key-value pairs in multiple key-value pair metadata tables comprising a column for each unique key, wherein the column is named after the key, and wherein the corresponding values are mapped into the rows of the columns named after the keys.
6. The method of claim 1, wherein the mapping of the content from the content lines to the plurality of content tables comprises: identifying header parameters from the header line; and generating content tables with the header parameters from the header line defining the content tables names within the relational database; wherein each content table comprises at least a column storing contents associated with the header parameter defining the name of the content table.
7. The method of claim 6, further comprising: determining that the header parameter has more than one corresponding value; and generating, in the content table, an additional row under the header parameter for each corresponding value, and incrementing a record count by one with each additional row.
8. The method of claim 6, wherein the header parameters from the header line comprise at least one of a position, identification, alternate, quality and filter parameter.
9. The method of claim 6, wherein an alternates content table stores reference alleles and corresponding alternate alleles, wherein the alternates content table comprises at least one column storing both the reference allele and corresponding alternate allele at a particular chromosome position, wherein each reference allele and corresponding alternate alleles at a particular chromosome position are mapped into separate rows in the column, wherein a record count value increments by one with each additional row, and wherein an index number, starting at zero to represent the reference allele increments by one for each alternate allele mapped into each additional row, and wherein the alternate content table includes an allele length column.
10. The method of claim 1, wherein the database comprises a columnar data store storing database tables as sections of columns.
11. The method of claim 1, wherein the database is an in-memory database storing the metadata tables and the content tables in main memory.
12. The method of claim 1, further comprising: generating a lookup table, the lookup table providing a mapping between the at least one new column and the keys.
13. A non-transitory computer program product storing instructions which, when executed by at least one data processor forming part of at least one computing system, result in operations comprising: receiving data comprising a text file, the text file being arranged in an extensible format and comprising a plurality of metadata lines, a header line, and a plurality of content lines; generating, based at least on metadata from the metadata lines, a plurality of metadata tables in a database, the plurality of metadata tables being formed according to a relational database schema, and the metadata from the plurality of metadata lines being mapped, based at least on prefix parameters from each metadata line, to the plurality of metadata tables; generating, based at least on content from the content lines, a plurality of content tables in the database, the plurality of content tables being formed according to the relational database schema, the content from the plurality of content lines being mapped, based at least on the header line, to the plurality of content tables, a first table of the plurality of content tables having a static structure that includes a fixed number of columns for storing content lines mapped to the first table, a second table of the plurality of content tables having a dynamic structure that enables an addition of at least one new column during runtime, the at least one new column accommodating an additional field in data received subsequent to the generation of the plurality of content tables; and performing, based at least on the plurality of metadata tables and/or the plurality of content tables, a database operation with respect to the data comprising the text file, the performance of the database operation comprising adding, to the second table, the at least one new column, the performance of the database operation further comprising generating a corresponding entry in a database log, the database log being used during a recovery to replay one or more operations performed on the data comprising the text file since a last savepoint.
14. The non-transitory computer program product as in claim 13, wherein the mapping of the content from the content lines to the plurality of content tables comprises mapping g
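The contrast between a static and a dynamic content table (claim 1), the alternates table with a zero-based allele index and an allele length column (claim 9), and the lookup table between keys and new columns (claim 12) can be sketched roughly as follows. This is an illustration only: it uses Python's built-in `sqlite3` module rather than the columnar in-memory database of claims 10 and 11, and the table names, the `c0`, `c1` column naming scheme, and the helper functions are all hypothetical. The database log and savepoint replay recited in claim 1 are not modelled.

```python
import sqlite3

def alternates_rows(pos, ref, alt_field):
    """One row per allele: index 0 is the reference, alternates follow."""
    alleles = [ref] + alt_field.split(",")
    return [(pos, idx, allele, len(allele)) for idx, allele in enumerate(alleles)]

def ensure_column(conn, table, key, lookup):
    """Add a column for a previously unseen key and record it in the lookup."""
    if key not in lookup:
        column = f"c{len(lookup)}"  # hypothetical naming scheme
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} TEXT")
        lookup[key] = column
    return lookup[key]

conn = sqlite3.connect(":memory:")

# Static structure: the set of columns is fixed when the table is created.
conn.execute(
    "CREATE TABLE alternates "
    "(pos INTEGER, allele_index INTEGER, allele TEXT, allele_length INTEGER)"
)
conn.executemany(
    "INSERT INTO alternates VALUES (?, ?, ?, ?)",
    alternates_rows(14370, "G", "A,T"),
)

# Dynamic structure: columns are added at runtime as new keys appear.
conn.execute("CREATE TABLE info (pos INTEGER)")
lookup = {}  # key -> column name, standing in for the lookup table
for pos, info_field in [(14370, "DP=14"), (17330, "DP=11;AF=0.017")]:
    pairs = dict(item.split("=", 1) for item in info_field.split(";"))
    cols = [ensure_column(conn, "info", key, lookup) for key in pairs]
    placeholders = ", ".join("?" for _ in range(len(cols) + 1))
    conn.execute(
        f"INSERT INTO info (pos, {', '.join(cols)}) VALUES ({placeholders})",
        [pos, *pairs.values()],
    )
```

The second content line introduces the key `AF`, which did not exist when the `info` table was created, so a new column is added and registered in `lookup`. Rows inserted earlier simply hold no value in that column.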
Column-oriented storage; Management thereof · CPC title
Subject matter not provided for in other groups of this subclass · CPC title
File meta data generation · CPC title
ICT programming tools or database systems specially adapted for bioinformatics · CPC title
Mapping to a database · CPC title