Processing of delimiter-separated value (DSV) data
US-2024220726-A1 · Jul 4, 2024 · US
US12499102B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12499102-B2 |
| Application number | US-202318218986-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 6, 2023 |
| Priority date | Jul 6, 2023 |
| Publication date | Dec 16, 2025 |
| Grant date | Dec 16, 2025 |
Official abstract text for this publication.
An example system for parsing and transforming input data includes processing circuitry and memory, the memory configured to store the input data. The processing circuitry is configured to determine a first delimiter in the input data. The processing circuitry is configured to determine a plurality of second delimiter hypotheses and parse the input data according to the first delimiter and the plurality of second delimiter hypotheses to generate a plurality of tables that are each associated with a respective one of the plurality of second delimiter hypotheses. The processing circuitry is configured to determine a respective consistency score for each of the plurality of tables and select a table from among the plurality of tables based on the respective consistency score associated with the table. The processing circuitry is configured to format the input data based on the selected table to generate formatted data and output the formatted data.
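The flow described in the abstract (fix a first delimiter, try several second-delimiter hypotheses, build one table per hypothesis, score each table, keep the best) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the choice of newline as the first (row) delimiter, the filter that skips alphanumeric characters, and the stand-in `simple_score` are all assumptions made for the example.

```python
from collections import Counter


def candidate_delimiters(text, row_delim="\n", n=3):
    """Return up to n most frequent characters as column-delimiter hypotheses.

    Assumption: alphanumeric characters and the row delimiter are skipped.
    """
    counts = Counter(
        ch for ch in text if ch != row_delim and not ch.isalnum()
    )
    return [ch for ch, _ in counts.most_common(n)]


def parse(text, row_delim, col_delim):
    """Split the text into rows, then split each non-empty row into cells."""
    return [row.split(col_delim) for row in text.split(row_delim) if row]


def simple_score(table):
    """Stand-in score (hypothetical): share of rows with the most common width.

    A table whose most common width is one column scores zero.
    """
    widths = Counter(len(row) for row in table)
    width, count = widths.most_common(1)[0]
    if width < 2:
        return 0.0
    return count / len(table)


def select_table(text, row_delim="\n", n=3):
    """Build one table per hypothesis and keep the highest-scoring one."""
    tables = {
        delim: parse(text, row_delim, delim)
        for delim in candidate_delimiters(text, row_delim, n)
    }
    scores = {delim: simple_score(table) for delim, table in tables.items()}
    best = max(scores, key=scores.get)
    return best, tables[best], scores
```

Calling `select_table("a;b;c\n1;2;3\n4,5;6;7\n")` considers `;` and `,` as hypotheses and keeps the `;` table, because every row splits into three cells under `;` while `,` splits only one row.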
Opening claim text (preview).
What is claimed is:

1. A computing system comprising: one or more processors; and one or more memories storing processor-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: obtaining input data; determining a first delimiter within the input data; generating, from the input data and using the first delimiter, an arranged string; determining an N most frequent characters in the arranged string, wherein N is greater than one; determining a plurality of second delimiter hypotheses, the plurality of second delimiter hypotheses comprising the N most frequent characters in the arranged string, each of the plurality of second delimiter hypotheses comprising a respective potential delimiter to use with the first delimiter; parsing the arranged string using each of the plurality of second delimiter hypotheses; generating, based on parsing the arranged string using each of the plurality of second delimiter hypotheses, a plurality of tables that are each associated with a respective one of the plurality of second delimiter hypotheses; determining a respective consistency score associated with each respective table of the plurality of tables, wherein the respective consistency score is based on at least one of a total number of patterns in the respective table, a total number of tuples in the respective table, a total number of delimiters per pattern in the respective table, a total number of columns or rows in the respective table, or a total number cells defined by the rows and the columns of the respective table filled by values; selecting a table from among the plurality of tables based on the respective consistency score associated with the table; formatting the input data based on the selected table to generate formatted data; and outputting the formatted data.

2. The computing system of claim 1, wherein formatting the input data comprises formatting the input data into payload data labeled according to a column and a row of the selected table.

3. The computing system of claim 1, wherein selecting the table from among the plurality of tables comprises selecting the table having a highest consistency score among the plurality of tables.

4. The computing system of claim 1, wherein the operations further comprise: determining a plurality of child delimiter hypotheses within the input data for a column or a row of the selected table, the column including column data or the row including row data, each of the plurality of child delimiter hypotheses comprising a respective potential child delimiter; parsing the column data or the row data according to each of the plurality of child delimiter hypotheses; generating, based on parsing the column data or the row data, a plurality of child tables, each of the plurality of child tables associated with a respective one of the plurality of child delimiter hypotheses; determining a respective child consistency score for each of the plurality of child delimiter hypotheses; and selecting a child delimiter hypothesis from among the plurality of child delimiter hypotheses or no child delimiter hypothesis for the column or the row based on the respective child consistency score for each of the plurality of child delimiter hypotheses.

5. The computing system of claim 4, wherein selecting the child delimiter hypothesis from among the plurality of child delimiter hypotheses or no child delimiter hypothesis for the column or the row comprises selecting no child delimiter hypothesis based on the respective child consistency score for each of the plurality of child delimiter hypotheses being equal to zero.

6. The computing system of claim 4, wherein selecting the child delimiter hypothesis from among the plurality of child delimiter hypotheses or no child delimiter hypothesis for the column or the row comprises selecting the child delimiter hypothesis based on the child delimiter hypothesis having a highest consistency score among the respective child consistency score of each of the plurality of child delimiter hypotheses for the column or the row.

7. The computing system of claim 1, wherein the first delimiter comprises a row delimiter, the plurality of second delimiter hypotheses comprises a plurality of column delimiter hypotheses, and wherein determining the respective consistency score comprises determining:

$$P(x, \theta) = \frac{1}{k} \sum_{k=1}^{k} N_k \left( \frac{M_k}{M_k + 1} \right) \ast \left( \frac{M_{col}}{RC_{filled}} \right),$$

where $P$ is a function yielding the respective consistency score, $x$ is a block of input text, $\theta$ is a hypothetical delimiter applied to the input text, $k$ is a total number of unique patterns found while processing for $\theta$, $N_k$ is a total number of tuples, $M_k$ is a total number of delimiters per pattern, $M_{col}$ is a total number of columns created and $RC_{filled}$ is a total number of rows and columns filled by values.

8. The computing system of claim 1, wherein the operations further comprise: determining a second delimiter hypothesis as a respective one of the plurality of second delimiter hypotheses associated with the selected table; and outputting at least two of the first delimiter, the second delimiter hypothesis, or a child delimiter hypothesis.

9. The computing system of claim 1, wherein determining the first delimiter comprises determining that the first delimiter comprises an only potential first delimiter from a plurality of potential first delimiters to appear in input text.

10. The computing system of claim 1, wherein determining the first delimiter comprises determining the firs
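
As a rough illustration of the consistency score recited in claim 7, the sketch below reads the formula as the average over unique patterns of N_k · M_k / (M_k + 1), multiplied by M_col / RC_filled. That reading of the fractions is an assumption, as are the concrete definitions used here: a row's pattern is taken to be its delimiter count, M_col is taken as the widest row, and RC_filled counts non-blank cells. The function name `consistency_score` is hypothetical.

```python
from collections import Counter


def consistency_score(table):
    """Score a parsed table given as a list of rows, each a list of cell strings."""
    if not table:
        return 0.0

    # Assumption: a row's "pattern" is identified by its number of delimiters,
    # so M_k = len(row) - 1 and N_k = number of rows sharing that count.
    pattern_counts = Counter(len(row) - 1 for row in table)
    k = len(pattern_counts)

    # (1/k) * sum over patterns of N_k * M_k / (M_k + 1)
    pattern_term = sum(
        n_k * m_k / (m_k + 1) for m_k, n_k in pattern_counts.items()
    ) / k

    # Assumption: M_col is the widest row; RC_filled counts non-blank cells.
    m_col = max(len(row) for row in table)
    rc_filled = sum(1 for row in table for cell in row if cell.strip())
    if rc_filled == 0:
        return 0.0

    return pattern_term * m_col / rc_filled
```

Under these assumptions, a hypothesis that never splits any row produces only the zero-delimiter pattern and therefore scores zero, while a hypothesis that splits every row the same way scores higher than one that splits rows unevenly.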
Ensuring data consistency and integrity · CPC title
Tablespace storage structures; Management thereof · CPC title