What technology area does this patent fall under?

Primary CPC classification G06F16/258. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 11, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Automatically executing tasks and configuring access control lists in a data transformation system

US11468083B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11468083-B2
Application number	US-202016915693-A
Country	US
Kind code	B2
Filing date	Jun 29, 2020
Priority date	Dec 28, 2016
Publication date	Oct 11, 2022
Grant date	Oct 11, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented system or process is programmed or configured to use a configuration file to specify one or more tasks to apply to raw ingested data. A task may be a sequence of instructions programmed or configured to format raw ingested data into a dataset in a CSV format. Examples of tasks may include: a parser to parse Cobol data into a CSV, a parser to parse XML into a CSV, a parser to parse text using fixed-width fields to a CSV, a parser to parse files in a zip archive into a CSV, a regular expression search/replace function, or formatting logic to remove lines or blank lines from raw ingested data. In one embodiment, the configuration file may specify a schema definition for a task to use for generating a dataset. In one embodiment, the configuration file may also include one or more access control list (ACL) definitions for the generated dataset. In one embodiment, the building of datasets using the configuration file is automated, for example, on a nightly basis.
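As a rough illustration of the idea in the abstract, the sketch below reads a configuration that names tasks and applies them in order to raw text lines. The configuration layout, the task identifiers (`regex_replace`, `remove_blank_lines`), and every function name are hypothetical; the text shown on this page does not specify a concrete format, so this is a minimal sketch and not the patented implementation.

```python
import json
import re

# Hypothetical configuration. The key names and task identifiers are
# illustrative assumptions, not taken from the patent.
CONFIG_TEXT = r"""
{
  "tasks": [
    {"task": "regex_replace", "pattern": "\\s*;\\s*", "replacement": ","},
    {"task": "remove_blank_lines"}
  ]
}
"""


def regex_replace(lines, pattern, replacement):
    """Apply a search-and-replace regular expression to each line."""
    compiled = re.compile(pattern)
    return [compiled.sub(replacement, line) for line in lines]


def remove_blank_lines(lines):
    """Drop lines that are empty or contain only whitespace."""
    return [line for line in lines if line.strip()]


# Registry mapping each task identifier to the function that implements it.
TASKS = {
    "regex_replace": regex_replace,
    "remove_blank_lines": remove_blank_lines,
}


def run_tasks(config_text, lines):
    """Run every task listed in the configuration, in order."""
    config = json.loads(config_text)
    for entry in config["tasks"]:
        # Everything except the identifier is passed as task-specific criteria.
        params = {key: value for key, value in entry.items() if key != "task"}
        lines = TASKS[entry["task"]](lines, **params)
    return lines


raw = ["id ; name", "", "1 ; Ada", "   ", "2 ; Grace"]
cleaned = run_tasks(CONFIG_TEXT, raw)
print("\n".join(cleaned))
```

Keeping the tasks in a registry keyed by identifier means a new transformation can be added without touching the loop that reads the configuration.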

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: retrieving at least one configuration file, the at least one configuration file comprising: a plurality of different data transformation tasks, each of the tasks denoted with a task identifier that identifies a particular task to apply to a set of input data and associated with task-specific criteria for execution of the particular task; a schema definition for a dataset, wherein the schema definition defines a plurality of columns; receiving an input file that includes an input dataset comprising a single text column with fixed-width fields; in response to receiving the input file, based on reading the at least one configuration file, applying the plurality of different data transformation tasks to the input dataset to generate an output dataset that is formatted differently from the input dataset, wherein the applying comprises using an array of fixed-width values specified in the at least one configuration file to map the fixed-width fields of the input dataset to the output dataset, and wherein the output dataset is formatted according to the task-specific criteria and aligns with the plurality of columns as defined by the schema definition; wherein the method is performed using one or more processors.

2. The method of claim 1, wherein the output dataset is formatted as a comma separated value (CSV) file.

3. The method of claim 1, wherein the input file is a text file, and further comprising using the at least one configuration file to apply the particular task to the input dataset, the using comprises: applying a search-and-replace regular expression, specified in the at least one configuration file, to each line of the input dataset.

4. The method of claim 1, wherein the input file is a COBOL binary file, and further comprising using the at least one configuration file to apply the particular task to the input dataset, the using comprises: using an expected byte size specified in the at least one configuration file to identify a location of a field in the input dataset; and retrieving the field from the input dataset.

5. The method of claim 1, wherein the input file is an extensible markup language (XML) file, and further comprising using the at least one configuration file to apply the particular task to the input dataset, the using comprises: mapping tagged fields of the XML file to the output dataset.

6. The method of claim 1, wherein the input file is a zip archive comprising text files, and further comprising using the at least one configuration file to apply the particular task to the input dataset, the using comprises: mapping content of the text files to the output dataset.

7. The method of claim 6, wherein the zip archive is encrypted, and wherein the using further comprises: decrypting the zip archive.

8. The method of claim 6, wherein the input file is a text file, and further comprising using the at least one configuration file to apply the particular task to the input dataset, the using comprises: removing all blank lines from the input dataset.

9. The method of claim 6, wherein the input dataset is formatted as rows of text, and further comprising using the at least one configuration file to apply the particular task to the input dataset, the using comprises: using a first setting specified in the at least one configuration file to identify a number of header rows to remove from the input dataset; and using a second setting specified in the at least one configuration file to identify a number of footer rows to remove from the input dataset.

10. The method of claim 6, the input dataset being a single-column dataset, and further comprising using the at least one configuration file to transform the single-column dataset into a multi-column dataset that is delimited according to the schema definition.

11. The method of claim 6, wherein the at least one configuration file further comprises an access control list that defines one or more access control permissions for the dataset, and further comprising, in response to receiving the input file, based on reading the at least one configuration file, determining output access control permissions for the output dataset based on the access control list.

12. One or more non-transitory computer-readable media storing instructions, which when executed by one or more processors cause: retrieving at least one configuration file, the at least one configuration file comprising: a plurality of different data transformation tasks, each of the tasks denoted with a task identifier that identifies a particular task to apply to a set of input data and associated with task-specific criteria for execution of the particular task; a schema definition for a dataset, wherein the schema definition defines a plurality of columns; receiving an input file that includes an input dataset comprising a single text column with fixed-width fields; in response to receiving the input file, based on reading the at least one configuration file, applying the plurality of different data transformation tasks to the input dataset to generate an output dataset that is formatted differently from the input dataset, wherein the applying comprises using an array of fixed-width values specified in the at least one configuration file to map the fixed-width fields of the input dataset to the output dataset, and wherein the output dataset is formatted according to the task-specific criteria and aligns with the plurality of columns as defined by the schema definition.

13. The one or more non-transitory computer-readable media of claim 12, wherein the input file is a text file, and further comprising using the at least one configuration file to apply the particular task to the input dataset, the using comprises: applying a search-and-replace regular expression, specified in the at least one configuration file, to each line of the input dataset.

14. The one or more non-transitory computer-readable media of claim 12, wherein the input file is a COBOL binary file, and further comprising using the at least one configuration file to apply the particular task to the input dataset, the using comprises: using an expected byte size specified in the at least one configuration file to identify a location of a field in the input dataset; and retrieving the field from the input dataset.

15. The one or more non-transitory computer-readable media of claim 12, wherein the input file is an extensible markup language (XML) file, and further comprising using the at least one configuration file to apply the particular task to the input dataset, the using comprises: mapping tagged fields of the XML file to the output dataset.

16. The one or more non-transitory computer-readable media of claim 12, wherein the input file is a zip archive comprising text files, and further comprising using the at least one configuration file to apply the particular task to the input dataset, the using comprises: mapping content of the text files to the output dataset.

17. The one or more non-transitory computer-readable media of claim 12, wherein the input dataset is formatted as rows of text, and further comprising using the at least one configuration file to apply the particular task to the input dataset, the using comprises: using a first setting specified in the at least one configuration file to identify a number of header rows to remove from the input dataset; and using a second setting specified in the at least one configuration file to identify a number of footer rows to remove from the input dataset.
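The fixed-width mapping recited in the independent claims, together with the header and footer settings of the row-trimming claims, can be pictured with a short sketch. All setting names (`schema`, `widths`, `header_rows`, `footer_rows`), the sample data, and the helper functions are assumptions made for illustration; the claims do not prescribe this layout, and nothing here should be read as defining their scope.

```python
import csv
import io

# Hypothetical settings standing in for values read from a configuration file.
CONFIG = {
    "schema": ["account", "name", "balance"],  # output column names
    "widths": [4, 10, 8],                      # one width per fixed-width field
    "header_rows": 1,                          # rows to drop from the top
    "footer_rows": 1,                          # rows to drop from the bottom
}


def trim_rows(lines, header_rows, footer_rows):
    """Remove the configured number of header and footer rows."""
    end = len(lines) - footer_rows  # explicit end index also works for 0 footers
    return lines[header_rows:end]


def split_fixed_width(line, widths):
    """Cut one text line into fields using the array of widths."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


def to_csv(lines, config):
    """Turn single-column fixed-width text into CSV aligned with the schema."""
    if len(config["widths"]) != len(config["schema"]):
        raise ValueError("widths and schema must have the same length")
    body = trim_rows(lines, config["header_rows"], config["footer_rows"])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(config["schema"])
    for line in body:
        writer.writerow(split_fixed_width(line, config["widths"]))
    return out.getvalue()


raw_lines = [
    "EXPORT 2020-06-29",
    "0001Ada Lovela  100.50",
    "0002Grace Hopp   42.00",
    "END OF FILE",
]
csv_text = to_csv(raw_lines, CONFIG)
print(csv_text)
```

Checking that the width array and the schema have the same length is one simple way to ensure the output columns line up with the schema definition before any rows are written.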

Assignees

Palantir Technologies Inc

Inventors

Classifications

G06F16/83
Querying · CPC title
G06F16/258 (Primary)
Data format conversion from or to a database · CPC title
G06F16/254
Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title
G06F16/215
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
G06F16/86
Mapping to a database · CPC title

Patent family

Related publications grouped by family.

View patent family 60942840

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11468083B2 cover?: A computer-implemented system or process is programmed or configured to use a configuration file to specify one or more tasks to apply to raw ingested data. A task may be a sequence of instructions programmed or configured to format raw ingested data into a dataset in a CSV format. Examples of tasks may include: a parser to parse Cobol data into a CSV, a parser to parse XML into a CSV, a parser…
Who is the assignee on this patent?: Palantir Technologies Inc