Platform and source agnostic data processing for structured and unstructured data sources

US11900149B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11900149-B2
Application number  US-202117357710-A
Country             US
Kind code           B2
Filing date         Jun 24, 2021
Priority date       Jun 24, 2021
Publication date    Feb 13, 2024
Grant date          Feb 13, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Data queries that are agnostic to any particular data source may include a data source alias. The data source alias may be replaced with a data source identifier to obtain a data query configured for a target data source. Data processing jobs may be agnostic to any particular data processing platform. A data processing job may include a data processing task that is agnostic to any particular data processing platform. A code library may provide platform-specific code configured to implement a data processing task on a data processing platform. A data query configured for a particular data source and a data processing task configured for a particular data processing platform may be used to create a data processing job. Configurations that restrict execution of a data processing job to execution via an interactive development environment may be removed to allow its execution directly at the data processing platform itself.
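
The alias replacement described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `${alias}` placeholder syntax, the contents of `ALIAS_MAPPING`, and the function name `resolve_query` are hypothetical and do not come from the patent.

```python
import re

# Hypothetical alias mapping: target data source -> {alias -> identifier}.
# The identifiers below are made-up examples.
ALIAS_MAPPING = {
    "warehouse": {"customers": "dw_prod.sales.customers"},
    "lake": {"customers": "s3://example-lake/sales/customers/"},
}

# Assumed placeholder syntax for an alias inside a query: ${alias_name}
_ALIAS_PATTERN = re.compile(r"\$\{(\w+)\}")


def resolve_query(agnostic_query: str, target_source: str) -> str:
    """Replace each data source alias with the identifier for the target source."""
    mapping = ALIAS_MAPPING[target_source]

    def substitute(match: re.Match) -> str:
        alias = match.group(1)
        if alias not in mapping:
            raise KeyError(f"no identifier for alias {alias!r} on {target_source!r}")
        return mapping[alias]

    return _ALIAS_PATTERN.sub(substitute, agnostic_query)


query = "SELECT id, name FROM ${customers}"
print(resolve_query(query, "warehouse"))
```

The same source-agnostic query string can be resolved against `"lake"` instead of `"warehouse"` without editing the query itself, which is the point of keeping the alias mapping separate.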

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving a platform-agnostic data processing job comprising at least one platform-agnostic data processing task; receiving a data source-agnostic data query comprising a data source alias; creating, for a target data source, a data source-specific data query at least by replacing, using an alias mapping and based on the target data source, the data source alias with a data source identifier that indicates the target data source; obtaining, for a target data processing platform and based on the at least one platform-agnostic data processing task, at least one platform-specific data processing task; creating, for the target data processing platform and using the data source-specific data query and the at least one platform-specific data processing task, a restricted platform-specific data processing job that is restricted to execution within an interactive development environment (IDE) connected to the target data processing platform; obtaining an unrestricted platform-specific data processing job configured for execution outside of the IDE at least by removing, from the restricted platform-specific data processing job, one or more configurations that restrict execution of the restricted platform-specific data processing job to execution within the IDE; and causing execution, outside of the IDE, of the unrestricted platform-specific data processing job at the target data processing platform, wherein the target data processing platform accesses, during execution of the unrestricted platform-specific data processing job and based on the data source-specific data query, the target data source.

2. The computer-implemented method of claim 1 wherein the obtaining the at least one platform-specific data processing task comprises retrieving, from a code library, platform-specific code configured to perform at least one data processing task at the target data processing platform, wherein the code library comprises first code configured to perform the at least one data processing task at a first data processing platform of a first type and second code configured to perform the at least one data processing task at a second data processing platform of a second type that is different from the first type.

3. The computer-implemented method of claim 1, further comprising causing execution, within the IDE, of the restricted platform-specific data processing job.

4. The computer-implemented method of claim 1, wherein the removing the one or more configurations from the restricted platform-specific data processing job comprises modifying at least one of the one or more configurations of the restricted platform-specific data processing job.

5. The computer-implemented method of claim 1, wherein the removing the one or more configurations of the restricted platform-specific data processing job comprises removing, from the restricted platform-specific data processing job, at least one of: at least one formatting element associated with the IDE; at least one input element associated with an input function of the IDE; or at least one output element associated with an output function of the IDE.

6. The computer-implemented method of claim 1, further comprising storing, in a version control repository, at least one of the platform-agnostic data processing job or the data source-agnostic data query, wherein the version control repository is accessible to each of a data processing job development environment and a data processing job production environment, wherein each of the platform-agnostic processing job and the data source-agnostic data query are received from the version control repository.

7. The computer-implemented method of claim 1, wherein the creating the platform-specific data processing job comprises creating the platform-specific data processing job further using a view definition that defines, for the target data source, a first data view that corresponds to a second data view of a second data source.

8. The computer-implemented method of claim 1, wherein the at least one data processing task comprises at least one of an extraction task, a transform task, a formatting task, an encryption task, a tokenization task, and a scheduling task.

9. The computer-implemented method of claim 1, wherein the alias mapping comprises a first mapping between a first data source alias and a first data source identifier of a first data source comprising a data warehouse of structured data and a second mapping between a second alias and a second data source identifier of a second data source comprising a data lake of unstructured data.

10. A system comprising: a first data source comprising a data warehouse of structured data; a second data source comprising a data lake of unstructured data; a code library comprising: first code configured to perform at least one data processing task at a first data processing platform of a first type; and second code configured to perform the at least one data processing task at a second data processing platform of a second type that is different from the first type; one or more processors; and memory storing computer-readable instructions that, when executed by the one or more processors, cause the system to: receive a platform-agnostic data processing job comprising at least one platform-agnostic data processing task; receive a data source-agnostic data query comprising a data source alias; create, for a target data source, a data source-specific data query at least by replacing, using the alias mapping and based on the target data source, the data source alias with a data source identifier that indicates the target data source; obtain, for a target data processing platform and based on the at least one platform-agnostic data processing task, at least one platform-specific data processing task at least by retrieving, from the code library, platform-specific code configured to perform the at least one data processing task at the target data processing platform; create, for the target data processing platform and using the data source-specific data query and the at least one platform-specific data processing task, a restricted platform-specific data processing job that is restricted to execution within an interactive development environment (IDE) connected to the target data processing platform; obtain an unrestricted platform-specific data processing job configured for execution outside of the IDE at least by removing, from the restricted platform-specific data processing job, one or more configurations that restrict execution of the platform-specific data processing job to execution within the IDE, wherein the one or more configurations removed from the platform-specific data processing job comprise at least one of: at least one formatting element associated with the IDE; at least one input element associated with an input function of the IDE; or at least one output element associated with an output function of the IDE; and cause execution, outside of the IDE, of the unrestricted platform-specific data processing job at the target data processing platform, wherein the target data processing platform accesses, during execution of the unrestricted platform-specific data processing job and based on the data source-specific data query, the target data source.

11. The system of claim 10, wherein the instructions, when executed by the one or more processors, further cause the system to remove the one or more configurations of the restricted platform-specific data processing job at least by modifying at least one of the one or more configurations of the restricted platform-specific data processing j
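
The code-library lookup (claim 2) and the removal of IDE-only configurations (claim 5) can be illustrated with a short Python sketch. Everything named here is an assumption made for the example: the `CODE_LIBRARY` contents, the `Job` structure, the representation of a configuration as a `(kind, value)` pair, and the sample configuration values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field

# Hypothetical code library: task name -> platform type -> platform-specific code.
CODE_LIBRARY = {
    "extract": {
        "spark": "df = spark.sql(QUERY)",
        "pandas": "df = pd.read_sql(QUERY, conn)",
    },
}

# Assumed kinds of IDE-only configuration, following the three categories
# listed in claim 5: formatting, input, and output elements.
IDE_ONLY_KINDS = {"formatting", "input", "output"}


@dataclass
class Job:
    platform: str
    query: str
    tasks: list
    # Each configuration is a (kind, value) pair; this shape is an assumption.
    configurations: list = field(default_factory=list)


def build_restricted_job(task_names, platform, query):
    """Look up platform-specific code for each agnostic task and attach IDE settings."""
    tasks = [CODE_LIBRARY[name][platform] for name in task_names]
    configurations = [
        ("formatting", "%%sql"),       # hypothetical IDE cell marker
        ("output", "display(df)"),     # hypothetical IDE output call
        ("schedule", "daily"),         # not IDE-specific, so it is kept later
    ]
    return Job(platform, query, tasks, configurations)


def remove_ide_restrictions(job):
    """Return a copy of the job without the IDE-only configurations."""
    kept = [(kind, value) for kind, value in job.configurations
            if kind not in IDE_ONLY_KINDS]
    return Job(job.platform, job.query, list(job.tasks), kept)


restricted = build_restricted_job(
    ["extract"], "spark", "SELECT id FROM dw_prod.sales.customers"
)
unrestricted = remove_ide_restrictions(restricted)
print(unrestricted.configurations)
```

In this sketch the restricted job is left untouched and a filtered copy is returned, so the same job could still be run inside the IDE (claim 3) while the copy is submitted to the platform directly.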

Assignees

Capital One Services, LLC

Classifications

  • G06F9/48 (Primary): Program initiating; Program switching, e.g. by interrupt

  • Integrating or interfacing systems involving database management systems

  • G06F16/256 (Primary): in federated or virtual databases

  • Distributed queries

  • Access plan code generation and invalidation; Reuse of access plans

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11900149B2 cover?
Data queries that are agnostic to any particular data source may include a data source alias. The data source alias may be replaced with a data source identifier to obtain a data query configured for a target data source. Data processing jobs may be agnostic to any particular data processing platform. A data processing job may include a data processing task that is agnostic to any particular da…
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G06F9/48. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 13, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).