Techniques for data extraction

US10133782B2 · US · B2

Patent metadata
Publication number: US-10133782-B2
Application number: US-201615225437-A
Country: US
Kind code: B2
Filing date: Aug 1, 2016
Priority date: Aug 1, 2016
Publication date: Nov 20, 2018
Grant date: Nov 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Computer-implemented techniques for data extraction are described. The techniques include a method and system for retrieving an extraction job specification, wherein the extraction job specification comprises a source repository identifier that identifies a source repository comprising a plurality of data records; a data recipient identifier that identifies a data recipient; and a schedule that indicates a timing of when to retrieve the plurality of data records. The method and system further include retrieving the plurality of data records from the source repository based on the schedule, creating an extraction transaction from the plurality of data records, wherein the extraction transaction comprises a subset of the plurality of data records and metadata, and sending the extraction transaction to the data recipient.
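The flow in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the JSON key names (`source_repository`, `data_recipient`, `schedule`), the in-memory `repositories` mapping, the `send` callback, and the fixed `batch_size` are all hypothetical. The schedule is carried as an opaque string; the sketch assumes some external scheduler calls `run_extraction` at the right time.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical job specification; key names are illustrative, not from the patent.
SPEC_JSON = """
{
  "source_repository": "orders_db",
  "data_recipient": "analytics_ingest",
  "schedule": "0 2 * * *"
}
"""


@dataclass
class ExtractionJobSpec:
    source_repository: str  # identifies where the records live
    data_recipient: str     # identifies who receives the transactions
    schedule: str           # opaque here; a scheduler would interpret it


@dataclass
class ExtractionTransaction:
    records: list                      # a subset of the retrieved records
    metadata: dict = field(default_factory=dict)


def load_spec(text: str) -> ExtractionJobSpec:
    """Parse a JSON configuration file into a job specification."""
    return ExtractionJobSpec(**json.loads(text))


def run_extraction(spec, repositories, send, batch_size=2):
    """Retrieve records, split them into transactions, and send each one.

    `repositories` maps a repository identifier to a list of records, and
    `send(recipient_id, transaction)` delivers a transaction. Both are
    stand-ins for real storage and transport layers.
    """
    records = repositories[spec.source_repository]
    transactions = []
    for start in range(0, len(records), batch_size):
        subset = records[start:start + batch_size]
        txn = ExtractionTransaction(
            records=subset,
            metadata={
                "source": spec.source_repository,
                "record_count": len(subset),
                "created_at": datetime.now(timezone.utc).isoformat(),
            },
        )
        send(spec.data_recipient, txn)
        transactions.append(txn)
    return transactions
```

Keeping the specification in a configuration file means a new extraction job can be added by writing data rather than code, which matches the emphasis the first claim places on XML, YAML, or JSON configuration.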

First claim

Opening claim text (preview).

The invention claimed is:

1. A method, comprising: retrieving an extraction job specification from an extraction job specification repository implemented on a first computing device, wherein the extraction job specification is defined in one or more configuration files, and wherein the extraction job specification comprises: a source repository identifier that identifies a source repository comprising a plurality of data records; a data recipient identifier that identifies a data recipient; a schedule that indicates a timing of when to retrieve the plurality of data records; wherein the one or more configuration files are implemented in one or more of: extensible markup language (XML), YAML Ain't Markup Language (YAML), JavaScript Object Notation (JSON), and/or a markup language; using the extraction job specification, retrieving, by a second computing device, the plurality of data records from the source repository based on the schedule; using the extraction job specification, creating, by the second computing device, an extraction transaction from the plurality of data records, wherein the extraction transaction comprises a subset of the plurality of data records and metadata; sending, by the second computing device, the extraction transaction to the data recipient; and wherein the method is performed using one or more processors.

2. The method of claim 1 wherein the extraction job specification further comprises an inline processor and the method further comprises, using the extraction job specification, applying the inline processor to the plurality of data records before creating the extraction transaction from the plurality of data records.

3. The method of claim 2 wherein the inline processor comprises instructions that specify one or more processes for filtering the plurality of data records.

4. The method of claim 3 wherein the instructions for filtering the plurality of data records comprise one or more regular expressions.

5. The method of claim 3 wherein the instructions for filtering the plurality of data records comprise a structured query language (SQL) expression.

6. The method of claim 2 wherein the inline processor comprises instructions that specify one or more processes for grouping a subset of the plurality of data records into a single transaction.

7. The method of claim 1 wherein the extraction job specification further comprises a completion strategy data processor and the method further comprises: using the extraction job specification, applying the completion strategy data processor to the plurality of data records after sending the extraction transaction to the data recipient.

8. The method of claim 7 wherein the completion strategy data processor comprises instructions which when execute cause performing one or more of: deleting the plurality of data records; encrypting the plurality of data records; and moving the plurality of data records to a storage location.

9. The method of claim 1 wherein the extraction job specification comprises a dynamic link library (DLL), Java Archive (JAR) file, or a device driver for accessing the data recipient.

10. A computer system, comprising: one or more digital data storage media; one or more processors that are communicatively coupled to the storage media; one or more programs stored in the storage media and configured for execution by the one or more processors, the one or more programs comprising instructions which when executed using the one or more processors cause the one or more processors to perform: retrieving an extraction job specification from an extraction job specification repository implemented on a first computing device, wherein the extraction job specification is defined in one or more configuration files, and wherein the extraction job specification comprises: a source repository identifier that identifies a source repository comprising a plurality of data records; a data recipient identifier that identifies a data recipient; a schedule that indicates a timing of when to retrieve the plurality of data records; wherein the one or more configuration files are implemented in one or more of: extensible markup language (XML), YAML Ain't Markup Language (YAML), JavaScript Object Notation (JSON), and/or a markup language; using the extraction job specification, retrieving, by a second computing device, the plurality of data records from the source repository based on the schedule; using the extraction job specification, creating, by the second computing device, an extraction transaction from the plurality of data records, wherein the extraction transaction comprises a subset of the plurality of data records and metadata; and sending, by the second computing device, the extraction transaction to the data recipient.

11. The system of claim 10 wherein the extraction job specification further comprises an inline processor and the instructions further comprise instructions which when executed cause, using the extraction job specification, applying the inline processor to the plurality of data records before creating the extraction transaction from the plurality of data records.

12. The system of claim 11 wherein the inline processor comprises additional instructions that specify one or more processes for filtering the plurality of data records.

13. The system of claim 12 wherein the additional instructions for filtering the plurality of data records comprise one or more regular expressions.

14. The system of claim 12 wherein the additional instructions for filtering the plurality of data records comprise a structured query language (SQL) expression.

15. The system of claim 11 wherein the inline processor comprises additional instructions for grouping a subset of the plurality of data records into a single transaction.

16. The system of claim 10 wherein the extraction job specification further comprises a completion strategy data processor and the instructions further comprise instructions which when executed cause, using the extraction job specification, applying the completion strategy data processor to the plurality of data records after sending the extraction transaction to the data recipient.

17. The system of claim 16 wherein the completion strategy data processor comprises additional instructions which when executed cause performing one or more of: deleting the plurality of data records; encrypting the plurality of data records; and moving the plurality of data records to a storage location.

18. The system of claim 10 wherein the extraction job specification comprises a dynamic link library (DLL), Java Archive (JAR) file, or a device driver for accessing the data recipient.

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • G06F16/254 · Primary

    Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

  • Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries · CPC title

  • Query execution · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10133782B2 cover?
Computer-implemented techniques for data extraction are described. The techniques include a method and system for retrieving an extraction job specification, wherein the extraction job specification comprises a source repository identifier that identifies a source repository comprising a plurality of data records; a data recipient identifier that identifies a data recipient; and a schedule that…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/254. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 20, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).