Who is the assignee on this patent?

LinkedIn Corp, Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification H04L67/02. Mapped technology areas include Electricity.

When was this patent published?

Publication date Nov 6, 2018 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Dynamic data-ingestion pipeline

US10122783B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10122783-B2
Application number	US-201514944934-A
Country	US
Kind code	B2
Filing date	Nov 18, 2015
Priority date	Nov 18, 2015
Publication date	Nov 6, 2018
Grant date	Nov 6, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In order to ingest data from an arbitrary source in a set of sources, a computer system accesses predefined configuration instructions. Then, the computer system generates a dynamic data-ingestion pipeline that is compatible with a Hadoop file system based on the predefined configuration instructions. This dynamic data-ingestion pipeline includes a modular arrangement of operators from a set of operators that includes: an extraction operator for extracting the data of interest from the source, a converter operator for transforming the data, and a quality-checker operator for checking the transformed data. Moreover, the computer system receives the data from the source. Next, the computer system processes the data using the dynamic data-ingestion pipeline as the data is received without storing the data in memory for the purpose of subsequent ingestion processing.
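The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `CONFIG`, `OPERATORS`, `build_pipeline`, and the three operator functions are hypothetical names, the JSON-lines source is invented for the example, and Python generators stand in for "processing the data as it is received" without first staging the whole input.

```python
import json
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical stand-in for the "predefined configuration instructions".
CONFIG = {"operators": ["extract", "convert", "quality_check"]}

Operator = Callable[[Iterable], Iterator]


def extract(lines: Iterable[str]) -> Iterator[dict]:
    """Extraction operator: pull the data of interest out of raw source lines."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines from the source
            yield json.loads(line)


def convert(records: Iterable[dict]) -> Iterator[dict]:
    """Converter operator: map every record to one common shape."""
    for rec in records:
        yield {"id": int(rec["id"]), "name": str(rec.get("name", "")).lower()}


def quality_check(records: Iterable[dict]) -> Iterator[dict]:
    """Quality-checker operator: pass only records that satisfy a simple rule."""
    for rec in records:
        if rec["id"] >= 0 and rec["name"]:
            yield rec


# Registry of available operators (assumed; the patent names more, e.g. writer, publisher).
OPERATORS: Dict[str, Operator] = {
    "extract": extract,
    "convert": convert,
    "quality_check": quality_check,
}


def build_pipeline(config: dict) -> Operator:
    """Assemble a pipeline from the operator names listed in the configuration."""
    stages = [OPERATORS[name] for name in config["operators"]]

    def run(source: Iterable) -> Iterator:
        stream = source
        for stage in stages:
            stream = stage(stream)  # chain lazily; one record flows through at a time
        return stream

    return run


# Records are pulled through the chain one by one as the source yields them.
source = iter(['{"id": "1", "name": "Alice"}', "", '{"id": "-5", "name": "Bob"}', '{"id": 2}'])
pipeline = build_pipeline(CONFIG)
result = list(pipeline(source))
```

Changing the operator list in `CONFIG` changes the pipeline without touching the operator code, which is the sense in which this sketch treats the arrangement as modular.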

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-system-implemented method for ingesting data from a source in a set of sources, the method comprising: accessing predefined configuration instructions; generating a dynamic data-ingestion pipeline compatible with a Hadoop file system based on the predefined configuration instructions, wherein the dynamic data-ingestion pipeline includes a modular arrangement of operators from a set of operators that includes: an extraction operator for extracting the data of interest from the source, a converter operator for transforming the data, and a quality-checker operator for checking the transformed data; receiving the data from the source; and processing the data, by a computer system, using the dynamic data-ingestion pipeline implemented on the computer system by: using the converter operator to transform the received data into a common format compatible with the Hadoop file system; wherein the data is processed in real-time as the data is received, without storing the data in memory for the purpose of subsequent ingestion processing, thereby reducing storage requirements of the computer system.
2. The method of claim 1, wherein the set of sources is compatible with one of: a database, a message broker, a distributed key-value storage system, a Simple Storage Service (S3) file system, a first file system on a first network accessible via HyperText Transfer Protocol, a second file system on a second network accessible via File Transfer Protocol, and a local file system.
3. The method of claim 1, wherein the dynamic data-ingestion pipeline egresses the data to a sink in a set of sinks; wherein the sink is compatible with the Hadoop file system; and wherein the set of operators includes a writer operator for outputting the data to the sink.
4. The method of claim 1, wherein the set of operators includes a publisher operator for outputting the data to an output directory.
5. The method of claim 4, wherein the publisher operator outputs the data when all of the operators in the dynamic data-ingestion pipeline are successfully completed.
6. The method of claim 4, wherein the publisher operator outputs the data when a subset of the operators in the dynamic data-ingestion pipeline is successfully completed.
7. The method of claim 1, wherein the processing of the data using the dynamic data-ingestion pipeline is performed as a batch process.
8. The method of claim 1, wherein the processing of the data using the dynamic data-ingestion pipeline involves parallel processing of workunits.
9. The method of claim 1, wherein the quality-checker operator checks one of: a record-level policy, and a task-level policy.
10. An apparatus, comprising: one or more processors; memory; and a program module, wherein the program module is stored in the memory and, during operation of the apparatus, is executed by the one or more processors to ingest data from a source in a set of sources, the program module including: instructions for accessing predefined configuration instructions; instructions for generating a dynamic data-ingestion pipeline compatible with a Hadoop file system based on the predefined configuration instructions, wherein the dynamic data-ingestion pipeline includes a modular arrangement of operators from a set of operators that includes: an extraction operator for extracting the data of interest from the source, a converter operator for transforming the data, and a quality-checker operator for checking the transformed data; instructions for receiving the data from the source; and instructions for processing the data using the dynamic data-ingestion pipeline by: using the converter operator to transform the received data into a common format compatible with the Hadoop file system; wherein the data is processed in real-time as the data is received, without storing the data in memory for the purpose of subsequent ingestion processing, thereby reducing storage requirements on the memory.
11. The apparatus of claim 10, wherein the set of sources is compatible with one of: a database, a message broker, a distributed key-value storage system, a Simple Storage Service (S3) file system, a first file system on a first network accessible via HyperText Transfer Protocol, a second file system on a second network accessible via File Transfer Protocol, and a local file system.
12. The apparatus of claim 10, wherein the dynamic data-ingestion pipeline egresses the data to a sink in a set of sinks; wherein the sink is compatible with the Hadoop file system; and wherein the set of operators includes a writer operator for outputting the data to the sink.
13. The apparatus of claim 10, wherein the set of operators includes a publisher operator for outputting the data to an output directory.
14. The apparatus of claim 13, wherein the publisher operator outputs the data when all of the operators in the dynamic data-ingestion pipeline are successfully completed.
15. The apparatus of claim 13, wherein the publisher operator outputs the data when a subset of the operators in the dynamic data-ingestion pipeline is successfully completed.
16. The apparatus of claim 10, wherein the processing of the data using the dynamic data-ingestion pipeline is performed as a batch process.
17. The apparatus of claim 10, wherein the processing of the data using the dynamic data-ingestion pipeline involves parallel processing of workunits.
18. The apparatus of claim 10, wherein the quality-checker operator checks one of: a record-level policy, and a task-level policy.
19. A system, comprising: a processing module comprising one or more processors and a non-transitory computer readable medium storing instructions that, when executed by the one or more processors, cause the system to: access predefined configuration instructions; generate a dynamic data-ingestion pipeline compatible with a Hadoop file system based on the predefined configuration instructions, wherein the dynamic data-ingestion pipeline includes a modular arrangement of operators from a set of operators that includes: an extraction operator for extracting the data of interest from the source, a converter operator for transforming the data, and a quality-checker operator for checking the transformed data; receive the data from a source in a set of sources; and process the data using the dynamic data-ingestion pipeline by: using the converter operator to transform the received data into a common format compatible with the Hadoop file system; wherein the data is processed in real-time as the data is received, without storing the data in memory for the purpose of subsequent ingestion processing, thereby reducing storage requirements of the system.
20. The system of claim 19, wherein the dynamic data-ingestion pipeline egresses the data to a sink in a set of sinks; wherein the sink is compatible with the Hadoop file system; and wherein the set of operators includes a writer operator for outputting the data to the sink.
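The dependent claims on work units, quality policies, and publishing (claims 5, 6, 8, and 9 and their apparatus counterparts) could be illustrated with a sketch like the one below. Everything here is an assumption made for illustration: `TaskResult`, `record_policy`, `run_workunit`, `publish`, and the 0.5 pass ratio are hypothetical, and the sketch simplifies "operators successfully completed" to "work-unit tasks that passed their task-level check". Threads from Python's standard `concurrent.futures` module stand in for whatever parallel execution a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass
class TaskResult:
    records: List[dict]  # records that passed the record-level policy
    seen: int            # records the task received
    ok: bool             # whether the task-level policy was satisfied


def record_policy(rec: dict) -> bool:
    """Record-level policy (assumed): each record needs a non-empty 'id'."""
    return bool(rec.get("id"))


def run_workunit(workunit: List[dict], min_pass_ratio: float = 0.5) -> TaskResult:
    """Process one work unit and apply both policy levels."""
    kept = [rec for rec in workunit if record_policy(rec)]
    # Task-level policy (assumed): enough of the work unit must survive the record check.
    ok = len(workunit) > 0 and len(kept) / len(workunit) >= min_pass_ratio
    return TaskResult(kept, len(workunit), ok)


def publish(results: List[TaskResult], require_all: bool) -> List[dict]:
    """Publisher: emit everything only if all tasks passed, or emit the passing subset."""
    if require_all and not all(r.ok for r in results):
        return []
    return [rec for r in results if r.ok for rec in r.records]


workunits = [
    [{"id": "a"}, {"id": "b"}],
    [{"id": ""}, {"id": None}, {"id": "c"}],
]

# Work units are processed in parallel; map() returns results in input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_workunit, workunits))

strict = publish(results, require_all=True)    # all-or-nothing publishing
partial = publish(results, require_all=False)  # publish only the successful subset
```

In this example the second work unit keeps one of three records and so fails the task-level check: all-or-nothing publishing emits nothing, while subset publishing emits the two records from the first work unit.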

Assignees

Inventors

Classifications

H04L67/02 · Primary
based on web technology, e.g. hypertext transfer protocol [HTTP] · CPC title
G06F17/3007
Physics · mapped topic
G06F16/11 · Primary
File system administration, e.g. details of archiving or snapshots (error detection or correction of the data by redundancy in operations G06F11/14) · CPC title

Patent family

Related publications grouped by family.

View patent family 58690084

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10122783B2 cover?: In order to ingest data from an arbitrary source in a set of sources, a computer system accesses predefined configuration instructions. Then, the computer system generates a dynamic data-ingestion pipeline that is compatible with a Hadoop file system based on the predefined configuration instructions. This dynamic data-ingestion pipeline includes a modular arrangement of operators from a set of…