Method and apparatus for traffic probing
US-2024430168-A1 · Dec 26, 2024 · US
US10122783B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10122783-B2 |
| Application number | US-201514944934-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 18, 2015 |
| Priority date | Nov 18, 2015 |
| Publication date | Nov 6, 2018 |
| Grant date | Nov 6, 2018 |
Official abstract text for this publication.
In order to ingest data from an arbitrary source in a set of sources, a computer system accesses predefined configuration instructions. Then, the computer system generates a dynamic data-ingestion pipeline that is compatible with a Hadoop file system based on the predefined configuration instructions. This dynamic data-ingestion pipeline includes a modular arrangement of operators from a set of operators that includes: an extraction operator for extracting the data of interest from the source, a converter operator for transforming the data, and a quality-checker operator for checking the transformed data. Moreover, the computer system receives the data from the source. Next, the computer system processes the data using the dynamic data-ingestion pipeline as the data is received without storing the data in memory for the purpose of subsequent ingestion processing.
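The abstract's idea of assembling a pipeline from configuration and processing records as they arrive can be illustrated with a minimal sketch. It assumes Python generators as a stand-in for streaming: each record flows through every operator before the next one is read, so no intermediate batch is held. The operator bodies, the `CONFIG` layout, and all names here (`build_pipeline`, `OPERATORS`, `demo_source`) are hypothetical and are not taken from the patent.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = dict
Operator = Callable[[Iterable[Record]], Iterator[Record]]


def extract(records: Iterable[Record]) -> Iterator[Record]:
    """Extraction operator (hypothetical): keep only the data of interest."""
    for record in records:
        if "payload" in record:
            yield record["payload"]


def convert(records: Iterable[Record]) -> Iterator[Record]:
    """Converter operator (hypothetical): normalize field names to a common format."""
    for record in records:
        yield {key.lower(): value for key, value in record.items()}


def quality_check(records: Iterable[Record]) -> Iterator[Record]:
    """Quality-checker operator (hypothetical): drop records without an id."""
    for record in records:
        if record.get("id") is not None:
            yield record


# Registry of available operators; the names are illustrative assumptions.
OPERATORS: Dict[str, Operator] = {
    "extract": extract,
    "convert": convert,
    "quality_check": quality_check,
}


def build_pipeline(config: dict) -> Callable[[Iterable[Record]], Iterator[Record]]:
    """Assemble a pipeline from predefined configuration (a list of operator names)."""
    stages = [OPERATORS[name] for name in config["operators"]]

    def pipeline(source: Iterable[Record]) -> Iterator[Record]:
        stream: Iterator[Record] = iter(source)
        for stage in stages:
            stream = stage(stream)  # lazy chaining: nothing runs until consumed
        return stream

    return pipeline


# Predefined configuration instructions (assumed format).
CONFIG = {"operators": ["extract", "convert", "quality_check"]}


def demo_source() -> Iterator[Record]:
    """Stand-in for an arbitrary source that emits records one at a time."""
    yield {"payload": {"ID": 1, "Name": "a"}}
    yield {"other": 1}                       # no payload: dropped by extract
    yield {"payload": {"Name": "no id"}}     # no id: dropped by quality_check
    yield {"payload": {"ID": 2, "Name": "b"}}


if __name__ == "__main__":
    for row in build_pipeline(CONFIG)(demo_source()):
        print(row)
```

Changing the list in `CONFIG` changes which operators run and in what order, which is one plausible reading of the "modular arrangement of operators" the abstract describes.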
Opening claim text (preview).
What is claimed is:
1. A computer-system-implemented method for ingesting data from a source in a set of sources, the method comprising: accessing predefined configuration instructions; generating a dynamic data-ingestion pipeline compatible with a Hadoop file system based on the predefined configuration instructions, wherein the dynamic data-ingestion pipeline includes a modular arrangement of operators from a set of operators that includes: an extraction operator for extracting the data of interest from the source, a converter operator for transforming the data, and a quality-checker operator for checking the transformed data; receiving the data from the source; and processing the data, by a computer system, using the dynamic data-ingestion pipeline implemented on the computer system by: using the converter operator to transform the received data into a common format compatible with the Hadoop file system; wherein the data is processed in real-time as the data is received, without storing the data in memory for the purpose of subsequent ingestion processing, thereby reducing storage requirements of the computer system.
2. The method of claim 1, wherein the set of sources is compatible with one of: a database, a message broker, a distributed key-value storage system, a Simple Storage Service (S3) file system, a first file system on a first network accessible via HyperText Transfer Protocol, a second file system on a second network accessible via File Transfer Protocol, and a local file system.
3. The method of claim 1, wherein the dynamic data-ingestion pipeline egresses the data to a sink in a set of sinks; wherein the sink is compatible with the Hadoop file system; and wherein the set of operators includes a writer operator for outputting the data to the sink.
4. The method of claim 1, wherein the set of operators includes a publisher operator for outputting the data to an output directory.
5. The method of claim 4, wherein the publisher operator outputs the data when all of the operators in the dynamic data-ingestion pipeline are successfully completed.
6. The method of claim 4, wherein the publisher operator outputs the data when a subset of the operators in the dynamic data-ingestion pipeline is successfully completed.
7. The method of claim 1, wherein the processing of the data using the dynamic data-ingestion pipeline is performed as a batch process.
8. The method of claim 1, wherein the processing of the data using the dynamic data-ingestion pipeline involves parallel processing of workunits.
9. The method of claim 1, wherein the quality-checker operator checks one of: a record-level policy, and a task-level policy.
10. An apparatus, comprising: one or more processors; memory; and a program module, wherein the program module is stored in the memory and, during operation of the apparatus, is executed by the one or more processors to ingest data from a source in a set of sources, the program module including: instructions for accessing predefined configuration instructions; instructions for generating a dynamic data-ingestion pipeline compatible with a Hadoop file system based on the predefined configuration instructions, wherein the dynamic data-ingestion pipeline includes a modular arrangement of operators from a set of operators that includes: an extraction operator for extracting the data of interest from the source, a converter operator for transforming the data, and a quality-checker operator for checking the transformed data; instructions for receiving the data from the source; and instructions for processing the data using the dynamic data-ingestion pipeline by: using the converter operator to transform the received data into a common format compatible with the Hadoop file system; wherein the data is processed in real-time as the data is received, without storing the data in memory for the purpose of subsequent ingestion processing, thereby reducing storage requirements on the memory.
11. The apparatus of claim 10, wherein the set of sources is compatible with one of: a database, a message broker, a distributed key-value storage system, a Simple Storage Service (S3) file system, a first file system on a first network accessible via HyperText Transfer Protocol, a second file system on a second network accessible via File Transfer Protocol, and a local file system.
12. The apparatus of claim 10, wherein the dynamic data-ingestion pipeline egresses the data to a sink in a set of sinks; wherein the sink is compatible with the Hadoop file system; and wherein the set of operators includes a writer operator for outputting the data to the sink.
13. The apparatus of claim 10, wherein the set of operators includes a publisher operator for outputting the data to an output directory.
14. The apparatus of claim 13, wherein the publisher operator outputs the data when all of the operators in the dynamic data-ingestion pipeline are successfully completed.
15. The apparatus of claim 13, wherein the publisher operator outputs the data when a subset of the operators in the dynamic data-ingestion pipeline is successfully completed.
16. The apparatus of claim 10, wherein the processing of the data using the dynamic data-ingestion pipeline is performed as a batch process.
17. The apparatus of claim 10, wherein the processing of the data using the dynamic data-ingestion pipeline involves parallel processing of workunits.
18. The apparatus of claim 10, wherein the quality-checker operator checks one of: a record-level policy, and a task-level policy.
19. A system, comprising: a processing module comprising one or more processors and a non-transitory computer readable medium storing instructions that, when executed by the one or more processors, cause the system to: access predefined configuration instructions; generate a dynamic data-ingestion pipeline compatible with a Hadoop file system based on the predefined configuration instructions, wherein the dynamic data-ingestion pipeline includes a modular arrangement of operators from a set of operators that includes: an extraction operator for extracting the data of interest from the source, a converter operator for transforming the data, and a quality-checker operator for checking the transformed data; receive the data from a source in a set of sources; and process the data using the dynamic data-ingestion pipeline by: using the converter operator to transform the received data into a common format compatible with the Hadoop file system; wherein the data is processed in real-time as the data is received, without storing the data in memory for the purpose of subsequent ingestion processing, thereby reducing storage requirements of the system.
20. The system of claim 19, wherein the dynamic data-ingestion pipeline egresses the data to a sink in a set of sinks; wherein the sink is compatible with the Hadoop file system; and wherein the set of operators includes a writer operator for outputting the data to the sink.
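The dependent claims on publishing (all operators versus a subset completing), parallel workunits, and record-level versus task-level quality policies can be made concrete with a small sketch. Everything below is an illustrative assumption rather than the claimed implementation: the `record_policy` rule, the `min_pass_ratio` threshold, the `require_all` flag, and the use of a thread pool are hypothetical choices made only to show how the pieces could relate.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Record = dict


def record_policy(record: Record) -> bool:
    """Record-level policy (hypothetical): a record must carry an id."""
    return record.get("id") is not None


def run_workunit(records: List[Record], min_pass_ratio: float = 0.5) -> Tuple[bool, List[Record]]:
    """Process one workunit and apply a task-level policy (hypothetical):
    the workunit succeeds only if enough of its records pass the record-level policy."""
    kept = [r for r in records if record_policy(r)]
    ok = bool(records) and len(kept) / len(records) >= min_pass_ratio
    return ok, kept


def publish(results: List[Tuple[bool, List[Record]]], require_all: bool = True) -> List[Record]:
    """Publisher (hypothetical): with require_all=True, output only if every
    workunit succeeded; otherwise output the successful subset."""
    if require_all and not all(ok for ok, _ in results):
        return []
    return [r for ok, kept in results if ok for r in kept]


def run_job(workunits: List[List[Record]], require_all: bool = True) -> List[Record]:
    """Run workunits in parallel, then hand the results to the publisher."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_workunit, workunits))  # map preserves input order
    return publish(results, require_all)


if __name__ == "__main__":
    units = [
        [{"id": 1}, {"id": 2}],            # passes the task-level policy
        [{"x": 1}, {"x": 2}, {"id": 3}],   # 1 of 3 records pass: fails it
    ]
    print(run_job(units, require_all=True))
    print(run_job(units, require_all=False))
```

With `require_all=True` a single failing workunit blocks publication entirely, while `require_all=False` publishes whatever succeeded; which behavior is appropriate depends on whether partial output is acceptable downstream.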