Data processing method, data processing apparatus, and non-transitory computer-readable storage medium
US-2024320235-A1 · Sep 26, 2024 · US
US9594816B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9594816-B2 |
| Application number | US-201314065248-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 28, 2013 |
| Priority date | Nov 1, 2012 |
| Publication date | Mar 14, 2017 |
| Grant date | Mar 14, 2017 |
Official abstract text for this publication.
The present disclosure in general relates to technologies for processing data in a distributed data storage system, and more particularly, to a method, a system, and a computer program product for analytical processing of data by using the processing power of the distributed data storage system. In one embodiment, a system for analytical processing of data in a distributed data storage system is disclosed. The system comprises: a data extraction module configured to perform analytical operations to extract data from source databases in one or more data formats; and a processing module configured to perform data refinement operations to categorize the data while the data is being extracted. The processing module comprises: a mapping module configured to perform mapping operations of the categorized data; and a transformation module configured to perform an analytical transforming operation of the mapped categorized data to obtain a transformed categorized data.
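The flow described in the abstract (extract from sources in more than one format, categorize while extracting, then map and transform) can be illustrated with a minimal Python sketch. This is not the patent's implementation: the function names, the `amount` and `id` fields, the `customer_id` target name, and the threshold of 100 are all hypothetical choices made for illustration. Generators stand in for "while the data is being extracted", since each record is categorized as soon as it is read rather than after the whole extract completes.

```python
import csv
import io
import json
from typing import Any, Dict, Iterable, Iterator

Record = Dict[str, Any]


def extract(source_text: str, fmt: str) -> Iterator[Record]:
    """Yield records one at a time from a JSON or CSV payload (hypothetical sources)."""
    if fmt == "json":
        for row in json.loads(source_text):
            # Normalize values to strings so both formats look alike downstream.
            yield {k: str(v) for k, v in row.items()}
    elif fmt == "csv":
        for row in csv.DictReader(io.StringIO(source_text)):
            yield dict(row)
    else:
        raise ValueError(f"unsupported format: {fmt}")


def categorize(records: Iterable[Record]) -> Iterator[Record]:
    """Refinement step: tag each record as it streams out of extraction."""
    for rec in records:
        label = "large" if float(rec["amount"]) >= 100 else "small"
        yield {**rec, "category": label}


def map_fields(records: Iterable[Record], mapping: Dict[str, str]) -> Iterator[Record]:
    """Mapping step: rename source fields to target fields."""
    for rec in records:
        yield {mapping.get(k, k): v for k, v in rec.items()}


def transform(records: Iterable[Record]) -> Iterator[Record]:
    """Transformation step: derive an integer amount in cents (assumed rule)."""
    for rec in records:
        yield {**rec, "amount_cents": int(round(float(rec["amount"]) * 100))}


def run_pipeline(source_text: str, fmt: str) -> list:
    """Chain the stages lazily and collect the transformed, categorized records."""
    mapping = {"id": "customer_id"}  # assumed mapping rule
    staged = transform(map_fields(categorize(extract(source_text, fmt)), mapping))
    return list(staged)
```

Calling `run_pipeline("id,amount\n1,250.0\n2,9.5\n", "csv")` returns two records, the first tagged `large` and the second `small`. In a real distributed store the final `list(...)` would be replaced by a write to the target area.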
Opening claim text (preview).
We claim:
1. A system for analytical processing of data in a distributed data storage system, the system comprising a processor and a memory storing instructions, the instructions comprising: a data extraction module configured to perform one or more elementary analytical operations while extracting data from one or more source databases in one or more data formats, the one or more source databases having one or more types of constraints and structures wherein the one or more elementary analytical operations facilitates condition checking; and a validation module configured to perform a repetitive data sorting operation in one or more stages wherein the repetitive data sorting operation identifies and categorizes the extracted data as a valid data and an invalid data, and stores the valid data and the invalid data in one or more corresponding databases; a processing engine configured to perform one or more data refinement operations while the data is being extracted wherein the processing engine performs one or more data refinement operations in parallel to the one or more elementary analytical operations performed by the data extraction module, the processing engine comprising: a mapping module configured to perform one or more types of mapping operations of the categorized data based on one or more mapping rules wherein the one or more types of mapping operations include data mapping from one table to another table, from two or more data paths and splitting of data into multiple output paths in a single step; and a transformation module configured to perform a secondary analytical transforming operation, based on one or more business rules, of the mapped categorized data to obtain a transformed categorized data, wherein the transformed categorized data is stored in a target area in the distributed data storage system.
2. The system as claimed in claim 1, wherein the transformed categorized data that is stored in the target area in the distributed data storage system enables at least one of a market based analysis and a predictive data analysis.
3. The system as claimed in claim 1, wherein the one or more elementary analytical operations comprise one or more algorithm-based analyses.
4. The system as claimed in claim 1, wherein the one or more source databases comprises at least one of an oracle database and a DB2.
5. The system as claimed in claim 1, wherein the system further comprises a parsing module configured to parse the extracted data into a custom format.
6. The system as claimed in claim 1, wherein the data includes at least one or a structured data, a semi-structured data, and an unstructured data.
7. The system as claimed in claim 6, wherein the data includes at least one of a JSON format, an XML format, and a CSV format.
8. The system as claimed in claim 1, wherein the condition checking involves keeping a check whether to import entire dataset or only some records after transformation.
9. A method for analytical processing of data in a distributed data storage system, the method being performed by a processor using programmed instructions stored in a memory, the method comprising: performing one or more elementary analytical operations while extracting data from one or more source databases in one or more data formats, the one or more source databases having one or more types of constraints and structures wherein the one or more elementary analytical operations facilitates condition checking; performing a repetitive data sorting operation in one or more stages wherein the repetitive data sorting operation identifies and categorizes the extracted data as a valid data and an invalid data, and stores the valid data and the invalid data in one or more corresponding databases; performing one or more data refinement operations while the data is being extracted wherein the one or more data refinement operations are performed in parallel to the one or more elementary analytical operations; performing one or more types of mapping operations of the categorized data based on one or more mapping rules wherein the one or more types of mapping operations include data mapping from one table to another table, from two or more data paths and splitting of data into multiple output paths in a single step; and performing a secondary analytical transforming operation, based on one or more business rules, of the mapped categorized data to obtain a transformed categorized data, wherein the transformed categorized data is stored in a target area in the distributed data storage system.
10. The method as claimed in claim 9, wherein the transformed categorized data that is stored in the target area in the distributed data storage system enables at least one of a market based analysis and a predictive data analysis.
11. The method as claimed in claim 9, wherein the one or more elementary analytical operations comprise one or more algorithm-based analyses.
12. The method as claimed in claim 9, wherein the one or more source databases comprise at least one of an oracle database and a DB2.
13. The method as claimed in claim 9, wherein the method further comprises parsing the extracted data into a custom format.
14. The method as claimed in claim 9, wherein the method further comprises performing a repetitive data sorting operation in one or more stages, wherein the repetitive data sorting operation identifies and categorizes the extracted data as a valid data and an invalid data, and stores the valid data and the invalid data in one or more corresponding databases.
15. The method as claimed in claim 9, wherein the data includes at least one of a structured data, an unstructured data, and a semi-structured data.
16. The method as claimed in claim 15, wherein the data include at least one of a JSON format, an XML format, and a CSV format.
17. The method as claimed in claim 9, wherein the condition checking involves keeping a check whether to import entire dataset or only some records after transformation.
18. A non-transitory computer program product having embodied thereon computer program instructions for analytical processing of data in a distributed data storage system, the instructions comprising instructions for: authenticating and receiving one or more queries from one or more users; performing one or more elementary analytical operations while extracting data from one or more source databases in one or more data formats with respect to the one or more queries, the one or more source databases having one or more types of constraints and structures wherein the one or more elementary analytical operations facilitates condition checking; performing a repetitive data sorting operation in one or more stages wherein the repetitive data sorting operation identifies and categorizes the extracted data as a valid data and an invalid data, and stores the valid data and the invalid data in one or more corresponding databases; performing one or more data refinement operations while the data is being extracted wherein the one or more data refinement operations are performed in parallel to the one or more elementary analytical operations; performing one or more types of mapping operations of the categorized data based on one or more mapping rules wherein the one or more types of mapping operations include data mapping from one table to another table, from two or more data paths and splitting of data into multiple output paths in a single step; and performing a secondary analytical transforming operation, based on one or more business rules, of the mapped c
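The validation and mapping steps recited in the independent claims can be sketched in Python as follows. This is one possible reading under stated assumptions, not the claimed implementation: "repetitive data sorting in one or more stages" is modeled as a list of predicate functions applied in order, "from two or more data paths" as a key-based merge, and "splitting of data into multiple output paths in a single step" as a single pass that routes each record to every matching output. All function names, field names, and rules are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


def sort_valid_invalid(
    records: Iterable[Record], stages: List[Predicate]
) -> Tuple[List[Record], List[Record]]:
    """Apply each validation stage in order; a record failing any stage is invalid."""
    valid: List[Record] = []
    invalid: List[Record] = []
    for rec in records:
        # all() short-circuits, so later stages never see a record that already failed.
        if all(stage(rec) for stage in stages):
            valid.append(rec)
        else:
            invalid.append(rec)
    return valid, invalid


def merge_paths(left: List[Record], right: List[Record], key: str) -> List[Record]:
    """Combine two input paths on a shared key (assumed to be unique in `right`)."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left if l[key] in right_by_key]


def split_paths(
    records: Iterable[Record], routes: Dict[str, Predicate]
) -> Dict[str, List[Record]]:
    """Route each record to every matching output path in one pass over the data."""
    outputs: Dict[str, List[Record]] = {name: [] for name in routes}
    for rec in records:
        for name, matches in routes.items():
            if matches(rec):
                outputs[name].append(rec)
    return outputs
```

For example, with stages `[lambda r: "amount" in r, lambda r: r["amount"] >= 0]`, a record lacking `amount` lands in the invalid list without the second stage being evaluated. The two returned lists correspond to the separate stores for valid and invalid data that the claims describe.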
Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title
Physics · mapped topic