System and method to provide analytical processing of data in a distributed data storage systems

US9594816B2 · US · B2

Patent metadata
Publication number: US-9594816-B2
Application number: US-201314065248-A
Country: US
Kind code: B2
Filing date: Oct 28, 2013
Priority date: Nov 1, 2012
Publication date: Mar 14, 2017
Grant date: Mar 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure in general relates to technologies for processing data in a distributed data storage system, and more particularly, to a method, a system, and a computer program product for analytical processing of data by using the processing power of the distributed data storage system. In one embodiment, a system for analytical processing of data in a distributed data storage system is disclosed. The system comprises: a data extraction module configured to perform analytical operations to extract data from source databases in one or more data formats; and a processing module configured to perform data refinement operations to categorize the data while the data is being extracted. The processing module comprises: a mapping module configured to perform mapping operations of the categorized data; and a transformation module configured to perform an analytical transforming operation of the mapped categorized data to obtain a transformed categorized data.
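The abstract describes categorizing data while it is still being extracted from sources in more than one format. The patent text shown here publishes no code, so the Python sketch below is only one illustrative way to get that overlap in a single process. It assumes a producer thread that extracts CSV and JSON records into a common dictionary layout and a consumer thread that sorts each record into valid or invalid as soon as it arrives. The function names, the common record layout, and the validity rule are all hypothetical; in an actual distributed data storage system this work would be spread across nodes rather than threads.

```python
import csv
import io
import json
import queue
import threading

# Sentinel object marking the end of the extraction stream.
_DONE = object()

def extract_csv(text):
    """Hypothetical extractor: yield each CSV row as a dict."""
    yield from csv.DictReader(io.StringIO(text))

def extract_json(text):
    """Hypothetical extractor: yield each object of a JSON array."""
    yield from json.loads(text)

def to_common_format(record, source):
    """Normalize a record into a hypothetical common ("custom") layout."""
    return {
        "id": str(record.get("id", "")).strip(),
        "amount": record.get("amount"),
        "source": source,
    }

def is_valid(record):
    """Hypothetical categorization rule: non-empty id and numeric amount."""
    if not record["id"]:
        return False
    try:
        float(record["amount"])
    except (TypeError, ValueError):
        return False
    return True

def run_pipeline(sources):
    """Extract in one thread while a second thread categorizes records."""
    q = queue.Queue(maxsize=100)  # bounded, so extraction cannot run far ahead
    valid, invalid = [], []

    def extractor():
        for name, extract, text in sources:
            for raw in extract(text):
                q.put(to_common_format(raw, name))
        q.put(_DONE)

    def refiner():
        while True:
            rec = q.get()
            if rec is _DONE:
                break
            (valid if is_valid(rec) else invalid).append(rec)

    threads = [threading.Thread(target=extractor), threading.Thread(target=refiner)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return valid, invalid

# Usage with two small in-memory sources in different formats.
csv_src = "id,amount\n1,10.5\n2,abc\n"
json_src = '[{"id": 3, "amount": 7}, {"id": "", "amount": 1}]'
valid, invalid = run_pipeline([
    ("crm_csv", extract_csv, csv_src),
    ("orders_json", extract_json, json_src),
])
print([r["id"] for r in valid])    # ['1', '3']
print([r["id"] for r in invalid])  # ['2', '']
```

The bounded queue lets extraction and categorization proceed concurrently without holding the whole dataset in memory. An XML source could be added as another extractor (for example with `xml.etree.ElementTree`) without changing the consumer.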

Claims

Claim text (preview; truncated partway through claim 18).

We claim:

1. A system for analytical processing of data in a distributed data storage system, the system comprising a processor and a memory storing instructions, the instructions comprising: a data extraction module configured to perform one or more elementary analytical operations while extracting data from one or more source databases in one or more data formats, the one or more source databases having one or more types of constraints and structures wherein the one or more elementary analytical operations facilitates condition checking; and a validation module configured to perform a repetitive data sorting operation in one or more stages wherein the repetitive data sorting operation identifies and categorizes the extracted data as a valid data and an invalid data, and stores the valid data and the invalid data in one or more corresponding databases; a processing engine configured to perform one or more data refinement operations while the data is being extracted wherein the processing engine performs one or more data refinement operations in parallel to the one or more elementary analytical operations performed by the data extraction module, the processing engine comprising: a mapping module configured to perform one or more types of mapping operations of the categorized data based on one or more mapping rules wherein the one or more types of mapping operations include data mapping from one table to another table, from two or more data paths and splitting of data into multiple output paths in a single step; and a transformation module configured to perform a secondary analytical transforming operation, based on one or more business rules, of the mapped categorized data to obtain a transformed categorized data, wherein the transformed categorized data is stored in a target area in the distributed data storage system.

2. The system as claimed in claim 1, wherein the transformed categorized data that is stored in the target area in the distributed data storage system enables at least one of a market based analysis and a predictive data analysis.

3. The system as claimed in claim 1, wherein the one or more elementary analytical operations comprise one or more algorithm-based analyses.

4. The system as claimed in claim 1, wherein the one or more source databases comprises at least one of an oracle database and a DB2.

5. The system as claimed in claim 1, wherein the system further comprises a parsing module configured to parse the extracted data into a custom format.

6. The system as claimed in claim 1, wherein the data includes at least one or a structured data, a semi-structured data, and an unstructured data.

7. The system as claimed in claim 6, wherein the data includes at least one of a JSON format, an XML format, and a CSV format.

8. The system as claimed in claim 1, wherein the condition checking involves keeping a check whether to import entire dataset or only some records after transformation.

9. A method for analytical processing of data in a distributed data storage system, the method being performed by a processor using programmed instructions stored in a memory, the method comprising: performing one or more elementary analytical operations while extracting data from one or more source databases in one or more data formats, the one or more source databases having one or more types of constraints and structures wherein the one or more elementary analytical operations facilitates condition checking; performing a repetitive data sorting operation in one or more stages wherein the repetitive data sorting operation identifies and categorizes the extracted data as a valid data and an invalid data, and stores the valid data and the invalid data in one or more corresponding databases; performing one or more data refinement operations while the data is being extracted wherein the one or more data refinement operations are performed in parallel to the one or more elementary analytical operations; performing one or more types of mapping operations of the categorized data based on one or more mapping rules wherein the one or more types of mapping operations include data mapping from one table to another table, from two or more data paths and splitting of data into multiple output paths in a single step; and performing a secondary analytical transforming operation, based on one or more business rules, of the mapped categorized data to obtain a transformed categorized data, wherein the transformed categorized data is stored in a target area in the distributed data storage system.

10. The method as claimed in claim 9, wherein the transformed categorized data that is stored in the target area in the distributed data storage system enables at least one of a market based analysis and a predictive data analysis.

11. The method as claimed in claim 9, wherein the one or more elementary analytical operations comprise one or more algorithm-based analyses.

12. The method as claimed in claim 9, wherein the one or more source databases comprise at least one of an oracle database and a DB2.

13. The method as claimed in claim 9, wherein the method further comprises parsing the extracted data into a custom format.

14. The method as claimed in claim 9, wherein the method further comprises performing a repetitive data sorting operation in one or more stages, wherein the repetitive data sorting operation identifies and categorizes the extracted data as a valid data and an invalid data, and stores the valid data and the invalid data in one or more corresponding databases.

15. The method as claimed in claim 9, wherein the data includes at least one of a structured data, an unstructured data, and a semi-structured data.

16. The method as claimed in claim 15, wherein the data include at least one of a JSON format, an XML format, and a CSV format.

17. The method as claimed in claim 9, wherein the condition checking involves keeping a check whether to import entire dataset or only some records after transformation.

18. A non-transitory computer program product having embodied thereon computer program instructions for analytical processing of data in a distributed data storage system, the instructions comprising instructions for: authenticating and receiving one or more queries from one or more users; performing one or more elementary analytical operations while extracting data from one or more source databases in one or more data formats with respect to the one or more queries, the one or more source databases having one or more types of constraints and structures wherein the one or more elementary analytical operations facilitates condition checking; performing a repetitive data sorting operation in one or more stages wherein the repetitive data sorting operation identifies and categorizes the extracted data as a valid data and an invalid data, and stores the valid data and the invalid data in one or more corresponding databases; performing one or more data refinement operations while the data is being extracted wherein the one or more data refinement operations are performed in parallel to the one or more elementary analytical operations; performing one or more types of mapping operations of the categorized data based on one or more mapping rules wherein the one or more types of mapping operations include data mapping from one table to another table, from two or more data paths and splitting of data into multiple output paths in a single step; and performing a secondary analytical transforming operation, based on one or more business rules, of the mapped c…
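The independent claims break the pipeline into staged valid/invalid sorting, rule-based mapping (table to table, combining two or more data paths, and splitting into several output paths in a single step), and business-rule transformation into a target area; claims 8 and 17 add a condition check on whether to import the entire dataset or only some records after transformation. The Python sketch below strings those steps together on in-memory lists as an illustration under stated assumptions, not the claimed implementation. The mapping rules, validation checks, routing key, business rule, and `since_id` cutoff are all hypothetical, and plain lists and dicts stand in for the "corresponding databases" and the target area.

```python
from collections import defaultdict

# Hypothetical mapping rules: source column name -> target column name.
MAPPING_RULES = {"cust_id": "customer_id", "amt": "amount", "ctry": "country"}

def validate(records, checks):
    """Staged sorting: each stage moves records failing its check to 'invalid'."""
    valid, invalid = list(records), []
    for check in checks:  # one stage per check
        passed = []
        for r in valid:
            (passed if check(r) else invalid).append(r)
        valid = passed
    return valid, invalid

def map_table(rows, rules):
    """Table-to-table mapping: rename columns according to the rules."""
    return [{rules.get(k, k): v for k, v in row.items()} for row in rows]

def merge_paths(left, right, key):
    """Combine two data paths with an inner join on a shared key."""
    index = {r[key]: r for r in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

def split_paths(rows, route):
    """Split rows into multiple output paths in a single pass."""
    outputs = defaultdict(list)
    for r in rows:
        outputs[route(r)].append(r)
    return dict(outputs)

def transform(rows, business_rules):
    """Apply each business rule (a function row -> row) in order."""
    for rule in business_rules:
        rows = [rule(r) for r in rows]
    return rows

def select_for_import(rows, full_import, since_id=None):
    """Condition check: import every record, or only records newer than since_id."""
    if full_import:
        return rows
    return [r for r in rows if r["customer_id"] > since_id]

# Usage on two hypothetical data paths.
raw = [
    {"cust_id": 1, "amt": "100", "ctry": "IN"},
    {"cust_id": 2, "amt": "-5", "ctry": "US"},
    {"cust_id": 3, "amt": "40", "ctry": "US"},
]
profiles = [{"customer_id": 1, "tier": "gold"}, {"customer_id": 3, "tier": "silver"}]

valid, invalid = validate(raw, [
    lambda r: r.get("cust_id") is not None,  # stage 1: key present
    lambda r: float(r["amt"]) >= 0,          # stage 2: non-negative amount
])
mapped = merge_paths(map_table(valid, MAPPING_RULES), profiles, "customer_id")
transformed = transform(mapped, [lambda r: {**r, "amount": float(r["amount"])}])
to_load = select_for_import(transformed, full_import=True)
target_area = split_paths(to_load, route=lambda r: r["country"])
print(sorted(target_area))  # ['IN', 'US']
print(len(invalid))         # 1
```

Because `split_paths` routes every row in one loop, each record lands in exactly one output path. Calling `select_for_import(transformed, False, since_id=1)` instead would keep only customer 3, which illustrates the "only some records" branch of the condition check.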

Assignees

Tata Consultancy Services Ltd

Inventors

Classifications

  • G06F16/254 (primary CPC)

    Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

  • Physics (mapped topic)


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9594816B2 cover?
The present disclosure in general relates to technologies for processing data in a distributed data storage system, and more particularly, to a method, a system, and a computer program product for analytical processing of data by using the processing power of the distributed data storage system. In one embodiment, a system for analytical processing of data in a distributed data storage system i…
Who is the assignee on this patent?
Tata Consultancy Services Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/254. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 14, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).