Managing data ingestion

US9870411B2 · US · B2

Patent metadata

| Field | Value |
|---|---|
| Publication number | US-9870411-B2 |
| Application number | US-201414557347-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 1, 2014 |
| Priority date | Jul 15, 2014 |
| Publication date | Jan 16, 2018 |
| Grant date | Jan 16, 2018 |

Abstract

Official abstract text for this publication.

The present invention extends to methods, systems, and computer program products for managing data ingestion. Aspects of the invention include a pluggable architecture channel service (e.g., a push/pull channel service) to ingest raw data. Aspects of the invention also include a pluggable architecture formatter to convert ingested raw data into a common format, such as, for example, key value pairs. Aspects of the invention also include an EAV storage with functionality allowing consumers to define multiple entities on (and spanning) ingested data sets. Accordingly, data can be ingested without data loss, without having to define extraction logic, and without having to define a storage schema.
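The pipeline the abstract describes has two pluggable stages. Channel plug-ins fetch raw data, and formatter plug-ins turn each raw format into one common key-value form. A minimal Python sketch of that pattern follows. It illustrates the idea and is not the patented implementation. The registry dictionary, the `formatter` decorator, the `PullChannel` class, the token used as a stand-in for a security context, and the choice of flat `dict[str, str]` records as the "common format" are all assumptions made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed common format: one flat attribute -> value mapping per record.
Record = Dict[str, str]

# Hypothetical registry of formatting plug-ins, keyed by raw format name.
FORMATTERS: Dict[str, Callable[[str], List[Record]]] = {}


def formatter(raw_format: str):
    """Decorator that registers a plug-in able to parse one raw format."""
    def register(fn: Callable[[str], List[Record]]) -> Callable[[str], List[Record]]:
        FORMATTERS[raw_format] = fn
        return fn
    return register


@formatter("csv")
def format_csv(raw: str) -> List[Record]:
    # The header row supplies the keys; each data row becomes one record.
    return [dict(row) for row in csv.DictReader(io.StringIO(raw))]


@formatter("xml")
def format_xml(raw: str) -> List[Record]:
    # Assumes a flat layout <rows><row><field>value</field>...</row>...</rows>;
    # XML attributes and deeper nesting are ignored in this sketch.
    root = ET.fromstring(raw)
    return [{child.tag: (child.text or "") for child in row} for row in root]


def ingest(raw: str, raw_format: str) -> List[Record]:
    """Convert raw data with whichever plug-in understands its format."""
    plugin = FORMATTERS.get(raw_format)
    if plugin is None:
        raise ValueError(f"no formatting plug-in for {raw_format!r}")
    return plugin(raw)


@dataclass
class PullChannel:
    """Hypothetical pull channel: an access mechanism plus a security context."""
    source: str
    raw_format: str
    fetch: Callable[[str], str]  # access mechanism; receives the security token
    token: str                   # stand-in for a per-source security context

    def pull(self) -> List[Record]:
        raw = self.fetch(self.token)  # raw data arrives in the source's own format
        return ingest(raw, self.raw_format)
```

For example, a `PullChannel` with `raw_format="xml"` whose `fetch` returns `<rows><row><id>1</id><name>Ada</name></row></rows>` yields `[{'id': '1', 'name': 'Ada'}]` from `pull()`. Adding a new source format only requires registering one more function. Neither `ingest` nor the channels change, which is the point of a pluggable formatter.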

First claim

Opening claim text (preview).

What is claimed:

1. At a computer system, a method for supplementing a data consumer defined data entity with additional data from a new data source, the data consumer defined data entity spanning one or more ingested data sets from one or more data sources, the method comprising: receiving a new data set from the new data source, the new data set in a raw data format used by the new data source, the new data source in addition to the one or more data sources; responsive to receiving the new data set from the new data source: ingesting the new data set in the raw data format, ingesting the new data set including utilizing a combined access mechanism and security context matched to the new data source; converting the new data set into a common format using a formatting plug-in configured to understand the raw data format; and storing the new data set into storage, the storage including the one or more ingested data sets, the one or more ingested data sets having been previously formatted into the common format from one or more other raw data formats used by the one or more data sources, the one or more data sets previously ingested from the one or more data sources using combined access mechanisms and security contexts matched to each of the one or more data sources; and applying a schema to stored data in the common format, the stored data from the new ingested data set and the one or more ingested data sets, the stored data associated with data consumer selected attributes included in the data consumer defined data entity.

2. The method of claim 1, further comprising: receiving a further data set from a further data source, the further data set in a further raw data format used by the further data source, the further raw data format differing from the raw data format; ingesting the further data set in the further raw data format, ingesting the further data set including using a further combined access mechanism and security context matched to the further data source, the further combined access mechanism and security context differing from the combined access mechanism and security context; converting the further data set into the common format using a further formatting plug-in configured to understand the further raw data format; and supplementing the stored data by storing the further data set into the storage along with the new data set.

3. The method of claim 2, wherein the raw data format is eXtensible Markup Language (XML) and the further raw data format is Character Separated Value (CSV).

4. The method of claim 1, wherein storing the new data set into storage comprises storing the new data set into an entity-attribute-value data set. plurality of other entity attribute value data sets.

5. The method of claim 4, further comprising enriching the entity-attribute-value data set with additional data from a pluggable enrichment service.

6. The method of claim 1, further comprising: receiving the consumer selection of attributes indicating that the consumer defined data entity is to span a plurality of data sets.

7. The method of claim 6, further comprising formulating the schema, the schema defining a data layout for returning data associated with the data consumer defined entity.

8. The method of claim 7, further comprising: receiving an application request for data associated with the data consumer defined data entity; and returning the requested data from the storage to the application in the defined data layout in accordance with the schema.

9. A computer program product for use at a computer system, the computer program product for implementing a method for supplementing a data consumer defined data entity with additional data from a new data source, the data consumer defined data entity spanning one or more ingested data sets from one or more data sources, the computer program product comprising one or more computer storage media having stored thereon computer-executable instructions that, when executed at a processor, cause the computer system to perform the method, including the following: receive a new data set from the new data source, the new data set in a raw data format used by the new data source, the new data source in addition to the one or more data sources; responsive to receiving the new data set from the new data source: ingest the new data set in the raw data format, ingesting the new data set including utilizing a combined access mechanism and security context matched to the new data source; convert the new data set into a common format using a formatting plug-in configured to understand the raw data format; and store the new data set into storage, the storage including the one or more ingested data sets, the one or more ingested data sets having been previously formatted into the common format from one or more other raw data formats used by the one or more data sources, the one or more data sets previously ingested from the one or more data sources using combined access mechanisms and security contexts matched to each of the one or more data sources; and apply a schema to stored data in the common format, the stored data from the new ingested data set and the one or more ingested data sets, the stored data associated with data consumer selected attributes included in the data consumer defined data entity.

10. The computer program product of claim 9, further comprising computer-executable instructions that, when executed, cause the computer system to: receive a further data set from a further data source, the further data set in a further raw data format used by the further data source, the further raw data format differing from the raw data format; ingest the further data set in the further raw data format, ingesting the further data set including using a further combined access mechanism and security context matched to the further data source, the further combined access mechanism and security context differing from the combined access mechanism and security context; convert the further data set into the common format using a further formatting plug-in configured to understand the further raw data format; and supplement the stored data by storing the further data set into the storage along with the new data set.

11. The computer program product of claim 9, wherein computer-executable instructions that, when executed, cause the computer system to store the data set into storage comprise computer-executable instructions that, when executed, cause the computer system to store the data set into an entity-attribute-value data set.

12. The computer program product of claim 11, further comprising computer-executable instructions that, when executed, cause the computer system to enrich the entity-attribute-value data set with additional data from a pluggable enrichment service.

13. The computer program product of claim 11, further comprising computer-executable instructions that, when executed, cause the computer system to: receive the consumer selection of attributes indicating that the consumer defined data entity is to span a plurality of data sets.

14. The computer program product of claim 13, further comprising computer-executable instructions that, when executed, cause the computer system to formulate the schema, the schema defining a data layout for returning data associated with the data consumer defined entity.

15. The computer program product of claim 14, further comprising computer-executable instructions that, when executed, cause the computer system to: receive an application request for data associated with the data consumer defined data entity; and return …
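Claims 1 and 4 to 8 describe the storage side. Formatted data is stored as entity-attribute-value (EAV) triples, optionally enriched by a pluggable service. A data consumer picks attributes for an entity that spans several ingested data sets, and a schema fixes the layout of the returned data. A minimal Python sketch of that flow follows, under stated assumptions. The `(entity id, attribute, value)` triple layout, the `EavStore` and `ConsumerEntity` classes, the `"<data set>:<row>"` entity ids, and joining data sets on a shared key attribute are all hypothetical illustrations. None of them is the claimed system's actual design.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# One EAV row: (entity id, attribute, value).
Triple = Tuple[str, str, str]


class EavStore:
    """Minimal entity-attribute-value store; no schema is declared up front."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def store(self, data_set: str, records: List[Dict[str, str]]) -> None:
        # Every key/value pair of every record becomes one triple. Entity ids
        # are hypothetical "<data set>:<row index>" strings.
        for i, record in enumerate(records):
            for attribute, value in record.items():
                self.triples.append((f"{data_set}:{i}", attribute, value))

    def entities(self) -> Dict[str, Dict[str, str]]:
        # Regroup the triples into one attribute map per stored entity.
        grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
        for entity_id, attribute, value in self.triples:
            grouped[entity_id][attribute] = value
        return dict(grouped)

    def enrich(self, service: Callable[[str, Dict[str, str]], Dict[str, str]]) -> None:
        # Hypothetical pluggable enrichment: the service returns extra
        # attributes for an entity, which are appended as new triples.
        for entity_id, attrs in self.entities().items():
            for attribute, value in service(entity_id, attrs).items():
                self.triples.append((entity_id, attribute, value))


@dataclass
class ConsumerEntity:
    """Hypothetical consumer-defined entity spanning several data sets."""
    key: str               # attribute shared across data sets, used to join them
    attributes: List[str]  # consumer-selected attributes; their order is the layout

    def query(self, store: EavStore) -> List[List[str]]:
        merged: Dict[str, Dict[str, str]] = defaultdict(dict)
        for attrs in store.entities().values():
            if self.key in attrs:
                # Entities from different data sets with the same key merge;
                # on conflicting values the later one wins.
                merged[attrs[self.key]].update(attrs)
        # Apply the schema: fixed column order, "" where a source lacks a value.
        return [[row.get(a, "") for a in self.attributes]
                for _, row in sorted(merged.items())]
```

Suppose `[{"id": "1", "name": "Ada"}]` is stored as data set `"crm"` and `[{"id": "1", "balance": "42"}, {"id": "2", "balance": "7"}]` as `"billing"`. Querying `ConsumerEntity("id", ["id", "name", "balance"])` then returns `[["1", "Ada", "42"], ["2", "", "7"]]`. The shared-key join is only one plausible way to let an entity span data sets. The claim text shown here does not say how the spanning is done.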

Assignees

Inventors

Classifications

  • Physics · mapped topic

  • Data format conversion from or to a database · CPC title

  • G06F16/254 · Primary

    Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

Frequently asked questions

What does patent US9870411B2 cover?
It covers managing data ingestion. A pluggable channel service (e.g., a push/pull channel service) ingests raw data, and a pluggable formatter converts that raw data into a common format such as key-value pairs. An EAV (entity-attribute-value) storage then lets consumers define entities on, and spanning, the ingested data sets. As a result, data can be ingested without data loss, without defining extraction logic, and without defining a storage schema.
Who is the assignee on this patent?
Microsoft Corp, Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/254 (listed under its earlier code G06F17/30563). Mapped technology areas include Physics.
When was this patent published?
Published January 16, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).