Advances in data ingestion and data provisioning to aid management of domain-specific data via software data platform
US-2021398235-A1 · Dec 23, 2021 · US
US11768849B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11768849-B2 |
| Application number | US-202117351969-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 18, 2021 |
| Priority date | Mar 15, 2021 |
| Publication date | Sep 26, 2023 |
| Grant date | Sep 26, 2023 |
Official abstract text for this publication.
A computing system that includes one or more server computing devices including one or more processors configured to execute instructions for a domain extensibility module that provides software development tools for building domain extensions for a database platform, and a data ingestion module that provides software development tools for defining a metadata schema for extracting metadata from data files. The one or more processors are configured to receive a set of data from a user computing device, define a target metadata schema that includes one or more metadata fields that will be populated during a data ingestion process, define a target domain extension that defines one or more data types for storing the received set of data after performing the data ingestion process, and ingest the received set of data using a metadata extraction pipeline to generate metadata files based on the target metadata schema.
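The flow in the abstract (define a target metadata schema, define a target domain extension, ingest through a metadata extraction pipeline, store, and expose an endpoint) can be illustrated with a minimal sketch. Every name here (`MetadataSchema`, `DomainExtension`, `Store`, `ingest`, the endpoint path layout, and the sample fields) is a hypothetical stand-in chosen for illustration; the patent does not disclose this code or any specific API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# All names below are hypothetical illustrations, not taken from the patent.


@dataclass
class MetadataSchema:
    """Target metadata schema: field name -> extractor run during ingestion."""
    fields: Dict[str, Callable[[bytes], Any]]


@dataclass
class DomainExtension:
    """Target domain extension: the data types the platform agrees to store."""
    name: str
    data_types: List[str]


@dataclass
class Store:
    """In-memory stand-in for the database platform."""
    records: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def ingest(name: str, payload: bytes, data_type: str,
           schema: MetadataSchema, extension: DomainExtension,
           store: Store) -> str:
    """Extract metadata per the schema, store data plus metadata, return an endpoint path."""
    if data_type not in extension.data_types:
        raise ValueError(f"{data_type!r} is not defined by extension {extension.name!r}")
    # The "pipeline" here is simply every schema extractor applied to the payload.
    metadata = {key: extract(payload) for key, extract in schema.fields.items()}
    store.records[name] = {"data": payload, "type": data_type, "metadata": metadata}
    # Assumed path layout for the network-accessible endpoint.
    return f"/{extension.name}/{data_type}/{name}"


schema = MetadataSchema(fields={
    "size_bytes": len,
    "line_count": lambda raw: len(raw.splitlines()),
})
extension = DomainExtension(name="wells", data_types=["sensor_log"])
store = Store()
endpoint = ingest("log1.csv", b"t,psi\n0,101\n1,99\n", "sensor_log",
                  schema, extension, store)
```

Keeping the schema as a plain mapping of field names to extractor functions means a new metadata field is one more entry, which loosely mirrors the abstract's separation between defining the schema and running the pipeline.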
Opening claim text (preview).
The invention claimed is:

1. A computing system comprising: one or more server computing devices including one or more processors configured to execute instructions for: a domain extensibility module that provides software development tools for building domain extensions for a database platform of the computing system, wherein the domain extensions define a data type for data to be stored on the database platform, and storage and infrastructure components for the database platform for storing that defined data type; and a data ingestion module that provides software development tools for defining a metadata schema for extracting metadata from data files stored on the database platform, and generating a metadata extraction pipeline to extract metadata based on the defined metadata schema; wherein the one or more processors are configured to: receive a set of data from a domain-specific data platform, the domain-specific data platform being configured to aggregate data detected by one or more sensors operating in a domain associated with the domain-specific data platform; define a target metadata schema that includes one or more metadata fields that will be populated during a data ingestion process; define a target domain extension that defines one or more data types for storing the received set of data after performing the data ingestion process; ingest the received set of data using a metadata extraction pipeline to generate metadata files based on the target metadata schema; store the ingested set of data and the generated metadata files based on the target domain extension; and provide a network accessible endpoint for accessing the ingested set of data and the metadata file.

2. The computing system of claim 1, wherein to define the target metadata schema, the one or more processors are configured to: classify the received set of data to determine a file format for the received set of data; and define the target metadata schema based on the determined file format for the received set of data.

3. The computing system of claim 2, wherein to define the target metadata schema, the one or more processors are further configured to: identify a plurality of types of metadata that can be extracted from the received set of data; present a list of the plurality of types of metadata to a user; receive user input of one or more user selected types of metadata; and define the target metadata schema based on the one or more user selected types of metadata.

4. The computing system of claim 1, wherein to define the target metadata schema, the one or more processors are configured to receive a new target metadata schema from a user.

5. The computing system of claim 1, wherein the one or more processors are configured to execute instructions for a client application module that provides software development tools for integrating other application programs executed on client computing devices with the computing system.

6. The computing system of claim 5, wherein the one or more processors are configured to: receive requests from an integrated application program to retrieve target data stored on the database platform; retrieve the target data from the database platform; and provide the integrated application program with a network accessible endpoint to retrieve the target data.

7. The computing system of claim 6, wherein to retrieve the target data from the database platform, the one or more processors are configured to: receive a search parameter for the target data with the received request from the integrated application program; and search the ingested set of data and the stored metadata files based on the received search parameter to identify the target data.

8. The computing system of claim 6, wherein the requests received from the integrated application program further include a target file system for receiving the target data, and wherein the one or more processors are further configured to: retrieve the target data from the database platform; mount the target data to the target file system; and provide the integrated application program with the network accessible endpoint to retrieve the target data mounted to the target file system.

9. The computing system of claim 8, wherein to mount the target data to the target file system, the one or more processors are further configured to: emulate a file architecture of the target file system at the network accessible endpoint, the emulated file architecture including a target file path; and provide the target data to the integrated application program using the emulated file architecture.

10. The computing system of claim 1, wherein the one or more processors are configured to execute instructions for a machine learning model module that provides software development tools for integrating one or more third party machine learning models executed by other computing devices with the computing system.

11. The computing system of claim 10, wherein the received set of data is one of a plurality of sets of data, each set of data having a legacy file format, wherein each set of data of the plurality of sets of data are received from different respective domain-specific data platforms, each domain-specific data platform being configured to aggregate data detected by sensors operating in a domain associated with that domain-specific data platform, and wherein the one or more processors are further configured to: ingest the plurality of sets of data using the metadata extraction pipeline; store the ingested plurality of sets of data in a new file format that is different than the legacy file format and requires different storage and infrastructure components for the database platform for storing the new file format, the ingested plurality of sets of data being indexed for search; provide a network accessible endpoint for accessing the ingested plurality of sets of data; and provide the ingested plurality of sets of data to the one or more machine learning models using the network accessible endpoint.

12. The computing system of claim 11, wherein the plurality of sets of data include data collected by sensors selected from the group consisting of wellhead sensors, seismic sensors, tank sensors, rolling stock sensors, and pipeline flow sensors.

13. A method comprising: at one or more processors of a computing system: providing software development tools for building domain extensions for a database platform of the computing system, wherein the domain extensions include defining a data type for data to be stored on the database platform, and storage and infrastructure components for the database platform for storing that defined data type; providing software development tools for defining a metadata schema for extracting metadata from data files stored on the database platform, and generating a metadata extraction pipeline to extract metadata based on the defined metadata schema; receiving a set of data from a domain-specific data platform, the domain-specific data platform being configured to aggregate data detected by one or more sensors operating in a domain associated with the domain-specific data platform; defining a target metadata schema that includes one or more metadata fields that will be populated during a data ingestion process; defining a target domain extension that defines one or more data types for storing the received set of data after performing the data ingestion process; ingesting the received set of data using a metadata extraction pipeline to generate metadata files based on the target metadata schema; storing the ingested set of data and the gene
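Three of the dependent claims describe steps that are easy to picture in code: choosing a metadata schema from a detected file format (claim 2), searching stored metadata by a search parameter (claim 7), and exposing data under an emulated target file path (claim 9). The sketch below is illustrative only and rests on assumptions: the format heuristic, the `SCHEMA_BY_FORMAT` table, and all function names are hypothetical and do not appear in the patent.

```python
import posixpath
from typing import Dict, List

# Hypothetical mapping from a detected file format to metadata fields (claim 2).
SCHEMA_BY_FORMAT: Dict[str, List[str]] = {
    "csv": ["column_names", "row_count"],
    "json": ["top_level_keys"],
}


def classify_format(payload: bytes) -> str:
    """Guess the file format from content; a deliberately naive heuristic."""
    if payload.lstrip().startswith((b"{", b"[")):
        return "json"
    return "csv"


def define_schema(payload: bytes) -> List[str]:
    """Pick the target metadata fields based on the classified format."""
    return SCHEMA_BY_FORMAT[classify_format(payload)]


def search(metadata_files: Dict[str, Dict[str, object]],
           key: str, value: object) -> List[str]:
    """Return names of stored items whose metadata matches the search parameter (claim 7)."""
    return sorted(name for name, meta in metadata_files.items()
                  if meta.get(key) == value)


def emulated_path(mount_root: str, name: str) -> str:
    """Build a target file path under an emulated file architecture (claim 9)."""
    return posixpath.join(mount_root, name)


csv_fields = define_schema(b"t,psi\n0,101\n")
json_fields = define_schema(b'  {"well": 7}')
hits = search({"a.csv": {"row_count": 2}, "b.csv": {"row_count": 5}},
              "row_count", 2)
path = emulated_path("/mnt/wells", "a.csv")
```

A production classifier would more plausibly inspect magic bytes or declared content types; the two-branch heuristic is kept only to show where classification feeds schema selection.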
Data format conversion from or to a database · CPC title
File meta data generation · CPC title
Specific adaptations of the file system to access devices and non-file objects via standard file system access operations, e.g. pseudo file systems (dedicated interfaces to storage systems G06F3/0601) · CPC title
Schema design and management · CPC title
Ensuring data consistency and integrity · CPC title