Probability mapping model for location of natural resources
US-11500905-B2 · Nov 15, 2022 · US
US11880345B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11880345-B2 |
| Application number | US-202117463591-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 1, 2021 |
| Priority date | Sep 14, 2020 |
| Publication date | Jan 23, 2024 |
| Grant date | Jan 23, 2024 |
Official abstract text for this publication.
This disclosure relates generally to generating annotations and field-names for a relational schema. Most domains have relational database (RDB) systems built for them instead of domain ontologies, and the linguistic information of the schema is usually not used to recover the domain terms. The disclosed method and system generate annotations and field-names for a relational schema while taking the linguistic information of the schema into account, by using a model trained through a proposed training technique. The trained model comprises at least one knowledge graph and a set of associated parameters. The trained model is further used to perform a plurality of tasks, including generating a plurality of new field-names for a relational schema through a stochastic generative process and generating a new annotation for a field-name of a relational schema through a probabilistic inference technique.
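For annotated input, the claims call for Maximum Likelihood Estimation (MLE) of the knowledge graph (G) and the parameters (θ). If each distribution named in the claims (Pf, Pl, Pi, Pt) is modeled as a categorical table, MLE on fully observed paths reduces to normalized counts. The minimal Python sketch below rests on several assumptions not stated in the source: Pl and Pi are conditioned on the current concept, Pt on the current concept and relation type, and a fieldname is spelled as one underscore-separated token per concept on its annotation path. The function `mle_fit`, the token table and the toy Employee/Department data are hypothetical. The graph structure is implicitly the set of concepts and edges seen in the annotations; smoothing and the EM variant for unannotated data are omitted.

```python
from collections import Counter, defaultdict


def normalize(counter):
    """Turn raw counts into a categorical distribution (the MLE for count data)."""
    total = sum(counter.values())
    return {key: n / total for key, n in counter.items()}


def mle_fit(annotated):
    """Estimate hypothetical Pf, Pl, Pi, Pt and token tables from (fieldname, path) pairs.

    A path alternates concept, relation, concept, ... for example
    ("Employee", "worksIn", "Department").
    """
    first = Counter()            # Pf: start-concept counts
    stop = defaultdict(Counter)  # Pl: concept -> {"stop": n, "continue": m}
    rel = defaultdict(Counter)   # Pi: concept -> relation-type counts
    nxt = defaultdict(Counter)   # Pt: (concept, relation) -> next-concept counts
    word = defaultdict(Counter)  # concept -> fieldname-token counts
    for fieldname, path in annotated:
        concepts, rels = path[::2], path[1::2]
        first[concepts[0]] += 1
        for cur, r, n in zip(concepts, rels, concepts[1:]):
            stop[cur]["continue"] += 1
            rel[cur][r] += 1
            nxt[(cur, r)][n] += 1
        stop[concepts[-1]]["stop"] += 1
        # Assumption: one underscore-separated token per concept on the path.
        for token, concept in zip(fieldname.split("_"), concepts):
            word[concept][token] += 1
    return {
        "Pf": normalize(first),
        # Pl is stored as the probability of "stop"; "continue" is 1 minus it.
        "Pl": {c: cnt["stop"] / sum(cnt.values()) for c, cnt in stop.items()},
        "Pi": {c: normalize(cnt) for c, cnt in rel.items()},
        "Pt": {key: normalize(cnt) for key, cnt in nxt.items()},
        "tokens": {c: normalize(cnt) for c, cnt in word.items()},
    }


annotated = [
    ("emp_dept", ("Employee", "worksIn", "Department")),
    ("emp_name", ("Employee", "hasName", "Name")),
    ("dept", ("Department",)),
    ("employee", ("Employee",)),
]
theta = mle_fit(annotated)
print(theta["Pf"])  # {'Employee': 0.75, 'Department': 0.25}
```

Counting is the closed-form MLE only because every path is observed; with unannotated fieldnames the paths become latent, which is why the claims switch to an iterative EM procedure for that case.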
Opening claim text (preview).
We claim:
1. A processor-implemented method comprising:
receiving an input associated with a relational schema of a database, via a one or more hardware processors, the input comprising a plurality of relational data and a knowledge graph of a domain, wherein the plurality of relational data comprises of one of an annotated relational data and an unannotated relational data, wherein the unannotated relational data comprises of a plurality of fieldnames and the annotated relational data comprises of the plurality of fieldnames and an annotation for each fieldname of the plurality of fieldnames, wherein the knowledge graph comprises of a plurality of vertices and a plurality of edges, and wherein each of the plurality of vertices represents a domain concept and each of the edge represents a typed relationship between two vertices of the plurality vertices, and the knowledge graph comprises of one of an empty knowledge graph and a partially filled knowledge graph, wherein the empty knowledge graph does not comprise any vertices or any edges and the partially filled knowledge graph comprises at least one vertex, wherein the annotation is a path in the knowledge graph associated with the fieldname;
training a pre-defined probabilistic generative model using the input to obtain a trained model, by the one or more hardware processors, the trained model comprising of the knowledge graph (G) and a set of associated parameters (θ) through a training technique, wherein the set of parameters(θ) of the trained model are associated with a plurality of concepts and a plurality of relationships of the knowledge graph and the pre-defined probabilistic generative model defines (a) a probability distribution over a set of fieldnames, a set of annotation for each fieldname, the knowledge graph and (b) a stochastic process to generate the set of fieldnames, the set of annotation for each field name, where the training technique comprises one of a supervised training and an unsupervised training, wherein training the pre-defined probabilistic generative model to obtain the trained model comprises: determining whether the input is an unannotated relational data or an annotated relational data, and performing, upon determining, one of:
on determining the input as the annotated relational data, performing the supervised training, wherein the supervised training is optimizing the knowledge graph (G) and the set of associated parameters using a Maximum Likelihood Estimation (MLE) technique; and
on determining the input is the unannotated relational data, performing the unsupervised training, wherein the unsupervised training is based on the Expectation Maximization (EM) technique, wherein the EM technique includes performing a pre-defined number of iterations of an expectation process and a maximization process, wherein the expectation process is performed using the plurality of annotations and the maximization process is performed using the knowledge graph (G) and the set of associated parameters; and
performing a plurality of tasks using the trained model, wherein the plurality of tasks comprises of generating a plurality of new fieldnames for a new relational schema through a stochastic generative process and generating a new annotation for a fieldname of a new relational schema through a probabilistic inference technique, wherein the stochastic generative process for generating the fieldname of the new relational schema comprises sampling a path from the knowledge graph and further sampling a fieldname from the path, wherein the probabilistic inference technique for generating the annotation comprises: enumerating a plurality of annotations for the fieldname; computing a probability score for each of the enumerated annotations based on a probability score computing technique; and identifying an annotation for the fieldname, based on the set of probability scores;
wherein the step for sampling the path from the knowledge graph comprises: identifying a starting concept to be sampled by using distribution Pf, then a stop or a continue is sampled from distribution Pl, when stop is sampled, then the path sampling process stops, if continue is sampled, the next concept in the path is sampled, wherein first the type of relationship to the next concept is sampled using the distribution Pi, further the next concept is sampled using the distribution Pt, further stop or continue is sampled using the distribution Pl.
2. A system, comprising: an input/output interface; one or more memories; and one or more hardware processors, the one or more memories coupled to the one or more hardware processors, wherein the one or more hardware processors are configured to execute programmed instructions stored in the one or more memories to:
receive an input associated with a relational schema of a database, via a one or more hardware processors, the input comprising a plurality of relational data and a knowledge graph of a domain, wherein the plurality of relational data comprises of one of an annotated relational data and an unannotated relational data, wherein the unannotated relational data comprises of a plurality of fieldnames and the annotated relational data comprises of the plurality of fieldnames and an annotation for each fieldname of the plurality of fieldnames, wherein the knowledge graph comprises of a plurality of vertices and a plurality of edges, and wherein each of the plurality of vertices represents a domain concept and each of the edge represents a typed relationship between two vertices of the plurality vertices, and the knowledge graph comprises of one of an empty knowledge graph and a partially filled knowledge graph, wherein the empty knowledge graph does not comprise any vertices or any edges and the partially filled knowledge graph comprises at least one vertex, wherein the annotation is a path in the knowledge graph associated with the fieldname;
train a pre-defined probabilistic generative model using the input to obtain a trained model, by the one or more hardware processors, the trained model comprising of the knowledge graph (G) and a set of associated parameters (θ) through a training technique, wherein the set of parameters(θ) of the trained model are associated with a plurality of concepts and a plurality of relationships of the knowledge graph and the pre-defined probabilistic generative model defines (a) a probability distribution over a set of fieldnames, a set of annotation for each fieldname, the knowledge graph and (b) a stochastic process to generate the set of fieldnames, the set of annotation for each field name, where the training technique comprises one of a supervised training and an unsupervised training, wherein training the pre-defined probabilistic generative model to obtain the trained model comprises: determine whether the input is an unannotated relational data or an annotated relational data, and perform, upon determining, one of:
on determining the input as the annotated relational data, performing the supervised training, wherein the supervised training is optimizing the knowledge graph (G) and the set of associated parameters using a Maximum Likelihood Estimation (MLE) technique; and
on determining the input is the unannotated relational data, performing the unsupervised training, wherein the unsupervised training is based on the Expectation Maximization (EM) technique, wherein the EM technique includes performing a pre-defined number of iterations of an expectation process and a maximization process, wherein the expectation process is performed using the plurality of annotations and the maximization process is performed using the knowledge graph (G) and the set of associated parameters; and
perform a plurality of tasks using the trained model, wherein the plurality of tasks comprises of generating a plurality of new fieldnames for a new relational schema t
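The path-sampling step of claim 1 and its enumerate/score/select inference step can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the patented implementation. The claims name the distributions Pf, Pl, Pi and Pt but do not say what each is conditioned on, so here Pl and Pi are assumed to depend on the current concept and Pt on the current concept and relation type. How a fieldname is "sampled from the path" is also unspecified; the sketch assumes one underscore-joined token per concept. The "probability score computing technique" is taken to be P(path) · P(fieldname | path) under the same tables, and candidate annotations are limited to paths of at most `max_hops` relations. All names (`Theta`, `sample_path`, `annotate`, the Employee/Department toy graph and its numbers) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Theta:
    """Hypothetical parameter set; the conditioning of each table is an assumption."""
    p_first: dict  # Pf: concept -> probability of starting the path there
    p_stop: dict   # Pl: concept -> probability of "stop" (1 - p is "continue")
    p_rel: dict    # Pi: concept -> {relation type: probability}
    p_next: dict   # Pt: (concept, relation type) -> {next concept: probability}
    p_token: dict  # concept -> {fieldname token: probability}

def draw(dist, rng):
    """Sample one outcome from a {outcome: probability} dict."""
    outcomes, weights = zip(*dist.items())
    return rng.choices(outcomes, weights=weights, k=1)[0]

def sample_path(theta, rng, max_hops=10):
    """Pf picks the start; then Pl (stop?), Pi (relation), Pt (next concept) repeat."""
    path = [draw(theta.p_first, rng)]
    for _ in range(max_hops):  # max_hops is only a safety cap, not part of the claims
        current = path[-1]
        if rng.random() < theta.p_stop[current]:
            break
        relation = draw(theta.p_rel[current], rng)
        path += [relation, draw(theta.p_next[(current, relation)], rng)]
    return tuple(path)

def sample_fieldname(theta, path, rng):
    """Spell a fieldname by drawing one token per concept on the path (assumption)."""
    return "_".join(draw(theta.p_token[c], rng) for c in path[::2])

def path_prob(theta, path):
    """P(path) under the same Pf/Pl/Pi/Pt process that sample_path follows."""
    concepts, relations = path[::2], path[1::2]
    p = theta.p_first.get(concepts[0], 0.0)
    for cur, rel, nxt in zip(concepts, relations, concepts[1:]):
        p *= (1 - theta.p_stop[cur]) * theta.p_rel.get(cur, {}).get(rel, 0.0)
        p *= theta.p_next.get((cur, rel), {}).get(nxt, 0.0)
    return p * theta.p_stop[concepts[-1]]

def fieldname_prob(theta, path, fieldname):
    """P(fieldname | path) under the one-token-per-concept assumption."""
    tokens, concepts = fieldname.split("_"), path[::2]
    if len(tokens) != len(concepts):
        return 0.0
    p = 1.0
    for token, concept in zip(tokens, concepts):
        p *= theta.p_token[concept].get(token, 0.0)
    return p

def enumerate_paths(theta, max_hops=2):
    """All paths with up to max_hops relations that start at a Pf concept."""
    frontier = [(c,) for c in theta.p_first]
    paths = list(frontier)
    for _ in range(max_hops):
        frontier = [p + (rel, nxt)
                    for p in frontier
                    for rel in theta.p_rel.get(p[-1], {})
                    for nxt in theta.p_next.get((p[-1], rel), {})]
        paths += frontier
    return paths

def annotate(theta, fieldname, max_hops=2):
    """Enumerate candidate annotations, score each, return the highest-scoring one."""
    scores = {p: path_prob(theta, p) * fieldname_prob(theta, p, fieldname)
              for p in enumerate_paths(theta, max_hops)}
    best = max(scores, key=scores.get)
    return best, scores[best]

theta = Theta(
    p_first={"Employee": 0.6, "Department": 0.4},
    p_stop={"Employee": 0.5, "Department": 1.0},
    p_rel={"Employee": {"worksIn": 1.0}},
    p_next={("Employee", "worksIn"): {"Department": 1.0}},
    p_token={"Employee": {"emp": 0.7, "employee": 0.3},
             "Department": {"dept": 0.6, "department": 0.4}},
)
rng = random.Random(0)
path = sample_path(theta, rng)
print(path, sample_fieldname(theta, path, rng))
print(annotate(theta, "emp_dept"))  # best path ('Employee', 'worksIn', 'Department'), score about 0.126
```

Scoring with the same tables used for sampling keeps generation and inference consistent: with the toy parameters, the three enumerable paths have probabilities 0.3, 0.4 and 0.3, which sum to 1, and `annotate(theta, "emp_dept")` prefers the two-concept path because neither single-concept path can spell two tokens.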
Schema design and management · CPC title
Knowledge engineering; Knowledge acquisition · CPC title
Machine learning · CPC title
Probabilistic graphical models, e.g. probabilistic networks · CPC title