What technology area does this patent fall under?

Primary CPC classification G06F16/211. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jan 23, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and system for generating annotations and field-names for relational schema

US11880345B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11880345-B2
Application number	US-202117463591-A
Country	US
Kind code	B2
Filing date	Sep 1, 2021
Priority date	Sep 14, 2020
Publication date	Jan 23, 2024
Grant date	Jan 23, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This disclosure relates generally to generating annotations and field-names for a relational schema. Typically, most domains have relational database (RDB) system built for them instead of domain ontologies and usually linguistic information of the schema is not used to recover the domain terms. The disclosed method and system facilitate generating annotations and field-names for a relational schema, while considering the linguistic information of a schema by using a trained model, trained through a proposed training technique. The trained model comprises of at least one knowledge graph and a set of associated parameters. The trained model is further used to perform a plurality of tasks, wherein the plurality of tasks include generating a plurality of new fieldnames for a relational schema through a stochastic generative process and for generating a new annotation for a fieldname of a relational schema through a probabilistic inference technique.
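The annotation task mentioned in the abstract can be illustrated with a small sketch. The claim text on this page describes the inference as enumerating candidate annotations (paths in the knowledge graph), scoring each one, and selecting an annotation from the scores. The page does not specify the scoring technique, so everything below is an assumption made for illustration: the toy graph `EDGES`, the token probabilities `EMIT`, the smoothing constant `EPS`, a uniform prior over paths, and the rule that each fieldname token is emitted by a concept picked uniformly from the path.

```python
from math import prod

# Toy knowledge graph (hypothetical): concept -> list of (relationship type, target concept).
EDGES = {
    "Customer": [("has_address", "Address"), ("placed", "Order")],
    "Order": [("contains", "Product")],
    "Address": [],
    "Product": [],
}

# Hypothetical P(token | concept); these numbers are invented for the example.
EMIT = {
    "Customer": {"cust": 0.6, "customer": 0.4},
    "Address": {"addr": 0.7, "city": 0.3},
    "Order": {"ord": 0.5, "order": 0.5},
    "Product": {"prod": 0.6, "item": 0.4},
}
EPS = 1e-3  # assumed smoothing value for tokens a concept never emits


def enumerate_paths(max_concepts=3):
    """Enumerate every path with at most max_concepts concepts.

    A path is a list alternating concept, relationship type, concept, ...
    """
    paths = []

    def extend(path, n_concepts):
        paths.append(path)
        if n_concepts == max_concepts:
            return
        for relation, target in EDGES[path[-1]]:
            extend(path + [relation, target], n_concepts + 1)

    for start in EDGES:
        extend([start], 1)
    return paths


def score(fieldname, path):
    """Unnormalized score of a path for a fieldname under the assumed toy model."""
    concepts = path[::2]
    tokens = fieldname.lower().split("_")
    # Each token is assumed to come from a concept chosen uniformly from the path.
    return prod(
        sum(EMIT[c].get(tok, EPS) for c in concepts) / len(concepts)
        for tok in tokens
    )


def annotate(fieldname):
    """Return the best-scoring path and its normalized score over all candidates."""
    candidates = enumerate_paths()
    scores = [score(fieldname, p) for p in candidates]
    best, best_score = max(zip(candidates, scores), key=lambda pair: pair[1])
    return best, best_score / sum(scores)


if __name__ == "__main__":
    print(annotate("cust_addr"))
```

With these toy numbers, `annotate("cust_addr")` selects the path from Customer to Address via `has_address`. Exhaustive enumeration is only workable for a small graph or a short path limit, which is why the sketch caps paths at three concepts.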

First claim

Opening claim text (preview).

We claim:

1. A processor-implemented method comprising:
receiving an input associated with a relational schema of a database, via a one or more hardware processors, the input comprising a plurality of relational data and a knowledge graph of a domain, wherein the plurality of relational data comprises of one of an annotated relational data and an unannotated relational data, wherein the unannotated relational data comprises of a plurality of fieldnames and the annotated relational data comprises of the plurality of fieldnames and an annotation for each fieldname of the plurality of fieldnames, wherein the knowledge graph comprises of a plurality of vertices and a plurality of edges, and wherein each of the plurality of vertices represents a domain concept and each of the edge represents a typed relationship between two vertices of the plurality vertices, and the knowledge graph comprises of one of an empty knowledge graph and a partially filled knowledge graph, wherein the empty knowledge graph does not comprise any vertices or any edges and the partially filled knowledge graph comprises at least one vertex, wherein the annotation is a path in the knowledge graph associated with the fieldname;
training a pre-defined probabilistic generative model using the input to obtain a trained model, by the one or more hardware processors, the trained model comprising of the knowledge graph (G) and a set of associated parameters (θ) through a training technique, wherein the set of parameters (θ) of the trained model are associated with a plurality of concepts and a plurality of relationships of the knowledge graph and the pre-defined probabilistic generative model defines (a) a probability distribution over a set of fieldnames, a set of annotation for each fieldname, the knowledge graph and (b) a stochastic process to generate the set of fieldnames, the set of annotation for each field name, where the training technique comprises one of a supervised training and an unsupervised training, wherein training the pre-defined probabilistic generative model to obtain the trained model comprises:
determining whether the input is an unannotated relational data or an annotated relational data, and performing, upon determining, one of:
on determining the input as the annotated relational data, performing the supervised training, wherein the supervised training is optimizing the knowledge graph (G) and the set of associated parameters using a Maximum Likelihood Estimation (MLE) technique; and
on determining the input is the unannotated relational data, performing the unsupervised training, wherein the unsupervised training is based on the Expectation Maximization (EM) technique, wherein the EM technique includes performing a pre-defined number of iterations of an expectation process and a maximization process, wherein the expectation process is performed using the plurality of annotations and the maximization process is performed using the knowledge graph (G) and the set of associated parameters; and
performing a plurality of tasks using the trained model, wherein the plurality of tasks comprises of generating a plurality of new fieldnames for a new relational schema through a stochastic generative process and generating a new annotation for a fieldname of a new relational schema through a probabilistic inference technique, wherein the stochastic generative process for generating the fieldname of the new relational schema comprises sampling a path from the knowledge graph and further sampling a fieldname from the path, wherein the probabilistic inference technique for generating the annotation comprises:
enumerating a plurality of annotations for the fieldname;
computing a probability score for each of the enumerated annotations based on a probability score computing technique; and
identifying an annotation for the fieldname, based on the set of probability scores;
wherein the step for sampling the path from the knowledge graph comprises: identifying a starting concept to be sampled by using distribution Pf, then a stop or a continue is sampled from distribution Pl, when stop is sampled, then the path sampling process stops, if continue is sampled, the next concept in the path is sampled, wherein first the type of relationship to the next concept is sampled using the distribution Pi, further the next concept is sampled using the distribution Pt, further stop or continue is sampled using the distribution Pl.

2. A system, comprising:
an input/output interface;
one or more memories; and
one or more hardware processors, the one or more memories coupled to the one or more hardware processors, wherein the one or more hardware processors are configured to execute programmed instructions stored in the one or more memories to:
receive an input associated with a relational schema of a database, via a one or more hardware processors, the input comprising a plurality of relational data and a knowledge graph of a domain, wherein the plurality of relational data comprises of one of an annotated relational data and an unannotated relational data, wherein the unannotated relational data comprises of a plurality of fieldnames and the annotated relational data comprises of the plurality of fieldnames and an annotation for each fieldname of the plurality of fieldnames, wherein the knowledge graph comprises of a plurality of vertices and a plurality of edges, and wherein each of the plurality of vertices represents a domain concept and each of the edge represents a typed relationship between two vertices of the plurality vertices, and the knowledge graph comprises of one of an empty knowledge graph and a partially filled knowledge graph, wherein the empty knowledge graph does not comprise any vertices or any edges and the partially filled knowledge graph comprises at least one vertex, wherein the annotation is a path in the knowledge graph associated with the fieldname;
train a pre-defined probabilistic generative model using the input to obtain a trained model, by the one or more hardware processors, the trained model comprising of the knowledge graph (G) and a set of associated parameters (θ) through a training technique, wherein the set of parameters (θ) of the trained model are associated with a plurality of concepts and a plurality of relationships of the knowledge graph and the pre-defined probabilistic generative model defines (a) a probability distribution over a set of fieldnames, a set of annotation for each fieldname, the knowledge graph and (b) a stochastic process to generate the set of fieldnames, the set of annotation for each field name, where the training technique comprises one of a supervised training and an unsupervised training, wherein training the pre-defined probabilistic generative model to obtain the trained model comprises:
determine whether the input is an unannotated relational data or an annotated relational data, and perform, upon determining, one of:
on determining the input as the annotated relational data, performing the supervised training, wherein the supervised training is optimizing the knowledge graph (G) and the set of associated parameters using a Maximum Likelihood Estimation (MLE) technique; and
on determining the input is the unannotated relational data, performing the unsupervised training, wherein the unsupervised training is based on the Expectation Maximization (EM) technique, wherein the EM technique includes performing a pre-defined number of iterations of an expectation process and a maximization process, wherein the expectation process is performed using the plurality of annotations and the maximization process is performed using the knowledge graph (G) and the set of associated parameters; and
perform a plurality of tasks using the trained model, wherein the plurality of tasks comprises of generating a plurality of new fieldnames for a new relational schema t…
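The path-sampling step recited at the end of claim 1 can be sketched in a few lines. The claim names four distributions (Pf for the starting concept, Pl for stop or continue, Pi for the relationship type, Pt for the next concept) but the preview does not say what each one is conditioned on. The sketch below assumes Pl and Pi depend on the current concept and Pt depends on the current concept plus the sampled relationship type. The toy concepts, the probabilities, and the `max_steps` cap are likewise hypothetical. The subsequent step of sampling a fieldname from the path is not shown, since the preview gives no detail on it.

```python
import random

# Toy parameters for illustration only; all names and numbers are assumptions.
# Pf: distribution over the starting concept.
P_F = {"Customer": 0.6, "Order": 0.4}
# Pl: stop/continue distribution, assumed to depend on the current concept.
P_L = {
    "Customer": {"stop": 0.3, "continue": 0.7},
    "Order": {"stop": 0.5, "continue": 0.5},
    "Address": {"stop": 1.0},
    "Product": {"stop": 1.0},
}
# Pi: relationship-type distribution, assumed to depend on the current concept.
P_I = {
    "Customer": {"has_address": 0.5, "placed": 0.5},
    "Order": {"contains": 1.0},
}
# Pt: next-concept distribution, assumed to depend on (concept, relationship type).
P_T = {
    ("Customer", "has_address"): {"Address": 1.0},
    ("Customer", "placed"): {"Order": 1.0},
    ("Order", "contains"): {"Product": 1.0},
}


def draw(dist, rng):
    """Draw one outcome from a dict mapping outcome -> probability."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def sample_path(rng, max_steps=10):
    """Sample a path as a list alternating concept, relationship type, concept, ..."""
    concept = draw(P_F, rng)          # starting concept from Pf
    path = [concept]
    for _ in range(max_steps):        # safety cap; an assumption, not part of the claim
        if draw(P_L[concept], rng) == "stop":   # stop or continue from Pl
            break
        relation = draw(P_I[concept], rng)              # relationship type from Pi
        concept = draw(P_T[(concept, relation)], rng)   # next concept from Pt
        path += [relation, concept]
    return path


if __name__ == "__main__":
    print(sample_path(random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the sketch reproducible without touching the global random state.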

Assignees

Tata Consultancy Services Ltd

Inventors

Classifications

G06F16/211 · Primary
Schema design and management · CPC title
G06N5/022
Knowledge engineering; Knowledge acquisition · CPC title
G06N20/00
Machine learning · CPC title
G06N7/01
Probabilistic graphical models, e.g. probabilistic networks · CPC title

Patent family

Related publications grouped by family.

View patent family 81184776

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11880345B2 cover?: This disclosure relates generally to generating annotations and field-names for a relational schema. Typically, most domains have relational database (RDB) system built for them instead of domain ontologies and usually linguistic information of the schema is not used to recover the domain terms. The disclosed method and system facilitate generating annotations and field-names for a relational s…
Who is the assignee on this patent?: Tata Consultancy Services Ltd

Related patents

Probability mapping model for location of natural resources

Knowledge graph for conversational semantic search

Generalized expectation maximization

Intelligent systems and methods for process and asset health diagnosis, anomoly detection and control in wastewater treatment plants or drinking water plants

Unsupervised ontology-based graph extraction from texts

Automatic evaluation and improvement of ontologies for natural language processing tasks
