Systems and methods for grounded query generation over heterogeneous data sources

US2026050617A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2026050617-A1
Application number: US-202519300327-A
Country: US
Kind code: A1
Filing date: Aug 14, 2025
Priority date: Aug 16, 2024
Publication date: Feb 19, 2026
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed is a system and method for grounded query generation over heterogeneous data sources. The method includes receiving a user input corresponding to a user requirement from at least one user, extracting a plurality of sub-models corresponding to the received user input, creating a context of the received user input based on the extracted plurality of sub-models, generating an executable query in a specific query language based on the context using a Large Language Model (LLM) by processing the context and the user input within a structured prompt, validating the generated executable query using a multi-stage validation process, generating an LLM response for the received user input based on results of the validation, and outputting the generated LLM response on a user interface of a user device.
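
The multi-stage validation named in the abstract is broken down in the claims into a syntax validation, a benign operation check, a grounding validation, and an execution validation. The following is a minimal sketch of how such stages could be chained, not the applicant's implementation. It assumes SQL as the target query language, uses an in-memory SQLite database as a stand-in for the staging environment, and lets SQLite's parser double as the syntax check. The deny-list `WRITE_KEYWORDS`, the `ValidationResult` type, the `table.column` grounding heuristic, and all function names are hypothetical.

```python
import re
import sqlite3
from dataclasses import dataclass

# Hypothetical deny-list: the claims only speak of "potentially abnormal operations".
WRITE_KEYWORDS = {"DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "TRUNCATE"}


@dataclass
class ValidationResult:
    stage: str
    ok: bool
    detail: str = ""


def benign_operation_check(query: str) -> ValidationResult:
    """Reject queries that contain any deny-listed keyword as a whole word."""
    tokens = set(re.findall(r"[A-Za-z_]+", query.upper()))
    hits = sorted(tokens & WRITE_KEYWORDS)
    return ValidationResult("benign_operation", not hits, ", ".join(hits))


def grounding_validation(query: str, schema: dict[str, set[str]]) -> ValidationResult:
    """Check every qualified table.column reference against the context schema.

    Simplification: only qualified references are inspected; a real system
    would walk a parsed query tree instead of using a regular expression.
    """
    refs = re.findall(r"([A-Za-z_]\w*)\.([A-Za-z_]\w*)", query)
    unknown = sorted({f"{t}.{c}" for t, c in refs if c not in schema.get(t, set())})
    return ValidationResult("grounding", not unknown, ", ".join(unknown))


def execution_validation(query: str, schema: dict[str, set[str]]) -> ValidationResult:
    """Run the query against empty tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        for table, columns in schema.items():
            conn.execute(f"CREATE TABLE {table} ({', '.join(sorted(columns))})")
        conn.execute(query)  # also surfaces syntax errors
        return ValidationResult("execution", True)
    except sqlite3.Error as exc:
        return ValidationResult("execution", False, str(exc))
    finally:
        conn.close()


def validate(query: str, schema: dict[str, set[str]]) -> list[ValidationResult]:
    """Run the stages in order and stop at the first failure."""
    stages = (
        lambda: benign_operation_check(query),
        lambda: grounding_validation(query, schema),
        lambda: execution_validation(query, schema),
    )
    results = []
    for stage in stages:
        result = stage()
        results.append(result)
        if not result.ok:
            break
    return results
```

Calling `validate("DROP TABLE orders", schema)` stops at the first stage, so a rejected query never reaches the staging database. The failing result's `detail` field is one place an error message for the user-facing response could be drawn from.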

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a processor; and a memory communicably coupled to the processor, wherein the memory comprises processor-executable instructions which, when executed by the processor, cause the processor to: receive a user input corresponding to a user requirement from at least one user, wherein the user input comprises at least one entity and at least one concept corresponding to the user requirement; extract a plurality of sub-models corresponding to the received user input based on a semantic similarity between the user input and a plurality of ontological representations of data sources stored in a semantic data catalog; create a context of the received user input based on the extracted plurality of sub-models, wherein the context comprises a subgraph representing linked entities and relationships semantically related to the user input; generate an executable query in a specific query language based on the context using a Large Language Model (LLM) by processing the context and the user input within a structured prompt, wherein the context is represented in a query-language-specific schema format; validate the generated executable query using a multi-stage validation process, wherein the multi-stage validation process comprises at least one of a syntax validation, a benign operation check, a grounding validation, and an execution validation; generate an LLM response for the received user input based on results of the validation, wherein the LLM response comprises an explanation corresponding to the generated executable query and the generated executable query; and output the generated LLM response for the received user input on a user interface of a user device.

2. The system of claim 1, wherein the user input comprises named entities, domain-specific terms, and conceptual keywords.

3. The system of claim 1, wherein the processor is further to: preprocess the user input using a natural language processing model, wherein the natural language processing model comprises a named entity recognition, a part-of-speech tagging, a dependency parsing, and a domain-specific concept mapping.

4. The system of claim 1, wherein to extract the plurality of sub-models corresponding to the received user input based on the semantic similarity between the user input and the plurality of ontological representations of data sources stored in the semantic data catalog, the processor is to: encode preprocessed user input into a first vector embedding using a sentence and cross-encoder language model trained to generate contextualized semantic embeddings; retrieve the plurality of ontological representations from the semantic data catalog, wherein each ontological representation corresponds to a type of data source of an organization and wherein each ontological representation being modeled as an ontology comprising classes, data properties, object properties, and individuals; generate a plurality of vector embeddings for each ontological components within the plurality of ontological representations using a pretrained language model; compute semantic similarity scores between the first vector embedding of the user input and each of the plurality of vector embeddings associated with the ontological components using a cosine similarity function; rank the ontological components based on the computed semantic similarity scores; select a set of subgraphs corresponding to the plurality of sub-models from the plurality of ontological representations by mapping the semantic similarity scores with a predefined threshold value; identify additional linking entities and properties in the ontology components to be embedded in each sub-model; and construct each sub-model as the subgraph of an ontology comprising the semantically similar ontological components and corresponding linking relationships.

5. The system of claim 1, wherein to create the context of the received user input based on the extracted plurality of sub-models, the processor is to: aggregate the extracted plurality of sub-models, wherein each sub-model comprises a subset of ontological components selected from ontologies stored in the semantic data catalog, and wherein the ontological components comprise at least one of classes, data properties, object properties, and individuals; generate a unified intermediate graph structure as a preliminary context graph, wherein the extracted plurality of sub-models being preprocessed to filter duplicate entities and overlapping object properties across the sub-models using canonical entity alignment and ontology normalization rules; identify missing linking entities and missing relationships from the ontologies to be embedded in the plurality of sub-models, wherein the missing linking entities and the missing relationships determined to be required for forming connected paths between the at least one entity and the at least one concept in the preliminary context graph; update the preliminary context graph by embedding the identified missing linking entities and the missing relationships by traversing ontological graph structure to detect intermediate nodes and edges semantically connected to extracted entities based on a graph distance, a relationship strength, and a domain relevance; perform a graph completeness check to validate reachability of each of the at least one entity and the at least one concept from the user input within the updated preliminary context graph via ontologically valid object properties; filter semantically unrelated branches from the updated preliminary context graph based on a threshold semantic similarity score between each node and a user input embedding; and construct a final context subgraph comprising semantically relevant entities, data properties, object properties, and linking paths representing a structure and relationships required to interpret the user input.

6. The system of claim 1, wherein to generate the executable query in the specific query language based on the context using the Large Language Model (LLM) by processing the context and the user input within the structured prompt, the processor is to: construct a prompt by concatenating the received user input in natural language form, wherein the prompt comprises a textual representation of the final context subgraph in the query-language-specific schema format, and a plurality of language-specific generation instructions; convert the final context subgraph into the query-language-specific schema format; and generate a plurality of candidate executable queries in the specified query language by processing the user input and a structured schema representation using the LLM.

7. The system of claim 1, wherein to validate the generated executable query using a multi-stage validation process comprising at least one of the syntax validation, the benign operation check, the grounding validation, and the execution validation, the processor is to: perform the syntax validation on the generated executable query by checking compliance with a formal grammar of a target query language using a syntax parser model; perform the benign operation check by scanning the generated executable query for presence of potentially abnormal operations, and reject queries comprising the potentially abnormal operations; perform the grounding validation by mapping tables, classes, columns, properties, and schema components present in the generated executable query with corresponding components in ontology-derived context subgraph; perform an execution validation by executing the generated executable query in a staging environment to determine runtime errors; generate an error message specific corresponding to the generated executable query based on
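
Claim 4 describes scoring ontological components against the user input with a cosine similarity function, ranking them, and keeping those that meet a predefined threshold. A minimal sketch of that scoring and selection step follows, under stated assumptions. The embeddings are assumed to be already computed by some language model and are passed in as plain lists of floats. The function names, the component names in the usage note, and the threshold value are hypothetical and do not come from the patent text.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_components(
    query_vec: list[float],
    component_vecs: dict[str, list[float]],
    threshold: float,
) -> list[tuple[str, float]]:
    """Rank components by similarity to the query and keep those at or above threshold."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in component_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, score) for name, score in scored if score >= threshold]
```

With toy two-dimensional vectors, `select_components([1.0, 0.0], {"Customer": [1.0, 0.0], "Order": [1.0, 1.0], "Weather": [0.0, 1.0]}, 0.5)` keeps `Customer` and `Order` in that order and drops `Weather`. The retained components would then seed the sub-model subgraphs, with linking entities added in a later step as the claim describes.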

Assignees

Inventors

Classifications

  • Natural language query formulation · CPC title

  • G06F16/383 · Primary

    using metadata automatically derived from the content · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2026050617A1 cover?
Disclosed is a system and method for grounded query generation over heterogeneous data sources. The method includes receiving a user input corresponding to a user requirement from at least one user, extracting a plurality of sub-models corresponding to the received user input, creating a context of the received user input based on the extracted plurality of sub-models, generating an executable …
Who is the assignee on this patent?
Accenture Global Solutions Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/3329. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 19, 2026 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).