Method and system for analyzing natural language data by using domain-specific language models

US12298970B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12298970-B2
Application number: US-202318217868-A
Country: US
Kind code: B2
Filing date: Jul 3, 2023
Priority date: Jul 3, 2023
Publication date: May 13, 2025
Grant date: May 13, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for providing a domain-specific language model to facilitate natural language data analytics is disclosed. The method includes aggregating documents from various sources, each of the documents including natural language data; ingesting each of the documents to generate structured data sets that are organized according to a contextual hierarchy; determining prompts that provide domain-specific information for a language model, the domain-specific information including instructions to access the structured data sets; receiving a request via a graphical user interface, the request relating to questions in a natural language format; generating, by using the language model, software codes for the request based on the prompts; and executing each of the software codes to identify results for the request from the structured data sets.
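The flow in the abstract (prompt, generated code, execution against structured data) can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in and not the patented implementation: `STRUCTURED_DATA`, `find_sections`, `build_prompt`, and `stub_language_model` are assumptions, and the stub returns fixed code where a real system would call an actual language model and sandbox the execution.

```python
from __future__ import annotations

# Hypothetical structured data set produced by ingestion (assumed shape).
STRUCTURED_DATA = [
    {"doc": "agreement-1", "section": "1.1", "text": "The borrower shall repay the loan by 2030."},
    {"doc": "agreement-1", "section": "2.3", "text": "Interest accrues at a fixed rate."},
    {"doc": "agreement-2", "section": "1.1", "text": "The borrower shall repay the loan by 2028."},
]


def find_sections(keyword: str) -> list[dict]:
    """Hypothetical data-access function exposed to the generated code."""
    return [row for row in STRUCTURED_DATA if keyword.lower() in row["text"].lower()]


def build_prompt(question: str) -> str:
    """Combine domain-specific context, the usable interface, and an instruction."""
    return "\n".join([
        "Scenario: you answer questions about loan agreements.",
        "Interface: find_sections(keyword) -> list of {doc, section, text}.",
        "Instruction: write Python that stores the answer in a variable named result.",
        f"Question: {question}",
    ])


def stub_language_model(prompt: str) -> str:
    """Stand-in for a real model: ignores the prompt and returns fixed code."""
    return "result = [(r['doc'], r['section']) for r in find_sections('repay')]"


def answer(question: str, model=stub_language_model):
    """Generate code for the question and execute it against the data set."""
    code = model(build_prompt(question))
    namespace = {"find_sections": find_sections}
    exec(code, namespace)  # illustration only; untrusted code needs sandboxing
    return namespace["result"]


print(answer("Which sections mention repayment?"))
```

The point of the design is that the model never sees the documents directly; it only receives a description of the access interface and returns code that is then run against the structured data.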

First claim

Opening claim text (preview).

What is claimed is:

1. A method for providing a domain-specific language model to facilitate natural language data analytics, the method being implemented by at least one processor, the method comprising: aggregating, by the at least one processor, a plurality of documents from at least one source, each of the plurality of documents including natural language data; ingesting, by the at least one processor, each of the plurality of documents to generate at least one structured data set that is organized according to a contextual hierarchy, wherein the ingesting of each of the plurality of documents further comprises: attaching, by the at least one processor, at least one tag to each of the plurality of documents based on content of corresponding natural language data, each of the at least one tag including corresponding metadata; formatting, by the at least one processor, at least one data table in each of the plurality of documents to discover at least one corresponding table boundary wherein the formatting of the at least one data table further comprises: associating, by the at least one processor, each of the at least one data table with a corresponding placeholder reference, the placeholder reference representing a spatial relationship between the at least one data table and the plurality of documents; and persisting, by the at least one processor, the placeholder reference within the corresponding plurality of documents in place of the corresponding at least one data table; and segmenting, by the at least one processor, each of the plurality of documents into at least one section by using at least one stylistic indicator and at least one contextual indicator; determining, by the at least one processor, at least one prompt that provides domain-specific information for a language model, the domain-specific information including instructions to access the at least one structured data set; receiving, by the at least one processor, a request via a graphical user interface, the request relating to at least one question in a natural language format; generating, by the at least one processor using the language model, at least one software code for the request based on the at least one prompt; and executing, by the at least one processor, each of the at least one software code to identify at least one result for the request from the at least one structured data set.

2. The method of claim 1, wherein each of the at least one prompt includes at least one from among scenario data that defines a domain-specific scenario, interface data that defines at least one usable application programming interface, and instruction data that orders the language model to generate the at least one software code by using the interface data.

3. The method of claim 1, wherein the at least one result includes a modularization of a previously generated software code into at least one function that answers a variant of the at least one question.

4. The method of claim 1, wherein the corresponding metadata includes supplemental information that is automatically identified and automatically retrieved for each of the plurality of documents, the supplemental information including at least one from among filing information, participant information, agreement information, and date information.

5. The method of claim 1, wherein each of the at least one section includes at least one direct citation to original data in the corresponding plurality of documents; and wherein each of the at least one section further includes at least one section label that is assigned according to a tree hierarchy to preserve full sectional context.

6. The method of claim 1, further comprising: discovering, by the at least one processor, the at least one table boundary by using spatial positioning of table contents, table styles, visual table indicators, and textual table indicators; and organizing, by the at least one processor, at least one row and at least one column into a machine-readable format.

7. The method of claim 1, wherein the language model includes at least one from among a large language model, a deep learning model, a neural network model, a natural language processing model, a machine learning model, a mathematical model, a process model, and a data model.

8. A computing device configured to implement an execution of a method for providing a domain-specific language model to facilitate natural language data analytics, the computing device comprising: a processor; a memory; and a communication interface coupled to each of the processor and the memory, wherein the processor is configured to: aggregate a plurality of documents from at least one source, each of the plurality of documents including natural language data; ingest each of the plurality of documents to generate at least one structured data set that is organized according to a contextual hierarchy, wherein the ingest of each of the plurality of documents, the processor is further configured to: attach at least one tag to each of the plurality of documents based on content of corresponding natural language data, each of the at least one tag including corresponding metadata; format at least one data table in each of the plurality of documents to discover at least one corresponding table boundary wherein, to format the at least one data table, the processor is further configured to: associate each of the at least one data table with a corresponding placeholder reference, the placeholder reference representing a spatial relationship between the at least one data table and the plurality of documents; and persist the placeholder reference within the corresponding plurality of documents in place of the corresponding at least one data table; and segment each of the plurality of documents into at least one section by using at least one stylistic indicator and at least one contextual indicator; determine at least one prompt that provides domain-specific information for a language model, the domain-specific information including instructions to access the at least one structured data set; receive a request via a graphical user interface, the request relating to at least one question in a natural language format; generate, by using the language model, at least one software code for the request based on the at least one prompt; and execute each of the at least one software code to identify at least one result for the request from the at least one structured data set.

9. The computing device of claim 8, wherein each of the at least one prompt includes at least one from among scenario data that defines a domain-specific scenario, interface data that defines at least one usable application programming interface, and instruction data that orders the language model to generate the at least one software code by using the interface data.

10. The computing device of claim 8, wherein the at least one result includes a modularization of a previously generated software code into at least one function that answers a variant of the at least one question.

11. The computing device of claim 8, wherein the corresponding metadata includes supplemental information that is automatically identified and automatically retrieved for each of the plurality of documents, the supplemental information including at least one from among filing information, participant information, agreement information, and date information.

12. The computing device of claim 10, wherein each of the at least one section includes at least one direct citation to original data in the corresponding plurality of documents; and wherein each of the at least one section further includes at l
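
Two of the ingestion steps recited in claim 1, replacing each data table with a placeholder reference and segmenting a document into labeled sections, can be sketched as follows. This is an illustrative toy under stated assumptions, not the method of the patent: tables are assumed to be runs of lines starting with `|`, headings are assumed to be numbered like `1.1 Fees`, and the names `extract_tables`, `segment`, and the `[[TABLE:n]]` placeholder format are all hypothetical.

```python
import re

# Assumed stylistic indicator: a numbered heading such as "1.1 Fees".
# Caveat: a body line that begins with a number would also match.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")


def extract_tables(lines):
    """Replace each run of '|'-prefixed lines with a placeholder reference."""
    out, tables, current = [], {}, []

    def flush():
        # Close the table being collected, if any, and leave a placeholder
        # at the position the table occupied in the document.
        if current:
            key = f"[[TABLE:{len(tables) + 1}]]"
            tables[key] = [
                [cell.strip() for cell in row.strip("|").split("|")]
                for row in current
            ]
            out.append(key)
            current.clear()

    for line in lines:
        if line.strip().startswith("|"):
            current.append(line.strip())
        else:
            flush()
            out.append(line)
    flush()
    return out, tables


def segment(lines):
    """Split lines into sections, recording each section's path of parent titles."""
    sections, titles = [], {}
    for line in lines:
        match = HEADING.match(line)
        if match:
            label, title = match.group(1), match.group(2)
            titles[label] = title
            parts = label.split(".")
            path = [titles.get(".".join(parts[:i]), "?") for i in range(1, len(parts) + 1)]
            sections.append({"label": label, "path": path, "body": []})
        elif sections and line.strip():
            sections[-1]["body"].append(line)
    return sections


lines = [
    "1 Definitions",
    "Terms used in this agreement.",
    "1.1 Fees",
    "| Item | Amount |",
    "| Setup | 100 |",
    "Fees are due monthly.",
]
cleaned, tables = extract_tables(lines)
sections = segment(cleaned)
print(tables)
print(sections)
```

Because the placeholder stays in the text where the table used to be, a section body still records where the table sat relative to the surrounding prose, while the rows and columns live separately in a machine-readable form.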

Assignees

JPMorgan Chase Bank, N.A.

Inventors

Classifications

  • using context · CPC title

  • Data format conversion from or to a database · CPC title

  • G06F16/243 · Primary

    Natural language query formulation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12298970B2 cover?
A method for providing a domain-specific language model to facilitate natural language data analytics is disclosed. The method includes aggregating documents from various sources, each of the documents including natural language data; ingesting each of the documents to generate structured data sets that are organized according to a contextual hierarchy; determining prompts that provide domain-s…
Who is the assignee on this patent?
JPMorgan Chase Bank, N.A.
What technology area does this patent fall under?
Primary CPC classification G06F16/243. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 13, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).