Method and system for analyzing natural language data by using domain-specific language models

US12298970B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12298970-B2
Application number	US-202318217868-A
Country	US
Kind code	B2
Filing date	Jul 3, 2023
Priority date	Jul 3, 2023
Publication date	May 13, 2025
Grant date	May 13, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for providing a domain-specific language model to facilitate natural language data analytics is disclosed. The method includes aggregating documents from various sources, each of the documents including natural language data; ingesting each of the documents to generate structured data sets that are organized according to a contextual hierarchy; determining prompts that provide domain-specific information for a language model, the domain-specific information including instructions to access the structured data sets; receiving a request via a graphical user interface, the request relating to questions in a natural language format; generating, by using the language model, software codes for the request based on the prompts; and executing each of the software codes to identify results for the request from the structured data sets.
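The final three steps of the abstract (prompt, generated code, execution against structured data) can be pictured with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `STRUCTURED` sample data, the `find_sections` access function, the prompt wording, and `stub_model`, which stands in for a real language model by returning a fixed code string.

```python
from typing import Callable

# Hypothetical structured data set produced by ingestion:
# a section path (contextual hierarchy) mapped to that section's text.
STRUCTURED = {
    ("Agreement", "1 Definitions"): "Borrower means Acme Corp.",
    ("Agreement", "2 Term"): "The term ends on 2030-01-01.",
}


def find_sections(keyword: str) -> list[tuple[tuple[str, ...], str]]:
    """Hypothetical access API that the prompt describes to the model."""
    return [
        (path, text)
        for path, text in STRUCTURED.items()
        if keyword.lower() in text.lower()
    ]


# Hypothetical prompt: domain information plus instructions for data access.
PROMPT = (
    "You answer questions about loan agreements.\n"
    "Available API: find_sections(keyword) -> list of (path, text).\n"
    "Write Python that stores the answer in a variable named result.\n"
)


def stub_model(prompt: str, question: str) -> str:
    """Stand-in for a language model; always returns the same code."""
    return "result = find_sections('term')"


def answer(question: str, model: Callable[[str, str], str]) -> object:
    """Generate code for the question, run it, and return its result."""
    code = model(PROMPT, question)
    namespace = {"find_sections": find_sections}
    exec(code, namespace)  # a real system would sandbox generated code
    return namespace["result"]


hits = answer("When does the term end?", stub_model)
```

The point of the sketch is the division of labour: the model only writes code against a described interface, and the answer itself comes from running that code over the structured data.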

First claim

Opening claim text (preview).

What is claimed is:

1. A method for providing a domain-specific language model to facilitate natural language data analytics, the method being implemented by at least one processor, the method comprising: aggregating, by the at least one processor, a plurality of documents from at least one source, each of the plurality of documents including natural language data; ingesting, by the at least one processor, each of the plurality of documents to generate at least one structured data set that is organized according to a contextual hierarchy, wherein the ingesting of each of the plurality of documents further comprises: attaching, by the at least one processor, at least one tag to each of the plurality of documents based on content of corresponding natural language data, each of the at least one tag including corresponding metadata; formatting, by the at least one processor, at least one data table in each of the plurality of documents to discover at least one corresponding table boundary wherein the formatting of the at least one data table further comprises: associating, by the at least one processor, each of the at least one data table with a corresponding placeholder reference, the placeholder reference representing a spatial relationship between the at least one data table and the plurality of documents; and persisting, by the at least one processor, the placeholder reference within the corresponding plurality of documents in place of the corresponding at least one data table; and segmenting, by the at least one processor, each of the plurality of documents into at least one section by using at least one stylistic indicator and at least one contextual indicator; determining, by the at least one processor, at least one prompt that provides domain-specific information for a language model, the domain-specific information including instructions to access the at least one structured data set; receiving, by the at least one processor, a request via a graphical user interface, the request relating to at least one question in a natural language format; generating, by the at least one processor using the language model, at least one software code for the request based on the at least one prompt; and executing, by the at least one processor, each of the at least one software code to identify at least one result for the request from the at least one structured data set.

2. The method of claim 1, wherein each of the at least one prompt includes at least one from among scenario data that defines a domain-specific scenario, interface data that defines at least one usable application programming interface, and instruction data that orders the language model to generate the at least one software code by using the interface data.

3. The method of claim 1, wherein the at least one result includes a modularization of a previously generated software code into at least one function that answers a variant of the at least one question.

4. The method of claim 1, wherein the corresponding metadata includes supplemental information that is automatically identified and automatically retrieved for each of the plurality of documents, the supplemental information including at least one from among filing information, participant information, agreement information, and date information.

5. The method of claim 1, wherein each of the at least one section includes at least one direct citation to original data in the corresponding plurality of documents; and wherein each of the at least one section further includes at least one section label that is assigned according to a tree hierarchy to preserve full sectional context.

6. The method of claim 1, further comprising: discovering, by the at least one processor, the at least one table boundary by using spatial positioning of table contents, table styles, visual table indicators, and textual table indicators; and organizing, by the at least one processor, at least one row and at least one column into a machine-readable format.

7. The method of claim 1, wherein the language model includes at least one from among a large language model, a deep learning model, a neural network model, a natural language processing model, a machine learning model, a mathematical model, a process model, and a data model.

8. A computing device configured to implement an execution of a method for providing a domain-specific language model to facilitate natural language data analytics, the computing device comprising: a processor; a memory; and a communication interface coupled to each of the processor and the memory, wherein the processor is configured to: aggregate a plurality of documents from at least one source, each of the plurality of documents including natural language data; ingest each of the plurality of documents to generate at least one structured data set that is organized according to a contextual hierarchy, wherein the ingest of each of the plurality of documents, the processor is further configured to: attach at least one tag to each of the plurality of documents based on content of corresponding natural language data, each of the at least one tag including corresponding metadata; format at least one data table in each of the plurality of documents to discover at least one corresponding table boundary wherein, to format the at least one data table, the processor is further configured to: associate each of the at least one data table with a corresponding placeholder reference, the placeholder reference representing a spatial relationship between the at least one data table and the plurality of documents; and persist the placeholder reference within the corresponding plurality of documents in place of the corresponding at least one data table; and segment each of the plurality of documents into at least one section by using at least one stylistic indicator and at least one contextual indicator; determine at least one prompt that provides domain-specific information for a language model, the domain-specific information including instructions to access the at least one structured data set; receive a request via a graphical user interface, the request relating to at least one question in a natural language format; generate, by using the language model, at least one software code for the request based on the at least one prompt; and execute each of the at least one software code to identify at least one result for the request from the at least one structured data set.

9. The computing device of claim 8, wherein each of the at least one prompt includes at least one from among scenario data that defines a domain-specific scenario, interface data that defines at least one usable application programming interface, and instruction data that orders the language model to generate the at least one software code by using the interface data.

10. The computing device of claim 8, wherein the at least one result includes a modularization of a previously generated software code into at least one function that answers a variant of the at least one question.

11. The computing device of claim 8, wherein the corresponding metadata includes supplemental information that is automatically identified and automatically retrieved for each of the plurality of documents, the supplemental information including at least one from among filing information, participant information, agreement information, and date information.

12. The computing device of claim 10, wherein each of the at least one section includes at least one direct citation to original data in the corresponding plurality of documents; and wherein each of the at least one section further includes at l
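Two ingestion steps in claim 1 lend themselves to a small illustration: swapping each table for a placeholder reference that stays at the table's position, and segmenting the text into sections labelled by a tree hierarchy (as in claim 5). The sketch below is only one possible reading, under loud assumptions: tables are recognised by a pipe character (a crude textual indicator), the `[[TABLE:n]]` placeholder format is invented here, and a numbered-heading pattern such as `1.1 Late fees` stands in for the claim's stylistic and contextual indicators. Body lines that start with a number would be misread as headings.

```python
import re


def extract_tables(lines: list[str]) -> tuple[list[str], dict[str, list[list[str]]]]:
    """Replace each run of pipe-delimited lines with one placeholder line."""
    out: list[str] = []
    tables: dict[str, list[list[str]]] = {}
    i = 0
    while i < len(lines):
        if "|" in lines[i]:
            rows: list[list[str]] = []
            # Consume the whole run of table lines to find the table boundary.
            while i < len(lines) and "|" in lines[i]:
                cells = lines[i].strip().strip("|").split("|")
                rows.append([cell.strip() for cell in cells])
                i += 1
            ref = f"[[TABLE:{len(tables) + 1}]]"  # hypothetical placeholder format
            tables[ref] = rows
            out.append(ref)  # the placeholder keeps the table's position in the text
        else:
            out.append(lines[i])
            i += 1
    return out, tables


# Assumed heading style: dotted numbers, whitespace, then a title.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)$")


def segment(lines: list[str]) -> dict[str, list[str]]:
    """Split lines at numbered headings; label each section with its ancestors."""
    sections: dict[str, list[str]] = {}
    titles: dict[str, str] = {}  # heading number -> heading title
    current = None
    for line in lines:
        match = HEADING.match(line)
        if match:
            number, title = match.groups()
            titles[number] = title
            parts = number.split(".")
            ancestors = [".".join(parts[:k]) for k in range(1, len(parts) + 1)]
            current = " > ".join(f"{n} {titles[n]}" for n in ancestors if n in titles)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


doc = [
    "1 Fees",
    "Fees are listed below.",
    "| Item | Amount |",
    "| Setup | 100 |",
    "1.1 Late fees",
    "Late fees accrue monthly.",
]
text, tables = extract_tables(doc)
sections = segment(text)
```

In this toy run the table leaves the prose but its placeholder remains inside section `1 Fees`, and the subsection is labelled `1 Fees > 1.1 Late fees`, so each section carries its full path.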

Assignees

JPMorgan Chase Bank, N.A.

Inventors

Classifications

G06F16/24575 · using context
G06F16/258 · Data format conversion from or to a database
G06F16/243 (Primary) · Natural language query formulation

Patent family

Related publications grouped by family.

Patent family ID: 94175411

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12298970B2 cover?: A method for providing a domain-specific language model to facilitate natural language data analytics is disclosed. The method includes aggregating documents from various sources, each of the documents including natural language data; ingesting each of the documents to generate structured data sets that are organized according to a contextual hierarchy; determining prompts that provide domain-s…
Who is the assignee on this patent?: JPMorgan Chase Bank, N.A.
What technology area does this patent fall under?: Primary CPC classification G06F16/243. Mapped technology areas include Physics.
When was this patent published?: Published May 13, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).


Related patents

Exploiting domain-specific language characteristics for language model pretraining

Automatic adaptive digital content generation for collaborative documents using machine-learning-based digital content processing techniques

Machine learning techniques for identifying logical sections in unstructured data

Responding to user queries by context-based intelligent agents

Systems and methods for contextual ranking of search results
