Data tagging and prompt generation system

US12436973B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12436973-B2
Application number: US-202318383557-A
Country: US
Kind code: B2
Filing date: Oct 25, 2023
Priority date: Oct 25, 2023
Publication date: Oct 7, 2025
Grant date: Oct 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

System, method, and various embodiments for data tagging and prompt generation are described herein. An embodiment operates by receiving input data, identifying metadata, generating one or more statistics based on the input data, calculating a sample size for the input data based on the one or more statistics and extracting a sample of the input data of the sample size. A prompt is generated based on a prompt template, and the prompt is provided to a language model configured to tag the input in accordance with the prompt. The output including tagged input data is received, and a query is executed against the tagged input data.
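The steps in the abstract (statistics, sample size, sample extraction, prompt generation) can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: the square-root sample-size rule, the cap of 20, the `ColumnInput` type, and the template wording are all assumptions introduced here, since the abstract does not say how the sample size is derived or what the template looks like.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ColumnInput:
    name: str           # metadata: the name by which the input data is referred to
    values: list[str]   # the data items to be tagged


# Hypothetical template: an input segment (metadata plus sample) and an
# output segment naming the format the model should answer in.
PROMPT_TEMPLATE = (
    "Input:\n"
    "Name: {name}\n"
    "Sample values: {sample}\n"
    "Output:\n"
    'Return JSON of the form {{"tags": ["..."]}}'
)


def compute_statistics(col: ColumnInput) -> dict:
    """Statistics over the input; the claims require at least the total count."""
    return {"total": len(col.values), "distinct": len(set(col.values))}


def calculate_sample_size(stats: dict, cap: int = 20) -> int:
    """Assumed heuristic: ceil(sqrt(total)), capped, and always below the total."""
    total = stats["total"]
    if total < 2:
        raise ValueError("need at least two items so the sample is a strict subset")
    ceil_sqrt = math.isqrt(total - 1) + 1
    return min(ceil_sqrt, cap, total - 1)


def extract_sample(col: ColumnInput, size: int, seed: int = 0) -> list[str]:
    """Draw `size` items without replacement; seeded so runs are repeatable."""
    return random.Random(seed).sample(col.values, size)


def build_prompt(col: ColumnInput, seed: int = 0) -> str:
    """Fill the template with the metadata and a sample of the input data."""
    size = calculate_sample_size(compute_statistics(col))
    sample = extract_sample(col, size, seed)
    return PROMPT_TEMPLATE.format(name=col.name, sample=", ".join(sample))
```

Sending only a sample keeps the prompt small regardless of how large the input is; the resulting string would then be passed to whichever language model does the tagging.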

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, by one or more processors, input data comprising data to be tagged by a language model; identifying metadata associated with the input data, wherein the metadata comprises a name by which to refer to the input data; generating one or more statistics based on the input data, the one or more statistics comprising a total number of data items in the input data; calculating a sample size for the input data based on the one or more statistics, wherein the sample size is less than the total number of data items in the input data; extracting a sample of the input data in accordance with the sample size, wherein the sample of the input data comprises a subset of the input data; generating a prompt based on a prompt template, the prompt template comprising an input segment comprising the metadata and the sample of the input data, and an output segment identifying a format for an output; providing the prompt to the language model configured to generate one or more tags based on the sample of the input data, and tag the input data with the one or more tags in accordance with the prompt; receiving the output comprising tagged input data which was tagged with one or more tags generated based on the sample of the input data and in accordance with the format, wherein the tagged input data includes a semantic meaning or semantic context of the input data; storing the tagged input data in a database; executing a query against the tagged input data stored in the database; and returning a result of the query.

2. The method of claim 1, wherein the receiving comprises: identifying sensitive data and non-sensitive data from the input data; extracting the sensitive data, wherein only the non-sensitive data is provided to the language model; and tagging the sensitive data independently of the output.

3. The method of claim 1, wherein the language model comprises an artificial intelligence language model configured to perform a variety of tasks including tagging the input data, and wherein the artificial intelligence language model is operating one or more different processors.

4. The method of claim 1, further comprising: receiving a request for additional data, after providing the prompt and prior to receiving the output; extracting a second sample of the input data in accordance with the sample size; and generating a second prompt comprising the second sample; and providing the second prompt including the second sample to the language model.

5. The method of claim 4, wherein the second sample is a same size as the sample size.

6. The method of claim 1, wherein the input data comprises a table from a database, the table comprising a plurality of columns, each column including a plurality of rows.

7. The method of claim 6, wherein at least a subset of the plurality of columns from the table include tags generated by the language model.

8. A system comprising: a memory; and at least one processor coupled to the memory and configured to perform operations comprising: receiving input data comprising data to be tagged by a language model; identifying metadata associated with the input data, wherein the metadata comprises a name by which to refer to the input data; generating one or more statistics based on the input data, the one or more statistics comprising a total number of data items in the input data; calculating a sample size for the input data based on the one or more statistics, wherein the sample size is less than the total number of data items in the input data; extracting a sample of the input data in accordance with the sample size, wherein the sample of the input data comprises a subset of the input data; generating a prompt based on a prompt template, the prompt template comprising an input segment comprising the metadata and the sample of the input data, and an output segment identifying a format for an output; providing the prompt to the language model configured to generate one or more tags based on the sample of the input data, and tag the input data with the one or more tags in accordance with the prompt; receiving the output comprising tagged input data which was tagged with one or more tags generated based on the sample of the input data and in accordance with the format, wherein the tagged input data includes a semantic meaning or semantic context of the input data; storing the tagged input data in a database; executing a query against the tagged input data stored in the database; and returning a result of the query.

9. The system of claim 8, wherein the receiving comprises: identifying sensitive data and non-sensitive data from the input data; extracting the sensitive data, wherein only the non-sensitive data is provided to the language model; and tagging the sensitive data independently of the output.

10. The system of claim 8, wherein the language model comprises an artificial intelligence language model configured to perform a variety of tasks including tagging the input data, and wherein the artificial intelligence language model is operating one or more different processors.

11. The system of claim 8, the operations further comprising: receiving a request for additional data, after providing the prompt and prior to receiving the output; extracting a second sample of the input data in accordance with the sample size; and generating a second prompt comprising the second sample; and providing the second prompt including the second sample to the language model.

12. The system of claim 11, wherein the second sample is a same size as the sample size.

13. The system of claim 8, wherein the input data comprises a table from a database, the table comprising a plurality of columns, each column including a plurality of rows.

14. The system of claim 13, wherein at least a subset of the plurality of columns from the table include tags generated by the language model.

15. A non-transitory computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: receiving input data comprising data to be tagged by a language model; identifying metadata associated with the input data, wherein the metadata comprises a name by which to refer to the input data; generating one or more statistics based on the input data, the one or more statistics comprising a total number of data items in the input data; calculating a sample size for the input data based on the one or more statistics, wherein the sample size is less than the total number of data items in the input data; extracting a sample of the input data in accordance with the sample size, wherein the sample of the input data comprises a subset of the input data; generating a prompt based on a prompt template, the prompt template comprising an input segment comprising the metadata and the sample of the input data, and an output segment identifying a format for an output; providing the prompt to the language model configured to generate one or more tags based on the sample of the input data, and tag the input data with the one or more tags in accordance with the prompt; receiving the output comprising tagged input data which was tagged with one or more tags generated based on the sample of the input data and in accordance with the format, wherein the tagged input data includes a semantic meaning or semantic context of the input data; storing the tagged input data in a database; executing a query against the tagged input data stored in the
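
The later claim elements (keeping sensitive data away from the language model, storing the tagged data, and querying it) can be illustrated with a small, hedged Python sketch. Nothing here comes from the patent text beyond the step order: the name-based sensitivity rule, the `fake_language_model` stand-in, the `column_tags` table layout, and the use of SQLite are all assumptions chosen to keep the example self-contained.

```python
import sqlite3

# Hypothetical rule: a column counts as sensitive if its name is listed here.
SENSITIVE_NAMES = {"ssn", "salary", "email"}


def split_by_sensitivity(columns: dict) -> tuple[dict, dict]:
    """Return (sensitive, non_sensitive) mappings of column name -> values."""
    sensitive = {k: v for k, v in columns.items() if k.lower() in SENSITIVE_NAMES}
    non_sensitive = {k: v for k, v in columns.items() if k.lower() not in SENSITIVE_NAMES}
    return sensitive, non_sensitive


def fake_language_model(column_name: str) -> list[str]:
    """Stand-in for the real model call; returns canned tags for the demo."""
    canned = {"city": ["location", "geography"], "product": ["catalog item"]}
    return canned.get(column_name, ["untagged"])


def store_tags(conn: sqlite3.Connection, tags_by_column: dict) -> None:
    """Persist one row per (column, tag) pair."""
    conn.execute("CREATE TABLE IF NOT EXISTS column_tags (column_name TEXT, tag TEXT)")
    rows = [(col, tag) for col, tags in tags_by_column.items() for tag in tags]
    conn.executemany("INSERT INTO column_tags VALUES (?, ?)", rows)
    conn.commit()


def columns_with_tag(conn: sqlite3.Connection, tag: str) -> list[str]:
    """Query the stored tags and return the matching column names."""
    cur = conn.execute(
        "SELECT column_name FROM column_tags WHERE tag = ? ORDER BY column_name",
        (tag,),
    )
    return [row[0] for row in cur.fetchall()]


def tag_table(conn: sqlite3.Connection, columns: dict) -> dict:
    """Tag non-sensitive columns via the model, sensitive ones locally, then store."""
    sensitive, non_sensitive = split_by_sensitivity(columns)
    tags = {name: fake_language_model(name) for name in non_sensitive}
    # Sensitive columns never reach the model; they are tagged independently.
    tags.update({name: ["sensitive"] for name in sensitive})
    store_tags(conn, tags)
    return tags
```

With an in-memory database (`sqlite3.connect(":memory:")`) and a table such as `{"city": [...], "ssn": [...]}`, calling `tag_table` and then `columns_with_tag(conn, "location")` returns the columns whose stored tags include that value, which mirrors the "execute a query against the tagged input data" step.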

Assignees

Inventors

Classifications

  • Approximate or statistical queries · CPC title

  • G06F16/285 (Primary) · Clustering or classification · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12436973B2 cover?
System, method, and various embodiments for data tagging and prompt generation are described herein. An embodiment operates by receiving input data, identifying metadata, generating one or more statistics based on the input data, calculating a sample size for the input data based on the one or more statistics and extracting a sample of the input data of the sample size. A prompt is generated ba…
Who is the assignee on this patent?
SAP SE
What technology area does this patent fall under?
Primary CPC classification G06F16/285. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 7, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).