Analytic system for machine learning prediction model selection
US-2019258904-A1 · Aug 22, 2019 · US
US12436973B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12436973-B2 |
| Application number | US-202318383557-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 25, 2023 |
| Priority date | Oct 25, 2023 |
| Publication date | Oct 7, 2025 |
| Grant date | Oct 7, 2025 |
Official abstract text for this publication.
System, method, and various embodiments for data tagging and prompt generation are described herein. An embodiment operates by receiving input data, identifying metadata, generating one or more statistics based on the input data, calculating a sample size for the input data based on the one or more statistics and extracting a sample of the input data of the sample size. A prompt is generated based on a prompt template, and the prompt is provided to a language model configured to tag the input in accordance with the prompt. The output including tagged input data is received, and a query is executed against the tagged input data.
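The sampling and prompt-generation steps in the abstract can be sketched as below. This is only an illustration under stated assumptions: the publication does not disclose a sample-size formula here, so the "one item in ten, capped at 50" rule is a placeholder, and the names `Stats`, `sample_size`, `PROMPT_TEMPLATE`, and `build_prompt` are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Stats:
    """Statistics generated from the input data (only the total count here)."""
    total_items: int


def sample_size(stats, one_in=10, cap=50):
    """Assumed heuristic, not taken from the patent: roughly one item in
    `one_in`, capped at `cap`, and always strictly below the total count."""
    if stats.total_items < 2:
        raise ValueError("need at least two items to draw a strict subset")
    size = max(1, math.ceil(stats.total_items / one_in))
    return min(size, cap, stats.total_items - 1)


# Hypothetical template with an input segment (metadata name plus sample)
# and an output segment that states the expected format.
PROMPT_TEMPLATE = (
    "Input\n"
    "Name: {name}\n"
    "Sample values:\n{sample}\n\n"
    "Output\n"
    "Return JSON of the form {output_format}\n"
)


def build_prompt(name, items, seed=0):
    """Compute statistics, size the sample, draw it, and fill the template."""
    stats = Stats(total_items=len(items))
    k = sample_size(stats)
    # A seeded generator keeps the illustration reproducible.
    sample = random.Random(seed).sample(items, k)
    prompt = PROMPT_TEMPLATE.format(
        name=name,
        sample="\n".join(f"- {value}" for value in sample),
        output_format='{"tags": ["..."]}',
    )
    return prompt, sample
```

Calling `build_prompt("customer_city", values)` on a list of 200 values would place 20 of them in the prompt. Sending the prompt to a language model is left out because the abstract names no specific model or API.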
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, by one or more processors, input data comprising data to be tagged by a language model; identifying metadata associated with the input data, wherein the metadata comprises a name by which to refer to the input data; generating one or more statistics based on the input data, the one or more statistics comprising a total number of data items in the input data; calculating a sample size for the input data based on the one or more statistics, wherein the sample size is less than the total number of data items in the input data; extracting a sample of the input data in accordance with the sample size, wherein the sample of the input data comprises a subset of the input data; generating a prompt based on a prompt template, the prompt template comprising an input segment comprising the metadata and the sample of the input data, and an output segment identifying a format for an output; providing the prompt to the language model configured to generate one or more tags based on the sample of the input data, and tag the input data with the one or more tags in accordance with the prompt; receiving the output comprising tagged input data which was tagged with one or more tags generated based on the sample of the input data and in accordance with the format, wherein the tagged input data includes a semantic meaning or semantic context of the input data; storing the tagged input data in a database; executing a query against the tagged input data stored in the database; and returning a result of the query.
2. The method of claim 1, wherein the receiving comprises: identifying sensitive data and non-sensitive data from the input data; extracting the sensitive data, wherein only the non-sensitive data is provided to the language model; and tagging the sensitive data independently of the output.
3. The method of claim 1, wherein the language model comprises an artificial intelligence language model configured to perform a variety of tasks including tagging the input data, and wherein the artificial intelligence language model is operating one or more different processors.
4. The method of claim 1, further comprising: receiving a request for additional data, after providing the prompt and prior to receiving the output; extracting a second sample of the input data in accordance with the sample size; and generating a second prompt comprising the second sample; and providing the second prompt including the second sample to the language model.
5. The method of claim 4, wherein the second sample is a same size as the sample size.
6. The method of claim 1, wherein the input data comprises a table from a database, the table comprising a plurality of columns, each column including a plurality of rows.
7. The method of claim 6, wherein at least a subset of the plurality of columns from the table include tags generated by the language model.
8. A system comprising: a memory; and at least one processor coupled to the memory and configured to perform operations comprising: receiving input data comprising data to be tagged by a language model; identifying metadata associated with the input data, wherein the metadata comprises a name by which to refer to the input data; generating one or more statistics based on the input data, the one or more statistics comprising a total number of data items in the input data; calculating a sample size for the input data based on the one or more statistics, wherein the sample size is less than the total number of data items in the input data; extracting a sample of the input data in accordance with the sample size, wherein the sample of the input data comprises a subset of the input data; generating a prompt based on a prompt template, the prompt template comprising an input segment comprising the metadata and the sample of the input data, and an output segment identifying a format for an output; providing the prompt to the language model configured to generate one or more tags based on the sample of the input data, and tag the input data with the one or more tags in accordance with the prompt; receiving the output comprising tagged input data which was tagged with one or more tags generated based on the sample of the input data and in accordance with the format, wherein the tagged input data includes a semantic meaning or semantic context of the input data; storing the tagged input data in a database; executing a query against the tagged input data stored in the database; and returning a result of the query.
9. The system of claim 8, wherein the receiving comprises: identifying sensitive data and non-sensitive data from the input data; extracting the sensitive data, wherein only the non-sensitive data is provided to the language model; and tagging the sensitive data independently of the output.
10. The system of claim 8, wherein the language model comprises an artificial intelligence language model configured to perform a variety of tasks including tagging the input data, and wherein the artificial intelligence language model is operating one or more different processors.
11. The system of claim 8, the operations further comprising: receiving a request for additional data, after providing the prompt and prior to receiving the output; extracting a second sample of the input data in accordance with the sample size; and generating a second prompt comprising the second sample; and providing the second prompt including the second sample to the language model.
12. The system of claim 11, wherein the second sample is a same size as the sample size.
13. The system of claim 8, wherein the input data comprises a table from a database, the table comprising a plurality of columns, each column including a plurality of rows.
14. The system of claim 13, wherein at least a subset of the plurality of columns from the table include tags generated by the language model.
15. A non-transitory computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: receiving input data comprising data to be tagged by a language model; identifying metadata associated with the input data, wherein the metadata comprises a name by which to refer to the input data; generating one or more statistics based on the input data, the one or more statistics comprising a total number of data items in the input data; calculating a sample size for the input data based on the one or more statistics, wherein the sample size is less than the total number of data items in the input data; extracting a sample of the input data in accordance with the sample size, wherein the sample of the input data comprises a subset of the input data; generating a prompt based on a prompt template, the prompt template comprising an input segment comprising the metadata and the sample of the input data, and an output segment identifying a format for an output; providing the prompt to the language model configured to generate one or more tags based on the sample of the input data, and tag the input data with the one or more tags in accordance with the prompt; receiving the output comprising tagged input data which was tagged with one or more tags generated based on the sample of the input data and in accordance with the format, wherein the tagged input data includes a semantic meaning or semantic context of the input data; storing the tagged input data in a database; executing a query against the tagged input data stored in the
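As a rough illustration of two parts of the claims, the sensitive-data split of claims 2 and 9 and the store-then-query steps that close the independent claims, the following sketch uses Python's standard `sqlite3` module. Everything specific in it is an assumption: the claims do not say how sensitive data is detected (an email-shaped regular expression stands in here), and the `tagged` table layout and the function names are hypothetical.

```python
import re
import sqlite3

# Assumed detector: treat email-shaped values as sensitive. The claims do not
# specify how sensitive data is identified.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def split_sensitive(items):
    """Separate sensitive values so that only the rest reaches the model."""
    sensitive = [v for v in items if EMAIL.fullmatch(v)]
    non_sensitive = [v for v in items if not EMAIL.fullmatch(v)]
    return sensitive, non_sensitive


def store_tagged(conn, tagged):
    """Store (value, tag) pairs in a hypothetical `tagged` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS tagged (value TEXT, tag TEXT)")
    conn.executemany("INSERT INTO tagged (value, tag) VALUES (?, ?)", tagged)
    conn.commit()


def query_by_tag(conn, tag):
    """Execute a query against the stored tagged data and return the result."""
    rows = conn.execute(
        "SELECT value FROM tagged WHERE tag = ? ORDER BY value", (tag,)
    )
    return [row[0] for row in rows]
```

In use, the non-sensitive values would be tagged by the language model, the sensitive values would be tagged locally and independently of the model output, and both sets of pairs would then be passed to `store_tagged` before `query_by_tag` is run.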
Approximate or statistical queries · CPC title
Clustering or classification · CPC title