Large language model privacy preservation system

US12455980B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12455980-B2
Application number  US-202318466049-A
Country             US
Kind code           B2
Filing date         Sep 13, 2023
Priority date       Sep 13, 2023
Publication date    Oct 28, 2025
Grant date          Oct 28, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Computer-implemented methods for a large language model privacy preservation system. Aspects include receiving prompt data from a user device. Aspects further include generating pre-processed prompt data using the prompt data from the user device. Aspects also include identifying a category for the pre-processed prompt data using topic modeling. Aspects include generating normalized prompt data using the pre-processed prompt data. Aspects further include storing the category and the normalized prompt data.
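The stages listed in the abstract can be read as a simple pipeline that ends with storing a category together with the normalized prompt. The Python sketch below is illustrative only: `PromptRecord`, `CategoryStore`, and `process_prompt` are hypothetical names that do not appear in the patent, the store is a plain in-memory dictionary, and the three stage functions are passed in as placeholders rather than implemented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PromptRecord:
    """Hypothetical data object: holds only the category and the normalized prompt."""
    category: str
    normalized_prompt: str


class CategoryStore:
    """Hypothetical in-memory datastore indexed by category."""

    def __init__(self) -> None:
        self._by_category: dict[str, list[PromptRecord]] = defaultdict(list)

    def add(self, record: PromptRecord) -> None:
        self._by_category[record.category].append(record)

    def get(self, category: str) -> list[PromptRecord]:
        # Return a copy so callers cannot mutate the store's internal list.
        return list(self._by_category.get(category, []))


def process_prompt(
    prompt: str,
    preprocess: Callable[[str], str],
    categorize: Callable[[str], str],
    normalize: Callable[[str], str],
    store: CategoryStore,
) -> PromptRecord:
    """Run the stages in the order the abstract lists them."""
    pre = preprocess(prompt)          # generate pre-processed prompt data
    category = categorize(pre)        # identify a category (stand-in for topic modeling)
    normalized = normalize(pre)       # generate normalized prompt data
    record = PromptRecord(category, normalized)
    store.add(record)                 # store the category and normalized prompt only
    return record


# Usage with trivial stand-in stages (assumptions, not the patented steps).
store = CategoryStore()
record = process_prompt(
    "How do I reset my router?",
    preprocess=lambda s: s.lower().strip(),
    categorize=lambda s: "networking" if "router" in s else "other",
    normalize=lambda s: " ".join(s.split()),
    store=store,
)
print(record)
```

Passing the stages in as callables keeps the sketch neutral about how each one is implemented; the original prompt string is never written to the store, only the derived record.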

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising:
    receiving prompt data from a user device for processing by a large language model;
    generating pre-processed prompt data using the prompt data from the user device, wherein generating the pre-processed prompt data comprises detecting personally identifiable information (PII) in the prompt data, substituting each detected PII instance with a placeholder token, and removing sensitive information and irrelevant information including punctuation and stop words;
    deleting the original prompt data from volatile memory upon completion of the substitution of detected PII with placeholder tokens, such that only the redacted prompt data remains in memory;
    identifying a category for the pre-processed prompt data using topic modeling, wherein the topic modeling identifies topics based on patterns of word and phrase clusters and frequencies of words in the pre-processed prompt data;
    generating normalized prompt data using the pre-processed prompt data, wherein the normalized prompt data retains key elements of the prompt data while preserving semantic essence without personally identifiable information; and
    storing the category and the normalized prompt data by generating a data object containing both the category and the normalized prompt data, and storing only the data object in a category-indexed datastore for use in large language model applications including personalized content recommendations, quality control, refined model training, resource optimization, and research in natural language processing, wherein the original prompt data is not retained in any storage after deletion from volatile memory.

2. The computer-implemented method of claim 1, wherein generating the pre-processed prompt data further comprises removing personally identifiable information from the prompt data, stemming the prompt data, and lemmatizing the prompt data.

3. The computer-implemented method of claim 2, further comprising tokenizing the prompt data; removing stop words from the prompt data; stemming the prompt data; and lemmatizing the prompt data.

4. The computer-implemented method of claim 1, wherein identifying the category for the pre-processed prompt data further comprises applying a Latent Dirichlet Allocation model to the pre-processed prompt data.

5. The computer-implemented method of claim 1, wherein identifying the category for the pre-processed prompt data further comprises applying a Non-Negative Matrix Factorization model to the pre-processed prompt data.

6. The computer-implemented method of claim 1, further comprising: identifying a second category for the pre-processed prompt data using the topic modeling; and storing the second category and the normalized prompt data.

7. The computer-implemented method of claim 1, wherein identifying the category for the pre-processed prompt data using topic modeling further comprises: generating a topic for the pre-processed prompt data using topic modeling; and identifying the category corresponding to the topic.

8. A system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations comprising:
    receiving prompt data from a user device for processing by a large language model;
    generating pre-processed prompt data using the prompt data from the user device, wherein generating the pre-processed prompt data comprises detecting personally identifiable information (PII) in the prompt data, substituting each detected PII instance with a placeholder token, and removing sensitive information and irrelevant information including punctuation and stop words;
    deleting the original prompt data from volatile memory upon completion of the substitution of detected PII with placeholder tokens, such that only the redacted prompt data remains in memory;
    identifying a category for the pre-processed prompt data using topic modeling, wherein the topic modeling identifies topics based on patterns of word and phrase clusters and frequencies of words in the pre-processed prompt data;
    generating normalized prompt data using the pre-processed prompt data, wherein the normalized prompt data retains key elements of the prompt data while preserving semantic essence without personally identifiable information; and
    storing the category and the normalized prompt data by generating a data object containing both the category and the normalized prompt data, and storing only the data object in a category-indexed datastore for use in large language model applications including personalized content recommendations, quality control, refined model training, resource optimization, and research in natural language processing, wherein the original prompt data is not retained in any storage after deletion from volatile memory.

9. The system of claim 8, wherein the operations to generate the pre-processed prompt data further comprise: removing personally identifiable information from the prompt data, stemming the prompt data, and lemmatizing the prompt data.

10. The system of claim 9, wherein the operations further comprise: tokenizing the prompt data; removing stop words from the prompt data; stemming the prompt data; and lemmatizing the prompt data.

11. The system of claim 8, wherein the operations to identify the category for the pre-processed prompt data further comprise applying a Latent Dirichlet Allocation model to the pre-processed prompt data.

12. The system of claim 8, wherein the operations to identify the category for the pre-processed prompt data further comprise applying a Non-Negative Matrix Factorization model to the pre-processed prompt data.

13. The system of claim 8, wherein the operations further comprise: identifying a second category for the pre-processed prompt data using the topic modeling; and storing the second category and the normalized prompt data.

14. The system of claim 8, wherein to identify the category for the pre-processed prompt data using topic modeling, the operations further comprise: generating a topic for the pre-processed prompt data using topic modeling; and identifying the category corresponding to the topic.

15. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform operations comprising:
    receiving prompt data from a user device for processing by a large language model;
    generating pre-processed prompt data using the prompt data from the user device, wherein generating the pre-processed prompt data comprises detecting personally identifiable information (PII) in the prompt data, substituting each detected PII instance with a placeholder token, and removing sensitive information and irrelevant information including punctuation and stop words;
    deleting the original prompt data from volatile memory upon completion of the substitution of detected PII with placeholder tokens, such that only the redacted prompt data remains in memory;
    identifying a category for the pre-processed prompt data using topic modeling, wherein the topic modeling identifies topics based on patterns of word and phrase clusters and frequencies of words in the pre-processed prompt data;
    generating normalized prompt data using the pre-processed prompt data, wherein the normalized prompt data retains key elements of the prompt data while preserving semantic essence without personally identifiab
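The claims describe topic modeling driven by word frequencies and name Non-Negative Matrix Factorization (NMF) as one option. As a rough illustration, the sketch below factors a small document-by-term count matrix with the standard Lee-Seung multiplicative updates, written in plain Python. It is not the patent's implementation: the function names (`term_matrix`, `nmf`, `top_terms`, `dominant_topic`), the example prompts, the iteration count, and the choice of two topics are all assumptions made for the example, and a production system would more likely use an established library.

```python
import random

Matrix = list[list[float]]


def term_matrix(docs: list[list[str]]) -> tuple[list[str], Matrix]:
    """Build a documents-by-terms count matrix from tokenized prompts."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    rows = [[0.0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for tok in doc:
            rows[i][index[tok]] += 1.0
    return vocab, rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def nmf(v: Matrix, k: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> tuple[Matrix, Matrix]:
    """Approximate V (docs x terms) as W (docs x k) times H (k x terms),
    keeping every entry non-negative (multiplicative updates, Frobenius loss)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def top_terms(h_row: list[float], vocab: list[str], n: int = 3) -> list[str]:
    """Terms with the largest weight in one topic row of H."""
    ranked = sorted(range(len(vocab)), key=lambda j: h_row[j], reverse=True)
    return [vocab[j] for j in ranked[:n]]


def dominant_topic(w_row: list[float]) -> int:
    """Index of the topic with the largest weight for one document."""
    return max(range(len(w_row)), key=lambda a: w_row[a])


# Hypothetical pre-processed prompts (already redacted and tokenized).
docs = [
    ["router", "wifi", "reset", "router"],
    ["wifi", "password", "router"],
    ["bread", "oven", "flour", "bread"],
    ["oven", "bake", "flour"],
]
vocab, v = term_matrix(docs)
w, h = nmf(v, k=2)
for topic, row in enumerate(h):
    print(topic, top_terms(row, vocab))
print([dominant_topic(row) for row in w])
```

Each row of `H` weights the vocabulary for one topic, and each row of `W` weights the topics for one prompt. Mapping a dominant topic index to a named category, as the dependent claims on topic-to-category lookup describe, would need a separate table that this sketch does not attempt to define.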

Assignees

IBM

Inventors

Classifications

  • Syntactic pre-processing, e.g. stopword elimination, stemming · CPC title

  • Protecting personal data, e.g. for financial or medical purposes · CPC title
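The two classifications above correspond to the pre-processing described in the claims: replacing detected PII with placeholder tokens, then dropping punctuation and stop words and stemming what remains. The sketch below shows one minimal way this could look in Python. Everything specific in it is an assumption made for illustration: the three regular expressions, the `<LABEL>` placeholder format, the short stop-word set, and the crude suffix-stripping `toy_stem` (which is not a real stemming algorithm such as Porter's). Lemmatization and the deletion of the original prompt from memory are not shown.

```python
import re

# Hypothetical patterns; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

# Tiny illustrative stop-word set, not a standard list.
STOP_WORDS = {"a", "an", "and", "the", "is", "at", "to", "my", "me", "or", "of", "on", "in"}

# Suffixes tried in order by the toy stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

# Matches either a placeholder token or a plain word; punctuation is skipped.
TOKEN = re.compile(r"<[A-Z]+>|[A-Za-z0-9']+")


def redact(text: str) -> str:
    """Replace each detected PII instance with a placeholder such as <EMAIL>."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def toy_stem(word: str) -> str:
    """Strip one common suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(prompt: str) -> list[str]:
    """Redact PII, drop punctuation and stop words, and stem the remaining words."""
    tokens = []
    for tok in TOKEN.findall(redact(prompt)):
        if tok.startswith("<"):
            tokens.append(tok)  # keep placeholder tokens untouched
            continue
        word = tok.lower()
        if word not in STOP_WORDS:
            tokens.append(toy_stem(word))
    return tokens


print(preprocess(
    "Email me at jane.doe@example.com or call 555-123-4567 about the failing resets."
))
# → ['email', '<EMAIL>', 'call', '<PHONE>', 'about', 'fail', 'reset']
```

Redaction runs before tokenization so that the pieces of an email address or phone number are never split into separate tokens that could survive the later steps.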

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12455980B2 cover?
Computer-implemented methods for a large language model privacy preservation system. Aspects include receiving prompt data from a user device. Aspects further include generating pre-processed prompt data using the prompt data from the user device. Aspects also include identifying a category for the pre-processed prompt data using topic modeling. Aspects include generating normalized prompt data…

Who is the assignee on this patent?
IBM

What technology area does this patent fall under?
Primary CPC classification G06F21/6245. Mapped technology areas include Physics.

When was this patent published?
Publication date Oct 28, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).