Training language models and preserving privacy

US12412038B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12412038-B2
Application number	US-202318173199-A
Country	US
Kind code	B2
Filing date	Feb 23, 2023
Priority date	Oct 5, 2022
Publication date	Sep 9, 2025
Grant date	Sep 9, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In implementations of systems for training language models and preserving privacy, a computing device implements a privacy system to predict a next word after a last word in a sequence of words by processing input data using a machine learning model trained on training data to predict next words after last words in sequences of words. The training data describes a corpus of text associated with clients and including sensitive samples and non-sensitive samples. The machine learning model is trained by sampling a client of the clients and using a subset of the sensitive samples associated with the client and a subset of the non-sensitive samples associated with the client to update parameters of the machine learning model. The privacy system generates an indication of the next word after the last word in the sequence of words for display in a user interface.
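The client-level sampling described in the abstract can be illustrated with a minimal Python sketch. The corpus layout (a dict mapping client names to samples that carry a `sensitive` flag), the function name `sample_client_batch`, and the subset sizes are all hypothetical and are not taken from the patent. The sketch covers only the selection of one training batch, not the model update itself.

```python
import random

def sample_client_batch(corpus, n_sensitive, n_non_sensitive, rng):
    """Pick one client, then a subset of its sensitive and non-sensitive samples.

    `corpus` is a hypothetical layout: {client_name: [{"text": str, "sensitive": bool}, ...]}.
    """
    client = rng.choice(sorted(corpus))  # sample a client of the clients
    samples = corpus[client]
    sensitive = [s for s in samples if s["sensitive"]]
    non_sensitive = [s for s in samples if not s["sensitive"]]
    return {
        "client": client,
        # Cap each subset size at the number of samples the client actually has.
        "sensitive": rng.sample(sensitive, min(n_sensitive, len(sensitive))),
        "non_sensitive": rng.sample(non_sensitive, min(n_non_sensitive, len(non_sensitive))),
    }

# Toy corpus; the texts and labels are made up for illustration.
corpus = {
    "client_a": [
        {"text": "Call Alice at the office", "sensitive": True},
        {"text": "The meeting is at noon", "sensitive": False},
        {"text": "Send the report to Bob", "sensitive": True},
    ],
    "client_b": [
        {"text": "Lunch is ready", "sensitive": False},
        {"text": "Carol signed the contract", "sensitive": True},
    ],
}

batch = sample_client_batch(corpus, 1, 1, random.Random(0))
```

In a real training loop the returned batch would feed a gradient computation; here it only shows that both subsets come from the single sampled client.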

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: receiving, by a processing device, input data describing a sequence of words ending with a last word; predicting, by the processing device, a next word after the last word in the sequence of words by processing the input data using a machine learning model trained on injected Gaussian noise and training data to update parameters of the machine learning model to predict next words after last words in sequences of words, the training data describing a corpus of text associated with clients and including sensitive samples and non-sensitive samples taken from databases that are client-content adjacent as differing in that a client and a sensitive entity are present in one of the client-content adjacent databases and are not present in another one of the client-content adjacent databases; and generating, by the processing device, an indication of the next word after the last word in the sequence of words for display in a user interface.
2. The method as described in claim 1, wherein the machine learning model includes at least one of a Long Short Term Memory model, a Bidirectional Encoder Representations from Transformers model, or a Generative Pretrained Transformer 2 model.
3. The method as described in claim 1, wherein the sensitive samples and the non-sensitive samples are identified by processing the corpus of text using a named entity recognition model.
4. The method as described in claim 3, wherein the non-sensitive samples include a sensitive sample from the corpus of text based on an error rate associated with the named entity recognition model.
5. The method as described in claim 3, wherein the sensitive samples include a non-sensitive sample from the corpus of text based on an identification error rate associated with the named entity recognition model.
6. The method as described in claim 1, wherein the sensitive samples and the non-sensitive samples are sentences included in the corpus of text.
7. The method as described in claim 1, wherein the sensitive samples and the non-sensitive samples are paragraphs included in the corpus of text.
8. A method comprising: forming, by a processing device, client-content adjacent databases that include a client database and a sensitive contents database, the client-content adjacent databases differing in that a client and a sensitive entity are present a corresponding database of the client-content adjacent databases and are not present in another database of the client-content adjacent databases, the forming including: removing samples associated with a client of a plurality of clients from the respective database of the client-content adjacent databases; and removing sensitive samples associated with a particular instance of sensitive content of a plurality of sensitive content regardless of client association from the respective database of the client-content adjacent databases; identifying, by the processing device, a set of clients from the plurality of clients from the client-content adjacent databases; identifying, by the processing device, a set of sensitive samples from the plurality of sensitive content from the client-content adjacent databases; generating training data by applying one or more differential privacy techniques to the samples associated with the set of clients or the set of sensitive samples; and training a machine learning model using the training data by a loss function using an aggregated gradient that is aggregated across the plurality of clients and the plurality of sensitive content, the training including injecting Gaussian noise and updating parameters of the machine learning model.
9. The method as described in claim 8, wherein the sensitive samples and the samples are determined by processing a corpus of text using an additional machine learning model.
10. The method as described in claim 9, wherein the samples include a respective said sensitive sample based on an error rate associated with the additional machine learning model.
11. The method as described in claim 9, wherein the additional machine learning model includes a named entity recognition model.
12. The method as described in claim 8, wherein the machine learning model includes at least one of a Long Short Term Memory model, a Bidirectional Encoder Representations from Transformers model, or a Generative Pretrained Transformer 2 model.
13. The method as described in claim 8, wherein the sensitive samples and the samples are sentences or paragraph included in a corpus of text.
14. A computing device comprising: a processing device; and a computer-readable storage medium storing instructions that, responsive to execution by the processing device, causes the processing device to perform operations including: forming client-content adjacent databases that include a client database and a sensitive contents database, the client-content adjacent databases differing in that a client and a sensitive entity are present a corresponding database of the client-content adjacent databases and are not present in another database of the client-content adjacent databases, the forming including: removing samples associated with a client of a plurality of clients from the respective database of the client-content adjacent databases; and removing sensitive samples associated with a particular instance of sensitive content of a plurality of sensitive content regardless of client association from the respective database of the client-content adjacent databases; identifying a set of clients from the plurality of clients from the client-content adjacent databases; identifying a set of sensitive samples from the plurality of sensitive content from the client-content adjacent databases; generating training data by applying one or more differential privacy techniques to the samples associated with the set of clients or the set of sensitive samples; and training a machine learning model using the training data by a loss function using an aggregated gradient that is aggregated across the plurality of clients and the plurality of sensitive content, the training including injecting Gaussian noise and updating parameters of the machine learning model.
15. The computing device as described in claim 14, wherein the sensitive samples and the samples are determined by processing a corpus of text using an additional machine learning model.
16. The computing device as described in claim 15, wherein the samples include a respective said sensitive sample based on an error rate associated with the additional machine learning model.
17. The computing device as described in claim 15, wherein the additional machine learning model includes a named entity recognition model.
18. The computing device as described in claim 14, wherein the machine learning model includes at least one of a Long Short Term Memory model, a Bidirectional Encoder Representations from Transformers model, or a Generative Pretrained Transformer 2 model.
19. The computing device as described in claim 14, wherein the sensitive samples and the samples are sentences or paragraph included in a corpus of text.
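
Two ideas in claims 8 and 14 lend themselves to a short illustration: forming adjacent databases by removing a client's samples or an instance of sensitive content, and training on an aggregated gradient with injected Gaussian noise. The Python sketch below is a rough approximation under stated assumptions and not the claimed method. The row layout (`client`, `text`, `entities`), all function names, and the parameters `max_norm` and `noise_multiplier` are hypothetical. Gradient clipping is borrowed from common differentially private SGD practice; the claims themselves state only that Gaussian noise is injected and that the gradient is aggregated.

```python
import random

def remove_client(db, client):
    """Adjacent database that drops every sample associated with one client."""
    return [row for row in db if row["client"] != client]

def remove_sensitive_entity(db, entity):
    """Adjacent database that drops samples mentioning one sensitive entity,
    regardless of which client the sample belongs to."""
    return [row for row in db if entity not in row["entities"]]

def clip(grad, max_norm):
    """Scale a gradient so its L2 norm is at most max_norm (assumed step)."""
    norm = sum(g * g for g in grad) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def noisy_aggregate(grads, max_norm, noise_multiplier, rng):
    """Sum clipped gradients, add Gaussian noise per coordinate, then average."""
    clipped = [clip(g, max_norm) for g in grads]
    dim = len(clipped[0])
    total = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # assumed noise scale
    return [(t + rng.gauss(0.0, sigma)) / len(grads) for t in total]

# Toy database; clients, texts, and entities are made up for illustration.
db = [
    {"client": "a", "text": "Alice approved the budget", "entities": ["Alice"]},
    {"client": "b", "text": "Forward this to Alice", "entities": ["Alice"]},
    {"client": "b", "text": "The build is green", "entities": []},
]
without_client_a = remove_client(db, "a")
without_alice = remove_sensitive_entity(db, "Alice")

# With noise_multiplier=0.0 the result is just the mean of the clipped gradients.
mean_grad = noisy_aggregate([[3.0, 4.0], [0.0, 0.0]], 1.0, 0.0, random.Random(0))
noisy_grad = noisy_aggregate([[3.0, 4.0], [0.0, 0.0]], 1.0, 1.0, random.Random(0))
```

Setting `noise_multiplier` to zero makes the aggregate deterministic, which is convenient for checking the clipping and averaging logic before noise is enabled.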

Assignees

Adobe Inc

Inventors

Classifications

G06F40/274
Converting codes to words; Guess-ahead of partial word inputs · CPC title
G06F40/295 · Primary
Named entity recognition · CPC title
G06F40/30 · Primary
Semantic analysis · CPC title

Patent family

Related publications grouped by family.

View patent family 91281603

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12412038B2 cover?: In implementations of systems for training language models and preserving privacy, a computing device implements a privacy system to predict a next word after a last word in a sequence of words by processing input data using a machine learning model trained on training data to predict next words after last words in sequences of words. The training data describes a corpus of text associated with…
Who is the assignee on this patent?: Adobe Inc
What technology area does this patent fall under?: Primary CPC classification G06F40/295. Mapped technology areas include Physics.
When was this patent published?: Publication date Sep 9, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Interactive decoding of words from phoneme score distributions

Predictive text system

Secure Translation of Sensitive Content

Sensitive data redaction in memory dump

Modeling personal entities
