Systems and methods for a privacy preserving text representation learning framework
US-2021342546-A1 · Nov 4, 2021 · US
US11907666B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11907666-B2 |
| Application number | US-202117527632-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 16, 2021 |
| Priority date | Nov 16, 2020 |
| Publication date | Feb 20, 2024 |
| Grant date | Feb 20, 2024 |
Official abstract text for this publication.
Various embodiments of a system and associated method for anonymization of text without losing semantic utility of text by extracting a latent embedding representation of content with respect to a given task and by learning an optimal strategy for text embedding manipulation to satisfy both privacy and utility requirements are disclosed herein. In particular, the system balances private attribute obfuscation with retained semantic utility.
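The balance described in the abstract can be pictured as a single score that rises with the utility classifier's confidence and falls with the private attribute classifier's confidence. The sketch below is illustrative only: the linear form, the weight `alpha`, and the name `tradeoff_reward` are assumptions made for this example and are not taken from the patent text.

```python
def tradeoff_reward(utility_conf: float, private_conf: float, alpha: float = 0.5) -> float:
    """Hypothetical reward: grows with utility confidence, shrinks with private confidence.

    alpha is an assumed weight in [0, 1]; larger values favor utility over privacy.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * utility_conf - (1.0 - alpha) * private_conf
```

Under this assumed form, an embedding from which the private attribute is harder to recover scores higher than one with the same utility confidence but a more confident private attribute prediction.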
Opening claim text (preview).
The invention claimed is:
1. A system, comprising: a processor in communication with a memory, the memory including instructions which, when executed, cause the processor to: iteratively apply, by a computer-implemented learning agent, an action to a randomly selected text embedding representation of a textual document based on a current state of the computer-implemented learning agent and an expected return; and iteratively update a reward value for a new state, wherein the reward value is determined based on a private attribute confidence of a private attribute classifier component and a utility confidence of a utility classifier component; wherein the computer-implemented learning agent seeks to apply the action to the randomly selected text embedding such that the reward value is maximized as the private attribute confidence is minimized and the utility confidence is maximized.
2. The system of claim 1, wherein the memory further includes instructions, which, when executed, cause the processor to: encode the textual document into the text embedding representation with respect to a given task.
3. The system of claim 2, wherein the memory further includes instructions, which, when executed, cause the processor to: exchange each word of a plurality of words within the textual document with a corresponding word vector of a plurality of word vectors; read, by a gated recurrent unit of a recurrent neural network, the plurality of word vectors representative of the textual document in a first direction and a second direction to obtain a first hidden state taken from the first direction and a second hidden state taken from the second direction; concatenate the first and second hidden states to generate an initial context vector representative of the textual document.
4. The system of claim 3, wherein the memory further includes instructions, which, when executed, cause the processor to: apply a location-based attention layer to each element in the initial context vector to produce an attention-weighted context vector of the text embedding representation.
5. The system of claim 1, wherein the action comprises manipulating the text embedding representation to obfuscate information within the text embedding representation.
6. The system of claim 1, wherein the memory further includes instructions, which, when executed, cause the processor to: select, by the computer-implemented learning agent, an action given a current state based on a Q-function representative of an expected return based on the current state and the action.
7. The system of claim 6, wherein the memory further includes instructions, which, when executed, cause the processor to: learn a set of parameters of the Q-function using a Deep Q-learning method.
8. The system of claim 1, wherein the memory further includes instructions, which, when executed, cause the processor to: apply a private attribute classifier component to the text embedding representation; and obtain the private attribute confidence corresponding to a confidence of recovered private attributes from the text embedding representation following application of the private attribute classifier component to the text embedding representation; wherein the private attribute classifier component is configured to learn a classifier that accurately identifies private information within the text embedding representation.
9. The system of claim 1, wherein the memory further includes instructions, which, when executed, cause the processor to: apply a utility classifier component to the text embedding representation, wherein the utility classifier component is configured to assess a quality of the semantic meaning of the text embedding representation; and obtain the utility confidence corresponding to a confidence of recovered semantic meaning from the text embedding representation.
10. The system of claim 1, wherein the memory further includes instructions, which, when executed, cause the processor to: obtain a final text embedding representation that maximizes the reward function, wherein the final text embedding anonymizes private attribute information and preserves semantic meaning within the textual document.
11. A method, comprising: iteratively applying, by a computer-implemented learning agent, an action to a randomly selected text embedding representation of a textual document based on a current state of the computer-implemented learning agent and an expected reward; and iteratively updating a reward value for a new state, wherein the reward value is determined based on a private attribute confidence of a private attribute classifier component and a utility confidence of a utility classifier component; wherein the computer-implemented learning agent seeks to apply the action to the randomly selected text embedding such that the reward value is maximized as the private attribute confidence is minimized and the utility confidence is maximized.
12. The method of claim 11, further comprising: encoding the textual document into the text embedding representation with respect to a given task.
13. The method of claim 12, further comprising: exchanging each word of a plurality of words within the textual document with a corresponding word vector of a plurality of word vectors; reading, by a gated recurrent unit of a recurrent neural network, the plurality of word vectors representative of the textual document in a first direction and a second direction to obtain a first hidden state taken from the first direction and a second hidden state taken from the second direction; concatenating the first and second hidden states to generate an initial context vector representative of the textual document.
14. The method of claim 13, further comprising: applying a location-based attention layer to each element in the initial context vector to produce an attention-weighted context vector of the text embedding representation.
15. The method of claim 11, wherein the action comprises manipulating the text embedding representation to obfuscate information within the text embedding representation.
16. The method of claim 11, wherein the computer-implemented learning agent selects an action given a current state based on a Q-function representative of an expected reward based on the current state and the action.
17. The method of claim 16, further comprising: learning a set of parameters of the Q-function using a Deep Q-learning method.
18. The method of claim 11, further comprising: applying a private attribute classifier component to the text embedding representation; and obtaining the private attribute confidence corresponding to a confidence of recovered private attributes from the text embedding representation following application of the private attribute classifier component to the text embedding representation; wherein the private attribute classifier component is configured to learn a classifier that accurately identifies private information within the text embedding representation.
19. The method of claim 11, further comprising: applying a utility classifier component to the text embedding representation, wherein the utility classifier component is configured to assess a quality of the semantic meaning of the text embedding representation; and obtaining the utility confidence corresponding to a confidence of recovered semantic meaning from the text embedding representation.
20. The method of claim 11, further comprising: obtaining a final text em
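Claims 6, 7, 16 and 17 recite selecting an action from a Q-function and learning its parameters with Deep Q-learning. The following is a minimal sketch of that loop, not the patented implementation. It swaps the deep network for a linear Q-function so it runs on the standard library alone. The action offsets in `ACTIONS`, the clipping to [-1, 1], the `toy_reward` stand-in for the two classifier confidences, and all function names and hyperparameters are assumptions introduced for illustration.

```python
import math
import random

# Hypothetical action set: each action shifts one randomly chosen embedding
# element by a fixed offset. These values are not specified in the claims.
ACTIONS = [-0.5, 0.0, 0.5]

def q_values(weights, state):
    """Linear Q-function, one weight vector per action (stand-in for a deep network)."""
    return [sum(w * s for w, s in zip(w_a, state)) for w_a in weights]

def select_action(weights, state, epsilon, rng):
    """Epsilon-greedy selection over the Q-values of the current state."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    q = q_values(weights, state)
    return max(range(len(ACTIONS)), key=q.__getitem__)

def apply_action(state, action, rng):
    """Return a copy of the embedding with one random element shifted and clipped."""
    new_state = list(state)
    i = rng.randrange(len(new_state))
    new_state[i] = max(-1.0, min(1.0, new_state[i] + ACTIONS[action]))
    return new_state

def toy_reward(state):
    """Assumed stand-in for the classifier confidences (not real classifiers)."""
    private_conf = 1.0 / (1.0 + math.exp(-state[0]))  # pretend element 0 leaks the attribute
    utility_conf = 1.0 / (1.0 + math.exp(-state[1]))  # pretend element 1 carries task meaning
    return utility_conf - private_conf

def q_update(weights, state, action, reward, next_state, lr=0.05, gamma=0.9):
    """One temporal-difference step toward reward + gamma * max Q(next_state, .)."""
    target = reward + gamma * max(q_values(weights, next_state))
    td_error = target - q_values(weights, state)[action]
    weights[action] = [w + lr * td_error * s for w, s in zip(weights[action], state)]
    return td_error

def train(episodes=200, steps=10, dim=4, seed=0):
    """Run the iterate-act-reward-update loop on random toy embeddings."""
    rng = random.Random(seed)
    weights = [[0.0] * dim for _ in ACTIONS]
    for _ in range(episodes):
        state = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for _ in range(steps):
            action = select_action(weights, state, epsilon=0.2, rng=rng)
            next_state = apply_action(state, action, rng)
            q_update(weights, state, action, toy_reward(next_state), next_state)
            state = next_state
    return weights
```

Calling `train()` returns one weight vector per action. In a fuller version, the linear map would be replaced by a neural network and `toy_reward` by the confidences of trained private attribute and utility classifiers.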
Semantic analysis · CPC title
Validation; Performance evaluation; Active pattern learning techniques · CPC title
Machine learning · CPC title
using neural networks · CPC title
Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods · CPC title