Privacy preserving document analysis

US11689507B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11689507-B2
Application number: US-201916695636-A
Country: US
Kind code: B2
Filing date: Nov 26, 2019
Priority date: Nov 26, 2019
Publication date: Jun 27, 2023
Grant date: Jun 27, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and techniques for privacy preserving document analysis are described that derive insights pertaining to a digital document without communication of the content of the digital document. To do so, the privacy preserving document analysis techniques described herein capture visual or contextual features of the digital document and create a stamp representation that represents these features without including the content of the digital document. The stamp representation is projected into a stamp embedding space based on a stamp encoding model generated through machine learning techniques capturing feature patterns and interaction in the stamp representations. The stamp encoding model exploits these feature interactions to define similarity of source documents based on location within the stamp embedding space. Accordingly, the techniques described herein can determine a similarity of documents without having access to the documents themselves.
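The idea in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Stamp` fields, the scale constants, and the `encode` function are hypothetical, and `encode` is only a fixed rescaling standing in for the trained stamp encoding model, which the patent describes as learned. The point is that similarity is computed from layout statistics alone, with no text or images of the document involved.

```python
import math
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Stamp:
    """Hypothetical stamp: layout statistics only, no document text or images."""
    page_count: int
    image_count: int
    table_count: int
    mean_font_size: float
    text_area_ratio: float  # fraction of page area covered by text, 0..1


def encode(stamp: Stamp) -> list[float]:
    """Stand-in for the trained stamp encoding model (assumption).

    The real model is learned; this one only rescales each feature so
    that no single feature dominates the distance.
    """
    scales = (100.0, 50.0, 20.0, 72.0, 1.0)  # illustrative constants
    return [value / scale for value, scale in zip(astuple(stamp), scales)]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two embeddings; smaller means more similar."""
    return math.dist(a, b)


# Two invoice-like stamps and one brochure-like stamp (made-up values).
invoice_a = Stamp(2, 1, 3, 10.0, 0.45)
invoice_b = Stamp(2, 1, 4, 10.5, 0.50)
brochure = Stamp(12, 30, 0, 14.0, 0.20)

d_invoices = distance(encode(invoice_a), encode(invoice_b))
d_mixed = distance(encode(invoice_a), encode(brochure))
print(d_invoices < d_mixed)
```

In this sketch the two invoice-like stamps land closer together than the invoice and the brochure, which is the sense in which location in the embedding space stands in for document similarity.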

First claim

Opening claim text (preview).

What is claimed is:

1. In a digital medium environment for privacy preserving document analysis, a method implemented by at least one computing device, the method comprising: populating, by the at least one computing device, a stamp embedding space by processing a plurality of stamp representations with a trained stamp encoding model to create a plurality of stamp embeddings, each respective one of the plurality of stamp representations corresponding to a respective source document and containing information derived from the respective source document without containing text or images of the respective source document, wherein the plurality of stamp embeddings characterizes features of the plurality of stamp representations in a plurality of numerical values; generating, by the at least one computing device, a plurality of clusters within the stamp embedding space based on locations of the plurality of stamp embeddings within the stamp embedding space, wherein the locations of the plurality of stamp embeddings are based on the plurality of numerical values; receiving, by the at least one computing device, an additional stamp representation; projecting, by the at least one computing device, the additional stamp representation into the stamp embedding space by processing the additional stamp representation with the stamp encoding model to create an additional stamp embedding; and comparing, by the at least one computing device, a location of the additional stamp embedding within the stamp embedding space with the plurality of clusters for use in deriving insights pertaining to a document corresponding to the additional stamp representation.

2. The method of claim 1, wherein the receiving, projecting, and comparing is performed for a plurality of additional stamp representations.

3. The method of claim 2, wherein the plurality of stamp representations are associated with a document corpus, the plurality of additional stamp representations are received from a plurality of client devices, and further comprising: determining a first document distribution with respect to the plurality of clusters for the stamp embeddings associated with the plurality of stamp representations; determining a second document distribution with respect to the plurality of clusters for the stamp embeddings associated with the plurality of additional stamp representations; and adjusting the documents included in the document corpus based on the first and second document distributions.

4. The method of claim 3, wherein the adjusting includes identifying an out-of-distribution document of a type and adding at least one document of the type to the document corpus.

5. The method of claim 1, further comprising retrieving, based on the plurality of clusters, at least one stamp representation of the plurality of stamp representations based on a similarity in the stamp embedding space to the additional stamp representation.

6. The method of claim 5, further comprising retrieving the source document corresponding to the at least one stamp representation, and outputting the source document for display in a user interface.

7. The method of claim 1, further comprising determining, based on the plurality of clusters, a probability of retention of a customer associated with the additional stamp representation.

8. The method of claim 1, wherein the stamp embedding space is configured to represent similarity of documents based on user experience, and further comprising predicting a user satisfaction of a user associated with the additional stamp representation based on the plurality of clusters.

9. The method of claim 1, further comprising determining, based on the plurality of clusters, expectations of a user associated with the additional stamp representation, and tracking the expectations over time.

10. A system comprising: a processor; and computer-readable storage media having stored instructions that, responsive to execution by the processor, cause the processor to perform operations including: populating a stamp embedding space by processing a plurality of stamp representations with a trained stamp encoding model to create a plurality of stamp embeddings, each respective one of the plurality of stamp representations corresponding to a respective source document and containing information derived from the respective source document without containing text or images of the respective source document, wherein the plurality of stamp embeddings characterizes features of the plurality of stamp representations in a plurality of numerical values; generating a plurality of clusters within the stamp embedding space based on locations of the plurality of stamp embeddings within the stamp embedding space, wherein the locations of the plurality of stamp embeddings are based on the plurality of numerical values; receiving an additional stamp representation; projecting the additional stamp representation into the stamp embedding space by processing the additional stamp representation with the stamp encoding model to create an additional stamp embedding; and comparing a location of the additional stamp embedding within the stamp embedding space with the plurality of clusters for use in deriving insights pertaining to a document corresponding to the additional stamp representation.

11. The system of claim 10, wherein the receiving, projecting, and comparing is performed for a plurality of additional stamp representations.

12. The system of claim 11, wherein the plurality of stamp representations are associated with a document corpus, the plurality of additional stamp representations are received from a plurality of client devices, and further comprising: determining a first document distribution with respect to the plurality of clusters for the stamp embeddings associated with the plurality of stamp representations; determining a second document distribution with respect to the plurality of clusters for the stamp embeddings associated with the plurality of additional stamp representations; and adjusting the documents included in the document corpus based on the first and second document distributions.

13. The system of claim 12, wherein the adjusting includes identifying an out-of-distribution document of a type and adding at least one document of the type to the document corpus.

14. The system of claim 10, further comprising retrieving, based on the plurality of clusters, at least one stamp representation of the plurality of stamp representations based on a similarity in the stamp embedding space to the additional stamp representation.

15. The system of claim 14, further comprising retrieving the source document corresponding to the at least one stamp representation, and outputting the source document for display in a user interface.

16. The system of claim 10, further comprising determining, based on the plurality of clusters, a probability of retention of a customer associated with the additional stamp representation.

17. The system of claim 10, wherein the stamp embedding space is configured to represent similarity of documents based on user experience, and further comprising predicting a user satisfaction of a user associated with the additional stamp representation based on the plurality of clusters.

18. The system of claim 10, further comprising determining, based on the plurality of clusters, expectations of a user associated with the additional stamp representation, and tracking the expectations for an amount of time.

19. One or more computer-readable stora
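The steps recited in claims 1 and 3 (cluster the corpus embeddings, project an additional stamp, compare it with the clusters, then compare corpus and client distributions) can be sketched in plain Python. This is a sketch under stated assumptions, not the patented implementation: the claims do not name a clustering algorithm, so k-means is swapped in here for illustration, the 2-D embedding values are made up, and the 0.2 gap threshold is arbitrary.

```python
import math
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means, seeded with the first k points so runs are deterministic."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(dim) / len(group) for dim in zip(*group)]
    return centroids


def distribution(points, centroids):
    """Fraction of points assigned to each cluster."""
    counts = Counter(nearest(p, centroids) for p in points)
    return [counts[i] / len(points) for i in range(len(centroids))]


# Hypothetical stamp embeddings for the document corpus (made-up 2-D values).
corpus = [[0.1, 0.1], [0.9, 0.9], [0.2, 0.1], [0.8, 0.9], [0.1, 0.2], [0.15, 0.15]]
centroids = kmeans(corpus, k=2)

# Claim 1: project one additional stamp embedding and compare it with the clusters.
additional = [0.82, 0.88]
assigned_cluster = nearest(additional, centroids)

# Claim 3: compare the corpus distribution with the client-side distribution.
clients = [[0.85, 0.95], [0.9, 0.8], [0.12, 0.1], [0.8, 0.85]]
corpus_dist = distribution(corpus, centroids)
client_dist = distribution(clients, centroids)

# Clusters seen much more often on clients than in the corpus are candidates
# for adding documents of that type to the corpus (threshold is an assumption).
underrepresented = [
    i for i in range(len(centroids)) if client_dist[i] - corpus_dist[i] > 0.2
]
print(assigned_cluster, underrepresented)
```

Only embeddings cross the boundary in this sketch: the server-side code never sees the client documents, just their positions relative to the corpus clusters.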

Assignees

Adobe Inc

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • Market predictions or forecasting for commercial activities · CPC title

  • Inference or reasoning models · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11689507B2 cover?
Systems and techniques for privacy preserving document analysis are described that derive insights pertaining to a digital document without communication of the content of the digital document. To do so, the privacy preserving document analysis techniques described herein capture visual or contextual features of the digital document and create a stamp representation that represents these featu…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06Q30/0202. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 27, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).