Creating and using triplet representations to assess similarity between job description documents
US-2019197482-A1 · Jun 27, 2019 · US
US11689507B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11689507-B2 |
| Application number | US-201916695636-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 26, 2019 |
| Priority date | Nov 26, 2019 |
| Publication date | Jun 27, 2023 |
| Grant date | Jun 27, 2023 |
Abstract
Systems and techniques for privacy preserving document analysis are described that derive insights pertaining to a digital document without communication of the content of the digital document. To do so, the privacy preserving document analysis techniques described herein capture visual or contextual features of the digital document and create a stamp representation that represents these features without including the content of the digital document. The stamp representation is projected into a stamp embedding space based on a stamp encoding model generated through machine learning techniques that capture feature patterns and interactions in the stamp representations. The stamp encoding model exploits these feature interactions to define the similarity of source documents based on their location within the stamp embedding space. Accordingly, the techniques described herein can determine the similarity of documents without having access to the documents themselves.
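The abstract's central idea, judging document similarity from locations in an embedding space rather than from document content, can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the `Stamp` fields are hypothetical numeric layout features (no text or images are carried), and the fixed linear projection `WEIGHTS` is a stand-in for the trained stamp encoding model, which the patent describes as learned through machine learning.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamp:
    """Hypothetical stamp representation: numeric features derived from a
    document's layout, with no text or images copied from it."""
    page_count: float
    image_ratio: float    # fraction of page area covered by images
    table_count: float
    heading_depth: float  # deepest heading level used


# Assumption: a fixed linear map standing in for the trained stamp encoding
# model. Each row produces one coordinate of a 2-D embedding space.
WEIGHTS = [
    [0.1, 2.0, 0.0, 0.5],
    [0.0, -1.0, 0.8, 0.3],
]


def encode(stamp: Stamp) -> list[float]:
    """Project a stamp representation into the embedding space."""
    x = [stamp.page_count, stamp.image_ratio, stamp.table_count, stamp.heading_depth]
    return [sum(w * v for w, v in zip(row, x)) for row in WEIGHTS]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between embedding locations; smaller means more similar."""
    return math.dist(a, b)


report_a = Stamp(page_count=12, image_ratio=0.1, table_count=5, heading_depth=3)
report_b = Stamp(page_count=10, image_ratio=0.15, table_count=4, heading_depth=3)
brochure = Stamp(page_count=2, image_ratio=0.8, table_count=0, heading_depth=1)

print(round(distance(encode(report_a), encode(report_b)), 2))  # 0.86
print(round(distance(encode(report_a), encode(brochure)), 2))  # 5.33
```

The two reports land close together while the image-heavy brochure lands far away, so a party holding only the embedding coordinates can compare the documents without seeing them.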
Claims (preview)
What is claimed is:
1. In a digital medium environment for privacy preserving document analysis, a method implemented by at least one computing device, the method comprising: populating, by the at least one computing device, a stamp embedding space by processing a plurality of stamp representations with a trained stamp encoding model to create a plurality of stamp embeddings, each respective one of the plurality of stamp representations corresponding to a respective source document and containing information derived from the respective source document without containing text or images of the respective source document, wherein the plurality of stamp embeddings characterizes features of the plurality of stamp representations in a plurality of numerical values; generating, by the at least one computing device, a plurality of clusters within the stamp embedding space based on locations of the plurality of stamp embeddings within the stamp embedding space, wherein the locations of the plurality of stamp embeddings are based on the plurality of numerical values; receiving, by the at least one computing device, an additional stamp representation; projecting, by the at least one computing device, the additional stamp representation into the stamp embedding space by processing the additional stamp representation with the stamp encoding model to create an additional stamp embedding; and comparing, by the at least one computing device, a location of the additional stamp embedding within the stamp embedding space with the plurality of clusters for use in deriving insights pertaining to a document corresponding to the additional stamp representation.
2. The method of claim 1, wherein the receiving, projecting, and comparing is performed for a plurality of additional stamp representations.
3. The method of claim 2, wherein the plurality of stamp representations are associated with a document corpus, the plurality of additional stamp representations are received from a plurality of client devices, and further comprising: determining a first document distribution with respect to the plurality of clusters for the stamp embeddings associated with the plurality of stamp representations; determining a second document distribution with respect to the plurality of clusters for the stamp embeddings associated with the plurality of additional stamp representations; and adjusting the documents included in the document corpus based on the first and second document distributions.
4. The method of claim 3, wherein the adjusting includes identifying an out-of-distribution document of a type and adding at least one document of the type to the document corpus.
5. The method of claim 1, further comprising retrieving, based on the plurality of clusters, at least one stamp representation of the plurality of stamp representations based on a similarity in the stamp embedding space to the additional stamp representation.
6. The method of claim 5, further comprising retrieving the source document corresponding to the at least one stamp representation, and outputting the source document for display in a user interface.
7. The method of claim 1, further comprising determining, based on the plurality of clusters, a probability of retention of a customer associated with the additional stamp representation.
8. The method of claim 1, wherein the stamp embedding space is configured to represent similarity of documents based on user experience, and further comprising predicting a user satisfaction of a user associated with the additional stamp representation based on the plurality of clusters.
9. The method of claim 1, further comprising determining, based on the plurality of clusters, expectations of a user associated with the additional stamp representation, and tracking the expectations over time.
10. A system comprising: a processor; and computer-readable storage media having stored instructions that, responsive to execution by the processor, cause the processor to perform operations including: populating a stamp embedding space by processing a plurality of stamp representations with a trained stamp encoding model to create a plurality of stamp embeddings, each respective one of the plurality of stamp representations corresponding to a respective source document and containing information derived from the respective source document without containing text or images of the respective source document, wherein the plurality of stamp embeddings characterizes features of the plurality of stamp representations in a plurality of numerical values; generating a plurality of clusters within the stamp embedding space based on locations of the plurality of stamp embeddings within the stamp embedding space, wherein the locations of the plurality of stamp embeddings are based on the plurality of numerical values; receiving an additional stamp representation; projecting the additional stamp representation into the stamp embedding space by processing the additional stamp representation with the stamp encoding model to create an additional stamp embedding; and comparing a location of the additional stamp embedding within the stamp embedding space with the plurality of clusters for use in deriving insights pertaining to a document corresponding to the additional stamp representation.
11. The system of claim 10, wherein the receiving, projecting, and comparing is performed for a plurality of additional stamp representations.
12. The system of claim 11, wherein the plurality of stamp representations are associated with a document corpus, the plurality of additional stamp representations are received from a plurality of client devices, and further comprising: determining a first document distribution with respect to the plurality of clusters for the stamp embeddings associated with the plurality of stamp representations; determining a second document distribution with respect to the plurality of clusters for the stamp embeddings associated with the plurality of additional stamp representations; and adjusting the documents included in the document corpus based on the first and second document distributions.
13. The system of claim 12, wherein the adjusting includes identifying an out-of-distribution document of a type and adding at least one document of the type to the document corpus.
14. The system of claim 10, further comprising retrieving, based on the plurality of clusters, at least one stamp representation of the plurality of stamp representations based on a similarity in the stamp embedding space to the additional stamp representation.
15. The system of claim 14, further comprising retrieving the source document corresponding to the at least one stamp representation, and outputting the source document for display in a user interface.
16. The system of claim 10, further comprising determining, based on the plurality of clusters, a probability of retention of a customer associated with the additional stamp representation.
17. The system of claim 10, wherein the stamp embedding space is configured to represent similarity of documents based on user experience, and further comprising predicting a user satisfaction of a user associated with the additional stamp representation based on the plurality of clusters.
18. The system of claim 10, further comprising determining, based on the plurality of clusters, expectations of a user associated with the additional stamp representation, and tracking the expectations for an amount of time.
19. One or more computer-readable stora
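Claims 1 and 3 to 5 describe a pipeline: cluster the stamp embeddings of a corpus, project an additional stamp, compare its location with the clusters, retrieve the most similar corpus stamp, and compare per-cluster document distributions between the corpus and client submissions. The sketch below is one possible reading, not the claimed implementation. It assumes the stamp encoding model has already produced 2-D embeddings (the coordinates are made up for illustration) and uses plain k-means, which the claims do not name, as the clustering step. The helper names and the `margin` threshold for flagging clusters where the corpus appears under-represented are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter

Point = tuple[float, ...]


def nearest(p: Point, centroids: list[Point]) -> int:
    """Index of the cluster whose centroid is closest to location p."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points: list[Point], k: int, iters: int = 20) -> list[Point]:
    """Plain k-means; the first k points are the (deterministic) initial centroids."""
    centroids = list(points[:k])
    for _ in range(iters):
        groups: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if the group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def retrieve_similar(query: Point, corpus: list[Point], centroids: list[Point]) -> int:
    """Index of the closest corpus embedding inside the query's cluster."""
    c = nearest(query, centroids)
    members = [i for i, e in enumerate(corpus) if nearest(e, centroids) == c]
    members = members or list(range(len(corpus)))  # fall back to a full search
    return min(members, key=lambda i: math.dist(query, corpus[i]))


def distribution(embeddings: list[Point], centroids: list[Point]) -> list[float]:
    """Fraction of documents whose embeddings fall in each cluster."""
    counts = Counter(nearest(e, centroids) for e in embeddings)
    return [counts[i] / len(embeddings) for i in range(len(centroids))]


def underrepresented(corpus_dist: list[float], client_dist: list[float],
                     margin: float = 0.2) -> list[int]:
    """Clusters where the client share exceeds the corpus share by more than `margin`."""
    return [i for i, (a, b) in enumerate(zip(corpus_dist, client_dist)) if b - a > margin]


# Hypothetical embeddings already produced by a stamp encoding model.
corpus = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
client = [(4.7, 5.5), (5.2, 5.1), (4.9, 4.8), (0.0, 0.2)]

centroids = kmeans(corpus, k=2)
query = client[0]  # plays the role of the "additional stamp embedding"
print(nearest(query, centroids))                    # 1
print(retrieve_similar(query, corpus, centroids))   # 5
print(underrepresented(distribution(corpus, centroids),
                       distribution(client, centroids)))  # [1]
```

Here the additional embedding lands in cluster 1, its closest corpus stamp is index 5, and cluster 1 is flagged because client documents fall there more often (75%) than corpus documents (50%), which under this reading suggests adding documents of that type to the corpus.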
CPC classifications:
- Convolutional networks [CNN, ConvNet]
- Supervised learning
- Market predictions or forecasting for commercial activities
- Inference or reasoning models
- Machine learning