Grouping documents based on document concepts

US10169353B1 · US · B1

Patent metadata
Field: Value
Publication number: US-10169353-B1
Application number: US-201514926719-A
Country: US
Kind code: B1
Filing date: Oct 29, 2015
Priority date: Oct 30, 2014
Publication date: Jan 1, 2019
Grant date: Jan 1, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving at least one electronic document, and identifying one or more words, phrases, or patterns used within the electronic document and that are based on a lexicon. Mapping, using a concept library, the one or more words, phrases, or patterns to a concept intended to be conveyed by the one or more words, phrases, or patterns according to the lexicon. Generating concept data based on the mapping, and storing the concept data in association with data identifying the electronic document.
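
The steps in the abstract (identify lexicon-based phrases, map them to concepts through a concept library, generate concept data, and store it with the document's identifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the library contents, the lexicon names used as dictionary keys, and the function names `map_to_concepts` and `build_concept_data` are all hypothetical, and simple phrase matching stands in for whatever identification technique a real system would use.

```python
import re
from collections import Counter

# Hypothetical concept library: each concept maps a lexicon name to the
# phrases that lexicon uses to convey it. All entries are invented examples.
CONCEPT_LIBRARY = {
    "auto_loan": {
        "customer": ["car loan", "car payment"],
        "legal": ["motor vehicle financing agreement"],
    },
    "home_purchase": {
        "customer": ["buying a house"],
        "legal": ["real property acquisition"],
    },
}


def map_to_concepts(text, lexicon):
    """Count how often each concept is conveyed in `text` under `lexicon`."""
    lowered = text.lower()
    counts = Counter()
    for concept, entries in CONCEPT_LIBRARY.items():
        for phrase in entries.get(lexicon, []):
            # Match the whole phrase on word boundaries, case-insensitively.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            counts[concept] += len(re.findall(pattern, lowered))
    # Keep only concepts that actually occur in the document.
    return {concept: n for concept, n in counts.items() if n > 0}


def build_concept_data(doc_id, text, lexicon):
    """Pair the mapped concepts with data identifying the document."""
    return {
        "doc_id": doc_id,
        "lexicon": lexicon,
        "concepts": map_to_concepts(text, lexicon),
    }


customer_doc = build_concept_data(
    "email-001", "I need a car loan. My car payment is late.", "customer"
)
legal_doc = build_concept_data(
    "memo-002", "This Motor Vehicle Financing Agreement applies.", "legal"
)
print(customer_doc["concepts"])  # {'auto_loan': 2}
print(legal_doc["concepts"])     # {'auto_loan': 1}
```

Both documents resolve to the same `auto_loan` concept even though they share no phrases, which is the cross-lexicon behavior the abstract describes.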

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method executed by one or more processors, the method comprising: receiving at least one electronic document; identifying, by the one or more processors, one or more words, phrases, or patterns used within the electronic document, the one or more words, phrases, or patterns based on a lexicon; mapping, by the one or more processors using a concept library, the one or more words, phrases, or patterns to a concept intended to be conveyed by the one or more words, phrases, or patterns according to the lexicon, wherein the concept library comprises two or more lexicons each having a plurality of context entries, each context entry comprising a first set of one or more words, phrases, or patterns from a first lexicon and a second set of one or more different words, phrases, or patterns from a second lexicon, wherein both the first set and second set are mapped to at least one common concept intended to be conveyed by the respective sets of one or more words, phrases, or patterns; generating, by the one or more processors, concept data based on the mapping; and storing the concept data associated with the electronic document in a concept index comprising concept data associated with at least one other electronic document that conveys respective concepts according to a different lexicon from the lexicon of the at least one electronic document, the concept index comprising, for each electronic document represented in the concept index, an array of values where each value indicates a presence or absence of a given concept within a respective document irrespective of a particular lexicon used to convey the concept within the document.

2. The method of claim 1, wherein the concept library is one of a product library, a product feature library, a line of business library, or a life events library.

3. The method of claim 1, wherein the lexicon is one of a customer lexicon, a customer service representative lexicon, a legal staff lexicon, a marketing staff lexicon, or a technical staff lexicon.

4. The method of claim 1, wherein the values in the array of values comprise binary data associated with each concept represented in the concept library, the binary data indicating that a respective concept is either present or not present in the respective electronic document.

5. The method of claim 1, wherein the values in the array of values comprise a concept score for each concept represented in the concept index, each concept score indicating a frequency with each respective concept is conveyed in the respective electronic document.

6. The method of claim 1, further comprising: determining, using the concept index, a set of concepts that are conveyed in both a first and a second electronic document based on comparing values from a concept data array associated with the first electronic document to values from a concept data array associated with the second electronic document; and in response to receiving a user request to compare the first and the second electronic document, providing, for display to the user, the set of concepts that are conveyed in both the first and the second electronic document.

7. The method of claim 1, further comprising: determining, using the concept index, a set of concepts that are conveyed in a first electronic document but that are not conveyed in a second electronic document based on comparing values from a concept data array associated with the first electronic document to values from a concept data array associated with the second electronic document; and in response to receiving a user request to compare the first and the second electronic document, providing, for display to the user, the set of concepts that are conveyed in the first electronic document but that are not conveyed in the second electronic document.

8. A system comprising: at least one processor; and at least one data store coupled to the at least one processor having instructions stored thereon which, when executed by the at least one processor, causes the at least one processor to perform operations comprising to perform operations comprising: receiving at least one electronic document; identifying one or more words, phrases, or patterns used within the electronic document, the one or more words, phrases, or patterns based on a lexicon; mapping, using a concept library, the one or more words, phrases, or patterns to a concept intended to be conveyed by the one or more words, phrases, or patterns according to the lexicon, wherein the concept library comprises two or more lexicons each having a plurality of context entries, each context entry comprising a first set of one or more words, phrases, or patterns from a first lexicon and a second set of one or more different words, phrases, or patterns from a second lexicon, wherein both the first set and second set are mapped to at least one common concept intended to be conveyed by the respective sets of one or more words, phrases, or patterns; generating concept data based on the mapping; and storing the concept data associated with the electronic document in a concept index comprising concept data associated with at least one other electronic document that conveys respective concepts according to a different lexicon from the lexicon of the at least one electronic document, the concept index comprising, for each electronic document represented in the concept index, an array of values where each value indicates a presence or absence of a given concept within a respective document irrespective of a particular lexicon used to convey the concept within the document.

9. The system of claim 8, wherein the concept library is one of a product library, a product feature library, a line of business library, or a life events library.

10. The system of claim 8, wherein the lexicon is one of a customer lexicon, a customer service representative lexicon, a legal staff lexicon, a marketing staff lexicon, or a technical staff lexicon.

11. The system of claim 8, wherein the values in the array of values comprise binary data associated with each concept represented in the concept library, the binary data indicating that a respective concept is either present or not present in the respective electronic document.

12. The system of claim 8, wherein the values in the array of values comprise a concept score for each concept represented in the concept index, each concept score indicating a frequency with each respective concept is conveyed in the respective electronic document.

13. The system of claim 8, wherein the operations further comprise: determining, using the concept index, a set of concepts that are conveyed in both a first and a second electronic document based on comparing values from a concept data array associated with the first electronic document to values from a concept data array associated with the second electronic document; and in response to receiving a user request to compare the first and the second electronic document, providing, for display to the user, the set of concepts that are conveyed in both the first and the second electronic document.

14. The system of claim 8, wherein the one or more processors are further configured to perform operations comprising: determining, using the concept index, a set of concepts that are conveyed in a first electronic document but that are not conveyed in a second electronic document based on comparing values from a concept data array associated with the first electronic document to values from a concept data array associated with the second electronic document; and in response to receiving a user re
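
The concept index recited in the claims (one array of values per document, with binary presence values or frequency scores, compared across documents) can be illustrated with a short sketch. Everything named here is an assumption made for illustration: the fixed concept ordering `CONCEPTS`, the `ConceptIndex` class, and its method names do not come from the patent, and an in-memory dictionary stands in for whatever storage a real system would use.

```python
# Hypothetical fixed ordering of concepts; each array position refers to
# the concept at the same position in this list.
CONCEPTS = ["auto_loan", "home_purchase", "retirement"]


def to_presence_array(concept_counts):
    """Binary array: 1 if the concept is present in the document, else 0."""
    return [1 if concept_counts.get(c, 0) > 0 else 0 for c in CONCEPTS]


def to_score_array(concept_counts):
    """Frequency array: how often each concept is conveyed in the document."""
    return [concept_counts.get(c, 0) for c in CONCEPTS]


class ConceptIndex:
    """In-memory stand-in for a concept index keyed by document identifier."""

    def __init__(self):
        self._arrays = {}

    def store(self, doc_id, concept_counts):
        # The stored array is lexicon-independent: only concepts are recorded.
        self._arrays[doc_id] = to_presence_array(concept_counts)

    def shared(self, first, second):
        """Concepts conveyed in both documents."""
        a, b = self._arrays[first], self._arrays[second]
        return [c for c, x, y in zip(CONCEPTS, a, b) if x and y]

    def only_in_first(self, first, second):
        """Concepts conveyed in the first document but not the second."""
        a, b = self._arrays[first], self._arrays[second]
        return [c for c, x, y in zip(CONCEPTS, a, b) if x and not y]


index = ConceptIndex()
index.store("customer_email", {"auto_loan": 2, "retirement": 1})
index.store("legal_memo", {"auto_loan": 1, "home_purchase": 3})

print(index.shared("customer_email", "legal_memo"))         # ['auto_loan']
print(index.only_in_first("customer_email", "legal_memo"))  # ['retirement']
```

Because the arrays record concepts rather than phrases, the comparison works even when the two documents were written in different lexicons. Swapping `to_presence_array` for `to_score_array` in `store` would keep frequency scores instead of binary values; the two comparison methods would still work, since any nonzero score is truthy.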

Assignees

Inventors

Classifications

  • G06F40/284 (Primary)

    Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Natural language query formulation · CPC title

  • Document management systems · CPC title

  • Creation or modification of classes or clusters · CPC title

  • Query results presentation · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10169353B1 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving at least one electronic document, and identifying one or more words, phrases, or patterns used within the electronic document and that are based on a lexicon. Mapping, using a concept library, the one or more words, phrases, or patterns to a concept intended to be conveyed by the one…
Who is the assignee on this patent?
USAA
What technology area does this patent fall under?
Primary CPC classification G06F40/284. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 1, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).