Maintaining a custodian directory by analyzing documents
US-2017024697-A1 · Jan 26, 2017 · US
US10891591B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10891591-B2 |
| Application number | US-201815955249-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 17, 2018 |
| Priority date | Jul 22, 2015 |
| Publication date | Jan 12, 2021 |
| Grant date | Jan 12, 2021 |
Official abstract text for this publication.
A computer processor may extract identity information from a document. The identity information may include at least one custodian identity attribute. After extracting the identity information, the computer processor may determine that the identity information is associated with a specific custodian. The computer processor may then search for the custodian identity attribute in a custodian directory to determine whether the custodian directory contains an entry for the custodian. If the custodian is not in the custodian directory, the computer processor may create a new entry in the custodian directory for the custodian and store the extracted identity information in the new entry.
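The flow in the abstract (extract, look up, create if missing) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the `Entry` class, the function names, and the choice of an email address as the sole custodian identity attribute are all hypothetical, and the regular expression is a deliberately simple stand-in for real identity extraction.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumption: an email address is the only identity attribute we extract.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass
class Entry:
    """One custodian directory entry (hypothetical structure)."""
    attributes: Dict[str, str] = field(default_factory=dict)


def extract_identity(document: str) -> Dict[str, str]:
    """Return the first email address found in the text, lowercased."""
    match = EMAIL_RE.search(document)
    return {"email": match.group(0).lower()} if match else {}


def find_entry(directory: List[Entry], identity: Dict[str, str]) -> Optional[Entry]:
    """Search the directory for an entry sharing any extracted attribute."""
    for entry in directory:
        if any(entry.attributes.get(k) == v for k, v in identity.items()):
            return entry
    return None


def update_directory(directory: List[Entry], document: str) -> Optional[Entry]:
    """Create a new entry when the custodian is not yet in the directory."""
    identity = extract_identity(document)
    if not identity:
        return None  # nothing to look up
    entry = find_entry(directory, identity)
    if entry is None:
        entry = Entry(dict(identity))
        directory.append(entry)
    return entry
```

Calling `update_directory` twice with documents that mention the same address returns the same entry the second time rather than creating a duplicate.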
Opening claim text (preview).
What is claimed is:

1. A system for maintaining a custodian directory, the system comprising: a memory; a processor communicatively coupled to the memory, wherein the processor is configured to perform a method comprising: identifying two or more entries in a custodian directory that have at least one matching custodian identity attribute; generating a relationship score for two or more entries, the relationship score being a score that indicates a level of similarity between the two or more entries; determining, based on the relationship score satisfying a confidence threshold, that the two or more entries in the custodian directory relate to a same custodian, wherein the confidence threshold is a minimum score that the two or more entries have to obtain for the two or more entries to be merged; and merging, in response to determining that all of the two or more entries relate to the same custodian, the two or more entries in the custodian directory.

2. The system of claim 1, wherein generating the relationship score includes determining a weighting factor for each field in the custodian directory.

3. The system of claim 2, wherein generating the relationship score further includes: identifying a first plurality of custodian identity attributes for a first entry in the custodian directory; identifying a second plurality of custodian identity attributes for a second entry in the custodian directory; comparing each custodian identity attribute in the first plurality of custodian identity attributes to a corresponding custodian identity attribute in the second plurality of custodian identity attributes to generate a score for each custodian identity attribute; weighting each score using the weighting factors; and generating the relationship score for the two or more entries using the weighted scores.

4. The system of claim 2, wherein the weighting factor for each respective field is based on a likelihood that the custodian identity attribute for the respective field is unique to a single custodian.

5. The system of claim 1, wherein generating the relationship score comprises: identifying a first plurality of custodian identity attributes for a first entry in the custodian directory; identifying a second plurality of custodian identity attributes for a second entry in the custodian directory; and comparing each custodian identity attribute in the first plurality of custodian identity attributes to a corresponding custodian identity attribute in the second plurality of custodian identity attributes.

6. The system of claim 4, wherein the custodian identity attributes are compared using fuzzy logic matching.

7. The system of claim 1, wherein the custodian identity attribute is a name, and wherein identifying two or more entries in a custodian directory that have at least one matching custodian identity attribute comprises: identifying a first name in a first entry in the custodian directory; identifying a second name in a second entry in the custodian directory; determining that the first name is an alternative name for the second name.

8. The system of claim 1, wherein identifying two or more entries in a custodian directory that have at least one matching custodian identity attribute comprises: identifying a first set of fields in the custodian directory, the first set of fields including fields with information that is unique to individual custodians; identifying a second set of fields in the custodian directory, the second set of fields including fields with information that is not unique to individual custodians; and comparing custodian identity attributes in the first set of fields for a first entry to corresponding custodian identity attributes for a second entry.

9. The system of claim 8, wherein the first set of fields includes one or more selected from a group consisting of residential addresses, email addresses, and mobile phone numbers.

10. The system of claim 8, wherein the second set of fields includes work addresses.

11. The system of claim 1, wherein generating a relationship score for the two or more entries includes: identifying a critical field; determining that the custodian identity attributes associated with the critical field for the two or more entries do not match; and assigning a relationship score that does not satisfy the confidence threshold in response to determining that the custodian identities attributes associated with the critical field for the two or more entries do not match.

12. The system of claim 1, wherein the confidence threshold is based on historical data relating to custodian directory merges.

13. The system of claim 1, wherein merging the two or more entries in the custodian directory comprises: determining that a first custodian identity attribute in a first merged entry does not match a second custodian identity attribute for a second merged entry, the first and second custodian identity attributes corresponding to the same field in the custodian directory; determining to keep the first custodian identity attribute; and storing the first custodian identity attribute in a merged entry.

14. The system of claim 13, wherein determining to keep the first custodian identity attribute includes determining that the first custodian identity attribute was added to the custodian directory more recently than the second custodian identity attribute.

15. The system of claim 13, wherein merging the two or more entries in the custodian directory further comprises: deleting the second custodian identity attribute.

16. The system of claim 13, wherein merging the two or more entries in the custodian directory further comprises: determining to keep the second custodian identity attribute; generating a new field in the custodian directory; and storing the second custodian identity attribute for the second merged entry in the new field in the custodian directory.

17. A system for maintaining a custodian directory, the system comprising: a memory; a processor communicatively coupled to the memory, wherein the processor is configured to perform a method comprising: identifying two or more entries in a custodian directory that have at least one matching custodian identity attribute; generating a relationship score for two or more entries, the relationship score being a score that indicates a level of similarity between the two or more entries; determining, based on the relationship score satisfying a confidence threshold, that the two or more entries in the custodian directory relate to a same custodian; and merging, in response to determining that all of the two or more entries relate to the same custodian, the two or more entries in the custodian directory, wherein merging the two or more entries in the custodian directory comprises: determining that a first custodian identity attribute in a first merged entry does not match a second custodian identity attribute for a second merged entry, the first and second custodian identity attributes corresponding to the same field in the custodian directory; determining to keep the first custodian identity attribute; and storing the first custodian identity attribute in a merged entry.

18. The system of claim 17, wherein determining to keep the first custodian identity attribute includes determining that the first custodian identity attribute was added to the custodian directory more recently than the second custodian identity attribute.

19. The system of claim 17, wherein merging the two or more entrie
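To make the scoring and merging claims more concrete, here is a minimal Python sketch of a weighted relationship score (claims 2 to 5), a critical-field check (claim 11), a confidence threshold (claim 1), and a recency-based merge (claims 13, 14, and 16). Everything specific in it is an assumption for illustration: the field names, the weights, the threshold value, the integer `added` timestamp, and the `_previous` field naming are hypothetical, and `difflib.SequenceMatcher` is swapped in as a simple stand-in for the "fuzzy logic matching" of claim 6, which the claims do not specify.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict

# Hypothetical weights: fields more likely to be unique to one custodian
# (email, mobile) count more than shared ones (work address).
WEIGHTS = {"email": 0.5, "mobile": 0.3, "name": 0.15, "work_address": 0.05}
CRITICAL_FIELD = "email"      # assumption: a mismatch here blocks any merge
CONFIDENCE_THRESHOLD = 0.8    # assumption: minimum score required to merge


@dataclass
class Attribute:
    value: str
    added: int  # illustrative timestamp; larger means added more recently


Entry = Dict[str, Attribute]


def similarity(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def relationship_score(first: Entry, second: Entry) -> float:
    """Weighted average of per-field similarity over fields both entries have."""
    shared = [f for f in WEIGHTS if f in first and f in second]
    if not shared:
        return 0.0
    if CRITICAL_FIELD in shared:
        a, b = first[CRITICAL_FIELD].value, second[CRITICAL_FIELD].value
        if a.lower() != b.lower():
            return 0.0  # a score that cannot satisfy the threshold
    total = sum(WEIGHTS[f] for f in shared)
    weighted = sum(
        WEIGHTS[f] * similarity(first[f].value, second[f].value) for f in shared
    )
    return weighted / total


def merge(first: Entry, second: Entry, keep_older: bool = False) -> Entry:
    """On conflict keep the more recent value; optionally retain the older one."""
    merged = dict(first)
    for name, incoming in second.items():
        current = merged.get(name)
        if current is None or current.value == incoming.value:
            merged.setdefault(name, incoming)
            continue
        if incoming.added > current.added:
            newer, older = incoming, current
        else:
            newer, older = current, incoming
        merged[name] = newer
        if keep_older:
            merged[f"{name}_previous"] = older  # stored in a new field
    return merged
```

A caller would merge two entries only when `relationship_score(a, b) >= CONFIDENCE_THRESHOLD`. Normalizing by the total weight of the shared fields is one possible design choice; it keeps the score comparable when some fields are missing from an entry.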
Address books, i.e. directories containing contact information about correspondents (telephone directories in user terminals H04M1/27453) · CPC title
Computer-aided management of electronic mailing [e-mailing] · CPC title
File access structures, e.g. distributed indices (arrangements of input from, or output to, record carriers G06F3/06) · CPC title
Parsing · CPC title
Document management systems · CPC title