Maintaining a custodian directory by analyzing documents

US10013673B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10013673-B2
Application number  US-201615155164-A
Country             US
Kind code           B2
Filing date         May 16, 2016
Priority date       Jul 22, 2015
Publication date    Jul 3, 2018
Grant date          Jul 3, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer processor may extract identity information from a document. The identity information may include at least one custodian identity attribute. After extracting the identity information, the computer processor may determine that the identity information is associated with a specific custodian. The computer processor may then search for the custodian identity attribute in a custodian directory to determine whether the custodian directory contains an entry for the custodian. If the custodian is not in the custodian directory, the computer processor may create a new entry in the custodian directory for the custodian and store the extracted identity information in the new entry.
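
The flow described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the regex-based email extraction, the dict-backed directory, and the names `extract_identity` and `upsert_custodian` are all hypothetical. The "updated" branch reflects the update step recited in claim 1 rather than the abstract itself.

```python
import re

# Hypothetical pattern: treats the first email address found as the
# custodian identity attribute. The patent does not prescribe this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_identity(document: str) -> dict:
    """Extract identity information (here, only an email address) from text."""
    match = EMAIL_RE.search(document)
    return {"email": match.group(0).lower()} if match else {}


def upsert_custodian(directory: dict, identity: dict) -> str:
    """Search the directory by email; create a new entry if none exists."""
    key = identity.get("email")
    if key is None:
        return "skipped"                 # nothing to search for
    if key in directory:
        directory[key].update(identity)  # existing custodian: update entry
        return "updated"
    directory[key] = dict(identity)      # unknown custodian: create entry
    return "created"


directory = {}
doc = "From: Jane Roe <Jane.Roe@example.com>\nSubject: Q3 report"
status = upsert_custodian(directory, extract_identity(doc))
```

Keying the directory on a single attribute keeps the lookup trivial; a real directory would likely search several attributes, as the claims describe.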

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method for maintaining a custodian directory, the method comprising: extracting, by a processor, identity information from a document, the identity information including a custodian identity attribute; determining that the identity information is associated with a first custodian; determining whether the first custodian is in a custodian directory by searching for the custodian identity attribute in the custodian directory; creating, in response to determining that the first custodian is not in the custodian directory, a new entry for the first custodian in the custodian directory, the new entry including the identity information; updating, in response to determining that the first custodian is in the custodian directory, an entry for the first custodian in the custodian directory using the extracted identity information; and carrying out a cleanup of the custodian directory by: identifying two or more entries in the custodian directory that have at least one matching custodian identity attribute; determining a weighting factor for each field in the custodian directory, wherein the weighting factor for each respective field is based on a likelihood that the custodian identity attribute for the respective field is unique to a single custodian; generating a relationship score for the two or more entries by comparing the identity information in the two or more entries and using the weighting factors, the relationship score being a numeric score that indicates a level of similarity between the two or more entries; determining that the relationship score exceeds a confidence threshold; determining, based on the relationship score exceeding the confidence threshold, that all of the two or more entries in the custodian directory relate to a particular custodian; and merging, in response to determining that all of the two or more entries relate to the particular custodian, the two or more entries in the custodian directory.

2. The method of claim 1, wherein the custodian identity attribute is a custodian ID, the custodian ID being associated with the first custodian, and wherein the determining whether the first custodian is in the custodian directory comprises searching for the custodian ID in the custodian directory.

3. The method of claim 1, wherein the determining whether the first custodian is in a custodian directory by searching for the custodian identity attribute in the custodian directory comprises: searching for the custodian identity attribute in the custodian directory; identifying an entry in the custodian directory relating to a second custodian, the second custodian being a potential match to the first custodian; determining a relationship score between the first custodian and the second custodian by comparing the identity information for the first custodian to the identity information for the second custodian; and determining that the relationship score exceeds a confidence threshold.

4. The method of claim 1, wherein the custodian identity attribute is one or more selected from a group consisting of a custodian ID, an email address, a name, a residential address, a work address, and a phone number.

5. The method of claim 1, wherein the document is an email.

6. The method of claim 1, wherein the document is a word processing document.

7. The method of claim 1, wherein extracting identity information from a document comprises extracting metadata from the document.

8. The method of claim 1, wherein extracting identity information from a document comprises extracting identity information from a content of the document using an information extraction method.

9. The method of claim 1, wherein extracting identity information includes utilizing natural language processing.

10. The method of claim 1, wherein the identity information is a name, and wherein the identifying two or more entries that relate to a particular custodian comprises: identifying a first name in a first entry in the custodian directory; identifying a second name in a second entry in the custodian directory; determining that the first name is an alternative name for the second name.

11. The method of claim 1, the method further comprising: identifying a first entry in the custodian directory; determining, using information in the custodian directory, that the first entry corresponds to a customer; and transmitting, in response to determining that the first entry corresponds to the customer, the first entry to a customer relationship management (CRM) system.

12. The method of claim 1, the method further comprising determining whether the entry for the first custodian in the custodian directory needs to be updated by: comparing two or more custodian identity attributes extracted from the document to values in corresponding fields in the entry in the custodian directory; determining, based on comparing the two or more custodian identity attributes to the values in corresponding fields in the custodian directory, that at least one custodian identity attribute extract from the document does not match a corresponding value; determining that the at least one custodian identity attribute is not an equivalent value to the corresponding value; and determining, based on the at least one custodian identity attribute not being an equivalent value to the corresponding value, that the entry needs to be updated.

13. The method of claim 12, wherein updating the entry using the identity information comprises: creating a new field in the custodian directory for the entry; storing the corresponding value in the new field; and overwriting the corresponding value with the at least one custodian identity attribute.

14. The method of claim 1, wherein extracting the identity information includes extracting information from a body of the document using natural language processing and extracting information from metadata of the document, wherein the identity information includes a second custodian identity attribute extracted from the metadata and a third custodian identity attribute extracted from the body of the document, the method further comprising: determining that the second custodian identity attribute is associated with a second custodian; determining, based on a field of the metadata where the second custodian identity attribute was extracted from and a location in the body of the document that the third custodian identity attribute was extracted from, that the second custodian identity attribute and the third custodian identity attribute are associated with the same custodian; searching for the second custodian identity attribute in the custodian directory; determining, based on the searching for the second custodian identity attribute, that an existing entry exists for the second custodian in the custodian directory; determining a type of custodian identity attribute for the third custodian identity attribute; comparing the third custodian identity attribute to a corresponding field in the existing entry using the type of custodian identity attribute; determining, based on comparing the third custodian identity attribute to the corresponding field, that the third custodian identity attribute does not match a value stored in the corresponding field; and updating, in response to determining that the third custodian identity attribute does not match the value stored in the corresponding field, the existing entry for the second custodian by storing the third custodian identity attribute in the custodian directory.

15. The method of claim 1, wherein extractin…
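
The cleanup step in claim 1 (weighted comparison, threshold test, merge) can be illustrated with a short Python sketch. The field weights, the threshold value, the pairwise greedy merge, and the function names below are assumptions chosen for illustration; the claim does not specify particular weights, a threshold, or a merge policy.

```python
# Hypothetical weights: higher means the field is more likely unique to one
# custodian. The values are illustrative, not taken from the patent.
FIELD_WEIGHTS = {"email": 0.6, "phone": 0.3, "name": 0.1}
CONFIDENCE_THRESHOLD = 0.5  # assumed cut-off


def relationship_score(a: dict, b: dict) -> float:
    """Sum the weights of fields that are present and equal in both entries."""
    return sum(
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if a.get(field) is not None and a.get(field) == b.get(field)
    )


def merge_entries(a: dict, b: dict) -> dict:
    """Combine two entries, keeping a's value where both define a field."""
    return {**b, **a}


def cleanup(entries: list) -> list:
    """Merge each entry into the first kept entry it scores above threshold with."""
    merged = []
    for entry in entries:
        for i, kept in enumerate(merged):
            if relationship_score(kept, entry) > CONFIDENCE_THRESHOLD:
                merged[i] = merge_entries(kept, entry)
                break
        else:
            merged.append(entry)  # no confident match: keep as its own entry
    return merged


entries = [
    {"name": "Jane Roe", "email": "jane.roe@example.com"},
    {"name": "J. Roe", "email": "jane.roe@example.com", "phone": "555-0100"},
    {"name": "Jane Roe", "email": "jane@other.example"},
]
result = cleanup(entries)
```

With these assumed weights, a shared email address alone clears the threshold, while a shared name alone does not, which mirrors the claim's idea that fields more likely to be unique to one custodian should count for more.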

Assignees

IBM

Inventors

Classifications

  • Document management systems · CPC title

  • Computer-aided management of electronic mailing [e-mailing] · CPC title

  • G06Q10/105 (Primary)

    Human resources · CPC title

  • File access structures, e.g. distributed indices (arrangements of input from, or output to, record carriers G06F3/06) · CPC title

  • Parsing · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10013673B2 cover?
A computer processor may extract identity information from a document. The identity information may include at least one custodian identity attribute. After extracting the identity information, the computer processor may determine that the identity information is associated with a specific custodian. The computer processor may then search for the custodian identity attribute in a custodian dire…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06Q10/105. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 3, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).