Automatic electronic message content extraction method and apparatus

US10977289B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10977289-B2
Application number  US-201916272285-A
Country             US
Kind code           B2
Filing date         Feb 11, 2019
Priority date       Feb 11, 2019
Publication date    Apr 13, 2021
Grant date          Apr 13, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are systems and methods for improving interactions with and between computers in electronic messaging, and other, systems supported by or configured with personal computing devices, servers and/or platforms. The systems interact to identify and retrieve data within or across platforms, which can be used to improve the quality of data used in processing interactions between or among processors in such systems. The disclosed systems and methods provide systems and methods for automatically generating data extraction rules, which can then be used to automatically extract data from electronic messages.
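To make the idea of a data extraction rule more concrete, here is a minimal sketch of applying one to a new message. It is an illustration under stated assumptions, not the patented implementation: the rule representation (`FieldRule`), the `extract` helper, the sample rule `ORDER_RULE`, and the annotation names are all hypothetical, and the message is assumed to be well-formed markup that Python's standard `xml.etree.ElementTree` can parse. That module supports only a limited XPath subset, so the paths are written relative to the root element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    """One entry of a hypothetical extraction rule."""
    xpath: str        # ElementTree-style path, relative to the root element
    annotation: str   # label describing the meaning of the variable value
    prefix: str = ""  # constant text preceding the variable part, if any


def extract(markup: str, rule: list[FieldRule]) -> dict[str, str]:
    """Apply each field of the rule to one message and collect annotated values."""
    root = ET.fromstring(markup)
    extracted: dict[str, str] = {}
    for field in rule:
        node = root.find(field.xpath)
        if node is None or node.text is None:
            continue  # this message has no text at the expected location
        value = node.text.strip()
        # Drop the constant portion so only the variable value remains.
        if field.prefix and value.startswith(field.prefix):
            value = value[len(field.prefix):]
        extracted[field.annotation] = value
    return extracted


# Hypothetical rule for an order-confirmation template.
ORDER_RULE = [
    FieldRule("./body/p[1]", "order_number", prefix="Order "),
    FieldRule("./body/p[2]", "total", prefix="Total "),
]
```

Calling `extract(message_markup, ORDER_RULE)` on a message that follows the same template returns a dictionary keyed by annotation, which is one simple way to pair each extracted value with its meaning.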

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: receiving, at a computing device, a data extraction rule creation request in connection with a corpus of electronic messages; retrieving, via the computing device, the corpus of electronic messages; assigning, via the computing device, each electronic message, of the corpus, to one cluster of a plurality of clusters, a cluster of the plurality having an assigned set of electronic messages, from the corpus of electronic messages, the cluster having an associated set of XPATH expressions shared by each electronic message in the assigned set, each XPATH expression of the set of XPATH expressions leading to a textual leaf in a DOM tree of each electronic message in the assigned set, each XPATH expression having a corresponding value correspond to each of the electronic messages in the assigned set; determining, via the computing device and for the cluster of the plurality, that the corresponding value for each of a number of the XPATH expressions, in the set of XPATH expressions associated with the cluster, comprises a variable value, such that for an XPATH expression, of the number, the variable value indicates that at least a portion of the corresponding value of the XPATH expression varies across the number of electronic messages in the assigned set; automatically determining, via the computing device and for each XPATH expression of the number of XPATH expressions associated with the cluster, an annotation, for the variable value, the determining the annotation comprising analyzing the variable value corresponding to multiple electronic messages of the number of electronic messages to determine the annotation; automatically refining, via the computing device and for each XPATH expression of the number of XPATH expressions associated with the cluster, the annotation, the refining for an XPATH expression of the number comprising using the corresponding value of at least one other XPATH expression of the set to determine a meaning of the variable value of the XPATH expression and using the meaning of the variable value to refine the annotation associated with the XPATH expression; and automatically generating, via the computing device and for the cluster, a data extraction rule, the automatically-generated data extraction rule comprising the annotation associated with each XPATH expression having a variable value; automatically extracting, via the computing device and using the automatically-generated data extraction rule, data from another electronic message; the data extracted from the other electronic message comprising the variable value of each XPATH expression of the number, automatically extracting the data comprising associating the variable value for an XPATH expression, of the number, with the associated annotation providing the meaning of the variable value; and communicating, via the computing device, the automatically extracted data to an entity.

2. The method of claim 1, the communicating further comprising communicating at least some of the extracted data and associated annotations to a client application for display in an electronic message information summary display at the user device.

3. The method of claim 1, wherein each electronic message in the assigned set shares a common sender domain.

4. The method of claim 1, determining the annotation for the variable value of an XPATH expression of the number further comprising: for each electronic message of the multiple electronic messages, searching a dictionary, of a number of dictionaries, using at least a portion of the variable value, from the electronic message, as a query, each dictionary of the number having an associated annotation; determining that the at least a portion of the variable value associated with the multiple messages is found in the dictionary, the determining further comprising determining that a threshold confidence level is satisfied; and using the associated annotation of the dictionary containing the at least a portion of the variable as the annotation for the variable value.

5. The method of claim 1, determining the annotation for the variable value of an XPATH expression of the number further comprising: determining, using at least one pattern recognition analyzer, a pattern of at least a portion of the variable; and using an associated annotation of the pattern as the annotation for the variable value.

6. The method of claim 1, the refining further comprising using a set of annotations associated with the cluster.

7. The method of claim 1, automatically refining the annotation associated with the XPATH expression further comprising: generating, via the computing device, training data across the set of XPATH expressions associated with the cluster, the training data comprising a plurality of training examples, each training example comprising a set of features and corresponding feature values; training, via the computing device and using machine learning, an annotation refinement model for use in refining the annotation; generating, via the computing device, a set of features for the annotation; and the refining further comprising using the annotation refinement model trained with the set of features generated for the annotation to refine the annotation associated with the XPATH expression.

8. The method of claim 1, automatically extracting data from the electronic message further comprising, for an XPATH expression of the number: using the XPATH expression to retrieve the corresponding value for the other electronic message.

9. The method of claim 8, further comprising: determining that, in addition to the variable value, the corresponding value, of the XPATH expression, of the number, comprises a constant portion that remains constant across the electronic messages in the assigned set; and retrieving the variable value from the corresponding value of the other electronic message.

10. The method of claim 1, further comprising: determining, via the computing device and for an electronic message of the corpus, a digital signature based on the set of XPATH expressions, each XPATH expression of the set leading to a textual leaf in the DOM tree of the electronic message; and the assigning further comprising using the digital signature of the electronic message in assigning the electronic message to a cluster of the plurality, such that a common digital signature is shared by each electronic message in the assigned set of the cluster.

11. The method of claim 10, further comprising: before automatically extracting data from the other electronic message, determining that the other electronic message belongs to the cluster, the determining comprising: determining, via the computing device and for the other electronic message, a digital signature based on the set of XPATH expressions, each XPATH expression of the set leading to a textual leaf in the DOM tree of the other electronic message; and determining, via the computing device, that the digital signature determined for the other electronic message matches the common digital signature shared by each electronic message in the assigned set of the cluster.

12. The method of claim 11, determining that the other electronic message belongs to the cluster further comprising: determining that a sender domain of the other electronic message matches a sender domain of each electronic message in the assigned set of the cluster.

13. The method of claim 1, the communicating further comprising communicating at least some of the extracted data and each associated annotation to a search engine for use in generating a search index for use in ele
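The clustering and variable-value steps in claims 1 and 10 can be sketched in a few lines. The following is a simplified illustration under assumptions, not the method as actually implemented: the function names (`leaf_xpaths`, `signature`, `cluster_messages`, `variable_xpaths`) are hypothetical, SHA-256 over the sorted paths stands in for the unspecified "digital signature", messages are assumed to be well-formed markup, and a "textual leaf" is approximated as a childless element with non-empty text.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def leaf_xpaths(markup: str) -> dict[str, str]:
    """Map an XPath-like path to the text of every textual leaf element."""
    root = ET.fromstring(markup)
    found: dict[str, str] = {}

    def walk(node: ET.Element, path: str) -> None:
        text = (node.text or "").strip()
        if len(node) == 0 and text:
            found[path] = text  # childless element carrying text
        seen: dict[str, int] = defaultdict(int)
        for child in node:
            seen[child.tag] += 1  # 1-based position among same-tag siblings
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}")
    return found


def signature(paths) -> str:
    """Hash the sorted set of leaf paths (assumed signature scheme)."""
    joined = "\n".join(sorted(paths))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def cluster_messages(messages: list[str]) -> dict[str, list[dict[str, str]]]:
    """Group messages whose leaf paths produce the same signature."""
    clusters: dict[str, list[dict[str, str]]] = defaultdict(list)
    for markup in messages:
        values = leaf_xpaths(markup)
        clusters[signature(values)].append(values)
    return dict(clusters)


def variable_xpaths(cluster: list[dict[str, str]]) -> list[str]:
    """Return the paths whose value differs across messages in the cluster."""
    return sorted(
        path for path in cluster[0]
        if len({message[path] for message in cluster}) > 1
    )
```

Given two order confirmations that differ only in order number and total, `cluster_messages` would place them in one cluster and `variable_xpaths` would report the two paths holding those values, while a shared heading would be treated as constant. Matching on sender domain (claims 3 and 12) and splitting a value into constant and variable portions (claim 9) are left out of this sketch.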

Assignees

Verizon Media Inc

Inventors

Classifications

  • G06F16/345 (Primary)

    Summarisation for human users · CPC title

  • G06F16/35 (Primary)

    Clustering; Classification · CPC title

  • Querying · CPC title

  • Inference or reasoning models · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10977289B2 cover?
Disclosed are systems and methods for improving interactions with and between computers in electronic messaging, and other, systems supported by or configured with personal computing devices, servers and/or platforms. The systems interact to identify and retrieve data within or across platforms, which can be used to improve the quality of data used in processing interactions between or among pr…
Who is the assignee on this patent?
Verizon Media Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/345. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 13, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).