Automatic electronic message content extraction method and apparatus

US11663259B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11663259-B2
Application number  US-202117226746-A
Country             US
Kind code           B2
Filing date         Apr 9, 2021
Priority date       Feb 11, 2019
Publication date    May 30, 2023
Grant date          May 30, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are systems and methods for improving interactions with and between computers in electronic messaging, and other, systems supported by or configured with personal computing devices, servers and/or platforms. The systems interact to identify and retrieve data within or across platforms, which can be used to improve the quality of data used in processing interactions between or among processors in such systems. The disclosed systems and methods provide systems and methods for automatically generating data extraction rules, which can then be used to automatically extract data from electronic messages.

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising:
    retrieving, via a computing device, a corpus of electronic messages, each electronic message of the corpus using an internal format comprising a plurality of expressions;
    assigning, via the computing device, each electronic message of the corpus to one cluster of a plurality of clusters, a cluster of the plurality having an assigned set of electronic messages from the corpus of electronic messages, the cluster having an associated set of expressions shared by each electronic message in the assigned set;
    identifying, via the computing device, a variable value in connection with a number of expressions in the set of expressions, the variable value indicating that at least a portion of a value corresponding to the number of expressions varies across the number of electronic messages;
    analyzing, via the computing device, the variable value corresponding to multiple electronic messages of the number of electronic messages to determine an annotation;
    automatically refining, via the computing device, the annotation, comprising: using a value corresponding to at least one other expression of the set to determine a meaning of the variable value; and using the meaning of the variable value to refine the annotation;
    automatically generating, via the computing device and for the cluster, a data extraction rule comprising the annotation associated with each expression having the variable value;
    automatically extracting, via the computing device and using the automatically-generated data extraction rule, data from another electronic message; and
    communicating, via the computing device, at least some of the automatically extracted data to an entity.

2. The method of claim 1, communicating at least some of the automatically extracted data to an entity further comprising communicating at least some of the extracted data and associated annotations to a client application for display in an electronic message information summary display at a client device.

3. The method of claim 1, analyzing the variable value corresponding to multiple electronic messages to determine the annotation for the variable value further comprising: for each electronic message of the multiple electronic messages, searching a dictionary, of a number of dictionaries, using at least a portion of the variable value from the electronic message as a query, each dictionary of the number having an associated annotation; determining that the at least a portion of the variable value associated with the multiple messages is found in the dictionary, the determining further comprising determining that a threshold confidence level is satisfied; and using the associated annotation of the dictionary containing the at least a portion of the variable value as the annotation for the variable value.

4. The method of claim 1, analyzing the variable value corresponding to multiple electronic messages to determine the annotation further comprising: determining, using at least one pattern recognition analyzer, a pattern of at least a portion of the variable; and using an associated annotation of the pattern as the annotation for the variable value.

5. The method of claim 1, automatically refining further comprising using a set of annotations associated with the cluster.

6. The method of claim 1, automatically refining the annotation further comprising: generating, via the computing device, training data across the set of expressions associated with the cluster, the training data comprising a plurality of training examples, each training example comprising a set of features and corresponding feature values; training, via the computing device and using machine learning, an annotation refinement model for use in refining the annotation; generating, via the computing device, a set of features for the annotation; and the refining further comprising using the annotation refinement model trained with the set of features generated for the annotation to refine the annotation associated with the expression.

7. The method of claim 1, automatically extracting data from the other electronic message further comprising, for an expression of the number: using the expression to retrieve the corresponding value for the other electronic message.

8. The method of claim 7, further comprising: determining that, in addition to the variable value, the corresponding value of the expression of the number of expressions comprises a constant portion that remains constant across the electronic messages in the assigned set; and retrieving the variable value from the corresponding value of the other electronic message.

9. The method of claim 1, further comprising: determining, via the computing device and for an electronic message of the corpus, a digital signature based on the set of expressions, each expression of the set leading to a textual leaf in a DOM tree of the electronic message; and the assigning further comprising using the digital signature of the electronic message in assigning the electronic message to a cluster of the plurality, such that a common digital signature is shared by each electronic message in the assigned set of the cluster.

10. The method of claim 9, further comprising: before automatically extracting data from the other electronic message, determining that the other electronic message belongs to the cluster, the determining comprising: determining, via the computing device and for the other electronic message, a digital signature based on the set of expressions; and determining, via the computing device, that the digital signature determined for the other electronic message matches a common digital signature shared by each electronic message in the assigned set of the cluster.

11. The method of claim 10, determining that the other electronic message belongs to the cluster further comprising: determining that a sender domain of the other electronic message matches a sender domain of each electronic message in the assigned set of the cluster.

12. The method of claim 11, each expression of the set leading to a textual leaf in a DOM tree of the electronic message and each expression of the set leading to a textual leaf in the DOM tree of the other electronic message.

13. The method of claim 1, further comprising communicating at least some of the extracted data and each associated annotation to be communicated to a search engine for use in generating a search index for use in electronic message searching.

14. The method of claim 1, further comprising communicating at least some of the extracted data and each associated annotation to a recommendation system for use in determining at least one interest of a user of the recommendation system, the at least one interest of the user for use in making at least one recommendation to a user at a computing device of the user.

15. The method of claim 14, the recommendation system comprises an advertising content recommendation system, and the at least one interest is for use in determining advertising content for presentation to the user at a computing device of the user.

16. A non-transitory computer-readable storage medium tangibly encoded with computer-executable instructions that when executed by a processor associated with a computing device perform a method comprising: retrieving a corpus of electronic messages, each electronic message of the corpus using an internal format comprising a plurality of expressions; assigning each electronic message of the corpus to one cluster of a plurality of clusters, a cluster of the plur
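Claims 9 to 11 describe clustering messages by a digital signature computed from the expressions that lead to textual leaves of each message's DOM tree, then checking that a new message has the same signature and sender domain before extracting from it. A minimal standard-library Python sketch follows. It assumes the expressions are XPath-style paths with sibling indices and that the signature is a SHA-256 hash of the sorted paths; the claims shown here fix neither choice, and the names `LeafPathCollector`, `leaf_map`, `signature`, `cluster` and `belongs_to` are hypothetical. Keying clusters on (sender domain, signature) folds the claim 11 domain check into clustering, which is a simplification.

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never contain text, so they are counted but never opened.
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta", "wbr"}


class LeafPathCollector(HTMLParser):
    """Records an XPath-like path (e.g. /html[1]/body[1]/p[2]) for each text leaf."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open elements as (tag, "tag[i]") steps
        self.counts = [{}]   # per-parent counters that give each child its index
        self.leaves = {}     # path -> text; later text under the same path overwrites

    def handle_starttag(self, tag, attrs):
        counter = self.counts[-1]
        counter[tag] = counter.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, f"{tag}[{counter[tag]}]"))
        self.counts.append({})

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        if any(open_tag == tag for open_tag, _ in self.stack):
            while self.stack:
                open_tag, _ = self.stack.pop()
                self.counts.pop()
                if open_tag == tag:
                    break

    def handle_data(self, data):
        text = data.strip()
        if text:
            path = "/" + "/".join(step for _, step in self.stack)
            self.leaves[path] = text


def leaf_map(html):
    """Map each textual-leaf path of an HTML message to its text."""
    parser = LeafPathCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.leaves


def signature(leaves):
    """Hash of the sorted leaf paths only; the text values do not affect it."""
    return hashlib.sha256("\n".join(sorted(leaves)).encode("utf-8")).hexdigest()


def cluster(messages):
    """Group (sender_domain, html) pairs by (sender domain, structural signature)."""
    clusters = defaultdict(list)
    for domain, html in messages:
        leaves = leaf_map(html)
        clusters[(domain, signature(leaves))].append(leaves)
    return clusters


def belongs_to(cluster_key, domain, html):
    """Membership test in the spirit of claims 10 and 11."""
    return cluster_key == (domain, signature(leaf_map(html)))
```

Two messages that differ only in leaf text (for example `A-100` versus `B-200` in the same paragraph) share a signature and land in one cluster, while a message with a different tag layout gets its own cluster.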
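Claims 1, 7 and 8 describe finding, within one cluster, the expressions whose values vary across messages, separating any constant portion from the variable value, recording an annotation per such expression in a data extraction rule, and applying that rule to another message. The Python sketch below assumes each message has already been reduced to a mapping from path expression to leaf text, and it splits constant portions at whitespace-token boundaries. That heuristic, the rule's dictionary layout and the names `split_constant`, `variable_part`, `build_rule` and `extract` are all hypothetical; the `annotate` argument is a placeholder for whichever annotation step is used.

```python
def split_constant(values):
    """Count leading/trailing whitespace tokens shared by every value (the constant portion)."""
    tokens = [v.split() for v in values]
    shortest = min(len(t) for t in tokens)
    pre = 0
    while pre < shortest and all(t[pre] == tokens[0][pre] for t in tokens):
        pre += 1
    suf = 0
    while suf < shortest - pre and all(t[-1 - suf] == tokens[0][-1 - suf] for t in tokens):
        suf += 1
    return pre, suf


def variable_part(value, pre, suf):
    """Drop the constant prefix/suffix tokens, keeping only the variable value."""
    tokens = value.split()
    return " ".join(tokens[pre:len(tokens) - suf])


def build_rule(members, annotate):
    """members: leaf maps (path -> text) for one cluster; annotate: callable on variable parts."""
    shared = set.intersection(*(set(m) for m in members))
    rule = {}
    for path in sorted(shared):
        values = [m[path] for m in members]
        if len(set(values)) > 1:  # the value varies across the cluster's messages
            pre, suf = split_constant(values)
            parts = [variable_part(v, pre, suf) for v in values]
            rule[path] = {"pre": pre, "suf": suf, "annotation": annotate(parts)}
    return rule


def extract(rule, leaves):
    """Apply a rule to another message: look up each path, strip constants, pair with annotation."""
    return [
        (spec["annotation"], variable_part(leaves[path], spec["pre"], spec["suf"]))
        for path, spec in rule.items()
        if path in leaves
    ]
```

Splitting on tokens rather than characters avoids trimming digits that happen to be shared, such as the trailing zeros of `A-100` and `B-200`, at the cost of missing constants glued to the variable value inside one token.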
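Claims 3 and 4 annotate a variable value either by looking its instances up in dictionaries, each tied to an annotation and subject to a threshold confidence level, or by matching a pattern recognition analyzer; claim 1 then refines the annotation using the value of another expression in the cluster. The Python sketch below is one possible reading under stated assumptions: confidence is taken as the fraction of messages whose value appears in a dictionary, and the dictionaries, regular expressions, refinement keywords and the 0.8 threshold are invented for illustration. A keyword lookup on a neighbouring label stands in for refinement; claim 6 instead describes a machine-learned refinement model, which this sketch does not attempt.

```python
import re

# Hypothetical dictionaries; each dictionary carries one annotation.
DICTIONARIES = {
    "carrier": {"ups", "fedex", "usps", "dhl"},
    "city": {"paris", "london", "tokyo"},
}

# Hypothetical pattern recognisers, tried in order; each must match the whole value.
PATTERNS = [
    ("price", re.compile(r"\$\d+(?:\.\d{2})?")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("order_number", re.compile(r"#?[A-Z0-9-]{4,}")),
]

# Hypothetical refinement table: keyword in a neighbouring label -> refined annotation.
REFINEMENTS = {
    "date": [("deliver", "delivery_date"), ("ship", "ship_date")],
    "city": [("depart", "departure_city"), ("arriv", "arrival_city")],
}


def annotate(values, threshold=0.8):
    """Annotate a variable value from its instances across several messages."""
    if not values:
        return "unknown"
    for annotation, words in DICTIONARIES.items():
        # Confidence = fraction of messages whose value is found in this dictionary.
        hits = sum(v.strip().lower() in words for v in values)
        if hits / len(values) >= threshold:
            return annotation
    for annotation, pattern in PATTERNS:
        if all(pattern.fullmatch(v.strip()) for v in values):
            return annotation
    return "unknown"


def refine(annotation, neighbour_text):
    """Use another expression's value (e.g. a label such as 'Delivery date:') to refine."""
    label = neighbour_text.lower()
    for keyword, refined in REFINEMENTS.get(annotation, []):
        if keyword in label:
            return refined
    return annotation
```

For example, values `2023-05-30` and `2023-06-01` are annotated `date` by the pattern step, and a neighbouring leaf reading `Estimated delivery:` refines that to `delivery_date`.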

Assignees

Yahoo Assets Llc

Classifications

  • Machine learning (CPC title)

  • Inference or reasoning models (CPC title)

  • G06F16/345 (primary): Summarisation for human users

  • Querying (CPC title)

  • G06F16/35 (primary): Clustering; Classification

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11663259B2 cover?
A method for automatically generating data extraction rules for electronic messages. Messages are clustered by a signature of their DOM structure, values that vary within a cluster are annotated (for example via dictionaries or pattern recognition) and refined, and the resulting rules are used to extract data from new messages and pass it on, for example to a summary display, a search index, or a recommendation system.
Who is the assignee on this patent?
Yahoo Assets Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/345. Mapped technology areas include Physics.
When was this patent published?
Published May 30, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).