Extracting actionable information from emails

US2018024986A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2018024986-A1
Application number  US-201615215286-A
Country             US
Kind code           A1
Filing date         Jul 20, 2016
Priority date       Jul 20, 2016
Publication date    Jan 25, 2018
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided for extracting actionable information from emails in a completely unsupervised manner with no need for the data to be labeled (i.e., the systems and methods do not need a human to identify unlabeled or relabeled emails). Changes in the email structure are automatically incorporated to learn new templates through the novel concept of sub-templates. The systems and methods incorporate the minor variations in email structure seamlessly, without needing to introduce new templates. Email templates are computed as permutations of multiple sub-templates in the email, which allows the systems and methods to handle variations in email structure seamlessly and highly efficiently. These systems and methods are extendable to any domain using structured emails, and improve the efficiency of the systems that receive and act on information contained in emails.
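The sub-template idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `SubTemplateLibrary` class, the `identify_template` function, and the use of a tuple of HTML tag names as a structural signature are all assumptions made for illustration. The sketch shows how a template might be recorded as an ordered combination of sub-template ids, so that a reordered email reuses known sub-templates instead of requiring new ones.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical structural signature for one portion of an email,
# e.g. the sequence of HTML tag names in that portion.
Signature = Tuple[str, ...]


class SubTemplateLibrary:
    """Illustrative store that maps each known signature to a sub-template id."""

    def __init__(self) -> None:
        self._ids: Dict[Signature, int] = {}

    def __len__(self) -> int:
        return len(self._ids)

    def match_or_add(self, signature: Signature) -> int:
        """Return the id of a matching sub-template, saving a new one if none matches."""
        if signature not in self._ids:
            self._ids[signature] = len(self._ids)
        return self._ids[signature]


def identify_template(library: SubTemplateLibrary,
                      portions: Iterable[Signature]) -> Tuple[int, ...]:
    """Describe a template as the ordered tuple of its sub-template ids."""
    return tuple(library.match_or_add(p) for p in portions)


# Two emails with the same building blocks in a different order.
library = SubTemplateLibrary()
greeting = ("p",)
flight = ("table", "tr", "td", "td")
hotel = ("ul", "li", "li")

first = identify_template(library, [greeting, flight, hotel])
second = identify_template(library, [greeting, hotel, flight])
print(first, second, len(library))
```

In this sketch the second email yields a different template tuple, yet the library still holds only three sub-templates, which is one way to read the abstract's statement that structural variations are absorbed without introducing new ones.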

First claim

Opening claim text (preview).

We claim:

1. A method for improving efficiency of a computing device used in extracting actionable information from a message, comprising: receiving a message; parsing the message; identifying one or more keywords from a dictionary in the parsed message; separating the message into nodes; generating node scores for the nodes; identifying an area of interest based at least in part on the node scores; correlating the area of interest to one or more sub-templates; identifying a template based on the one or more sub-templates; and extracting actionable information from the message based on the identified template.

2. The method of claim 1, wherein the message is an email message formatted according to the Hypertext Markup Language.

3. The method of claim 1, further comprising: identifying a language of the parsed message; and selecting the dictionary based on the identified language.

4. The method of claim 1, wherein the message is separated into nodes based on a hierarchical structure in which the message is composed.

5. The method of claim 1, wherein the node scores are generated by: incrementing a given node score based on an associated node containing at least one keyword; and adding the node scores of child nodes of a particular node to a particular node score associated with the particular node.

6. The method of claim 1, wherein the identified area of interest comprises a given node and child nodes of the given node wherein the given node has a highest node score furthest from a root node.

7. The method of claim 6, wherein the given node has the highest node score that is selected from the node scores at least a set number of tiers below the root node.

8. The method of claim 1, wherein correlating the area of interest to the one or more sub-templates further comprises: determining whether portions of the area of interest match one or more existing sub-templates; in response to determining that a given portion of the area of interest matches a given existing sub-template, selecting the given sub-template; and in response to determining that the given portion of the area of interest does not match the one or more existing sub-templates, saving the given portion as a new sub-template.

9. The method of claim 1, wherein the actionable information extracted from the message is added to the dictionary.

10. The method of claim 1, further comprising transmitting the extracted actionable information to one of: a personal digital assistant; or a calendar application.

11. A system for improving efficiency of a computing device in extracting actionable information from a message, comprising: a parser, operable to receive a message and break the message into nodes based on a structure of the message, wherein the nodes are organized according to a tree structure; a domain dictionary, in communication with the parser; a template library, in communication with the parser; wherein the parser is further operable to identify keywords in nodes of the message matching entries in the domain dictionary and assign node scores to each of the nodes based on keyword presence; wherein the parser is further operable to identify an area of interest in the message comprising a given node and child nodes of the given node based on the given node having a highest node score that is furthest from a root of the tree structure; wherein the parser is further operable to identify a template for the message from the template library; and wherein the parser is further operable to extract actionable information from the area of interest based on the template.

12. The system of claim 11, wherein the extracted actionable information is transmitted to a personal digital assistant and integrated into a calendar application.

13. The system of claim 11, wherein the keywords include: names; dates; holidays; and times.

14. The system of claim 11, wherein the domain dictionary is selected from a plurality of domain dictionaries based on a sender of the message.

15. The system of claim 11, wherein the domain dictionary is built based on text included in the area of interest.

16. The system of claim 11, wherein the template library is built based on identifying tree structures and node scores from portions of the area of interest that are repeated in multiple messages.

17. A computer readable storage device including instructions, which when executed by a processor are operable to: define a plurality of nodes of an email message, the plurality of nodes arranged in a tree structure based on a structure of the email message; parse the email message according to a domain dictionary to identify keywords from the domain dictionary included in leaf nodes in the tree structure; increment a node score for each leaf node that includes at least one keyword; combine node scores of each child node of the tree structure at a parent node; identify a node in the tree structure having a highest node score; define the node having the highest node score and child nodes of the node having the highest node score as a core region in the email message; identify one or more sub-templates having tree structures and node scores matching tree structures and node scores of one or more portions of the core region; identify an ur-template that includes the one or more sub-templates; and extract actionable information from the core region based on the ur-template.

18. The computer readable storage device of claim 17, wherein when the highest node score is shared by multiple nodes, a node of the multiple nodes sharing the highest node score located furthest from the root node in the tree structure is selected as the node with the highest node score.

19. The computer readable storage device of claim 17, wherein when the tree structures and the node scores of the one or more portions of the email message do not match a known sub-template, the tree structures and the node scores of the one or more portions are saved as new sub-templates.

20. The computer readable storage device of claim 17, wherein the structure of the email message is formatted according to Hypertext Markup Language and the nodes and the tree structure are determined based on tags of the elements.
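The node scoring and area-of-interest selection described in claims 5 to 7 and 17 to 18 can be sketched as follows. This is a hedged illustration rather than the claimed implementation: the `Node` class, the `score_tree` and `area_of_interest` functions, the whitespace-based keyword matching, and the sample keywords are all assumptions. Each node gains one point if its own text contains a dictionary keyword, child scores are added to the parent, and the node with the highest score that lies furthest from the root is selected.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    """Hypothetical message node: its own text plus child nodes."""
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    score: int = 0


def score_tree(node: Node, keywords: Set[str]) -> int:
    """Score a node: +1 if its text has a keyword, plus the scores of its children."""
    words = {w.strip(".,:;").lower() for w in node.text.split()}
    own = 1 if words & keywords else 0
    node.score = own + sum(score_tree(child, keywords) for child in node.children)
    return node.score


def area_of_interest(root: Node, min_depth: int = 0) -> Node:
    """Pick the highest-scoring node, preferring the one furthest from the root.

    min_depth optionally restricts candidates to nodes at least that many
    tiers below the root.
    """
    best: Optional[Tuple[int, int, Node]] = None  # (score, depth, node)
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if depth >= min_depth:
            if best is None or (node.score, depth) > (best[0], best[1]):
                best = (node.score, depth, node)
        stack.extend((child, depth + 1) for child in node.children)
    if best is None:
        raise ValueError("no node at or below min_depth")
    return best[2]


# Example tree: only the 'details' subtree contains dictionary keywords.
keywords = {"flight", "departure"}  # illustrative dictionary entries
details = Node(children=[Node("Flight AB123"), Node("Departure 10:30")])
body = Node(children=[details, Node("Unsubscribe here")])
root = Node(children=[Node("Thanks for your order"), body])

score_tree(root, keywords)
selected = area_of_interest(root)
print(root.score, body.score, details.score, selected is details)
```

Here `root`, `body`, and `details` all end up with the same score, so the depth tie-break is what narrows the selection to `details`, the smallest subtree that still contains every keyword hit.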

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Dictionaries · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Computer-aided management of electronic mailing [e-mailing] · CPC title

  • Tree-structured documents (parsing G06F40/205; validation G06F40/226) · CPC title

  • Templates · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018024986A1 cover?
Systems and methods are provided for extracting actionable information from emails in a completely unsupervised manner with no need for the data to be labeled (i.e., the systems and methods do not need a human to identify unlabeled or relabeled emails). Changes in the email structure are automatically incorporated to learn new templates through the novel concept of sub-templates. The systems and met…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F17/272. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 25, 2018 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).