Generating and applying a trained structured machine learning model for determining a semantic label for content of a transient segment of a communication

US10540610B1 · US · B1

Patent metadata
Field               Value
Publication number  US-10540610-B1
Application number  US-201615139807-A
Country             US
Kind code           B1
Filing date         Apr 27, 2016
Priority date       Aug 8, 2015
Publication date    Jan 21, 2020
Grant date          Jan 21, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, apparatus, and computer-readable media are provided for analyzing a cluster of communications, such as B2C emails, to generate a template for the cluster that defines transient segments and fixed segments of the cluster of communications. More particularly, methods, apparatus, and computer-readable media are provided for generating and/or applying a trained structured machine learning model for a generated template that can be used to determine, for one or more transient segments of subsequent communications, a corresponding probability that a given semantic label is the correct semantic label for extracted content of the transient segment(s).
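The split between transient and fixed segments can be illustrated with a minimal Python sketch. It assumes the communications of a cluster have already been aligned into equal-length lists of segments, and it measures variability as the ratio of distinct values at each position. The function name `build_template`, the ratio measure, and the 0.5 threshold are illustrative assumptions, not details taken from the patent.

```python
def build_template(cluster, variability_threshold=0.5):
    """Label each segment position as "fixed" or "transient".

    cluster: list of communications, each a list of segment strings
    aligned by position (an assumption made for this sketch).
    """
    if not cluster:
        return []
    template = []
    for pos in range(len(cluster[0])):
        values = [comm[pos] for comm in cluster]
        # Hypothetical variability measure: share of distinct values.
        variability = len(set(values)) / len(values)
        kind = "transient" if variability > variability_threshold else "fixed"
        template.append((pos, kind))
    return template


cluster = [
    ["Your order", "#1001", "has shipped to", "Alice"],
    ["Your order", "#1002", "has shipped to", "Bob"],
    ["Your order", "#1003", "has shipped to", "Carol"],
]
template = build_template(cluster)
print(template)
```

In this toy cluster the boilerplate phrases repeat in every message and are labeled fixed, while the order number and recipient name differ in every message and are labeled transient. The resulting list also records the order of the segments, which is what a template is said to define.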

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: grouping a corpus of electronic communications into a plurality of clusters based on metadata associated with each communication; identifying, from communications of a particular cluster, a set of segments; classifying a plurality of the segments of the set of segments as transient segments, wherein classifying a given segment of the segments as a transient segment is based on determining variability of content of the given segment across the particular cluster satisfies one or more criteria, and wherein the classifying is performed without human access to content of the communications; classifying a plurality of the segments of the set of segments as fixed segments based on variability of content of the fixed segments across the particular cluster; generating a template for the cluster, the template defining an order of the transient segments, wherein the order of the transient segments is based on the particular cluster; for each communication of a training set of the communications of the particular cluster, annotating each of one or more of the transient segments with at least one corresponding semantic label, the annotating performed without human access to the content of the communications; generating training examples that each define a plurality of features for a corresponding one of the communications of the training set, the features including at least the annotated semantic labels for the transient segments and the order for the transient segments; and training a structured machine learning model for the template using the training examples, the trained structured machine learning model defining parameters for determining, for one or more of the transient segments, a corresponding probability that a given semantic label of the semantic labels is a correct label.

2. The computer-implemented method of claim 1, further comprising: identifying an additional communication as matching the template; selecting the trained structured machine learning model for the additional communication based on it being assigned to the template; and applying the assigned trained structured machine learning model to the additional communication to determine the probability that the given semantic label is a correct label for content of one of the transient segments in the additional communication.

3. The computer-implemented method of claim 1, wherein the semantic label annotated for at least one transient segment of the transient segments is the given semantic label and further comprising, for each of the communications of the training set: annotating the transient segment with a first probability that the given semantic label is correct for the transient segment for the communication, and annotating the transient segment with a second probability that the given semantic label is incorrect for the transient segment for the communication; wherein the features of the training examples further include weights for each of the training examples determined based on a corresponding one of the first and second probabilities.

4. The computer-implemented method of claim 3, wherein the trained structured machine learning model is a conditional random field machine learning model that is trained based on an expectation maximization algorithm.

5. The computer-implemented method of claim 1, further comprising: providing, as input to a classifier, one or more properties of content of the given transient segment in the communication; and receiving, as output from the classifier, a probability that the corresponding semantic label is correct for the given transient segment for the communication.

6. The computer-implemented method of claim 5, further comprising: annotating, for the communication, the given transient segment with the probability of the corresponding semantic label.

7. The computer-implemented method of claim 6, wherein the features for a training example for the communication include a weight of the training example that is determined based on the probability.

8. The computer-implemented method of claim 7, wherein the trained structured machine learning model is a conditional random field machine learning model.

9. The computer-implemented method of claim 5, further comprising: annotating the given transient segment with the corresponding semantic label for the communication only when the probability satisfies a threshold.

10. The computer-implemented method of claim 6, wherein annotating an additional transient segment of the transient segments with at least one corresponding semantic label for the communication of the training set comprises: determining the corresponding semantic label for the additional transient segment based on a regular expression or heuristics.

11. The computer-implemented method of claim 10, wherein the corresponding semantic label for the additional transient segment is indicative of price, order number, or tracking number.

12. The computer-implemented method of claim 11, wherein the corresponding semantic label for the given transient segment is indicative of a product name.

13. The computer-implemented method of claim 1, wherein the features of the training examples further include features of the fixed segments and an order of the fixed segments relative to one another and relative to the transient segments.

14. The computer-implemented method of claim 1, further comprising: applying the trained structured machine learning model to determine a probability for a given semantic label for a given transient segment; and assigning the given semantic label to the given transient segment in the template based on the probability satisfying a threshold.

15. A computer-implemented method, comprising: grouping a corpus of electronic communications into a plurality of clusters based on metadata associated with each communication; identifying, from communications of a particular cluster, a set of segments; classifying a plurality of the segments of the set of segments as transient segments, wherein classifying a given segment of the segments as a transient segment is based on determining variability of content of the given segment across the particular cluster satisfies one or more criteria, and wherein the classifying is performed without human access to content of the communications; generating a template for the cluster, the template defining an order of the transient segments, wherein the order of the transient segments is based on the particular cluster; for each communication of a training set of the communications of the particular cluster: annotating each of one or more of the transient segments with at least one corresponding semantic label, wherein the annotating is performed without human access to the content of the communications, and wherein the semantic label annotated for at least one transient segment of the transient segments is a given semantic label, annotating the at least one transient segment with a first probability that the given semantic label is correct for the at least one transient segment for the communication, and annotating the at least one transient segment with a second probability that the given semantic label is incorrect for the at least one transient segment for the communication; generating training examples that each define a plurality of features for a corresponding one of the communications of the training set, the features including at least the annotated semantic labels for the transient segments, the order for the transient segments, and weights for each of
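The annotation steps recited in the claims (labels from a regular expression, a classifier probability checked against a threshold, and training examples weighted by the probability that a label is correct or incorrect) might look roughly like the following Python sketch. The patterns, the function names, the 0.9 threshold, and the choice to emit two weighted examples per segment with the second probability taken as one minus the first are all assumptions made for illustration; the claims do not specify them.

```python
import re

# Hypothetical patterns; the claims name price and order number as labels
# but do not give the expressions themselves.
LABEL_PATTERNS = {
    "price": re.compile(r"^\$\d+(?:\.\d{2})?$"),
    "order_number": re.compile(r"^#\d{4,}$"),
}


def annotate_with_regex(text):
    """Return the first label whose pattern matches, or None."""
    for label, pattern in LABEL_PATTERNS.items():
        if pattern.match(text):
            return label
    return None


def annotate_if_confident(label, probability, threshold=0.9):
    """Keep a classifier-proposed label only when its probability
    satisfies the (assumed) threshold."""
    return label if probability >= threshold else None


def weighted_examples(segment_text, label, p_correct):
    """One possible reading of the weighting: emit a 'label is correct'
    example and a 'label is incorrect' example, each weighted by the
    corresponding probability."""
    return [
        {"text": segment_text, "label": label, "correct": True,
         "weight": p_correct},
        {"text": segment_text, "label": label, "correct": False,
         "weight": 1.0 - p_correct},
    ]


examples = weighted_examples("Blue Widget", "product_name", 0.8)
for example in examples:
    print(example)
```

A structured model such as a conditional random field would then be trained on these weighted examples together with the segment order from the template. That training step is omitted here because its details are not given in the text above.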

Assignees

Google Inc, Google LLC

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • H04L51/42 · Primary

    Mailbox-related aspects, e.g. synchronisation of mailboxes · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • Electricity · mapped topic

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10540610B1 cover?
Methods, apparatus, and computer-readable media are provided for analyzing a cluster of communications, such as B2C emails, to generate a template for the cluster that defines transient segments and fixed segments of the cluster of communications. More particularly, methods, apparatus, and computer-readable media are provided for generating and/or applying a trained structured machine learning …
Who is the assignee on this patent?
Google Inc, Google LLC
What technology area does this patent fall under?
Primary CPC classification H04L51/42. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Jan 21, 2020 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).