Systems and methods for identifying and categorizing electronic documents through machine learning
US-9514414-B1 · Dec 6, 2016 · US
US10120928B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10120928-B2 |
| Application number | US-201414318968-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 30, 2014 |
| Priority date | Jun 24, 2014 |
| Publication date | Nov 6, 2018 |
| Grant date | Nov 6, 2018 |
Official abstract text for this publication.
The current document is directed to methods and systems for processing, classifying, and efficiently storing large volumes of event messages generated in modern computing systems. In a disclosed implementation, received event messages are assigned to event-message clusters based on non-parameter tokens identified within the event messages. A parsing function is generated for each cluster that is used to extract data from incoming event messages and to prepare event records from event messages that more efficiently and accessibly store event information. The parsing functions also provide an alternative basis for assignment of event messages to clusters.
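The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the digit-based test for parameter tokens, the use of the exact non-parameter token tuple as a cluster key, and the names `non_parameter_tokens`, `parameter_tokens`, `process`, and `clusters` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: any whitespace-delimited token containing a digit is a
# parameter (timestamp, host id, counter). Real systems need richer rules.
PARAM_TOKEN = re.compile(r"\d")

def non_parameter_tokens(message):
    """Return the tokens that are treated as fixed text of the message."""
    return tuple(t for t in message.split() if not PARAM_TOKEN.search(t))

def parameter_tokens(message):
    """Return the tokens that are treated as variable data values."""
    return [t for t in message.split() if PARAM_TOKEN.search(t)]

# Hypothetical in-memory store: cluster key -> list of event records.
clusters = defaultdict(list)

def process(message):
    """Assign a message to a cluster and store a compact event record."""
    key = non_parameter_tokens(message)        # cluster chosen by fixed text
    record = {"cluster": key, "values": parameter_tokens(message)}
    clusters[key].append(record)               # record kept with its cluster
    return record

r1 = process("2014-06-30 host12 disk error on volume 7")
r2 = process("2014-07-01 host3 disk error on volume 2")
```

Both messages share the same non-parameter tokens, so they land in one cluster, and each record keeps only the values that differ. Using the exact token tuple as the key is a simplification; the claims describe computing a metric from the non-parameter tokens instead.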
Opening claim text (preview).
The invention claimed is:
1. An event-message clustering system comprising: one or more processors; one or more memories; and computer instructions, stored in one or more of the one or more memories that, when executed by one or more of the one or more processors, control the event-message clustering system to receive event messages, and process each of the received event messages by determining a cluster to which to assign the event message, employing a parsing function associated with the determined cluster to extract data values from the event message, generating an event record corresponding to the event message that includes the extracted data values, and storing the event record within, or associated with, the selected cluster in a physical data-storage device.
2. The event-message clustering system of claim 1 wherein the parsing function is based on a regular expression.
3. The event-message clustering system of claim 1 wherein the parsing function is generated for a cluster by determining the non-variable portions common to a number of event messages assigned to the cluster; and generating a regular expression that includes literals representing the determined non-variable portions common to a number of event messages assigned to the cluster and that includes any-substring-matching sub-regular expressions to represent the variable portions.
4. The event-message clustering system of claim 3 wherein the generated parsing function is rendered more specific to the cluster by: identifying data values having particular data types encoded within each of the determined non-variable portions common to the number of event messages; and modifying the generated parsing function to include sub-regular expressions that match the data types of the identified data values to produce a final regular expression.
5. The event-message clustering system of claim 4 wherein the final regular expression is supplied to, or incorporated within, a search function that extracts data values from received event messages based on matching the final regular expression to entire, or portions of, received event messages.
6. The event-message clustering system of claim 1 wherein the event-message clustering system maintains the clusters by periodically monitoring the clusters and carrying out maintenance operations.
7. The event-message clustering system of claim 6 wherein, when, during periodic monitoring, the event-message clustering system identifies a cluster for which a parsing function has not yet been generated and with which more than a threshold number of event messages are associated, the event-message clustering system generates a parsing function for the cluster and associates the parsing function with the cluster.
8. The event-message clustering system of claim 7 wherein, when, during periodic monitoring, the event-message clustering system identifies a first cluster associated with a first parsing function and a second cluster with a second parsing function for which the first parsing function can be successfully applied to event messages assigned to the second cluster and the second parsing function can be successfully applied to event messages assigned to the first cluster, the event-message clustering system merges the two clusters into a single cluster.
9. The event-message clustering system of claim 7 wherein, when, during periodic monitoring, the event-message clustering system identifies a first cluster to which a number of event messages have been assigned that is less than a threshold fraction of the number of events assigned to a second cluster, the event-message clustering system analyzes the first and second clusters to determine whether or not to merge the first and second cluster into a single cluster.
10. The event-message clustering system of claim 7 wherein, when, during periodic monitoring, the event-message clustering system identifies a cluster associated with a parsing function that cannot be successfully applied to more than a threshold number of event messages assigned to the cluster, the event-message clustering system splits the cluster into a pair of clusters that includes the cluster and a new cluster and assigns the more than a threshold number of event messages to the new cluster.
11. The event-message clustering system of claim 7 wherein determining a cluster to which to assign the event message further includes: normalizing the event message to identify parameter tokens within the event message; computing, using non-parameter tokens within the event message, a metric to represent the event message; and using the metric to select an event-message cluster to which to assign the event message.
12. The event-message clustering system of claim 7 wherein determining a cluster to which to assign the event message further includes: identifying a parsing function associated with a cluster that can be successfully applied to the event message; and selecting the identified cluster as the event-message cluster to which to assign the event message.
13. A method that processes event messages, carried out within an event-message clustering system, the event-message clustering system having one or more processors, one or more memories, and computer instructions, stored in one or more of the one or more memories that, when executed by one or more of the one or more processors, control the event-message clustering system to receive event messages and process each of the received event messages, the method comprising: receiving event messages, and processing each of the received event messages by determining a cluster to which to assign the event message, employing a parsing function associated with the determined cluster to extract data values from the event message, generating an event record corresponding to the event message that includes the extracted data values, and storing the event record within, or associated with, the selected cluster in a physical data-storage device.
14. The method of claim 13 further including: generating a parsing function for a cluster by: determining the non-variable portions common to a number of event messages assigned to the cluster; and generating a regular expression that includes literals representing the determined non-variable portions common to a number of event messages assigned to the cluster and that includes any-substring-matching sub-regular expressions to represent the variable portions.
15. The method of claim 14 wherein the generated parsing function is rendered more specific to the cluster by: identifying data values having particular data types encoded within each of the determined non-variable portions common to the number of event messages; and modifying the generated parsing function to include sub-regular expressions that match the data types of the identified data values to produce a final regular expression.
16. The method of claim 15 wherein the final regular expression is supplied to, or incorporated within, a search function that extracts data values from received event messages based on matching the final regular expression to entire, or portions of, received event messages.
17. The method of claim 13 further including: maintaining the clusters by periodically monitoring the clusters and carrying out maintenance operations.
18. The method of claim 17 further including: when, during periodic monitoring, a cluster for which a parsing function has not yet been generated and with which more than a threshold number of event messages are assoc
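The regular-expression generation recited in the claims (literals for the non-variable portions, matching sub-expressions for the variable portions, then data-type refinement) can be sketched as follows. This is only an illustration under stated assumptions: messages are compared token by token, every message in a cluster is assumed to have the same token count, integers are the only refined data type, and `build_parsing_regex` and `parse` are hypothetical names.

```python
import re

def build_parsing_regex(messages):
    """Build a parsing regex from messages assumed to share one cluster."""
    rows = [m.split() for m in messages]
    # Assumption of this sketch: all messages have the same number of tokens.
    if len({len(r) for r in rows}) != 1:
        raise ValueError("sketch requires equal token counts")
    parts = []
    for column in zip(*rows):
        if len(set(column)) == 1:
            # Non-variable portion: emit it as an escaped literal.
            parts.append(re.escape(column[0]))
        elif all(re.fullmatch(r"\d+", tok) for tok in column):
            # Data-type refinement: every observed value is an integer.
            parts.append(r"(\d+)")
        else:
            # Variable portion: match any run of non-space characters.
            parts.append(r"(\S+)")
    return re.compile(r"\s+".join(parts))

def parse(regex, message):
    """Return extracted data values, or None if the regex does not apply."""
    m = regex.fullmatch(message)
    return list(m.groups()) if m else None

rx = build_parsing_regex([
    "user alice logged in from 10.0.0.1 port 22",
    "user bob logged in from 10.0.0.7 port 443",
])
values = parse(rx, "user carol logged in from 192.168.1.5 port 8080")
miss = parse(rx, "disk failure on volume 3")
```

A `None` result from `parse` is the kind of signal the maintenance claims rely on: a parsing function that fails on many of a cluster's messages suggests a split, while two parsing functions that each succeed on the other cluster's messages suggest a merge.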
using filtering, e.g. reduction of information by using priority, element types, position or time · CPC title
Physics · mapped topic
using virtualisation of network functions or resources, e.g. SDN or NFV entities · CPC title
into predefined classes · CPC title