Visualization framework based on document representation learning
US-2018196873-A1 · Jul 12, 2018 · US
US11238230B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11238230-B2 |
| Application number | US-201816135822-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 19, 2018 |
| Priority date | Sep 19, 2018 |
| Publication date | Feb 1, 2022 |
| Grant date | Feb 1, 2022 |
Official abstract text for this publication.
Word vectors are multi-dimensional vectors that represent words in a corpus of text and that are embedded in a semantically-encoded vector space. Word vectors can be used for sentiment analysis, comparison of the topic or content of sentences, paragraphs, or other passages of text or other natural language processing tasks. However, the generation of word vectors can be computationally expensive. Accordingly, when a set of word vectors is needed for a particular corpus of text, a set of word vectors previously generated from a corpus of text that is sufficiently similar to the particular corpus of text, with respect to some criteria, may be re-used for the particular corpus of text. Such similarity could include the two corpora of text containing the same or similar sets of words or containing incident reports or other time-coded sets of text from overlapping or otherwise similar periods of time.
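The re-use decision described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the choice of overlap fractions as similarity measures, and the treatment of a similarity exactly equal to the threshold (handled here as "retrain") are all assumptions made for the example.

```python
from datetime import date


def period_overlap_fraction(first, second):
    """Fraction of the second time period that falls inside the first.

    Each period is a (start, end) pair of dates. Hypothetical helper.
    """
    (f_start, f_end), (s_start, s_end) = first, second
    second_days = (s_end - s_start).days
    if second_days <= 0:
        return 0.0
    overlap_days = (min(f_end, s_end) - max(f_start, s_start)).days
    return max(0, overlap_days) / second_days


def vocabulary_overlap_fraction(first_words, second_words):
    """Fraction of the second corpus's words that also occur in the first."""
    second_set = set(second_words)
    if not second_set:
        return 0.0
    return len(second_set & set(first_words)) / len(second_set)


def choose_encoder(similarity, threshold, cached_encoder, train_new_encoder):
    """Re-use the cached encoder when the corpora are similar enough.

    Assumption: a similarity equal to the threshold triggers retraining.
    """
    if similarity > threshold:
        return cached_encoder
    return train_new_encoder()
```

For example, a second criterion whose time period lies entirely inside the first yields an overlap fraction of 1.0, so any threshold below 1.0 would select the previously trained encoder rather than paying the cost of training a new one.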
Opening claim text (preview).
What is claimed is:
1. A system comprising: a database containing (i) a corpus of incident reports relating to operation of a managed network, wherein each incident report contains set of fields, each field containing a text string, (ii) a first criterion representative of a first subset of incident reports from the corpus of incident reports, and (iii) a first artificial neural network (ANN) that includes a first encoder and that has been trained on the first subset of incident reports such that the first encoder can generate word vector representations within a first semantically encoded vector space for words present in text strings of the first subset of incident reports; and a server device configured to: obtain a second criterion representative of a second subset of incident reports from the corpus of incident reports; determine a similarity between the first criterion and the second criterion; and generate word vector representations for words present in text strings of incident reports of the second subset of incident reports, wherein generating the word vector representations when the similarity is greater than a threshold similarity level comprises using the first encoder to generate the word vector representations, and wherein generating the word vector representations when the similarity is less than a threshold similarity level comprises: (i) generating a second ANN that includes a second encoder, wherein generating the second ANN comprises training the second ANN on the second subset of incident reports such that the second encoder can generate word vector representations within a second semantically encoded vector space for words present in text strings of the second subset of incident reports, and (ii) using the second encoder to generate the word vector representations.
2. The system of claim 1, wherein the server device is further configured to, when the similarity is greater than the threshold similarity level: generate, for each of the incident reports in the second subset, an aggregate vector representation, wherein the aggregate vector representation for a given incident report is a combination of first-encoder-generated word vector representations of words present in text strings of the given incident report; obtain an additional incident report that contains an enumerated set of fields and that satisfies the second criterion, each field containing a text string; generate an aggregate vector representation for the additional incident report by (i) using the first encoder to generate word vector representations within the first semantically encoded vector space for words present in text strings of the additional incident report, and (ii) combining the first-encoder-generated word vector representations of words present in text strings of the additional incident report; compare the aggregate vector representation of each of the incident reports in the second subset to the aggregate vector representation for the additional incident report; based on the comparison, identify a relevant subset of the second subset; and transmit, to a client device, the relevant subset of incident reports.
3. The system of claim 1, wherein the server device is further configured to, when the similarity is greater than the threshold similarity level: generate, for each of the incident reports in the second subset, an aggregate vector representation, wherein the aggregate vector representation for a given incident report is a combination of first-encoder-generated word vector representations of words present in text strings of the given incident report; compare the generated aggregate vector representations of the incident reports in the second subset to identify one or more clusters of related incident reports within the second subset; and transmit, to a client device, incident reports of a first cluster of the identified one or more clusters of related incident reports within the second subset.
4. The system of claim 1, wherein the similarity being greater than the threshold similarity level comprises a time period represented by the second criterion overlapping with a time period represented by the first criterion by more than a threshold amount.
5. The system of claim 1, wherein the similarity being greater than the threshold similarity level comprises a time period represented by the second criterion being a subset of a time period represented by the first criterion.
6. The system of claim 1, wherein the similarity being greater than the threshold similarity level comprises the first subset containing more than a threshold amount of incident reports from the second subset.
7. The system of claim 6, wherein the similarity being greater than the threshold similarity level additionally comprises the second subset containing more than a threshold amount of incident reports from the first subset.
8. The system of claim 1, wherein the similarity being greater than the threshold similarity level comprises a set of words present in the text strings of the second subset overlapping with a set of words present in the text strings of the first subset by more than a threshold amount.
9. The system of claim 1, wherein the similarity being greater than the threshold similarity level comprises a set of words present in the text strings of the second subset including only words that are present in the text strings of the first subset.
10. The system of claim 1, wherein the database additionally contains a first hash representative of the first criterion, wherein the server device is additionally configured to determine a second hash representative of the second criterion, wherein determining the similarity between the first criterion and the second criterion comprises determining a similarity between the first hash and the second hash, and wherein the similarity between the first criterion and the second criterion being greater than the threshold similarity level comprises the similarity between the first hash and the second hash being greater than a threshold amount.
11. The system of claim 1, wherein the first ANN has been trained on the first subset of incident reports such that the first encoder can additionally generate, for a given incident report of the first subset, at least one paragraph vector representation within the first semantically encoded vector space for words present in text strings of a subset of the fields of the given incident report, and wherein the server device is additionally configured to, when the similarity is greater than the threshold similarity level: use the first ANN to generate, for each incident report in the second set of incident reports, at least one paragraph vector representation within the first semantically encoded vector space of words present in text strings of the second set of incident reports.
12. A method comprising: accessing a database that contains (i) a corpus of incident reports relating to operation of a managed network, wherein each incident report contains set of fields, each field containing a text string, (ii) a first criterion representative of a first subset of incident reports from the corpus of incident reports, and (iii) a first artificial neural network (ANN) that includes a first encoder and that has been trained on the first subset of incident reports such that the first encoder can generate word vector representations within a first semantically encoded vector space for words present in text strings of the first subset of incident reports; obtaining a second criterion representative of a second subset of incident reports from the corpus of incident reports; determining that a first similarity between the fir
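Claims 2 and 3 refer to an aggregate vector representation formed by combining the word vectors of an incident report, and to comparing those aggregates. A minimal sketch of that idea follows. The claims do not fix how vectors are combined or compared, so the use of an element-wise mean and cosine similarity here is an assumption, as are all function names and the toy `word_vectors` dictionary standing in for an encoder's output.

```python
import math


def aggregate_vector(words, word_vectors):
    """Element-wise mean of the vectors of known words (assumed combination).

    Words without a vector are skipped; returns None if no word is known.
    """
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_reports(new_report_words, reports, word_vectors, top_k=2):
    """Rank stored reports by similarity to a new report (hypothetical helper).

    `reports` maps a report identifier to its list of words.
    """
    query = aggregate_vector(new_report_words, word_vectors)
    if query is None:
        return []
    scored = []
    for report_id, words in reports.items():
        vec = aggregate_vector(words, word_vectors)
        if vec is not None:
            scored.append((cosine_similarity(query, vec), report_id))
    scored.sort(reverse=True)
    return [report_id for _, report_id in scored[:top_k]]
```

The same aggregate vectors could also feed a clustering step, as in claim 3; that step is omitted here to keep the sketch short.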
Backpropagation, e.g. using gradient descent · CPC title
Activation functions · CPC title
Knowledge-based neural networks; Logical representations of neural networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Combinations of networks · CPC title