Persistent word vector input to multiple machine learning models

US11238230B2 · US · B2

Patent metadata
Publication number: US-11238230-B2
Application number: US-201816135822-A
Country: US
Kind code: B2
Filing date: Sep 19, 2018
Priority date: Sep 19, 2018
Publication date: Feb 1, 2022
Grant date: Feb 1, 2022

Abstract

Official abstract text for this publication.

Word vectors are multi-dimensional vectors that represent words in a corpus of text and that are embedded in a semantically-encoded vector space. Word vectors can be used for sentiment analysis, comparison of the topic or content of sentences, paragraphs, or other passages of text or other natural language processing tasks. However, the generation of word vectors can be computationally expensive. Accordingly, when a set of word vectors is needed for a particular corpus of text, a set of word vectors previously generated from a corpus of text that is sufficiently similar to the particular corpus of text, with respect to some criteria, may be re-used for the particular corpus of text. Such similarity could include the two corpora of text containing the same or similar sets of words or containing incident reports or other time-coded sets of text from overlapping or otherwise similar periods of time.
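The re-use decision described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes similarity is measured as Jaccard overlap between the word sets of the two corpora (one of several criteria the abstract mentions), and the names `CachedEncoder`, `jaccard`, `select_encoder`, and the `train` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class CachedEncoder:
    """Hypothetical stand-in for a previously trained encoder."""
    vocabulary: Set[str]              # words seen during training
    vectors: Dict[str, List[float]]   # word -> word vector


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-set overlap in [0, 1]; an assumed similarity measure."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_encoder(
    cached: CachedEncoder,
    new_vocabulary: Set[str],
    threshold: float,
    train: Callable[[Set[str]], CachedEncoder],
) -> Tuple[CachedEncoder, bool]:
    """Return (encoder, reused). Re-use the cached encoder when the
    corpora are similar enough; otherwise pay the cost of training.
    Treating exact equality with the threshold as "train" is an
    assumption of this sketch."""
    if jaccard(cached.vocabulary, new_vocabulary) > threshold:
        return cached, True
    return train(new_vocabulary), False


def dummy_train(vocabulary: Set[str]) -> CachedEncoder:
    """Placeholder for the expensive training step."""
    return CachedEncoder(set(vocabulary), {w: [0.0] for w in vocabulary})


cached = CachedEncoder({"vpn", "down", "email"}, {})
_, reused_similar = select_encoder(cached, {"vpn", "down", "slow"}, 0.4, dummy_train)
_, reused_different = select_encoder(cached, {"printer", "jam"}, 0.4, dummy_train)
```

In this sketch the first call shares two of four distinct words with the cached vocabulary and so re-uses the encoder, while the second shares none and triggers training.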

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a database containing (i) a corpus of incident reports relating to operation of a managed network, wherein each incident report contains set of fields, each field containing a text string, (ii) a first criterion representative of a first subset of incident reports from the corpus of incident reports, and (iii) a first artificial neural network (ANN) that includes a first encoder and that has been trained on the first subset of incident reports such that the first encoder can generate word vector representations within a first semantically encoded vector space for words present in text strings of the first subset of incident reports; and a server device configured to: obtain a second criterion representative of a second subset of incident reports from the corpus of incident reports; determine a similarity between the first criterion and the second criterion; and generate word vector representations for words present in text strings of incident reports of the second subset of incident reports, wherein generating the word vector representations when the similarity is greater than a threshold similarity level comprises using the first encoder to generate the word vector representations, and wherein generating the word vector representations when the similarity is less than a threshold similarity level comprises: (i) generating a second ANN that includes a second encoder, wherein generating the second ANN comprises training the second ANN on the second subset of incident reports such that the second encoder can generate word vector representations within a second semantically encoded vector space for words present in text strings of the second subset of incident reports, and (ii) using the second encoder to generate the word vector representations.

2. The system of claim 1, wherein the server device is further configured to, when the similarity is greater than the threshold similarity level: generate, for each of the incident reports in the second subset, an aggregate vector representation, wherein the aggregate vector representation for a given incident report is a combination of first-encoder-generated word vector representations of words present in text strings of the given incident report; obtain an additional incident report that contains an enumerated set of fields and that satisfies the second criterion, each field containing a text string; generate an aggregate vector representation for the additional incident report by (i) using the first encoder to generate word vector representations within the first semantically encoded vector space for words present in text strings of the additional incident report, and (ii) combining the first-encoder-generated word vector representations of words present in text strings of the additional incident report; compare the aggregate vector representation of each of the incident reports in the second subset to the aggregate vector representation for the additional incident report; based on the comparison, identify a relevant subset of the second subset; and transmit, to a client device, the relevant subset of incident reports.

3. The system of claim 1, wherein the server device is further configured to, when the similarity is greater than the threshold similarity level: generate, for each of the incident reports in the second subset, an aggregate vector representation, wherein the aggregate vector representation for a given incident report is a combination of first-encoder-generated word vector representations of words present in text strings of the given incident report; compare the generated aggregate vector representations of the incident reports in the second subset to identify one or more clusters of related incident reports within the second subset; and transmit, to a client device, incident reports of a first cluster of the identified one or more clusters of related incident reports within the second subset.

4. The system of claim 1, wherein the similarity being greater than the threshold similarity level comprises a time period represented by the second criterion overlapping with a time period represented by the first criterion by more than a threshold amount.

5. The system of claim 1, wherein the similarity being greater than the threshold similarity level comprises a time period represented by the second criterion being a subset of a time period represented by the first criterion.

6. The system of claim 1, wherein the similarity being greater than the threshold similarity level comprises the first subset containing more than a threshold amount of incident reports from the second subset.

7. The system of claim 6, wherein the similarity being greater than the threshold similarity level additionally comprises the second subset containing more than a threshold amount of incident reports from the first subset.

8. The system of claim 1, wherein the similarity being greater than the threshold similarity level comprises a set of words present in the text strings of the second subset overlapping with a set of words present in the text strings of the first subset by more than a threshold amount.

9. The system of claim 1, wherein the similarity being greater than the threshold similarity level comprises a set of words present in the text strings of the second subset including only words that are present in the text strings of the first subset.

10. The system of claim 1, wherein the database additionally contains a first hash representative of the first criterion, wherein the server device is additionally configured to determine a second hash representative of the second criterion, wherein determining the similarity between the first criterion and the second criterion comprises determining a similarity between the first hash and the second hash, and wherein the similarity between the first criterion and the second criterion being greater than the threshold similarity level comprises the similarity between the first hash and the second hash being greater than a threshold amount.

11. The system of claim 1, wherein the first ANN has been trained on the first subset of incident reports such that the first encoder can additionally generate, for a given incident report of the first subset, at least one paragraph vector representation within the first semantically encoded vector space for words present in text strings of a subset of the fields of the given incident report, and wherein the server device is additionally configured to, when the similarity is greater than the threshold similarity level: use the first ANN to generate, for each incident report in the second set of incident reports, at least one paragraph vector representation within the first semantically encoded vector space of words present in text strings of the second set of incident reports.

12. A method comprising: accessing a database that contains (i) a corpus of incident reports relating to operation of a managed network, wherein each incident report contains set of fields, each field containing a text string, (ii) a first criterion representative of a first subset of incident reports from the corpus of incident reports, and (iii) a first artificial neural network (ANN) that includes a first encoder and that has been trained on the first subset of incident reports such that the first encoder can generate word vector representations within a first semantically encoded vector space for words present in text strings of the first subset of incident reports; obtaining a second criterion representative of a second subset of incident reports from the corpus of incident reports; determining that a first similarity between the fir
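The aggregate-vector comparison recited in claim 2 can be illustrated with a short sketch. The claim only says the aggregate is "a combination" of word vectors and that aggregates are compared; averaging the vectors and ranking by cosine similarity are assumptions made here for concreteness, and the names `aggregate`, `cosine`, and `most_relevant` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def aggregate(words: Sequence[str], word_vectors: Dict[str, Vector]) -> Vector:
    """Combine word vectors into one vector per report.
    Assumption: the combination is an element-wise mean over known words."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        raise ValueError("report contains no words known to the encoder")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; an assumed comparison measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_relevant(
    new_report: Sequence[str],
    reports: Dict[str, Sequence[str]],
    word_vectors: Dict[str, Vector],
    top_k: int = 1,
) -> List[str]:
    """Rank stored reports by similarity to the new report's aggregate."""
    query = aggregate(new_report, word_vectors)
    scored = [
        (cosine(query, aggregate(words, word_vectors)), report_id)
        for report_id, words in reports.items()
    ]
    scored.sort(reverse=True)
    return [report_id for _, report_id in scored[:top_k]]


# Toy two-dimensional word vectors, invented for illustration only.
toy_vectors = {
    "vpn": [1.0, 0.0],
    "down": [1.0, 0.2],
    "printer": [0.0, 1.0],
    "jam": [0.1, 1.0],
}
toy_reports = {"INC1": ["vpn", "down"], "INC2": ["printer", "jam"]}
relevant = most_relevant(["vpn"], toy_reports, toy_vectors)
```

With these toy vectors, a new report mentioning only "vpn" ranks `INC1` ahead of `INC2`. The same aggregate vectors could feed a clustering step of the kind described in claim 3.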

Assignees

Servicenow Inc

Inventors

Classifications

  • G06N3/084 (Primary)

    Backpropagation, e.g. using gradient descent · CPC title

  • Activation functions · CPC title

  • Knowledge-based neural networks; Logical representations of neural networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Combinations of networks · CPC title

Frequently asked questions

What does patent US11238230B2 cover?
It covers re-using word vectors that were previously generated from one corpus of text for another corpus that is sufficiently similar, for example because the two contain the same or similar sets of words or incident reports from overlapping time periods. The motivation is that generating word vectors can be computationally expensive.
Who is the assignee on this patent?
Servicenow Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 1, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.