Data structures for efficient storage and updating of paragraph vectors

US11423069B2 · US · B2

Patent metadata
Publication number: US-11423069-B2
Application number: US-201816135891-A
Country: US
Kind code: B2
Filing date: Sep 19, 2018
Priority date: Sep 19, 2018
Publication date: Aug 23, 2022
Grant date: Aug 23, 2022

Abstract

Official abstract text for this publication.

Systems and methods involving data structures for efficient management of paragraph vectors for textual searching are described. A database may contain records, each associated with an identifier and including a text string and timestamp. A look-up table may contain entries for text strings from the records, each entry associating: a paragraph vector for a respective unique text string, a hash of the respective unique text string, and a set of identifiers of records containing the respective unique text string. A server may receive from a client device an input string, compute a hash of the input string, and determine matching table entries, each containing a hash identical to that of the input string, or a paragraph vector similar to one calculated for the input string. A prioritized list of identifiers from the matching entries may be determined based on timestamps, and the prioritized list may be returned to the client.
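The query path in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the assignee's implementation. SHA-256 stands in for the unspecified hash. `toy_embed` is a hypothetical letter-count vector standing in for a trained paragraph-vector model (claim 3 mentions an artificial neural network). Records are assumed to be a dict from ID to `(text, timestamp)` with numeric timestamps, and the 0.9 cosine threshold is arbitrary. The ordering follows claim 5: exact-hash IDs newest first, then IDs from similar entries ranked by similarity with recency as a tie-breaker, which is one possible reading of "sorted according jointly to".

```python
import hashlib
import math
import string
from dataclasses import dataclass, field


def text_hash(text: str) -> str:
    # SHA-256 is an assumption; the patent only requires "a hash" of the text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def toy_embed(text: str) -> list:
    # Hypothetical stand-in for a trained paragraph-vector model:
    # a 26-dimensional letter-count vector, for illustration only.
    return [float(text.lower().count(c)) for c in string.ascii_lowercase]


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Entry:
    vector: list                           # pre-calculated paragraph vector
    ids: set = field(default_factory=set)  # IDs of records containing this text


def build_table(records: dict, embed) -> dict:
    """Map hash(text) -> Entry, computing one vector per unique text string."""
    table = {}
    for rid, (text, _ts) in records.items():
        h = text_hash(text)
        if h not in table:
            table[h] = Entry(vector=embed(text))
        table[h].ids.add(rid)
    return table


def search(table, records, query, embed, threshold=0.9, n=5):
    """Return up to n record IDs: exact-hash matches first (newest first),
    then IDs from other entries whose vectors clear the cosine threshold."""
    def ts(rid):
        return records[rid][1]

    h = text_hash(query)
    exact = sorted(table[h].ids, key=ts, reverse=True) if h in table else []

    qv = embed(query)
    scored = []
    for key, entry in table.items():
        if key == h:
            continue  # the exact-hash entry is already covered by `exact`
        sim = cosine(qv, entry.vector)
        if sim > threshold:
            scored.extend((sim, rid) for rid in entry.ids)
    # Assumed joint ordering: higher similarity first, newer timestamp breaks ties.
    scored.sort(key=lambda p: (-p[0], -ts(p[1])))
    return (exact + [rid for _sim, rid in scored])[:n]
```

With `records = {"INC1": ("email server down", 100), "INC2": ("email server down", 300), "INC3": ("mail server down", 200), "INC4": ("printer jammed", 400)}`, querying `"email server down"` returns `["INC2", "INC1", "INC3"]`: both exact matches newest first, then the near-duplicate wording, while "printer jammed" falls below the threshold. Claim 2 describes a variant that consults vectors only when no hash matches, which amounts to skipping the vector loop whenever `exact` is non-empty.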

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a database containing incident reports each associated with a unique identifier, and each including a text string and a timestamp; memory storing a look-up table that contains entries for respective unique text strings from the incident reports, wherein each of the entries associates: a pre-calculated paragraph vector for the respective unique text string, a hash of the respective unique text string, and a set of unique identifiers associated with incident reports that contain the respective unique text string; and a server device configured to: receive, from a client device, an input text string, determine a hash of the input text string, determine, from the look-up table, one or more matching entries for the input text string, wherein each of the one or more matching entries either: (i) contains an identical copy of the hash of the input text string, or (ii) contains a pre-calculated paragraph vector that is within pre-defined matching criteria of, but not identical to, a paragraph vector calculated for the input text string, based at least on the timestamps within the incident reports specified by the set of unique identifiers in the one or more matching entries, determine a prioritized list of unique identifiers from an aggregate of the sets of unique identifiers in the one or more matching entries; and transmit, to the client device, at least one of: (i) the prioritized list of unique identifiers, or (ii) information related to incident reports associated with the prioritized list of unique identifiers.

2. The system of claim 1, wherein determining, from the look-up table, the one or more matching entries for the input text string comprises: determining first if there is an entry that contains an identical copy of the hash of the input text string; and if there is no entry that contains an identical copy of the hash of the input text string, searching for one or more entries that contain a pre-calculated paragraph vector that is within the pre-defined matching criteria of the paragraph vector calculated for the input text string.

3. The system of claim 2, wherein searching for the one or more entries that contain the pre-calculated paragraph vector that is within the pre-defined matching criteria of the paragraph vector calculated for the input text string comprises computing the paragraph vector for the input string with an artificial neural network.

4. The system of claim 1, wherein searching for the one or more entries that contain the pre-calculated paragraph vector that is within the pre-defined matching criteria of the paragraph vector calculated for the input text string comprises: identifying cosine similarities between the paragraph vector calculated for the input text string and the respective pre-calculated paragraph vector in each of the look-up table entries that are greater than a pre-determined cosine-similarity threshold.

5. The system of claim 1, wherein determining the prioritized list of unique identifiers based at least on the timestamps comprises: for all the unique identifiers in a matching entry of the one or more matching entries that contains an identical copy of the hash of the input text string, forming a first priority list sorted from most recent timestamp to oldest timestamp; for all the unique identifiers in all of the one or more matching entries that respectively contain a pre-calculated paragraph vector that is within the pre-defined matching criteria of, but not identical to, the paragraph vector calculated for the input text string, forming an aggregate priority list of unique identifiers sorted according jointly to: (i) similarity of the pre-calculated paragraph vectors in the associated matching entries to the paragraph vector calculated for the input text string, and (ii) most recent timestamp to oldest timestamp; concatenating the first priority list with the aggregate priority list, giving priority to the first priority list; and selecting in priority order from the concatenated list up to N unique identifiers, wherein N is a positive integer.

6. The system of claim 1, wherein the input text string is associated with a new incident, wherein the hash of the input text string is identical to the hash of a particular entry in the look-up table, and wherein the server device is further configured to: update the database by adding a new incident report for the new incident, including a new unique identifier and a timestamp indicating a creation time for the new incident report; and update the set of unique identifiers associated with the particular entry to include the new unique identifier.

7. The system of claim 1, wherein the input text string is associated with a new incident, wherein the hash of the input text string is not identical to the hash of any entry in the look-up table, and wherein the server device is further configured to: update the database by adding a new incident report for the new incident, including a new unique identifier and a timestamp indicating a creation time for the new incident report; and create a new entry in the look-up table, the new entry including the hash of the input text string, the paragraph vector calculated for the input text string, and a set of unique identifiers including only the new unique identifier.

8. The system of claim 1, wherein the memory further stores a timestamp-ID map table that contains timestamp-ID entries, wherein each timestamp-ID entry associates a unique timestamp with a list of unique identifiers associated with incident reports that were created within a threshold time of the unique timestamp of the entry, and wherein the server device is further configured to: advance a sliding time window from a current reference time to a new reference time; identify all timestamp-ID entries having timestamps within the sliding time window at the new reference point; and update the one or more entries of the look-up table based on a comparison of (i) the unique identifiers in an aggregate ID list of all unique identifiers associated with the identified timestamp-ID entries with (ii) the unique identifiers associated with the one or more entries of the look-up table.

9. The system of claim 8, wherein updating the one or more entries of the look-up table based on the comparison of (i) the unique identifiers in the aggregate ID list of all unique identifiers associated with the identified timestamp-ID entries with (ii) the unique identifiers associated with the one or more entries of the look-up table comprises: for each entry in the look-up table having in its set of unique identifiers one or more particular unique identifiers that are not in the aggregate ID list, removing the particular one or more unique identifier from the set of unique identifiers; for each entry in the look-up table having in its set of unique identifiers one or more given unique identifiers that are in the aggregate ID list, retaining the given one or more unique identifier in the set of unique identifiers; for each given unique identifier in the aggregate ID list that (i) is not in the set of unique identifiers associated with any of the look-up table entries, and (ii) is associated with an incident report having a text string with a hash that is identical to the hash associated with an existing entry of the look-up table, add the given unique identifier to the set of the unique identifiers associated with the existing entry; and for each particular unique identifier in the aggregate ID list that (i) is not in the set of unique identifiers associated with any of the look-up table entries, and (ii) is associated with a particular incident report having a text string with a hash that is not identi…
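Claims 6 to 9 describe how the look-up table is maintained. The sketch below illustrates those steps in Python under assumptions; it is not the patented implementation. The table is a plain dict of `Entry` objects keyed by text hash (SHA-256 assumed), records map ID to `(text, timestamp)`, and `add_incident`, `refresh_window`, `ts_map` and `width` are hypothetical names. The sliding window is assumed to be the closed interval `[new_ref - width, new_ref]`. Because claim 9 is cut off in the preview, the sketch does not guess how in-window IDs without a matching entry are handled; it returns them to the caller. Entries whose ID sets become empty are left in place, since the claims shown do not say to delete them.

```python
import hashlib
from dataclasses import dataclass, field


def text_hash(text: str) -> str:
    # SHA-256 is an assumption; the patent only says "a hash".
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Entry:
    vector: list                           # pre-calculated paragraph vector
    ids: set = field(default_factory=set)  # IDs of records containing this text


def add_incident(records, table, new_id, text, timestamp, embed):
    """Claims 6 and 7: store a new incident report and index its text."""
    records[new_id] = (text, timestamp)
    h = text_hash(text)
    if h in table:
        table[h].ids.add(new_id)  # claim 6: known text, extend the entry's ID set
    else:
        # claim 7: unseen text gets a new entry holding only the new ID
        table[h] = Entry(vector=embed(text), ids={new_id})


def refresh_window(table, records, ts_map, new_ref, width):
    """Claims 8 and 9 (partial): re-sync the table with a sliding time window.

    ts_map maps a bucket timestamp to the IDs created near it (hypothetical
    layout). Returns in-window IDs whose text has no entry yet, because the
    claim text is truncated before it describes that case.
    """
    in_window = set()
    for ts, ids in ts_map.items():
        if new_ref - width <= ts <= new_ref:
            in_window.update(ids)

    indexed = set()
    for entry in table.values():
        entry.ids &= in_window  # remove IDs outside the window, retain the rest
        indexed |= entry.ids

    unhandled = []
    for rid in sorted(in_window - indexed):
        h = text_hash(records[rid][0])
        if h in table:
            table[h].ids.add(rid)  # in-window ID whose text already has an entry
        else:
            unhandled.append(rid)
    return unhandled
```

Intersecting each entry's ID set with the window's IDs covers claim 9's "remove" and "retain" steps in a single set operation.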

Assignees

Servicenow Inc

Inventors

Classifications (CPC titles)

  • Combinations of networks

  • Supervised learning

  • Auto-encoder networks; Encoder-decoder networks

  • Feedforward networks

  • using directory or table look-up (use of a directory or look-up table in file systems G06F16/13)

Frequently asked questions

What does patent US11423069B2 cover?
Systems and methods involving data structures for efficient management of paragraph vectors for textual searching are described. A database may contain records, each associated with an identifier and including a text string and timestamp. A look-up table may contain entries for text strings from the records, each entry associating: a paragraph vector for a respective unique text string, a hash …
Who is the assignee on this patent?
Servicenow Inc
What technology area does this patent fall under?
Primary CPC classification H04L41/5074. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Aug 23, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).