Determining semantic similarity of texts based on sub-sections thereof

US12299397B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12299397-B2
Application number: US-202117448667-A
Country: US
Kind code: B2
Filing date: Sep 23, 2021
Priority date: Mar 22, 2019
Publication date: May 13, 2025
Grant date: May 13, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided to compare a target sample of text to a set of textual records, each textual record including a sample of text and an indication of one or more segments of text within the sample of text. Semantic similarity values between the target sample of text and each of the textual records are determined. Determining a particular semantic similarity value between the target sample of text and a particular textual record of the corpus includes: (i) determining individual semantic similarity values between the target sample of text and each of the segments of text indicated by the particular textual record, and (ii) generating the particular semantic similarity value between the target sample of text and the particular textual record based on the individual semantic similarity values. A textual record is then selected based on the semantic similarities.
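The two-step comparison in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `TextualRecord`, `sentence_spans`, `embed`, `record_similarity`, and `select_record` are hypothetical names, the bag-of-words `embed` function is a simple stand-in for whatever vector model a real system would use, and averaging the segment scores is just one possible way to combine them.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextualRecord:
    text: str
    # Each segment is a (start, end) character span into `text`.
    segments: list[tuple[int, int]]


def sentence_spans(text: str) -> list[tuple[int, int]]:
    # Hypothetical segmenter: one span per sentence-like chunk.
    return [m.span() for m in re.finditer(r"[^.!?]+[.!?]?", text) if m.group().strip()]


def embed(text: str) -> Counter:
    # Stand-in embedding (assumption): lowercase bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def record_similarity(target: str, record: TextualRecord) -> float:
    target_vec = embed(target)
    # (i) one similarity value per segment indicated by the record
    scores = [cosine(target_vec, embed(record.text[s:e])) for s, e in record.segments]
    # (ii) combine the segment values; the mean is an illustrative choice
    return sum(scores) / len(scores) if scores else 0.0


def select_record(target: str, records: list[TextualRecord]) -> TextualRecord:
    # Pick the record with the highest combined similarity.
    return max(records, key=lambda r: record_similarity(target, r))
```

A caller would build each record once, for example `TextualRecord(text, sentence_spans(text))`, and then pass a query string and the record list to `select_record`.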

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a processor; and a memory, accessible by the processor, the memory storing instructions that, when executed by the processor, cause the processor to perform operations comprising: providing a plurality of context vectors, wherein the plurality of context vectors were generated using a machine learning model by: accessing an incident report database comprising a plurality of incident reports, wherein the plurality of incident reports comprise incident reports generated within a predetermined time; generating, via the machine learning model, a plurality of respective segments of text from each incident report of the plurality of incident reports in the incident report; and generating, via the machine learning model, one or more first paragraph vector representations of the plurality of respective segments of text from each incident report of the plurality of incident reports in the incident report database, comprising for each of the one or more first paragraph vector representations: computing, via the machine learning model, one or more word contexts of the one or more first paragraph vector representations; and outputting a respective context vector of the plurality of context vectors for each of the one or more first paragraph vector representations, wherein each respective context vector of the plurality of context vectors is indicative of the one or more word contexts of each of the one or more first paragraph vector representations; and obtaining, from a client device, a text query; transforming the text query to a database query; performing, via the machine learning model, an inference step to generate a target vector of the database query; receiving the target vector of the database query, wherein the target vector comprises one or more second paragraph vector representations of the database query, one or more word vectors of the database query, or a weighted combination thereof; generating one or more respective record semantic similarity values between the target vector of the database query and each context vector of the one or more first paragraph vector representations; selecting from the incident report database, based on the one or more generated respective record semantic similarity values, a particular incident report having the highest respective record semantic similarity value; and providing, to the client device, a representation of the particular incident report, wherein the particular incident report provides a response to the text query.

2. The system of claim 1, wherein generating the one or more respective semantic similarity values between the target vector of the database query and each context vector of the one or more first paragraph vector representations comprises: receiving the target vector of the database query, wherein the target vector of the database query includes the one or more word vectors that describe, in a first semantically-encoded vector space, a meaning of respective words of the database query, or the one or more second paragraph vector representations that describes, in a second semantically-encoded vector space, a meaning of multiple words of the database query, or both; and receiving one or more first paragraph vector representations of the plurality of respective segments of text from each incident report of the plurality of incident reports, wherein the one or more first paragraph vector representations of the plurality of respective segments of text from each incident report of the plurality of incident reports describes, in the second semantically-encoded vector space, a meaning of multiple words within the plurality of respective segments of text from each incident report of the plurality of incident reports.

3. The system of claim 1, wherein generating the respective record semantic similarity values comprises: determining one or more respective segment semantic similarity values between the target vector of the database query and each context vector of the one or more first paragraph vector representations; comparing, to a threshold similarity level, each of the one or more respective segment semantic similarity values between the database query and each of the plurality of respective segments of text from the plurality of incident reports in the incident report database; and determining a number of the one or more respective segment semantic similarity values that exceed the threshold similarity level as the one or more respective record semantic similarity values.

4. The system of claim 1, wherein the plurality of respective segments of text from each incident report of the plurality of incident reports in the incident report database comprise non-overlapping segments of text.

5. The system of claim 1, wherein the plurality of respective segments of text from each incident report of the plurality of incident reports in the incident report database comprise one or more discrete sentences.

6. The system of claim 1, wherein generating the respective record semantic similarity values between the target vector of the database query and each context vector of the one or more first paragraph vector representations comprises: determining respective semantic similarity values between the target vector of the database query and each context vector indicated by the particular incident report; weighting the one or more respective semantic similarity values based on a ranking of the one or more respective semantic similarity values; generating a sum of the weighted one or more respective semantic similarity values between each context vector on the one or more first paragraph vector representations of the plurality of respective segments of text from each incident report of the plurality of incident reports in the incident report database; and normalizing the sum of the weighted one or more respective semantic similarity values to a number of segments indicated by the particular incident report.

7. The system of claim 1, wherein each incident report of the plurality of incident reports in the incident report database comprises an indication of a time stamp within a predetermined time threshold.

8. A computer-implemented method comprising: providing a plurality of context vectors, wherein the plurality of context vectors were generated by: accessing, by a server device, an incident report database comprising a plurality of incident reports, wherein the plurality of incident reports comprise incident reports generated within a predetermined time; generating a plurality of respective segments of text from each incident report of the plurality of incident reports in the incident report; and generating one or more first paragraph vector representations of the plurality of respective segments of text from each incident report of the plurality of incident reports in the incident report database, comprising for each of the one or more first paragraph vector representations: computing one or more word contexts of the one or more first paragraph vector representations; and outputting a respective context vector of the plurality of context vectors for each of the first one or more paragraph vector representations, wherein each respective context vector of the plurality of context vectors is indicative of the one or more word contexts of each of the one or more first paragraph vector representations; and receiving, by the server device and from a client device, a text query; transforming the text query to a database query; performing an inference step to generate a target vector of the database query; receiving the target vector of the database query, wherein the target vector comprises one or more second paragraph vector repre
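Claims 3 and 6 describe two different ways of turning per-segment similarity values into a single record-level value. The sketch below is one hedged reading of each, not the patented implementation: the function names are hypothetical, the inputs are assumed to be plain floats already computed elsewhere, and the `1 / rank` weighting is an assumption, since the claim only says the weights are based on a ranking.

```python
from typing import Callable, Sequence


def count_above_threshold(segment_scores: Sequence[float], threshold: float) -> int:
    # Claim 3 reading: the record score is the number of segment scores
    # that exceed the threshold similarity level.
    return sum(1 for score in segment_scores if score > threshold)


def rank_weighted_score(
    segment_scores: Sequence[float],
    weight_for_rank: Callable[[int], float] = lambda rank: 1.0 / rank,
) -> float:
    # Claim 6 reading: rank the segment scores, weight each by its rank,
    # sum the weighted scores, then normalize by the number of segments.
    # The default 1/rank weighting is an illustrative assumption.
    if not segment_scores:
        return 0.0
    ranked = sorted(segment_scores, reverse=True)  # rank 1 = most similar
    weighted_sum = sum(
        weight_for_rank(rank) * score for rank, score in enumerate(ranked, start=1)
    )
    return weighted_sum / len(ranked)
```

Under the count-based reading, a long report with many loosely related segments can outscore a short report with one very close segment; the rank-weighted reading emphasizes the best-matching segments while the normalization keeps long reports from winning on length alone.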

Assignees

Servicenow Inc

Inventors

Classifications

  • Parsing · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • Creation of semantic tools, e.g. ontology or thesauri · CPC title

  • using vector based model · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Servicenow Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date May 13, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).