Document indexing, searching, and ranking with semantic intelligence

US11113327B2 · US · B2

Patent metadata
Field | Value
Publication number | US-11113327-B2
Application number | US-201916274464-A
Country | US
Kind code | B2
Filing date | Feb 13, 2019
Priority date | Feb 13, 2019
Publication date | Sep 7, 2021
Grant date | Sep 7, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

There is a need for solutions that perform preprocessing and/or searching of documents with semantic intelligence. This need can be addressed by, for example, by performing pre-processing of each document of a plurality of documents to generate an indexed representation for the document by identifying sentences in the document; determining, for each n-gram of one or more n-grams associated with the document, one or more n-gram semantic scores based semantic proximity indicators for the n-gram; determining, based at least in part on each one or more n-gram semantic scores, one or more sentence semantic labels for each sentence in the document; and determining the indexed representation for the document based at least in part on the one or more sentence semantic labels for the document; performing the search query based each indexed representation associated with a document; and transmitting the result to a computing device associated with the search query.
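The flow described in the abstract (split sentences, score n-grams against candidate labels, label each sentence, build an index, search it) can be sketched roughly as follows. This is a minimal illustration, not the patented method: the `PROXIMITY` table, the regex sentence splitter, the additive scoring, the `threshold` value, and the function names are all assumptions made for the example, and only one-grams are scored. Mapping search terms to labels is omitted, so the query is given directly as labels.

```python
import re
from collections import defaultdict

# Hypothetical proximity table: word -> {candidate label: relatedness in [0, 1]}.
# The patent does not publish such a table; these values are illustrative only.
PROXIMITY = {
    "aspirin": {"medication": 1.0},
    "headache": {"symptom": 1.0},
    "dose": {"medication": 0.5},
}


def split_sentences(text):
    """Naive splitter on ., ! and ? (an assumption, not the patent's method)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_sentence(sentence, threshold=0.5):
    """Sum proximity per label over the sentence's one-grams.

    Labels whose summed score is at or above `threshold` are kept.
    """
    scores = defaultdict(float)
    for word in re.findall(r"[a-z]+", sentence.lower()):
        for label, proximity in PROXIMITY.get(word, {}).items():
            scores[label] += proximity
    return sorted(label for label, score in scores.items() if score >= threshold)


def index_document(text):
    """Indexed representation: a list of (sentence, labels) pairs."""
    return [(s, label_sentence(s)) for s in split_sentences(text)]


def search(indexes, query_labels):
    """Rank document ids by the number of sentences carrying a queried label."""
    wanted = set(query_labels)
    scored = []
    for doc_id, index in indexes.items():
        hits = sum(1 for _, labels in index if wanted & set(labels))
        if hits:
            scored.append((hits, doc_id))
    # Most hits first; ties broken by document id for a stable order.
    return [doc_id for hits, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


indexes = {
    "a": index_document("Take aspirin daily. Adjust the dose."),
    "b": index_document("The headache faded. Take aspirin."),
}
ranked = search(indexes, ["medication"])
```

With these toy inputs, document "a" has two sentences labeled "medication" and document "b" has one, so "a" ranks first. A real system would replace the hand-written table with scores derived from semantic proximity data.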

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for performing a search query associated with a plurality of search terms based at least in part on a plurality of documents, the computer-implemented method comprising: performing, by one or more processors, pre-processing of each document of the plurality of documents, wherein (1) each document of the plurality of documents comprises a plurality of words to generate an indexed representation of each document of the plurality of documents, and (2) the pre-processing of a document of the plurality of documents comprises: identifying, by the one or more processors, one or more sentences in the document, wherein each sentence of the one or more sentences comprises one or more words from the plurality of words in the document; generating, by the one or more processors and for each n-gram of one or more n-grams associated with the document, one or more n-gram semantic scores based at least in part on one or more semantic proximity indicators for the corresponding n-gram, wherein (1) each n-gram of the one or more n-grams associated with the document comprises a combination of one or more words from the plurality of words in the document, (2) each semantic proximity indicator of the one or more semantic proximity indicators for an n-gram indicates a degree of semantic relatedness between a corresponding word of the one or more words in the n-gram and a corresponding candidate semantic label of one or more candidate semantic labels associated with the document, (3) each n-gram semantic score of the one or more n-gram semantic scores for a document is generated based at least in part on one or more frequency scores for the n-gram, and (4) the one or more frequency scores for a first one-gram of the one or more n-grams in the document are generated by (a) applying a logarithmic transformation to an occurrence frequency for the first one-gram in the document to generate a logarithmic occurrence frequency for the first one-gram in the document, and applying a modulo transformation to the logarithmic occurrence frequency to generate the one or more frequency scores for the first one-gram in the document; determining, by the one or more processors and based at least in part on each n-gram semantic score of the one or more n-gram semantic scores for an n-gram of the one or more n-grams, one or more semantic labels for each sentence of the one or more sentences in the document; and generating, by the one or more processors, the indexed representation for the document of the plurality of documents based at least in part on the one or more semantic labels for each sentence of the one or more sentences in the document; processing, by the one or more processors, the search query based at least in part on (1) the plurality of search terms, and (2) each indexed representation associated with a document of the plurality of documents, wherein the processing the search query comprises: providing, by the one or more processors, the search query to a data storage unit, wherein (1) the data storage unit executes the search query to generate one or more search results for the search query, and (2) executing the search query comprises: generating, for each document of the plurality of documents, a relevance score in relation to the search query based at least in part on (1) one or more selected semantic labels for the document, (2) the one or more frequency scores, and (3) the one or more n-gram semantic scores; and generating the one or more search results based at least in part on the relevance scores for each document of the plurality of documents, wherein the one or more search results are ranked based at least in part on the relevance scores for each document of the plurality of documents; and transmitting, by the one or more processors, the one or more search results to a computing device associated with the search query, wherein the one or more search results are displayed by the computing device.

2. The computer-implemented method of claim 1, wherein the one or more n-grams for a document comprise a first multi-gram associated with two or more words comprising a first word and one or more other words, wherein each word of one or more other words is selected from the group consisting of a word that occurs within a threshold distance of the first word in the document and a word deemed co-referencing with the first word.

3. The computer-implemented method of claim 1, wherein: each n-gram semantic score is associated with a linguistic domain of a plurality of linguistic domains; the one or more n-gram semantic scores associated with each linguistic domain of the plurality of linguistic domains are determined based at least in part on semantic proximity data for the linguistic domain; and the semantic proximity data for a linguistic domain of the plurality of linguistic domains comprise one or more semantic proximity relationships associated with the linguistic domain.

4. The computer-implemented method of claim 3, wherein the one or more frequency scores for a first multi-gram of the one or more n-grams in a document are generated by: determining a plurality of one-grams associated with the first multi-gram, wherein each one-gram of the plurality of one-grams comprises a word from a plurality of words associated with the first multi-gram; generating a measure of summation of each frequency score associated with a one-gram of the plurality of one-grams; generating a measure of product of each frequency score associated with a one-gram of the plurality of one-grams; applying a logarithmic function to a ratio of a measure of joint probability of occurrence of the plurality of words and the measure of product to generate a logarithmic frequency value; and generating the frequency score for the first multi-gram based at least in part on the measure of summation and the logarithmic frequency value.

5. The computer-implemented method of claim 1, wherein: the degree of semantic relatedness between a first word of a first n-gram of the one or more n-grams and a first candidate semantic label of the one or more candidate semantic labels is determined based at least in part on a degree of separation of a first node corresponding to the first word and a second node corresponding to the first candidate semantic label in a semantic proximity graph; the semantic proximity graph comprises one or more nodes, wherein each node of the one or more nodes corresponds to a semantic construct of a plurality semantic constructs and one or more edges, wherein each edge of the one or more edges corresponds to a strongest-type semantic relationship between a first semantic construct of the plurality semantic constructs and a second semantic construct of the plurality semantic constructs; and the plurality semantic constructs comprise the one or more candidate semantic labels and at least some of the plurality of words.

6. The computer-implemented method of claim 1, wherein the semantic proximity indicator between a first word of the one or more words and a first semantic label of the one or more semantic labels has a value of zero if there is no semantic relationship between the first word and the first semantic label.

7. The computer-implemented method of claim 1, wherein generating the one or more first n-gram semantic scores for a first n-gram of the one or more n-grams comprises: generating, for each word of the one or more words in the first n-gram and each first semantic label of the one or more semantic labels, a word semantic score for occurrence of the word in the first n-gram based at least in part on a frequency score for the first n-gram, a frequency score for each n-gram of the one or more n-grams that comprises the word, and a semantic
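
Claims 5 and 6 tie the semantic proximity indicator to the degree of separation between a word node and a label node in a semantic proximity graph, with a value of zero when no relationship exists. The sketch below shows one way this could look, using breadth-first search to find the degree of separation. The toy `GRAPH`, the function names, and the `1 / (1 + d)` mapping from separation to indicator value are assumptions for illustration; the claims do not specify how separation is converted to a score.

```python
from collections import deque

# Hypothetical semantic proximity graph as undirected adjacency lists.
# Nodes are semantic constructs (words and candidate labels); edges stand in
# for the "strongest-type" relationships mentioned in claim 5.
GRAPH = {
    "aspirin": ["analgesic"],
    "analgesic": ["aspirin", "medication"],
    "medication": ["analgesic"],
    "weather": [],
}


def degree_of_separation(graph, start, goal):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def proximity_indicator(graph, word, label):
    """Map separation d to 1 / (1 + d); return 0.0 when no path exists.

    The zero case follows claim 6; the decay formula is an assumption.
    """
    d = degree_of_separation(graph, word, label)
    return 0.0 if d is None else 1.0 / (1 + d)


direct = proximity_indicator(GRAPH, "analgesic", "medication")
two_hops = proximity_indicator(GRAPH, "aspirin", "medication")
unrelated = proximity_indicator(GRAPH, "weather", "medication")
```

In this toy graph, "aspirin" is two edges from "medication", so its indicator is lower than that of "analgesic", and "weather" has no path to the label, so its indicator is zero.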

Assignees

  • Optum Tech Inc

Inventors

Classifications

  • G06F16/316 · Primary

    Indexing structures · CPC title

  • Document management systems · CPC title

  • Presentation of query results · CPC title

  • using natural language analysis · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11113327B2 cover?
There is a need for solutions that perform preprocessing and/or searching of documents with semantic intelligence. This need can be addressed by, for example, by performing pre-processing of each document of a plurality of documents to generate an indexed representation for the document by identifying sentences in the document; determining, for each n-gram of one or more n-grams associated with…
Who is the assignee on this patent?
Optum Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/316. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 7, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).