Aggregating Semantic Information for Improved Understanding of Users
US-2019327331-A1 · Oct 24, 2019 · US
US11113327B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11113327-B2 |
| Application number | US-201916274464-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 13, 2019 |
| Priority date | Feb 13, 2019 |
| Publication date | Sep 7, 2021 |
| Grant date | Sep 7, 2021 |
Official abstract text for this publication.
There is a need for solutions that perform pre-processing and/or searching of documents with semantic intelligence. This need can be addressed, for example, by performing pre-processing of each document of a plurality of documents to generate an indexed representation for the document by identifying sentences in the document; determining, for each n-gram of one or more n-grams associated with the document, one or more n-gram semantic scores based on semantic proximity indicators for the n-gram; determining, based at least in part on each of the one or more n-gram semantic scores, one or more sentence semantic labels for each sentence in the document; and determining the indexed representation for the document based at least in part on the one or more sentence semantic labels for the document; performing the search query based on each indexed representation associated with a document; and transmitting the result to a computing device associated with the search query.
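The pre-processing and search flow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `PROXIMITY` table, the `LABELS` list, the naive sentence splitter, the `1 + log(count)` frequency weighting, and the sentence-count relevance score are all hypothetical choices made for the example, and only one-grams are considered.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical semantic proximity indicators: (word, candidate label) -> relatedness.
# Pairs that are absent are treated as having no semantic relationship (zero).
PROXIMITY = {
    ("aspirin", "medication"): 1.0,
    ("dose", "medication"): 0.6,
    ("fever", "symptom"): 1.0,
    ("headache", "symptom"): 0.9,
}
LABELS = ["medication", "symptom"]  # hypothetical candidate semantic labels


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption made for this sketch)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic one-grams."""
    return re.findall(r"[a-z]+", sentence.lower())


def index_document(text):
    """Return a toy indexed representation: a list of (sentence, labels) pairs."""
    sentences = split_sentences(text)
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    indexed = []
    for sentence in sentences:
        label_scores = defaultdict(float)
        for word in tokenize(sentence):
            freq = 1.0 + math.log(counts[word])  # assumed frequency weighting
            for label in LABELS:
                # one-gram semantic score = proximity indicator * frequency weight
                label_scores[label] += PROXIMITY.get((word, label), 0.0) * freq
        labels = [label for label in LABELS if label_scores[label] > 0]
        indexed.append((sentence, labels))
    return indexed


def search(indexed_docs, query_labels):
    """Rank documents by the number of sentences carrying a queried label."""
    wanted = set(query_labels)
    results = []
    for doc_id, indexed in indexed_docs.items():
        relevance = sum(1 for _, labels in indexed if wanted & set(labels))
        if relevance:
            results.append((doc_id, relevance))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

For example, indexing `"Take aspirin for fever. Repeat the dose."` labels the first sentence with both `medication` and `symptom` and the second with `medication` only, so a query for `symptom` matches that document on one sentence.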
Opening claim text (preview).
The invention claimed is:

1. A computer-implemented method for performing a search query associated with a plurality of search terms based at least in part on a plurality of documents, the computer-implemented method comprising:

performing, by one or more processors, pre-processing of each document of the plurality of documents, wherein (1) each document of the plurality of documents comprises a plurality of words to generate an indexed representation of each document of the plurality of documents, and (2) the pre-processing of a document of the plurality of documents comprises:

identifying, by the one or more processors, one or more sentences in the document, wherein each sentence of the one or more sentences comprises one or more words from the plurality of words in the document;

generating, by the one or more processors and for each n-gram of one or more n-grams associated with the document, one or more n-gram semantic scores based at least in part on one or more semantic proximity indicators for the corresponding n-gram, wherein (1) each n-gram of the one or more n-grams associated with the document comprises a combination of one or more words from the plurality of words in the document, (2) each semantic proximity indicator of the one or more semantic proximity indicators for an n-gram indicates a degree of semantic relatedness between a corresponding word of the one or more words in the n-gram and a corresponding candidate semantic label of one or more candidate semantic labels associated with the document, (3) each n-gram semantic score of the one or more n-gram semantic scores for a document is generated based at least in part on one or more frequency scores for the n-gram, and (4) the one or more frequency scores for a first one-gram of the one or more n-grams in the document are generated by (a) applying a logarithmic transformation to an occurrence frequency for the first one-gram in the document to generate a logarithmic occurrence frequency for the first one-gram in the document, and applying a modulo transformation to the logarithmic occurrence frequency to generate the one or more frequency scores for the first one-gram in the document;

determining, by the one or more processors and based at least in part on each n-gram semantic score of the one or more n-gram semantic scores for an n-gram of the one or more n-grams, one or more semantic labels for each sentence of the one or more sentences in the document; and

generating, by the one or more processors, the indexed representation for the document of the plurality of documents based at least in part on the one or more semantic labels for each sentence of the one or more sentences in the document;

processing, by the one or more processors, the search query based at least in part on (1) the plurality of search terms, and (2) each indexed representation associated with a document of the plurality of documents, wherein the processing the search query comprises:

providing, by the one or more processors, the search query to a data storage unit, wherein (1) the data storage unit executes the search query to generate one or more search results for the search query, and (2) executing the search query comprises:

generating, for each document of the plurality of documents, a relevance score in relation to the search query based at least in part on (1) one or more selected semantic labels for the document, (2) the one or more frequency scores, and (3) the one or more n-gram semantic scores; and

generating the one or more search results based at least in part on the relevance scores for each document of the plurality of documents, wherein the one or more search results are ranked based at least in part on the relevance scores for each document of the plurality of documents; and

transmitting, by the one or more processors, the one or more search results to a computing device associated with the search query, wherein the one or more search results are displayed by the computing device.

2. The computer-implemented method of claim 1, wherein the one or more n-grams for a document comprise a first multi-gram associated with two or more words comprising a first word and one or more other words, wherein each word of one or more other words is selected from the group consisting of a word that occurs within a threshold distance of the first word in the document and a word deemed co-referencing with the first word.

3. The computer-implemented method of claim 1, wherein:

each n-gram semantic score is associated with a linguistic domain of a plurality of linguistic domains;

the one or more n-gram semantic scores associated with each linguistic domain of the plurality of linguistic domains are determined based at least in part on semantic proximity data for the linguistic domain; and

the semantic proximity data for a linguistic domain of the plurality of linguistic domains comprise one or more semantic proximity relationships associated with the linguistic domain.

4. The computer-implemented method of claim 3, wherein the one or more frequency scores for a first multi-gram of the one or more n-grams in a document are generated by:

determining a plurality of one-grams associated with the first multi-gram, wherein each one-gram of the plurality of one-grams comprises a word from a plurality of words associated with the first multi-gram;

generating a measure of summation of each frequency score associated with a one-gram of the plurality of one-grams;

generating a measure of product of each frequency score associated with a one-gram of the plurality of one-grams;

applying a logarithmic function to a ratio of a measure of joint probability of occurrence of the plurality of words and the measure of product to generate a logarithmic frequency value; and

generating the frequency score for the first multi-gram based at least in part on the measure of summation and the logarithmic frequency value.

5. The computer-implemented method of claim 1, wherein:

the degree of semantic relatedness between a first word of a first n-gram of the one or more n-grams and a first candidate semantic label of the one or more candidate semantic labels is determined based at least in part on a degree of separation of a first node corresponding to the first word and a second node corresponding to the first candidate semantic label in a semantic proximity graph;

the semantic proximity graph comprises one or more nodes, wherein each node of the one or more nodes corresponds to a semantic construct of a plurality semantic constructs and one or more edges, wherein each edge of the one or more edges corresponds to a strongest-type semantic relationship between a first semantic construct of the plurality semantic constructs and a second semantic construct of the plurality semantic constructs; and

the plurality semantic constructs comprise the one or more candidate semantic labels and at least some of the plurality of words.

6. The computer-implemented method of claim 1, wherein the semantic proximity indicator between a first word of the one or more words and a first semantic label of the one or more semantic labels has a value of zero if there is no semantic relationship between the first word and the first semantic label.

7. The computer-implemented method of claim 1, wherein generating the one or more first n-gram semantic scores for a first n-gram of the one or more n-grams comprises:

generating, for each word of the one or more words in the first n-gram and each first semantic label of the one or more semantic labels, a word semantic score for occurrence of the word in the first n-gram based at least in part on a frequency score for the first n-gram, a frequency score for each n-gram of the one or more n-grams that comprises the word, and a semantic
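Some of the computations recited in claims 1, 4, 5 and 6 can be illustrated with a short sketch. It is a hedged reading of the claim language, not the patented implementation, and it rests on several assumptions the claims do not settle: the occurrence frequency is taken to be a relative frequency, the "modulo transformation" is read as an absolute value, the measure of summation and the logarithmic frequency value are combined by simple addition, and the degree of separation `d` in the semantic proximity graph is mapped to an indicator of `1 / (1 + d)`. All function names are hypothetical.

```python
import math
from collections import Counter, deque


def one_gram_frequency_score(word, tokens):
    """Claim 1 sketch: log of the occurrence frequency, then a 'modulo' step."""
    rel_freq = Counter(tokens)[word] / len(tokens)  # assumption: relative frequency
    if rel_freq == 0:
        return 0.0  # assumption: unseen words score zero
    return abs(math.log(rel_freq))  # assumption: 'modulo' read as absolute value


def multi_gram_frequency_score(words, tokens, joint_probability):
    """Claim 4 sketch: summation plus log(joint probability / product)."""
    scores = [one_gram_frequency_score(w, tokens) for w in words]
    summation = sum(scores)
    product = math.prod(scores)
    if product <= 0 or joint_probability <= 0:
        return summation  # assumption: fall back when the ratio is undefined
    log_value = math.log(joint_probability / product)
    return summation + log_value  # assumption: combined by addition


def degree_of_separation(graph, start, goal):
    """Breadth-first search distance between two nodes, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def proximity_indicator(graph, word, label):
    """Claims 5 and 6 sketch: zero when no path links the word to the label."""
    d = degree_of_separation(graph, word, label)
    return 0.0 if d is None else 1.0 / (1 + d)  # assumption: 1 / (1 + d) mapping
```

Here `graph` is assumed to be an adjacency dictionary that lists each edge in both directions, and `joint_probability` is assumed to be supplied by the caller, since the claims do not say how it is estimated.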
Indexing structures · CPC title
Document management systems · CPC title
Presentation of query results · CPC title
using natural language analysis · CPC title