Computer-implemented method of and system for searching an inverted index having a plurality of posting lists
US-2016162574-A1 · Jun 9, 2016 · US
US10482178B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10482178-B2 |
| Application number | US-201715672643-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 9, 2017 |
| Priority date | Aug 9, 2017 |
| Publication date | Nov 19, 2019 |
| Grant date | Nov 19, 2019 |
Official abstract text for this publication.
A method and system to determine relatedness select a first customer observable from a first source document, the first customer observable being made up of two terms, the two terms being a first term of a first type and a first term of a second type, and select a second customer observable from a second source document, the second customer observable being made up of a second term of the first type and a second term of the second type. The method includes creating a first corpus of all documents that include the first terms, creating a second corpus of all documents that include the second terms, obtaining other first terms in the first corpus and other second terms in the second corpus, and performing semantic similarity analysis to determine a similarity score between the first customer observable and the second customer observable.
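The corpus-building and term-expansion steps in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the abstract does not say what the two term types are, so `TYPE_ONE` and `TYPE_TWO` are hypothetical lexicons, and `tokenize`, `build_corpus`, and `other_terms` are hypothetical helpers using simple whitespace tokenization.

```python
from typing import List, Set, Tuple

# Hypothetical lexicons for the two term types (not taken from the patent).
TYPE_ONE: Set[str] = {"engine", "brake", "radio"}
TYPE_TWO: Set[str] = {"stalls", "squeals", "static"}


def tokenize(doc: str) -> Set[str]:
    """Lowercase a document and split it on whitespace (assumed tokenizer)."""
    return set(doc.lower().split())


def build_corpus(docs: List[str], term_one: str, term_two: str) -> List[str]:
    """Keep every document that mentions both terms of a customer observable."""
    return [d for d in docs if {term_one, term_two} <= tokenize(d)]


def other_terms(
    corpus: List[str], term_one: str, term_two: str
) -> Tuple[Set[str], Set[str]]:
    """Collect the other terms of each type that appear in the corpus."""
    seen: Set[str] = set().union(*(tokenize(d) for d in corpus))
    return (seen & TYPE_ONE) - {term_one}, (seen & TYPE_TWO) - {term_two}


docs = ["Engine stalls and brake squeals", "engine stalls", "radio static"]
first_corpus = build_corpus(docs, "engine", "stalls")
print(other_terms(first_corpus, "engine", "stalls"))
```

Running the same two helpers on the second customer observable would give the second corpus and its other terms, which together feed the similarity analysis.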
Opening claim text (preview).
What is claimed is:
1. A method of determining relatedness of heterogeneous data, the method comprising: selecting a first customer observable from a first source document, the first customer observable being made up of two terms, the two terms being a first term of a first type and a first term of a second type; selecting a second customer observable from a second source document, the second customer observable being made up of a second term of the first type and a second term of the second type; creating a first corpus of all documents that include the first term of the first type and the first term of the second type; creating a second corpus of all documents that include the second term of the first type and the second term of the second type; obtaining other first terms of the first type and other first terms of the second type in the first corpus and other second terms of the first type and other second terms of the second type in the second corpus; and performing semantic similarity analysis using the first term of the first type, the other first terms of the first type, the second term of the first type, and the other second terms of the first type and the first term of the second type, the other first terms of the second type, the second term of the second type, and the other second terms of the second type to determine a similarity score between the first customer observable and the second customer observable.
2. The method according to claim 1, further comprising applying a first filter to the first term of the first type, the other first terms of the first type, the first term of the second type, the other first terms of the second type, the second term of the first type, the other second terms of the first type, the second term of the second type, and the other second terms of the second type prior to the performing the semantic similarity analysis.
3. The method according to claim 1, further comprising forming a first vector that includes the first term of the first type, the other first terms of the first type, the first term of the second type, and the other first terms of the second type, and forming a second vector that includes the second term of the first type, the other second terms of the first type, the second term of the second type, and the other second terms of the second type.
4. The method according to claim 3, further comprising forming a first matrix from the first vector and forming a second matrix from the second vector.
5. The method according to claim 3, further comprising obtaining a co-occurrence index value for each of the first term of the first type and the other first terms of the first type with every one of the first term of the second type and the other first terms of the second type, and obtaining a co-occurrence index value for each of the second term of the first type and the other second terms of the first types with every one of the second term of the second type and the other second terms of the second type.
6. The method according to claim 5, wherein the obtaining the co-occurrence index values includes performing computations based on occurrences of the first term of the first type, the other first terms of the first type, the first term of the second type, and the other first terms of the second type in the first corpus, and occurrences of the second term of the first type, the other second terms of the first type, the second term of the second type, and the other second terms of the second type in the second corpus.
7. The method according to claim 3, further comprising determining a term frequency (tf) and inverse document frequency (idf) of some or all elements of the first vector and some or all elements of the second vector.
8. The method according to claim 7, wherein the determining the tf for a term, the term being one of the first term of the first type, the other first terms of the first type, the first term of the second type, the other first terms of the second type, the second term of the first type, the other second terms of the first type, the second term of the second type, or the other second terms of the second type, includes determining a total number of mentions of the term in the first corpus based on the term being one of the first term of the first type, the other first terms of the first type, the first term of the second type, or the other first terms of the second type and in the second corpus based on the term being one of the second term of the first type, the other second terms of the first type, the second term of the second type, or the other second terms of the second type, and the determining the idf for the term includes adding a nominal value to a computation based on a number of documents in which the term is mentioned.
9. The method according to claim 7, further comprising determining the similarity score includes computing a cosine similarity or computing a Kullback-Leibler (KL) Divergence using a product of the tf and the idf.
10. The method according to claim 1, wherein the determining the relatedness is performed iteratively by selecting a different second customer observable in each iteration.
11. A system to determine relatedness of heterogeneous data, the system comprising: a memory device configured to store a first corpus of all documents that include a first term of a first type and a first term of a second type and to store a second corpus of all documents that include a second term of the first type and a second term of the second type, wherein the first term of the first type and the first term of the second type comprise a first customer observable, and the second term of the first type and the second term of the second type comprise a second customer observable; and a processor configured to identify other first terms of the first type and other first terms of the second type in the first corpus, identify other second terms of the first type and other second terms of the second type in the second corpus, and perform semantic similarity analysis to determine a similarity score between the first customer observable and the second customer observable.
12. The system according to claim 11, wherein the processor is further configured to apply a first filter to the first term of the first type, the other first terms of the first type, the first term of the second type, the other first terms of the second type, the second term of the first type, the other second terms of the first type, the second term of the second type, and the other second terms of the second type prior to the performing the semantic similarity analysis.
13. The system according to claim 11, wherein the processor is further configured to form a first vector that includes the first term of the first type, the other first terms of the first type, the first term of the second type, and the other first terms of the second type, and to form a second vector that includes the second term of the first type, the other second terms of the first type, the second term of the second type, and the other second terms of the second type.
14. The system according to claim 13, wherein the processor is further configured to form a first matrix from the first vector and form a second matrix from the second vector.
15. The system according to claim 13, wherein the processor is further configured to obtain a co-occurrence index value for each of the first term of the first type and the other first terms of the first type with every one of the first term of the second type and the other first terms of the second type, and obtain a co-occurrence index value for each of the sec
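Claims 7 to 9 describe weighting each vector element by a product of tf and idf and then scoring the two vectors by cosine similarity or KL divergence. The sketch below is one possible reading, not the patented computation: the claims do not give formulas, so the idf form `nominal + log(N / df)`, the whitespace tokenization, the `eps` smoothing in the KL step, and the function names `tf_idf`, `cosine_similarity`, and `kl_divergence` are all assumptions.

```python
import math
from typing import Dict, List


def tf_idf(terms: List[str], corpus: List[str], nominal: float = 1.0) -> Dict[str, float]:
    """Weight each term by tf * idf over one corpus.

    tf is the total number of mentions in the corpus (per claim 8).
    idf = nominal + log(N / df) is an assumed formula; the claim only says
    a nominal value is added to a document-count-based computation.
    """
    tokenized = [doc.lower().split() for doc in corpus]
    weights: Dict[str, float] = {}
    for term in terms:
        tf = sum(tokens.count(term) for tokens in tokenized)
        df = sum(1 for tokens in tokenized if term in tokens)
        idf = nominal + (math.log(len(tokenized) / df) if df else 0.0)
        weights[term] = tf * idf
    return weights


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def kl_divergence(u: Dict[str, float], v: Dict[str, float], eps: float = 1e-9) -> float:
    """KL(u || v) after normalizing the weights into distributions.

    eps smoothing (an assumption) avoids log(0) for terms missing from one side.
    """
    keys = sorted(set(u) | set(v))
    p = [u.get(k, 0.0) + eps for k in keys]
    q = [v.get(k, 0.0) + eps for k in keys]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))


first = tf_idf(["engine", "brake"], ["engine stalls", "engine stalls brake"])
second = tf_idf(["engine", "radio"], ["engine static radio", "radio static"])
print(cosine_similarity(first, second), kl_divergence(first, second))
```

A higher cosine value or a lower KL divergence would indicate that the two customer observables are more closely related under these assumptions.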
Semantic analysis · CPC title
Document management systems · CPC title
Clustering or classification · CPC title
Physics · mapped topic