Semantic similarity analysis to determine relatedness of heterogeneous data

US10482178B2 · US · B2

Patent metadata
Publication number: US-10482178-B2
Application number: US-201715672643-A
Country: US
Kind code: B2
Filing date: Aug 9, 2017
Priority date: Aug 9, 2017
Publication date: Nov 19, 2019
Grant date: Nov 19, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system to determine relatedness select a first customer observable from a first source document, the first customer observable being made up of two terms, the two terms being a first term of a first type and a first term of a second type, and select a second customer observable from a second source document, the second customer observable being made up of a second term of the first type and a second term of the second type. The method includes creating a first corpus of all documents that include the first terms, creating a second corpus of all documents that include the second terms, obtaining other first terms in the first corpus and other second terms in the second corpus, and performing semantic similarity analysis to determine a similarity score between the first customer observable and the second customer observable.
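
The corpus-building and term-gathering steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the names `Observable`, `build_corpus`, and `other_terms`, the example documents, the two vocabularies, and the use of simple lowercase substring matching are all hypothetical choices made here. The text on this page does not say what the two term types are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observable:
    """A customer observable: one term of each of the two types (hypothetical name)."""
    first_type_term: str
    second_type_term: str


def build_corpus(documents, observable):
    """Keep every document that mentions both terms of the observable."""
    return [
        doc for doc in documents
        if observable.first_type_term in doc.lower()
        and observable.second_type_term in doc.lower()
    ]


def other_terms(corpus, vocabulary, exclude):
    """Collect vocabulary terms, other than `exclude`, that appear in the corpus."""
    found = set()
    for doc in corpus:
        text = doc.lower()
        found.update(t for t in vocabulary if t != exclude and t in text)
    return sorted(found)


# Hypothetical documents and vocabularies, for illustration only.
documents = [
    "Brake pedal makes a grinding noise; rotor shows wear.",
    "Brake noise reported together with vibration at the rotor.",
    "Radio display flickers after start.",
]
first_type_vocab = ["brake", "rotor", "radio", "display"]
second_type_vocab = ["noise", "wear", "vibration", "flickers"]

obs = Observable("brake", "noise")
corpus = build_corpus(documents, obs)
others_first = other_terms(corpus, first_type_vocab, obs.first_type_term)
others_second = other_terms(corpus, second_type_vocab, obs.second_type_term)
```

Running the same two helpers for a second observable would give the second corpus and its other terms. The two expanded term sets are then the input to the similarity analysis.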

First claim

Opening claim text (preview).

What is claimed is:

1. A method of determining relatedness of heterogeneous data, the method comprising: selecting a first customer observable from a first source document, the first customer observable being made up of two terms, the two terms being a first term of a first type and a first term of a second type; selecting a second customer observable from a second source document, the second customer observable being made up of a second term of the first type and a second term of the second type; creating a first corpus of all documents that include the first term of the first type and the first term of the second type; creating a second corpus of all documents that include the second term of the first type and the second term of the second type; obtaining other first terms of the first type and other first terms of the second type in the first corpus and other second terms of the first type and other second terms of the second type in the second corpus; and performing semantic similarity analysis using the first term of the first type, the other first terms of the first type, the second term of the first type, and the other second terms of the first type and the first term of the second type, the other first terms of the second type, the second term of the second type, and the other second terms of the second type to determine a similarity score between the first customer observable and the second customer observable.

2. The method according to claim 1, further comprising applying a first filter to the first term of the first type, the other first terms of the first type, the first term of the second type, the other first terms of the second type, the second term of the first type, the other second terms of the first type, the second term of the second type, and the other second terms of the second type prior to the performing the semantic similarity analysis.

3. The method according to claim 1, further comprising forming a first vector that includes the first term of the first type, the other first terms of the first type, the first term of the second type, and the other first terms of the second type, and forming a second vector that includes the second term of the first type, the other second terms of the first type, the second term of the second type, and the other second terms of the second type.

4. The method according to claim 3, further comprising forming a first matrix from the first vector and forming a second matrix from the second vector.

5. The method according to claim 3, further comprising obtaining a co-occurrence index value for each of the first term of the first type and the other first terms of the first type with every one of the first term of the second type and the other first terms of the second type, and obtaining a co-occurrence index value for each of the second term of the first type and the other second terms of the first types with every one of the second term of the second type and the other second terms of the second type.

6. The method according to claim 5, wherein the obtaining the co-occurrence index values includes performing computations based on occurrences of the first term of the first type, the other first terms of the first type, the first term of the second type, and the other first terms of the second type in the first corpus, and occurrences of the second term of the first type, the other second terms of the first type, the second term of the second type, and the other second terms of the second type in the second corpus.

7. The method according to claim 3, further comprising determining a term frequency (tf) and inverse document frequency (idf) of some or all elements of the first vector and some or all elements of the second vector.

8. The method according to claim 7, wherein the determining the tf for a term, the term being one of the first term of the first type, the other first terms of the first type, the first term of the second type, the other first terms of the second type, the second term of the first type, the other second terms of the first type, the second term of the second type, or the other second terms of the second type, includes determining a total number of mentions of the term in the first corpus based on the term being one of the first term of the first type, the other first terms of the first type, the first term of the second type, or the other first terms of the second type and in the second corpus based on the term being one of the second term of the first type, the other second terms of the first type, the second term of the second type, or the other second terms of the second type, and the determining the idf for the term includes adding a nominal value to a computation based on a number of documents in which the term is mentioned.

9. The method according to claim 7, further comprising determining the similarity score includes computing a cosine similarity or computing a Kullback-Leibler (KL) Divergence using a product of the tf and the idf.

10. The method according to claim 1, wherein the determining the relatedness is performed iteratively by selecting a different second customer observable in each iteration.

11. A system to determine relatedness of heterogeneous data, the system comprising: a memory device configured to store a first corpus of all documents that include a first term of a first type and a first term of a second type and to store a second corpus of all documents that include a second term of the first type and a second term of the second type, wherein the first term of the first type and the first term of the second type comprise a first customer observable, and the second term of the first type and the second term of the second type comprise a second customer observable; and a processor configured to identify other first terms of the first type and other first terms of the second type in the first corpus, identify other second terms of the first type and other second terms of the second type in the second corpus, and perform semantic similarity analysis to determine a similarity score between the first customer observable and the second customer observable.

12. The system according to claim 11, wherein the processor is further configured to apply a first filter to the first term of the first type, the other first terms of the first type, the first term of the second type, the other first terms of the second type, the second term of the first type, the other second terms of the first type, the second term of the second type, and the other second terms of the second type prior to the performing the semantic similarity analysis.

13. The system according to claim 11, wherein the processor is further configured to form a first vector that includes the first term of the first type, the other first terms of the first type, the first term of the second type, and the other first terms of the second type, and to form a second vector that includes the second term of the first type, the other second terms of the first type, the second term of the second type, and the other second terms of the second type.

14. The system according to claim 13, wherein the processor is further configured to form a first matrix from the first vector and form a second matrix from the second vector.

15. The system according to claim 13, wherein the processor is further configured to obtain a co-occurrence index value for each of the first term of the first type and the other first terms of the first type with every one of the first term of the second type and the other first terms of the second type, and obtain a co-occurrence index value for each of the sec
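
Claims 7 to 9 describe weighting each term by the product of tf and idf and then scoring with cosine similarity or KL divergence. A minimal sketch of that scoring step follows, under several assumptions that do not come from the claims: the specific idf form `log(1 + N / (nominal + df))`, the normalization of weights into probability distributions for the KL computation, the `eps` smoothing, and all function names and example corpora are hypothetical. The claims state only that tf is a total number of mentions in the relevant corpus and that a nominal value is added in the idf computation.

```python
import math


def tf(term, corpus):
    """Total number of mentions of `term` across all documents in the corpus."""
    return sum(doc.lower().count(term) for doc in corpus)


def idf(term, corpus, nominal=1.0):
    """Assumed idf form; `nominal` avoids division by zero when df is 0."""
    df = sum(1 for doc in corpus if term in doc.lower())
    return math.log(1.0 + len(corpus) / (nominal + df))


def tfidf_weights(terms, corpus):
    """Map each term to the product of its tf and idf in its own corpus."""
    return {t: tf(t, corpus) * idf(t, corpus) for t in terms}


def _aligned(w1, w2):
    """Put both weight maps on a shared, sorted term axis (missing terms get 0)."""
    keys = sorted(set(w1) | set(w2))
    return [w1.get(k, 0.0) for k in keys], [w2.get(k, 0.0) for k in keys]


def cosine_similarity(w1, w2):
    a, b = _aligned(w1, w2)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def kl_divergence(w1, w2, eps=1e-9):
    """KL(P || Q) after smoothing with `eps` and normalizing weights to sum to 1."""
    a, b = _aligned(w1, w2)
    p = [x + eps for x in a]
    q = [y + eps for y in b]
    sp, sq = sum(p), sum(q)
    return sum((x / sp) * math.log((x / sp) / (y / sq)) for x, y in zip(p, q))


# Hypothetical corpora and term vectors, for illustration only.
corpus_1 = ["Brake noise and rotor wear.", "Brake noise with rotor vibration."]
corpus_2 = ["Rotor vibration and brake wear.", "Rotor vibration under load."]
terms_1 = ["brake", "noise", "rotor", "wear", "vibration"]
terms_2 = ["rotor", "vibration", "brake", "wear"]

w1 = tfidf_weights(terms_1, corpus_1)
w2 = tfidf_weights(terms_2, corpus_2)
score = cosine_similarity(w1, w2)
divergence = kl_divergence(w1, w2)
```

A higher cosine score, or a lower divergence, would indicate more closely related observables. Repeating the comparison with a different second observable in each pass corresponds to the iteration described in claim 10.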

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10482178B2 cover?
A method and system to determine relatedness select a first customer observable from a first source document, the first customer observable being made up of two terms, the two terms being a first term of a first type and a first term of a second type, and select a second customer observable from a second source document, the second customer observable being made up of a second term of the first…
Who is the assignee on this patent?
GM Global Tech Operations LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 19, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).