Named entity disambiguation using entity distance in a knowledge graph
US-2020342055-A1 · Oct 29, 2020 · US
US11386335B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11386335-B2 |
| Application number | US-201916590738-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 2, 2019 |
| Priority date | Oct 2, 2019 |
| Publication date | Jul 12, 2022 |
| Grant date | Jul 12, 2022 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Complex computer system architectures are described for analyzing data elements of a knowledge graph, and predicting new facts from relational learning applied to the knowledge graph. This discovery process includes converting the knowledge graph into a set of candidate embeddings spaces to apply further analysis to rank the set of candidate embeddings spaces, where the top ranked candidate embeddings spaces are further processed to identify the new facts.
Opening claim text (preview).
What is claimed is: 1. A computing device comprising: a data reception circuitry configured to receive a knowledge graph including a set of structured data; a knowledge graph embedding circuitry configured to: select a sampled set of vector triples from the knowledge graph; convert the knowledge graph to a set of candidate embeddings spaces, wherein each of the candidate embeddings spaces includes a different set of vectors representing the sampled set of vector triples from the knowledge graph; and a processing circuitry configured to: calculate a weighted mean reciprocal rank of the sampled set of vector triples for each of the candidate embeddings spaces; determine a respective fact score for each of the candidate embeddings spaces based on the calculated weighted mean reciprocal rank; rank the candidate embeddings spaces according to their respective fact scores; and select a predetermined number of the highest ranked candidate embeddings spaces when a stopping condition is met. 2. The computing device of claim 1 , wherein the processing circuitry is further configured to: obtain a set of candidate vector triples, wherein each vector triple represents a different candidate fact; determine prediction scores for the set of candidate vector triples in view of the selected highest ranked candidate embeddings spaces; and select a vector triple from the set of candidate vector triples as a new fact based on the prediction scores. 3. The computing device of claim 1 , wherein the stopping condition includes at least one of when a fact score of at least one of the candidate embeddings spaces if greater than a predetermined selection threshold, when a predetermined period has passed, or when an improvement in the fact scores for plurality of candidate embeddings spaces is less than a predetermined amount compared to a previous iteration. 4. The computing device of claim 1 , wherein the processing circuitry is further configured to: generate a new set of candidate embeddings spaces according to a search strategy when a stopping condition is not met, wherein the search strategy comprises both an exploration search strategy and an exploitation search strategy; calculate a weighted mean reciprocal rank of the sampled set of vector triples for each of the new set of candidate embeddings spaces; determine a respective fact score for each of the new set of candidate embeddings spaces based on the weighted mean reciprocal rank calculation; rank the new set of candidate embeddings spaces according to their respective fact scores; and select the predetermined number of the highest ranked new candidate embeddings spaces when the stopping condition is met. 5. The computing device of claim 4 , wherein the processing circuitry is configured to generate a new candidate embeddings space according to the exploration search strategy by combining at least two candidate embeddings spaces from the selected highest ranked candidate embeddings spaces as a weighted average of the at least two candidate embeddings spaces from the selected highest ranked candidate embeddings spaces. 6. The computing device of claim 5 , wherein the processing circuitry is configured to determine a weighted average of nodes in each of the at least two candidate embeddings spaces. 7. The computing device of claim 4 , wherein the processing circuitry is configured to generate a new candidate embeddings space according to the exploitation search strategy by modifying a position of a vector included in a highest ranked candidate embeddings space. 8. The computing device of claim 7 , wherein the processing circuitry is configured to determine the modified position of the vector based on a position of other connected nodes. 9. The computing device of claim 4 , wherein the processing circuitry is configured to generate the new set of candidate embeddings spaces according to a predetermined ratio that includes a first predetermined number of new candidate embeddings spaces using the exploration search strategy and a second predetermined number of new candidate embeddings spaces using the exploitation search strategy. 10. The computing device of claim 1 , wherein the processing circuitry is further configured to: determine an initial ranking score for each combination of the sampled set of vector triples for each of the candidate embeddings spaces; determine a mean of the initial ranking scores for each of the candidate embeddings spaces; determine a corner case weighting for each combination of the sampled set of vector triples; and wherein the weighted mean reciprocal rank is calculated based on the mean of the initial ranking scores and the corner case weighting. 11. A method comprising: receiving, by a reception circuitry, a knowledge graph including a set of structured data; selecting, by a knowledge graph embedding circuitry, a sampled set of vector triples from the knowledge graph; converting, by the knowledge graph embedding circuitry, the knowledge graph to a set of candidate embeddings spaces, wherein each of the candidate embeddings spaces includes a different set of vectors representing the sampled set of vector triples from the knowledge graph; calculating, by a processing circuitry, a weighted mean reciprocal rank of the sampled set of vector triples for each of the candidate embeddings spaces; determining, by the processing circuitry, a respective fact score for each of the candidate embeddings spaces based on the calculated weighted mean reciprocal rank; ranking, by the processing circuitry, the candidate embeddings spaces according to their respective fact scores; and selecting, by the processing circuitry, a predetermined number of the highest ranked candidate embeddings spaces when a stopping condition is met. 12. The method of claim 11 , further comprising: obtaining, by the processing circuitry, a set of candidate vector triples, wherein each vector triple represents a different candidate fact; determining, by the processing circuitry, prediction scores for the set of candidate vector triples in view of the selected highest ranked candidate embeddings spaces; and selecting, by the processing circuitry, a vector triple from the set of candidate vector triples as a new fact based on the prediction scores. 13. The method of claim 11 , wherein the stopping condition includes at least one of when a fact score of at least one of the candidate embeddings spaces if greater than a predetermined selection threshold, when a predetermined period has passed, or when an improvement in the fact scores for plurality of candidate embeddings spaces is less than a predetermined amount compared to a previous iteration. 14. The method of claim 11 , further comprising: generating, by the processing circuitry, a new set of candidate embeddings spaces according to a search strategy when a stopping condition is not met, wherein the search strategy comprises both an exploration search strategy and an exploitation search strategy; calculating, by the processing circuitry, a weighted mean reciprocal rank of the sampled set of vector triples for each of the new set of candidate embeddings spaces; determining, by the processing circuitry, a respective fact score for each of the new set of candidate embeddings spaces based on the weighted mean reciprocal rank calculation; ranking, by the processing circuitry, the new set of candidate embeddings spaces according to their respective fact scores; and selecting, by the processing circuitry, the predetermined number of the highest ranked new candidate embeddings spaces when the stopping condition is met.
Machine learning · CPC title
using ranking · CPC title
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title
Knowledge engineering; Knowledge acquisition · CPC title
Vectors, bitmaps or matrices · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.