Text processing method and device based on ambiguous entity words
US-2019220749-A1 · Jul 18, 2019 · US
US11704492B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11704492-B2 |
| Application number | US-202117213927-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 26, 2021 |
| Priority date | Apr 23, 2020 |
| Publication date | Jul 18, 2023 |
| Grant date | Jul 18, 2023 |
Official abstract text for this publication.
A method, apparatus, device, and storage medium for entity linking is disclosed. The method includes: acquiring a target text; determining at least one entity mention included in the target text; determining a candidate entity corresponding to each of the entity mention based on a preset knowledge base; determining a reference text of each of the candidate entity and determining additional feature information of each of the candidate entity; and determining an entity linking result based on the target text, each of the reference text, and each piece of the additional feature information, wherein determining the entity linking result includes determining a probability of linking each of the candidate entity to the entity mention based on a splicing of a first embedding vector and a second embedding vector of the target text and a splicing of a first embedding vector and a second embedding vector of each respective reference text.
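The splicing steps named in the abstract can be sketched as plain vector concatenation. The block below is a minimal illustration, not the patented implementation: the function names, the toy embedding values, and especially `toy_link_probabilities` (a hypothetical stand-in for the "preset classification model", whose internals the abstract does not specify) are all assumptions.

```python
import math


def splice(*vectors):
    """Concatenate vectors end to end ("splicing" in the patent's wording)."""
    out = []
    for v in vectors:
        out.extend(v)
    return out


def first_spliced_vector(ref_first, ref_second, extra_features):
    """Per candidate: reference-text embeddings plus additional feature info."""
    return splice(ref_first, ref_second, extra_features)


def second_spliced_vector(tgt_first, tgt_second, first_spliced_vectors):
    """Target-text embeddings followed by every first spliced vector."""
    return splice(tgt_first, tgt_second, *first_spliced_vectors)


def toy_link_probabilities(first_spliced_vectors, second_spliced, weights):
    """Hypothetical classifier (assumption): a sigmoid over a weighted sum of
    each candidate's first spliced vector plus a shared context term taken
    from the mean of the second spliced vector."""
    context = sum(second_spliced) / len(second_spliced)
    probs = []
    for vec in first_spliced_vectors:
        score = sum(w * x for w, x in zip(weights, vec)) + context
        probs.append(1.0 / (1.0 + math.exp(-score)))
    return probs


# Toy values, invented for illustration only.
tgt_first, tgt_second = [0.1, 0.2], [0.3]
cand_a = first_spliced_vector([0.5, 0.1], [0.2], [1.0])
cand_b = first_spliced_vector([-0.5, 0.0], [0.1], [0.0])
second = second_spliced_vector(tgt_first, tgt_second, [cand_a, cand_b])
probs = toy_link_probabilities([cand_a, cand_b], second, [1.0, 1.0, 1.0, 1.0])
```

Note that the second spliced vector grows with the number of candidates, since it contains every first spliced vector. A real classifier would need a fixed-size input or a pooling step, and the abstract does not say which.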
Opening claim text (preview).
What is claimed is:

1. A method for entity linking, comprising: acquiring a target text; determining at least one entity mention included in the target text; determining a candidate entity corresponding to each of the entity mention based on a preset knowledge base; determining a reference text of each of the candidate entity and determining additional feature information of each of the candidate entity; and determining an entity linking result based on the target text, each of the reference text, and each piece of the additional feature information, wherein the determining the entity linking result based on the target text, each of the reference text, and each piece of the additional feature information comprises: determining a first embedding vector of the target text, a second embedding vector of the target text, a first embedding vector of each of the reference text, and a second embedding vector of each of the reference text respectively; splicing, for each reference text, the first embedding vector of the reference text, the second embedding vector of the reference text, and additional feature information of a candidate entity corresponding to the reference text, to obtain a first spliced vector; splicing the first embedding vector of the target text, the second embedding vector of the target text, and each of the first spliced vector, to obtain a second spliced vector; and determining a probability of linking each of the candidate entity to the entity mention based on each of the first spliced vector, the second spliced vector, and a preset classification model.

2. The method according to claim 1, wherein the determining the at least one entity mention included in the target text comprises: determining a text embedding vector and a relevant eigenvector of the target text; fusing the text embedding vector and the relevant eigenvector to obtain a fused vector; and determining the at least one entity mention based on the fused vector.

3. The method according to claim 2, wherein the determining the at least one entity mention based on the fused vector comprises: performing attention enhancement on the fused vector to obtain an enhanced vector; classifying the enhanced vector twice to obtain a head position and a tail position of each of the entity mention; and determining each of the entity mention based on the obtained head position and the obtained tail position.

4. The method according to claim 1, wherein the determining the reference text of each of the candidate entity comprises: acquiring, for each candidate entity, at least one description text of the candidate entity; and splicing each of the description text to obtain the reference text of the candidate entity.

5. The method according to claim 1, wherein the additional feature information comprises an entity embedding vector; and the determining the additional feature information of each of the candidate entity comprises: acquiring, for each candidate entity, description information of the candidate entity; acquiring a triplet sequence related to the candidate entity; and determining the entity embedding vector of the candidate entity based on the candidate entity, the description information, the triplet sequence, and a pretrained vector determining model.

6. The method according to claim 1, wherein the additional feature information comprises at least one upperseat concept and a probability corresponding to each of the upperseat concept; and the determining the additional feature information of each of the candidate entity comprises: determining, for each candidate entity, at least one upperseat concept of the candidate entity and the probability corresponding to each of the upperseat concept based on the candidate entity and a preset concept predicting model, to obtain a probability sequence.

7. The method according to claim 1, wherein the determining the first embedding vector of the target text, the second embedding vector of the target text, the first embedding vector of each of the reference text, and the second embedding vector of each of the reference text comprises: determining a word embedding vector of the target text, a character embedding vector of the target text, a word embedding vector of each of the reference text, and a character embedding vector of each of the reference text respectively; determining the first embedding vector of the target text based on the word embedding vector of the target text, the character embedding vector of the target text, and a first preset vector determining model; determining the second embedding vector of the target text based on the target text and a second preset vector determining model; and determining, for each reference text, the first embedding vector of the reference text based on the word embedding vector of the reference text, the character embedding vector of the reference text, and the first preset vector determining model; and determining the second embedding vector of the reference text based on the reference text and the second preset vector determining model.

8. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein: the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, such that the at least one processor can perform operations comprising: acquiring a target text; determining at least one entity mention included in the target text; determining a candidate entity corresponding to each of the entity mention based on a preset knowledge base; determining a reference text of each of the candidate entity and determining additional feature information of each of the candidate entity; and determining an entity linking result based on the target text, each of the reference text, and each piece of the additional feature information, wherein the determining the entity linking result based on the target text, each of the reference text, and each piece of the additional feature information comprises: determining a first embedding vector of the target text, a second embedding vector of the target text, a first embedding vector of each of the reference text, and a second embedding vector of each of the reference text respectively; splicing, for each reference text, the first embedding vector of the reference text, the second embedding vector of the reference text, and additional feature information of a candidate entity corresponding to the reference text, to obtain a first spliced vector; splicing the first embedding vector of the target text, the second embedding vector of the target text, and each of the first spliced vector, to obtain a second spliced vector; and determining a probability of linking each of the candidate entity to the entity mention based on each of the first spliced vector, the second spliced vector, and a preset classification model.

9. The electronic device according to claim 8, wherein the determining the at least one entity mention included in the target text comprises: determining a text embedding vector and a relevant eigenvector of the target text; fusing the text embedding vector and the relevant eigenvector to obtain a fused vector; and determining the at least one entity mention based on the fused vector.

10. The electronic device according to claim 9, wherein the determining the at least one entity mention based on the fused vector comprises: performing attention enhancement on the fused vector to obtain an enhanced vector; classifying the enhanced vector twice to obtain a head position and a tail position of each of the entity mention
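The head and tail classification recited in claims 3 and 10 can be illustrated with a small decoding sketch. It assumes the two classification passes have already produced one head probability and one tail probability per token. The function name `decode_mentions`, the 0.5 threshold, and the rule of pairing each head with the nearest tail at or after it are all assumptions, since the claims do not state how positions are paired.

```python
def decode_mentions(tokens, head_probs, tail_probs, threshold=0.5):
    """Turn per-token head/tail probabilities into (head, tail, text) spans.

    Assumption: a head is paired with the closest tail position that is
    not before it. The claims only say that mentions are determined from
    the obtained head and tail positions.
    """
    heads = [i for i, p in enumerate(head_probs) if p >= threshold]
    tails = [i for i, p in enumerate(tail_probs) if p >= threshold]
    mentions = []
    for h in heads:
        # First tail index at or after this head, if any.
        t = next((t for t in tails if t >= h), None)
        if t is not None:
            mentions.append((h, t, " ".join(tokens[h:t + 1])))
    return mentions


# Invented example scores for illustration only.
tokens = ["Jordan", "played", "for", "the", "Chicago", "Bulls"]
head_probs = [0.9, 0.1, 0.0, 0.1, 0.8, 0.2]
tail_probs = [0.8, 0.0, 0.1, 0.0, 0.1, 0.9]
mentions = decode_mentions(tokens, head_probs, tail_probs)
# mentions == [(0, 0, "Jordan"), (4, 5, "Chicago Bulls")]
```

Predicting head and tail positions separately, rather than tagging every token, lets a single-token mention ("Jordan") and a multi-token mention ("Chicago Bulls") be handled by the same decoding rule.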
Supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Named entity recognition · CPC title
using natural language analysis · CPC title
Semantic analysis · CPC title