Method, electronic device, and storage medium for entity linking by determining a linking probability based on splicing of embedding vectors of a target and a reference text

US11704492B2 · US · B2

Patent metadata
Publication number: US-11704492-B2
Application number: US-202117213927-A
Country: US
Kind code: B2
Filing date: Mar 26, 2021
Priority date: Apr 23, 2020
Publication date: Jul 18, 2023
Grant date: Jul 18, 2023

Abstract

A method, apparatus, device, and storage medium for entity linking is disclosed. The method includes: acquiring a target text; determining at least one entity mention included in the target text; determining a candidate entity corresponding to each of the entity mention based on a preset knowledge base; determining a reference text of each of the candidate entity and determining additional feature information of each of the candidate entity; and determining an entity linking result based on the target text, each of the reference text, and each piece of the additional feature information, wherein determining the entity linking result includes determining a probability of linking each of the candidate entity to the entity mention based on a splicing of a first embedding vector and a second embedding vector of the target text and a splicing of a first embedding vector and a second embedding vector of each respective reference text.
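The final scoring step described in the abstract (and spelled out in claim 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the embedding vectors and the additional feature information are taken as ready-made lists of floats, and the "preset classification model" is replaced by a hypothetical logistic-regression layer (`weights`, `bias`). Claim 1 leaves open whether the target-text embeddings are spliced with all first spliced vectors at once or with each one separately; the sketch assumes the latter and scores each candidate independently. The names `splice`, `linking_probabilities`, `ref_e1`, `ref_e2`, and `features` are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def splice(*parts: Sequence[float]) -> List[float]:
    """Concatenate ("splice") vectors end to end, preserving order."""
    out: List[float] = []
    for part in parts:
        out.extend(part)
    return out


def linking_probabilities(
    target_e1: Sequence[float],
    target_e2: Sequence[float],
    candidates: Dict[str, dict],
    weights: Sequence[float],
    bias: float = 0.0,
) -> Dict[str, float]:
    """Return a probability of linking each candidate entity to the mention.

    Each candidate dict holds hypothetical keys "ref_e1" and "ref_e2" (first and
    second embedding vectors of its reference text) and "features" (its
    additional feature information). The classifier is an assumed single
    logistic-regression layer: sigmoid(weights . x + bias).
    """
    probs: Dict[str, float] = {}
    for name, cand in candidates.items():
        # First spliced vector: reference-text embeddings + additional features.
        first = splice(cand["ref_e1"], cand["ref_e2"], cand["features"])
        # Second spliced vector (assumed per candidate): target-text embeddings
        # followed by this candidate's first spliced vector.
        second = splice(target_e1, target_e2, first)
        if len(second) != len(weights):
            raise ValueError("weights must match the spliced vector length")
        logit = sum(w * x for w, x in zip(weights, second)) + bias
        probs[name] = 1.0 / (1.0 + math.exp(-logit))
    return probs


if __name__ == "__main__":
    probs = linking_probabilities(
        target_e1=[1.0],
        target_e2=[0.0],
        candidates={
            "A": {"ref_e1": [1.0], "ref_e2": [0.0], "features": [0.5]},
            "B": {"ref_e1": [-1.0], "ref_e2": [0.0], "features": [0.0]},
        },
        weights=[0.0, 0.0, 1.0, 0.0, 1.0],
    )
    print({k: round(v, 3) for k, v in probs.items()})  # {'A': 0.818, 'B': 0.269}
```

Under this sketch, the candidate with the highest probability would be taken as the entity linking result; a softmax across all candidates is another common design choice that the claim wording does not rule out.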

First claim

Opening claim text (preview).

What is claimed is:

1. A method for entity linking, comprising: acquiring a target text; determining at least one entity mention included in the target text; determining a candidate entity corresponding to each of the entity mention based on a preset knowledge base; determining a reference text of each of the candidate entity and determining additional feature information of each of the candidate entity; and determining an entity linking result based on the target text, each of the reference text, and each piece of the additional feature information, wherein the determining the entity linking result based on the target text, each of the reference text, and each piece of the additional feature information comprises: determining a first embedding vector of the target text, a second embedding vector of the target text, a first embedding vector of each of the reference text, and a second embedding vector of each of the reference text respectively; splicing, for each reference text, the first embedding vector of the reference text, the second embedding vector of the reference text, and additional feature information of a candidate entity corresponding to the reference text, to obtain a first spliced vector; splicing the first embedding vector of the target text, the second embedding vector of the target text, and each of the first spliced vector, to obtain a second spliced vector; and determining a probability of linking each of the candidate entity to the entity mention based on each of the first spliced vector, the second spliced vector, and a preset classification model.

2. The method according to claim 1, wherein the determining the at least one entity mention included in the target text comprises: determining a text embedding vector and a relevant eigenvector of the target text; fusing the text embedding vector and the relevant eigenvector to obtain a fused vector; and determining the at least one entity mention based on the fused vector.

3. The method according to claim 2, wherein the determining the at least one entity mention based on the fused vector comprises: performing attention enhancement on the fused vector to obtain an enhanced vector; classifying the enhanced vector twice to obtain a head position and a tail position of each of the entity mention; and determining each of the entity mention based on the obtained head position and the obtained tail position.

4. The method according to claim 1, wherein the determining the reference text of each of the candidate entity comprises: acquiring, for each candidate entity, at least one description text of the candidate entity; and splicing each of the description text to obtain the reference text of the candidate entity.

5. The method according to claim 1, wherein the additional feature information comprises an entity embedding vector; and the determining the additional feature information of each of the candidate entity comprises: acquiring, for each candidate entity, description information of the candidate entity; acquiring a triplet sequence related to the candidate entity; and determining the entity embedding vector of the candidate entity based on the candidate entity, the description information, the triplet sequence, and a pretrained vector determining model.

6. The method according to claim 1, wherein the additional feature information comprises at least one upperseat concept and a probability corresponding to each of the upperseat concept; and the determining the additional feature information of each of the candidate entity comprises: determining, for each candidate entity, at least one upperseat concept of the candidate entity and the probability corresponding to each of the upperseat concept based on the candidate entity and a preset concept predicting model, to obtain a probability sequence.

7. The method according to claim 1, wherein the determining the first embedding vector of the target text, the second embedding vector of the target text, the first embedding vector of each of the reference text, and the second embedding vector of each of the reference text comprises: determining a word embedding vector of the target text, a character embedding vector of the target text, a word embedding vector of each of the reference text, and a character embedding vector of each of the reference text respectively; determining the first embedding vector of the target text based on the word embedding vector of the target text, the character embedding vector of the target text, and a first preset vector determining model; determining the second embedding vector of the target text based on the target text and a second preset vector determining model; and determining, for each reference text, the first embedding vector of the reference text based on the word embedding vector of the reference text, the character embedding vector of the reference text, and the first preset vector determining model; and determining the second embedding vector of the reference text based on the reference text and the second preset vector determining model.

8. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein: the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, such that the at least one processor can perform operations comprising: acquiring a target text; determining at least one entity mention included in the target text; determining a candidate entity corresponding to each of the entity mention based on a preset knowledge base; determining a reference text of each of the candidate entity and determining additional feature information of each of the candidate entity; and determining an entity linking result based on the target text, each of the reference text, and each piece of the additional feature information, wherein the determining the entity linking result based on the target text, each of the reference text, and each piece of the additional feature information comprises: determining a first embedding vector of the target text, a second embedding vector of the target text, a first embedding vector of each of the reference text, and a second embedding vector of each of the reference text respectively; splicing, for each reference text, the first embedding vector of the reference text, the second embedding vector of the reference text, and additional feature information of a candidate entity corresponding to the reference text, to obtain a first spliced vector; splicing the first embedding vector of the target text, the second embedding vector of the target text, and each of the first spliced vector, to obtain a second spliced vector; and determining a probability of linking each of the candidate entity to the entity mention based on each of the first spliced vector, the second spliced vector, and a preset classification model.

9. The electronic device according to claim 8, wherein the determining the at least one entity mention included in the target text comprises: determining a text embedding vector and a relevant eigenvector of the target text; fusing the text embedding vector and the relevant eigenvector to obtain a fused vector; and determining the at least one entity mention based on the fused vector.

10. The electronic device according to claim 9, wherein the determining the at least one entity mention based on the fused vector comprises: performing attention enhancement on the fused vector to obtain an enhanced vector; classifying the enhanced vector twice to obtain a head position and a tail position of each of the entity mention
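Claims 3 and 10 finish mention detection by classifying each position twice, once for "head" and once for "tail", and then assembling mentions from the two results. The sketch below covers only that last assembly step and assumes the two classifiers have already produced per-token probabilities. The 0.5 threshold, the `max_len` limit, and the rule of pairing each head with the nearest tail at or after it are illustrative assumptions (a common greedy span-decoding heuristic), not details stated in the claims; the fusion and attention-enhancement steps of claims 2 and 3 are not modeled, and `decode_mentions` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple


def decode_mentions(
    tokens: Sequence[str],
    head_probs: Sequence[float],
    tail_probs: Sequence[float],
    threshold: float = 0.5,
    max_len: int = 10,
) -> List[Tuple[int, int, str]]:
    """Assemble (start, end, text) mention spans from head/tail scores.

    Assumed pairing rule: each position whose head score passes the threshold
    is paired with the nearest tail position at or after it, so that a mention
    spans at most max_len tokens. Heads without such a tail are dropped.
    """
    if not (len(tokens) == len(head_probs) == len(tail_probs)):
        raise ValueError("tokens and score sequences must have equal length")
    heads = [i for i, p in enumerate(head_probs) if p >= threshold]
    tails = [i for i, p in enumerate(tail_probs) if p >= threshold]
    mentions: List[Tuple[int, int, str]] = []
    for h in heads:
        # tails is sorted ascending, so the first match is the nearest one.
        end: Optional[int] = next((t for t in tails if h <= t < h + max_len), None)
        if end is not None:
            # Tokens are joined with spaces; character-level Chinese input
            # would be joined with "" instead.
            mentions.append((h, end, " ".join(tokens[h : end + 1])))
    return mentions


if __name__ == "__main__":
    tokens = ["Ang", "Lee", "directed", "Life", "of", "Pi"]
    heads = [0.9, 0.1, 0.0, 0.8, 0.1, 0.0]
    tails = [0.1, 0.9, 0.0, 0.1, 0.2, 0.95]
    print(decode_mentions(tokens, heads, tails))
    # [(0, 1, 'Ang Lee'), (3, 5, 'Life of Pi')]
```

The nearest-tail rule is a simple greedy choice; it can still produce overlapping spans when two heads share one tail, so a fuller system might add a de-duplication step.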

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Classifications

  • Supervised learning

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • G06F40/295 (primary): Named entity recognition

  • using natural language analysis

  • Semantic analysis

Frequently asked questions

Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/295. Mapped technology areas include Physics.
When was this patent published?
Published Jul 18, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).