Gazetteer integration for neural named entity recognition
US-2023205999-A1 · Jun 29, 2023 · US
US11989518B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11989518-B2 |
| Application number | US-202117506726-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 21, 2021 |
| Priority date | Oct 22, 2020 |
| Publication date | May 21, 2024 |
| Grant date | May 21, 2024 |
Official abstract text for this publication.
A normalized processing method of a named entity includes: obtaining first text data; recognizing a named entity from the first text data; determining whether a first standard named entity exists in a standard named entity database according to the named entity; determining the first standard named entity as a normalized representation of the named entity in response to determining that the first standard named entity exists in the standard named entity database; and obtaining a second standard named entity from the standard named entity database and determining an obtained second standard named entity as the normalized representation of the named entity in response to determining that the first standard named entity does not exist in the standard named entity database.
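The two-branch lookup in the abstract can be sketched as follows. This is an illustrative sketch only, not the patented implementation. The function names `normalize_entity` and `char_similarity` are hypothetical, the "exists" test is assumed to be an exact string match, and the fallback uses a character-level ratio from Python's `difflib` as a stand-in for whatever similarity measure a real system would use.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence


def char_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; an assumption, not the patent's model."""
    return SequenceMatcher(None, a, b).ratio()


def normalize_entity(
    entity: str,
    standard_db: Sequence[str],
    similarity: Callable[[str, str], float] = char_similarity,
) -> Optional[str]:
    """Return a standard entity to use as the normalized representation."""
    # Branch 1: a matching standard entity exists, so use it directly.
    if entity in standard_db:
        return entity
    # Branch 2: no match, so fall back to the most similar standard entity.
    if not standard_db:
        return None
    return max(standard_db, key=lambda s: similarity(entity, s))


if __name__ == "__main__":
    db = ["aspirin", "ibuprofen"]
    print(normalize_entity("aspirin", db))  # exact-match branch
    print(normalize_entity("asprin", db))   # similarity fallback branch
```

Passing the similarity function as a parameter keeps the control flow separate from the scoring model, so a learned scorer could replace the stand-in without changing the lookup logic.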
Opening claim text (preview).
What is claimed is:

1. A normalized processing method of a named entity, comprising: obtaining first text data; recognizing a named entity from the first text data; determining whether a first standard named entity exists in a standard named entity database according to the named entity, the first standard named entity being a standard named entity whose character string matches a character string of one of the named entity and an extended named entity, and the extended named entity being obtained by performing a synonym substitution on at least part of words of the named entity; determining the first standard named entity as a normalized representation of the named entity in response to determining that the first standard named entity exists in the standard named entity database; and obtaining a second standard named entity from the standard named entity database, and determining an obtained second standard named entity as the normalized representation of the named entity in response to determining that the first standard named entity does not exist in the standard named entity database, the second standard named entity being a standard named entity whose word vector similarity to the named entity in the standard named entity database satisfies a preset condition, wherein obtaining the second standard named entity from the standard named entity database, includes: determining a word vector similarity between each standard named entity in the standard named entity database and the named entity based on a word vector similarity matching algorithm; and determining the standard named entity whose word vector similarity to the named entity in the standard named entity database satisfies the preset condition as the second standard named entity, wherein determining the word vector similarity between each standard named entity in the standard named entity database and the named entity based on the word vector similarity matching algorithm, includes: calculating a length of a longest common subsequence of the named entity and each standard named entity in the standard named entity database; sequencing standard named entities in the standard named entity database to obtain a standard named entity candidate list according to lengths of the longest common subsequences; and sequentially inputting each standard named entity in the standard named entity candidate list and the named entity into a semantic model based on a word vector, so as to obtain the word vector similarity between the named entity and the standard named entity, wherein the semantic model based on the word vector includes a bi-directional encoder representation from transformers (BERT) model; and a fully connected layer of the BERT model is implemented by using a softmax classifier or a sigmoid classifier.

2. The normalized processing method according to claim 1, wherein recognizing the named entity from the first text data, includes: deleting a first text in the first text data to obtain second text data, the first text including at least one stop word and/or at least one designated symbol; and recognizing the named entity from the second text data.

3. The normalized processing method according to claim 2, wherein the second text data is a long text, and recognizing the named entity from the second text data, includes: using a first named entity recognition algorithm to recognize the named entity from the second text data, the first named entity recognition algorithm being a named entity recognition algorithm for the long text.

4. The normalized processing method according to claim 3, wherein before recognizing the named entity from the second text data, recognizing the named entity from the first text data, further includes: determining whether a text length of the second text data is greater than a preset text length threshold; using the second text data as the long text in response to determining that the text length of the second text data is greater than the preset text length threshold.

5. The normalized processing method according to claim 3, wherein the first named entity recognition algorithm includes a named entity recognition algorithm based on a bi-directional long-short term memory network (BiLSTM) and a conditional random field (CRF).

6. The normalized processing method according to claim 2, wherein the second text data is a short text, and recognizing the named entity from the second text data, includes: using a second named entity recognition algorithm to recognize the named entity from the second text data, the second named entity recognition algorithm being a named entity recognition algorithm for the short text.

7. The normalized processing method according to claim 6, wherein before recognizing the named entity from the second text data, recognizing the named entity from the first text data, further includes: determining whether a text length of the second text data is greater than a preset text length threshold; using the second text data as the short text in response to determining that the text length of the second text data is less than or equal to the preset text length threshold.

8. The normalized processing method according to claim 6, wherein the second named entity recognition algorithm includes a named entity recognition algorithm based on a regular expression.

9. The normalized processing method according to claim 1, wherein determining whether the first standard named entity exists in the standard named entity database according to the named entity, includes: searching for the standard named entity whose character string matches the character string of the named entity in the standard named entity database; and searching for the standard named entity whose character string matches the character string of the extended named entity in the standard named entity database in response to determining that the standard named entity whose character string matches the character string of the named entity is not found, wherein the found standard named entity whose character string matches the character string of the named entity or the extended named entity is used as the first standard named entity.

10. The normalized processing method according to claim 9, wherein the extended named entity is obtained by performing a complete synonym substitution on the named entity, and the complete synonym substitution is a synonym substitution on the named entity as a whole.

11. The normalized processing method according to claim 9, wherein the extended named entity is obtained by performing a partial synonym substitution on the named entity, and the partial synonym substitution is a synonym substitution on at least one named entity word segmentation obtained by performing a word segmentation processing on the named entity.

12. The normalized processing method according to claim 11, wherein performing the partial synonym substitution on the named entity, includes: performing the word segmentation processing on the named entity to obtain a plurality of named entity word segmentations; and traversing a partial synonym mapping table according to the plurality of named entity word segmentations, and substituting at least one traversed named entity word segmentation for a synonym to obtain the extended named entity.

13. The normalized processing method according to claim 1, wherein the preset condition is that the word vector similarity between the named entity and the standard named entity reaches a preset similarity threshold, or the preset condition is that the named entity and one standard named entity in the standard named entity database have a highest wo
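
Two steps recited in the claims above lend themselves to a short illustration: ordering standard entities by longest-common-subsequence length (claim 1) and generating extended entities by partial synonym substitution (claims 11 and 12). The sketch below is not the claimed implementation and rests on stated assumptions. All function names are hypothetical. Candidates are sorted in descending LCS order, which the claim text does not specify. Whitespace splitting stands in for word segmentation. One segment is substituted per variant. The BERT-based scoring step is omitted.

```python
from typing import Dict, List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rank_candidates(entity: str, standards: List[str]) -> List[str]:
    """Order standard entities by LCS length with the entity.

    Descending order is an assumption made for this sketch.
    """
    return sorted(standards, key=lambda s: lcs_length(entity, s), reverse=True)


def partial_synonym_variants(entity: str, synonym_table: Dict[str, str]) -> List[str]:
    """Build extended entities by replacing one mapped segment at a time.

    Whitespace splitting is a stand-in for real word segmentation.
    """
    tokens = entity.split()
    variants = []
    for i, tok in enumerate(tokens):
        if tok in synonym_table:
            replaced = tokens[:i] + [synonym_table[tok]] + tokens[i + 1:]
            variants.append(" ".join(replaced))
    return variants


if __name__ == "__main__":
    standards = ["stroke", "heart attack acute", "heart failure"]
    print(rank_candidates("heart attack", standards))
    print(partial_synonym_variants("heart attack", {"heart": "cardiac"}))
```

In a fuller pipeline, each variant would be looked up in the standard entity database by string match, and the ranked candidate list would be passed to the semantic model in order.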
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title
Named entity recognition · CPC title
using natural language analysis · CPC title
Thesauruses; Synonyms · CPC title