Normalized processing method and apparatus of named entity, and electronic device

US11989518B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11989518-B2
Application number  US-202117506726-A
Country             US
Kind code           B2
Filing date         Oct 21, 2021
Priority date       Oct 22, 2020
Publication date    May 21, 2024
Grant date          May 21, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A normalized processing method of a named entity includes: obtaining first text data; recognizing a named entity from the first text data; determining whether a first standard named entity exists in a standard named entity database according to the named entity; determining the first standard named entity as a normalized representation of the named entity in response to determining that the first standard named entity exists in the standard named entity database; and obtaining a second standard named entity from the standard named entity database and determining an obtained second standard named entity as the normalized representation of the named entity in response to determining that the first standard named entity does not exist in the standard named entity database.
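The two-branch lookup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names (`expand_with_synonyms`, `normalize_entity`), the use of a Python set as the standard named entity database, the whitespace-based word splitting, and the sample medical terms are all assumptions made for the example. The string-match and synonym-substitution steps follow the definition of the first standard named entity given in the first claim on this page; the fallback is passed in as a callable so that any similarity method can be plugged in.

```python
from typing import Callable, Dict, List, Optional, Set


def expand_with_synonyms(entity: str, synonyms: Dict[str, str]) -> List[str]:
    """Build extended named entities by synonym substitution (hypothetical helper).

    A whole-entity substitution is tried first, then a word-by-word one.
    Splitting on whitespace is an assumption; real word segmentation is
    language-specific.
    """
    extended: List[str] = []
    if entity in synonyms:
        extended.append(synonyms[entity])
    words = entity.split()
    swapped = [synonyms.get(word, word) for word in words]
    if swapped != words:
        extended.append(" ".join(swapped))
    return extended


def normalize_entity(
    entity: str,
    standard_db: Set[str],
    synonyms: Dict[str, str],
    fallback: Callable[[str, Set[str]], Optional[str]],
) -> Optional[str]:
    """Return a normalized representation of `entity`, or None if nothing fits."""
    # Branch 1a: the entity's own character string matches a standard entity.
    if entity in standard_db:
        return entity
    # Branch 1b: an extended (synonym-substituted) entity matches.
    for candidate in expand_with_synonyms(entity, synonyms):
        if candidate in standard_db:
            return candidate
    # Branch 2: no first standard named entity exists, so defer to the
    # similarity-based lookup supplied by the caller.
    return fallback(entity, standard_db)


# Made-up sample data, for illustration only.
standard_db = {"myocardial infarction", "heart failure"}
synonyms = {"heart attack": "myocardial infarction", "cardiac": "heart"}


def no_fallback(entity: str, db: Set[str]) -> Optional[str]:
    return None


print(normalize_entity("heart attack", standard_db, synonyms, no_fallback))
print(normalize_entity("cardiac failure", standard_db, synonyms, no_fallback))
```

With the sample data above, the first call resolves through a whole-entity substitution and the second through a word-level substitution. Keeping the fallback as a parameter is a design choice of this sketch, so that the cheap string lookups can be tested separately from the similarity model.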

First claim

Opening claim text (preview).

What is claimed is:

1. A normalized processing method of a named entity, comprising: obtaining first text data; recognizing a named entity from the first text data; determining whether a first standard named entity exists in a standard named entity database according to the named entity, the first standard named entity being a standard named entity whose character string matches a character string of one of the named entity and an extended named entity, and the extended named entity being obtained by performing a synonym substitution on at least part of words of the named entity; determining the first standard named entity as a normalized representation of the named entity in response to determining that the first standard named entity exists in the standard named entity database; and obtaining a second standard named entity from the standard named entity database, and determining an obtained second standard named entity as the normalized representation of the named entity in response to determining that the first standard named entity does not exist in the standard named entity database, the second standard named entity being a standard named entity whose word vector similarity to the named entity in the standard named entity database satisfies a preset condition, wherein obtaining the second standard named entity from the standard named entity database, includes: determining a word vector similarity between each standard named entity in the standard named entity database and the named entity based on a word vector similarity matching algorithm; and determining the standard named entity whose word vector similarity to the named entity in the standard named entity database satisfies the preset condition as the second standard named entity, wherein determining the word vector similarity between each standard named entity in the standard named entity database and the named entity based on the word vector similarity matching algorithm, includes: calculating a length of a longest common subsequence of the named entity and each standard named entity in the standard named entity database; sequencing standard named entities in the standard named entity database to obtain a standard named entity candidate list according to lengths of the longest common subsequences; and sequentially inputting each standard named entity in the standard named entity candidate list and the named entity into a semantic model based on a word vector, so as to obtain the word vector similarity between the named entity and the standard named entity, wherein the semantic model based on the word vector includes a bi-directional encoder representation from transformers (BERT) model; and a fully connected layer of the BERT model is implemented by using a softmax classifier or a sigmoid classifier.

2. The normalized processing method according to claim 1, wherein recognizing the named entity from the first text data, includes: deleting a first text in the first text data to obtain second text data, the first text including at least one stop word and/or at least one designated symbol; and recognizing the named entity from the second text data.

3. The normalized processing method according to claim 2, wherein the second text data is a long text, and recognizing the named entity from the second text data, includes: using a first named entity recognition algorithm to recognize the named entity from the second text data, the first named entity recognition algorithm being a named entity recognition algorithm for the long text.

4. The normalized processing method according to claim 3, wherein before recognizing the named entity from the second text data, recognizing the named entity from the first text data, further includes: determining whether a text length of the second text data is greater than a preset text length threshold; using the second text data as the long text in response to determining that the text length of the second text data is greater than the preset text length threshold.

5. The normalized processing method according to claim 3, wherein the first named entity recognition algorithm includes a named entity recognition algorithm based on a bi-directional long-short term memory network (BiLSTM) and a conditional random field (CRF).

6. The normalized processing method according to claim 2, wherein the second text data is a short text, and recognizing the named entity from the second text data, includes: using a second named entity recognition algorithm to recognize the named entity from the second text data, the second named entity recognition algorithm being a named entity recognition algorithm for the short text.

7. The normalized processing method according to claim 6, wherein before recognizing the named entity from the second text data, recognizing the named entity from the first text data, further includes: determining whether a text length of the second text data is greater than a preset text length threshold; using the second text data as the short text in response to determining that the text length of the second text data is less than or equal to the preset text length threshold.

8. The normalized processing method according to claim 6, wherein the second named entity recognition algorithm includes a named entity recognition algorithm based on a regular expression.

9. The normalized processing method according to claim 1, wherein determining whether the first standard named entity exists in the standard named entity database according to the named entity, includes: searching for the standard named entity whose character string matches the character string of the named entity in the standard named entity database; and searching for the standard named entity whose character string matches the character string of the extended named entity in the standard named entity database in response to determining that the standard named entity whose character string matches the character string of the named entity is not found, wherein the found standard named entity whose character string matches the character string of the named entity or the extended named entity is used as the first standard named entity.

10. The normalized processing method according to claim 9, wherein the extended named entity is obtained by performing a complete synonym substitution on the named entity, and the complete synonym substitution is a synonym substitution on the named entity as a whole.

11. The normalized processing method according to claim 9, wherein the extended named entity is obtained by performing a partial synonym substitution on the named entity, and the partial synonym substitution is a synonym substitution on at least one named entity word segmentation obtained by performing a word segmentation processing on the named entity.

12. The normalized processing method according to claim 11, wherein performing the partial synonym substitution on the named entity, includes: performing the word segmentation processing on the named entity to obtain a plurality of named entity word segmentations; and traversing a partial synonym mapping table according to the plurality of named entity word segmentations, and substituting at least one traversed named entity word segmentation for a synonym to obtain the extended named entity.

13. The normalized processing method according to claim 1, wherein the preset condition is that the word vector similarity between the named entity and the standard named entity reaches a preset similarity threshold, or the preset condition is that the named entity and one standard named entity in the standard named entity database have a highest wo
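The candidate-ranking step in claim 1 (compute the longest common subsequence length against each standard named entity, order the candidates by that length, then score them one by one with a semantic model) might look something like the sketch below. It rests on several assumptions: the function names are hypothetical, `toy_similarity` is a plain character-overlap ratio from the standard library standing in for the BERT-based model named in the claim, and returning the highest-scoring candidate when no threshold is supplied is a choice made for this example rather than something taken from the truncated claim text.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Optional


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rank_candidates(entity: str, standard_entities: Iterable[str]) -> List[str]:
    """Order standard entities by descending LCS length with `entity`."""
    return sorted(
        standard_entities,
        key=lambda standard: lcs_length(entity, standard),
        reverse=True,
    )


def toy_similarity(entity: str, standard: str) -> float:
    """Placeholder scorer, NOT the BERT model from the claim."""
    return SequenceMatcher(None, entity, standard).ratio()


def pick_second_standard(
    entity: str,
    standard_entities: Iterable[str],
    similarity: Callable[[str, str], float] = toy_similarity,
    threshold: Optional[float] = None,
) -> Optional[str]:
    """Score ranked candidates in order and apply the preset condition.

    With a threshold, return the first candidate whose score reaches it
    (None if no candidate does). Without one, return the best-scoring
    candidate, which is an assumption of this sketch.
    """
    best: Optional[str] = None
    best_score = float("-inf")
    for candidate in rank_candidates(entity, standard_entities):
        score = similarity(entity, candidate)
        if threshold is not None and score >= threshold:
            return candidate
        if score > best_score:
            best, best_score = candidate, score
    return best if threshold is None else None


# Made-up sample data, for illustration only.
standards = ["hypertension", "hypotension", "asthma"]
print(rank_candidates("hypertention", standards))
print(pick_second_standard("hypertention", standards, threshold=0.8))
```

Ranking by LCS length first means the most string-similar candidates reach the scorer earliest, so a threshold-based condition can stop before the whole database has been scored. Whether the patented method stops early in this way is not stated in the text shown on this page.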

Assignees

Boe Technology Group Co Ltd

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Supervised learning · CPC title

  • G06F40/295 · Primary

    Named entity recognition · CPC title

  • using natural language analysis · CPC title

  • Thesauruses; Synonyms · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11989518B2 cover?
A normalized processing method of a named entity includes: obtaining first text data; recognizing a named entity from the first text data; determining whether a first standard named entity exists in a standard named entity database according to the named entity; determining the first standard named entity as a normalized representation of the named entity in response to determining that the fir…
Who is the assignee on this patent?
Boe Technology Group Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/295. Mapped technology areas include Physics.
When was this patent published?
Publication date May 21, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).