Method, apparatus, computer device and readable medium for knowledge hierarchical extraction of a text

US11514247B2 · US · B2

Patent metadata
Publication number: US-11514247-B2
Application number: US-201916713062-A
Country: US
Kind code: B2
Filing date: Dec 13, 2019
Priority date: May 31, 2019
Publication date: Nov 29, 2022
Grant date: Nov 29, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, an apparatus, a computer device and a readable medium for knowledge hierarchical extraction of a text are disclosed. The method comprises: performing word segmentation on a designated text to obtain a word list, the word list including at least one word arranged in a sequence in the designated text; analyzing part-of-speech of each word in the word list in the designated text, to obtain a part-of-speech list corresponding to the word list; predicting a SPO triple included in the designated text according to the word list, the part-of-speech list and a pre-trained knowledge hierarchical extraction model. By the technical solutions, the SPO triple included in any designated text, however loose its organization and structure, may be accurately extracted based on the pre-trained knowledge hierarchical extraction model. Compared to the prior art, the efficiency and accuracy of knowledge hierarchical extraction may be effectively improved.
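As a rough illustration of the inputs and output the abstract describes, the sketch below builds a word list, an index-aligned part-of-speech list, and a record for a subject-predicate-object (SPO) triple. The names `segment`, `tag_pos`, `SPOTriple`, the toy lexicon and its tag labels are hypothetical and do not come from the patent. Whitespace splitting stands in for a real word segmenter, and the pre-trained extraction model is not reproduced: the triple here is written by hand.

```python
from dataclasses import dataclass

# Toy part-of-speech lexicon. The entries and tag names are assumptions
# made for this example only.
TOY_LEXICON = {
    "Paris": "NOUN", "is": "VERB", "the": "DET",
    "capital": "NOUN", "of": "ADP", "France": "NOUN",
}


@dataclass(frozen=True)
class SPOTriple:
    """A subject-predicate-object triple (hypothetical record type)."""
    subject: str
    predicate: str
    obj: str


def segment(text):
    """Stand-in word segmentation: split on whitespace, keeping text order."""
    return text.split()


def tag_pos(words):
    """Return one tag per word so both lists stay aligned by index."""
    return [TOY_LEXICON.get(word, "X") for word in words]


words = segment("Paris is the capital of France")
pos_tags = tag_pos(words)

# Hand-written example of the kind of triple a trained model would predict
# from (words, pos_tags); no model is run here.
example_triple = SPOTriple(subject="France", predicate="capital", obj="Paris")
```

Keeping the two lists the same length, with position `i` of the part-of-speech list describing word `i`, is what lets a downstream model combine the two inputs token by token.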

First claim

Opening claim text (preview).

What is claimed is:

1. A method for knowledge hierarchical extraction of a text, comprising: performing word segmentation on a designated text to obtain a word list, the word list including at least one word arranged in a sequence in the designated text; analyzing part-of-speech of each word in the word list in the designated text, to obtain a part-of-speech list corresponding to the word list; predicting a SPO triple included in the designated text according to the word list, the part-of-speech list and a pre-trained knowledge hierarchical extraction model, comprising: inputting the word list and the part-of-speech list into the knowledge hierarchical extraction model; obtaining, by an embedded layer, a word embedding expression based on the word list and a pre-trained word vector list; obtaining a part-of-speech embedding expression based on the part-of-speech list and a pre-trained part-of-speech vector list; obtaining, by a pre-trained Stacked Recurrent Neural Network layer, a bottom layer embedding expression which is of the designated text and carries context information, based on the word embedding expression and the part-of-speech embedding expression; and through two pre-trained fully-connected layers in turn, predicting a prediction relationship which is included in the designated text and whose prediction probability is greater than a preset probability threshold; further inputting the bottom layer embedding expression, the prediction probability of the prediction relationship and a feature expression corresponding to the prediction relationship into a pre-trained conditional random field network layer for sequence marking, so as to obtain an subject and an object corresponding to the prediction relationship; and outputting the SPO triple consisting of the subject, the object and the prediction relationship.

2. The method according to claim 1, further comprising: judging, according to a preset parameter set, whether the SPO triple predicted complies with a SPO triple structure preset in the parameter set, the parameter set comprising at least one preset SPO triple structure, each preset SPO triple structure comprising content of a relationship, and types of a subject and an object; if the SPO triple predicted complies with a SPO triple structure preset in the parameter set, determining that the SPO triple predicted is a target SPO triple of the designated text; otherwise, if the SPO triple predicted does not comply with a SPO triple structure preset in the parameter set, deleting the SPO triple.

3. The method according to claim 1, further comprising: before predicting the SPO triple included in the designated text according to the word list, the part-of-speech list and the pre-trained knowledge hierarchical extraction model, collecting a plurality of training texts and a known SPO triple included in each training text; training the knowledge hierarchical extraction model with the plurality of training texts and the known SPO triple included in each training text.

4. The method according to claim 3, wherein the training the knowledge hierarchical extraction model with the plurality of training texts and the known SPO triple included in each training texts comprises: performing word segmentation for each training text to obtain a training word list; the training word list including at least one training word arranged in a sequence in the training text; analyzing a part-of-speech of each training word in the training word list in the training text to obtain a training part-of-speech list corresponding to the training word list; training the knowledge hierarchical extraction model according to the training word list and the training part-of-speech list of each training text and known SPO triple in each training text.

5. The method according to claim 4, wherein the training the knowledge hierarchical extraction model according to the training word list and the training part-of-speech list of each training text and the known SPO triple in each training text comprises: initializing the word vector list, the part-of-speech vector list, parameters of the Stacked Recurrent Neural Network layer, parameters of the fully-connected layers and parameters of the conditional random field network layer in the knowledge hierarchical extraction model; inputting the training word list, the training part-of-speech list and the known SPO triple of each training text into the knowledge hierarchical extraction model, to obtain a predicted SPO triple output by the knowledge hierarchical extraction model; calculating a value of a loss function according to the known SPO triple and the predicted SPO triple; judging whether the value of the loss function is greater than or equal to a preset threshold; if the value of the loss function is greater than or equal to a preset threshold, adjusting the word vector list, the part-of-speech vector list, the parameters of the Stacked Recurrent Neural Network layer, the parameters of the fully-connected layers and the parameters of the conditional random field network layer in the knowledge hierarchical extraction model to make the value of the loss function smaller than the preset threshold; repeating the above steps, and constantly training the knowledge hierarchical extraction model with the training word list, the training part-of-speech list and the known SPO triple of each of the plurality of training texts in the above manner; if training times reach a preset training time threshold, or the value of the loss function is always smaller than a preset threshold within a range of consecutive preset times, determining the word vector list, the part-of-speech vector list, the parameters of the Stacked Recurrent Neural Network layer, the parameters of the fully-connected layers and the parameters of the conditional random field network layer in the knowledge hierarchical extraction model, and thereby determining the knowledge hierarchical extraction model.

6. The method according to claim 1, wherein the pre-trained Stacked Recurrent Neural Network layer includes a plurality of LSTM units, allowing each layer of LSTM units to learn an output sequence of a previous layer in an alternate forward and backward sequence respectively.

7. A computer device, comprising: one or more processors, a memory for storing one or more programs, the one or more programs, when executed by said one or more processors, enable said one or more processors to implement a method for knowledge hierarchical extraction of a text, which comprises: performing word segmentation on a designated text to obtain a word list, the word list including at least one word arranged in a sequence in the designated text; analyzing part-of-speech of each word in the word list in the designated text, to obtain a part-of-speech list corresponding to the word list; predicting a SPO triple included in the designated text according to the word list, the part-of-speech list and a pre-trained knowledge hierarchical extraction model, comprising: inputting the word list and the part-of-speech list into the knowledge hierarchical extraction model; obtaining, by an embedded layer, a word embedding expression based on the word list and a pre-trained word vector list; obtaining a part-of-speech embedding expression based on the part-of-speech list and a pre-trained part-of-speech vector list; obtaining, by a pre-trained Stacked Recurrent Neural Network layer, a bottom layer embedding expression which is of the designated text and carries context information, based on the word embedding expression and the part-of-speech embedding expression; and through two pre-trained fully-connected layers in turn, predicting a prediction relationship which is included in the designated text and whose predict
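Two of the claimed steps are simple enough to sketch without a trained model: keeping only the relationships whose predicted probability is greater than a preset threshold (claim 1), and checking a predicted triple against a preset set of SPO structures, each giving the relationship and the subject and object types (claim 2). The code below is a minimal sketch under stated assumptions, not the patented implementation. The names `TypedTriple`, `select_relations`, `filter_by_schema` and `ALLOWED_STRUCTURES`, the probability values, the type labels and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedTriple:
    """A predicted SPO triple with subject/object types (hypothetical)."""
    subject: str
    subject_type: str
    relation: str
    obj: str
    object_type: str


def select_relations(probabilities, threshold):
    """Keep relations whose probability is strictly greater than threshold."""
    return [rel for rel, prob in probabilities.items() if prob > threshold]


def filter_by_schema(triples, allowed_structures):
    """Keep triples matching a preset (relation, subject type, object type).

    Triples that match no preset structure are dropped, mirroring the
    "deleting the SPO triple" branch of claim 2.
    """
    return [
        t for t in triples
        if (t.relation, t.subject_type, t.object_type) in allowed_structures
    ]


# Assumed preset parameter set: one allowed structure.
ALLOWED_STRUCTURES = {("capital", "Country", "City")}

# Assumed per-relation probabilities, as a classifier head might output.
relation_probs = {"capital": 0.91, "author": 0.07}
relations = select_relations(relation_probs, threshold=0.5)

candidates = [
    TypedTriple("France", "Country", "capital", "Paris", "City"),
    TypedTriple("France", "Country", "capital", "1789", "Date"),
]
kept = filter_by_schema(candidates, ALLOWED_STRUCTURES)
```

Treating relation prediction as a per-relation threshold rather than a single best label allows one text to yield several relationships, each of which can then be paired with its own subject and object.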

Assignees

Inventors

Classifications

  • G06F40/205 · Primary

    Parsing · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • based on specific statistical tests · CPC title

  • characterised by the process organisation or structure, e.g. boosting cascade · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11514247B2 cover?
A method, an apparatus, a computer device and a readable medium for knowledge hierarchical extraction of a text are disclosed. The method comprises: performing word segmentation on a designated text to obtain a word list, the word list including at least one word arranged in a sequence in the designated text; analyzing part-of-speech of each word in the word list in the designated text, to obta…
Who is the assignee on this patent?
Baidu Online Network Technology (Beijing) Co., Ltd.
What technology area does this patent fall under?
Primary CPC classification G06F40/205. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 29, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).