Information extraction method, extraction model training method, apparatus and electronic device

US12079580B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12079580-B2
Application number: US-202117348306-A
Country: US
Kind code: B2
Filing date: Jun 15, 2021
Priority date: Nov 30, 2020
Publication date: Sep 3, 2024
Grant date: Sep 3, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An information extraction method, an extraction model training method, an apparatus and an electronic device all relate to knowledge graphs. A specific implementation includes acquiring an input text and determining a semantic vector of the input text according to the input text. Such implementation also includes inputting the semantic vector of the input text to a pre-acquired extraction model to obtain a first enhanced text of the input text. The first enhanced text is a text with a text score greater than a preset threshold output by the extraction model. The extraction model performs text extraction based on the semantic vector of the input text. Since the semantic vector has rich context semantics, the enhanced text extracted by the extraction model can be more in line with the context of the input text.
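The thresholding step in the abstract can be illustrated with a minimal sketch. It assumes the extraction model has already produced candidate texts paired with scores; the function name `select_enhanced_texts`, the candidate list, and the threshold value are all hypothetical and are not taken from the patent.

```python
from typing import List, Tuple


def select_enhanced_texts(
    scored_candidates: List[Tuple[str, float]], threshold: float
) -> List[str]:
    """Keep candidate texts whose score is strictly greater than the threshold.

    The (text, score) pairs stand in for the output of an extraction model;
    how the model computes the scores is outside this sketch.
    """
    return [text for text, score in scored_candidates if score > threshold]


# Hypothetical model output: candidate texts with made-up scores.
candidates = [
    ("knowledge graph", 0.91),
    ("the", 0.12),
    ("extraction model", 0.50),
]

# A score equal to the threshold is not kept, because the abstract
# says "greater than" the preset threshold.
enhanced = select_enhanced_texts(candidates, threshold=0.5)
print(enhanced)  # ['knowledge graph']
```

The strict comparison mirrors the abstract's wording; whether an actual implementation treats a score equal to the threshold as passing is not stated in the source.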

First claim

Opening claim text (preview).

What is claimed is:

1. An information extraction method, comprising: acquiring an input text; determining a semantic vector of the input text according to the input text; and inputting the semantic vector of the input text to a pre-acquired extraction model to obtain a first enhanced text of the input text; wherein the method further comprises: after inputting the semantic vector of the input text to the pre-acquired extraction model to obtain the first enhanced text of the input text; performing boundary correction on the first enhanced text according to the input text to obtain a target enhanced text; wherein determining the semantic vector of the input text according to the input text comprises: performing identification conversion on each word in the input text to obtain an identification sequence of identifications which correspond to words in a one-to-one manner; and inputting the identification sequence to a bidirectional encoder representation model from transformers to obtain the semantic vector of the input text.

2. The method according to claim 1, wherein performing boundary correction on the first enhanced text according to the input text to obtain the target enhanced text comprises: performing word segmentation on the input text to obtain a word segmentation result; and performing boundary correction on first and last positions of the first enhanced text according to the word segmentation result to obtain the target enhanced text.

3. The method according to claim 2, wherein performing boundary correction on the first and last positions of the first enhanced text according to the word segmentation result to obtain the target enhanced text comprises: in a case that the first or last position of the first enhanced text does not match the word segmentation result, supplementing the first or last position of the first enhanced text according to the word segmentation result to obtain the target enhanced text.

4. An electronic device, comprising at least one processor; and a memory communicatively connected with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to implement: acquiring an input text; determining a semantic vector of the input text according to the input text; inputting the semantic vector of the input text to a pre-acquired extraction model to obtain a first enhanced text of the input text; wherein the instructions are executed by the at least one processor to cause the at least one processor to implement: performing boundary correction on the first enhanced text according to the input text to obtain a target enhanced text; wherein the instructions are executed by the at least one processor to cause the at least one processor to implement: performing identification conversion on each word in the input text to obtain an identification sequence of identifications which correspond to words in a one-to-one manner; and inputting the identification sequence to a bidirectional encoder representation model from transformers to obtain the semantic vector of the input text.

5. The electronic device according to claim 4, wherein the instructions are executed by the at least one processor to cause the at least one processor to implement: performing word segmentation on the input text to obtain a word segmentation result; and performing boundary correction on first and last positions of the first enhanced text according to the word segmentation result to obtain the target enhanced text.

6. The electronic device according to claim 5, wherein, the instructions are executed by the at least one processor to cause the at least one processor to implement: in a case that the first or last position of the first enhanced text does not match the word segmentation result, supplementing the first or last position of the first enhanced text according to the word segmentation result to obtain the target enhanced text.

7. A non-transitory computer readable storage medium storing computer instructions, wherein the instructions are configured to cause a computer to implement the method according to claim 1.
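Two steps in the claims lend themselves to a short illustration: the identification conversion in claim 1 and the boundary correction in claims 2 and 3. The sketch below makes several assumptions. The vocabulary, the unknown-word ID, the pre-segmented word list, and the function names `to_id_sequence` and `correct_boundaries` are all hypothetical. It also assumes the segmented words concatenate exactly to the input text. The bidirectional encoder step and the extraction model itself are not shown.

```python
from typing import Dict, List


def to_id_sequence(words: List[str], vocab: Dict[str, int], unk_id: int) -> List[int]:
    """Map each word to one ID, so IDs correspond to words one-to-one.

    Words missing from the (hypothetical) vocabulary get unk_id.
    """
    return [vocab.get(word, unk_id) for word in words]


def correct_boundaries(text: str, words: List[str], start: int, end: int) -> str:
    """Expand the extracted span text[start:end] to whole segmented words.

    If the first or last position falls inside a word of the segmentation
    result, the span is supplemented so that it covers that word entirely.
    """
    if "".join(words) != text:
        raise ValueError("segmented words must concatenate to the input text")
    if not 0 <= start < end <= len(text):
        raise ValueError("span must be a non-empty range inside the text")

    pos = 0
    for word in words:
        word_start, word_end = pos, pos + len(word)
        if word_start <= start < word_end:
            start = word_start  # supplement the first position
        if word_start < end <= word_end:
            end = word_end  # supplement the last position
        pos = word_end
    return text[start:end]


# Hypothetical input and segmentation result (spaces kept as their own tokens).
text = "Baidu builds knowledge graphs"
words = ["Baidu", " ", "builds", " ", "knowledge", " ", "graphs"]

# Suppose the model returned the misaligned span "nowledge graph".
print(text[14:28])                                 # nowledge graph
print(correct_boundaries(text, words, 14, 28))     # knowledge graphs
print(to_id_sequence(["knowledge", "graphs", "zzz"], {"knowledge": 7, "graphs": 8}, 0))  # [7, 8, 0]
```

This version only ever widens the span. Claim 3 describes supplementing a mismatched first or last position and does not describe trimming, so the sketch leaves trimming out.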

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • Supervised learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Partitioning the feature space · CPC title

  • Matching criteria, e.g. proximity measures · CPC title

  • Learning methods · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12079580B2 cover?
An information extraction method, an extraction model training method, an apparatus and an electronic device all relate to knowledge graphs. A specific implementation includes acquiring an input text and determining a semantic vector of the input text according to the input text. Such implementation also includes inputting the semantic vector of the input text to a pre-acquired extraction model…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 3, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).