System and method for quality evaluation of collaborative text inputs
US-10482176-B2 · Nov 19, 2019 · US
US11615242B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11615242-B2 |
| Application number | US-202016940703-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 28, 2020 |
| Priority date | Dec 20, 2019 |
| Publication date | Mar 28, 2023 |
| Grant date | Mar 28, 2023 |
Abstract
A method and an apparatus for structuring data relate to information processing technologies in the field of natural language processing. An unstructured text is acquired and input into an encoder-decoder model to obtain an output sequence. The encoder-decoder model is trained using a training text marked with the attribute value of each attribute. A structured representation is generated based on the attributes corresponding to the attribute elements included in the output sequence and the attribute values contained in those attribute elements.
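The abstract's final step builds a structured representation from the attribute elements of the output sequence. The short Python sketch below illustrates that step and is not the patented implementation. It rests on these assumptions:

- The output sequence is a JSON array of objects. The claims speak only of a "data exchange format".
- The hypothetical `ATTRIBUTE_ORDER` list defines the fixed attribute order.
- A "text position" (claim 4) is a single integer index into the segmented word elements.
- `to_structured` is a hypothetical helper name.

```python
import json
from typing import Any

# Hypothetical attribute schema; the patent does not name concrete attributes.
ATTRIBUTE_ORDER = ["name", "age", "gender"]


def to_structured(output_sequence: str, word_elements: list[str]) -> list[dict[str, Any]]:
    """Build one record per decoded object (assumed JSON output sequence)."""
    objects = json.loads(output_sequence)  # assumed: a JSON array of objects
    records = []
    for obj in objects:
        record: dict[str, Any] = {}
        # Read attribute elements in the fixed schema order (cf. claim 2).
        for attr in ATTRIBUTE_ORDER:
            value = obj.get(attr)
            # Assumed encoding of a text position: a plain integer index.
            # Replace it with the word element at that position (cf. claim 4).
            if isinstance(value, int) and not isinstance(value, bool):
                value = word_elements[value]
            record[attr] = value
        records.append(record)
    # The list of per-object records stands in for the structured
    # representation of the whole text (cf. claim 3).
    return records
```

Consider the word elements `["Alice", "is", "a", "34", "year", "old", "female", "engineer"]` and the output sequence `[{"name": 0, "age": 3, "gender": "female"}]`. For these inputs the sketch returns `[{"name": "Alice", "age": "34", "gender": "female"}]`. The `name` and `age` attributes have unlimited value ranges, so they are resolved from text positions. The `gender` attribute has a limited value range, so the decoder already emitted it as actual text.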
Claims (preview)
What is claimed is:
1. A method for structuring data, comprising: acquiring an unstructured text; inputting the unstructured text into an encoder-decoder model to obtain an output sequence, wherein the output sequence comprises a plurality of attribute elements, each attribute element corresponds to a respective attribute, and each attribute element comprises an attribute value of the respective attribute, wherein the encoder-decoder model is trained using a training text marked with the attribute value of each attribute; and generating a structured representation based on the attributes corresponding to the attribute elements comprised in the output sequence and the attribute values comprised in the attribute elements, wherein the encoder-decoder model comprises an encoder and a decoder, and inputting the unstructured text into the encoder-decoder model to obtain the output sequence comprises: performing a word segmentation on the unstructured text to obtain a plurality of word elements; sorting the plurality of word elements in order, to obtain an input sequence; inputting the word elements of the input sequence into the encoder to semantically encode the word elements to obtain a hidden state vector of each word element, wherein the hidden state vector indicates semantics of the respective word element and a context thereof; and decoding each hidden state vector by the decoder to obtain the attribute values of the output sequence, wherein the decoder has learned an attention weight of each hidden state vector with respect to each attribute value and a mapping relation between the hidden state vector that is weighted by the attention weight and the attribute value.
2. The method of claim 1, wherein the output sequence is in a data exchange format, the output sequence in the data exchange format comprises at least one object, and each object comprises a plurality of attribute elements, wherein before inputting the unstructured text into the encoder-decoder model to obtain the output sequence, the method further comprises: acquiring a plurality of training texts, wherein each training text has marked information in the data exchange format, the marked information comprises at least one object corresponding to an entity described by the training text, and each object comprises the attribute value of the attribute for describing the entity, wherein an order of the attribute values of the attributes in the object is the same as an order of the attribute elements of the attributes in the output sequence; and training the encoder-decoder model by adopting the plurality of training texts to minimize an error between the output sequence of the encoder-decoder model and the marked information.
3. The method of claim 2, wherein generating the structured representation based on the attributes corresponding to the attribute elements comprised in the output sequence and the attribute values comprised in the attribute elements comprises: for each object, extracting attribute elements belonging to the object from the output sequence in the data exchange format; generating the structured representation of the object based on the attribute value of each attribute comprised in the attribute elements extracted; and generating the structured representation of the unstructured text based on the structured representation of each object.
4. The method of claim 2, wherein the attribute value of each attribute is one of a text position and an actual text, the attribute value is determined based on a value range of the attribute, and in cases that the value range is limited, the attribute value is the actual text, and in cases that the value range is unlimited, the attribute value is the text position, wherein before generating the structured representation, the method further comprises: for each attribute element, in cases that the attribute value is the text position, updating the attribute value to the word element at the text position in the unstructured text.
5. The method of claim 1, wherein sorting the plurality of word elements in order, to obtain the input sequence comprises: inputting the plurality of word elements into an entity recognition model, to obtain an entity label of each word element; and splicing each word element with a respective entity label as a word element of the input sequence.
6. A computer device, comprising: at least one processor; and a memory, communicatively coupled to the at least one processor, wherein the memory has instructions executable by the at least one processor stored therein, when the instructions are executed by the at least one processor, wherein the at least one processor is configured to: acquire an unstructured text; input the unstructured text into an encoder-decoder model to obtain an output sequence, wherein the output sequence comprises a plurality of attribute elements, each attribute element corresponds to a respective attribute, and each attribute element comprises an attribute value of the respective attribute, wherein the encoder-decoder model is trained using a training text marked with the attribute value of each attribute; and generate a structured representation based on the attributes corresponding to the attribute elements comprised in the output sequence and the attribute values comprised in the attribute elements, wherein the encoder-decoder model comprises an encoder and a decoder, and the at least one processor is further configured to: perform a word segmentation on the unstructured text to obtain a plurality of word elements; sort the plurality of word elements in order, to obtain an input sequence; input the word elements of the input sequence into the encoder to semantically encode the word elements to obtain a hidden state vector of each word element, wherein the hidden state vector indicates semantics of the respective word element and a context thereof; and decode each hidden state vector by adopting the decoder to obtain the attribute values of the output sequence, wherein the decoder has learned an attention weight of each hidden state vector with respect to each attribute value and a mapping relation between the hidden state vector that is weighted by the attention weight and the attribute value.
7. The computer device of claim 6, wherein the output sequence is in a data exchange format, the output sequence in the data exchange format comprises at least one object, and each object comprises a plurality of attribute elements, wherein the at least one processor is further configured to: acquire a plurality of training texts, wherein each training text has marked information in the data exchange format, the marked information comprises at least one object corresponding to an entity described by the training text, and each object comprises the attribute value of the attribute for describing the entity, wherein an order of the attribute values of the attributes of the object is the same as an order of the attribute elements of the attributes in the output sequence; and train the encoder-decoder model by adopting the plurality of training texts to minimize an error between the output sequence of the encoder-decoder model and the marked information.
8. The computer device of claim 7, wherein the at least one processor is further configured to: for each object, extract attribute elements belonging to the object from the output sequence in the data exchange format; generate the structured representation of the object based on the attribute value of each attribute comprised in the attribute elements extracted; and generate the structured representation of the unstructured text based on the structured representation of each object.
9. The comput
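The input side of claim 1 and the decoder's attention step can be sketched in plain Python, including the entity-label splicing of claim 5. This is an illustrative sketch, not the patented model, and it makes these assumptions:

- Whitespace splitting stands in for word segmentation.
- The `/` separator used by the hypothetical `splice_entity_labels` helper is an arbitrary choice.
- The claims do not say how attention weights are computed. The hypothetical `attend` function therefore uses plain dot-product scoring followed by a softmax.
- In a trained model, the hidden state vectors would come from the encoder and the query from the decoder state for the current attribute value. Here both are toy numbers.

```python
import math


def splice_entity_labels(word_elements: list[str], entity_labels: list[str]) -> list[str]:
    """Join each word element with its entity label (cf. claim 5).

    The "/" separator is a hypothetical choice; the patent does not specify one.
    """
    return [f"{word}/{label}" for word, label in zip(word_elements, entity_labels)]


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: list[float], hidden_states: list[list[float]]) -> tuple[list[float], list[float]]:
    """Dot-product attention over encoder hidden state vectors (assumed scoring).

    Returns the attention weight of each hidden state and the weighted
    (context) vector that a decoder would map to the next attribute value.
    """
    # Score each hidden state by its dot product with the decoder query.
    scores = [sum(q * h for q, h in zip(query, state)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of the hidden states, one dimension at a time.
    context = [sum(w * state[d] for w, state in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context
```

Two examples show the behavior:

- `splice_entity_labels("Alice is 34".split(), ["PER", "O", "NUM"])` yields `["Alice/PER", "is/O", "34/NUM"]`.
- `attend([5.0, 0.0], [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])` puts roughly 0.987 of the weight on the first hidden state, so that state dominates the context vector.

The claimed "mapping relation" from the weighted hidden states to an attribute value would be a learned output layer, which this sketch omits.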
Convolutional networks [CNN, ConvNet] · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title