System and method for automatically expanding input text

US10402494B2 · US · B2

Patent metadata
Publication number: US-10402494-B2
Application number: US-201715439416-A
Country: US
Kind code: B2
Filing date: Feb 22, 2017
Priority date: Dec 6, 2016
Publication date: Sep 3, 2019
Grant date: Sep 3, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided is a method of automatically expanding input text. The method includes receiving input text composed of a plurality of documents, extracting a sentence pair that is present in different documents among the plurality of documents, setting the extracted sentence pair as an input of an encoder of a sequence-to-sequence model, setting an output of the encoder as an output of a decoder of the sequence-to-sequence model and generating a sentence corresponding to the input, and generating expanded text based on the generated sentence.
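The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `extract_cross_document_pairs` and `expand_text` are hypothetical, and the `generate` callable is a placeholder for a trained sequence-to-sequence model, which the sketch does not include.

```python
import itertools
import random
from typing import Callable, List, Tuple

Document = List[str]      # assumption: a document is a list of sentences
Pair = Tuple[str, str]    # two sentences taken from different documents


def extract_cross_document_pairs(docs: List[Document], k: int, seed: int = 0) -> List[Pair]:
    """Randomly sample up to k sentence pairs whose members come from different documents."""
    rng = random.Random(seed)
    candidates = [
        (s1, s2)
        for d1, d2 in itertools.combinations(docs, 2)  # every pair of distinct documents
        for s1 in d1
        for s2 in d2
    ]
    return rng.sample(candidates, min(k, len(candidates)))


def expand_text(
    docs: List[Document],
    generate: Callable[[Pair], str],  # hypothetical stand-in for encoder + decoder
    k: int = 4,
    seed: int = 0,
) -> List[str]:
    """Return the original sentences followed by one generated sentence per sampled pair."""
    pairs = extract_cross_document_pairs(docs, k, seed)
    generated = [generate(pair) for pair in pairs]
    original = [sentence for doc in docs for sentence in doc]
    return original + generated
```

In use, `generate` would wrap whatever model produces a sentence from the pair; passing a trivial function such as `lambda p: p[0] + " " + p[1]` is enough to exercise the control flow.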

First claim

Opening claim text (preview).

What is claimed is:

1. A method of automatically expanding input text, the method comprising: receiving input text composed of a plurality of documents; extracting a sentence pair that is present in different documents among the plurality of documents; setting the extracted sentence pair as an input of an encoder of a sequence-to-sequence model; setting an output of the encoder as an output of a decoder of the sequence-to-sequence model and generating a sentence corresponding to the input; and generating expanded text based on the generated sentence, wherein the extracting of a sentence pair that is present in different documents among the plurality of documents comprises: randomly extracting a plurality of sentence pairs that are present in different documents among the plurality of documents; and sorting the plurality of sentence pairs based on a distance between words of two sentences constituting each of the plurality of sentence pairs.

2. The method of claim 1, further comprising: receiving learning text including a plurality of documents composed of a plurality of sentences; extracting a sentence chain from the learning text; and generating the sequence-to-sequence model, based on the extracted sentence chain, wherein the sentence chain is composed of a set including three sentences related to one document.

3. The method of claim 2, wherein the extracting of a sentence chain from the learning text comprises extracting the sentence chain from each of the plurality of documents included in the learning text in sequence.

4. The method of claim 3, wherein the extracting of a sentence chain from the learning text comprises: extracting a word chain from one sentence included in one of the plurality of documents; storing the extracted word chain in a word chain candidate list; determining whether the word chain is extracted from all words included in the one sentence; extracting at least one sentence chain corresponding to the word chain included in the word chain candidate list and storing the extracted sentence chain in a sentence chain list when the word chain is extracted from all of the words; determining whether the sentence chain is extracted from all sentences included in the one document; and outputting a sentence chain list corresponding to the document when the sentence chain is extracted from all of the sentences.

5. The method of claim 4, wherein: the words included in the sentence are set as a vector value through word embedding; and positions of the words on a vector space are determined based on similarity between contexts thereof.

6. The method of claim 5, wherein the extracting of a word chain comprises: detecting a word positioned closest to any one of the words included in the sentence on the vector space to generate a partial word chain; and detecting a word positioned closest to the partial word chain on the vector space to extract the word chain.

7. The method of claim 4, wherein the extracting and storing of at least one sentence chain in a sentence chain list comprises sorting the sentence chain to correspond to a word chain selected from among word chains included in the word chain candidate list based on a priority that is based on a predetermined determination criterion and storing the sorted sentence chain in the sentence chain list.

8. The method of claim 2, wherein the generating of the sequence-to-sequence model comprises: selecting a sentence chain for any one of the plurality of documents included in the learning text; setting two of the three related sentences included in the sentence chain as the input of the encoder of the sequence-to-sequence model and setting the remaining sentence as the output of the decoder; and learning the sentences set as the input and output to generate the sequence-to-sequence model.

9. The method of claim 1, wherein the extracting of a sentence pair that is present in different documents among the plurality of documents comprises: sorting the plurality of sentence pairs based on a distance between words of two sentences constituting each of the plurality of sentence pairs.

10. The method of claim 1, wherein the setting of the extracted sentence pair as an input of an encoder of a sequence-to-sequence model comprises setting the sorted sentence pairs as the input of the encoder in sorting order.

11. The method of claim 10, wherein the generating of expanded text based on the generated sentence comprises: storing the generated sentence in a text expansion candidate list; filtering the text expansion candidate list based on similarity to a pre-generated language model; and generating the filtered text expansion candidate list as the expanded text.

12. The method of claim 11, further comprising shuffling the plurality of sentences included in the input text, wherein after the generated sentence is stored in the text expansion candidate list and then included in the plurality of sentences, the shuffling is performed on the plurality of sentences.

13. The method of claim 2, wherein the extracting of a sentence pair that is present in different documents among the plurality of documents comprises: embedding the extracted sentence pair based on a pre-learned recurrent neural network language model and expressing the embedded sentence pair as a vector; and learning a 1-hop model configured to use a sentence chain including two sentences constituting the sentence pair and a counter sentence chain corresponding to the sentence chain to classify the sentence chain and the counter sentence chain, wherein the 1-hop model configures the embedded sentence through layer P 1 and layer H 1 as a deep neural network model and obtains a resultant value to determine the sentence chain through layer O 1 .

14. The method of claim 13, wherein the extracting of a sentence pair that is present in different documents among the plurality of documents comprises: randomly extracting a plurality of sentence pairs that are present in different documents among the plurality of documents; and sorting the plurality of sentence pairs based on the 1-hop model.

15. The method of claim 13, further comprising learning a 2-hop model configured to use a sentence chain including the three related sentences and a counter sentence chain corresponding to the sentence chain to classify the sentence chain and the counter sentence chain, wherein the 2-hop model has an output value of the layer H 1 of the 1-hop model and an embedded sentence that is not included in the sentence pair as an input of layer P 2 and obtains a resultant value to determine the sentence chain through layer H 2 and layer O 2 .

16. The method of claim 13, wherein the recurrent neural network language model is any one of a Long Short-Term Memory (LSTM) and a Gated Recurrent Unit (GRU).

17. A text expansion system configured to automatically expand input text, the text expansion system comprising: a communication module configured to transmit and receive data to and from an external device; a memory configured to store a program for generating expanded text from the input text; and a processor configured to execute the program stored in the memory, wherein, by executing the program, the processor extracts a sentence pair that is present in different documents among a plurality of documents when input text composed of the plurality of documents is received, inputs the extracted sentence pair to an encoder of a sequence-to-sequence model, generates a sentence corresponding to the input as an output of a decoder of
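The pair-sorting step recited in claim 1 (sorting sentence pairs "based on a distance between words of two sentences") could look something like the sketch below. The claim text shown here does not specify how word distances are aggregated into a pair score, so the choices in this example are assumptions: words are looked up in a toy embedding dictionary, distance is Euclidean, and a pair is scored by the smallest distance between any word of the first sentence and any word of the second. The names `pair_distance` and `sort_pairs` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[str, str]


def pair_distance(pair: Pair, embeddings: Dict[str, Vector]) -> float:
    """Score a pair by the closest cross-sentence word distance (assumed aggregation)."""
    first, second = pair
    words_a = [w for w in first.lower().split() if w in embeddings]
    words_b = [w for w in second.lower().split() if w in embeddings]
    if not words_a or not words_b:
        return math.inf  # no embedded words to compare: rank the pair last
    return min(
        math.dist(embeddings[a], embeddings[b])  # Euclidean distance, Python 3.8+
        for a in words_a
        for b in words_b
    )


def sort_pairs(pairs: List[Pair], embeddings: Dict[str, Vector]) -> List[Pair]:
    """Order pairs from smallest to largest word distance."""
    return sorted(pairs, key=lambda p: pair_distance(p, embeddings))


# Toy 2-D embeddings, invented purely for illustration.
toy_embeddings = {
    "cat": (0.0, 0.0),
    "dog": (0.0, 1.0),
    "car": (5.0, 5.0),
    "road": (5.0, 6.0),
    "sky": (10.0, 0.0),
}
toy_pairs = [("cat sleeps", "car stops"), ("cat sleeps", "dog barks"), ("sky", "road")]
ranked = sort_pairs(toy_pairs, toy_embeddings)
```

Under claim 10, pairs ordered this way would then be fed to the encoder in sorting order; a real system would use learned word embeddings rather than the hand-written vectors above.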

Assignees

Electronics & Telecommunications Res Inst

Inventors

Classifications

  • G06F40/56 · Primary

    Natural language generation · CPC title

  • G06F40/289 · Primary

    Phrasal analysis, e.g. finite state techniques or chunking · CPC title

  • Physics · mapped topic

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10402494B2 cover?
Provided is a method of automatically expanding input text. The method includes receiving input text composed of a plurality of documents, extracting a sentence pair that is present in different documents among the plurality of documents, setting the extracted sentence pair as an input of an encoder of a sequence-to-sequence model, setting an output of the encoder as an output of a decoder of t…
Who is the assignee on this patent?
Electronics & Telecommunications Res Inst
What technology area does this patent fall under?
Primary CPC classification G06F40/56. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 3, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).