Deep embedding for natural language content based on semantic dependencies

US10380259B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10380259-B2
Application number: US-201715601016-A
Country: US
Kind code: B2
Filing date: May 22, 2017
Priority date: May 22, 2017
Publication date: Aug 13, 2019
Grant date: Aug 13, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Mechanisms are provided to perform embedding of content of a natural language document. The mechanisms receive a document data object of an electronic document and analyze a structure of the electronic document to identify one or more structural document elements that have a relationship with the document data object. A dependency data structure is generated, representing the electronic document, where edges define relationships between document elements and at least one edge represents at least one relationship between the one or more structural document elements and the document data object. The mechanisms embed the document data object based on the at least one relationship to thereby represent the document data object as a vector data structure. The mechanisms perform natural language processing on the portion of natural language content based on the vector data structure. The one or more structural document elements are non-local non-contiguous with the document data object.
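To make the abstract more concrete, the following is a minimal sketch of the general idea, not the patented implementation. It builds dependency edges from structural elements (a document title and section titles) to sentences that are not adjacent to them, then folds the related elements into each sentence's vector. Everything named here is an assumption for illustration: the `doc` and `texts` layout, the relation labels, the `weight` parameter, and especially `toy_text_vector`, which is a hashed bag-of-words stand-in for whatever trained embedding model the patent contemplates.

```python
import hashlib

DIM = 8  # toy vector size, chosen arbitrarily for this sketch


def toy_text_vector(text, dim=DIM):
    """Hash each word into a fixed-size count vector.

    Stand-in for a learned encoder; NOT the embedding used in the patent.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    return vec


def build_dependency_edges(doc):
    """Return (structural_element_id, relation, data_object_id) edges.

    Hypothetical rule set: every sentence depends on the document title
    and on the title of the section that contains it.
    """
    edges = []
    for section in doc["sections"]:
        for sent_id in section["sentences"]:
            edges.append(("title", "title_of", sent_id))
            edges.append((section["id"], "section_title_of", sent_id))
    return edges


def embed(object_id, texts, edges, weight=0.5):
    """Embed a data object, mixing in its related structural elements."""
    vec = toy_text_vector(texts[object_id])
    related = [src for src, _, dst in edges if dst == object_id]
    for src in related:
        ctx = toy_text_vector(texts[src])
        vec = [v + weight * c / len(related) for v, c in zip(vec, ctx)]
    return vec


# Hypothetical document: the title is far from sentence s2, yet still related.
texts = {
    "title": "Solar panel maintenance guide",
    "sec1": "Cleaning",
    "s1": "Use a soft brush twice a year.",
    "sec2": "Safety",
    "s2": "Disconnect the inverter first.",
}
doc = {
    "sections": [
        {"id": "sec1", "sentences": ["s1"]},
        {"id": "sec2", "sentences": ["s2"]},
    ]
}

edges = build_dependency_edges(doc)
vector_with_structure = embed("s2", texts, edges)
vector_without_structure = embed("s2", texts, [])
```

In this sketch the sentence "Disconnect the inverter first." receives a different vector once the title and section title are taken into account, which is the effect the abstract attributes to non-local, non-contiguous structural elements.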

First claim

Opening claim text (preview).

What is claimed is:

1. A method, in a data processing system comprising a processor and a memory, the memory comprising instructions that are executed by the processor to configure the processor to implement a natural language embedding engine, the method comprising: receiving, by the natural language embedding engine executing on the processor, a document data object of an electronic document; analyzing, by the natural language embedding engine, a structure of the electronic document to identify one or more structural document elements that have a relationship with the document data object; generating, by the natural language embedding engine, a dependency data structure representing the electronic document, wherein edges in the dependency data structure define relationships between document elements, and wherein at least one edge is generated in the dependency data structure to represent at least one relationship between the one or more structural document elements and the document data object; executing, by the natural language embedding engine, an embedding operation on the document data object based on the at least one relationship in the dependency data structure to thereby represent the document data object as a vector data structure; and performing, by a natural language processing engine executing in the data processing system, a natural language processing operation on the document data object based on the vector data structure, wherein the one or more structural document elements comprise one or more structural document elements that are non-local non-contiguous with the document data object, wherein the natural language processing system is a question and answer system, and wherein preforming the natural language processing operation on the document data object based on the vector data structure comprises performing, by the question and answer system, a question answering operation based on a received input natural language question, and generating at least one answer to the received input natural language question based on the vector data structure associated with the document data object.

2. The method of claim 1, wherein the document data object is at least one of a natural language text data object comprising a portion of natural language textual content of the electronic document, or a non-natural language text data object representing an image, table, or other portion of non-textual content in the electronic document.

3. The method of claim 1, wherein the document data object comprises a natural language sentence of the electronic document, and wherein the one or more structural document elements comprise at least one of a title of the electronic document or a section title of a section within the electronic document.

4. The method of claim 1, wherein the document data object comprises an image or table in content of the electronic document, and wherein the at least one structural document element comprises a reference to the image or table.

5. The method of claim 1, wherein the one or more structural document elements comprise at least one of: a link to another electronic document, wherein the at least one edge representing at least one relationship between the one or more structural document elements and the document data object comprises an edge representing a relationship between content of the other electronic document, and the document data object, or an association of the document data object with data in an external knowledge base, wherein the at least one edge representing at least one relationship between the one or more structural document elements and the document data object comprises an edge representing a relationship between content of the external knowledge base, and the document data object.

6. The method of claim 1, wherein analyzing the structure of the electronic document to identify the one or more structural document elements that have a relationship with the document data object comprises applying one or more rules defining dependency relationships between various types of structural document elements and document data objects in content of electronic documents.

7. The method of claim 1, wherein generating the dependency data structure comprises: generating edges as a dependency tuple having a first tuple element identifying a dependent document element, a second tuple element representing a dependency relationship, and a third tuple element representing a document element which depends on the first write element; and aggregating, for each document element in the electronic document, dependency triples referencing the document element.

8. The method of claim 1, wherein executing an embedding operation on the document data object based on the at least one relationship in the dependency data structure to thereby represent the document data object as a vector data structure comprises: inputting the document data object into a trained neural network comprising a plurality of embedding encoders and at least one embedding decoder; processing, by the plurality of embedding encoders, the document data object to generate an embedded document data object comprising the vector data structure, wherein each embedding encoder performs an encoding operation on the document data object with respect to a different type of structural document element; and outputting, by the neural network, the embedded document data object to the natural language processing engine.

9. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed in a data processing system, causes data processing system to: receive, by a natural language embedding engine executing in the data processing system, a document data object of an electronic document; analyzing, by the natural language embedding engine, a structure of the electronic document to identify one or more structural document elements that have a relationship with the document data object; generate, by the natural language embedding engine, a dependency data structure representing the electronic document, wherein edges in the dependency data structure define relationships between document elements, and wherein at least one edge is generated in the dependency data structure to represent at least one relationship between the one or more structural document elements and the document data object; execute, by the natural language embedding engine, an embedding operation on the document data object based on the at least one relationship in the dependency data structure to thereby represent the document data object as a vector data structure; and perform, by a natural language processing engine executing in the data processing system, a natural language processing operation on the document data object based on the vector data structure, wherein the one or more structural document elements comprise one or more structural document elements that are non-local non-contiguous with the document data object, wherein the natural language processing system is a question and answer system, and wherein preforming the natural language processing operation on the document data object based on the vector data structure comprises performing, by the question and answer system, a question answering operation based on a received input natural language question, and generating at least one answer to the received input natural language question based on the vector data structure associated with the document data object.

10. The computer program product of claim 9, wherein the document data object is at least one of a natural language text data object comprisin
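Claim 7 describes edges as three-element dependency tuples that are then aggregated per document element. A minimal sketch of that data structure might look like the following. The field names `source`, `relation`, and `target`, the element identifiers, and the relation labels are all hypothetical; the claim does not prescribe a concrete representation, and which tuple position maps to which field here is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class DependencyTriple(NamedTuple):
    """One edge of the dependency data structure (field names are assumed)."""
    source: str    # first tuple element: a document element
    relation: str  # second tuple element: the dependency relationship
    target: str    # third tuple element: the other document element


def aggregate_triples(triples):
    """Group triples under every document element they reference.

    A triple is listed under both of its endpoints, so each element ends up
    with all dependency triples that mention it.
    """
    by_element = defaultdict(list)
    for triple in triples:
        by_element[triple.source].append(triple)
        if triple.target != triple.source:
            by_element[triple.target].append(triple)
    return dict(by_element)


# Hypothetical edges for one sentence, its headings, and a figure it cites.
triples = [
    DependencyTriple("sentence_12", "has_section_title", "section_3_title"),
    DependencyTriple("sentence_12", "has_document_title", "doc_title"),
    DependencyTriple("figure_2", "referenced_by", "sentence_12"),
]

aggregated = aggregate_triples(triples)
for element, element_triples in aggregated.items():
    print(element, len(element_triples))
```

Grouping by element gives a per-object view of its structural context, which is the kind of input a per-object embedding step, such as the multi-encoder network of claim 8, could consume. That connection is an interpretation of the claims rather than something the preview text spells out.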

Assignees

Inventors

Classifications

  • G06N5/022 (Primary)

    Knowledge engineering; Knowledge acquisition · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Combinations of networks · CPC title

  • G06F40/30 (Primary)

    Semantic analysis · CPC title

  • Inference or reasoning models · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10380259B2 cover?
Mechanisms are provided to perform embedding of content of a natural language document. The mechanisms receive a document data object of an electronic document and analyze a structure of the electronic document to identify one or more structural document elements that have a relationship with the document data object. A dependency data structure is generated, representing the electronic documen…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N5/022. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 13, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).