Automated natural language splitting for generation of knowledge graphs

US11822892B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11822892-B2
Application number: US-202017124451-A
Country: US
Kind code: B2
Filing date: Dec 16, 2020
Priority date: Dec 16, 2020
Publication date: Nov 21, 2023
Grant date: Nov 21, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Splitting a natural language sentence into primitive phrases retaining relations of terms includes receiving a natural language sentence, building a parse tree from the natural language sentence using a natural language parser, and recursively identifying discourse markers in subtrees of the parse tree, starting with the highest ranking discourse marker in the parse tree, thereby separating each of the respective subtrees at the respective discourse marker using a set of predefined rules until a set of basic subtrees remains. The recursive identification includes looking-ahead for identifying long ranging discourse markers before identifying local discourse markers.
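The recursive splitting described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy tuple-based tree, the `MARKER_RANK` priority table, and the helper names `leaves`, `has_marker`, and `split` are all assumptions made for illustration, and the look-ahead for long-ranging markers is left out. The sketch also assumes that each discourse marker appears as a direct child of a clause node.

```python
from typing import List, Tuple, Union

# A toy parse tree (assumption): a leaf is a word, an inner node is (label, children).
Tree = Union[str, Tuple[str, list]]

# Hypothetical ranking of discourse markers: higher values are split first.
MARKER_RANK = {"because": 3, "although": 3, "but": 2, "and": 1}


def leaves(tree: Tree) -> List[str]:
    """Collect the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


def has_marker(tree: Tree) -> bool:
    """Report whether any word in the tree is a known discourse marker."""
    return any(word.lower() in MARKER_RANK for word in leaves(tree))


def split(tree: Tree) -> List[Tree]:
    """Split recursively until every returned subtree is marker free."""
    if isinstance(tree, str) or not has_marker(tree):
        return [tree]
    label, children = tree
    # Markers that are direct children of this node, with their positions.
    ranked = [(MARKER_RANK[c.lower()], -i) for i, c in enumerate(children)
              if isinstance(c, str) and c.lower() in MARKER_RANK]
    if not ranked:
        # The marker sits deeper: descend depth first, left to right.
        return [part for child in children for part in split(child)]
    _, neg_i = max(ranked)  # highest rank; the leftmost marker wins ties
    i = -neg_i
    parts: List[Tree] = []
    for side in (children[:i], children[i + 1:]):
        if side:
            parts.extend(split((label, side)))
    return parts


sentence = ("S", [
    ("S", ["The", "pump", "failed"]),
    "because",
    ("S", [("S", ["it", "overheated"]), "and", ("S", ["the", "valve", "stuck"])]),
])
phrases = [" ".join(leaves(t)) for t in split(sentence)]
print(phrases)  # ['The pump failed', 'it overheated', 'the valve stuck']
```

The sketch discards the marker itself once it has split on it. A fuller version would record the marker together with its two sides so that the relation between the resulting phrases is retained.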

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for splitting a natural language sentence into primitive phrases retaining relations of terms, the method comprising: receiving, by one or more processors, a natural language sentence; building, by the one or more processors, a parse tree from the natural language sentence using a natural language parser; and recursively identifying, by the one or more processors, discourse markers in subtrees of the parse tree using a top-down tree automata, starting with a highest ranking discourse marker in the parse tree to separate each of a respective subtree at a respective discourse marker using a set of predefined rules until a set of basic subtrees remains, the remaining set of basic subtrees being discourse marker free, wherein the recursive identification of the discourse markers comprises looking-ahead for identifying long ranging discourse markers before identifying local discourse markers, the recursive identification of the discourse markers being carried out from one side of the parse tree to another side of the parse tree; combining, by the one or more processors, a first portion of the parse tree related to a discourse marker with a second portion of the parse tree excluding the first portion of the parse tree related to the discourse marker, the first portion of the parse tree being located directly before and directly after the discourse marker, the first portion of the parse tree related to the discourse marker partially overlapping with the second portion of the parse tree excluding the first portion of the parse tree related to the discourse marker; and building, by the one or more processors, a knowledge graph using relations of terms extracted from the set of basic subtrees as input, the knowledge graph being communicatively connected to a user interface unit.

2. The method according to claim 1, wherein the top-down tree automata is used for the recursive identification of at least one of the discourse markers and the separation of each of the respective subtrees into the set of basic subtrees.

3. The method according to claim 1, wherein each component of the set of basic subtree represents a basic relation of terms.

4. The method according to claim 1, further comprising: recombining, by the one or more processors, basic phrases based on the set of basic subtrees.

5. The method according to claim 1, wherein the knowledge graph comprises a domain specific knowledge graph.

6. The method according to claim 1, wherein the user interface unit allows manually adapting the knowledge graph.

7. The method according to claim 1, further comprising: resolving, by the one or more processors, co-references in the natural language sentence before performing the recursive identification.

8. The method according to claim 1, further comprising: using, by the one or more processors, a configuration component for at least one of configuring domain specific terms and parameters for selecting a rule out of a set of rules for a discourse marker.

9. The method according to claim 1, further comprising: separating, by the one or more processors, a longer natural language text into at least one of separate natural language sentences and natural language phrases.

10. The method according to claim 1, wherein the parse tree is at least one of a constituency-based parse tree and a dependency-based parse tree.

11. The method according to claim 10, further comprising: building, by the one or more processors, the constituency-based parse tree using as constituency parser at least one of a Benepar constituency parser, a Stanford coreNLP constituency parser, a Natural language toolkit constituency parser, and an AllenNLP constituency parser.

12. The method according to claim 1, wherein the recursive identification of the discourse markers being carried out from one side of the parse tree to another side is performed in addition to a top-down approach, wherein carrying out the recursive identification from the one side of the parse tree to another side further includes proceeding from at least one of left to right and right to left of the parse tree with depth coming first.

13. A natural language splitting system for splitting a natural language sentence into primitive phrases retaining relations of terms, the natural language splitting system comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the natural language splitting system is capable of performing a method comprising: receiving, by one or more processors, a natural language sentence; building, by the one or more processors, a parse tree from the natural language sentence using a natural language parser; and recursively identifying, by the one or more processors, discourse markers in subtrees of the parse tree using a top-down tree automata, starting with a highest ranking discourse marker in the parse tree to separate each of a respective subtree at a respective discourse marker using a set of predefined rules until a set of basic subtrees remains, the remaining set of basic subtrees being discourse marker free, wherein the recursive identification of the discourse markers comprises looking-ahead for identifying long ranging discourse markers before identifying local discourse markers, the recursive identification of the discourse markers being carried out from one side of the parse tree to another side of the parse tree; combining, by the one or more processors, a first portion of the parse tree related to a discourse marker with a second portion of the parse tree excluding the first portion of the parse tree related to the discourse marker, the first portion of the parse tree being located directly before and directly after the discourse marker, the first portion of the parse tree related to the discourse marker partially overlapping with the second portion of the parse tree excluding the first portion of the parse tree related to the discourse marker; and building, by the one or more processors, a knowledge graph using relations of terms extracted from the set of basic subtrees as input, the knowledge graph being communicatively connected to a user interface unit.

14. The natural language splitting system according to claim 13, wherein the top-down tree automata is used for the recursive identification of at least one of the discourse markers and the separation of each of the respective subtrees into the set of basic subtrees.

15. The natural language splitting system according to claim 13, wherein each component of the set of basic subtree represents a basic relation of terms.

16. The natural language splitting system according to claim 13, further comprising: recombining, by the one or more processors, basic phrases based on the set of basic subtrees.

17. The natural language splitting system according to claim 13, wherein the knowledge graph comprises a domain specific knowledge graph.

18. The natural language splitting system according to claim 13, wherein the user interface unit allows manually adapting the knowledge graph.

19. The natural language splitting system according to claim 13, further comprising: resolving, by the one or more processors, co-references in the natural language sentence before performing the recursive identification.
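The final step of the independent claims, building a knowledge graph from relations of terms, might be sketched in Python as follows. The claims do not specify a data model, so the `(subject, relation, object)` triple format, the function name `build_graph`, and the example terms are all hypothetical; extracting the triples from the basic subtrees is assumed to have happened already.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical triple format: (subject term, relation, object term).
Triple = Tuple[str, str, str]
Graph = Dict[str, List[Tuple[str, str]]]


def build_graph(triples: List[Triple]) -> Graph:
    """Build an adjacency-list graph: term -> list of (relation, term) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        edge = (relation, obj)
        if edge not in graph[subject]:  # skip duplicate relations
            graph[subject].append(edge)
        graph.setdefault(obj, [])  # make sure the object term is a node too
    return dict(graph)


# Illustrative triples, as they might come out of three basic phrases.
triples = [
    ("pump", "failed_because", "overheating"),
    ("pump", "failed_because", "stuck valve"),
    ("pump", "failed_because", "overheating"),  # duplicate, ignored
]
graph = build_graph(triples)
print(graph["pump"])  # [('failed_because', 'overheating'), ('failed_because', 'stuck valve')]
```

An adjacency list keeps the sketch free of dependencies. A domain-specific graph that a user can adapt manually, as in claims 5 and 6, would more likely sit in a graph database or a dedicated graph library.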

Assignees

Inventors

Classifications

  • G06F40/35 (Primary)

    Discourse or dialogue representation · CPC title

  • G06F40/205 (Primary)

    Parsing · CPC title

  • Knowledge representation; Symbolic representation · CPC title

  • Extracting rules from data · CPC title

  • Machine learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11822892B2 cover?
Splitting a natural language sentence into primitive phrases retaining relations of terms includes receiving a natural language sentence, building a parse tree from the natural language sentence using a natural language parser, and recursively identifying discourse markers in subtrees of the parse tree, starting with the highest ranking discourse marker in the parse tree, thereby separating eac…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/35. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 21, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).