What technology area does this patent fall under?

Primary CPC classification G06F40/35. Mapped technology areas include Physics.

When was this patent published?

Published Nov 21, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Automated natural language splitting for generation of knowledge graphs

US11822892B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11822892-B2
Application number	US-202017124451-A
Country	US
Kind code	B2
Filing date	Dec 16, 2020
Priority date	Dec 16, 2020
Publication date	Nov 21, 2023
Grant date	Nov 21, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Splitting a natural language sentence into primitive phrases retaining relations of terms includes receiving a natural language sentence, building a parse tree from the natural language sentence using a natural language parser, and recursively identifying discourse markers in subtrees of the parse tree, starting with the highest ranking discourse marker in the parse tree, thereby separating each of the respective subtrees at the respective discourse marker using a set of predefined rules until a set of basic subtrees remains. The recursive identification includes looking-ahead for identifying long ranging discourse markers before identifying local discourse markers.
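
The recursive splitting described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the nested-tuple tree format, the `MARKER_RANK` table, and the single split rule (cut the marker's parent node into the siblings before and after the marker) are all assumptions made for illustration. Look-ahead is approximated by scanning the whole tree for the highest ranking marker before any split is made.

```python
# A tree is either a word (str) or a tuple: (label, child, child, ...).
# Hypothetical ranks: a higher rank means the marker is split first.
MARKER_RANK = {"because": 3, "although": 3, "but": 2, "and": 1}


def leaves(tree):
    """Return the words of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def find_markers(tree, path=()):
    """Yield (rank, path) for each single-word node that holds a marker."""
    if isinstance(tree, str):
        return
    if len(tree) == 2 and isinstance(tree[1], str):
        rank = MARKER_RANK.get(tree[1].lower())
        if rank is not None:
            yield rank, path
        return
    for i, child in enumerate(tree[1:]):
        yield from find_markers(child, path + (i,))


def get_at(tree, path):
    """Follow child indices down from the root."""
    for i in path:
        tree = tree[i + 1]  # +1 skips the label
    return tree


def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0] + 1
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]


def split(tree):
    """Split recursively until only marker-free phrases remain."""
    markers = list(find_markers(tree))
    if not markers:
        return [" ".join(leaves(tree))]
    # Look-ahead: consider every marker in the tree, then take the highest
    # rank, preferring shallower and then leftmost positions on ties.
    _, path = min(markers, key=lambda m: (-m[0], len(m[1]), m[1]))
    if not path:
        return []  # the tree consists of a marker only
    parent_path, idx = path[:-1], path[-1]
    parent = get_at(tree, parent_path)
    label, children = parent[0], parent[1:]
    phrases = []
    for part in ((label,) + children[:idx], (label,) + children[idx + 1:]):
        if len(part) > 1:  # skip empty sides
            phrases += split(replace_at(tree, parent_path, part))
    return phrases


EXAMPLE = (
    "S",
    ("S", ("NP", "Alice"), ("VP", "stayed", "home")),
    ("IN", "because"),
    ("S",
     ("S", ("NP", "it"), ("VP", "rained")),
     ("CC", "and"),
     ("S", ("NP", "the", "roads"), ("VP", "flooded"))),
)

print(split(EXAMPLE))
# ['Alice stayed home', 'it rained', 'the roads flooded']
```

In this sketch the long-range marker "because" is split before the local "and", even though both sit in the same sentence. A real system would take the tree from a constituency or dependency parser and apply marker-specific rules rather than one generic cut.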

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for splitting a natural language sentence into primitive phrases retaining relations of terms, the method comprising: receiving, by one or more processors, a natural language sentence; building, by the one or more processors, a parse tree from the natural language sentence using a natural language parser; and recursively identifying, by the one or more processors, discourse markers in subtrees of the parse tree using a top-down tree automata, starting with a highest ranking discourse marker in the parse tree to separate each of a respective subtree at a respective discourse marker using a set of predefined rules until a set of basic subtrees remains, the remaining set of basic subtrees being discourse marker free, wherein the recursive identification of the discourse markers comprises looking-ahead for identifying long ranging discourse markers before identifying local discourse markers, the recursive identification of the discourse markers being carried out from one side of the parse tree to another side of the parse tree; combining, by the one or more processors, a first portion of the parse tree related to a discourse marker with a second portion of the parse tree excluding the first portion of the parse tree related to the discourse marker, the first portion of the parse tree being located directly before and directly after the discourse marker, the first portion of the parse tree related to the discourse marker partially overlapping with the second portion of the parse tree excluding the first portion of the parse tree related to the discourse marker; and building, by the one or more processors, a knowledge graph using relations of terms extracted from the set of basic subtrees as input, the knowledge graph being communicatively connected to a user interface unit.

2. The method according to claim 1, wherein the top-down tree automata is used for the recursive identification of at least one of the discourse markers and the separation of each of the respective subtrees into the set of basic subtrees.

3. The method according to claim 1, wherein each component of the set of basic subtree represents a basic relation of terms.

4. The method according to claim 1, further comprising: recombining, by the one or more processors, basic phrases based on the set of basic subtrees.

5. The method according to claim 1, wherein the knowledge graph comprises a domain specific knowledge graph.

6. The method according to claim 1, wherein the user interface unit allows manually adapting the knowledge graph.

7. The method according to claim 1, further comprising: resolving, by the one or more processors, co-references in the natural language sentence before performing the recursive identification.

8. The method according to claim 1, further comprising: using, by the one or more processors, a configuration component for at least one of configuring domain specific terms and parameters for selecting a rule out of a set of rules for a discourse marker.

9. The method according to claim 1, further comprising: separating, by the one or more processors, a longer natural language text into at least one of separate natural language sentences and natural language phrases.

10. The method according to claim 1, wherein the parse tree is at least one of a constituency-based parse tree and a dependency-based parse tree.

11. The method according to claim 10, further comprising: building, by the one or more processors, the constituency-based parse tree using as constituency parser at least one of a Benepar constituency parser, a Stanford coreNLP constituency parser, a Natural language toolkit constituency parser, and an AllenNLP constituency parser.

12. The method according to claim 1, wherein the recursive identification of the discourse markers being carried out from one side of the parse tree to another side is performed in addition to a top-down approach, wherein carrying out the recursive identification from the one side of the parse tree to another side further includes proceeding from at least one of left to right and right to left of the parse tree with depth coming first.

13. A natural language splitting system for splitting a natural language sentence into primitive phrases retaining relations of terms, the natural language splitting system comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the natural language splitting system is capable of performing a method comprising: receiving, by one or more processors, a natural language sentence; building, by the one or more processors, a parse tree from the natural language sentence using a natural language parser; and recursively identifying, by the one or more processors, discourse markers in subtrees of the parse tree using a top-down tree automata, starting with a highest ranking discourse marker in the parse tree to separate each of a respective subtree at a respective discourse marker using a set of predefined rules until a set of basic subtrees remains, the remaining set of basic subtrees being discourse marker free, wherein the recursive identification of the discourse markers comprises looking-ahead for identifying long ranging discourse markers before identifying local discourse markers, the recursive identification of the discourse markers being carried out from one side of the parse tree to another side of the parse tree; combining, by the one or more processors, a first portion of the parse tree related to a discourse marker with a second portion of the parse tree excluding the first portion of the parse tree related to the discourse marker, the first portion of the parse tree being located directly before and directly after the discourse marker, the first portion of the parse tree related to the discourse marker partially overlapping with the second portion of the parse tree excluding the first portion of the parse tree related to the discourse marker; and building, by the one or more processors, a knowledge graph using relations of terms extracted from the set of basic subtrees as input, the knowledge graph being communicatively connected to a user interface unit.

14. The natural language splitting system according to claim 13, wherein the top-down tree automata is used for the recursive identification of at least one of the discourse markers and the separation of each of the respective subtrees into the set of basic subtrees.

15. The natural language splitting system according to claim 13, wherein each component of the set of basic subtree represents a basic relation of terms.

16. The natural language splitting system according to claim 13, further comprising: recombining, by the one or more processors, basic phrases based on the set of basic subtrees.

17. The natural language splitting system according to claim 13, wherein the knowledge graph comprises a domain specific knowledge graph.

18. The natural language splitting system according to claim 13, wherein the user interface unit allows manually adapting the knowledge graph.

19. The natural language splitting system according to claim 13, further comprising: resolving, by the one or more processors, co-references in the natural language sentence before performing the recursive identification.
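
The last step of claim 1, building a knowledge graph from relations of terms extracted from the basic subtrees, could look roughly like the sketch below. The claims do not specify how relations are extracted or how the graph is stored, so the `to_triple` rule (first noun phrase as subject, first verb-phrase word as relation, remainder as object) and the dictionary-based graph are purely hypothetical.

```python
from collections import defaultdict


def to_triple(subtree):
    """Toy rule: ("S", ("NP", ...), ("VP", verb, ...)) -> (subject, verb, object)."""
    _, noun_phrase, verb_phrase = subtree
    subject = " ".join(noun_phrase[1:])
    relation = verb_phrase[1]
    obj = " ".join(verb_phrase[2:])
    return subject, relation, obj


def build_graph(basic_subtrees):
    """Map each subject term to a list of (relation, object) edges."""
    graph = defaultdict(list)
    for subtree in basic_subtrees:
        subject, relation, obj = to_triple(subtree)
        graph[subject].append((relation, obj))
    return dict(graph)


BASIC_SUBTREES = [
    ("S", ("NP", "the", "parser"), ("VP", "builds", "a", "tree")),
    ("S", ("NP", "the", "parser"), ("VP", "reads", "text")),
]

print(build_graph(BASIC_SUBTREES))
# {'the parser': [('builds', 'a tree'), ('reads', 'text')]}
```

Because each basic subtree is free of discourse markers, a simple per-phrase rule like this is enough for the toy input. Realistic text would need a proper relation extractor.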

Assignees

IBM

Inventors

Classifications

G06F40/35 · Primary · Discourse or dialogue representation
G06F40/205 · Primary · Parsing
G06N5/02 · Knowledge representation; Symbolic representation
G06N5/025 · Extracting rules from data
G06N20/00 · Machine learning

Patent family

Related publications grouped by family.

Patent family 81942729

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11822892B2 cover? Splitting a natural language sentence into primitive phrases retaining relations of terms includes receiving a natural language sentence, building a parse tree from the natural language sentence using a natural language parser, and recursively identifying discourse markers in subtrees of the parse tree, starting with the highest ranking discourse marker in the parse tree, thereby separating eac…
Who is the assignee on this patent? IBM

Related patents

Subgraph guided knowledge graph question generation

Context saliency-based deictic parser for natural language processing

Coreference-aware representation learning for neural named entity recognition

Natural language processing and artificial intelligence based search system

Search indexing using discourse trees

System and method for semantic processing of natural language commands

Unsupervised learning of deep patterns for semantic parsing