Creating machine learning models from structured intelligence databases

US2019102697A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2019102697-A1
Application number: US-201715722196-A
Country: US
Kind code: A1
Filing date: Oct 2, 2017
Priority date: Oct 2, 2017
Publication date: Apr 4, 2019
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An approach for creating an artificial intelligence machine learning model is provided. In an embodiment, a set of unstructured documents stored in an intelligence database is selected. Attributes associated with entities contained in the selected unstructured documents are retrieved from structured data that is also stored within the intelligence database. In addition, a natural language scan of the unstructured documents is performed to identify relationships between the entities. These relationships and the attributes are used to annotate the originally selected documents. Then the machine learning model is automatically created based on the annotated documents. This machine learning model can be used to train an AI to perform a specific set of problem solving tasks.
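The steps named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the in-memory `DOCUMENTS` and `STRUCTURED` tables, the `AnnotatedDocument` class, and all function names are hypothetical, the "natural language scan" is reduced to reading the words between two known entities, and the list of annotated documents stands in for whatever form the machine learning model actually takes.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the intelligence database: unstructured
# documents plus structured attribute records keyed by entity name.
DOCUMENTS = {
    "doc1": "Acme acquired Globex in 2016.",
    "doc2": "Quarterly weather summary with no known entities.",
}
STRUCTURED = {
    "Acme": {"type": "company", "country": "US"},
    "Globex": {"type": "company", "country": "DE"},
}


@dataclass
class AnnotatedDocument:
    doc_id: str
    text: str
    attributes: dict = field(default_factory=dict)     # entity -> attributes
    relationships: list = field(default_factory=list)  # (entity, phrase, entity)


def select_documents(documents, entities):
    """Select documents that mention at least one known entity."""
    return {d: t for d, t in documents.items() if any(e in t for e in entities)}


def scan_relationships(text, entities):
    """Naive scan: treat the words between two entities as their relationship."""
    found = sorted((text.index(e), e) for e in entities if e in text)
    rels = []
    for (pos_a, a), (pos_b, b) in zip(found, found[1:]):
        phrase = text[pos_a + len(a):pos_b].strip()
        if phrase:
            rels.append((a, phrase, b))
    return rels


def build_model(documents, structured):
    """Annotate selected documents; the returned list stands in for the model."""
    entities = list(structured)
    annotated = []
    for doc_id, text in select_documents(documents, entities).items():
        attrs = {e: structured[e] for e in entities if e in text}
        rels = scan_relationships(text, entities)
        annotated.append(AnnotatedDocument(doc_id, text, attrs, rels))
    return annotated


model = build_model(DOCUMENTS, STRUCTURED)
```

Here only `doc1` is selected, and it carries both the attributes pulled from the structured records and the relationship found in the text. A real system would presumably replace the substring matching with a proper natural language pipeline.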

First claim

Opening claim text (preview).

What is claimed is:

1. A method for creating an artificial intelligence machine learning model, comprising: selecting a set of unstructured documents stored in an intelligence database; retrieving attributes associated with a set of entities in the set of unstructured documents from structured data within the intelligence database; performing a natural language scan of the unstructured documents to identify relationships between the entities; annotating the unstructured documents with the attributes and the relationships; and forming the machine learning model based on the annotated documents.

2. The method of claim 1, the method further comprising: forwarding the unstructured documents to an external tokenizer; retrieving, from the external tokenizer, a set of extracted words that are nouns from the unstructured documents; and designating the set of extracted words as the set of entities.

3. The method of claim 1, wherein the attributes are retrieved from the intelligence database include attribute names for the entities in the structured data.

4. The method of claim 3, wherein the attributes further include an entity to which an entity belongs, an attribute type, a relationship to a document, a semantic of an attribute, a semantic of the entity, and a value of an attribute.

5. The method of claim 1, wherein the identifying of the relationship further comprises analyzing a set of words in an unstructured document that connect a first entity and a second entity within the unstructured document, and wherein the annotating further comprises documenting the relationship in a first token associated with the first entity and in a second token associated with a second entity.

6. The method of claim 1, further comprising training the artificial intelligence using the machine learning model.

7. The method of claim 1, further comprising parsing, prior to the forming of the machine language model, the annotated documents to remove from a document unannotated portions of the document.

8. A system for creating an artificial intelligence machine learning model, comprising: a memory medium comprising instructions; a bus coupled to the memory medium; and a processor coupled to the bus that when executing the instructions causes the system to: select a set of unstructured documents stored in an intelligence database; retrieve attributes associated with a set of entities in the set of unstructured documents from structured data within the intelligence database; perform a natural language scan of the unstructured documents to identify relationships between the entities; annotate the unstructured documents with the attributes and the relationships; and form the machine learning model based on the annotated documents.

9. The system of claim 8, the instructions further causing the system to: forward the unstructured documents to an external tokenizer; retrieve, from the external tokenizer, a set of extracted words that are nouns from the unstructured documents; and designate the set of extracted words as the set of entities.

10. The system of claim 8, wherein the attributes retrieved from the intelligence database include attribute names for the entities in the structured data.

11. The system of claim 10, wherein the attributes further include an entity to which an entity belongs, an attribute type, a relationship to a document, a semantic of an attribute, a semantic of the entity, and a value of an attribute.

12. The system of claim 8, wherein the identifying of the relationship further comprises analyzing a set of words in an unstructured document that connect a first entity and a second entity within the unstructured document, and wherein the annotating further comprises documenting the relationship in a first token associated with the first entity and in a second token associated with a second entity.

13. The system of claim 8, the instructions further causing the system to train the artificial intelligence using the machine learning model.

14. The system of claim 8, the instructions further causing the system to parse, prior to the forming of the machine language model, the annotated documents to remove from a document unannotated portions of the document.

15. A computer program product for creating an artificial intelligence machine learning model, the computer program product comprising a computer readable storage media, and program instructions stored on the computer readable storage media, that cause at least one computer device to: select a set of unstructured documents stored in an intelligence database; retrieve attributes associated with a set of entities in the set of unstructured documents from structured data within the intelligence database; perform a natural language scan of the unstructured documents to identify relationships between the entities; annotate the unstructured documents with the attributes and the relationships; and form the machine learning model based on the annotated documents.

16. The computer program product of claim 15, the instructions further causing the at least one computer device to: forward the unstructured documents to an external tokenizer; retrieve, from the external tokenizer, a set of extracted words that are nouns from the unstructured documents; and designate the set of extracted words as the set of entities.

17. The computer program product of claim 16, wherein the attributes retrieved from the intelligence database include attribute names for the entities in the structured data, an entity to which an entity belongs, an attribute type, a relationship to a document, a semantic of an attribute, a semantic of the entity, and a value of an attribute.

18. The computer program product of claim 15, wherein the identifying of the relationship further comprises analyzing a set of words in an unstructured document that connect a first entity and a second entity within the unstructured document, and wherein the annotating further comprises documenting the relationship in a first token associated with the first entity and in a second token associated with a second entity.

19. The computer program product of claim 15, the instructions further causing the at least one computer device to train the artificial intelligence using the machine learning model.

20. The computer program product of claim 15, the instructions further causing the at least one computer device to parse, prior to the forming of the machine language model, the annotated documents to remove from a document unannotated portions of the document.
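Two of the dependent claims describe token-level steps that are easier to follow with a concrete sketch: recording a relationship on the tokens of both entities (claims 5, 12, 18) and removing unannotated portions before the model is formed (claims 7, 14, 20). The Python below is only an illustration under stated assumptions. The token dictionaries, the function names, and the choice to treat a sentence as the "portion" of a document are all hypothetical and are not taken from the patent.

```python
import re


def tokenize(text):
    """Split text into word tokens, each a dict that can carry annotations."""
    return [{"text": w, "relations": []} for w in re.findall(r"\w+", text)]


def annotate_relationships(tokens, entities):
    """Record the connecting words on both entity tokens."""
    idx = [i for i, t in enumerate(tokens) if t["text"] in entities]
    for i, j in zip(idx, idx[1:]):
        # The words between the two entities are taken as the relationship.
        connector = " ".join(t["text"] for t in tokens[i + 1:j])
        if connector:
            tokens[i]["relations"].append(
                {"role": "first", "via": connector, "other": tokens[j]["text"]}
            )
            tokens[j]["relations"].append(
                {"role": "second", "via": connector, "other": tokens[i]["text"]}
            )
    return tokens


def prune_unannotated(sentences, entities):
    """Keep only sentences in which at least one token was annotated."""
    kept = []
    for sentence in sentences:
        tokens = annotate_relationships(tokenize(sentence), entities)
        if any(t["relations"] for t in tokens):
            kept.append(sentence)
    return kept


entities = {"Acme", "Globex"}
tokens = annotate_relationships(tokenize("Acme acquired Globex in 2016."), entities)
kept = prune_unannotated(
    ["Acme acquired Globex in 2016.", "The weather was mild."], entities
)
```

In this example the `Acme` and `Globex` tokens each receive a record of the `acquired` relationship, and the second sentence is dropped because none of its tokens were annotated.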

Assignees

IBM

Inventors

Classifications

  • Handling natural language data (speech analysis or synthesis, speech recognition G10L)

  • Natural language analysis (semantic analysis of natural language G06F40/30)

  • Frames

  • Entity relationship models

  • Machine learning

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019102697A1 cover?
An approach for creating an artificial intelligence machine learning model is provided. In an embodiment, a set of unstructured documents stored in an intelligence database is selected. Attributes associated with entities contained in the selected unstructured documents are retrieved from structured data that is also stored within the intelligence database. In addition, a natural language scan …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 4, 2019 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).