What technology area does this patent fall under?

Primary CPC classification G06F8/73. Mapped technology areas include Physics.

When was this patent published?

Published Mar 28, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Cognitive system with ingestion of natural language documents with embedded code

US9606990B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9606990-B2
Application number	US-201514817345-A
Country	US
Kind code	B2
Filing date	Aug 4, 2015
Priority date	Aug 4, 2015
Publication date	Mar 28, 2017
Grant date	Mar 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Mechanisms are provided for processing natural language content having a computer code segment. Natural language content is processed using a natural language processing (NLP) engine and a segment of content within the natural language content is identified that is not recognized by the NLP engine. The segment is analyzed to determine whether the segment contains computer code and, if so, a code segment annotation for the computer code is generated that provides a natural language description of functionality of the computer code in the segment. The code segment annotation is stored in association with the natural language content and natural language processing is performed using the NLP engine on the code segment annotation to further process the natural language content.
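The flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `nlp_recognizes` check stands in for a real NLP engine, the `CODE_HINTS` patterns stand in for code detection rules, and `describe_code` is a deliberately naive annotation generator. None of these names or heuristics come from the patent.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical detection patterns; the patent does not list specific rules.
CODE_HINTS = [
    re.compile(r"[;{}]\s*$", re.MULTILINE),   # a line ending in ; { or }
    re.compile(r"\w+\s*\([^)]*\)"),           # something shaped like a call
    re.compile(r"\w+\s*=\s*\S+"),             # something shaped like an assignment
]


def nlp_recognizes(segment: str) -> bool:
    """Toy stand-in for an NLP engine: accept text made mostly of plain words."""
    tokens = segment.split()
    if not tokens:
        return True
    wordlike = sum(1 for t in tokens if re.fullmatch(r"[A-Za-z]+[.,;:!?]?", t))
    return wordlike / len(tokens) >= 0.8


def looks_like_code(segment: str) -> bool:
    """Treat a segment as code when at least two hint patterns match."""
    return sum(1 for p in CODE_HINTS if p.search(segment)) >= 2


def describe_code(segment: str) -> str:
    """Naive natural language description, one phrase per non-empty line."""
    parts = []
    for line in segment.splitlines():
        stmt = line.strip().rstrip(";")
        if not stmt:
            continue
        m = re.match(r"^(\w+)\s*=\s*(.+)$", stmt)
        if m:
            parts.append(f"set {m.group(1)} to {m.group(2)}")
        else:
            parts.append(f"execute the statement {stmt}")
    return "This code will " + ", then ".join(parts) + "."


@dataclass
class AnnotatedDocument:
    segments: List[str]
    # Annotations are stored in association with the content, keyed by segment index.
    annotations: Dict[int, str] = field(default_factory=dict)


def ingest(text: str) -> AnnotatedDocument:
    """First pass: find unrecognized segments, test them for code, annotate."""
    doc = AnnotatedDocument([s for s in text.split("\n\n") if s.strip()])
    for i, seg in enumerate(doc.segments):
        if not nlp_recognizes(seg) and looks_like_code(seg):
            doc.annotations[i] = describe_code(seg)
    return doc


def text_for_nlp(doc: AnnotatedDocument) -> List[str]:
    """Second pass input: each annotated segment is replaced by its description."""
    return [doc.annotations.get(i, seg) for i, seg in enumerate(doc.segments)]
```

Calling `ingest` on a document whose paragraphs are separated by blank lines, then passing the result to `text_for_nlp`, yields text in which each detected code segment has been swapped for its description, so the same NLP step could be run again over it.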

First claim

Opening claim text (preview).

What is claimed is:
1. A method, in a data processing system comprising a processor and a memory, for processing natural language content comprising a computer code segment, the method comprising: processing, by the data processing system, the natural language content using a natural language processing (NLP) engine; identifying, by the data processing system, a segment of content within the natural language content that is not recognized by the NLP engine; analyzing, by the data processing system, the segment to determine whether the segment contains computer code; in response to determining that the segment contains computer code, generating, by the data processing system, one or more code segment annotations for the computer code, wherein the one or more code segment annotations provide a natural language description of functionality of the computer code in the segment; storing, by the data processing system, the one or more code segment annotations in association with the natural language content; and performing, by the data processing system, natural language processing, using the NLP engine, on the one or more code segment annotations to further process the natural language content.
2. The method of claim 1, wherein generating one or more code segment annotations comprises: analyzing a portion of content, within the natural language content, within a defined range of the segment, to identify references in the natural language text in the portion of content to the computer code or to elements within the computer code; and generating the natural language description of functionality of the computer code in the segment based on the identified references.
3. The method of claim 1, wherein the one or more code segment annotations further comprise content references that point to relevant portions of the natural language content that explicitly or implicitly refer to the segment or elements of the computer code within the segment, code segment references that point to the segment or elements within the computer code within the segment that are referenced by other portions of the natural language content, and relationships between the content references and code segment references.
4. The method of claim 1, wherein the one or more code segment annotations further comprise an identification of a type of programming language in which the computer code is written and identifiable features within the computer code.
5. The method of claim 1, wherein identifying a segment of content within the natural language content that is not recognized by the NLP engine comprises identifying the segment as a segment that is not recognized by a slot grammar based parsing mechanism implemented by the data processing system.
6. The method of claim 1, wherein analyzing the segment to determine whether the segment contains computer code comprises applying one or more code segment detection rules and patterns, for one or more computer programming languages, to content of the segment to determine if the segment contains computer code.
7. The method of claim 1, wherein generating one or more code segment annotations for the computer code comprises performing a computer programming language detection operation on the computer code of the segment by matching at least one of key word, key phrase, computer language constructs, tags, formatting rules, or code patterns for one or more computer programming languages to elements of the computer code of the segment.
8. The method of claim 1, wherein generating one or more code segment annotations for the computer code comprises: performing a first computer programming language detection operation based on a set of recognizable key terms or key phrases for at least one computer programming language to generate a first set of hypotheses, wherein each hypothesis specifies a potential computer programming language used to generate the computer code of the segment; calculating, for each hypothesis in the first set of hypotheses, a corresponding evidential score value indicating a likelihood that the computer code of the segment corresponds to a computer programming language of the hypothesis; and generating a code segment annotation specifying a determined computer programming language of the computer code of the segment based on the first set of hypotheses and the corresponding evidential score values.
9. The method of claim 8, wherein generating one or more code segment annotations for the computer code comprises: performing a second computer programming language detection operation on the first set of hypotheses based on analysis of a window of natural language text appearing either before or after the segment in the natural language content to generate a second set of hypotheses and corresponding evidential scores; determining the computer programming language of the computer code of the segment based on the second set of hypotheses; and generating a code segment annotation specifying the determined computer programming language of the computer code of the segment.
10. The method of claim 8, wherein generating one or more code segment annotations for the computer code comprises performing a literal translation of the computer code in the segment into the natural language description of the computer code based on predefined rules and patterns for mapping computer code constructs of the determined computer programming language into an equivalent natural language representation.
11. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: process the natural language content using a natural language processing (NLP) engine of the computing device; identify a segment of content within the natural language content that is not recognized by the NLP engine; analyze the segment to determine whether the segment contains computer code; in response to determining that the segment contains computer code, generating one or more code segment annotations for the computer code, wherein the one or more code segment annotations provide a natural language description of functionality of the computer code in the segment; store the one or more code segment annotations in association with the natural language content; and perform natural language processing, using the NLP engine, on the one or more code segment annotations to further process the natural language content.
12. The computer program product of claim 11, wherein generating one or more code segment annotations comprises: analyzing a portion of content, within the natural language content, within a defined range of the segment, to identify references in the natural language text in the portion of content to the computer code or to elements within the computer code; and generating the natural language description of functionality of the computer code in the segment based on the identified references.
13. The computer program product of claim 11, wherein the one or more code segment annotations further comprise content references that point to relevant portions of the natural language content that explicitly or implicitly refer to the segment or elements of the computer code within the segment, code segment references that point to the segment or elements within the computer code within the segment that are referenced by other portions of the natural language content, and relationships between the content references and code segment references.
14. The computer program product of claim
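The two-stage language detection described in claims 8 and 9 can be sketched roughly as follows. This is only an illustration under stated assumptions: the `KEY_TERMS` table, the normalized keyword counts used as evidential scores, and the fixed `boost` applied when the surrounding text names a language are all hypothetical choices, not details taken from the claims.

```python
import re
from typing import Dict, Optional

# Hypothetical key terms per language; real rule sets would be far larger.
KEY_TERMS = {
    "Python": {"def", "import", "elif", "None", "self"},
    "Java": {"public", "static", "void", "class", "new"},
    "JavaScript": {"function", "var", "let", "const", "new"},
}


def first_pass(code: str) -> Dict[str, float]:
    """Build hypotheses from key terms; score = share of all keyword hits."""
    tokens = re.findall(r"[A-Za-z_]\w*", code)
    counts = {
        lang: sum(1 for t in tokens if t in terms)
        for lang, terms in KEY_TERMS.items()
    }
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: c / total for lang, c in counts.items() if c > 0}


def second_pass(
    hypotheses: Dict[str, float], context: str, boost: float = 0.5
) -> Dict[str, float]:
    """Rescore hypotheses using a window of text around the segment.

    A hypothesis gains `boost` (an assumed constant) when the context
    mentions its language by name; scores are then renormalized.
    """
    lowered = context.lower()
    rescored = {}
    for lang, score in hypotheses.items():
        mentioned = re.search(rf"\b{re.escape(lang.lower())}\b", lowered)
        rescored[lang] = score + (boost if mentioned else 0.0)
    total = sum(rescored.values())
    return {lang: s / total for lang, s in rescored.items()} if total else {}


def determine_language(code: str, context: str) -> Optional[str]:
    """Return the best-scoring language after both passes, or None."""
    hypotheses = second_pass(first_pass(code), context)
    return max(hypotheses, key=hypotheses.get) if hypotheses else None
```

With these assumed term lists, a snippet such as `var x = new Thing();` is ambiguous on keywords alone, and a nearby sentence that names a language can change which hypothesis wins. That is the role the claims assign to the second detection operation.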

Assignees

Inventors

Classifications

G06F8/73 · Primary
Program documentation · CPC title
G06F40/169
Annotation, e.g. comment data or footnotes · CPC title
G06F40/263
Language identification · CPC title
G06F40/205
Parsing · CPC title
G06F40/58 · Primary
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 58052508

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9606990B2 cover?: Mechanisms are provided for processing natural language content having a computer code segment. Natural language content is processed using a natural language processing (NLP) engine and a segment of content within the natural language content is identified that is not recognized by the NLP engine. The segment is analyzed to determine whether the segment contains computer code and, if so, a cod…
Who is the assignee on this patent?: IBM