Who is the assignee on this patent?

Fink Patrick W, Mcneil Kristin E, Parker Philip E, and 1 more

What technology area does this patent fall under?

Primary CPC classification G06F40/268. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 20, 2017 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Disambiguating words within a text segment

US9684648B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9684648-B2
Application number	US-201213485001-A
Country	US
Kind code	B2
Filing date	May 31, 2012
Priority date	May 31, 2012
Publication date	Jun 20, 2017
Grant date	Jun 20, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Determining a subject type for an entity in a text segment. A text segment is selected, which includes one or more single-word or multi-word entities. Natural language processing is performed on the selected text segment to identify entities that constitute subjects of the selected text segment. One entity is selected. A variant annotation is associated with the selected entity. The variant annotation reflects multiple subject types for the selected entity and a value for each subject type. The most probable subject type is determined for the selected entity, based on a combination of natural language processing rules and dictionary listings. The value of the annotation is incremented for the subject type corresponding to the most probable subject type for the selected entity, so that the highest value of the annotation indicates the most probable subject type for the selected entity within the selected text segment.
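The counting scheme in the abstract can be pictured as one counter per candidate subject type attached to an entity. The sketch below is only an illustration under stated assumptions: the class name `VariantAnnotation`, its methods, and the example subject types are hypothetical and do not come from the patent text.

```python
from collections import Counter


class VariantAnnotation:
    """Hypothetical container: one value per candidate subject type."""

    def __init__(self, entity, subject_types):
        self.entity = entity
        # Start every candidate subject type at zero.
        self.values = Counter({t: 0 for t in subject_types})

    def increment(self, subject_type):
        # Record one more piece of evidence for this subject type.
        self.values[subject_type] += 1

    def most_probable(self):
        # The highest value marks the most probable subject type;
        # return None when no evidence has been recorded yet.
        best_type, best_value = max(self.values.items(), key=lambda kv: kv[1])
        return best_type if best_value > 0 else None


# Assumed example: "Washington" could be a person, a location, or an organization.
annotation = VariantAnnotation("Washington", ["person", "location", "organization"])
annotation.increment("location")
annotation.increment("person")
annotation.increment("location")
print(annotation.most_probable())  # prints "location"
```

Keeping every candidate type in the annotation, rather than storing a single label, is what lets later evidence in the same text segment change which type ends up with the highest value.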

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for determining a subject type for an entity in a text segment, the method comprising: selecting, by a computer processor, a text segment, wherein the text segment includes one or more single-word or multi-word entities; performing, by the computer processor, natural language processing on the selected text segment to identify one or more entities that constitute subjects of the selected text segment; selecting, by the computer processor, one of the identified entities; associating, by the computer processor, a variant annotation with the selected entity, wherein the variant annotation is operable to reflect multiple subject types for the selected entity and a value for each subject type; determining, by the computer processor, the most probable subject type for the selected entity, based on a combination of natural language processing rules and dictionary listings, by examining single-word and multi-word non-subject entities that are located in spatial proximity to the selected entity to obtain information as to possible subject types for the selected entity; and incrementing, by the computer processor, the value of the annotation for the subject type corresponding to the most probable subject type for the selected entity, whereby the highest value of the annotation indicates the most probable subject type for the selected entity within the selected text segment, wherein in the event that a most probable subject type cannot be determined for the selected entity, incrementing the value of the annotation for two or more probable subject types for the selected entity to determine the most probable subject type for the selected entity.

2. The method of claim 1, wherein each dictionary includes a list of words that share a common trait.

3. The method of claim 2, wherein the common trait corresponds to a subject type that is used by the annotation.

4. The method of claim 1, further comprising: determining a most probable subject subtype for the selected entity, wherein each subtype is part of a subject type and wherein the subtypes are included in the annotation; and incrementing the value of the annotation for the subject subtype corresponding to the most probable subject type for the selected entity, whereby the highest value of the annotation indicates the most probable subject subtype for the selected entity within the selected text segment.

5. A computer program product for determining a subject type for an entity in a text segment, the computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code configured to select a text segment, wherein the text segment includes one or more single-word or multi-word entities; computer readable program code configured to perform natural language processing on the selected text segment to identify one or more entities that constitute subjects of the selected text segment; computer readable program code configured to select one of the identified entities; computer readable program code configured to associate a variant annotation with the selected entity, wherein the variant annotation is operable to reflect multiple subject types for the selected entity and a value for each subject type; computer readable program code configured to determine the most probable subject type for the selected entity, based on a combination of natural language processing rules and dictionary listings, by examining single-word and multi-word non-subject entities that are located in spatial proximity to the selected entity to obtain information as to possible subject types for the selected entity; and computer readable program code configured to increment the value of the annotation for the subject type corresponding to the most probable subject type for the selected entity, whereby the highest value of the annotation indicates the most probable subject type for the selected entity within the selected text segment, wherein in the event that a most probable subject type cannot be determined for the selected entity, computer readable program code configured to increment the value of the annotation for two or more probable subject types for the selected entity to determine the most probable subject type for the selected entity.

6. The computer program product of claim 5, wherein each dictionary includes a list of words that share a common trait.

7. The computer program product of claim 6, wherein the common trait corresponds to a subject type that is used by the annotation.

8. The computer program product of claim 5, further comprising: computer readable program code configured to determine a most probable subject subtype for the selected entity, wherein each subtype is part of a subject type and wherein the subtypes are included in the annotation; and computer readable program code configured to increment the value of the annotation for the subject subtype corresponding to the most probable subject type for the selected entity, whereby the highest value of the annotation indicates the most probable subject subtype for the selected entity within the selected text segment.

9. A system for determining a subject type for an entity in a text segment, the system comprising: a processor; and a memory storing instructions that are executable by the processor, the instructions including instructions to: select a text segment, wherein the text segment includes one or more single-word or multi-word entities; perform natural language processing on the selected text segment to identify one or more entities that constitute subjects of the selected text segment; select one of the identified entities; associate a variant annotation with the selected entity, wherein the variant annotation is operable to reflect multiple subject types for the selected entity and a value for each subject type; determine the most probable subject type for the selected entity, based on a combination of natural language processing rules and dictionary listings, by examining single-word and multi-word non-subject entities that are located in spatial proximity to the selected entity to obtain information as to possible subject types for the selected entity; and increment the value of the annotation for the subject type corresponding to the most probable subject type for the selected entity, whereby the highest value of the annotation indicates the most probable subject type for the selected entity within the selected text segment, wherein in the event that a most probable subject type cannot be determined for the selected entity, increment the value of the annotation for two or more probable subject types for the selected entity to determine the most probable subject type for the selected entity.

10. The system of claim 9, wherein each dictionary includes a list of words that share a common trait.

11. The system of claim 10, wherein the common trait corresponds to a subject type that is used by the annotation.

12. The system of claim 9, wherein the memory further includes instructions to: determine a most probable subject subtype for the selected entity, wherein each subtype is part of a subject type and wherein the subtypes are included in the annotation; and increment the value of the annotation for the subject subtype corresponding to the most probable subject type for the selected entity, whereby the highest value of the annotation indicates the most probable subject subtype for the selected entity within the selected text segment.
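As a rough illustration of the "spatial proximity" and fallback language in the independent claims, the sketch below votes on a subject type for each occurrence of an entity by checking nearby words against per-type dictionaries. Everything in it is an assumption made for illustration: the dictionary contents, the three-token window, the restriction to single-word entities, and the reading of "cannot be determined" as a tie between types. It is one possible interpretation, not the claimed implementation.

```python
import re

# Hypothetical dictionaries: each lists cue words that share a common trait
# (here, the subject type they hint at).
DICTIONARIES = {
    "person": {"mr", "mrs", "dr", "said", "born"},
    "location": {"in", "near", "city", "state", "visited"},
}


def tokenize(text):
    # Lowercase alphabetic tokens only; punctuation and digits are dropped.
    return re.findall(r"[a-z]+", text.lower())


def vote(text, entity, window=3):
    """Return a value per subject type for a single-word entity."""
    tokens = tokenize(text)
    values = {subject_type: 0 for subject_type in DICTIONARIES}
    target = entity.lower()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        # Words within `window` tokens on either side of this occurrence.
        nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        hits = {
            subject_type: sum(word in cues for word in nearby)
            for subject_type, cues in DICTIONARIES.items()
        }
        best = max(hits.values())
        if best == 0:
            continue  # no nearby cue words, so no evidence to record
        # A single winner gets the increment; on a tie (assumed fallback),
        # every tied subject type is incremented.
        for subject_type, count in hits.items():
            if count == best:
                values[subject_type] += 1
    return values


clear_text = ("She visited Washington, a state near Oregon. "
              "Later she flew to a city in Washington.")
print(vote(clear_text, "Washington"))               # {'person': 0, 'location': 2}
print(vote("Dr Washington visited", "Washington"))  # {'person': 1, 'location': 1}
```

In the second call the nearby words give one cue for each type, so both values are incremented and the decision is left to evidence from other occurrences in the segment.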

Assignees

Inventors

Classifications

G06F40/295
Named entity recognition · CPC title
G06F40/268 · Primary
Morphological analysis · CPC title
G06F17/278
Physics · mapped topic
G06F17/2755 · Primary
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 49671305

External sources
