Language model adaptation for specific texts
US-2015370784-A1 · Dec 24, 2015 · US
US9684648B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9684648-B2 |
| Application number | US-201213485001-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 31, 2012 |
| Priority date | May 31, 2012 |
| Publication date | Jun 20, 2017 |
| Grant date | Jun 20, 2017 |
Official abstract text for this publication.
Determining a subject type for an entity in a text segment. A text segment is selected, which includes one or more single-word or multi-word entities. Natural language processing is performed on the selected text segment to identify entities that constitute subjects of the selected text segment. One entity is selected. A variant annotation is associated with the selected entity. The variant annotation reflects multiple subject types for the selected entity and a value for each subject type. The most probable subject type is determined for the selected entity, based on a combination of natural language processing rules and dictionary listings. The value of the annotation is incremented for the subject type corresponding to the most probable subject type for the selected entity, so that the highest value of the annotation indicates the most probable subject type for the selected entity within the selected text segment.
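The counting scheme in the abstract can be illustrated with a minimal sketch. It assumes the variant annotation is a per-entity map from subject type to an integer value; the class name `VariantAnnotation`, its methods, and the example subject types are hypothetical and do not come from the patent text.

```python
class VariantAnnotation:
    """Hypothetical annotation: one integer value per candidate subject type."""

    def __init__(self, subject_types):
        # Every subject type starts at zero for this entity.
        self.values = {subject_type: 0 for subject_type in subject_types}

    def increment(self, subject_type):
        # Called once each time a mention is judged most likely to be this type.
        self.values[subject_type] += 1

    def most_probable(self):
        # The highest value marks the most probable type within the segment.
        # Returns None when nothing has been counted or the top value is tied.
        top = max(self.values.values())
        if top == 0:
            return None
        winners = [t for t, v in self.values.items() if v == top]
        return winners[0] if len(winners) == 1 else None


# Illustrative usage with made-up subject types.
annotation = VariantAnnotation(["person", "organization", "location"])
annotation.increment("person")
annotation.increment("location")
annotation.increment("person")
result = annotation.most_probable()
```

In this sketch a tie returns `None` rather than picking arbitrarily, which is a design choice of the example and not something the abstract specifies.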
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method for determining a subject type for an entity in a text segment, the method comprising: selecting, by a computer processor, a text segment, wherein the text segment includes one or more single-word or multi-word entities; performing, by the computer processor, natural language processing on the selected text segment to identify one or more entities that constitute subjects of the selected text segment; selecting, by the computer processor, one of the identified entities; associating, by the computer processor, a variant annotation with the selected entity, wherein the variant annotation is operable to reflect multiple subject types for the selected entity and a value for each subject type; determining, by the computer processor, the most probable subject type for the selected entity, based on a combination of natural language processing rules and dictionary listings, by examining single-word and multi-word non-subject entities that are located in spatial proximity to the selected entity to obtain information as to possible subject types for the selected entity; and incrementing, by the computer processor, the value of the annotation for the subject type corresponding to the most probable subject type for the selected entity, whereby the highest value of the annotation indicates the most probable subject type for the selected entity within the selected text segment, wherein in the event that a most probable subject type cannot be determined for the selected entity, incrementing the value of the annotation for two or more probable subject types for the selected entity to determine the most probable subject type for the selected entity.
2. The method of claim 1, wherein each dictionary includes a list of words that share a common trait.
3. The method of claim 2, wherein the common trait corresponds to a subject type that is used by the annotation.
4. The method of claim 1, further comprising: determining a most probable subject subtype for the selected entity, wherein each subtype is part of a subject type and wherein the subtypes are included in the annotation; and incrementing the value of the annotation for the subject subtype corresponding to the most probable subject type for the selected entity, whereby the highest value of the annotation indicates the most probable subject subtype for the selected entity within the selected text segment.
5. A computer program product for determining a subject type for an entity in a text segment, the computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code configured to select a text segment, wherein the text segment includes one or more single-word or multi-word entities; computer readable program code configured to perform natural language processing on the selected text segment to identify one or more entities that constitute subjects of the selected text segment; computer readable program code configured to select one of the identified entities; computer readable program code configured to associate a variant annotation with the selected entity, wherein the variant annotation is operable to reflect multiple subject types for the selected entity and a value for each subject type; computer readable program code configured to determine the most probable subject type for the selected entity, based on a combination of natural language processing rules and dictionary listings, by examining single-word and multi-word non-subject entities that are located in spatial proximity to the selected entity to obtain information as to possible subject types for the selected entity; and computer readable program code configured to increment the value of the annotation for the subject type corresponding to the most probable subject type for the selected entity, whereby the highest value of the annotation indicates the most probable subject type for the selected entity within the selected text segment, wherein in the event that a most probable subject type cannot be determined for the selected entity, computer readable program code configured to increment the value of the annotation for two or more probable subject types for the selected entity to determine the most probable subject type for the selected entity.
6. The computer program product of claim 5, wherein each dictionary includes a list of words that share a common trait.
7. The computer program product of claim 6, wherein the common trait corresponds to a subject type that is used by the annotation.
8. The computer program product of claim 5, further comprising: computer readable program code configured to determine a most probable subject subtype for the selected entity, wherein each subtype is part of a subject type and wherein the subtypes are included in the annotation; and computer readable program code configured to increment the value of the annotation for the subject subtype corresponding to the most probable subject type for the selected entity, whereby the highest value of the annotation indicates the most probable subject subtype for the selected entity within the selected text segment.
9. A system for determining a subject type for an entity in a text segment, the system comprising: a processor; and a memory storing instructions that are executable by the processor, the instructions including instructions to: select a text segment, wherein the text segment includes one or more single-word or multi-word entities; perform natural language processing on the selected text segment to identify one or more entities that constitute subjects of the selected text segment; select one of the identified entities; associate a variant annotation with the selected entity, wherein the variant annotation is operable to reflect multiple subject types for the selected entity and a value for each subject type; determine the most probable subject type for the selected entity, based on a combination of natural language processing rules and dictionary listings, by examining single-word and multi-word non-subject entities that are located in spatial proximity to the selected entity to obtain information as to possible subject types for the selected entity; and increment the value of the annotation for the subject type corresponding to the most probable subject type for the selected entity, whereby the highest value of the annotation indicates the most probable subject type for the selected entity within the selected text segment, wherein in the event that a most probable subject type cannot be determined for the selected entity, increment the value of the annotation for two or more probable subject types for the selected entity to determine the most probable subject type for the selected entity.
10. The system of claim 9, wherein each dictionary includes a list of words that share a common trait.
11. The system of claim 10, wherein the common trait corresponds to a subject type that is used by the annotation.
12. The system of claim 9, wherein the memory further includes instructions to: determine a most probable subject subtype for the selected entity, wherein each subtype is part of a subject type and wherein the subtypes are included in the annotation; and increment the value of the annotation for the subject subtype corresponding to the most probable subject type for the selected entity, whereby the highest value of the annotation indicates the most probable subject subtype for the selected entity within the selected text segment.
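As a rough illustration of the determining and incrementing steps recited in the independent claims, the sketch below looks up words near a selected entity in per-type dictionaries and increments the annotation accordingly. Everything concrete here is an assumption made for the example: the `DICTIONARIES` contents, the token-window reading of "spatial proximity", the function names, and the plain `dict` used as the annotation. The claims do not prescribe any of these details, and the natural language processing rules they mention are not modeled.

```python
# Hypothetical dictionaries: each lists words that share a common trait,
# and each trait maps to one subject type used by the annotation.
DICTIONARIES = {
    "person": {"mr", "mrs", "said", "born"},
    "organization": {"inc", "acquired", "headquartered", "shares"},
    "location": {"visited", "near", "capital"},
}


def candidate_types(tokens, index, window=2):
    """Count dictionary hits among words within `window` tokens of tokens[index]."""
    start = max(0, index - window)
    neighbors = tokens[start:index] + tokens[index + 1:index + 1 + window]
    hits = {}
    for word in neighbors:
        for subject_type, words in DICTIONARIES.items():
            if word.lower() in words:
                hits[subject_type] = hits.get(subject_type, 0) + 1
    return hits


def update_annotation(annotation, tokens, index, window=2):
    """Increment the annotation for the best-supported type(s) of one mention."""
    hits = candidate_types(tokens, index, window)
    if not hits:
        return annotation  # no nearby evidence: leave the values unchanged
    best = max(hits.values())
    # One winner: only that type is incremented. Several equally supported
    # types: all of them are incremented, mirroring the fallback in the claims.
    for subject_type, count in hits.items():
        if count == best:
            annotation[subject_type] = annotation.get(subject_type, 0) + 1
    return annotation


# Illustrative usage: two mentions of the same entity in one text segment.
annotation = {}
update_annotation(annotation, ["Mr", "Washington", "said"], 1)
update_annotation(annotation, ["Mr", "Washington", "acquired"], 1)
most_probable = max(annotation, key=annotation.get)
```

The second mention is ambiguous in this toy setup, so both candidate types are incremented, and the accumulated values across mentions still single out one type for the segment.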
Named entity recognition · CPC title
Morphological analysis · CPC title
Physics · mapped topic