Personalized approach to handling hypotheticals in text

US10360301B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10360301-B2
Application number: US-201615289224-A
Country: US
Kind code: B2
Filing date: Oct 10, 2016
Priority date: Oct 10, 2016
Publication date: Jul 23, 2019
Grant date: Jul 23, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection. Read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Mechanisms receive natural language content and analyze the natural language content to generate a parse tree data structure. The mechanisms process the parse tree data structure to identify one or more instances of candidate hypothetical spans in the natural language content. Hypothetical spans are terms or phrases indicative of a hypothetical statement. The mechanisms calculate, for each candidate hypothetical span, a confidence score value indicative of a confidence that the candidate hypothetical span is an actual hypothetical span based on a personalized hypothetical dictionary data structure associated with a source of the natural language content. The mechanisms perform an operation based on the natural language content. The operation is performed with portions of the natural language content corresponding to the one or more identified instances of actual hypothetical spans being given different relative weights within portions of the natural language content than other portions of the natural language content.
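The scoring and weighting steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the confidence score is computed, so everything concrete here is an assumption made for illustration: the source identifiers, the trigger words and their scores, the 0.5 threshold, the weights 0.0 and 1.0, and the names `CandidateSpan`, `score_candidates`, `select_actual_spans`, and `weight_portions`. The sketch treats the personalized hypothetical dictionary as a per-source lookup from trigger to confidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSpan:
    trigger: str  # term or phrase that flagged the span
    text: str     # portion of the content covered by the span


# Hypothetical per-source dictionaries (illustrative values only):
# trigger -> confidence that this source uses it hypothetically.
PERSONALIZED_DICTIONARIES = {
    "dr_smith": {"discussed": 0.9, "if": 0.8, "consider": 0.4},
    "dr_jones": {"discussed": 0.2, "if": 0.8, "recommended": 0.7},
}


def score_candidates(candidates, source_id, default=0.0):
    """Pair each candidate span with a confidence score for this source."""
    dictionary = PERSONALIZED_DICTIONARIES.get(source_id, {})
    return [(c, dictionary.get(c.trigger.lower(), default)) for c in candidates]


def select_actual_spans(scored, threshold=0.5):
    """Keep candidates whose score meets the (assumed) threshold."""
    return [c for c, score in scored if score >= threshold]


def weight_portions(portions, actual_spans,
                    hypothetical_weight=0.0, factual_weight=1.0):
    """Give portions that match actual hypothetical spans a different weight."""
    hypothetical_texts = {s.text for s in actual_spans}
    return {
        p: hypothetical_weight if p in hypothetical_texts else factual_weight
        for p in portions
    }


candidates = [
    CandidateSpan("discussed", "discussed surgery as an option"),
    CandidateSpan("if", "if symptoms return"),
]
for source in ("dr_smith", "dr_jones"):
    actual = select_actual_spans(score_candidates(candidates, source))
    print(source, [span.trigger for span in actual])
```

In this toy data the same candidate ("discussed") is accepted for one source and rejected for the other, which is the point of tying the dictionary to the source of the content.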

First claim

Opening claim text (preview).

What is claimed is:

1. A method, in a data processing system comprising at least one processor and at least one memory, the at least one memory comprising instructions which are executed by the at least one processor and specifically configure the at least one processor to perform the method, wherein the method comprises:

    receiving, by the data processing system, natural language content;

    analyzing, by the data processing system, the natural language content to generate a parse tree data structure;

    processing, by the data processing system, the parse tree data structure to identify one or more instances of candidate hypothetical spans in the natural language content, wherein hypothetical spans are terms or phrases indicative of a hypothetical statement;

    calculating, by the data processing system, for each candidate hypothetical span, a confidence score value indicative of a confidence that the candidate hypothetical span is an actual hypothetical span based on a personalized hypothetical dictionary data structure associated with a source of the natural language content;

    generating, by the data processing system, one or more instances of actual hypothetical spans based on the confidence score values associated with the candidate hypothetical spans;

    removing, by the data processing system, one or more sub-tree data structures of the parse tree data structure that correspond to the one or more instances of actual hypothetical spans, to thereby generate a hypothetical pruned parse tree data structure; and

    performing, by the data processing system, an operation based on the natural language content, wherein the operation is performed with portions of the natural language content, corresponding to the one or more identified instances of actual hypothetical spans, being given different relative weights, than other portions of the natural language content that do not correspond to the one or more identified instances of actual hypothetical spans, and wherein the operation is performed based on the hypothetical pruned parse tree data structure.

2. The method of claim 1, further comprising: generating, by the data processing system, the personalized hypothetical dictionary data structure for the source of the natural language content based on analysis of writing style features utilized by the source of the natural language content.

3. The method of claim 1, wherein generating one or more instances of actual hypothetical spans comprises comparing the confidence score values of the candidate hypothetical spans to at least one threshold value, wherein candidate hypothetical spans are added to the one or more instances of actual hypothetical spans in response to their corresponding confidence score values having a predetermined relationship to the at least one threshold value.

4. The method of claim 1, wherein each source in a plurality of sources of natural language content has an associated personalized hypothetical dictionary data structure, and wherein at least two of the personalized hypothetical dictionary data structures have different hypothetical triggers determined based on analysis of the writing style features of the corresponding sources.

5. The method of claim 1, wherein the personalized hypothetical dictionary data structure specifies one or more hypothetical triggers that are specific to the particular source associated with the personalized hypothetical dictionary data structure.

6. The method of claim 5, wherein the one or more hypothetical triggers are identified through natural language processing of documents authored by the source to identify writing style features used by the source.

7. The method of claim 6, wherein the source is an institution, and wherein the writing style features comprise rules, specified by the institution, indicating requirements of writing style to be used by authors when generating natural language content.

8. The method of claim 2, wherein the writing style features comprise both structural and content features of natural language content generated by the source and learned through machine learning algorithms applied to the natural language content generated by the source.

9. The method of claim 2, wherein the writing style features of the source comprise patterns of language usage identified through statistical analysis of sentence style in natural language content generated by the source.

10. The method of claim 1, wherein processing the parse tree data structure to identify one or more instances of candidate hypothetical span comprises:

    identifying a hypothetical trigger within the parse tree data structure; and

    annotating the natural language content signifying the content within the hypothetical span to be associated with the hypothetical trigger.

11. The method of claim 1, wherein the performing the operation comprises:

    training, by the data processing system, a model of a natural language processing system based on the generated one or more instances of actual hypothetical spans in the natural language content; and

    performing, by the natural language processing system, natural language processing of natural language content based on the trained model.

12. The method of claim 10, wherein processing the parse tree data structure further comprises, for each instance of a hypothetical trigger found in the parse tree data structure:

    analyzing the hypothetical trigger using a dictionary data structure to determine a part-of-speech attribute of the hypothetical trigger; and

    utilizing the determined part-of-speech attribute to determine a measure of whether or not the hypothetical trigger corresponds to a hypothetical statement.

13. The method of claim 12, wherein utilizing the determined part-of-speech attribute to determine a measure of whether or not the hypothetical trigger corresponds to a hypothetical statement comprises:

    generating a tuple representation of a sub-tree data structure corresponding to the hypothetical trigger;

    retrieving, from the dictionary data structure, one or more dictionary definitions of a term present in the hypothetical trigger; and

    determining a part-of-speech attribute of the hypothetical trigger based on a correlation of the tuple representation of the sub-tree data structure with the one or more dictionary definitions.

14. The method of claim 13, wherein, in response to the part-of-speech attribute indicating that the hypothetical trigger is a noun, the sub-tree data structure corresponding to the hypothetical trigger is determined to not be directed to a hypothetical statement.

15. The method of claim 1, wherein the data processing system comprises a medical treatment recommendation system, and wherein the operation comprises generating, by the medical treatment recommendation system, treatment recommendations based on content of a patient electronic medical record.

16. The method of claim 1, wherein processing the parse tree data structure further comprises processing the parse tree data structure to identify instances of factual triggers, wherein factual triggers are terms or phrases indicative of a factual statement.

17. The method of claim 16, further comprising:

    determining if a factual sub-tree is present within a hypothetical sub-tree; and

    in response to the factual sub-tree being present within a hypothetical sub-tree, removing the factual sub-tree from the hypothetical sub-tree to generate a modified hypothetical sub-tree prior to further processing of the modified hypothetical sub-tree.

18. A computer program product comprising a non-transitory c
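The pruning step of claim 1 and the noun rule of claim 14 can be sketched on a toy tree. This is an illustration under stated assumptions, not the patented implementation: the `Node` class, the `mark`, `prune`, and `words` helpers, the trigger set, and the part-of-speech tags are all hypothetical. The sketch also assumes each node already carries a part-of-speech tag, whereas claim 13 derives that attribute from a tuple representation of the sub-tree and dictionary definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    text: str
    pos: str = ""               # assumed part-of-speech tag, e.g. "VERB"
    hypothetical: bool = False  # True if this node roots a hypothetical span
    children: List["Node"] = field(default_factory=list)


def mark(node: Node, triggers: Set[str]) -> None:
    """Flag trigger nodes; a trigger used as a noun is not flagged (cf. claim 14)."""
    node.hypothetical = node.text.lower() in triggers and node.pos != "NOUN"
    for child in node.children:
        mark(child, triggers)


def prune(node: Node) -> Optional[Node]:
    """Return a copy of the tree without sub-trees rooted at flagged nodes."""
    if node.hypothetical:
        return None
    kept = [p for p in (prune(c) for c in node.children) if p is not None]
    return Node(node.text, node.pos, False, kept)


def words(node: Node) -> List[str]:
    """Pre-order list of node texts, used here only to inspect the result."""
    result = [node.text]
    for child in node.children:
        result.extend(words(child))
    return result


# "The plan was followed" (noun use) vs. "We plan surgery" (verb use).
tree = Node("ROOT", children=[
    Node("followed", "VERB", children=[Node("plan", "NOUN")]),
    Node("plan", "VERB", children=[Node("surgery", "NOUN")]),
])
mark(tree, {"plan"})
pruned = prune(tree)
print(words(pruned))  # ['ROOT', 'followed', 'plan']
```

Returning a pruned copy rather than mutating the tree keeps the original parse available, which would matter if hypothetical portions are down-weighted instead of discarded.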

Assignees

IBM

Inventors

Classifications

  • Morphological analysis · CPC title

  • Dictionaries · CPC title

  • for calculating health indices; for individual health risk assessment · CPC title

  • ICT specially adapted for the handling or processing of patient-related medical or healthcare data (for medical reports G16H15/00; for therapies or health-improving plans G16H20/00; for the handling or processing of medical images G16H30/00) · CPC title

  • G06F40/30 (Primary)

    Semantic analysis · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10360301B2 cover?
Mechanisms receive natural language content and analyze the natural language content to generate a parse tree data structure. The mechanisms process the parse tree data structure to identify one or more instances of candidate hypothetical spans in the natural language content. Hypothetical spans are terms or phrases indicative of a hypothetical statement. The mechanisms calculate, for each cand…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 23, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).