Systems and methods for language feature generation over multi-layered word representation

US10073834B2 · US · B2

Patent metadata
Publication number: US-10073834-B2
Application number: US-201615018877-A
Country: US
Kind code: B2
Filing date: Feb 9, 2016
Priority date: Feb 9, 2016
Publication date: Sep 11, 2018
Grant date: Sep 11, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

There is provided a computer-implemented method for outputting one or more cross-layer patterns to identify a target semantic phenomenon in text, the method comprising: extracting, for each word of at least some words of each training text fragment of training text fragments designated as representing a target semantic phenomenon, feature-values defined by respective layers; statistically analyzing the feature-values identified for the training text fragments to identify one or more cross-layer patterns comprising layers representing a common pattern for the training text fragments, the common cross-layer pattern defining one or more feature-values of a respective layer of one or more words and at least another feature-value of another respective layer of another word; and outputting the identified cross-layer pattern(s) for identifying a text fragment representing the target semantic phenomenon.
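The training step summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the tiny `POS_TAGS` and `HYPERNYMS` lookups, the treatment of the surface word as its own layer, and the use of plain support counting as the "statistical analysis" are all assumptions made for illustration. Each word gets one feature-value per available layer, and the sketch keeps every ordered pair of feature-values that comes from two different words and two different layers and that occurs in at least a chosen share of the training fragments.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lookups standing in for real layer annotators (assumption).
POS_TAGS = {"a": "DET", "an": "DET", "is": "VERB", "cat": "NOUN",
            "dog": "NOUN", "animal": "NOUN", "mammal": "NOUN"}
HYPERNYMS = {"cat": "animal", "dog": "animal"}


def extract_layers(fragment):
    """Return one dict per word, mapping layer name -> feature-value."""
    rows = []
    for word in fragment.lower().split():
        feats = {"surface": word}  # the word itself, treated as a layer here
        if word in POS_TAGS:
            feats["pos"] = POS_TAGS[word]
        if word in HYPERNYMS:
            feats["hypernym"] = HYPERNYMS[word]
        rows.append(feats)
    return rows


def candidate_patterns(rows):
    """Ordered pairs of feature-values from two different words and layers."""
    found = set()
    for i, j in combinations(range(len(rows)), 2):  # i always precedes j
        for layer_a, value_a in rows[i].items():
            for layer_b, value_b in rows[j].items():
                if layer_a != layer_b:  # keep only cross-layer pairs
                    found.add(((layer_a, value_a), (layer_b, value_b)))
    return found


def common_patterns(fragments, min_support=1.0):
    """Keep patterns found in at least min_support of the fragments."""
    counts = Counter()
    for fragment in fragments:
        # A set per fragment, so each pattern counts once per fragment.
        counts.update(candidate_patterns(extract_layers(fragment)))
    total = len(fragments)
    return {p: n / total for p, n in counts.items() if n / total >= min_support}


training = ["a cat is an animal", "a dog is a mammal"]
patterns = common_patterns(training)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

In this toy run the pair "a word whose hypernym is animal, followed later by the surface word is" is shared by both definition-like fragments, even though the fragments use different nouns. That is the kind of regularity a single-layer, surface-only pattern would miss.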

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for outputting at least one cross-layer pattern to identify a target semantic phenomenon in a text document, the method comprising: using at least one hardware processor for executing a code for: extracting, for each word of at least some words of each training text fragment of a plurality of training text fragments designated as representing a target semantic phenomenon, a plurality of feature-values defined by respective layers; statistically analyzing the plurality of feature-values identified for the plurality of training text fragments to identify at least one cross-layer pattern comprising a plurality of layers representing a common pattern for the plurality of training text fragments, the common cross-layer pattern defining at least one feature-value of a respective layer of at least one word and at least another feature-value of another respective layer of another word; and generating instructions to present a marked human-readable text document by using the identified at least one cross-layer pattern for automatically marking at least one text fragment representing the target semantic phenomenon in a human-readable text document.

2. The method of claim 1, further comprising: training a statistical classifier to identify the target semantic phenomenon by matching or correlating between feature-values extracted from a new text fragment and the at least one cross-layer pattern; and wherein the trained statistical classifier is used for analyzing the human-readable text document to identify the at least one text fragment representing the target semantic phenomenon.

3. The method of claim 2, wherein the extracting of the plurality of feature-values defined by respective layers is performed for training text fragments designated as not representing the target semantic phenomenon, and the classifier is trained based on the feature-values extracted from the training text fragments designated as not representing the target semantic phenomenon.

4. The computer-implemented method of claim 1, wherein the cross-layer pattern includes at least one negative feature-value that does not appear in a text fragment that includes the target semantic phenomenon.

5. The computer-implemented method of claim 1, wherein each layer of the plurality of layers of the at least one cross-layer pattern is a member selected from the group consisting of: semantic, syntactic, domain knowledge, injection of knowledge by task expert, part-of-speech (POS) tag of the word, hypernym of the word, a named entity represented by the word, sentiment represented by the word, word appearing in a predefined lexicon.

6. The computer-implemented method of claim 1, wherein the cross-layer pattern includes at least one word in the text fragment associated with multiple different layers.

7. The computer-implemented method of claim 1, wherein the multiple different layers are combined for the at least one word.

8. The computer-implemented method of claim 1, wherein the cross-layer pattern includes at least two different words in the text fragment each associated with a different layer.

9. The computer-implemented method of claim 1, wherein the different layers associated with the at least two different words are defined by an order within the cross-layer pattern.

10. The computer-implemented method of claim 1, wherein the target semantic phenomenon is a member of the group consisting of: a definition, a statement providing evidence for or against a topic, a statement made by an entity that something is the case about a topic without evidence, and a sentiment expressed by an entity about a topic.

11. The computer-implemented method of claim 1, wherein the cross-layer pattern includes at least one defined gap between at least two layers each from a different word.

12. The computer-implemented method of claim 1, wherein the cross-layer pattern is created by iteratively combining features to generate longer cross-layer patterns.

13. The computer-implemented method of claim 12, further comprising applying a greedy analysis at the end of each iteration to identify the top predefined number of cross-layer patterns ranked according to probability of accurate prediction.

14. The computer-implemented method of claim 13, wherein the top predefined number of cross-layer patterns are selected based on a correlation requirement with other previously selected higher ranking features.

15. The computer-implemented method of claim 12, wherein combining features is performed by adding another feature of another word in combination and in order.

16. The computer-implemented method of claim 12, wherein combining features is performed by adding another feature of the same word in combination.

17. A computer-implemented method for applying at least one cross-layer pattern to at least one text fragment to identify a target semantic phenomenon, the method comprising: extracting a plurality of feature-values from at least some words in each text fragment of a human-readable text, each feature-value defined by a respective layer; matching or correlating the plurality of feature-values with at least one cross-layer pattern; and outputting an indication of the target semantic phenomenon in each respective text fragment when a match or correlation is found.

18. The computer-implemented method of claim 17, wherein the matching or correlating with at least one cross-layer pattern is performed by applying a trained statistical classifier to the plurality of feature-values.

19. A system that identifies a target semantic phenomenon in text, comprising: a data interface for receiving a plurality of training text fragment representing a target semantic phenomenon; a program store storing code; and at least one hardware processor coupled to the data interface and the program store for implementing the stored code, the code comprising: code to extract, for each word of at least some words of the plurality of training text fragment, a plurality of feature-values defined by respective layers; code to statistically analyze the plurality of feature-values to identify at least one cross-layer pattern comprising a plurality of layers representing a common pattern for the plurality of training text fragments, the common cross-layer pattern defining at least one feature-value of a respective layer of at least one word and at least another feature-value of another respective layer of another word; and code to generate instructions to present a marked human-readable text document by using the identified at least one cross-layer pattern for automatically marking at least one text fragment representing the target semantic phenomenon in the marked human-readable text document.
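Claims 11 to 13 and 15 to 17 describe ordered patterns with a defined gap, grown iteratively and pruned greedily. The following is a rough sketch of one way such a procedure could look, not the method as actually implemented by the patentee. The hand-annotated fragments, the helper `w`, the `max_gap` and `top_k` parameters, and the use of plain accuracy on labeled fragments as the ranking score are all assumptions. A pattern is a tuple of elements, each element being a tuple of (layer, value) constraints that one word must satisfy.

```python
def w(surface, pos, hypernym=None):
    """Hypothetical helper: build one word's layer -> feature-value dict."""
    row = {"surface": surface, "pos": pos}
    if hypernym:
        row["hypernym"] = hypernym
    return row


def matches(pattern, rows, max_gap=1):
    """True if the elements match words in order, skipping at most max_gap
    words between consecutive matched words (the defined gap)."""
    def search(p_idx, start, first):
        if p_idx == len(pattern):
            return True
        # The first element may match anywhere; later ones within the gap.
        stop = len(rows) if first else min(len(rows), start + max_gap + 1)
        for i in range(start, stop):
            if all(rows[i].get(layer) == value for layer, value in pattern[p_idx]):
                if search(p_idx + 1, i + 1, False):
                    return True
        return False
    return search(0, 0, True)


def score(pattern, pos, neg, max_gap):
    """Rank key: (accuracy on the labeled fragments, positives matched)."""
    tp = sum(matches(pattern, rows, max_gap) for rows in pos)
    fp = sum(matches(pattern, rows, max_gap) for rows in neg)
    return ((tp + len(neg) - fp) / (len(pos) + len(neg)), tp)


def grow_patterns(pos, neg, iterations=2, top_k=3, max_gap=1):
    """Iteratively lengthen patterns, keeping only the top_k after each round."""
    def top(cands):
        # Greedy cut: best score first, repr() as a deterministic tie-break.
        key = lambda p: tuple(-x for x in score(p, pos, neg, max_gap)) + (repr(p),)
        return sorted(cands, key=key)[:top_k]

    atoms = sorted({(l, v) for rows in pos for r in rows for l, v in r.items()})
    beam = top([((a,),) for a in atoms])
    for _ in range(iterations - 1):
        cands = set(beam)
        for p in beam:
            for a in atoms:
                cands.add(p + ((a,),))  # feature of another word, in order
                if a[0] not in dict(p[-1]):  # new layer on the same word
                    cands.add(p[:-1] + (tuple(sorted(p[-1] + (a,))),))
        beam = top(cands)
    return beam


positives = [  # "a cat is an animal", "a dog is a mammal"
    [w("a", "DET"), w("cat", "NOUN", "animal"), w("is", "VERB"),
     w("an", "DET"), w("animal", "NOUN")],
    [w("a", "DET"), w("dog", "NOUN", "animal"), w("is", "VERB"),
     w("a", "DET"), w("mammal", "NOUN")],
]
negatives = [  # "a cat sat", "is it a dog"
    [w("a", "DET"), w("cat", "NOUN", "animal"), w("sat", "VERB")],
    [w("is", "VERB"), w("it", "PRON"), w("a", "DET"), w("dog", "NOUN", "animal")],
]
best = grow_patterns(positives, negatives)[0]
print(best)
```

On this toy data the top-ranked pattern pairs the surface word "is" with a NOUN part-of-speech tag at most one word later, so it mixes two layers across two words. The greedy cut keeps the search small but can discard a weak feature that would have combined well later, which is one reason a correlation requirement like the one in claim 14 might be layered on top.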

Assignees

IBM

Inventors

Classifications

  • Creation of semantic tools, e.g. ontology or thesauri · CPC title

  • into predefined classes · CPC title

  • G06F40/216 · Primary

    using statistical methods · CPC title

  • G06F40/295 · Primary

    Named entity recognition · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10073834B2 cover?
There is provided a computer-implemented method for outputting one or more cross-layer patterns to identify a target semantic phenomenon in text, the method comprising: extracting, for each word of at least some words of each training text fragment of training text fragments designated as representing a target semantic phenomenon, feature-values defined by respective layers; statistically analy…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/216. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 11, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).