Who is the assignee on this patent?

Grabarnik Genady, Kozakov Lev, Shwartz Larisa, and 1 more

What technology area does this patent fall under?

Primary CPC classification G06F40/169. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 20, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Credibility of text analysis engine performance evaluation by rating reference content

Patent metadata
Field	Value
Publication number	US-9524281-B2
Application number	US-201213477730-A
Country	US
Kind code	B2
Filing date	May 22, 2012
Priority date	Oct 9, 2008
Publication date	Dec 20, 2016
Grant date	Dec 20, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Evaluating the performance of a text analysis engine is provided. A plurality of pre-annotated reference documents and a set of annotation types associated with the pre-annotated reference documents are received. Annotation contexts of reference annotations in the plurality of pre-annotated reference documents are analyzed using the set of annotation types. Similar annotation contexts are identified between the reference annotations and the set of annotation types. Responsive to identifying the similar annotation contexts, the similar annotation contexts are clustered thereby forming a plurality of reference annotation clusters. A set of reference content heterogeneity scores are computed based on the number of reference annotation clusters for each annotation type in the set of annotation types. An integral reference content rate for the set of annotation types is then computed and output to a user.
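The steps in the abstract can be sketched in a few lines of Python. This is only an illustration, not the patented implementation: the abstract does not say how context similarity is measured or how clusters are formed, so the token-overlap (Jaccard) similarity, the greedy single-pass clustering, the 0.5 threshold, and all function names below are assumptions. The two ratios follow the heterogeneity score of claim 4 and the averaged content rate of claim 5.

```python
from collections import defaultdict


def jaccard(a: set, b: set) -> float:
    """Token-overlap similarity between two contexts (an assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_contexts(contexts, threshold=0.5):
    """Greedy single-pass clustering (assumed strategy): a context joins the
    first cluster whose first member is similar enough, else starts a new one."""
    clusters = []
    for text in contexts:
        tokens = set(text.lower().split())
        for cluster in clusters:
            if jaccard(tokens, cluster[0]) >= threshold:
                cluster.append(tokens)
                break
        else:
            clusters.append([tokens])
    return clusters


def heterogeneity_scores(annotations, n_content_units, threshold=0.5):
    """annotations: iterable of (annotation_type, context_text) pairs.
    Returns {type: CH(type)}, i.e. clusters for that type / content units."""
    by_type = defaultdict(list)
    for ann_type, context in annotations:
        by_type[ann_type].append(context)
    return {
        t: len(cluster_contexts(ctxs, threshold)) / n_content_units
        for t, ctxs in by_type.items()
    }


def content_rate(scores):
    """Unweighted mean of CH(T) over all annotation types."""
    return sum(scores.values()) / len(scores)


# Hypothetical reference content: four sentences, two annotation types.
reference = [
    ("Person", "met with Dr. Smith yesterday"),
    ("Person", "met with Dr. Jones yesterday"),
    ("Person", "a report written by Alice Brown"),
    ("Date", "due on 5 May 2012"),
]
scores = heterogeneity_scores(reference, n_content_units=4)
rate = content_rate(scores)
print(scores, rate)
```

In this toy input the two "met with Dr. ..." contexts fall into one cluster, so the Person type has two clusters and the Date type has one. A higher score means the reference content shows an annotation type in more distinct contexts.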

First claim

Opening claim text (preview).

What is claimed is:

1. A method, in a data processing system, for evaluating the performance of a text analysis engine, the method comprising: receiving a plurality of pre-annotated reference documents; receiving a set of annotation types associated with the pre-annotated reference documents; analyzing annotation contexts of reference annotations in the plurality of pre-annotated reference documents using the set of annotation types; identifying similar annotation contexts between the reference annotations and the set of annotation types; responsive to identifying the similar annotation contexts, clustering the similar annotation contexts thereby forming a plurality of reference annotation clusters; computing a set of reference content heterogeneity scores based on the number of reference annotation clusters for each annotation type in the set of annotation types; computing an integral reference content rate for the set of annotation types; and outputting the integral reference content rate to a user.

2. The method of claim 1, wherein the annotation types associated with the pre-annotated reference documents are text analysis engine annotation types.

3. The method of claim 1, wherein clustering the similar annotation contexts groups reference annotations into one or more clusters is based on a similarity of the context of the similar annotation contexts.

4. The method of claim 1, wherein the set of reference content heterogeneity scores are computed using the following equation:

$$\mathrm{CH}(T) = \frac{\text{number\_of\_reference\_annotation\_clusters\_for\_type\_}T}{\text{number\_of\_content\_units\_in\_reference\_content}},$$

wherein the number of context units in the reference content is at least one of an amount of lines or an amount of sentences.

5. The method of claim 1, wherein the integral reference content rate for the set of annotation types is computed using the following equation:

$$\mathrm{ContentRate} = \sum_{n=1}^{N_{\mathrm{types}}} \frac{1}{N_{\mathrm{types}}}\,\mathrm{CH}(T_n),$$

wherein $N_{\mathrm{types}}$ is the number of annotations types and wherein $T_n$ (n=1, $N_{\mathrm{types}}$) are the plurality of annotations types.

6. The method of claim 1, further comprising: computing performance rates for each annotation type in the set of annotation types.

7. The method of claim 6, wherein the performance rates for each annotation type in the set of annotation types are at least one of a precision performance rate, a recall performance rate, or a F-measure performance rate.

8. The method of claim 7, wherein the precision performance rate is computed using the following equation:

$$\mathrm{precision} = \frac{\text{number\_of\_correct\_annotations\_created\_by\_TAE}}{\text{number\_of\_all\_annotations\_created\_by\_TAE}}$$

wherein TAE is a text analysis engine.

9. The method of claim 7, wherein the recall performance rate is computed using the following equation:

$$\mathrm{recall} = \frac{\text{number\_of\_correct\_annotations\_created\_by\_TAE}}{\text{number\_of\_all\_annotations\_in\_the\_reference\_content}}$$

wherein TAE is a text analysis engine.

10. The method of claim 7, wherein the F-measure performance rate is computed using the following equation:

$$\text{F-measure} = \frac{2 \cdot (\mathrm{precision} \cdot \mathrm{recall})}{\mathrm{precision} + \mathrm{recall}}.$$

11. The method of claim 1, further comprising: measuring a contribution of each annotation type to a projected usage domain; summing weighted content
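As a small worked example, the performance rates of claims 8 to 10 can be computed from three counts. The function names, the sample counts, and the choice to return 0.0 when a denominator is zero are assumptions made for this sketch; the claims give only the ratios themselves.

```python
def precision(correct: int, created_by_tae: int) -> float:
    """Correct TAE annotations / all annotations created by the TAE."""
    # Returning 0.0 for an empty denominator is an assumption, not in the claims.
    return correct / created_by_tae if created_by_tae else 0.0


def recall(correct: int, in_reference: int) -> float:
    """Correct TAE annotations / all annotations in the reference content."""
    return correct / in_reference if in_reference else 0.0


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall: 2 * (p * r) / (p + r)."""
    return 2 * (p * r) / (p + r) if (p + r) else 0.0


# Hypothetical counts for one annotation type: the engine created 10
# annotations, 8 of them correct, and the reference content holds 16.
p = precision(correct=8, created_by_tae=10)
r = recall(correct=8, in_reference=16)
f = f_measure(p, r)
print(f"precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```

With these counts the engine is right 80% of the time but finds only half of the reference annotations, and the F-measure of about 0.615 sits between the two, closer to the lower value.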

Classifications

G06F16/355
Creation or modification of classes or clusters · CPC title
G06F16/285
Clustering or classification · CPC title
G06F40/169 (Primary)
Annotation, e.g. comment data or footnotes · CPC title
G06F16/35 (Primary)
Clustering; Classification · CPC title
G06F17/30705
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 42100005
