Automatically assessing document quality for domain-specific documentation

US10387564B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10387564-B2
Application number  US-94497010-A
Country             US
Kind code           B2
Filing date         Nov 12, 2010
Priority date       Nov 12, 2010
Publication date    Aug 20, 2019
Grant date          Aug 20, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and arrangements for document quality assessment. Documents are accepted and a quality specification containing predetermined quality criteria is assimilated. Each document is assessed based on the predetermined quality criteria, and a quality score is assigned to each document, the quality score being a function of positive and negative attributes assessed for each document.
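
As a rough illustration of what a "quality specification containing predetermined quality criteria" might look like as data, here is a minimal Python sketch. It borrows the positive/negative and domain-specific/domain-independent labels and the per-criterion weight from the wording of claim 1 on this page. The class names, field names, and sample criteria (`Criterion`, `QualitySpecification`, `Polarity`, `Scope`, the weights) are hypothetical and chosen only for illustration; the patent text shown here does not prescribe any particular representation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Polarity(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


class Scope(Enum):
    DOMAIN_SPECIFIC = "domain-specific"
    DOMAIN_INDEPENDENT = "domain-independent"


@dataclass(frozen=True)
class Criterion:
    """One quality criterion (hypothetical structure)."""
    name: str
    polarity: Polarity   # positive or negative criterion
    scope: Scope         # domain-specific or domain-independent
    weight: float        # relative weight; the values below are made up


@dataclass(frozen=True)
class QualitySpecification:
    """A collection of criteria against which every document is assessed."""
    criteria: Tuple[Criterion, ...]

    def by_polarity(self, polarity: Polarity) -> List[Criterion]:
        # Select only the criteria with the requested polarity.
        return [c for c in self.criteria if c.polarity is polarity]


# Illustrative specification; criterion names echo examples from claims 5 and 6.
spec = QualitySpecification(criteria=(
    Criterion("use of predetermined terminology", Polarity.POSITIVE, Scope.DOMAIN_SPECIFIC, 0.8),
    Criterion("sectioning in a document", Polarity.POSITIVE, Scope.DOMAIN_INDEPENDENT, 0.5),
    Criterion("use of passive voice", Polarity.NEGATIVE, Scope.DOMAIN_INDEPENDENT, 0.3),
))
```

Keeping the specification immutable reflects the claim language that all documents are assessed against the same specification, though that is a design choice of this sketch rather than something the text requires.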

First claim

Opening claim text (preview).

What is claimed is:

1. A method for automatic document quality assessment against a quality specification to standardize the document quality assessment, the method comprising: utilizing at least one processor to execute computer code at a computer system, the computer code configured to perform the steps of: accepting a plurality of business documents at an input interface of the computer system, wherein business documents in at least a subset of the plurality of business documents comprise objects other than text; receiving, from a user and at a graphical user interface of the computer system, predetermined quality criteria, wherein each of the predetermined quality criteria are identified as one of positive criteria and negative criteria and one of domain-specific criteria and domain-independent criteria, wherein the predetermined quality criteria are based upon quality attributes that identify attributes that result in a quality business document and wherein at least a portion of the predetermined quality criteria are directed to the objects other than text; creating, using a processor, the quality specification from the received predetermined quality criteria, wherein the quality specification comprises the predetermined quality criteria, each of the predetermined quality criteria within the quality specification being identified as (i) positive or negative and (ii) domain-specific or domain-independent and having a weight assigned to each of the positive and negative criteria; identifying, using a processor, characteristics of each of the plurality of business documents, wherein said identifying comprises using a plurality of annotators, each annotator automatically identifying at least a subset of said characteristics, wherein the plurality of annotators are run, in sequence, on each of the plurality of documents, and wherein the annotators in the plurality of annotators are Unstructured Information Management Analysis (UIMA) annotators implemented by the at least one processor; automatically providing a standard quality assessment for the plurality of business documents and automatically approving business documents meeting the quality specification by: automatically assessing, using a processor, each of the plurality of business documents against the quality specification, wherein said assessing comprises analyzing the identified characteristics of each of the plurality of business documents against the predetermined quality criteria, wherein the automatically assessing of the plurality of business documents comprises assessing all of the plurality of business documents against the same created quality specification, and determining, using a processor and based upon the assessing, a quality score, based on the predetermined quality criteria and the identified characteristics, for each of the plurality of business documents, wherein the quality score is normalized and the quality score comprising: an additive function of positive attributes, identified in each of the plurality of business documents, multiplied by the weight associated with the positive attribute, wherein the positive attributes are based on the positive criteria, and a subtractive function of negative attributes, identified in each of the plurality of business documents, multiplied by the weight associated with the negative attribute, wherein the negative attributes are based on the negative criteria; automatically approving business documents from the plurality of documents by filtering the plurality of business documents based upon the quality score corresponding to each business document and approving business documents having a quality score exceeding a predetermined threshold as approved quality documents; and providing the approved quality documents to the user on the graphical user interface.

2. The method according to claim 1, wherein said determining comprises: applying relative weights to each of the positive and negative attributes; summing the weighted positive attributes; dividing the summed weighted positive attributes by the number of positive attributes to yield a weighted positive attribute average; summing the weighted negative attributes; dividing the summed weighted negative attributes by the number of negative attributes to yield a weighted negative attribute average; and subtracting the weighted negative attribute average from the weighted positive attribute average to yield the quality score.

3. The method according to claim 1, further comprising applying a quality score threshold and accepting as output solely those business documents meeting or exceeding the quality score threshold.

4. The method according to claim 1, wherein the predetermined quality criteria relate to document quality in connection with one or more from the group consisting of: grammar, usage and diction.

5. The method according to claim 1, wherein the positive attributes comprise one or more from the group consisting of: sectioning in a document, size of sections in a document, sentence length, mentioning of predetermined concepts in a document, use of predetermined terminology in a document, inclusion of information relating to a mercantile order.

6. The method according to claim 1, wherein the negative attributes comprise one or more from the group consisting of: mentioning of predetermined non-relevant concepts in a document, use of acronyms, use of passive voice, excessive discussion of a predetermined concept in a document.

7. The method according to claim 1, wherein said assessing comprises performing operations on the business documents and the quality specification via the plurality of annotators.

8. The method according to claim 1, wherein said assessing comprises automatically assessing and reconciling the quality specification with each business document to yield a tuple corresponding to each business document, the tuple including an identifier of the business document and the quality score for the business document.

9. The method according to claim 1, wherein the quality specification is provided by the user.

10. An apparatus for automatic document quality assessment against a quality specification to standardize the document quality assessment, the apparatus comprising: one or more processors; and a computer readable storage medium having computer readable program code embodied therewith and executable by the one or more processors, the computer readable program code comprising: computer readable program code configured to accept business documents at an input interface of the apparatus, wherein business documents in at least a subset of the plurality of business documents comprise objects other than text; computer readable program code configured to receive, from a user and at a graphical user interface of the apparatus, predetermined quality criteria, wherein each of the predetermined quality criteria are identified as one of positive criteria and negative criteria and one of domain-specific criteria and domain-independent criteria, wherein the predetermined quality criteria are based upon quality attributes that identify attributes that result in a quality business document and wherein at least a portion of the predetermined quality criteria are directed to the objects other than text; computer readable program code configured to create the quality specification from the received predetermined quality criteria, wherein the quality specification comprises the predetermined quality criteria, each of the predetermined quality criteria within the quality specification being identified as (i) positive or negative and (ii) domain-specific or domain-independent and having a weight assigned to each of the
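
The scoring steps recited in claim 2, the threshold in claim 3, and the (identifier, score) tuple in claim 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each identified attribute is given a value between 0 and 1 (1.0 meaning "fully present"), weights are likewise between 0 and 1, and an empty attribute group contributes 0. None of these conventions, nor the function names, attribute keys, or sample numbers, come from the patent text shown on this page.

```python
from typing import Dict, List, Tuple


def weighted_average(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum weight * value over the attributes, then divide by the attribute count."""
    if not values:
        return 0.0  # assumption: an empty group contributes nothing
    total = sum(weights[name] * value for name, value in values.items())
    return total / len(values)


def quality_score(positive: Dict[str, float],
                  negative: Dict[str, float],
                  weights: Dict[str, float]) -> float:
    """Weighted positive average minus weighted negative average (claim 2 steps)."""
    return weighted_average(positive, weights) - weighted_average(negative, weights)


def approve(scored: List[Tuple[str, float]], threshold: float) -> List[Tuple[str, float]]:
    """Keep documents meeting or exceeding the threshold (claim 3 wording)."""
    return [(doc_id, score) for doc_id, score in scored if score >= threshold]


# Hypothetical weights and attribute values, for illustration only.
weights = {"sectioning": 0.5, "terminology": 1.0, "passive_voice": 0.5, "acronyms": 0.25}

documents = {
    "doc-A": ({"sectioning": 1.0, "terminology": 1.0},
              {"passive_voice": 1.0, "acronyms": 0.0}),
    "doc-B": ({"sectioning": 0.0, "terminology": 1.0},
              {"passive_voice": 1.0, "acronyms": 1.0}),
}

# One (identifier, score) tuple per document, in the spirit of claim 8.
scored = [(doc_id, quality_score(pos, neg, weights))
          for doc_id, (pos, neg) in documents.items()]

approved = approve(scored, threshold=0.25)
```

With these made-up inputs, doc-A scores 0.75 - 0.25 = 0.5 and doc-B scores 0.5 - 0.375 = 0.125, so only doc-A passes a threshold of 0.25. Note that claim 1 speaks of scores "exceeding" the threshold while claim 3 says "meeting or exceeding"; the sketch follows the claim 3 wording.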

Assignees

Inventors

Classifications

  • G06F40/253 · Primary

    Grammatical analysis; Style critique · CPC title

  • G06F17/274 · Primary

    Physics · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10387564B2 cover?
Methods and arrangements for document quality assessment. Documents are accepted and a quality specification containing predetermined quality criteria is assimilated. Each document is assessed based on the predetermined quality criteria, and a quality score is assigned to each document, the quality score being a function of positive and negative attributes assessed for each document.
Who is the assignee on this patent?
Ananthanarayanan Rema, Srivastava Biplav, IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/253. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 20, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).