Apparatus, method and computer-accessible medium for explaining classifications of documents

US9836455B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9836455-B2
Application number: US-201214001242-A
Country: US
Kind code: B2
Filing date: Feb 23, 2012
Priority date: Feb 23, 2011
Publication date: Dec 5, 2017
Grant date: Dec 5, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Classification of collections of items such as words, which are called “document classification,” and more specifically explaining a classification of a document, such as a web-page or website. This can include exemplary procedure, system and/or computer-accessible medium to find explanations, as well as a framework to assess the procedure's performance. An explanation is defined as a set of words (e.g., terms, more generally) such that removing words within this set from the document changes the predicted class from the class of interest. The exemplary procedure system and/or computer-accessible medium can include a classification of web pages as containing adult content, e.g., to allow advertising on safe web pages only. The explanations can be concise and document-specific, and provide insight into the reasons for the classification decisions, into the workings of the classification models, and into the business application itself. Other exemplary aspects describe how explaining documents' classifications can assist in improving the data quality and model performance.
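The definition of an explanation given in the abstract can be illustrated with a small sketch. It checks whether removing a candidate set of words from a document changes the predicted class. The keyword-counting classifier, the `FLAGGED` vocabulary, and the function names are hypothetical stand-ins chosen for illustration; the patent does not prescribe a particular classifier.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical vocabulary for a toy classifier (an assumption, not from the patent).
FLAGGED = frozenset({"xxx", "explicit", "nude"})


def classify(words: Iterable[str]) -> str:
    """Toy classifier: 'adult' if at least two distinct flagged words occur."""
    hits = sum(1 for w in set(words) if w in FLAGGED)
    return "adult" if hits >= 2 else "safe"


def is_explanation(
    document: List[str],
    candidate: Set[str],
    predict: Callable[[Iterable[str]], str] = classify,
) -> bool:
    """True if removing the candidate words changes the predicted class."""
    original_class = predict(document)
    reduced = [w for w in document if w not in candidate]
    return predict(reduced) != original_class


doc = ["welcome", "explicit", "nude", "gallery", "xxx"]
print(is_explanation(doc, {"explicit"}))          # one flagged word removed
print(is_explanation(doc, {"explicit", "nude"}))  # two flagged words removed
```

With this toy classifier, removing a single flagged word leaves two others in place, so the class does not change; removing two of them does change it, which makes that pair an explanation in the sense of the abstract.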

First claim

Opening claim text (preview).

What is claimed is:
1. A non-transitory computer readable medium including instructions thereon that are accessible by a hardware processing arrangement, wherein, when the processing arrangement executes the instructions, the processing arrangement is configured to generate information associated with at least one first classification of at least one document, comprising: (a) identifying at least one characteristic of the at least one document, wherein the at least one characteristic includes a plurality of items; (b) obtaining at least one second classification of the at least one document based on the at least one characteristic of the at least one document; (c) removing at least one of the items from the at least one document; (d) obtaining the at least one first classification based on the removal of the at least one of the items; and (e) generating the information associated with the at least one first classification of the at least one document by repeating procedures (c) and (d) until the at least one first classification is different from the at least one second classification.
2. The non-transitory computer readable medium of claim 1, wherein the items include at least one of a (i) a plurality of words, (ii) a combination of words, (iii) at least one Uniform Resource Locator, or (iv) at least one locations visited by at least one device.
3. The non-transitory computer readable medium of claim 2, wherein the at least one characteristic includes a plurality of words, and wherein the processing arrangement is further configured to generate the information by removing at least one of the words when performing procedures (c) and (d).
4. The non-transitory computer readable medium of claim 3, wherein the at least one characteristic further includes a combination of words, and wherein the processing arrangement is further configured to generate the information by removing each word and every combination of words from the at least one document when performing procedures (c) and (d).
5. The non-transitory computer readable medium of claim 4, wherein the processing arrangement is further configured to omit at least some of the words or at least some combination of words when performing procedures (c) and (d).
6. The non-transitory computer readable medium of claim 5, wherein the processing arrangement is further configured to omit at least some of the words or at least some combination of words based on at least one of a pruning heuristic search or a hill climbing search.
7. The non-transitory computer readable medium of claim 1, wherein the information includes a minimum-size explanation.
8. The non-transitory computer readable medium of claim 1, wherein the information includes a plurality of minimum explanations.
9. The computer readable medium of claim 1, wherein the obtaining of the at least one second classification includes determining the at least one second classification of the at least one document based on the at least one characteristic of the at least one document.
10. A non-transitory computer readable medium including instructions thereon that are accessible by a hardware processing arrangement, wherein, when the processing arrangement executes the instructions, the processing arrangement is configured to generate information associated with at least one first classification of a collection, comprising: (a) identifying at least one characteristic of the collection, wherein the at least one characteristic includes a plurality of items; (b) obtaining at least one second classification of the collection based on the at least one characteristic of the collection; (c) removing at least one of the items from the at least one document; (d) obtaining the at least one first classification based on the removal of the at least one of the items; and (e) generating the information associated with the at least one first classification of the collection by repeating procedures (c) and (d) until the at least one first classification is different than the at least one second classification.
11. The non-transitory computer readable medium of claim 10, wherein the information includes at least one of an explanation or a hyper-explanation of the at least one first classification of the collection, and wherein the at least one first classification is one of a plurality of classifications.
12. The non-transitory computer readable medium of claim 11, wherein the at least one of the explanation or the hyper-explanation is absent evidence indicating any of the at least one first classification and the at least one second classification.
13. The non-transitory computer readable medium of claim 12, wherein the at least one of the explanation or the hyper-explanation includes an indication of insufficient vocabulary.
14. The non-transitory computer readable medium of claim 11, wherein the at least one of the explanation or the hyper-explanation includes evidence exclusively indicating at least one of a negative classification or a default classification.
15. The non-transitory computer readable medium of claim 14, wherein the at least one of the explanation or the hyper-explanation is absent evidence of a positive classification.
16. The non-transitory computer readable medium of claim 11, wherein the at least one of the explanation or the hyper-explanation includes evidence exclusively indicating a positive classification.
17. The non-transitory computer readable medium of claim 16, wherein the at least one of the explanation or the hyper-explanation is absent evidence indicating at least one of a negative classification or a default classification.
18. The non-transitory computer readable medium of claim 11, wherein the at least one of the explanation or the hyper-explanation includes evidence indicating a default classification.
19. The non-transitory computer readable medium of claim 11, wherein the at least one of the explanation or the hyper-explanation includes an incorrect prior classification.
20. The non-transitory computer readable medium of claim 11, wherein at least one set of training data associated with a classifier facilitates generating the at least one of the explanation or the hyper-explanation.
21. The non-transitory computer readable medium of claim 20, wherein the at least one set of training data includes a set of nearest neighbors that facilitates generating the at least one of the explanation or the hyper-explanation.
22. The computer readable medium of claim 10, wherein the obtaining of the at least one second classification includes determining the at least one second classification of the at least one collection based on the at least one characteristic of the at least one document.
23. The computer readable medium of claim 10, wherein the items include at least one of a (i) a plurality of words, (ii) a combination of words, (iii) at least one Uniform Resource Locator, or (iv) at least one locations visited by at least one device.
24. A method for generating information associated with at least one first classification of a collection, comprising: (a) identifying at least one characteristic of the collection, wherein the at least one characteristic includes a plurality of items; (b) obtaining at least one second classification of the collection based on the at least one characteristic of the collection; (c) removing at least one of the items from the at least one document; (d) obtaining the at least
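Procedures (a) through (e) of claim 1 can be read as a loop that keeps removing items and reclassifying until the class changes. The following is a minimal sketch of that loop under stated assumptions: the linear word weights, the threshold, and the greedy choice of which word to remove next are all hypothetical, and the greedy rule is only one possible search strategy, not necessarily the one the patent describes. The names `second` and `first` follow the claim's wording: `second` is the classification of the unmodified document and `first` is the classification after removal.

```python
from typing import List, Set, Tuple

# Hypothetical linear model: per-word weights and a decision threshold.
# These values are illustrative assumptions, not taken from the patent.
WEIGHTS = {"xxx": 3.0, "explicit": 2.0, "nude": 1.5, "recipe": -1.0}
THRESHOLD = 2.0


def score(items: Set[str]) -> float:
    """Sum of the weights of the items present (unknown words weigh 0)."""
    return sum(WEIGHTS.get(w, 0.0) for w in items)


def classify(items: Set[str]) -> str:
    return "adult" if score(items) > THRESHOLD else "safe"


def explain(document: List[str]) -> Tuple[Set[str], str, str]:
    """Remove items one at a time until the predicted class changes.

    Returns (removed items, original class, class after removal). If the two
    classes are equal, every item was removed without changing the class.
    """
    items = set(document)      # (a) the document's items
    second = classify(items)   # (b) classification of the full document
    first = second
    removed: Set[str] = set()
    # Greedy assumption: remove the word that pushes hardest toward `second`.
    sign = 1.0 if second == "adult" else -1.0
    while first == second and items:
        target = max(sorted(items), key=lambda w: sign * WEIGHTS.get(w, 0.0))
        items.remove(target)   # (c) remove an item
        removed.add(target)
        first = classify(items)  # (d) reclassify after the removal
    return removed, second, first  # (e) stop once the classes differ


doc = ["welcome", "explicit", "nude", "recipe", "xxx"]
removed, second, first = explain(doc)
print(sorted(removed), second, first)
```

A greedy pass like this returns one set of words whose removal flips the class; it does not by itself enumerate every explanation, and with an arbitrary classifier it is not guaranteed to find the smallest one.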

Assignees

Inventors

Classifications

  • Recognition of textual entities · CPC title

  • G06F40/169 · Primary

    Annotation, e.g. comment data or footnotes · CPC title

  • G06F40/40 · Primary

    Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9836455B2 cover?
Classification of collections of items such as words, which are called “document classification,” and more specifically explaining a classification of a document, such as a web-page or website. This can include exemplary procedure, system and/or computer-accessible medium to find explanations, as well as a framework to assess the procedure's performance. An explanation is defined as a set of wo…
Who is the assignee on this patent?
Martens David, Provost Foster, Univ New York, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F40/169. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 5, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).