Automatic document classification via content analysis at storage time

US2016171084A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016171084-A1
Application number: US-201615053172-A
Country: US
Kind code: A1
Filing date: Feb 25, 2016
Priority date: Dec 3, 2012
Publication date: Jun 16, 2016
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are disclosed for efficiently and automatically classifying textual documents or files. In some embodiments, the classification process is integrated into or otherwise made part of the storage function, such that when the user initiates a save process for a given file, the file is processed through a classifier prior to (or contemporaneously with) completing the save function. In some such embodiments, textual content of the file is analyzed using natural language processing to identify a main or substantial concept discussed in the file, and one or more corresponding tags are then assigned to that file. Subsequently, the user can access that file based on the one or more tags, for instance, through a user interface that allows the user to select one or more content categories associated with the assigned tags. The files can be text-based, but may include other content as well, such as images, video, and audio.
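
The save-time flow described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the function names (`classify`, `save_file`, `find_by_tag`), the in-memory dictionary standing in for the content repository, and above all the classifier itself, which simply counts word frequency as a stand-in for the natural language processing the abstract mentions. The publication does not prescribe this implementation.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would rely on an NLP library.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


def classify(text, max_tags=2):
    """Return the most frequent non-stop-words as tags (a stand-in for NLP)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    return [word for word, _ in Counter(words).most_common(max_tags)]


def save_file(repository, filename, text):
    """Run the classifier as part of the save, then store content and tags together."""
    tags = classify(text)
    repository[filename] = {"content": text, "tags": tags}
    return tags


def find_by_tag(repository, tag):
    """Retrieve filenames whose assigned tags include the requested tag."""
    return sorted(name for name, entry in repository.items() if tag in entry["tags"])


repo = {}  # in-memory stand-in for the content repository
tags = save_file(repo, "q3.txt", "Budget review: the budget for travel and the travel policy budget.")
print(tags)                         # ['budget', 'travel']
print(find_by_tag(repo, "travel"))  # ['q3.txt']
```

The point of the sketch is the ordering: classification happens inside the save call, so a file never reaches the repository without tags, and later retrieval can work from tags alone.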

First claim

Opening claim text (preview).

What is claimed is:

1. A computer readable medium having instructions encoded thereon that, when executed by one or more processors, cause a digital content classification process to be carried out, the process comprising: defining an initial association that exists between (a) one or more tags that represent a digital content segment and (b) a subject matter categorization; providing, to a user, the digital content segment and the subject matter categorization; receiving, from the user, a modified subject matter categorization; and modifying the initial association to produce a modified association that exists between (a) the one or more tags and (b) the modified subject matter categorization.

2. The computer readable medium of claim 1, wherein the digital content classification process further comprises analyzing the digital content segment to determine the one or more tags.

3. The computer readable medium of claim 1, wherein the subject matter categorization comprises a first plurality of subject matter categories, and the modified subject matter categorization comprises a second plurality of subject matter categories, wherein the second plurality of subject matter categories includes every subject matter category in the first plurality, as well as a new subject matter category.

4. The computer readable medium of claim 1, wherein the subject matter categorization comprises a first plurality of subject matter categories, and the modified subject matter categorization comprises a second plurality of subject matter categories, wherein every subject matter category in the second plurality is also included in the first plurality, but wherein the second plurality has fewer subject matter categories than the first plurality.

5. The computer readable medium of claim 1, wherein the subject matter categorization comprises a first plurality of subject matter categories, and the modified subject matter categorization comprises a second plurality of subject matter categories, wherein none of the subject matter categories in the first plurality is included in the second plurality.

6. The computer readable medium of claim 1, wherein the digital content classification process further comprises prompting the user to evaluate the subject matter categorization before receiving the modified subject matter categorization.

7. The computer readable medium of claim 1, wherein the digital content classification process further comprises providing the user with a first option to accept the subject matter categorization, and a second option to modify the subject matter categorization.

8. The computer readable medium of claim 1, wherein the digital content segment is textual content contained within an electronic file.

9. The computer readable medium of claim 1, wherein the digital content classification process further comprises receiving the digital content segment in response to a command, received from the user, to save a file containing the digital content segment in a content repository, wherein the subject matter categorization is provided to the user after the command is received.

10. The computer readable medium of claim 1, wherein the digital content classification process further comprises: receiving the digital content segment in response to a command, received from the user, to save a file containing the digital content segment in a content repository, wherein the subject matter categorization is provided to the user after the command is received; and saving the file in the content repository, wherein the file is associated with metadata that is also saved in the content repository, and wherein the metadata includes the one or more tags and the modified subject matter categorization.

11. The computer readable medium of claim 1, wherein the digital content classification process further comprises: receiving, from a second user, a second digital content segment that is also represented by the one or more tags; and providing, to the second user, the modified subject matter categorization.

12. An electronic file classification methodology, comprising: analyzing digital content contained in an electronic file to determine an initial classification for the electronic file, wherein the initial classification comprises a first set of one or more subject matter categories with which the electronic file is associated; presenting the initial classification to a user having access to the electronic file; receiving, from the user, a modified classification for the electronic file, wherein the modified classification comprises a second set of one or more subject matter categories with which the electronic file is associated; and assigning the modified classification to the electronic file.

13. The electronic file classification methodology of claim 12, wherein the digital content is analyzed in response to receiving a command to store the electronic file in a content repository.

14. The electronic file classification methodology of claim 12, wherein: the digital content is analyzed in response to receiving a command to store the electronic file in a content repository; and the electronic file classification methodology further comprises (a) generating a modified filename that includes an identifier associated with the modified classification, and (b) storing the electronic file in the content repository using the modified filename.

15. The electronic file classification methodology of claim 12, further comprising defining a lookup table data structure that includes the subject matter categories comprising the modified classification, wherein said subject matter categories are indexed to a filename associated with the electronic file.

16. The electronic file classification methodology of claim 12, further comprising prompting the user to provide feedback on the initial classification, wherein the modified classification is received after prompting the user to provide the feedback.

17. The electronic file classification methodology of claim 12, wherein: the digital content is analyzed in response to receiving a command to store the electronic file in a content repository; and the file classification methodology further comprises storing the electronic file in the content repository, the stored electronic file being associated with metadata that defines the modified classification.

18. A digital content classification system that includes a memory device and a processor that is operatively coupled to the memory device, wherein the processor is configured to execute instructions stored in the memory device, that, when executed, cause the processor to carry out a digital content classification process, the process comprising: defining an initial association between (a) one or more tags that represent a digital content segment and (b) a subject matter categorization; providing, to a user, the digital content segment and the subject matter categorization; receiving, from the user, a modified subject matter categorization; and modifying the initial association to produce a modified association that exists between (a) the one or more tags and (b) the modified subject matter categorization.

19. The digital content classification system of claim 18, wherein the process further comprises: providing, to a second user, the digital content segment and the modified subject matter categorization; receiving, from the second user, a further modified subject matter categorization; and fur
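
As a rough illustration of the feedback loop in claims 1, 3, 11, 14, and 15, the sketch below keeps an association between a set of tags and a set of subject matter categories, lets a user replace the categories, and then stores the file under a filename that embeds the modified classification while recording it in a lookup table. The class name `Classifier`, its method names, the underscore-and-hyphen filename scheme, and the dictionary used as the repository are all hypothetical choices made for this example; the claims do not specify any of them.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Classifier:
    # Association: frozenset of tags -> set of subject matter categories (claim 1).
    associations: dict = field(default_factory=dict)
    # Lookup table: filename -> categories (in the spirit of claim 15).
    lookup: dict = field(default_factory=dict)

    def define_association(self, tags, categories):
        """Define the initial association between tags and a categorization."""
        self.associations[frozenset(tags)] = set(categories)

    def categorize(self, tags):
        """Return a copy of the categorization currently associated with these tags."""
        return set(self.associations.get(frozenset(tags), set()))

    def apply_feedback(self, tags, modified_categories):
        """Replace the association with the user's modified categorization."""
        self.associations[frozenset(tags)] = set(modified_categories)

    def store(self, repository, filename, content, tags):
        """Save under a filename that embeds the categories (hypothetical naming scheme)."""
        categories = sorted(self.categorize(tags))
        stem, ext = os.path.splitext(filename)
        new_name = f"{stem}_{'-'.join(categories)}{ext}"
        repository[new_name] = {"content": content, "tags": sorted(tags), "categories": categories}
        self.lookup[new_name] = categories
        return new_name


clf = Classifier()
tags = ["budget", "travel"]
clf.define_association(tags, ["Finance"])

# The user is shown the initial categorization and adds a category (claim 3).
initial = clf.categorize(tags)
clf.apply_feedback(tags, initial | {"Expenses"})

repo = {}
name = clf.store(repo, "q3.txt", "Budget review ...", tags)
print(name)              # q3_Expenses-Finance.txt
print(clf.lookup[name])  # ['Expenses', 'Finance']
```

Because the association is keyed by tags rather than by file, a later document from a second user that yields the same tags would receive the modified categorization, which mirrors the behavior recited in claim 11.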

Assignees

Adobe Systems Inc

Classifications

Primary CPC: G06F17/30598

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016171084A1 cover?
It covers techniques for efficiently and automatically classifying textual documents or files. In some embodiments, the classification process is integrated into the storage function, so that when the user initiates a save for a given file, the file is processed through a classifier prior to (or contemporaneously with) completing the save.
Who is the assignee on this patent?
Adobe Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/30598. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 16, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).