Software classification

US2020183678A1 · US · A1

Patent metadata
Publication number: US-2020183678-A1
Application number: US-201616341120-A
Country: US
Kind code: A1
Filing date: Dec 8, 2016
Priority date: Dec 8, 2016
Publication date: Jun 11, 2020
Grant date: (none shown)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Examples described relate to classifying software. In an example, a determination may be made whether a software installation directory includes a file to run software. In response to a determination that the software installation directory includes a file to run the software, information may be extracted from text data associated with the software installation directory using named entity recognition technique. The files in the software installation directory may be classified as one of a primary file, a secondary file, or a tertiary file based on respective relevance scores of the files, wherein the respective relevance scores may represent respective relevance of the files against the extracted information.
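The relevance-scoring idea in the abstract can be illustrated with a minimal sketch. It assumes the named entity recognition step has already produced a publisher, name, and version, and it substitutes simple token overlap for whatever scoring the patent actually uses. The stop-word list, function names, and sample data are all hypothetical and do not come from the publication.

```python
import re

# Hypothetical stop-word list; the publication does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "inc", "corp", "ltd"}


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def remove_stop_words(tokens):
    """Drop tokens that appear in the (assumed) stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def file_entry_to_query(path):
    """Convert a file entry such as 'bin/acme_editor.exe' into query tokens."""
    return set(tokenize(path))


def relevance_score(path, extracted_info):
    """Return the fraction of query tokens found in the extracted information.

    Token overlap is a stand-in chosen for this sketch, not the
    scoring method described in the patent.
    """
    info_tokens = set()
    for value in extracted_info.values():
        info_tokens.update(remove_stop_words(tokenize(value)))
    query = file_entry_to_query(path)
    if not query:
        return 0.0
    return len(query & info_tokens) / len(query)


# Hypothetical extracted entities and file entries.
info = {"publisher": "Acme Corp", "name": "Acme Editor", "version": "2.1"}
for entry in ["bin/acme_editor.exe", "lib/zlib.dll"]:
    print(entry, relevance_score(entry, info))
```

Files that share tokens with the publisher or product name score higher, which is the property a later classification step would rely on.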

First claim

Opening claim text (preview).

1. A method comprising: by a processor determining whether a software installation directory includes a file to run software; in response to the determination that the software installation directory includes the file to run the software, extracting information from text data associated with the software installation directory using a named entity recognition technique; and classifying files in the software installation directory as one of a primary file, a secondary file, or a tertiary file based on respective relevance scores of the files, wherein the respective relevance scores of the files represent respective relevance of the files against the extracted information.

2. The method of claim 1, wherein the information includes a publisher of software in the software installation directory, a name of the software, and a version of the software.

3. The method of claim 1, further comprising determining the respective relevance scores of the files.

4. The method of claim 3, wherein determining the respective relevance scores of the files includes: converting respective file entries of the files into respective text queries, wherein the respective file entries represent the files in a file system; and querying the respective text queries against the extracted information.

5. The method of claim 3, further comprising removing stop words from the extracted information prior to determining the respective relevance scores of the files.

6. A system comprising: a determination engine to determine whether a software installation directory includes a file to run software; an extraction engine to, in response to the determination that the software installation directory includes the file to run the software, extract information from text data associated with the software installation directory using a named entity recognition technique, wherein the information includes a publisher of software in the software installation directory, a name of the software, and a version of the software; and a relevance engine to determine respective relevance scores of files in the software installation directory, wherein the respective relevance scores of the files represent respective relevance of the files against the extracted information; and a classification engine to classify the files in the software installation directory as one of a main file, an associated file, or a third party file based on the respective relevance scores of the files.

7. The system of claim 6, wherein the extraction engine to identify the publisher of the software using DBpedia ontology.

8. The system of claim 6, wherein the main file includes the file to run the software.

9. The system of claim 6, wherein the associated file includes an ancillary file from the publisher of the software.

10. The system of claim 6, wherein the third party file includes a file from another publisher other than the publisher of the software.

11. A non-transitory machine-readable storage medium comprising instructions, the instructions executable by a processor to: determine whether a software installation directory includes a file to run software; in response to the determination that the software installation directory includes the file to run the software, extract named entities from text data associated with the software installation directory using named entity recognition technique, wherein the named entities include a publisher of software in the software installation directory, a name of the software, and a version of the software; classify files in the software installation directory as one of a main file, an associated file, or a third-party file based on respective relevance scores of the files, wherein the respective relevance scores of the files represent respective relevance of the files against the named entities; and display the classified files.

12. The storage medium of claim 11, wherein the instructions to determine include instructions to use a gradient boosted decision trees model to determine whether the software installation directory includes a file to run the software.

13. The storage medium of claim 11, wherein the main file includes a file with a highest relevance score above a pre-defined first threshold.

14. The storage medium of claim 11, wherein the third party file includes a file with a relevance score less than a pre-defined second threshold.

15. The storage medium of claim 11, wherein the associated file includes a file with a relevance score less than the pre-defined first threshold and more than the pre-defined second threshold.
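The two-threshold scheme in claims 13 to 15 can be sketched as follows. This is one possible reading, not the patented implementation: the threshold values, the function name, and the sample scores are hypothetical, and because the claims do not say how a score exactly equal to a threshold is handled, this sketch assigns such boundary scores to the associated category.

```python
def classify_by_thresholds(scores, first_threshold=0.6, second_threshold=0.2):
    """Label each file as 'main', 'associated', or 'third-party'.

    scores maps a file entry to a relevance score. The default
    thresholds are illustrative assumptions; the claims only say
    they are pre-defined.
    """
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")

    labels = {}
    for path, score in scores.items():
        if score > first_threshold:
            # Above the first threshold (claim 13).
            labels[path] = "main"
        elif score < second_threshold:
            # Below the second threshold (claim 14).
            labels[path] = "third-party"
        else:
            # Between the thresholds (claim 15); exact ties land here
            # by assumption.
            labels[path] = "associated"
    return labels


# Hypothetical relevance scores for three file entries.
example_scores = {
    "bin/acme_editor.exe": 0.9,
    "bin/acme_updater.exe": 0.4,
    "lib/zlib.dll": 0.1,
}
for path, label in classify_by_thresholds(example_scores).items():
    print(path, "->", label)
```

Claim 13 speaks of "a file with a highest relevance score", which could also be read as selecting only the single top-scoring file. The sketch labels every file above the first threshold as main to keep the logic short.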

Assignees

Inventors

Classifications

  • G06F8/71 (Primary)

    CPC title: Version control (security arrangements therefor G06F21/57); Configuration management

  • G06F8/70 (Primary)

    CPC title: Software maintenance or management

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Tan Xiang, Wang Jin, Song Qiuxia, and 3 more
What technology area does this patent fall under?
Primary CPC classification G06F8/71. Mapped technology areas include Physics.
When was this patent published?
Published June 11, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.