US2020183678A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020183678-A1 |
| Application number | US-201616341120-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 8, 2016 |
| Priority date | Dec 8, 2016 |
| Publication date | Jun 11, 2020 |
| Grant date | — |
Official abstract text for this publication.
Examples described relate to classifying software. In an example, a determination may be made whether a software installation directory includes a file to run software. In response to a determination that the software installation directory includes a file to run the software, information may be extracted from text data associated with the software installation directory using named entity recognition technique. The files in the software installation directory may be classified as one of a primary file, a secondary file, or a tertiary file based on respective relevance scores of the files, wherein the respective relevance scores may represent respective relevance of the files against the extracted information.
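The abstract does not say how a relevance score is computed. As a minimal sketch only, assume a file name is turned into a set of text tokens and scored by the fraction of those tokens that also appear in the extracted information (publisher, name, version). The token-overlap measure, the `STOP_WORDS` list, the function names, and the sample publisher and file names are all hypothetical and are not taken from the patent.

```python
import re

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "inc", "corp"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance_score(file_entry, extracted_info):
    """Return the fraction of the file entry's tokens found in the extracted information.

    Assumption: token overlap stands in for whatever text-query
    scoring a real implementation would use.
    """
    query = set(tokenize(file_entry)) - STOP_WORDS
    info = set()
    for value in extracted_info.values():
        info.update(tokenize(value))
    info -= STOP_WORDS
    if not query:
        return 0.0
    return len(query & info) / len(query)


# Hypothetical extracted entities for an imaginary product.
info = {"publisher": "Acme Software", "name": "PhotoEdit", "version": "2.1"}

for entry in ("photoedit.exe", "acme_updater.exe", "zlib.dll"):
    print(entry, relevance_score(entry, info))
```

In this sketch a file named after the product scores higher than a generic library file, which is the kind of separation a classifier based on relevance scores would rely on.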
Opening claim text (preview).
1. A method comprising: by a processor determining whether a software installation directory includes a file to run software; in response to the determination that the software installation directory includes the file to run the software, extracting information from text data associated with the software installation directory using a named entity recognition technique; and classifying files in the software installation directory as one of a primary file, a secondary file, or a tertiary file based on respective relevance scores of the files, wherein the respective relevance scores of the files represent respective relevance of the files against the extracted information.
2. The method of claim 1, wherein the information includes a publisher of software in the software installation directory, a name of the software, and a version of the software.
3. The method of claim 1, further comprising determining the respective relevance scores of the files.
4. The method of claim 3, wherein determining the respective relevance scores of the files includes: converting respective file entries of the files into respective text queries, wherein the respective file entries represent the files in a file system; and querying the respective text queries against the extracted information.
5. The method of claim 3, further comprising removing stop words from the extracted information prior to determining the respective relevance scores of the files.
6. A system comprising: a determination engine to determine whether a software installation directory includes a file to run software; an extraction engine to, in response to the determination that the software installation directory includes the file to run the software, extract information from text data associated with the software installation directory using a named entity recognition technique, wherein the information includes a publisher of software in the software installation directory, a name of the software, and a version of the software; and a relevance engine to determine respective relevance scores of files in the software installation directory, wherein the respective relevance scores of the files represent respective relevance of the files against the extracted information; and a classification engine to classify the files in the software installation directory as one of a main file, an associated file, or a third party file based on the respective relevance scores of the files.
7. The system of claim 6, wherein the extraction engine to identify the publisher of the software using DBpedia ontology.
8. The system of claim 6, wherein the main file includes the file to run the software.
9. The system of claim 6, wherein the associated file includes an ancillary file from the publisher of the software.
10. The system of claim 6, wherein the third party file includes a file from another publisher other than the publisher of the software.
11. A non-transitory machine-readable storage medium comprising instructions, the instructions executable by a processor to: determine whether a software installation directory includes a file to run software; in response to the determination that the software installation directory includes the file to run the software, extract named entities from text data associated with the software installation directory using named entity recognition technique, wherein the named entities include a publisher of software in the software installation directory, a name of the software, and a version of the software; classify files in the software installation directory as one of a main file, an associated file, or a third-party file based on respective relevance scores of the files, wherein the respective relevance scores of the files represent respective relevance of the files against the named entities; and display the classified files.
12. The storage medium of claim 11, wherein the instructions to determine include instructions to use a gradient boosted decision trees model to determine whether the software installation directory includes a file to run the software.
13. The storage medium of claim 11, wherein the main file includes a file with a highest relevance score above a pre-defined first threshold.
14. The storage medium of claim 11, wherein the third party file includes a file with a relevance score less than a pre-defined second threshold.
15. The storage medium of claim 11, wherein the associated file includes a file with a relevance score less than the pre-defined first threshold and more than the pre-defined second threshold.
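The two-threshold scheme in claims 13 to 15 can be illustrated with a short sketch. The threshold values, the function name `classify_files`, and the sample scores below are hypothetical. The claims do not say how to treat a score exactly equal to a threshold, or a file above the first threshold that is not the highest scorer; this sketch assumes both cases fall into the associated group.

```python
def classify_files(scores, first_threshold=0.6, second_threshold=0.2):
    """Map each file name to 'main', 'associated', or 'third-party'.

    scores: dict of file name -> relevance score.
    The default threshold values are illustrative, not from the source.
    """
    labels = {}
    # The single highest-scoring file is the only candidate for 'main'.
    top = max(scores, key=scores.get) if scores else None
    for name, score in scores.items():
        if name == top and score > first_threshold:
            labels[name] = "main"
        elif score < second_threshold:
            labels[name] = "third-party"
        else:
            # Assumption: boundary values and non-top files above the
            # first threshold are treated as associated files.
            labels[name] = "associated"
    return labels


# Hypothetical relevance scores for three files in one installation directory.
scores = {"photoedit.exe": 0.9, "acme_updater.exe": 0.4, "zlib.dll": 0.05}
labels = classify_files(scores)
for name, label in labels.items():
    print(name, label)
```

If no file exceeds the first threshold, this sketch labels no file as main, which a caller could use as a signal that the directory needs manual review.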