Latent semantic indexing in application classification

US10229190B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10229190-B2
Application number: US-201414272366-A
Country: US
Kind code: B2
Filing date: May 7, 2014
Priority date: Dec 31, 2013
Publication date: Mar 12, 2019
Grant date: Mar 12, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An application classifier classifies applications using latent semantic indexing (LSI) vectors of the applications. The application classifier uses a machine-learned model generated based on pairs of LSI vectors of positive and negative training sets of applications, where the positive training set includes applications within a desired category and the negative training set includes applications outside of the desired category. For a given application, the application classifier determines whether the application belongs to the desired category based on similarity of an LSI vector of the application and LSI vectors of positive and negative exemplar applications, as determined by the machine-learned model. If the LSI vector of the application is similar to an LSI vector of at least one positive exemplar application and not similar to an LSI vector of any of the negative exemplar applications, the application is determined to belong to the desired category.
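
The decision rule in the last sentence of the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes the LSI vectors have already been computed, and it swaps in plain cosine similarity with a fixed threshold where the patent uses a machine-learned model. The names `cosine_similarity`, `belongs_to_category`, and `threshold` are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Stand-in for the machine-learned similarity model (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def belongs_to_category(
    app_vec: Vector,
    positive_exemplars: Sequence[Vector],
    negative_exemplars: Sequence[Vector],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the patent
    similarity: Callable[[Vector, Vector], float] = cosine_similarity,
) -> bool:
    """Similar to at least one positive exemplar and to no negative exemplar."""
    near_positive = any(similarity(app_vec, p) >= threshold for p in positive_exemplars)
    near_negative = any(similarity(app_vec, n) >= threshold for n in negative_exemplars)
    return near_positive and not near_negative


# Toy 3-dimensional "LSI vectors" invented for illustration only.
positives = [[0.9, 0.1, 0.0]]
negatives = [[0.0, 0.2, 0.9]]
print(belongs_to_category([0.8, 0.2, 0.1], positives, negatives))
print(belongs_to_category([0.1, 0.2, 0.9], positives, negatives))
```

Passing the similarity function as a parameter keeps the rule itself separate from the model, so a trained pairwise model could be dropped in without changing the decision logic.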

First claim

Opening claim text (preview).

What is claimed is:

1. A method of classifying applications, the method comprising:
    receiving application data associated with each of a plurality of applications;
    computing, by at least one processor, a latent semantic indexing (LSI) vector for each of the plurality of applications based on the application data associated with the plurality of applications;
    determining, by the at least one processor, a training subset of the plurality of applications, wherein the training subset includes at least one first application that belongs to a category and at least one second application that does not belong to the category, and wherein the training subset comprises a positive training set that includes the at least one first application that belongs to the category and a negative training set that includes the at least one second application that does not belong to the category;
    generating, by the at least one processor, a computer model based on the LSI vectors for applications in the training subset, wherein generating the computer model includes determining pairs of applications in the training subset, wherein each of the pairs of applications includes two applications, and wherein each of the two applications of each of the pairs of applications is selected from one of the positive training set and the negative training set, determining a training score for each of the pairs of applications, wherein each of the determined training scores is assigned to a respective pair of the pairs of applications in the training subset, and generating the computer model based on the LSI vectors for the two applications of each of the pairs of applications and the training score assigned to each of the pair of applications;
    determining, by the at least one processor, an exemplar subset of the plurality of applications, wherein the exemplar subset includes at least one third application that belongs to the category and at least one fourth application that does not belong to the category;
    determining, by the at least one processor, a set of applications of the plurality of applications belonging to the category based on the computer model, LSI vector for set of applications, and LSI vector for one or more applications in the exemplar subset, wherein the determining the set of applications of the plurality of applications belonging to the category comprises: identifying at least one first radius each corresponding to the at least one third application based on LSI vector for the at least one third application, identifying at least one second radius each corresponding to the at least one fourth application based on LSI vector for the at least one fourth application, and determining the set of applications which have LSI vector being inside a first area defined by the at least one first radius and being outside a second area defined by the at least one second radius,
    receiving a search query from electronic device; and
    transmitting a search result including at least one of the set of applications belonging to the category associated with the search query to the electronic device for displaying.

2. The method of claim 1, wherein the training score indicates whether both of the two applications of the pairs of applications belong to the category.

3. The method of claim 2, wherein the training score further indicates a degree of confidence associated with one or both of the two applications of the pairs of applications belonging to the category or not belonging to the category.

4. The method of claim 1, wherein the determining the set of applications of the plurality of applications belonging to the category comprises:
    inputting the LSI vector for the set of applications of the plurality of applications and the LSI vector for the one or more application in the exemplar subset into the computer model, and computing a similarity score for the set of applications based on the computer model, the LSI vector for the set of applications and the LSI vector for the one or more application; and
    identifying the set of applications of the plurality of applications belonging to the category based on the similarity score.

5. The method of claim 4, wherein determining the set of applications of the plurality of applications belonging to the category based on the similarity score for the application and the one or more applications in the exemplar subset comprises:
    determining that the set of applications of the plurality of applications is similar to the at least one third application in the exemplar subset that belongs to the category based on one or more of the similarity score;
    determining that the set of applications of the plurality of applications is not similar to the at least one fourth application in the exemplar subset that does not belong to the category based on one or more of the similarity score; and
    determining that the set of applications belongs to the category.

6. The method of claim 5, wherein the set of applications is determined to be similar to the at least one third application or not to be similar to the at least one fourth application based on a comparison between one or more of the similarity score and a threshold value.

7. The method of claim 1, further comprising:
    providing an indication of the determination that the set of applications of the plurality of applications belongs to the category to a user;
    receiving a user input that indicates that the determination is incorrect; and
    in response to receiving the user input, determining that a specific application belongs to the category, wherein whether the specific application belongs to the category or does not belong to the category is specified by the user input.

8. The method of claim 7, further comprising:
    determining whether another application of the plurality of applications belongs to the category based on the computer model, the LSI vector for the other application, and the LSI vector for one or more applications in the exemplar subset, including the specific application.

9. A system for classifying applications, the system comprising:
    at least one processor;
    wherein the at least one processor is configured to: receive application data associated with each of a plurality of applications, compute a latent semantic indexing (LSI) vector for the each of the plurality of applications based on the application data associated with the plurality of applications, determine a training subset of the plurality of applications, wherein the training subset includes at least one application that belongs to a category and at least one application that does not belong to the category, and generate a computer model based on the LSI vectors for the applications in the training subset, and wherein the training subset comprises a positive training set that includes the at least one first application that belongs to the category and a negative training set that includes the at least one second application that does not belong to the category, generate a computer model based on the LSI vectors for the plurality of applications in the training subset to (i) determine pairs of applications in the training subset, wherein each of the pairs of applications includes two applications, and wherein each of the two applications of each of the pairs of applications is selected from one of the positive training set and the negative training set, (ii) determine a training score for each of the pairs of applications, wherein each of the determined training scores is assigned to a respective pair of the pairs of applications in the training subset, and (iii) generate the computer model based on the LSI vectors for the two applications of each of the pairs applications and the training score assig
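
Two steps of claim 1 lend themselves to a short sketch: building scored training pairs, and the radius test around exemplar applications. The code below is an illustration under stated assumptions, not the claimed method itself. It assumes every pair of training applications is used, that the training score is 1.0 only when both applications are in the category (one reading of claim 2), that distance is Euclidean, and that an application must fall within the radius of at least one positive exemplar and outside the radius of every negative exemplar. The function names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def build_training_pairs(
    positive: Sequence[Vector], negative: Sequence[Vector]
) -> List[Tuple[Vector, Vector, float]]:
    """Pair up training apps and attach a training score to each pair.

    Assumption: score is 1.0 when both apps are in the category, else 0.0.
    """
    labeled = [(vec, True) for vec in positive] + [(vec, False) for vec in negative]
    pairs = []
    for (vec_a, in_a), (vec_b, in_b) in combinations(labeled, 2):
        score = 1.0 if (in_a and in_b) else 0.0
        pairs.append((vec_a, vec_b, score))
    return pairs


def in_category_by_radius(
    app_vec: Vector,
    positive_exemplars: Sequence[Tuple[Vector, float]],
    negative_exemplars: Sequence[Tuple[Vector, float]],
) -> bool:
    """Each exemplar is (LSI vector, radius); Euclidean distance is assumed."""
    inside_first = any(math.dist(app_vec, vec) <= r for vec, r in positive_exemplars)
    inside_second = any(math.dist(app_vec, vec) <= r for vec, r in negative_exemplars)
    return inside_first and not inside_second


# Toy 2-dimensional vectors and radii invented for illustration only.
pairs = build_training_pairs([(0.0, 0.0), (0.1, 0.0)], [(2.0, 2.0)])
print(len(pairs), [score for _, _, score in pairs])

pos_exemplars = [((0.0, 0.0), 1.0)]
neg_exemplars = [((1.5, 0.0), 1.0)]
print(in_category_by_radius((0.2, 0.0), pos_exemplars, neg_exemplars))
print(in_category_by_radius((0.8, 0.0), pos_exemplars, neg_exemplars))
```

The scored pairs would be the training input to whatever model learns the similarity function; the claims do not name a model type, so none is sketched here.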

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

Primary CPC classification: G06F16/3346

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10229190B2 cover?
An application classifier classifies applications using latent semantic indexing (LSI) vectors of the applications. The application classifier uses a machine-learned model generated based on pairs of LSI vectors of positive and negative training sets of applications, where the positive training set includes applications within a desired category and the negative training set includes applicatio…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/3346. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 12, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).