Identifying similar applications
US-2015169759-A1 · Jun 18, 2015 · US
US10229190B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10229190-B2 |
| Application number | US-201414272366-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 7, 2014 |
| Priority date | Dec 31, 2013 |
| Publication date | Mar 12, 2019 |
| Grant date | Mar 12, 2019 |
Abstract
An application classifier classifies applications using latent semantic indexing (LSI) vectors of the applications. The application classifier uses a machine-learned model generated based on pairs of LSI vectors of positive and negative training sets of applications, where the positive training set includes applications within a desired category and the negative training set includes applications outside of the desired category. For a given application, the application classifier determines whether the application belongs to the desired category based on similarity of an LSI vector of the application and LSI vectors of positive and negative exemplar applications, as determined by the machine-learned model. If the LSI vector of the application is similar to an LSI vector of at least one positive exemplar application and not similar to an LSI vector of any of the negative exemplar applications, the application is determined to belong to the desired category.
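The abstract's decision rule can be illustrated with a small sketch. It makes two assumptions. First, LSI vectors come from a truncated SVD of a term-by-application count matrix, which is the standard LSI construction; the abstract does not say how the application data is turned into terms. Second, plain cosine similarity with a fixed threshold stands in for the patent's machine-learned similarity model. The function names, the threshold of 0.8, and the toy vocabulary are all hypothetical.

```python
import numpy as np


def lsi_vectors(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Project each application (one column of a term-by-application matrix)
    into a k-dimensional latent semantic space using truncated SVD."""
    # term_doc = U @ diag(s) @ Vt; keep only the top-k singular triplets.
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # One common convention: application vectors are rows of V_k scaled by S_k.
    return (np.diag(s[:k]) @ vt[:k]).T  # shape (n_apps, k)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def belongs_to_category(vec, positives, negatives, threshold=0.8) -> bool:
    """Abstract's rule: similar to at least one positive exemplar and to no
    negative exemplar. Cosine similarity replaces the learned model (assumption)."""
    similar_to_pos = any(cosine(vec, p) >= threshold for p in positives)
    similar_to_neg = any(cosine(vec, n) >= threshold for n in negatives)
    return similar_to_pos and not similar_to_neg


# Hypothetical toy data: rows are terms, columns are five applications.
term_doc = np.array([
    [2, 1, 0, 0, 1],  # "game"
    [1, 0, 0, 0, 2],  # "puzzle"
    [1, 2, 0, 0, 0],  # "score"
    [0, 0, 2, 1, 0],  # "bank"
    [0, 0, 1, 0, 0],  # "loan"
    [0, 0, 0, 2, 0],  # "budget"
], dtype=float)

vecs = lsi_vectors(term_doc, k=2)
# Apps 0 and 1 are positive exemplars, app 2 is a negative exemplar.
print(belongs_to_category(vecs[4], [vecs[0], vecs[1]], [vecs[2]]))  # True
```

In this toy matrix the two latent dimensions separate the "game" vocabulary from the "finance" vocabulary. That is why the unlabeled application 4 lands next to the positive exemplars.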
Claims (excerpt)
What is claimed is:

1. A method of classifying applications, the method comprising: receiving application data associated with each of a plurality of applications; computing, by at least one processor, a latent semantic indexing (LSI) vector for each of the plurality of applications based on the application data associated with the plurality of applications; determining, by the at least one processor, a training subset of the plurality of applications, wherein the training subset includes at least one first application that belongs to a category and at least one second application that does not belong to the category, and wherein the training subset comprises a positive training set that includes the at least one first application that belongs to the category and a negative training set that includes the at least one second application that does not belong to the category; generating, by the at least one processor, a computer model based on the LSI vectors for applications in the training subset, wherein generating the computer model includes determining pairs of applications in the training subset, wherein each of the pairs of applications includes two applications, and wherein each of the two applications of each of the pairs of applications is selected from one of the positive training set and the negative training set, determining a training score for each of the pairs of applications, wherein each of the determined training scores is assigned to a respective pair of the pairs of applications in the training subset, and generating the computer model based on the LSI vectors for the two applications of each of the pairs of applications and the training score assigned to each of the pair of applications; determining, by the at least one processor, an exemplar subset of the plurality of applications, wherein the exemplar subset includes at least one third application that belongs to the category and at least one fourth application that does not belong to the category; determining, by the at least one processor, a set of applications of the plurality of applications belonging to the category based on the computer model, LSI vector for set of applications, and LSI vector for one or more applications in the exemplar subset, wherein the determining the set of applications of the plurality of applications belonging to the category comprises: identifying at least one first radius each corresponding to the at least one third application based on LSI vector for the at least one third application, identifying at least one second radius each corresponding to the at least one fourth application based on LSI vector for the at least one fourth application, and determining the set of applications which have LSI vector being inside a first area defined by the at least one first radius and being outside a second area defined by the at least one second radius, receiving a search query from electronic device; and transmitting a search result including at least one of the set of applications belonging to the category associated with the search query to the electronic device for displaying.

2. The method of claim 1, wherein the training score indicates whether both of the two applications of the pairs of applications belong to the category.

3. The method of claim 2, wherein the training score further indicates a degree of confidence associated with one or both of the two applications of the pairs of applications belonging to the category or not belonging to the category.

4. The method of claim 1, wherein the determining the set of applications of the plurality of applications belonging to the category comprises: inputting the LSI vector for the set of applications of the plurality of applications and the LSI vector for the one or more application in the exemplar subset into the computer model, and computing a similarity score for the set of applications based on the computer model, the LSI vector for the set of applications and the LSI vector for the one or more application; and identifying the set of applications of the plurality of applications belonging to the category based on the similarity score.

5. The method of claim 4, wherein determining the set of applications of the plurality of applications belonging to the category based on the similarity score for the application and the one or more applications in the exemplar subset comprises: determining that the set of applications of the plurality of applications is similar to the at least one third application in the exemplar subset that belongs to the category based on one or more of the similarity score; determining that the set of applications of the plurality of applications is not similar to the at least one fourth application in the exemplar subset that does not belong to the category based on one or more of the similarity score; and determining that the set of applications belongs to the category.

6. The method of claim 5, wherein the set of applications is determined to be similar to the at least one third application or not to be similar to the at least one fourth application based on a comparison between one or more of the similarity score and a threshold value.

7. The method of claim 1, further comprising: providing an indication of the determination that the set of applications of the plurality of applications belongs to the category to a user; receiving a user input that indicates that the determination is incorrect; and in response to receiving the user input, determining that a specific application belongs to the category, wherein whether the specific application belongs to the category or does not belong to the category is specified by the user input.

8. The method of claim 7, further comprising: determining whether another application of the plurality of applications belongs to the category based on the computer model, the LSI vector for the other application, and the LSI vector for one or more applications in the exemplar subset, including the specific application.

9. A system for classifying applications, the system comprising: at least one processor; wherein the at least one processor is configured to: receive application data associated with each of a plurality of applications, compute a latent semantic indexing (LSI) vector for the each of the plurality of applications based on the application data associated with the plurality of applications, determine a training subset of the plurality of applications, wherein the training subset includes at least one application that belongs to a category and at least one application that does not belong to the category, and generate a computer model based on the LSI vectors for the applications in the training subset, and wherein the training subset comprises a positive training set that includes the at least one first application that belongs to the category and a negative training set that includes the at least one second application that does not belong to the category, generate a computer model based on the LSI vectors for the plurality of applications in the training subset to (i) determine pairs of applications in the training subset, wherein each of the pairs of applications includes two applications, and wherein each of the two applications of each of the pairs of applications is selected from one of the positive training set and the negative training set, (ii) determine a training score for each of the pairs of applications, wherein each of the determined training scores is assigned to a respective pair of the pairs of applications in the training subset, and (iii) generate the computer model based on the LSI vectors for the two applications of each of the pairs applications and the training score assig
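Claim 1 splits the work into two steps. The training step builds pairs from the positive and negative training sets and gives each pair a training score. The classification step keeps applications whose LSI vectors fall inside radii around in-category exemplars and outside radii around out-of-category exemplars. The sketch below illustrates both steps under stated assumptions. The pair feature map (elementwise product), the logistic-regression model family, and the single fixed Euclidean radius per side are hypothetical choices not taken from the patent. The claim excerpt also does not say how the learned model determines the radii, so the two pieces are shown as separate helpers.

```python
import itertools

import numpy as np


def make_training_pairs(positives, negatives):
    """Pair every two applications drawn from the positive and negative
    training sets. The training score is 1.0 only when both belong to the
    category (one reading of claim 2; claim 3's confidence weighting is omitted)."""
    labelled = [(v, True) for v in positives] + [(v, False) for v in negatives]
    pairs, scores = [], []
    for (a, a_in), (b, b_in) in itertools.combinations(labelled, 2):
        pairs.append((a, b))
        scores.append(1.0 if a_in and b_in else 0.0)
    return pairs, np.array(scores)


def pair_features(a, b):
    # Hypothetical symmetric feature map: elementwise product plus a bias term.
    return np.append(a * b, 1.0)


def train_pair_model(pairs, scores, lr=0.5, epochs=2000):
    """Logistic regression fitted by batch gradient descent on log loss.
    The model family is an assumption; the claims do not name one."""
    x = np.array([pair_features(a, b) for a, b in pairs])
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w -= lr * x.T @ (p - scores) / len(scores)
    return w


def pair_score(w, a, b):
    """Similarity score for two LSI vectors under the trained pair model."""
    return float(1.0 / (1.0 + np.exp(-(pair_features(a, b) @ w))))


def in_category(vec, pos_exemplars, neg_exemplars, pos_radius, neg_radius):
    """Region test: inside at least one ball around a positive exemplar and
    outside every ball around a negative exemplar (fixed radii are hypothetical)."""
    inside = any(np.linalg.norm(vec - p) <= pos_radius for p in pos_exemplars)
    outside = all(np.linalg.norm(vec - n) > neg_radius for n in neg_exemplars)
    return inside and outside


# Hypothetical 2-D LSI vectors for two in-category apps and one outside it.
positives = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
negatives = [np.array([0.0, 1.0])]
pairs, scores = make_training_pairs(positives, negatives)
w = train_pair_model(pairs, scores)
print(scores)  # [1. 0. 0.]
print(in_category(np.array([0.95, 0.05]), positives, negatives, 0.2, 0.5))  # True
```

A symmetric pair feature was chosen so that the score for (a, b) equals the score for (b, a). This matches the claim's treatment of a pair as two applications with no stated order.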
using probabilistic model · CPC title
Physics · mapped topic