Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification G06Q30/0631. Mapped technology areas include Physics.

When was this patent published?

Publication date Feb 20, 2018 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Multilingual content based recommendation system

US9898773B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9898773-B2
Application number	US-201414546719-A
Country	US
Kind code	B2
Filing date	Nov 18, 2014
Priority date	Nov 18, 2014
Publication date	Feb 20, 2018
Grant date	Feb 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Example apparatus and methods access multiple sources of information concerning features for applications, clean the data from the multiple sources, extract features from the cleaned data, selectively weight the sources, data or extracted features and produce a feature vector. The feature vector may then be used in a single language feature space or in a multi-language feature space. Feature spaces may then be used to find similarities between applications to facilitate recommending applications. In one embodiment, different feature spaces may be connected using a graph where nodes represent items and edges represent similarity relationships between items based on related feature spaces. Traversing the graph may allow similarities to be found that might not otherwise be possible. For example, while there may be no direct English to Hebrew similarity relationship, there may be English to French and French to Hebrew relationships that can be followed in the graph.
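The graph traversal described in the abstract can be sketched as follows. This is a minimal illustration, not the patent's method: the abstract and claims only say that the score for two unconnected nodes is a function of other pairwise scores, so the choice of multiplying edge similarities along a path and keeping the best path is an assumption made here. The node names (`app_en`, `app_fr`, `app_he`), the similarity values, and the helper names `undirected` and `indirect_similarity` are all hypothetical.

```python
import heapq


def undirected(edges):
    """Build an adjacency map from (node_a, node_b, similarity) triples."""
    graph = {}
    for a, b, sim in edges:
        graph.setdefault(a, {})[b] = sim
        graph.setdefault(b, {})[a] = sim
    return graph


def indirect_similarity(graph, source, target):
    """Best path score between two nodes, or 0.0 if no path exists.

    Assumption for this sketch: edge similarities lie in (0, 1] and a path
    score is the product of its edge similarities. With those bounds a
    Dijkstra-style search that always expands the highest score is valid.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # negate scores so heapq acts as a max-heap
    while heap:
        neg_score, node = heapq.heappop(heap)
        score = -neg_score
        if node == target:
            return score
        if score < best.get(node, 0.0):
            continue  # stale heap entry, a better score was already found
        for neighbor, sim in graph.get(node, {}).items():
            candidate = score * sim
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return 0.0


# No direct English-to-Hebrew edge, but both connect through French.
graph = undirected([
    ("app_en", "app_fr", 0.8),
    ("app_fr", "app_he", 0.5),
])
score = indirect_similarity(graph, "app_en", "app_he")
print(score)
```

Other combining functions (for example, the minimum edge similarity on the path, or an average over several paths) would fit the same claim language equally well; the product is used here only because it decays naturally with path length.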

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: accessing electronic data from multiple different sources, where the electronic data represents unstructured text in two or more different languages, where the unstructured text represents titles or descriptions for applications, books, movies, or video games available in an electronic marketplace; extracting one or more features from the data; producing a plurality of feature vectors from the one or more features where a feature vector comprises one or more elements; producing from the plurality of feature vectors, one or more feature spaces from which a content-based similarity recommendation can be made; producing a graph with nodes and edges where the nodes represent computer applications, books, movies, or video games available in an electronic marketplace and the edges represent content-based similarity relationships, where the content-based similarity relationships are based, at least in part, on the one or more feature spaces, wherein the graph is represented using a latent vector space model that provides a distance function between nodes in the graph; and producing a content-based similarity score for two nodes that are not directly connected by an edge in the graph, where the content-based similarity score is a function of two or more other content-based similarity scores computed for other pairs of nodes in the graph.

2. The method of claim 1, where cleaning the electronic data comprises performing one or more processes independently or interdependently, where the one or more processes change the capitalization of a word in the electronic data, separate concatenated words in the electronic data, merge synonyms in the electronic data, remove a non-Unicode symbol in the electronic data, remove a banned word in the electronic data, remove an uninformative word in the electronic data, or translate a word in the electronic data.

3. The method of claim 1, comprising producing weights for members of the multiple different data sources, for different types of data, for different types of features, or for different features, and where producing the feature space from the plurality of features depends, at least in part, on the weights.

4. The method of claim 3, the one or more elements being single words, types of nouns, n-grams, short phrases, symbols, acronyms, or abbreviations.

5. The method of claim 4, where the one or more feature spaces are associated with multiple languages.

6. The method of claim 4, cleaning the electronic data to make clean data from which feature vectors can be produced.

7. A computer-readable storage medium storing computer-executable instructions that when executed by a computer control the computer to perform a method, the method comprising: accessing electronic data from multiple different sources, where the electronic data represents unstructured text in two or more different languages, where the unstructured text represents titles or descriptions for applications available in an electronic marketplace; cleaning the electronic data to make clean data from which feature vectors can be produced, where cleaning the electronic data comprises performing one or more processes independently or interdependently, where the one or more processes change the capitalization of a word in the electronic data, separate concatenated words in the electronic data, merge synonyms in the electronic data, remove a non-Unicode symbol in the electronic data, remove a banned word in the electronic data, remove an uninformative word in the electronic data, or translate a word in the electronic data; extracting one or more features from the cleaned data using tokenization, n-gram extraction, proper noun detection, lemmatization, or stemming; producing weights for members of the multiple different data sources, for different types of data, for different types of features, or for different features; producing scores for the one or more features based, at least in part, on the weights and on term frequency-inverse document frequency (TF-IDF) or latent semantic indexing; producing a plurality of feature vectors from the one or more features based, at least in part, on the scores, where a feature vector comprises one or more elements, the one or more elements being single words, types of nouns, n-grams, short phrases, symbols, acronyms, or abbreviations; producing from the plurality of feature vectors, one or more feature spaces from which a content-based similarity recommendation can be made, where the one or more feature spaces are associated with single languages, where the one or more feature spaces depend, at least in part, on the weights; producing a graph whose nodes represents the applications available in the electronic marketplace and whose edges represent content-based similarity relationships, where the content-based similarity relationships are based, at least in part, on the one or more feature spaces, and producing a content-based similarity score for two nodes that are not directly connected by an edge in the graph, where the content-based similarity score is a function of two or more other content-based similarity scores computed for other pairs of nodes in the graph.

8. The media of claim 7, where the graph is represented using a latent vector space model, where the graph or latent vector space model provides a distance function between items and items or between items and users.

9. The media of claim 7, further comprising generating a feature vector of the plurality of feature vectors by: cleaning electronic data from one or more sources to produce cleaned data; extracting one or more features from the cleaned data; determining weights for the one or more sources, for the cleaned data, or for the one or more features, and producing a feature vector from the one or more features.

10. The media of claim 9, further comprising scoring a feature $f$ in the feature vector for an item $a$ according to: $a[f] = L[f] \cdot \sum_{T \mid f \in T} W_T[f] \cdot \mathrm{SCORE}_T[f]$ where: $\mathrm{SCORE}_T[f]$ is the score of the feature $f$ in treatment $T$, $W_T[f]$ are treatment weights, and $L[f]$ are preferred words weight for the feature $f$.

11. A method for content recommendation, comprising: accessing electronic data from multiple different sources, where the electronic data represents unstructured text in two or more different languages, where the unstructured text represents titles or descriptions for items available in an electronic marketplace; extracting one or more features from the data; producing a plurality of feature vectors from the one or more features where a feature vector comprises one or more elements; producing from the plurality of feature vectors, one or more feature spaces from which a content-based similarity recommendation can be made; producing a graph with nodes and edges where the nodes represent items available in an electronic marketplace and the edges represent content-based similarity relationships, where the content-based similarity relationships are based, at least in part, on the one or more feature spaces, wherein the graph is represented using a latent vector space model that provides a distance function between nodes in the graph; and producing a content-based similarity score for two nodes that are not directly connected by an edge in the graph, where the content-based similarity score is a function of two or more other content-based similarity scores computed for other pairs of nodes in the graph.

12. The method of claim 11, where cleaning the electronic data comprises performing one or more processes independently or interdependently, where the one or more processes change the capi
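The scoring rule in claim 10 (the score of feature f for item a is L[f] times the sum, over every treatment T that contains f, of W_T[f] times SCORE_T[f]) can be illustrated with a short sketch. The claim preview does not define "treatment", so this example assumes a treatment is a named processing pass that yields per-feature scores such as TF-IDF values. The treatment names, the sample numbers, the default weight of 1.0 for missing entries, and the function names `score_feature` and `feature_vector` are all hypothetical.

```python
def score_feature(feature, treatment_scores, treatment_weights, preferred_weights):
    """Compute a[f] = L[f] * sum over treatments T containing f of W_T[f] * SCORE_T[f].

    treatment_scores:  {treatment: {feature: SCORE_T[f]}}
    treatment_weights: {treatment: {feature: W_T[f]}}
    preferred_weights: {feature: L[f]}
    Missing weights default to 1.0 (an assumption of this sketch).
    """
    total = 0.0
    for treatment, scores in treatment_scores.items():
        if feature not in scores:
            continue  # the sum only covers treatments that contain f
        weight = treatment_weights.get(treatment, {}).get(feature, 1.0)
        total += weight * scores[feature]
    return preferred_weights.get(feature, 1.0) * total


def feature_vector(treatment_scores, treatment_weights, preferred_weights):
    """Score every feature seen in any treatment and return {feature: a[f]}."""
    features = set()
    for scores in treatment_scores.values():
        features.update(scores)
    return {
        f: score_feature(f, treatment_scores, treatment_weights, preferred_weights)
        for f in sorted(features)
    }


# Hypothetical per-treatment scores for one item (for example, TF-IDF values).
treatment_scores = {
    "title_tfidf": {"chess": 0.5, "game": 0.25},
    "description_tfidf": {"chess": 0.25},
}
# Title features count double in this made-up configuration.
treatment_weights = {
    "title_tfidf": {"chess": 2.0, "game": 2.0},
    "description_tfidf": {"chess": 1.0},
}
# "chess" is treated as a preferred word and boosted.
preferred_weights = {"chess": 2.0}

vector = feature_vector(treatment_scores, treatment_weights, preferred_weights)
print(vector)
```

Keeping the three inputs as separate mappings mirrors the three symbols in the claim, so each weight can be tuned without touching the underlying scores.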

Assignees

Microsoft Technology Licensing, LLC

Classifications

G06Q30/0631 · Primary
Recommending goods or services · CPC title
G06F16/3344
using natural language analysis · CPC title
G06F16/313
Selection or weighting of terms for indexing · CPC title
G06F40/30
Semantic analysis · CPC title
G06F17/30616
Physics · mapped topic

Patent family

Related publications grouped by family.

Patent family ID: 55962107
