Semi-supervised identity aggregation of profiles using statistical methods
US-9654594-B2 · May 16, 2017 · US
US10296546B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10296546-B2 |
| Application number | US-201414551365-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 24, 2014 |
| Priority date | Nov 24, 2014 |
| Publication date | May 21, 2019 |
| Grant date | May 21, 2019 |
Official abstract text for this publication.
Techniques are disclosed for identifying the same online user across different communication networks, and further creating a unified profile for that user. The unified profile is an aggregation of publicly available user profile attributes across the different networks. In an embodiment, the techniques are implemented as a computer implemented methodology, including: (1) feature space analysis to identify relevant user features that allows for clusterization of the given target network(s), (2) unsupervised candidate selection to identify one or more candidate user profiles from each target network and that are likely belonging to a target user or so-called queried user, and (3) supervised user identification to identify a likely matching user profile for that target user from each target network. A unified user profile can then be built from data taken from all matched user profiles, and effectively allows a marketer to better understand that user and hence execute more informed targeting.
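The final aggregation step described in the abstract can be illustrated with a small sketch. This is not the patent's implementation: the function name `build_unified_profile`, the network names, and the attribute names are all hypothetical, and the sketch assumes the matching profile on each network has already been identified by the earlier stages.

```python
from collections import defaultdict


def build_unified_profile(matched_profiles):
    """Merge per-network attribute dicts into one unified profile.

    `matched_profiles` maps a network name to the public attributes of the
    profile matched on that network (hypothetical input shape). Each attribute
    in the result keeps every distinct value it was given, together with the
    list of networks that value came from.
    """
    unified = defaultdict(dict)
    for network, attributes in matched_profiles.items():
        for name, value in attributes.items():
            unified[name].setdefault(value, []).append(network)
    return {name: dict(values) for name, values in unified.items()}


# Illustrative data only; these profiles are invented for the example.
matched = {
    "network_a": {"username": "jdoe", "city": "Boston"},
    "network_b": {"username": "j.doe", "city": "Boston", "employer": "Acme"},
}
profile = build_unified_profile(matched)
```

Keeping the source networks next to each value is a design choice of this sketch: it lets a consumer of the unified profile see where networks agree (here, `city`) and where they differ (here, `username`).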
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, comprising: identifying a first network A that comprises a first collection of user profiles, each of which is characterized by a value that is associated with a characteristic feature; identifying a target network B that comprises a second collection of user profiles, each of which is characterized by a value that is associated with the characteristic feature; setting a distance measure threshold AB that depends on a distribution of values for the characteristic feature in both the first network A and the target network B; defining a plurality of user profile clusters in the target network B based on the characteristic feature and the distance measure threshold AB, wherein each of the user profile clusters comprises at least one of the user profiles in the target network B, and wherein each of the user profile clusters has a cluster centroid that is representative of the user profiles comprising the cluster; after defining the plurality of user profile clusters, receiving a target user query that identifies a query user profile on the first network A, the query user profile having a query value associated with the characteristic feature; identifying a particular user profile cluster in the target network B having a minimum distance from its respective cluster centroid to the query user profile, wherein the particular user profile cluster associated with the minimum distance establishes a set of candidate user profiles on the target network B; ranking, via a supervised classifier, the candidate user profiles included in the set, so as to identify a first match candidate user profile for the target user query; and generating a unified user profile that includes (a) a first feature that is included in the query user profile on network A, and (b) a second feature that is included in the first match candidate user profile on network B.
2. The computer-implemented method of claim 1, further comprising: setting a distance measure threshold AC that depends on a distribution of values for the characteristic feature in both the first network A and a target network C that comprises a third collection of user profiles; defining a second plurality of user profile clusters in the target network C based on the characteristic feature and the distance measure threshold AC, wherein each of the user profile clusters in the second plurality comprises at least one of the user profiles in the target network C, and wherein each of the user profile clusters in the second plurality has a cluster centroid that is representative of the user profiles comprising the cluster; identifying a second particular user profile cluster in the target network C having a minimum distance from its respective cluster centroid to the query user profile, wherein the second particular user profile cluster associated with the minimum distance establishes an additional set of candidate user profiles on the target network C; ranking, via the supervised classifier, the candidate user profiles included in the additional set, so as to identify a second match candidate user profile for the target user query; and supplementing the unified user profile to further include a third feature from the second match candidate user profile.
3. The computer-implemented method of claim 1, further comprising, before receiving the target user query: setting a distance measure threshold AC that depends on a distribution of values for the characteristic feature in both the first network A and a target network C that comprises a third collection of user profiles; and defining a second plurality of user profile clusters in the target network C based on the characteristic feature and the distance measure threshold AC, wherein each of the user profile clusters in the second plurality comprises at least one of the user profiles in the target network C, and wherein each of the user profile clusters in the second plurality has a cluster centroid that is representative of the user profiles comprising the cluster.
4. The computer-implemented method of claim 1, wherein the distance measure threshold AB is set and recalculated before receiving the target user query.
5. The computer-implemented method of claim 1, further comprising, before receiving the target user query, setting a distance measure threshold AC that depends on a distribution of values for the characteristic feature in both the first network A and a target network C.
6. The computer-implemented method of claim 1, further comprising identifying, from amongst the plurality of user profile clusters, a sibling cluster having a cluster centroid that is within a pre-established mathematical distance from the query user profile.
7. The computer-implemented method of claim 6 wherein the cluster centroid of the sibling cluster is mathematically closer to the query user profile than any of the other user profile clusters, other than the particular user profile cluster.
8. The computer-implemented method of claim 1, wherein the cluster centroid of the particular user profile cluster is defined as an average frequency distribution of characters of a particular feature in the particular user profile cluster.
9. The computer-implemented method of claim 1, wherein the minimum distance is a square of a Euclidean distance between an average frequency distribution of characters of the characteristic feature of the query user profile on the first network A and an average frequency distribution of characters of a corresponding feature in the particular user profile cluster.
10. The computer-implemented method of claim 1 wherein ranking includes assigning match probabilities to each of the candidate user profiles, and identifying a best match based on the assigned match probabilities.
11. A non-transient computer program product having instructions encoded thereon that when executed by one or more processors causes a process to be carried out, the process comprising: identifying a first network A that comprises a first collection of user profiles, each of which is characterized by a value that is associated with a characteristic feature; identifying a target network B that comprises a second collection of user profiles, each of which is characterized by a value that is associated with the characteristic feature; setting a distance measure threshold AB that depends on a distribution of values for the characteristic feature in both the first network A and the target network B; defining a plurality of user profile clusters in the target network B based on the characteristic feature and the distance measure threshold AB, wherein each of the user profile clusters comprises at least one of the user profiles in the target network B, and wherein each of the user profile clusters has a cluster centroid that is representative of the user profiles comprising the cluster; after defining the plurality of user profile clusters, receiving a target user query that identifies a query user profile on the first network A, the query user profile having a query value associated with the characteristic feature; identifying a particular user profile cluster in the target network B having a minimum distance from its respective cluster centroid to the query user profile, wherein the particular user profile cluster associated with the minimum distance establishes a set of candidate user profiles on the target network B; ranking, via a supervised classifier, the candidate user profiles included in the set, so as to identify a first match candidate user profile for the target user query; and generating a unified user profile that includes (a) a first feature that is included in the
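Claims 8 and 9 describe the cluster centroid as an average frequency distribution of characters and the minimum distance as a squared Euclidean distance. The following is a minimal sketch of that candidate-selection step under stated assumptions: the character set `ALPHABET`, the helper names, and the example usernames are hypothetical, and cluster membership is taken as given rather than derived from the distance measure threshold AB.

```python
from collections import Counter
import string

# Assumed character set; the claims do not specify which characters are counted.
ALPHABET = string.ascii_lowercase + string.digits


def char_frequencies(value):
    """Relative frequency of each ALPHABET character in one feature value."""
    chars = [c for c in value.lower() if c in ALPHABET]
    counts = Counter(chars)
    total = len(chars) or 1  # avoid division by zero for empty values
    return [counts[c] / total for c in ALPHABET]


def centroid(values):
    """Average character frequency distribution over a cluster's values."""
    vectors = [char_frequencies(v) for v in values]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_euclidean(u, v):
    """Square of the Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_cluster(query_value, clusters):
    """Name of the cluster whose centroid is closest to the query value."""
    query_vector = char_frequencies(query_value)
    return min(
        clusters,
        key=lambda name: squared_euclidean(query_vector, centroid(clusters[name])),
    )


# Invented usernames standing in for profiles on target network B.
clusters = {
    "cluster_1": ["jdoe", "j.doe", "johndoe"],
    "cluster_2": ["xyzzy99", "zzyx", "xzy_9"],
}
candidates = clusters[nearest_cluster("john_doe", clusters)]
```

The members of the selected cluster form the candidate set that a supervised classifier would then rank. That ranking stage is not shown here, since the claims preview does not specify the classifier or its features.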
Business processes related to social networking or social networking services · CPC title
Query execution (filtering based on additional data G06F16/335) · CPC title
Search customisation based on social or collaborative filtering · CPC title
Search customisation based on user profiles and personalisation · CPC title
based on user profile or attribute · CPC title