Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification: G06Q50/265 (Personal security, identity or safety). Mapped technology areas include Physics.

When was this patent published?

Publication date: Feb 5, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Robust matching for identity screening

US10200397B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10200397-B2
Application number	US-201615195923-A
Country	US
Kind code	B2
Filing date	Jun 28, 2016
Priority date	Jun 28, 2016
Publication date	Feb 5, 2019
Grant date	Feb 5, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The techniques described herein are directed to robust matching for identity screening. In some examples, the techniques can include generating a similarity score for received identity information compared to a reference record. In some examples, the techniques can utilize a region associated with the received identity information to weight tokens composing the identity information or of the reference record to adjust the similarity score. Moreover, the techniques can include multiple tokenizers, transformation providers, and token weight providers and the techniques can be configured to select between the multiple tokenizers, transformation providers, and token weight providers based at least in part on a region to improve the accuracy of the similarity score. The techniques can determine whether or not to flag or otherwise affirm an identity of an individual or entity associated with the entity information based at least in part on the similarity score.
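The flow in the abstract (select a tokenizer by region, weight tokens by region, compute a similarity score, then decide whether to flag) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the function names, the Spanish particle list, the weight table, the use of Python's `difflib.SequenceMatcher` as the token comparer, and the 0.85 threshold.

```python
from difflib import SequenceMatcher


def tokenize_default(text):
    """Fallback tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def tokenize_es(text):
    """Hypothetical tokenizer for region "ES": drops common name particles."""
    particles = {"de", "la", "del"}  # illustrative list, not from the patent
    return [t for t in text.lower().split() if t not in particles]


# Hypothetical region-to-tokenizer registry.
TOKENIZERS = {"ES": tokenize_es}

# Hypothetical per-region token weights: very common tokens count for less.
TOKEN_WEIGHTS = {"ES": {"garcia": 0.3, "maria": 0.3}}


def similarity_score(query, reference, region):
    """Weighted average of each query token's best match in the reference."""
    tokenize = TOKENIZERS.get(region, tokenize_default)
    weights = TOKEN_WEIGHTS.get(region, {})
    q_tokens, r_tokens = tokenize(query), tokenize(reference)
    if not q_tokens or not r_tokens:
        return 0.0
    weighted = total = 0.0
    for q in q_tokens:
        w = weights.get(q, 1.0)
        best = max(SequenceMatcher(None, q, r).ratio() for r in r_tokens)
        weighted += w * best
        total += w
    return weighted / total


def should_flag(query, reference, region, threshold=0.85):
    """Flag the identity when the score reaches an assumed threshold."""
    return similarity_score(query, reference, region) >= threshold
```

With these assumed tables, `should_flag("Maria de la Cruz", "maria cruz", "ES")` is true because the region-specific tokenizer removes the particles, while the same pair under an unregistered region falls back to whitespace tokenization and scores lower.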

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: one or more hardware processors; multiple tokenizers configured to tokenize, by the one or more hardware processors and based at least in part on an identified region, a query string to receive query tokens; a transformation provider configured to: generate, by the one or more hardware processors, one or more transformation rules based at least in part on the identified region and the query tokens; rank, by the one or more hardware processors, the one or more transformation rules based at least in part on the identified region and the query tokens; select, by the one or more hardware processors, one or more of the transformation rules, wherein a number of the one or more transformation rules selected is based on the rank of the one or more transformation rules and a tolerated risk value; and transform, by the one or more hardware processors and based at least in part on the selected one or more transformation rules, the query tokens to obtain a query record including transformed tokens that account for regional token variations, the regional token variations being associated with the identified region; and a token weight provider configured to assign, by the one or more hardware processors, token weights for the transformed tokens of the query record based at least in part on the identified region; a comparer configured to determine, by the one or more hardware processors and based at least in part on the token weights, similarity values between the transformed tokens of the query record and a reference record.

2. The system as recited in claim 1, further comprising: a signature generator configured to: generate, by the one or more hardware processors, query signatures corresponding to the query record; index, by the one or more hardware processors, the query signatures; generate, by the one or more hardware processors, reference signatures corresponding to a reference record; and index, by the one or more hardware processors, the reference signatures; and the comparer configured to: identify, by the one or more hardware processors, candidate records from the index of signatures corresponding to the reference record, the candidate records corresponding to a one or more of the reference signatures that have signatures within an edit distance of the query signatures; and determine, by the one or more hardware processors, similarity values between the transformed tokens of the query record and the candidate records.

3. The system as recited in claim 1, wherein a tokenizer of the multiple tokenizers is configured to: tokenize query strings in a first language, tokenize query strings in a second language, or tokenize query strings in the first language for a different dialect or cultural context.

4. The system as recited in claim 1, wherein the one or more transformation rules are configured to transliterate, by the one or more hardware processors and based at least in part on the identified region or an identified language of the query string, the query string to obtain transliterated query tokens.

5. The system as recited in claim 1, wherein the transformation rules include a list of synonym pairs corresponding to one or more of regions or languages; and wherein transforming the query tokens into the query record comprises populating, by the one or more hardware processors, the query record with synonyms corresponding to a subset of the query tokens based at least in part on one or more of the identified region or an identified language.

6. The system as recited in claim 5, wherein the synonym pairs further include synonym costs associated with the synonym pairs; and wherein the transformation provider is further configured to calculate, by the one or more hardware processors, transformation costs based on one or more of synonym costs of synonym pairs used to transform the query tokens or edit distances between the query tokens and corresponding transformed query tokens, the edit distances including a quantification of how dissimilar.

7. The system as recited in claim 6, the comparer further configured to: based at least in part on the transformation costs, modify, by the one or more hardware processors, the similarity values to obtain modified similarity values; and average, by the one or more hardware processors, the modified similarity values to receive a similarity score.

8. The system as recited in claim 1, the token weight provider assigning the token weights based at least in part on calculating an inverse document frequency of tokens included in the reference record.

9. A method comprising: identifying a first region associated with a query string and a second region associated with the query string; based at least in part on the identified first region and a language associated with the identified first region: selecting, from multiple tokenizers, a first tokenizer associated with the identified first region or the language of the identified first region; and generating a first set of transformation rules based at least in part on the identified first region and the language of the identified first region; based at least in part on the identified second region and a language associated with the identified second region: selecting, from the multiple tokenizers, a second tokenizer associated with the identified second region or the language of the identified second region; and generating a second set of transformation rules based on the identified second region and the language of the identified second region; tokenizing the query string by the first tokenizer to receive query tokens and the query string by the second tokenizer to receive additional query tokens; transforming, by one or more transformation rules of the first set of transformation rules, the query tokens to form a query record; transforming, by one or more transformation rules of the second set of transformation rules, the additional query tokens to form an addendum to the query record; weighting tokens of the query record and weighting tokens of a reference record based at least in part on frequencies with which the tokens of the query record appear in the reference record and frequencies with which the tokens of the reference record appear in the reference record, respectively; and weighting tokens of the addendum based at least in part on frequencies with which the tokens of the addendum appear in the reference record.

10. The method as recited in claim 9, wherein a language associated with the first region and a language associated with the second region are different.

11. The method as recited in claim 9, further comprising: retrieving a first reference record corresponding to the identified first region; and retrieving a second reference record corresponding to the identified second region.

12. The method as recited in claim 11, wherein the transformation rules include rules to transliterate or translate one or more of the query tokens, the additional query tokens, the first reference record, or the second reference record.

13. The method as recited in claim 9, further comprising: determining a similarity score between a token of the query record and a token of the reference record by fuzzy matching tokens of the query record with tokens of the reference record, the fuzzy matching based at least in part on weights of the tokens of one or more of the query record or the reference record and transformation costs associated with the one or more transformation rules.

14. The method as recited in claim 9, wherein a weight of a token of the query record
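Claims 5 through 8 combine three ideas: synonym pairs that carry a cost, transformation costs that lower the similarity values, and token weights derived from inverse document frequency. The sketch below shows one way these pieces could fit together. It is an illustration under stated assumptions, not the claimed implementation: the synonym table, the cost values, the `+ 1.0` smoothing in the IDF formula, the choice to subtract the cost from the similarity, and the use of a weighted average are all hypothetical.

```python
import math


def idf_weights(reference_records):
    """Inverse document frequency per token over tokenized reference records."""
    n = len(reference_records)
    doc_freq = {}
    for record in reference_records:
        for token in set(record):
            doc_freq[token] = doc_freq.get(token, 0) + 1
    # The "+ 1.0" keeps ubiquitous tokens at a nonzero weight (an assumption).
    return {t: math.log(n / c) + 1.0 for t, c in doc_freq.items()}


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Hypothetical synonym pairs with synonym costs (values are illustrative).
SYNONYMS = {"bill": ("william", 0.1), "bob": ("robert", 0.1)}


def transform(tokens):
    """Replace tokens with synonyms and return (token, transformation cost) pairs."""
    return [SYNONYMS.get(t, (t, 0.0)) for t in tokens]


def token_similarity(a, b):
    """Edit distance normalized to [0, 1]; 1.0 means identical tokens."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def similarity_score(query_tokens, reference_tokens, weights):
    """Weighted average of cost-adjusted best-match similarities."""
    if not query_tokens or not reference_tokens:
        return 0.0
    weighted = total = 0.0
    for token, cost in transform(query_tokens):
        best = max(token_similarity(token, r) for r in reference_tokens)
        modified = max(0.0, best - cost)  # penalize matches that needed a rewrite
        w = weights.get(token, 1.0)
        weighted += w * modified
        total += w
    return weighted / total
```

In this sketch, a query of `["bill", "smith"]` against a reference of `["william", "smith"]` scores below 1.0 even though both tokens match after transformation, because the synonym rewrite carries a cost and the rarer token `william` receives the larger IDF weight.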

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

G06F16/3334
Selection or weighting of terms from queries, including natural language queries · CPC title
G06F16/332
Query formulation · CPC title
G06F21/30
Authentication, i.e. establishing the identity or authorisation of security principals · CPC title
G06Q50/265 (Primary)
Personal security, identity or safety · CPC title
G06F40/284
Lexical analysis, e.g. tokenisation or collocates · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 59285350

Related patents

Dynamic transaction coordination

Leveraging corporal data for data parsing and predicting

Qualification of match results

Methods and systems for classifying data using a hierarchical taxonomy

Method for disambiguated features in unstructured text

Integrated fuzzy joins in database management systems