Identifying entity synonyms

US9600566B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9600566-B2
Application number  US-77996410-A
Country             US
Kind code           B2
Filing date         May 14, 2010
Priority date       May 14, 2010
Publication date    Mar 21, 2017
Grant date          Mar 21, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments for identifying an entity synonym of an entity are described. A query log is stored in a database located on at least one computing device. A candidate generation module can select a candidate query in the query log that shares a click on a URL with the entity. A correlated tag module can generate a set of phrase-tag pairs for the entity and the candidate query and measure a mutual information value for each phrase-tag pair. A candidate filtering module can determine a click similarity value between the candidate query and the entity based on a set of URLs selected in the search engine results and a tag similarity value based on the mutual information values. A candidate query is selected as an entity synonym if the click similarity value and the tag similarity value are greater than predetermined thresholds respectively.
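The pipeline in the abstract (candidate generation from shared clicks, a click similarity value, and a two-threshold decision) can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the log layout, the function names, the sample data, the weighted-Jaccard formula used for click similarity, and the default threshold values are all assumptions introduced here. The abstract does not specify any of them.

```python
from collections import defaultdict

# Hypothetical query log rows: (query, clicked_url, click_count).
SAMPLE_LOG = [
    ("canon eos 350d", "u1", 10),
    ("canon eos 350d", "u2", 5),
    ("digital rebel xt", "u1", 8),
    ("digital rebel xt", "u2", 4),
    ("digital rebel xt", "u3", 1),
    ("nikon d40", "u4", 7),
]


def build_click_index(log):
    """Aggregate the log into {query: {url: total_clicks}}."""
    clicks = defaultdict(dict)
    for query, url, count in log:
        clicks[query][url] = clicks[query].get(url, 0) + count
    return clicks


def candidate_queries(entity, clicks):
    """Return queries that share at least one clicked URL with the entity."""
    entity_urls = set(clicks.get(entity, {}))
    return [
        query
        for query, urls in clicks.items()
        if query != entity and entity_urls & set(urls)
    ]


def click_similarity(entity, candidate, clicks):
    """Assumed measure: weighted Jaccard overlap of per-URL click counts."""
    first = clicks.get(entity, {})
    second = clicks.get(candidate, {})
    urls = set(first) | set(second)
    shared = sum(min(first.get(u, 0), second.get(u, 0)) for u in urls)
    total = sum(max(first.get(u, 0), second.get(u, 0)) for u in urls)
    return shared / total if total else 0.0


def is_synonym(click_sim, tag_sim, click_threshold=0.5, tag_threshold=0.5):
    """Accept only if both values exceed their (assumed) thresholds."""
    return click_sim > click_threshold and tag_sim > tag_threshold
```

With the sample log, `candidate_queries("canon eos 350d", build_click_index(SAMPLE_LOG))` keeps only "digital rebel xt", because "nikon d40" shares no clicked URL with the entity. The tag similarity value passed to `is_synonym` would come from the correlated tag module, which this sketch does not cover.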

First claim

Opening claim text (preview).

The invention claimed is:

1. A method performed by a computing device, the method comprising: selecting, from a query log stored in a database, a first query associated with a first set of Uniform Resource Locators (URLs) returned by a search engine responsive to the first query and a second query associated with a second set of URLs returned by the search engine responsive to the second query; measuring a click similarity between the first query and the second query, the click similarity being measured based on similarity of first click behavior of users with respect to the first set of URLs to second click behavior of the users with respect to the second set of URLs; measuring first mutual information values between first phrases and first tags of the first query based on the first click behavior, wherein the first phrases and the first tags comprise different tokens of the first query; measuring second mutual information values between second phrases and second tags of the second query based on the second click behavior, wherein the second phrases and the second tags comprise different tokens of the second query; measuring a tag similarity between the first query and the second query using the first mutual information values and the second mutual information values; determining whether the first query and the second query are synonyms using both the click similarity and the tag similarity; and when the first query and the second query are determined to be synonyms, storing the second query as a query synonym of the first query.

2. The method of claim 1, further comprising setting predetermined threshold values for the click similarity and the tag similarity, and using the predetermined threshold values to determine whether the first query and the second query are synonyms.

3. The method of claim 1, wherein the tag similarity is measured by identifying: shared URLs that appear in both the first set and the second set; first other URLs that were clicked by the users that include the first phrases of the first query but do not include the first tags of the first query, and second other URLs that were clicked by the users that include the second phrases of the second query but do not include the second tags of the second query.

4. The method of claim 1, wherein measuring the first mutual information values includes determining numbers of clicks by the users on: shared URLs that are returned by the search engine responsive to the entire first query, other first URLs that are returned by the search engine responsive to other first queries that include the first phrases without the first tags, and further first URLs that are returned by the search engine responsive to further first queries that include the first tags without the first phrases.

5. The method of claim 4, wherein measuring the second mutual information values includes determining numbers of clicks by the users on: the shared URLs when returned by the search engine responsive to the entire second query, other second URLs that are returned by the search engine responsive to other second queries that include the second phrases without the second tags, and further second URLs that are returned by the search engine responsive to further second queries that include the second tags without the second phrases.

6. The method of claim 1, wherein measuring the click similarity includes determining numbers of clicks by the users on individual URLs in: the first set of URLs, the second set of URLs, and shared URLs that appear in both the first set and the second set.

7. The method of claim 6, wherein the first tags include first prefixes and first suffixes that appear in the first query with the first phrases, and the second tags include second prefixes and second suffixes that appear in the second query with the second phrases.

8. A system comprising: a machine-readable memory device or storage device storing instructions; and a hardware processor configured to execute the instructions, wherein the instructions, when executed by the hardware processor, cause the hardware processor to: select, from a query log stored in a database, a first query associated with a first set of Uniform Resource Locators (URLs) returned by a search engine responsive to the first query and a second query associated with a second set of URLs returned by the search engine responsive to the second query; measure a click similarity between the first query and the second query, the click similarity being measured based on similarity of first click behavior of users with respect to the first set of URLs to second click behavior of the users with respect to the second set of URLs; measure first mutual information values between first phrases and first tags of the first query based on the first click behavior, wherein the first phrases and the first tags comprise different tokens of the first query; measure second mutual information values between second phrases and second tags of the second query based on the second click behavior, wherein the second phrases and the second tags comprise different tokens of the second query; measure a tag similarity between the first query and the second query using the first mutual information values and the second mutual information values; determine whether the first query and the second query are synonyms using both the click similarity and the tag similarity; and when the first query and the second query are determined to be synonyms, store the second query as a query synonym of the first query.

9. The system of claim 8, wherein the first tags and the second tags comprise context words that appear with the first phrases in the first query and that also appear with the second phrases in the second query.

10. The system of claim 8, wherein the tag similarity reflects similarity between the first mutual information values and the second mutual information values.

11. The system of claim 10, wherein the instructions, when executed by the hardware processor, cause the hardware processor to: apply a cosine similarity function to the first mutual information values and the second mutual information values to determine the tag similarity.

12. A hardware memory device or hardware storage device comprising instructions which, when executed by a hardware processor, cause the hardware processor to perform acts comprising: selecting, from a query log stored in a database, a first query associated with a first set of Uniform Resource Locators (URLs) returned by a search engine responsive to the first query and a second query associated with a second set of URLs returned by the search engine responsive to the second query; measuring a click similarity between the first query and the second query, the click similarity being measured based on similarity of first click behavior of users with respect to the first set of URLs to second click behavior of the users with respect to the second set of URLs; measuring first mutual information values between first phrases and first tags of the first query based on the first click behavior, wherein the first phrases and the first tags comprise different tokens of the first query; measuring second mutual information values between second phrases and second tags of the second query based on the second click behavior, wherein the second phrases and the second tags comprise different tokens of the second query; measuring a tag similarity between the first query and the second query using the first mutual information values and the second mutual information values; determining whether the first query and the second query are synonyms using both the c
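
Claims 10 and 11 describe tag similarity as a cosine similarity over the two queries' mutual information values. The sketch below illustrates that step under stated assumptions: it uses pointwise mutual information computed from click counts as a stand-in for the "mutual information value" (the claim text shown here gives no formula), it represents each query as a hypothetical mapping from tag to value, and all function and parameter names are invented for illustration.

```python
import math


def pointwise_mutual_information(joint_clicks, phrase_clicks, tag_clicks, total_clicks):
    """Assumed stand-in for the claimed mutual information value.

    joint_clicks:  clicks attributed to the phrase and tag together
    phrase_clicks: clicks attributed to the phrase overall
    tag_clicks:    clicks attributed to the tag overall
    total_clicks:  all clicks considered
    Returns 0.0 when any count is non-positive, to avoid log(0).
    """
    if min(joint_clicks, phrase_clicks, tag_clicks, total_clicks) <= 0:
        return 0.0
    p_joint = joint_clicks / total_clicks
    p_phrase = phrase_clicks / total_clicks
    p_tag = tag_clicks / total_clicks
    return math.log(p_joint / (p_phrase * p_tag))


def cosine_similarity(first, second):
    """Cosine similarity between two sparse {tag: value} vectors."""
    dot = sum(value * second.get(tag, 0.0) for tag, value in first.items())
    norm_first = math.sqrt(sum(v * v for v in first.values()))
    norm_second = math.sqrt(sum(v * v for v in second.values()))
    if norm_first == 0 or norm_second == 0:
        return 0.0
    return dot / (norm_first * norm_second)
```

Two queries whose phrases attract the same tags in the same proportions score close to 1, while queries with no tags in common score 0. A threshold on this value, as in claim 2, would then be combined with the click similarity threshold.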

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9600566B2 cover?
Embodiments for identifying an entity synonym of an entity are described. A query log is stored in a database located on at least one computing device. A candidate generation module can select a candidate query in the query log that shares a click on a URL with the entity. A correlated tag module can generate a set of phrase-tag pairs for the entity and the candidate query and measure a mutual …
Who is the assignee on this patent?
Ganti Venkatesh, Xin Dong, Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/3344. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 21, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).