Determining corresponding terms written in different formats

US9734197B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9734197-B2
Application number  US-201414199249-A
Country             US
Kind code           B2
Filing date         Mar 6, 2014
Priority date       Jul 6, 2000
Publication date    Aug 15, 2017
Grant date          Aug 15, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus consistent with the invention allow a user to submit an ambiguous search query and to receive relevant search results. Queries can be expressed using character sets and/or languages that are different from the character set and/or language of at least some of the data that is to be searched. A translation between these character sets and/or languages can be performed by examining the use of terms in aligned text. Probabilities can be associated with each possible translation. Refinements can be made to these probabilities by examining user interactions with the search results.

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising:
  receiving an input term in a first language format;
  identifying two different groups of hyperlinks that each link to a same plurality of intermediary documents, wherein the different groups of hyperlinks have anchor texts in different respective language formats, including:
    identifying a first group of hyperlinks each having a respective first anchor text that includes the input term in the first language format, and
    identifying a second group of hyperlinks each having a respective second anchor text in a second language format;
  determining, from all of the second anchor texts of the second group of hyperlinks, a second term in the second language format that corresponds to the input term in the first language format, including:
    computing, by one or more computers, a total count of terms, including duplicates, occurring in all of the second anchor texts of the second group of hyperlinks,
    computing, by one or more computers, a respective individual count of occurrences, in all of the second anchor texts of the second group of hyperlinks, of each of a plurality of terms in the second language format that occur in all of the second anchor texts of the second group of hyperlinks,
    computing, by one or more computers, a respective score for each of the plurality of terms in the second language format that occur in all of the second anchor texts of the second group of hyperlinks, including comparing, for each term of the plurality of terms in the second language format that occur in all of the second anchor texts of the second group of hyperlinks, the respective individual count for the term to the total count of terms occurring in the second anchor texts of the second group of hyperlinks, and
    designating a highest-scoring term among the plurality of terms in the second language format that occur in all of the second anchor texts of the second group of hyperlinks as the second term in the second language format that corresponds to the input term in the first language format;
  receiving a first query having the input term in the first language format;
  generating a revised query that includes the second term in the second language format;
  obtaining search results using the revised query; and
  providing the search results obtained using the revised query in response to receiving the first query.

2. The method of claim 1, wherein the first language format comprises a first character set, and the second language format comprises a different, second character set.

3. The method of claim 2, wherein the first language format is romaji, romaja, or pinyin; and wherein the second character set is katakana, hiragana, kanji, hangul, hanja, or traditional Chinese characters.

4. The method of claim 1, wherein the first language format comprises a first language and the second language format comprises a different, second language.

5. The method of claim 1, wherein the documents comprise web pages.

6. The method of claim 1, wherein obtaining the search results using the revised query comprises: searching a database for information in the second language format using the revised query having the second term in the second language format.

7. A computer program product embodied on a non-transitory computer-readable medium, the computer program product including instructions, which when executed by a computer system, are operable to cause the computer system to perform operations comprising:
  receiving an input term in a first language format;
  identifying two different groups of hyperlinks that each link to a same plurality of intermediary documents, wherein the different groups of hyperlinks have anchor texts in different respective language formats, including:
    identifying a first group of hyperlinks each having a respective first anchor text that includes the input term in the first language format, and
    identifying a second group of hyperlinks each having a respective second anchor text in a second language format;
  determining, from all of the second anchor texts of the second group of hyperlinks, a second term in the second language format that corresponds to the input term in the first language format, including:
    computing a total count of terms, including duplicates, occurring in all of the second anchor texts of the second group of hyperlinks,
    computing a respective individual count of occurrences, in all of the second anchor texts of the second group of hyperlinks, of each of a plurality of terms in the second language format that occur in all of the second anchor texts of the second group of hyperlinks,
    computing a respective score for each of the plurality of terms in the second language format that occur in all of the second anchor texts of the second group of hyperlinks including comparing, for each term of the plurality of terms in the second language format that occur in all of the second anchor texts of the second group of hyperlinks, the respective individual count for the term to the total count of terms occurring in the second anchor texts of the second group of hyperlinks, and
    designating a highest-scoring term among the plurality of terms in the second language format that occur in all of the second anchor texts of the second group of hyperlinks as the second term in the second language format that corresponds to the input term in the first language format;
  receiving a first query having the input term in the first language format;
  generating a revised query that includes the second term in the second language format;
  obtaining search results using the revised query; and
  providing the search results obtained using the revised query in response to receiving the first query.

8. The computer program product of claim 7, wherein the first language format comprises a first character set, and the second language format comprises a different, second character set.

9. The computer program product of claim 8, wherein the first language format is romaji, romaja, or pinyin; and wherein the second character set is katakana, hiragana, kanji, hangul, hanja, or traditional Chinese characters.

10. The computer program product of claim 7, wherein the first language format comprises a first language and the second language format comprises a different, second language.

11. The computer program product of claim 7, wherein the documents comprise web pages.

12. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  receiving an input term in a first language format;
  identifying two different groups of hyperlinks that each link to a same plurality of intermediary documents, wherein the different groups of hyperlinks have anchor texts in different respective language formats, including:
    identifying a first group of hyperlinks each having a respective first anchor text that includes the input term in the first language format, and
    identifying a second group of hyperlinks each having a respective second anchor text in a second language format;
  determining, from all of the second anchor texts of the second group of hyperlinks, a second term in the second language format that corresponds to the input term in the first language format, including:
    computing a total count of terms, including duplicates, occurring in all of the second anchor texts of the second group of hyperlinks,
    computing a respective individual count of occurrences, in all of the second anchor texts of the second group of hyperlinks of each of a plurality of terms
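
The counting and scoring steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation. The `Hyperlink` record, the `fmt` label, the function names, the sample links, and the whitespace tokenization are all assumptions made for illustration (real Japanese, Korean, or Chinese text would need a proper segmenter), and the sketch simplifies "a same plurality of intermediary documents" to any document targeted by the first group of hyperlinks.

```python
from collections import Counter
from typing import NamedTuple


class Hyperlink(NamedTuple):
    anchor_text: str  # visible link text
    target: str       # URL of the linked (intermediary) document
    fmt: str          # hypothetical label for the language format


def best_corresponding_term(input_term, links, first_fmt, second_fmt):
    """Return (best_term, scores) for input_term, or (None, {}) if no evidence."""
    # First group: links in the first format whose anchor text contains the term.
    first_group = [l for l in links
                   if l.fmt == first_fmt and input_term in l.anchor_text.split()]
    targets = {l.target for l in first_group}

    # Second group: links in the second format that point at the same documents.
    second_group = [l for l in links
                    if l.fmt == second_fmt and l.target in targets]

    # Individual counts per term, and the total count including duplicates.
    counts = Counter(t for l in second_group for t in l.anchor_text.split())
    total = sum(counts.values())
    if total == 0:
        return None, {}

    # Score each term by comparing its count to the total (here, a simple ratio).
    scores = {term: count / total for term, count in counts.items()}
    return max(scores, key=scores.get), scores


def revise_query(query, input_term, second_term):
    """Build a revised query that includes the second term in place of the input term."""
    return " ".join(second_term if tok == input_term else tok
                    for tok in query.split())


# Made-up sample data: romaji and kanji anchors pointing at the same pages.
LINKS = [
    Hyperlink("honda motors", "http://example.com/a", "romaji"),
    Hyperlink("honda", "http://example.com/b", "romaji"),
    Hyperlink("本田 技研", "http://example.com/a", "kanji"),
    Hyperlink("本田", "http://example.com/b", "kanji"),
    Hyperlink("本田 公式", "http://example.com/b", "kanji"),
    Hyperlink("東京", "http://example.com/c", "kanji"),
]

best, scores = best_corresponding_term("honda", LINKS, "romaji", "kanji")
print(best, scores[best])
print(revise_query("honda dealers", "honda", best))
```

With this sample data, three of the five kanji terms found in the second group's anchor texts are 本田, so it scores 3/5 and is chosen as the term for the revised query. The ratio is only one way to "compare" the individual count to the total; the claim language does not fix a specific formula.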

Assignees

  • Google Inc

Inventors

Classifications

  • Access to data in other repository systems, e.g. legacy data or dynamic Web page generation

  • Query translation

  • Use of codes for handling textual entities

  • Query translation

  • G06F3/0237 (Primary): using prediction or retrieval techniques


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9734197B2 cover?
Methods and apparatus consistent with the invention allow a user to submit an ambiguous search query and to receive relevant search results. Queries can be expressed using character sets and/or languages that are different from the character set and/or language of at least some of the data that is to be searched. A translation between these character sets and/or languages can be performed by ex…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0237. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 15, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).