Domain-based ranking in document search

US9836538B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9836538-B2
Application number  US-39731409-A
Country             US
Kind code           B2
Filing date         Mar 3, 2009
Priority date       Mar 3, 2009
Publication date    Dec 5, 2017
Grant date          Dec 5, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one example, documents that are examined by a search process may be scored in a manner that is specific to a domain. A domain may be a substantive area, such as medicine, sports, etc. Different scoring approaches that take aspects of the domain into account may be applied to the documents, thereby producing different scores than might have been produced by a simple comparison of the terms in the query with the terms in the documents. These domain-based approaches may take a query into account in scoring the documents, or may be query-independent. Each approach may be implemented by a scorer. The combined output of the scorers may be used to generate a score for each document. Documents then may be ranked based on the scores, and search results may be provided.
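The pipeline the abstract describes (baseline term-matching scores, several domain-specific scorers whose outputs are combined, then ranking) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The names `term_score`, `DomainScorer`, `will_contribute`, `score`, `MedicineScorer` and `rank`, the toy concept list, the per-scorer `weight`, and the additive combination are all hypothetical. The `will_contribute`/`score` split loosely mirrors the "first function" and "second function" of claim 1.

```python
from abc import ABC, abstractmethod

# All names below are hypothetical; the patent publishes no API.


def term_score(query: str, text: str) -> float:
    """Baseline score: fraction of query terms found in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


class DomainScorer(ABC):
    weight: float = 1.0  # assumed configurable per-scorer weight

    @abstractmethod
    def will_contribute(self, query: str) -> bool:
        """Decide from the query alone whether this scorer takes part."""

    @abstractmethod
    def score(self, doc_ids: list[str], corpus: dict[str, str]) -> dict[str, float]:
        """Return a domain-based score for each document ID."""


class MedicineScorer(DomainScorer):
    """Toy medicine-domain scorer that counts domain concepts per document."""

    CONCEPTS = {"insulin", "diabetes", "glucose", "pancreas"}
    weight = 0.5

    def will_contribute(self, query: str) -> bool:
        # Treat a query with no medical concept as too vague to score.
        return bool(self.CONCEPTS & set(query.lower().split()))

    def score(self, doc_ids: list[str], corpus: dict[str, str]) -> dict[str, float]:
        # Query-independent: looks only at each document's own terms.
        return {d: float(len(self.CONCEPTS & set(corpus[d].lower().split()))) for d in doc_ids}


def rank(query: str, corpus: dict[str, str], scorers: list[DomainScorer]) -> list[str]:
    doc_ids = list(corpus)
    scores = {d: term_score(query, corpus[d]) for d in doc_ids}
    active = [s for s in scorers if s.will_contribute(query)]  # contributing scorers
    for scorer in active:
        for d, domain_score in scorer.score(doc_ids, corpus).items():
            scores[d] += scorer.weight * domain_score
    # Highest combined score first; ties keep corpus order.
    return sorted(doc_ids, key=lambda d: scores[d], reverse=True)


if __name__ == "__main__":
    corpus = {
        "a": "insulin lowers blood glucose",
        "b": "insulin prices rose again",
        "c": "football scores tonight",
    }
    print(rank("insulin", corpus, [MedicineScorer()]))
    print(rank("prices", corpus, [MedicineScorer()]))
```

For the query "insulin", both "a" and "b" match the term equally, but the medicine scorer lifts "a" because it mentions two domain concepts. For "prices", the scorer declines to take part, so ranking falls back to term matching alone. Weighted addition is only one plausible reading of how scorer outputs might be combined.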

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-readable storage device that stores executable instructions that, when executed by a computer, cause the computer to perform operations comprising: receiving a query; calculating scores for a plurality of documents obtained with respect to the received query by comparing terms in said query with terms in said documents; calling a same first function implemented by each of a plurality of domain-based scorers of different types, to determine, without utilizing one or more documents of the plurality of documents, which of said domain-based scorers will contribute and which will not contribute to scoring of said documents in response to the calculation of said scores for the plurality of documents, wherein the same first function is used to determine whether the received query is too vague and will not be scored or is not too vague and will be scored, and wherein determining whether the received query is too vague or not too vague is based upon each domain-based scorer using its own set of first criteria for determining whether the received query is too vague or not too vague, each of said domain-based scorers calculating a domain-based score based on features of said documents or of said query that are specific to a substantive field of knowledge after the calculation of the scores for the plurality of documents, said each of said plurality of domain-based scorers implementing its own version of a same second function to calculate the domain-based score of said documents without obtaining said documents again with respect to the received query, wherein the same second function includes receiving document identifiers to identify said documents in a database and returning scores for said documents and using the returned scores as input into an aggregation formula, wherein each domain-based scorer uses its own set of second criteria within the aggregation formula, wherein said same second function of each of the plurality of domain-based scorers utilizes said documents which have already received scores based on the terms in said query to calculate the domain-based scores of said documents; including, on a list, those domain-based scorers that indicate, through said same first function, that they will contribute to scoring of said documents; using a configurable parameter selected based on a different scoring scheme by those ones of said domain-based scorers that are on said list to adjust said scores, whereby adjusted scores of said documents are created by combining the contributions from all of the said domain-based scorers; creating a set of search results based on the adjusted scores of said documents; and presenting said search results to a user.

2. The computer-readable storage device of claim 1, wherein said operations further comprise: identifying a number of concepts, in the at least one document, associated with the at least one term in the query; and increasing the domain-based score of the at least one document by a variable amount based on the number, wherein the variable amount decreases when the number is more and less than a predefined number.

3. The computer-readable storage device of claim 1, further comprising: reducing the domain-based score of the at least one document based on the at least one document having an amount of concepts relevant to the query in excess of a predefined number.

4. The computer-readable storage device of claim 1, wherein said operations further comprise: identifying a set of concepts associated with the at least one term in the query; and either: decreasing a domain-based score of the at least one document based on how many concepts in the set of concepts are not in the at least one document; or decreasing a domain-based score of the at least one document based on how many concepts in the at least one document are not in the set of concepts.

5. The computer-readable storage device of claim 1, further comprising: identifying a first set of concepts in the query; identifying a second set of concepts in a summary of the at least one document; determining that the first set has a defined level of similarity to the second set; and based on the first set having the defined level of similarity to the second set, increasing a domain-based score of the at least one document.

6. The computer-readable storage device of claim 1, wherein a first one of the domain-based scorers evaluates the at least one document without regard to the query based on a determination that the query is vague with respect to domain based scoring of the at least one document for which the first score has been determined.

7. The computer-readable storage device of claim 1, further comprising: determining a number of concepts from a domain that appears in the at least one document; determining that the number falls within a range; and modifying a domain-based score of the at least one document based on the number falls within the range, wherein the domain-based score is increased on determining that the number falls within a first range, the domain-based score remains same on determining that the number falls within a second range, and the domain-based score is decreased on determining that the number falls within a third range.

8. The computer-readable storage device of claim 1, further comprising: determining a number of concepts from a domain that appears in the at least one document; and modifying a domain-based score of the at least one document by an amount that is based on the number.

9. The computer-readable storage device of claim 1, further comprising: determining that the at least one document has a concept, from a domain, that has a level of popularity across a group of documents in the corpus; and based on the concept being in the at least one document, modifying a domain-based score of the at least one document.

10. The computer-readable storage device of claim 1, wherein the operations further comprise: calling the same first function in each of the plurality of scorers to determine which of the plurality of scorers will form the set of domain-based scorers to produce the domain-based score for the at least one document, wherein each of the plurality of scorers implements its own version of the function.

11. A system that responds to a document search request, the system comprising: a memory; and a processor programmed to: receive a query; calculate scores for a plurality of documents obtained with respect to the received query by comparing terms in said query with terms in said documents; call a same first function implemented by each of a plurality of domain-based scorers of different types, to determine, without utilizing one or more documents of the plurality of documents, which of said domain-based scorers will contribute and which will not contribute to scoring of said documents in response to the calculation of said scores for the plurality of documents, wherein the same first function is used to determine whether the received query is too vague and will not be scored or is not too vague and will be scored, and wherein determining whether the received query is too vague or not too vague is based upon each domain-based scorer using its own set of first criteria for determining whether the received query is too vague or not too vague, each of said domain-based scorers calculating a domain-based score based on features of said documents or of said query that are specific to a substantive field of knowledge after the calculation of the scores for the plurality of documents, said each of said plurality of domain-based scorers implementing its own version of a same second function to calc
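Several dependent claims describe concept-count heuristics that a single domain-based scorer might apply. The sketch below illustrates claims 2, 4 and 7 under assumed parameters. The function names, the `1 / (1 + distance)` decay, the per-concept penalty, and the range boundaries are hypothetical choices, since the claims fix no formula. Claim 7 also does not say which range triggers which adjustment; here a moderate count raises the score, a low count leaves it unchanged, and a high count lowers it (the last case echoes claim 3).

```python
# Hypothetical helpers for dependent claims 2, 4 and 7; the formulas and
# default parameters are illustrative assumptions, not from the patent.


def peaked_boost(n_concepts: int, target: int, max_boost: float = 1.0) -> float:
    """Claim 2 idea: the boost peaks at `target` and shrinks as the
    count moves above or below it."""
    return max_boost / (1 + abs(n_concepts - target))


def missing_concept_penalty(
    query_concepts: set[str], doc_concepts: set[str], per_concept: float = 0.25
) -> float:
    """Claim 4, first branch: the penalty grows with each query
    concept that the document lacks."""
    return per_concept * len(query_concepts - doc_concepts)


def range_adjust(
    score: float, n_concepts: int, low: int = 3, high: int = 10, step: float = 0.5
) -> float:
    """Claim 7 idea: raise, keep, or lower the score by range.

    Assumed ranges: [low, high] raises the score, below `low` leaves it
    unchanged, above `high` lowers it.
    """
    if low <= n_concepts <= high:
        return score + step
    if n_concepts > high:
        return score - step
    return score


if __name__ == "__main__":
    query = {"insulin", "glucose", "pancreas"}
    doc = {"insulin", "glucose", "dosage"}
    matched = len(query & doc)  # document concepts tied to the query
    s = peaked_boost(matched, target=3) - missing_concept_penalty(query, doc)
    print(range_adjust(s, n_concepts=len(doc)))
```

Each helper returns a plain number, so a scorer could chain them in any order; how the patent's scorers actually combine such adjustments is not specified in the claims shown here.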


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9836538B2 cover?
In one example, documents that are examined by a search process may be scored in a manner that is specific to a domain. A domain may be a substantive area, such as medicine, sports, etc. Different scoring approaches that take aspects of the domain into account may be applied to the documents, thereby producing different scores than might have been produced by a simple comparison of the terms in…
Who is the assignee on this patent?
Rappaport Alain Thierry, Adamson Daniel, Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 5, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).