Document ranking based on entity frequency

US9679018B1 · US · B1

Patent metadata
Publication number: US-9679018-B1
Application number: US-201414183936-A
Country: US
Kind code: B1
Filing date: Feb 19, 2014
Priority date: Nov 14, 2013
Publication date: Jun 13, 2017
Grant date: Jun 13, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for document ranking. One of the methods includes receiving a request for one or more documents, obtaining a set of documents responsive to the request, and obtaining, from a user profile associated with a source of the request, representations of one or more topics of interest to a user. The method also includes selecting, from the set of documents, at least one document associated with a particular topic that matches at least one of the one or more topics of interest to the user, for the at least one selected document, obtaining a value corresponding to an inverse document frequency of documents associated with the particular topic in a corpus of documents, and generating a score for the at least one document based at least in part on the value corresponding to the inverse document frequency.
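The scoring step described in the abstract can be illustrated with a short sketch. It is a minimal illustration, not the patented implementation, and it rests on several assumptions: each document carries a single topic label, the user profile is a plain set of topic identifiers, and the IDF takes the conventional log(N / n_t) form. The abstract says only that the score is based "at least in part" on a value corresponding to the IDF. The names `Doc`, `topic_idf`, and `score_matching` are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    # Hypothetical document record: an id plus the single topic assigned to it.
    doc_id: str
    topic: str


def topic_idf(topic: str, corpus: list[Doc]) -> float:
    """Conventional IDF of a topic: log(N / n_t), where n_t counts corpus docs with that topic."""
    n_total = len(corpus)
    n_topic = sum(1 for d in corpus if d.topic == topic)
    if n_topic == 0:
        return 0.0  # assumption: a topic absent from the corpus gets no boost
    return math.log(n_total / n_topic)


def score_matching(
    responsive: list[Doc], user_topics: set[str], corpus: list[Doc]
) -> list[tuple[Doc, float]]:
    """Keep responsive docs whose topic is in the user's profile, score each by topic IDF,
    and rank by descending score (rarer topics first)."""
    selected = [d for d in responsive if d.topic in user_topics]
    scored = [(d, topic_idf(d.topic, corpus)) for d in selected]
    # sorted() is stable, so documents with equal scores keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, in a four-document corpus where one document is about jazz and three are about football, a user interested in both topics sees the jazz document first, because log(4/1) is larger than log(4/3).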

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, from a client device and by a document system, a request for one or more documents; obtaining, by the document system and from a corpus of documents, a set of documents responsive to the request; obtaining, from a user profile associated with a source of the request, representations of one or more topics of interest to a user; selecting, by the document system and using an index, at least one document from the set of documents that is associated with a particular topic that matches at least one of the one or more topics of interest to the user; for the at least one selected document, obtaining a value corresponding to an inverse document frequency of the particular topic in the corpus of documents, comprising: identifying a first number of documents in the corpus of documents that have been created during a limited time period that ranges from a prior time to a current time, wherein the first number of documents is less than a total number of documents in the corpus of documents; and identifying a second number of documents in the corpus of documents that have been created during the limited time period and that reference the particular topic, wherein the second number of documents is less than the first number of documents, wherein the value corresponding to the inverse document frequency is based on a ratio of the first number of documents and the second number of documents; generating a score for the at least one document based at least in part on the value corresponding to the inverse document frequency that is based on the ratio of the first number of documents and the second number of documents; determining that the score for the at least one document satisfies a threshold score that indicates that the particular topic of the at least one document is an infrequent topic in the corpus of documents; and responsive to determining that the score for the at least one document satisfies the threshold score that indicates that the particular topic of the at least one document is an infrequent topic in the corpus of documents, transmitting information associated with the at least one document from the document system to the client device in response to the request, wherein the transmitted information includes information for rendering an interface that provides access to the at least one document.

2. The method of claim 1, further comprising ranking the set of documents that are responsive to the request for one or more documents, wherein the at least one document is ranked based at least in part on the respective generated score.

3. The method of claim 2, further comprising transmitting information associated with one or more of the ranked documents to the user.

4. The method of claim 1, wherein the representations of the one or more topics of interest to the user comprise a set of entity identifiers of entities, each entity corresponding to one of the one or more topics of interest, and each entity being represented by a node in a graph, the nodes for the one or more topics of interest corresponding to nodes associated with the at least one selected document.

5. The method of claim 1, wherein the score for the at least one document is generated based on a function of the value corresponding to the inverse document frequency.

6. The method of claim 1, further comprising associating each document in the corpus of documents with a topic, comprising: for each entity referenced in the document, determining a weight score for the entity that is proportional to a percentage of content in the document related to the entity; and designating the entity that has the highest weight score as the topic of the document.

7. A computer-implemented method comprising: receiving, from a client device and by a document system, a request for one or more documents; obtaining, by the document system and from a corpus of documents, a set of documents responsive to the request; obtaining, from a user profile associated with a source of the request, representations of a plurality of topics of interest to a user; selecting, by the document system and using an index, at least one document from the set of documents that is associated with a particular group of co-occurring topics that matches at least one group of topics in the plurality of topics of interest to the user; for the at least one selected document, obtaining a value corresponding to an inverse document frequency of the particular group of co-occurring topics in the corpus of documents, comprising: identifying a first number of documents in the corpus of documents that have been created during a limited time period that ranges from a prior time to a current time, wherein the first number of documents is less than a total number of documents in the corpus of documents; and identifying a second number of documents in the corpus of documents that have been created during the limited time period and that reference the particular group of co-occurring topics, wherein the second number of documents is less than the first number of documents, wherein the value corresponding to the inverse document frequency is based on a ratio of the first number of documents and the second number of documents; generating a score for the at least one document based at least in part on the value corresponding to the inverse document frequency that is based on the ratio of the first number of documents and the second number of documents; determining that the score for the at least one document satisfies a threshold score that indicates that the particular group of co-occurring topics of the at least one document is an infrequent group of co-occurring topics in the corpus of documents; and responsive to determining that the score for the at least one document satisfies the threshold score that indicates that the particular group of co-occurring topics of the at least one document is an infrequent group of co-occurring topics in the corpus of documents, transmitting information associated with the at least one document from the document system to the client device in response to the request, wherein the transmitted information includes information for rendering an interface that provides access to the at least one document.

8. The method of claim 7, further comprising ranking the set of documents that are responsive to the request for one or more documents, wherein the at least one document is ranked based at least in part on the respective generated score.

9. The method of claim 8, further comprising transmitting one or more of the ranked documents to the user.

10. The method of claim 7, wherein the representations of the plurality of topics of interest to the user comprise a set of entity identifiers of entities, each entity corresponding to one of the one or more topics of interest, and each entity being represented by a node in a graph, the nodes for the one or more topics of interest corresponding to nodes associated with the at least one selected document.

11. The method of claim 7, wherein the score for the at least one document is generated based on a function of the value corresponding to the inverse document frequency.

12. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving, from a client device and by a document system, a request for one or more documents; obtaining, by the document system and from a corpus of documents, a set of documents responsive to the request; obtainin…
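The time-windowed counting in claims 1 and 7 can be sketched as follows, under stated assumptions. Documents are modeled as records with a creation timestamp and a set of referenced entity identifiers (`CorpusDoc` and every function name here are hypothetical). The IDF-like value is taken as the logarithm of first / second, although the claims require only that it be "based on a ratio" of the two counts, and "satisfies a threshold" is read as "greater than or equal to". A one-topic group reproduces claim 1; a multi-topic group stands in for claim 7's co-occurring topics. `designate_topic` sketches claim 6 and assumes the per-entity content shares have already been computed.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CorpusDoc:
    # Hypothetical record: creation time plus the entities (topics) the document references.
    doc_id: str
    created: datetime
    entities: frozenset[str]


def windowed_idf(
    group: frozenset[str], corpus: list[CorpusDoc], now: datetime, window: timedelta
) -> float | None:
    """IDF-like value built from the claims' two counts over [now - window, now].

    first  = documents created in the window
    second = documents created in the window that reference every topic in `group`
    """
    prior = now - window
    recent = [d for d in corpus if prior <= d.created <= now]
    first = len(recent)
    second = sum(1 for d in recent if group <= d.entities)  # subset test = all topics co-occur
    if second == 0:
        return None  # ratio undefined; the claims shown do not say how to handle this case
    # Assumption: log of the ratio, as in conventional IDF.
    return math.log(first / second)


def is_infrequent(score: float | None, threshold: float) -> bool:
    """Assumed reading of 'satisfies the threshold': a high score marks a rare topic (group)."""
    return score is not None and score >= threshold


def designate_topic(content_share: dict[str, float]) -> str:
    """Claim 6 sketch: the entity with the highest weight (share of content) becomes the topic."""
    return max(content_share, key=content_share.get)
```

Because both counts are restricted to the recent window, rarity is judged against current content: a topic that appeared often in older documents can still score as infrequent if few recent documents reference it.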

Assignees

Google Inc

Inventors

Classifications

  • Physics · mapped topic

  • G06F16/335 · Primary

    Filtering based on additional data, e.g. user or group profiles (filtering in web context G06F16/9535, G06F16/9536) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9679018B1 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for document ranking. One of the methods includes receiving a request for one or more documents, obtaining a set of documents responsive to the request, and obtaining, from a user profile associated with a source of the request, representations of one or more topics of interest to a user. The method …
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/3053. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 13, 2017 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).