Assigning document identification tags

US9411889B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9411889-B2
Application number: US-201213419349-A
Country: US
Kind code: B2
Filing date: Mar 13, 2012
Priority date: Jul 3, 2003
Publication date: Aug 9, 2016
Grant date: Aug 9, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags are produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.
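
The ordering property in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical three-tier layout in which documents with higher importance scores draw tags from higher ranges. The tier boundaries, tag ranges, score scale, and the `assign` helper are illustrative assumptions and are not taken from the patent text. Because each tag encodes its tier, a result list sorted by tag alone comes out approximately, though not exactly, in importance order.

```python
import bisect

# Hypothetical tier layout (not from the patent): the lower bound of each
# tier's score range and the first tag in each tier's tag range.
TIER_MIN_SCORE = [0.0, 0.3, 0.7]   # tiers cover [0.0, 0.3), [0.3, 0.7), [0.7, 1.0]
TIER_FIRST_ID = [0, 1000, 2000]

next_id = list(TIER_FIRST_ID)      # next unused tag in each tier


def assign(score: float) -> int:
    """Return the lowest unused tag from the tier whose range contains score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1] for this sketch")
    tier = bisect.bisect_right(TIER_MIN_SCORE, score) - 1
    doc_id = next_id[tier]
    next_id[tier] += 1             # sketch only: no check for tier exhaustion
    return doc_id


# Documents arrive in crawl order, not in importance order.
scores = [0.10, 0.90, 0.50, 0.80, 0.20, 0.40]
ids = [assign(s) for s in scores]

# Read a tag-sorted result list from the highest tag down.
by_id = sorted(zip(ids, scores), reverse=True)
ranked_scores = [s for _, s in by_id]
print(ranked_scores)  # [0.8, 0.9, 0.4, 0.5, 0.2, 0.1]
```

The output is ordered correctly across tiers but not within a tier, which is the sense in which the list is only approximately ordered.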

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of assigning a document identifier to a new document, the new document to be added to a collection of documents, the method being performed on a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors, the method comprising: partitioning a set of document identifiers into a plurality of segments, each segment associated with a respective subset of the set of document identifiers, wherein the document identifiers comprise a predetermined set of monotonically ordered document identification tags; subdividing each of the segments into a plurality of tiers, wherein each tier is associated with a respective subset of the set of document identifiers, and wherein the plurality of tiers are monotonically ordered with respect to a query-independent document importance metric; receiving query-independent information about the new document, the information including a value of the query-independent document importance metric and a unique document identifier for the new document; selecting, based at least in part on the unique document identifier, one of the segments; selecting, based at least on the query-independent information, one of the tiers associated with the selected segment; assigning to the new document a document identifier from the respective subset of document identifiers associated with the selected tier, the assigned document identifier not previously assigned to any of the documents in the collection of documents, and repeating the receiving, selecting a segment, selecting a tier, and assigning, with respect to one or more additional new documents.

2. The method of claim 1, wherein each tier in the plurality of tiers is associated with a respective predetermined range of metric values; and wherein selecting a tier comprises selecting the tier for which the query-independent document importance metric of the new document falls within the respective predetermined range of metric values associated with the selected tier.

3. The method of claim 2, wherein the respective predetermined ranges of metric values associated with the plurality of tiers are non-overlapping.

4. The method of claim 2, wherein the respective subset of the document identifiers associated with each tier monotonically increases with the position of the tier in the ordering; and wherein the respective predetermined range of metric values associated with each tier monotonically increases with the position of the tier in the ordering.

5. The method of claim 4, wherein assigning a document identifier to the new document comprises assigning to the new document a minimum available document identifier identification tag from the respective subset of document identifiers associated with the selected tier.

6. The method of claim 4, wherein assigning a document identifier to the new document comprises assigning to the new document a maximum available document identifier from the respective subset of document identifiers associated with the selected tier.

7. The method of claim 1, further comprising: when a flush condition is satisfied, performing a flush operation, including building a sorted map, the sorted map relating globally unique identifiers to document identifiers assigned to documents since a prior flush operation.

8. The method of claim 7, further comprising: when a merge condition is satisfied, performing a merge operation, the merge operation including merging a layered plurality of sorted maps produced by previous flushing operations, the merge operation further including producing a merged map relating globally unique identifiers to document identifiers assigned to documents.

9. The method of claim 1, further comprising: when a flush condition is satisfied, performing a flush operation, the flush operation including building a first sorted map and a second sorted map; wherein the first sorted map is keyed and sorted by globally unique identifiers, and includes for each globally unique identifier a corresponding document identifier; and wherein the second sorted map is keyed and sorted by document identifiers assigned to documents since a prior flush operation, and includes for each such document identifier a corresponding globally unique identifier.

10. The method of claim 9, wherein the globally unique identifiers are URL fingerprints.

11. The method of claim 10, wherein each URL fingerprint comprises a value produced by applying a one way mapping function to an address associated with a document in the collection of documents.

12. A computer-implemented method of assigning a plurality of document identification tags to a plurality of new documents, the plurality of new documents to be added to a collection of documents, the method comprising: on a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors: partitioning a set of valid globally unique document identifiers into a plurality of segments, each segment associated with a respective subset of the set of valid globally unique document identifiers; subdividing each of the segments into a plurality of tiers, wherein the plurality of tiers are monotonically ordered with respect to a query-independent document importance metric, each segment having an associated, predetermined set of monotonically ordered document identification tags, and each tier of a respective segment having an associated subset of the set of document identification tags for the respective segment; receiving query-independent information about a new document, the information including a value of the query-independent document importance metric and a globally unique document identifier for the new document; selecting, based at least in part on the globally unique document identifier, one of the segments; selecting, based at least on the query-independent information, one of the tiers associated with the selected segment; assigning to the new document a document identification tag from the subset of document identification tags associated with the selected tier, wherein the document identification tag assigned to the new document is unique with respect to document identification tags assigned to other documents in the collection of documents; and repeating the receiving, selecting a segment, selecting a tier, and assigning, with respect to one or more additional new documents; wherein the assigned document identification tags are assigned to documents in the collection of documents having globally unique document identifiers associated with the respective segment.

13. The method of claim 12, wherein the set of document identification tags for a first segment includes a plurality of document identification tags in the set of document identification tags for a second segment.

14. A system for assigning a document identification tag to a new document, the new document to be added to a collection of documents, the system comprising: one or more processors; and memory storing one or more programs to be executed by the one or more processors; the one or more programs comprising instructions for: partitioning a set of monotonically ordered document identification tags into a plurality of segments, each segment associated with a respective subset of the set of monotonically ordered document identification tags; subdividing each of the segments into a plurality of tiers, wherein each tier is associated with a respective subset of the set of document identification tags, and wherein the plurality of tiers are monotonica
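
The segment-and-tier assignment of claim 1, the minimum-available rule of claim 5, the two flush maps of claim 9, and the URL fingerprints of claims 10 and 11 can be pictured with a rough sketch. Everything concrete here is an assumption made for illustration: the segment count, the tier boundaries, the ID range sizes, SHA-256 as the one-way mapping, choosing a segment by fingerprint modulo, and the `DocIdAssigner` class itself. The claims do not specify any of these.

```python
import hashlib
from bisect import bisect_right

# Hypothetical layout; none of these values come from the patent.
NUM_SEGMENTS = 4
IDS_PER_SEGMENT = 3000
TIER_MIN_METRIC = [0.0, 0.3, 0.7]   # non-overlapping metric ranges, one per tier
IDS_PER_TIER = IDS_PER_SEGMENT // len(TIER_MIN_METRIC)


def url_fingerprint(url: str) -> int:
    """One-way mapping from a URL to a 64-bit integer (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class DocIdAssigner:
    def __init__(self):
        # next_free[segment][tier] holds the lowest unassigned ID in that tier.
        self.next_free = [
            [seg * IDS_PER_SEGMENT + tier * IDS_PER_TIER
             for tier in range(len(TIER_MIN_METRIC))]
            for seg in range(NUM_SEGMENTS)
        ]
        self.pending = {}  # fingerprint -> document ID assigned since the last flush

    def assign(self, url: str, metric: float) -> int:
        if not 0.0 <= metric <= 1.0:
            raise ValueError("metric must lie in [0, 1] for this sketch")
        fp = url_fingerprint(url)
        seg = fp % NUM_SEGMENTS                           # segment from the unique identifier
        tier = bisect_right(TIER_MIN_METRIC, metric) - 1  # tier from the importance metric
        doc_id = self.next_free[seg][tier]                # minimum available ID in the tier
        tier_end = seg * IDS_PER_SEGMENT + (tier + 1) * IDS_PER_TIER
        if doc_id >= tier_end:
            raise RuntimeError("tier exhausted")
        self.next_free[seg][tier] = doc_id + 1
        self.pending[fp] = doc_id
        return doc_id

    def flush(self):
        """Build two sorted maps for IDs assigned since the previous flush."""
        by_fingerprint = sorted(self.pending.items())
        by_doc_id = sorted((d, f) for f, d in self.pending.items())
        self.pending = {}
        return by_fingerprint, by_doc_id


assigner = DocIdAssigner()
high = assigner.assign("http://example.com/a", 0.9)
low = assigner.assign("http://example.com/b", 0.1)
by_fingerprint, by_doc_id = assigner.flush()
```

In this layout a document's ID reveals both its segment (`doc_id // IDS_PER_SEGMENT`) and its tier, so no separate lookup is needed to recover either. A merge step in the spirit of claim 8 would combine the sorted lists produced by successive flushes; it is omitted here because the claim preview does not say how conflicts between layers are resolved.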

Assignees

Inventors

Classifications

  • comprising specially adapted graphical user interfaces [GUI] · CPC title

  • Entity profiles · CPC title

  • Database migration support · CPC title

  • for authentication of entities (cryptographic mechanisms or cryptographic arrangements for entity authentication H04L9/32) · CPC title

  • using ranking · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9411889B2 cover?
Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags are produced by an index in response to a q…
Who is the assignee on this patent?
Zhu Huican, Acharya Anurag, Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 9, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).