Method and Apparatus of Processing Nested Fragment Caching of a Web Page
US-2015363369-A1 · Dec 17, 2015 · US
US9672296B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9672296-B2 |
| Application number | US-72372710-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 15, 2010 |
| Priority date | May 6, 2004 |
| Publication date | Jun 6, 2017 |
| Grant date | Jun 6, 2017 |
Official abstract text for this publication.
A repository server that provides stored copies of Web-accessible documents. A client of the repository server may register a document in the repository server. The repository server makes a copy of the registered document and returns a repository URL for the copy to the client. The repository URL may be used to fetch the copy from the repository URL. Registration further relates the stored copy to its document URL, to an identifier for the stored copy, to a fingerprint that is a condensed representation of the stored copy's content and can be used to determine degrees of similarity other than match-no match, and to a set of stored copies having similar content. The fingerprints are used to compute similarity. The similarity computation further employs comparisons of links in the documents and of document URLs to determine whether it is necessary to use the fingerprints to compute similarity.
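The registration flow in the abstract can be pictured with a small in-memory sketch. Everything named here is an assumption made for illustration: the `Repository` and `StoredCopy` classes, the sequential `c1`, `c2`, ... identifiers, the `repo.example` base URL, and the caller-supplied `fingerprint_fn`. The abstract does not specify any of them, and the set of similar copies is only stored, not computed.

```python
from dataclasses import dataclass, field


@dataclass
class StoredCopy:
    copy_id: str        # identifier for the stored copy
    document_url: str   # URL the document was registered from
    content: str        # the stored copy itself
    fingerprint: str    # condensed representation of the content
    similar_ids: set = field(default_factory=set)  # copies with similar content


class Repository:
    """Toy in-memory stand-in for the repository server (hypothetical API)."""

    def __init__(self, base_url, fingerprint_fn):
        self.base_url = base_url.rstrip("/")
        self.fingerprint_fn = fingerprint_fn  # assumed to be supplied by the caller
        self.copies = {}

    def register(self, document_url, content):
        """Store a copy of the document and return a repository URL for it."""
        copy_id = f"c{len(self.copies) + 1}"  # sequential ids, purely illustrative
        self.copies[copy_id] = StoredCopy(
            copy_id, document_url, content, self.fingerprint_fn(content)
        )
        return f"{self.base_url}/{copy_id}"

    def fetch(self, repository_url):
        """Return the stored copy addressed by a repository URL."""
        copy_id = repository_url.rsplit("/", 1)[-1]
        return self.copies[copy_id].content


# Placeholder fingerprint: whitespace-normalized text, not the patented encoding.
repo = Repository("https://repo.example", lambda text: " ".join(text.split()))
repo_url = repo.register("https://site.example/a", "<p>hi</p>")
print(repo_url, repo.fetch(repo_url))
```

Keeping the document URL, identifier, and fingerprint on one record mirrors the relations the abstract lists; a real repository would persist them and populate `similar_ids` from the similarity computation.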
Opening claim text (preview).
The invention claimed is:
1. A method of comparing digital documents, the method comprising: filtering documents to reduce a total number of documents for fingerprint comparison at least by: performing a comparison of respective total numbers of links respectively included in a pair of documents in the documents and one or more websites pointed to by the respective total numbers of links based in part or in whole upon rates of change comprising a first rate of change and a second rate of change, wherein the first rate of change associated with at least one respective total number of links is slower than the second rate of change associated with contents of at least one document of the pair of documents in the documents, and the comparison is performed without comparing fingerprints of the pair of documents; and determining whether or not to compare the fingerprints of the pair of documents based at least in part upon the rates of change and whether a degree of difference in the respective total numbers of links from the comparison falls within a predetermined range; and in response to determining that the fingerprints should be compared, generating the fingerprints on determining the fingerprints are not existing, and comparing the fingerprints, at a computing system, to determine a similarity value indicating a degree of similarity of the pair of documents being compared, the similarity value expressing degrees of similarity in addition to whether the pair of documents being compared are identical or not identical.
2. The method set forth in claim 1 wherein: performing the comparison of the links comprises comparing a number of links in the documents.
3. The method set forth in claim 1 wherein: performing the comparison of the links comprises comparing destinations of the links in the documents.
4. The method set forth in claim 1 wherein: the documents comprise digital documents identified by identifiers; and the method includes relating the fingerprint for a digital document to the digital document's identifier, the digital documents whose identifiers are known and have fingerprints may be compared without reference to the documents or copies thereof.
5. The method set forth in claim 4 wherein: the identifiers are document locators used to locate the documents in a network.
6. The method set forth in claim 1 further comprising: relating a given one of the documents to similarity values for a set of other ones of the documents.
7. The method set forth in claim 6 wherein: the documents are identified by identifiers; and in relating the given one of the documents to the similarity values, an identifier for the given one of the documents is related to the identifiers and similarity values for the set of other ones of the documents.
8. The method set forth in claim 7 wherein: the identifiers are document locators used to locate the documents in a network.
9. The method set forth in claim 1 wherein: the documents comprise digital documents including both structural information and content information and a fingerprint for a document of the documents comprises: a structural encoding portion in which a first encoding preserves semantic information about components of the digital document's structural information; and a content encoding portion in which a second encoding preserves content information about components of the digital document's content.
10. The method set forth in claim 9 wherein: components of the structural encoding portion and components of the content encoding portion occur in the fingerprint in an order in which the components encoded in the encoding portions occur in the digital document.
11. The method set forth in claim 10 wherein: in the digital document, the components are nested.
12. The method set forth in claim 10 wherein: the components of the structural information are HTML tags.
13. The method set forth in claim 9 wherein generating the fingerprint further comprises: encoding the structural information using a first encoding preserving semantic information about the structural information; and encoding the content information using a second encoding.
14. The method set forth in claim 13 wherein: in encoding the structural information, the structural information is encoded in an order in which the structural information occurs in the digital document; and content information is encoded as it is encountered while encoding the structural information.
15. The method set forth in claim 14 wherein: the structural information is HTML tags.
16. The method set forth in claim 9 wherein the act of comparing the fingerprints further comprises: (a). finding an encoding of structural information in a first fingerprint; (b). finding a second fingerprint substring matching a first fingerprint substring that begins at the found encoding; (c). adding a length of the found substring to a running length total; (d). finding another encoding of structural information in the first fingerprint not contained in any found substring and repeating acts (a)-(c) with the other encoding of structural information in the first fingerprint; (e). repeating acts (a)-(d) until no further encodings of structural information can be found in act (d); and (f). using a length of one of the fingerprints and the running length total to compute a similarity value.
17. A computer program product embodied on a non-transitory computer readable medium, the non-transitory computer readable medium having stored a sequence of instructions executed by a processor causes the process to perform a set of acts, the set of acts comprising: filtering documents to reduce a total number of documents for fingerprint comparison at least by: performing a comparison of respective total numbers of links respectively included in a pair of documents in the documents and one or more websites pointed to by the respective total numbers of links based in part or in whole upon rates of change comprising a first rate of change and a second rate of change, wherein the first rate of change associated with at least one respective total number of links is slower than the second rate of change associated with contents of at least one document of the pair of documents in the documents, and the comparison is performed without comparing fingerprints of the pair of documents; and determining whether or not to compare the fingerprints of the pair of documents based at least in part upon the rates of change and whether a degree of difference in the respective total numbers of links from the comparison falls within a predetermined range; and in response to determining that the fingerprints should be compared, generating the fingerprints on determining the fingerprints are not existing, and comparing the fingerprints, at a computing system, to determine a similarity value indicating a degree of similarity of the pair of documents being compared, the similarity value expressing degrees of similarity in addition to whether the pair of documents being compared are identical or not identical.
18. The computer program product set forth in claim 17, wherein: the documents comprise digital documents including both structural information and content information and a fingerprint for a document of the documents comprises: a structural encoding portion in which a first encoding preserves semantic information about components of the digital document's structural information; and a content encoding portion in which a second encoding preserves content
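One way to picture how the link-count filter of claim 1 and the fingerprint comparison of claim 16 might fit together is the sketch below. It is not the patented implementation. The assumptions are: structural encodings are written as literal tag names such as `<p>`, content encodings are 4-hex-digit SHA-1 prefixes, the "predetermined range" is a hypothetical `max_link_diff` threshold, act (b) takes the longest matching substring, and act (f) divides the running total by the length of the first fingerprint. The rates-of-change and link-destination parts of the claims are left out.

```python
import hashlib
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts <a href=...> elements without building any fingerprint."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1


class FingerprintBuilder(HTMLParser):
    """Emits tag and content encodings in document order."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(f"<{tag}>")      # structural encoding keeps the tag name

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:                           # content encoding: short digest of the text
            digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
            self.parts.append(digest[:4])


def count_links(html):
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return parser.count


def fingerprint(html):
    parser = FingerprintBuilder()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def similarity(fp1, fp2):
    """Sketch of acts (a)-(f): sum matched substring lengths, then normalize."""
    if not fp1:
        return 0.0
    total, pos = 0, 0
    while True:
        start = fp1.find("<", pos)         # (a)/(d): next unmatched structural encoding
        if start == -1:
            break                          # (e): no further structural encodings
        length = 0
        # (b): grow the substring starting at `start` while it still occurs in fp2
        while start + length < len(fp1) and fp1[start:start + length + 1] in fp2:
            length += 1
        total += length                    # (c): add to the running length total
        pos = start + max(length, 1)       # skip past what was just matched
    return total / len(fp1)                # (f): assumed normalization


def compare_documents(html_a, html_b, max_link_diff=2):
    """Return a similarity value, or None when the link-count filter rejects the pair."""
    if abs(count_links(html_a) - count_links(html_b)) > max_link_diff:
        return None                        # filtered out before any fingerprinting
    return similarity(fingerprint(html_a), fingerprint(html_b))


print(similarity("<p>ab</p>", "<p>cd</p>"))
```

The cheap link count runs first so that fingerprints are only generated for pairs that pass the filter, which matches the ordering the claims describe. Note that this `similarity` is not symmetric: swapping the arguments can give a different value, because only the first fingerprint's length is used.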
Physics · mapped topic
Archiving or backup · CPC title
Physics · mapped topic
of access to content, e.g. by caching · CPC title
Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking · CPC title