Save session storage space by identifying similar contents and computing difference

US10579696B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10579696-B2
Application number: US-201815902428-A
Country: US
Kind code: B2
Filing date: Feb 22, 2018
Priority date: Feb 22, 2018
Publication date: Mar 3, 2020
Grant date: Mar 3, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An approach is provided in which an information handling system begins by storing a first URL and a corresponding first web page dataset. The information handling system then receives a request to store a second URL and a corresponding second web page dataset. The information handling system determines that the second URL corresponds to the first URL and, as such, the information handling system creates a diff web page dataset based on a difference between the first web page dataset and the second web page dataset. In turn, the information handling system stores the second URL and the diff web page dataset.
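
The store, diff, and reconstruct flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the in-memory `store`, the `url_key` correspondence rule (same scheme, host, and path), and the `save`/`load` helpers are all hypothetical names and assumptions, and Python's `difflib.SequenceMatcher` stands in for the word-based suffix tree named in the claims. Pages are split on whitespace, so a reconstructed page is whitespace-normalized.

```python
import difflib
from urllib.parse import urlsplit

store = {}  # hypothetical in-memory session store: URL -> record


def url_key(url):
    """Assumed rule for 'corresponding' URLs: same scheme, host, and path."""
    parts = urlsplit(url)
    return (parts.scheme, parts.netloc, parts.path)


def make_diff(base_words, new_words):
    """Record only what is needed to rebuild new_words from base_words."""
    ops = []
    matcher = difflib.SequenceMatcher(a=base_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse a span of the base page
        elif tag != "delete":                      # "replace" or "insert"
            ops.append(("add", new_words[j1:j2]))  # words unique to the new page
    return ops


def save(url, page):
    """Store a full page, or a diff if a corresponding URL is already stored."""
    words = page.split()
    base_url = next(
        (u for u, r in store.items()
         if r["kind"] == "full" and u != url and url_key(u) == url_key(url)),
        None,
    )
    if base_url is None:
        store[url] = {"kind": "full", "words": words}
    else:
        ops = make_diff(store[base_url]["words"], words)
        store[url] = {"kind": "diff", "link": base_url, "ops": ops}


def load(url):
    """Return the page text, following the link and applying the diff if needed."""
    record = store[url]
    if record["kind"] == "full":
        return " ".join(record["words"])
    base_words = store[record["link"]]["words"]
    out = []
    for op in record["ops"]:
        if op[0] == "copy":
            out.extend(base_words[op[1]:op[2]])
        else:
            out.extend(op[1])
    return " ".join(out)
```

For example, saving two pages under `https://example.com/cart?session=1` and `https://example.com/cart?session=2` (illustrative URLs) stores the second as a diff record whose `link` points at the first, and `load` on the second URL rebuilds its text from the base page plus the stored operations.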

First claim

Opening claim text (preview).

The invention claimed is:

1. A method implemented by an information handling system that includes a memory and a processor, the method comprising: storing a first URL and a corresponding first web page dataset; receiving a request to store a second URL and a corresponding second web page dataset; in response to determining that the second URL corresponds to the first URL, creating a diff web page dataset based on a difference between the first web page dataset and the second web page dataset, wherein the creating of the diff web page dataset further comprises: creating a word-based suffix tree based on the first web page dataset and the second web page dataset; and using the word-based suffix tree in the determining of the difference between the first web page dataset and the second web page dataset; and storing the second URL and the diff web page dataset.

2. The method of claim 1 wherein the creating of the diff web page dataset further comprises: determining a longest common string in the word-based suffix tree; and using the longest common string in the determining of the difference between the first web page dataset and the second web page dataset.

3. The method of claim 2 wherein, prior to creating of the word-based suffix tree, the method further comprises: removing data from the first web page dataset pertaining to first white space; and removing data from the second web page dataset pertaining to second white space.

4. The method of claim 1 wherein the first web page dataset is requested by a first user and the second web page dataset is requested by a second user that is different from the first user.

5. The method of claim 1 wherein the diff web page dataset comprises a link to the first web page dataset and also comprises difference data that is based on the difference between the first web page dataset and the second web page dataset.

6. The method of claim 5 further comprising: receiving a request to reconstruct the second web page dataset; in response to receiving the request to reconstruct the second web page dataset, retrieving the link to the first web page dataset from the diff web page dataset; using the retrieved link to retrieve the first web page dataset, applying the difference data to the retrieved first web page dataset to create a reconstructed second web page dataset; and providing the reconstructed second web page dataset.

7. The method of claim 1 wherein the diff web page dataset is smaller in data size compared against the second web page dataset.

8. An information handling system comprising: one or more processors; a memory coupled to at least one of the processors; a set of computer program instructions stored in the memory and executed by at least one of the processors in order to perform actions of: storing a first URL and a corresponding first web page dataset; receiving a request to store a second URL and a corresponding second web page dataset; in response to determining that the second URL corresponds to the first URL, creating a diff web page dataset based on a difference between the first web page dataset and the second web page dataset, wherein the creating of the diff web page dataset further comprises: creating a word-based suffix tree based on the first web page dataset and the second web page dataset; and using the word-based suffix tree in the determining of the difference between the first web page dataset and the second web page dataset; and storing the second URL and the diff web page dataset.

9. The information handling system of claim 8 wherein the processors perform additional actions comprising: determining a longest common string in the word-based suffix tree; and using the longest common string in the determining of the difference between the first web page dataset and the second web page dataset.

10. The information handling system of claim 9 wherein, prior to creating of the word-based suffix tree, the processors perform additional actions comprising: removing data from the first web page dataset pertaining to first white space; and removing data from the second web page dataset pertaining to second white space.

11. The information handling system of claim 8 wherein the first web page dataset is requested by a first user and the second web page dataset is requested by a second user that is different from the first user.

12. The information handling system of claim 8 wherein the diff web page dataset comprises a link to the first web page dataset and also comprises difference data that is based on the difference between the first web page dataset and the second web page dataset.

13. The information handling system of claim 12 wherein the processors perform additional actions comprising: receiving a request to reconstruct the second web page dataset; in response to receiving the request to reconstruct the second web page dataset, retrieving the link to the first web page dataset from the diff web page dataset; using the retrieved link to retrieve the first web page dataset, applying the difference data to the retrieved first web page dataset to create a reconstructed second web page dataset; and providing the reconstructed second web page dataset.

14. The information handling system of claim 8 wherein the diff web page dataset is smaller in data size compared against the second web page dataset.

15. A computer program product stored in a computer readable storage medium, comprising computer program code that, when executed by an information handling system, causes the information handling system to perform actions comprising: storing a first URL and a corresponding first web page dataset; receiving a request to store a second URL and a corresponding second web page dataset; in response to determining that the second URL corresponds to the first URL, creating a diff web page dataset based on a difference between the first web page dataset and the second web page dataset, wherein the creating of the diff web page dataset further comprises: creating a word-based suffix tree based on the first web page dataset and the second web page dataset; and using the word-based suffix tree in the determining of the difference between the first web page dataset and the second web page dataset; and storing the second URL and the diff web page dataset.

16. The computer program product of claim 15 wherein the information handling system performs further actions comprising: determining a longest common string in the word-based suffix tree; and using the longest common string in the determining of the difference between the first web page dataset and the second web page dataset.

17. The computer program product of claim 16 wherein, prior to creating of the word-based suffix tree, the information handling system performs further actions comprising: removing data from the first web page dataset pertaining to first white space; and removing data from the second web page dataset pertaining to second white space.

18. The computer program product of claim 15 wherein the first web page dataset is requested by a first user and the second web page dataset is requested by a second user that is different from the first user.

19. The computer program product of claim 15 wherein the diff web page dataset comprises a link to the first web page dataset and also comprises difference data that is based on the difference between the first web page dataset and the second web page dataset.

20. The computer program p
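
Claims 1 to 3 describe stripping white space, finding a longest common string over words, and using it to determine the difference. The sketch below illustrates that idea under stated assumptions: it uses a simple dynamic-programming search for the longest common run of words in place of the word-based suffix tree the claims require (a suffix tree would find the same run more efficiently), and the function names `tokenize`, `longest_common_run`, and `word_diff` are hypothetical.

```python
def tokenize(page):
    """Drop all white space and keep the words, as a stand-in for claim 3."""
    return page.split()


def longest_common_run(a, b):
    """Return (start_in_a, start_in_b, length) of the longest shared word run.

    Dynamic programming substitute for a word-based suffix tree lookup.
    """
    best_len, best_a, best_b = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the run ending at i-1, j-1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_a, best_b = i - curr[j], j - curr[j]
        prev = curr
    return best_a, best_b, best_len


def word_diff(a, b):
    """Split both word lists around the longest common run, then recurse."""
    if not a and not b:
        return []
    ia, ib, n = longest_common_run(a, b)
    if n == 0:
        # Nothing shared: everything in a was removed, everything in b added.
        return [(tag, words) for tag, words in (("removed", a), ("added", b)) if words]
    return (
        word_diff(a[:ia], b[:ib])
        + [("same", a[ia:ia + n])]
        + word_diff(a[ia + n:], b[ib + n:])
    )


first = tokenize("the quick   brown fox")
second = tokenize("the slow brown\n fox")
print(word_diff(first, second))
```

Only the `removed` and `added` entries differ between the two pages; the `same` entries can be replaced by references into the first page, which is what lets a stored diff be smaller than the second page itself.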

Assignees

Inventors

Classifications

  • URL specific, e.g. using aliases, detecting broken or misspelled links · CPC title

  • Display of layout of documents; Previewing · CPC title

  • Version control (for software G06F8/71) · CPC title

  • Access to data in other repository systems, e.g. legacy data or dynamic Web page generation · CPC title

  • of access to content, e.g. by caching · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10579696B2 cover?
An approach is provided in which an information handling system begins by storing a first URL and a corresponding first web page dataset. The information handling system then receives a request to store a second URL and a corresponding second web page dataset. The information handling system determines that the second URL corresponds to the first URL and, as such, the information handling syste…
Who is the assignee on this patent?
IBM, International Business Machines Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/9566. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 3, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).