Automatic crawling of encoded dynamic URLs

US9785710B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9785710-B2
Application number: US-201113270806-A
Country: US
Kind code: B2
Filing date: Oct 11, 2011
Priority date: Oct 11, 2011
Publication date: Oct 10, 2017
Grant date: Oct 10, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer program product for crawling URLs that are encoded and highly dynamic, the computer program product includes a non-transitory computer readable storage medium having computer readable program code embodied therewith. The computer readable program code includes computer readable program code configured to retrieve navigational state information corresponding to a URL and compare the navigational state information to previously stored navigational state information corresponding to one or more previously visited URLs. The computer readable program code also includes computer readable program code configured to determine if the URL has been previously visited and retrieve content associated with the URL if the URL has not been previously visited.
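
The comparison step in the abstract can be illustrated with a minimal sketch. It assumes the navigational state arrives as a flat name-to-value mapping and that two URLs count as the same page when their states are equal; neither assumption is stated in the abstract. The names `state_key`, `fetch_if_new`, `get_state`, and `get_content` are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Assumption: navigational state is a flat mapping of names to values.
NavState = Dict[str, str]
StateKey = Tuple[Tuple[str, str], ...]


def state_key(state: NavState) -> StateKey:
    """Turn a state mapping into a hashable, order-independent key."""
    return tuple(sorted(state.items()))


def fetch_if_new(
    url: str,
    get_state: Callable[[str], NavState],
    get_content: Callable[[str], str],
    seen: Set[StateKey],
) -> Optional[str]:
    """Retrieve content only when the URL's state has not been seen before."""
    key = state_key(get_state(url))
    if key in seen:
        return None  # same state as a previously visited URL: skip it
    seen.add(key)
    return get_content(url)
```

Keying the visited set on the decoded state rather than on the URL text is what lets two differently encoded URLs for the same page be recognized as one visit.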

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product for crawling URLs that are encoded and highly dynamic, the computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code configured to: receiving an encoded URL corresponding to a web-portal page hosted by a web-portal server, wherein a text of the encoded URL does not provide information about content of the web-portal page, the content comprising a plurality of re-arrangeable portlets, and wherein the web-portal server provides, via a representational state transfer service, to a web-crawler, a navigational state information of the web-portal page using a predetermined format; initializing a list of parameters, wherein the list identifies that the web-portal page corresponding to the encoded URL has been visited, and determining the parameters to add to the list by: decoding the encoded URL by sending the encoded URL to the representational state transfer service from the web-portal server hosting the web-portal page; receiving the navigational state information of the web-portal page from the representational state transfer service, wherein the navigational state information is decoded from the encoded URL, and comprises a selection-node-ID and a resource-ID, the selection-node-ID being an identifier assigned to the web-portal page, and the resource-ID being an identifier associated with a resource used by the web-portal page, wherein, determining a type of the encoded URL based on the received representational state information, and when the type is determined to be a resource URL, storing the resource-ID in the list of parameters; and when the type is determined to be an engine URL, storing the selection-node-ID in the list of parameters, and when the type is determined to be a portlet, storing information of the portlet in the list of parameters; determining if the encoded URL has been previously visited based on the list of parameters; and retrieving the content associated with the encoded URL if the encoded URL has not been previously visited.

2. The computer program product of claim 1, wherein the computer readable program code is further configured to: analyze content associated with the encoded URL; and explore an additional URL found in the content associated with the encoded URL.

3. The computer program product of claim 2, wherein the computer readable program code is further configured to detect if the additional URL is a logout link prior to exploring the additional URL.

4. The computer program product of claim 1, wherein the navigational state information from the representational state transfer service is received in XML format.

5. The computer program product of claim 1, wherein the navigational state information includes information indicative of the type of the encoded URL.

6. The system of claim 1, wherein the web-portal server uses websphere portal framework.

7. A system, comprising: a computing network including a processing device in communication with one or more non-transitory computer memory storage devices; and the computing network further configured to implement a method comprising: receiving an encoded URL corresponding to a web-portal page hosted by a web-portal server, wherein a text of the encoded URL does not provide information about content of the web-portal page, the content comprising a plurality of re-arrangeable portlets, and wherein the web-portal server provides, via a representational state transfer service, to a web-crawler, a navigational state information of the web-portal page using a predetermined format; initializing a list of parameters, wherein the list identifies that the web-portal page corresponding to the encoded URL has been visited, and determining the parameters to add to the list by: decoding the encoded URL by sending the encoded URL to the representational state transfer service from the web-portal server hosting the web-portal page; receiving the navigational state information of the web-portal page from the representational state transfer service, wherein the navigational state information is decoded from the encoded URL, and comprises a selection-node-ID and a resource-ID, the selection-node-ID being an identifier assigned to the web-portal page, and the resource-ID being an identifier associated with a resource used by the web-portal page, wherein, determining a type of the encoded URL based on the received representational state information, and when the type is determined to be a resource URL, storing the resource-ID in the list of parameters; and when the type is determined to be an engine URL, storing the selection-node-ID in the list of parameters, and when the type is determined to be a portlet, storing information of the portlet in the list of parameters; determining if the encoded URL has been previously visited based on the list of parameters; and retrieving the content associated with the encoded URL if the encoded URL has not been previously visited.

8. The system of claim 7, wherein the web-portal server uses websphere portal framework.
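
The type-dependent bookkeeping in claims 1, 4, and 5 can be sketched as follows. The claims say only that the state comes back in XML, carries a type indication, and includes a selection-node-ID and a resource-ID; the element names, the `type` attribute, and the `portlet-info` element below are invented for illustration and are not the format of any real portal server. The functions `parameter_for` and `mark_visited` are likewise hypothetical, and a Python set stands in for the claimed "list of parameters".

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple

# Hypothetical mapping from URL type to the element whose value is stored.
SOURCE_FIELD = {
    "resource": "resource-id",
    "engine": "selection-node-id",
    "portlet": "portlet-info",
}


def parameter_for(state_xml: str) -> Tuple[str, str]:
    """Pick the parameter to store, based on the URL type in the state."""
    root = ET.fromstring(state_xml)
    # Collect the child elements into a simple tag -> text mapping.
    fields = {child.tag: (child.text or "").strip() for child in root}
    url_type = root.get("type", "")
    if url_type not in SOURCE_FIELD:
        raise ValueError(f"unrecognized URL type: {url_type!r}")
    return (url_type, fields[SOURCE_FIELD[url_type]])


def mark_visited(params: Set[Tuple[str, str]], state_xml: str) -> bool:
    """Store the parameter and return True if it is new, else return False."""
    param = parameter_for(state_xml)
    if param in params:
        return False
    params.add(param)
    return True
```

A crawler built this way would retrieve page content only when `mark_visited` returns True. Storing the type alongside the identifier keeps a resource-ID and a selection-node-ID with the same value from colliding.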

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9785710B2 cover?
A computer program product for crawling URLs that are encoded and highly dynamic, the computer program product includes a non-transitory computer readable storage medium having computer readable program code embodied therewith. The computer readable program code includes computer readable program code configured to retrieve navigational state information corresponding to a URL and compare the n…
Who is the assignee on this patent?
Brake Nevon C, Islam Obidul, Sharabani Adi, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F17/30864. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 10, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).