Method and system for incremental collection of forum replies

US9552435B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9552435-B2
Application number: US-201113997257-A
Country: US
Kind code: B2
Filing date: Dec 22, 2011
Priority date: Dec 22, 2010
Publication date: Jan 24, 2017
Grant date: Jan 24, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present application discloses methods and systems for incrementally collecting replies in a forum and belongs to the technical field of collecting network information. The method comprises periodically determining whether there is a newly-established post or a post having new replies in all forum list pages that need to be collected; if so, extracting the main post and reply information from the newly-established post, and extracting the information of the new replies from the post having new replies. The system comprises a determining device (11) for periodically determining whether there is a newly-established post or a post having new replies in all forum list pages that need to be collected; and an extracting device (12) for extracting the main post and reply information from the newly-established post, and extracting the information of the new replies from the post having new replies. The present application can quickly, accurately and completely collect the main post and all replies of a post, overcoming the drawback that information on the turned pages of a post is missed or cannot be found through a general search engine.
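The determination step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes, as the first claim on this page states, that a post is identified by the URL of its first page and compared by reply count. The class name `CollectedPosts`, the status strings, and the example URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CollectedPosts:
    """Hypothetical 'information list of collected posts':
    maps the URL of a post's first page to the reply count last recorded."""
    reply_counts: dict[str, int] = field(default_factory=dict)

    def classify(self, first_page_url: str, current_replies: int) -> str:
        """Return 'new', 'updated', or 'unchanged' for one list-page entry.

        'new'       -> URL not recorded yet (newly-established post)
        'updated'   -> URL recorded, reply count has grown (post with new replies)
        'unchanged' -> nothing to collect
        The record is refreshed whenever something needs collecting.
        """
        recorded = self.reply_counts.get(first_page_url)
        if recorded is None:
            status = "new"
        elif current_replies > recorded:
            status = "updated"
        else:
            return "unchanged"
        self.reply_counts[first_page_url] = current_replies
        return status


# Two passes over the same (assumed) list page, one collection interval apart.
seen = CollectedPosts()
first_pass = [("https://forum.example/t/1", 3), ("https://forum.example/t/2", 0)]
second_pass = [("https://forum.example/t/1", 5), ("https://forum.example/t/2", 0)]

first_result = [seen.classify(url, n) for url, n in first_pass]
second_result = [seen.classify(url, n) for url, n in second_pass]
```

In this sketch, only posts classified as `new` or `updated` would be handed to the extraction step, which is what keeps the collection incremental.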

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for incrementally collecting replies in a forum on a computer comprising a processor, the method comprising: determining, using the processor, whether there is a newly-established post or a post with new replies in a forum list page, according to a URL of a first page of the post and number of replies to the post; if it is determined that there is a newly-established post, extracting, using the processor, a main post of the newly-established post and reply information from the newly-established post; if it is determined that there is a post with new replies, calculating, using the processor, an origination and a number of the new replies to, based on the calculated origination and the calculated number, extract the new replies, wherein the determining further comprises: acquiring, using the processor, each URL of list page from a collection queue of list pages recording URLs of the at least one forum list page; retrieving, using the processor, the URL of the first page of each post and the number of current replies from webpage contents corresponding to the acquired URL; and determining, using the processor, if the post has been recorded in an information list of collected posts according to the retrieved URL of the first page, if not, determining, using the processor, that the post is a newly-established post, and the method further comprises: adding, using the processor, the retrieved URL of the first page and the retrieved number of current replies into an information list of collected posts.

2. The method according to claim 1, wherein the determining further comprises: retrieving a URL of the first page of each post and the number of current replies from webpage contents corresponding to URLs of the forum list page; determining whether the post exists in an information list of collected posts according to the retrieved URL of the first page, and whether the retrieved current number of replies is larger than a number of present replies recorded in said information list, if yes, it is determined that the post has a new reply.

3. The method according to claim 2, further comprising: adding the URL of the forum list page into a collection queue of forum list pages if a collection interval for the forum list page expires; retrieving URLs of list pages from the collection queue of forum list pages in a First-In-First-Out order.

4. The method according to claim 3, wherein the collection interval is dynamically adjustable according to an update frequency of the forum of the URLs of list pages.

5. The method according to claim 3, wherein the URLs retrieved from the collection queue of list pages meet a friendly access condition of the website of the retrieved URLs of list pages.

6. The method according to claim 2, further comprising: adding the URL of the first page of the newly-established post or the URL of the post with new replies into a collection queue of content pages; extracting the main post and/or reply and/or URLs of turned pages from the webpage contents corresponding to URLs of the forum list page.

7. The method according to claim 6, wherein, for the new-established post, if the URL of the first page of the post exists in the collection queue of content pages, the method further comprises: extracting the URL of the first page of the post; replacing a record of a number of present replies of the post in the information list of collected posts with the number of current replies; inserting the URL of the first page of the post into the collection queue of content pages.

8. The method according to claim 6, wherein the retrieving of URLs of list pages from the collection queue of forum list pages comprises: acquiring the URLs of list pages from the collection queue of list pages in order of FIFO, the acquired URLs meeting a friendly access condition of the website of the URLs of list pages.

9. The method according to claim 6, wherein extracting the main post and/or reply information from the webpage contents in step (iv) comprises: if the URL is the URL of the first page of the post and is collected for the first time, extracting the main post and reply information from the webpage contents corresponding to the URL; if the URL is the URL of the first page of the post but is not collected for the first time, calculating an origination of new replies $S'_{From}$ and the number of new replies $C'_{ParseCount}$ according to the following formulae, and extracting $C'_{ParseCount}$ new replies from the origination of new replies $S'_{From}$.

$$
S'_{From} =
\begin{cases}
R_{PreNum}, & N_{PerPage}\ \text{includes main post} \\
R_{PreNum} + 1, & N_{PerPage}\ \text{does not include main post}
\end{cases}
$$
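Claims 3 and 9 can be illustrated with a short sketch. It rests on several assumptions that are not confirmed by this preview: the class `ListPageScheduler`, its method names, and the per-URL interval table are hypothetical, and $R_{PreNum}$ is taken to mean the number of replies already recorded for the post, since the preview does not define it. The formula for $C'_{ParseCount}$ is cut off in the preview, so it is deliberately not implemented here.

```python
from collections import deque


class ListPageScheduler:
    """Hypothetical collection queue of forum list pages (claim 3):
    a list-page URL is enqueued when its collection interval expires,
    and URLs are retrieved in First-In-First-Out order."""

    def __init__(self, intervals):
        # intervals: list-page URL -> collection interval in seconds (assumed unit)
        self.intervals = dict(intervals)
        self.next_due = {url: 0.0 for url in self.intervals}
        self.queue = deque()

    def enqueue_due(self, now):
        """Add every list page whose interval has expired at time `now`."""
        for url, due in self.next_due.items():
            if now >= due and url not in self.queue:
                self.queue.append(url)
                self.next_due[url] = now + self.intervals[url]

    def pop(self):
        """Return the oldest queued URL, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None


def new_reply_origin(r_pre_num, per_page_includes_main_post):
    """Origination of new replies S'_From per the piecewise formula in claim 9.

    r_pre_num is ASSUMED to be the previously recorded reply count;
    the flag says whether N_PerPage counts the main post.
    """
    return r_pre_num if per_page_includes_main_post else r_pre_num + 1


# Usage: "a" is due every 10 s, "b" every 30 s (illustrative values only).
scheduler = ListPageScheduler({"a": 10, "b": 30})
scheduler.enqueue_due(now=0)
first_round = [scheduler.pop(), scheduler.pop()]
scheduler.enqueue_due(now=10)
second_round = [scheduler.pop(), scheduler.pop()]
```

Passing `now` explicitly instead of reading the clock inside the scheduler keeps the sketch deterministic. Claim 4's dynamically adjusted interval could be modelled by updating `intervals[url]` between rounds, but that adjustment rule is not given in this preview.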

Assignees

Inventors

Classifications

  • using information identifiers, e.g. uniform resource locators [URL] · CPC title

  • Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking · CPC title

  • G06Q10/10 · Primary

    Office automation; Time management · CPC title

  • URL specific, e.g. using aliases, detecting broken or misspelled links · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9552435B2 cover?
The present application discloses methods and systems for incrementally collecting replies in a forum and belongs to the technical field of collecting network information. The method comprises periodically determining whether there is a newly-established post and a post having new replies in all forum list pages needed to be collected: if yes, extracting a main post and reply information from t…
Who is the assignee on this patent?
Wu Xinli, Yang Jianwu, Univ Peking Founder Group Co, and 3 more
What technology area does this patent fall under?
Primary CPC classification G06Q10/10. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 24, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).