US9552435B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9552435-B2 |
| Application number | US-201113997257-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 22, 2011 |
| Priority date | Dec 22, 2010 |
| Publication date | Jan 24, 2017 |
| Grant date | Jan 24, 2017 |
Abstract
The present application discloses methods and systems for incrementally collecting replies in a forum and belongs to the technical field of network information collection. The method comprises periodically determining whether any of the forum list pages to be collected contains a newly-established post or a post with new replies; if so, extracting the main post and reply information from the newly-established post, and extracting information on the new replies from the post with new replies. The system comprises a determining device (11) for periodically determining whether any of the forum list pages to be collected contains a newly-established post or a post with new replies, and an extracting device (12) for extracting the main post and reply information from the newly-established post and extracting information on the new replies from the post with new replies. The present application can quickly, accurately, and completely collect the main post and all replies of a post. This overcomes the drawback that a general search engine misses, or cannot search, the information on the later (turned) pages of a post.
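The determining step described in the abstract (and detailed in claims 1 and 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The information list of collected posts is modeled as an in-memory dictionary that maps each post's first-page URL to its last recorded reply count. `PostEntry`, `classify_posts`, and the example URLs are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostEntry:
    """One post as listed on a forum list page (hypothetical structure)."""
    first_page_url: str  # URL of the post's first page; identifies the post
    reply_count: int     # number of current replies shown on the list page


def classify_posts(entries, collected):
    """Split list-page entries into newly-established posts and posts with new replies.

    `collected` stands in for the information list of collected posts
    (assumed model): first-page URL -> reply count recorded at the previous pass.
    It is updated in place with the current reply counts.
    """
    new_posts, updated_posts = [], []
    for entry in entries:
        previous = collected.get(entry.first_page_url)
        if previous is None:
            new_posts.append(entry)                    # URL never recorded: new post
        elif entry.reply_count > previous:
            updated_posts.append((entry, previous))    # reply count grew: new replies
        else:
            continue                                   # unchanged post: nothing to collect
        collected[entry.first_page_url] = entry.reply_count
    return new_posts, updated_posts


if __name__ == "__main__":
    collected = {"https://forum.example/t/1": 5}
    entries = [PostEntry("https://forum.example/t/1", 8),
               PostEntry("https://forum.example/t/2", 0)]
    new_posts, updated_posts = classify_posts(entries, collected)
    print([e.first_page_url for e in new_posts])       # ['https://forum.example/t/2']
    print([(e.first_page_url, prev) for e, prev in updated_posts])  # [('https://forum.example/t/1', 5)]
```

A post whose reply count has not grown is skipped. Each pass therefore touches only new or changed threads, which is what makes the collection incremental. The previous count returned with each updated post is the value an offset calculation would start from.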
Claims (excerpt)
What is claimed is:

1. A computer-implemented method for incrementally collecting replies in a forum on a computer comprising a processor, the method comprising: determining, using the processor, whether there is a newly-established post or a post with new replies in a forum list page, according to a URL of a first page of the post and number of replies to the post; if it is determined that there is a newly-established post, extracting, using the processor, a main post of the newly-established post and reply information from the newly-established post; if it is determined that there is a post with new replies, calculating, using the processor, an origination and a number of the new replies to, based on the calculated origination and the calculated number, extract the new replies, wherein the determining further comprises: acquiring, using the processor, each URL of list page from a collection queue of list pages recording URLs of the at least one forum list page; retrieving, using the processor, the URL of the first page of each post and the number of current replies from webpage contents corresponding to the acquired URL; and determining, using the processor, if the post has been recorded in an information list of collected posts according to the retrieved URL of the first page, if not, determining, using the processor, that the post is a newly-established post, and the method further comprises: adding, using the processor, the retrieved URL of the first page and the retrieved number of current replies into an information list of collected posts.

2. The method according to claim 1, wherein the determining further comprises: retrieving a URL of the first page of each post and the number of current replies from webpage contents corresponding to URLs of the forum list page; determining whether the post exists in an information list of collected posts according to the retrieved URL of the first page, and whether the retrieved current number of replies is larger than a number of present replies recorded in said information list, if yes, it is determined that the post has a new reply.

3. The method according to claim 2, further comprising: adding the URL of the forum list page into a collection queue of forum list pages if a collection interval for the forum list page expires; retrieving URLs of list pages from the collection queue of forum list pages in a First-In-First-Out order.

4. The method according to claim 3, wherein the collection interval is dynamically adjustable according to an update frequency of the forum of the URLs of list pages.

5. The method according to claim 3, wherein the URLs retrieved from the collection queue of list pages meet a friendly access condition of the website of the retrieved URLs of list pages.

6. The method according to claim 2, further comprising: adding the URL of the first page of the newly-established post or the URL of the post with new replies into a collection queue of content pages; extracting the main post and/or reply and/or URLs of turned pages from the webpage contents corresponding to URLs of the forum list page.

7. The method according to claim 6, wherein, for the newly-established post, if the URL of the first page of the post exists in the collection queue of content pages, the method further comprises: extracting the URL of the first page of the post; replacing a record of a number of present replies of the post in the information list of collected posts with the number of current replies; inserting the URL of the first page of the post into the collection queue of content pages.

8. The method according to claim 6, wherein the retrieving of URLs of list pages from the collection queue of forum list pages comprises: acquiring the URLs of list pages from the collection queue of list pages in FIFO order, the acquired URLs meeting a friendly access condition of the website of the URLs of list pages.

9. The method according to claim 6, wherein extracting the main post and/or reply information from the webpage contents in step (iv) comprises: if the URL is the URL of the first page of the post and is collected for the first time, extracting the main post and reply information from the webpage contents corresponding to the URL; if the URL is the URL of the first page of the post but is not collected for the first time, calculating an origination of new replies $S'_{From}$ and the number of new replies $C'_{ParseCount}$ according to the following formulae, and extracting $C'_{ParseCount}$ new replies from the origination of new replies $S'_{From}$:

$$S'_{From} = \begin{cases} R_{PreNum}, & N_{PerPage} \text{ includes the main post} \\ R_{PreNum} + 1, & N_{PerPage} \text{ does not include the main post} \end{cases}$$
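The dependent claims add a scheduling layer (claims 3 to 5 and 8) and an offset rule for new replies (claim 9). The sketch below combines them under explicit assumptions. The friendly access condition is modeled as a minimum gap between fetches to the same host. The dynamic interval adjustment of claim 4 uses a hypothetical policy that halves the interval after updates and doubles it when a page is idle, clamped to a range. `ListPageScheduler`, `new_reply_origination`, and all parameters are illustrative names. Only the origination formula of claim 9 is implemented. The formula for the number of new replies is cut off in the excerpt and is not reconstructed here.

```python
from collections import deque
from urllib.parse import urlsplit


def new_reply_origination(recorded_replies: int, page_count_includes_main_post: bool) -> int:
    """Origination of new replies per the claim 9 formula.

    recorded_replies plays the role of R_PreNum (replies already recorded);
    the flag says whether the per-page count N_PerPage includes the main post.
    """
    return recorded_replies if page_count_includes_main_post else recorded_replies + 1


class ListPageScheduler:
    """FIFO collection queue of forum list pages (hypothetical sketch of claims 3-5 and 8)."""

    def __init__(self, intervals, min_host_gap, min_interval=10.0, max_interval=86400.0):
        self.intervals = dict(intervals)              # list-page URL -> collection interval (s)
        self.next_due = {url: 0.0 for url in self.intervals}
        self.queue = deque()                          # list pages awaiting collection, FIFO
        self.last_access = {}                         # host -> time of the most recent fetch
        self.min_host_gap = min_host_gap              # assumed "friendly access" rule
        self.min_interval = min_interval
        self.max_interval = max_interval

    def enqueue_expired(self, now):
        """Append every list page whose collection interval has expired."""
        for url, due in list(self.next_due.items()):
            if due <= now:
                self.queue.append(url)
                self.next_due[url] = float("inf")     # rescheduled in finish()

    def pop_ready(self, now):
        """Return the earliest-queued URL whose host may be accessed now, else None."""
        for i, url in enumerate(self.queue):
            host = urlsplit(url).netloc
            if now - self.last_access.get(host, float("-inf")) >= self.min_host_gap:
                del self.queue[i]                     # safe: we return immediately
                self.last_access[host] = now
                return url
        return None

    def finish(self, url, now, found_updates):
        """Adjust the interval (hypothetical policy) and schedule the next collection."""
        interval = self.intervals[url]
        interval = interval / 2 if found_updates else interval * 2
        self.intervals[url] = min(max(interval, self.min_interval), self.max_interval)
        self.next_due[url] = now + self.intervals[url]
```

`pop_ready` skips past entries whose host was accessed too recently rather than rotating them to the back. The remaining entries therefore stay in FIFO order, and a busy site delays only its own list pages.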
Classifications
- CPC: Using information identifiers, e.g. uniform resource locators [URL]
- CPC: Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- CPC: Office automation; Time management
- CPC: URL specific, e.g. using aliases, detecting broken or misspelled links
- Mapped topic: Physics