Service packaging method based on web page segmentation and search algorithm

US12050652B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12050652-B2
Application number  US-201917614978-A
Country             US
Kind code           B2
Filing date         Nov 15, 2019
Priority date       May 27, 2019
Publication date    Jul 30, 2024
Grant date          Jul 30, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention provides a service packaging method based on web page segmentation and search algorithm, comprising the following steps: a service extraction stage, comprising dynamic packaging and/or static packaging; for dynamic packaging, parsing a dynamic web page, tagging forms that possibly exist in parsed dynamic form information, and tagging and defining, by a user, desired forms among the forms that possibly exist; for static packaging, parsing a static web page, blocking and tagging parsed static forms, and selecting and defining, by the user, desired blocks, and filling in a name, description information and an extraction rule of a service; and a service calling stage, comprising inputting, by the user, related information for calling a service, and generating, by a back end system, a corresponding service according to the received related information for calling the service and according to the extraction rule, and returning the corresponding service to a front end. The present invention greatly increases the efficiency of acquiring data by a user.
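The dynamic-packaging step in the abstract (parse a page, then tag the forms it contains so a user can pick one) can be illustrated with a minimal sketch. It assumes the standard-library `html.parser` as a stand-in for the crawler stack; the `FormTagger` class, the `tag_forms` helper, and the sample page are hypothetical names introduced only for this example and do not come from the patent.

```python
from html.parser import HTMLParser


class FormTagger(HTMLParser):
    """Collects each <form> with its input boxes and submit buttons (illustrative)."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form>, numbered in document order
        self._current = None  # the form that is currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "number": len(self.forms) + 1,
                "action": attrs.get("action", ""),
                "inputs": [],
                "submits": [],
            }
            self.forms.append(self._current)
        elif self._current is not None and tag in ("input", "button"):
            # HTML defaults: <button> submits, <input> is a text box.
            kind = attrs.get("type", "submit" if tag == "button" else "text")
            name = attrs.get("name", "")
            if kind == "submit":
                self._current["submits"].append(name)
            elif kind != "hidden":
                self._current["inputs"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def tag_forms(html):
    """Return the numbered forms found in an HTML string."""
    parser = FormTagger()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical sample page with two forms.
SAMPLE = """
<html><body>
  <form action="/search">
    <input type="text" name="q">
    <input type="hidden" name="token" value="x">
    <input type="submit" name="go">
  </form>
  <p>No form here</p>
  <form action="/login">
    <input name="user">
    <button name="login">Log in</button>
  </form>
</body></html>
"""

forms = tag_forms(SAMPLE)
for form in forms:
    print(form["number"], form["action"], form["inputs"], form["submits"])
```

The numbered list returned here plays the role of the tagged forms a user would choose from; locating the boxes on screen with image processing, as the claims describe, is outside this sketch.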

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented service packaging method based on web page segmentation and search algorithm, comprising following steps executed by a processor:

conducting a service extraction stage comprising dynamic packaging and static packaging, wherein, for dynamic packaging, parsing a dynamic web page, tagging forms that exist in parsed dynamic form information, and tagging and defining, by a user, desired forms among the forms that exist; for static packaging, parsing a static web page, blocking and tagging parsed static forms, and selecting and defining, by the user, desired blocks, and filling in a name, description information and an extraction rule of a service;

conducting a service calling stage, wherein the user inputs related information for calling a service, and generating, by a back end system, a respective service according to the received related information for calling the service and according to an extraction rule, and returning the service to a front end;

and wherein the dynamic packaging comprises following steps:

S1-1: parsing a dynamic web page, specifically comprising:
  S1-1-1: filling in, by a user, a Uniform Resource Locator (URL) address, the URL address being any web link accessible by Internet;
  S1-1-2: using crawler technology to crawl a source code of a web page corresponding to the URL address;
  S1-1-3: judging whether there is a <form> tag in a search page, converting the source code of the web page into a structured class data, and searching the <form> tag in the class data, and tagging the <form> tag;
  S1-1-4: constantly printing out parsed log information in a Graphical User Interface (GUI) display background;
  S1-1-5: using image processing technique to tag all form information that exist in the dynamic web page, and location of each input box and a submit button in each form;
S1-2: selecting, by the user, a form and defining input parameter information, specifically comprising:
  S1-2-1: independently selecting, by the user, whether the user needs to use the form, if the user does, selecting the form number, and if the user doesn't, skipping this step;
  S1-2-2: independently defining, by the user, a name and a sample value of each input box, and selecting a number of the submit button; and
  S1-2-3: submitting the information modified by the user to a background, and generating, by the background, a form extraction rule based on the information;

and wherein, the static packaging comprises:

S1-3: parsing a static web page, specifically comprising:
  S1-3-1: using crawler technology to crawl a source code of a web page corresponding to a Uniform Resource Locator (URL) address;
  S1-3-2: using a breadth-first search algorithm to search all items that exist in the static web page;
  S1-3-3: using page segmentation algorithm to merge all items with the same structure into a block;
  S1-3-4: using a weighted sorting algorithm to screen out at most 10 largest blocks;
  S1-3-5: using image processing technology to tag the screened blocks;
  S1-3-6: constantly printing out parsed log information in a Graphical User Interface (GUI) display background;
S1-4: selecting, by the user, the screened blocks and defining input parameter information;
  S1-4-1: independently selecting, by the user, a number of the screened blocks desired by the user;
  S1-4-2: defining, by the user, a name and description of data number in the screened blocks automatically analyzed by a system, and judging whether the screened blocks are desired;
  S1-4-3: filling in, by the user, a name and description information of a to-be-generated service;
  S1-4-4: submitting, by the system, the service information modified by the user and the extraction rule of each item to a service generation background in a JSON format;
S1-5: generating a service;
  S1-5-1: parsing, by the service generation background, the service information and the extraction rule information, and checking fault tolerance;
  S1-5-2: generating, by the service generation background, the service desired by the user, and an address, and a query parameter corresponding to calling the service, for waiting for calling;

wherein, a crawler tool of the crawler technology is Selenium+BeautifulSoup+Pyquery in Python™3.6.

2. The computer-implemented service packaging method according to claim 1, wherein, the breadth-first search algorithm is as follows: generating a Document Object Model (DOM) tree structure of the web page, creating a traversal sequence list, putting HyperText Markup Language (HTML) nodes in the traversal sequence list, traversing the traversal sequence list sequentially, putting child nodes of each node at end of the list until all the nodes are traversed.

3. The computer-implemented service packaging method according to claim 1, wherein, the weighted sorting algorithm is as follows: sorting a first 15 blocks to create a first list according to numbers of list items in each block from large to small; sorting another first 15 blocks to create a second list according to a block size of each block from large to small; and selecting intersection of the first and second lists and selecting first 10 blocks as a largest block finally selected.

4. The computer-implemented service packaging method according to claim 1, wherein, a specific process of the service calling stage is:

S2-1: filling in, by the user, a query parameter specified by the service, and calling Application Programming Interface (API);
S2-2: opening, by a calling background, a real Uniform Resource Locator (URL) address corresponding to the API by using crawler technology according to an address of the API called by the user;
S2-3: deciding, by the calling background, whether to fill in and query the form information according to the user's selection upon packaging the service;
S2-4: using, by the calling background, crawler technology to crawl a source code of the web page after the form is processed;
S2-5: extracting, by the system, related items in the page according to the stored extraction rule information, and performing structural conversion and generating a returned result according to a name and parameters of the returned result defined by the user;
S2-6: performing screening, by the calling background, on the returned result according to the query parameter of the user;
S2-7: returning, by the system, a calling result to the front end.

5. A computer-implemented service packaging method based on web page segmentation and search algorithm, comprising following steps executed by a processor:

conducting a service extraction stage comprising dynamic packaging and static packaging, wherein, for dynamic packaging, parsing a dynamic web page, tagging forms that exist in parsed dynamic form information, and tagging and defining, by a user, desired forms among the forms that exist; for static packaging, parsing a static web page, blocking and tagging parsed static forms, and selecting and defining, by the user, desired blocks, and filling in a name, description information and an extraction rule of a service;

conducting a service calling stage, wherein the user inputs related information for calling a service, and generating, by a back end system, a respective service according to the received related information for calling the service and according to an extraction rule, and returning the service to a front end;

and wherein the dynamic packaging comprises following steps:

S1-1: parsing a dynamic web page, specifically comprising:
  S1-1-1: filling in, by a user, a Uniform Resource Locator (URL) address, the URL address being any web link accessible by Internet;
  S1-1-2: using crawler technology to crawl a source code of a web page corresponding to the URL address;
  S1-1-3: judging whether there is a <form> tag in a search page, converting the s
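The breadth-first traversal of claim 2 and the weighted sorting of claim 3 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Node` and `Block` classes, the `size` measure, and the choice to keep the intersection in item-count order are all hypothetical, since the claim text does not specify them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a DOM node."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def breadth_first(root):
    """Claim 2 sketch: walk the list in order, appending each node's children at the end."""
    sequence = [root]
    index = 0
    while index < len(sequence):
        sequence.extend(sequence[index].children)
        index += 1
    return sequence


@dataclass
class Block:
    """Hypothetical merged block of same-structure items."""
    name: str
    item_count: int
    size: int  # assumed size measure, e.g. rendered area


def select_blocks(blocks, shortlist=15, keep=10):
    """Claim 3 sketch: intersect the top blocks by item count and by size."""
    by_items = sorted(blocks, key=lambda b: b.item_count, reverse=True)[:shortlist]
    by_size = sorted(blocks, key=lambda b: b.size, reverse=True)[:shortlist]
    size_names = {b.name for b in by_size}
    # Keeping the intersection in item-count order is an assumption.
    return [b for b in by_items if b.name in size_names][:keep]


# Hypothetical DOM: html -> (head -> title), (body -> (ul -> li, li), div)
dom = Node("html", [
    Node("head", [Node("title")]),
    Node("body", [Node("ul", [Node("li"), Node("li")]), Node("div")]),
])
order = [node.tag for node in breadth_first(dom)]
print(order)

# Hypothetical blocks; a shortlist of 2 makes the intersection visible.
blocks = [
    Block("A", item_count=30, size=500),
    Block("B", item_count=20, size=900),
    Block("C", item_count=10, size=100),
    Block("D", item_count=5, size=950),
]
print([b.name for b in select_blocks(blocks, shortlist=2)])
```

With the shortlist reduced to 2, only block B appears in both rankings, which shows how the intersection drops blocks that are large but sparse (D) or dense but small (A).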

Assignees

Univ Zhejiang

Inventors

Classifications

  • Presentation of query results · CPC title

  • Optimising the visualization of content, e.g. distillation of HTML documents · CPC title

  • using information identifiers, e.g. uniform resource locators [URL] · CPC title

  • G06F16/951 · Primary

    Indexing; Web crawling techniques · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12050652B2 cover?
The present invention provides a service packaging method based on web page segmentation and search algorithm, comprising the following steps: a service extraction stage, comprising dynamic packaging and/or static packaging; for dynamic packaging, parsing a dynamic web page, tagging forms that possibly exist in parsed dynamic form information, and tagging and defining, by a user, desired forms …
Who is the assignee on this patent?
Univ Zhejiang
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 30, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).