Content item selection for goal achievement
US-12175387-B2 · Dec 24, 2024 · US
US10311120B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10311120-B2 |
| Application number | US-201514627311-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 20, 2015 |
| Priority date | Aug 22, 2012 |
| Publication date | Jun 4, 2019 |
| Grant date | Jun 4, 2019 |
Official abstract text for this publication.
Various embodiments provide a method and an apparatus for identifying webpage type. The method includes: judging whether a web address to be classified matches with a webpage classification rule in at least two webpage classification rules; and determining the type of the webpage to be a type corresponding to a webpage classification rule which matches with the web address.
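The abstract describes two steps: check a web address against stored classification rules, then assign the type of the rule it matches. The sketch below shows that flow. It is a minimal illustration, not the patented implementation. It assumes each rule is a webpage type plus one regular expression per tier (domain name first, then path components in order) and that an address must have exactly as many tiers as the rule has patterns. The `RULES` contents, the function names, and the `fallback` callable are all hypothetical; `fallback` stands in for the URL-based machine-learning classifier that claim 1 uses when no rule matches.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rule base: (webpage type, [domain pattern, path-component patterns...]).
RULES = [
    ("article", [r"news\.example\.com", r"\d{4}", r"\d{2}", r"[a-z0-9-]+\.html"]),
    ("forum_thread", [r"bbs\.example\.com", r"thread", r"\d+"]),
]


def split_components(url):
    """Domain name followed by the non-empty path components of a web address."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    return [parts.hostname or ""] + [c for c in parts.path.split("/") if c]


def matches(components, patterns):
    """True if the tier counts agree and each component fully matches its pattern."""
    return len(components) == len(patterns) and all(
        re.fullmatch(p, c) for p, c in zip(patterns, components)
    )


def classify(url, rules=RULES, fallback=None):
    """Return the type of the first matching rule, else defer to `fallback`."""
    components = split_components(url)
    for webpage_type, patterns in rules:
        if matches(components, patterns):
            return webpage_type
    # No rule matched: a URL-based ML classifier (hypothetical) would be consulted here.
    return fallback(url) if fallback is not None else None
```

For example, `classify("http://news.example.com/2012/08/some-story.html")` returns `"article"`. An address that matches no rule returns whatever `fallback` returns, or `None` if no fallback is given.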
Opening claim text (preview).
The invention claimed is:
1. A method for identifying webpage type, comprising: at a device having a processor and a screen, reading pre-stored web addresses of a webpage type, obtaining a collection of string components of the web addresses by parsing the web addresses; converging web addresses having at least one identical string component into one group according to a pre-defined converging method to generate multiple groups; determining that a coverage rate of a group meets a requirement in response to a determination that a total number of webpages in the group is smaller than or equal to a first threshold and determining that an identification accuracy of the group meets the requirement in response to a determination that an entropy is smaller than a second threshold; determining the coverage rate and the identification accuracy of the group do not meet the requirement in response to a determination that the total number of webpages in the group is larger than the first threshold or the entropy is larger than or equal to a second threshold; wherein the entropy satisfies $E=\sum_{i=1}^{n} p_i \log(p_i)$, wherein $n$ is the total number of webpages in the group, $p_i$ is a probability of webpages of a same type occurring in the group; terminating converging in response to the determination that the coverage rate and the identification accuracy meet the requirement; generating a webpage classification rule using the multiple groups and the webpage type, and storing the webpage classification rule into a webpage classification rule base; judging whether a web address of a webpage to be classified matches a webpage classification rule; determining a type of the webpage to be a type corresponding to a webpage classification rule which matches the web address; in response to a judgment that the web address of the webpage to be classified does not match the webpage classification rule, using a classifier trained using a machine learning algorithm based on web addresses to determine the webpage type of the webpage to be classified; extracting a content of the webpage selectively according to the webpage type; and displaying, on the screen, the content to a user in a pre-defined manner corresponding to the webpage type.
2. The method of claim 1, wherein the webpage classification rule comprises a string expression associated with a webpage type, the string expression is extracted from a plurality of first web addresses pre-classified into the webpage type, the string expression describes characteristics shared by the plurality of first web addresses and comprises a first description string component describing a domain name and at least a second description string component describing at least one web address string component sequentially following the domain name in the plurality of first web addresses.
3. The method of claim 2, wherein judging whether the web address of the webpage to be classified matches the webpage classification rule in at least two webpage classification rule components comprises: extracting domain name and another string component from the web address of the webpage to be classified; and judging whether the extracted domain name and string components matches string expression corresponding to the webpage classification rule.
4. The method of claim 3, wherein judging whether the extracted domain name and string components matches string expression corresponding to the webpage classification rule comprises: determining whether the domain name matches the first description string component of the webpage classification rule; and determining whether the another string component matches the second description string component of the webpage classification rule.
5. The method of claim 3, further comprising: storing the web address and the webpage type of the web page to be classified in response to a determination that the domain name and the another string component of the web address of the webpage to be classified match the string expression of the webpage classification rule.
6. The method of claim 2, wherein extracting the string expression comprises: extracting a description of shared characteristics of web address string components in each of at least one tier of the plurality of first web addresses sequentially starting from a tier for domain names of the plurality of first web addresses to obtain at least one description corresponding to the at least one tier; sequentially arranging the at least one description corresponding to the at least one tier according to an order of the at least one tier arranged in the plurality of first web addresses to obtain the string expression.
7. The method of claim 6, wherein extracting the description of shared characteristics of the web address string components in each of at least one tier of the plurality of first web addresses comprises: arranging the plurality of first web addresses into a tree where a node for a web address string component of a first tier which follows a second tier in web addresses serves as a child node of a node for a web address string component of the second tier; and converging a plurality of nodes at a same tier having a same parent node into one node whose value is a description of shared characteristics of the plurality of nodes; wherein the string expression comprises a value of each node which is an only child node of a parent node, the parent node is a node for domain name or a descendant node of the node for domain name.
8. The method of claim 6, wherein extracting the description of shared characteristics of web address string components comprises: converting each of the web address string components into a string according to a pre-determined converting method; and in response to a determination that the web address string components are converted into a same string, determining the string to be the description of shared characteristics of the web address string components.
9. An apparatus for identifying webpage type, comprising: at least one processor; a display screen; and memory for storing computer-readable instructions, wherein the at least one processor, when executing the computer-readable instructions, is configured to: read pre-stored web addresses of a webpage type, obtaining a collection of string components of the web addresses by parsing the web addresses; converge web addresses having at least one identical string component into one group according to a pre-defined converging method to generate multiple groups; determine that a coverage rate of a group meets a requirement in response to a determination that a total number of webpages in the group is smaller than or equal to a first threshold and determining that an identification accuracy of the group meets the requirement in response to a determination that an entropy is smaller than a second threshold; determine the coverage rate and the identification accuracy of the group do not meet the requirement in response to a determination that the total number of webpages in the group is larger than the first threshold or the entropy is larger than or equal to a second threshold; wherein the entropy satisfies $E=\sum_{i=1}^{n} p_i \log(p_i)$, wherein $n$ is the total number of webpages in the group, $p_i$ is a probability of webpages of a same type occurring in the group; terminate converging in response to the determination that the coverage rate and the identification accuracy meet the requirement; generate a webpage classification rule using the multiple groups and the webpage type, and storing the webpage classification rule into a webpage classification rule base; judge whether a web address of a webpage to be classified matches a we
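Claim 1's grouping step and its stopping test can be illustrated with a short sketch. URLs are grouped by a shared prefix of string components, and the prefix is lengthened until every group is small enough (the coverage test) and has entropy below a threshold (the accuracy test). The prefix-depth grouping is a hypothetical stand-in for the unspecified "pre-defined converging method", and `max_pages` and `max_entropy` are free parameters. The sketch also assumes the entropy is taken over the distribution of webpage types within a group, using the conventional Shannon sign $H=-\sum p_i \log p_i$. The claim as printed omits that leading minus sign.

```python
import math
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def components(url):
    """Domain name followed by the non-empty path components of a URL."""
    parts = urlsplit(url)
    return [parts.hostname or ""] + [c for c in parts.path.split("/") if c]


def type_entropy(types):
    """Shannon entropy (natural log) of the webpage-type distribution in one group."""
    n = len(types)
    counts = Counter(types)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def group_meets_requirement(types, max_pages, max_entropy):
    """Coverage test: group size <= max_pages; accuracy test: entropy < max_entropy."""
    coverage_ok = len(types) <= max_pages
    accuracy_ok = type_entropy(types) < max_entropy
    return coverage_ok and accuracy_ok


def converge(labeled_urls, max_pages, max_entropy, max_depth=10):
    """Hypothetical converging method: group (url, type) pairs that share their
    first `depth` components, increasing `depth` until every group passes both
    tests or `max_depth` is reached. Returns {component prefix: [types]}."""
    groups = {}
    for depth in range(1, max_depth + 1):
        groups = defaultdict(list)
        for url, webpage_type in labeled_urls:
            key = tuple(components(url)[:depth])
            groups[key].append(webpage_type)
        if all(group_meets_requirement(t, max_pages, max_entropy)
               for t in groups.values()):
            break  # terminate converging: every group meets the requirement
    return dict(groups)
```

Consider four labeled URLs on one domain, two under `/news/` and two under `/bbs/`. With `max_pages=2`, the single domain-level group fails the size test. The search then stops at depth 2, leaving two groups that each contain only one type.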
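Claims 6 and 8 describe building a rule's string expression tier by tier, starting from the domain name. A tier's description is kept only when every component at that tier converts to the same string. The sketch below uses a hypothetical converting method: escape the literal text and collapse each run of digits into `\d+`. It also replaces the tree of claim 7 with a simpler positional alignment of the URLs, so it only handles addresses whose tiers line up. All function names are illustrative.

```python
import re
from urllib.parse import urlsplit


def components(url):
    """Domain name followed by the non-empty path components of a URL."""
    parts = urlsplit(url)
    return [parts.hostname or ""] + [c for c in parts.path.split("/") if c]


def convert(component):
    """Hypothetical converting method: escape literal text, collapse digit runs into \\d+."""
    return r"\d+".join(re.escape(piece) for piece in re.split(r"\d+", component))


def extract_expression(urls):
    """Build a string expression tier by tier, starting at the domain name.

    A tier contributes a description only if every URL's component at that tier
    converts to the same string; extraction stops at the first tier where they differ.
    """
    if not urls:
        return []
    comps = [components(u) for u in urls]
    expression = []
    for tier in range(min(len(c) for c in comps)):
        converted = {convert(c[tier]) for c in comps}
        if len(converted) != 1:
            break  # components at this tier share no common description
        expression.append(converted.pop())
    return expression
```

For `http://news.example.com/2012/08/story-1.html` and `http://news.example.com/2013/11/story-22.html`, the result is `[r"news\.example\.com", r"\d+", r"\d+", r"story\-\d+\.html"]`. Each element is a regular expression describing one tier.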
Market segmentation · CPC title
Navigation, e.g. using categorised browsing · CPC title
Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking · CPC title
Retrieval from the web · CPC title
using information identifiers, e.g. uniform resource locators [URL] · CPC title