Method and device for classifying webpages

US10909427B2 · US · B2

Patent metadata
Publication number: US-10909427-B2
Application number: US-201615740092-A
Country: US
Kind code: B2
Filing date: Mar 31, 2016
Priority date: Jun 30, 2015
Publication date: Feb 2, 2021
Grant date: Feb 2, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and device for classifying webpages are provided. The method comprises: parsing a plurality of webpage elements from a webpage to be predicted; predicting a candidate webpage classification to which the webpage to be predicted belongs respectively according to respective webpage elements; and determining a final webpage classification of the webpage to be predicted by comparing the candidate webpage classifications predicted respectively based on the respective webpage elements.
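
The three steps in the abstract (parse elements, predict a candidate classification per element, compare candidates to pick a final one) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the `predictors` callables, the `category_text` descriptions, the naive two-label root-domain guess, the use of the title as a stand-in for the page text, and the Jaccard token overlap with a `min_similarity` threshold are all hypothetical choices made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> tag."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def parse_elements(url, html):
    """Step 1: parse webpage elements (here only root domain and title)."""
    parser = TitleParser()
    parser.feed(html)
    host = urlparse(url).hostname or ""
    # Assumption: the root domain is the last two host labels.
    # This is wrong for suffixes such as ".co.uk".
    root_domain = ".".join(host.split(".")[-2:])
    return {"root_domain": root_domain, "title": parser.title.strip()}


def jaccard(a, b):
    """Token-overlap similarity between two strings (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def classify(url, html, predictors, category_text, min_similarity=0.1):
    elements = parse_elements(url, html)
    # Step 2: each element contributes its own candidate classifications.
    candidates = set()
    for name, value in elements.items():
        candidates.update(predictors[name](value))
    # Step 3: compare candidates; keep the one most similar to the page.
    scored = [(jaccard(elements["title"], category_text[c]), c)
              for c in candidates]
    scored = [(s, c) for s, c in scored if s >= min_similarity]
    return max(scored)[1] if scored else None


# Hypothetical per-element predictors and category descriptions.
predictors = {
    "root_domain": lambda d: {"video"} if d == "example.com" else set(),
    "title": lambda t: {"novel"} if "novel" in t.lower() else set(),
}
category_text = {
    "video": "watch video movies online",
    "novel": "read novel chapters online",
}
page = "<html><head><title>Read novel chapters free</title></head></html>"
result = classify("http://www.example.com/a", page, predictors, category_text)
```

Here the root domain proposes one classification and the title proposes another; the comparison step resolves the disagreement in favor of the candidate whose description overlaps most with the page.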

First claim

Opening claim text (preview).

What is claimed is:

1. A device for classifying webpages, comprising: one or more processors; and a memory; wherein one or more programs are stored in the memory, and when executed by the one or more processors, the one or more programs cause the one or more processors to implement the following operations: creating respectively predicting models for predicting a webpage classification based on respective webpage elements; parsing a plurality of webpage elements from a webpage to be predicted; predicting a candidate webpage classification to which the webpage to be predicted belongs according to the predicting models corresponding to the respective webpage elements; and extracting a candidate webpage classification for which text similarity compared with the webpage to be predicted meets a precondition as a final webpage classification to which the webpage to be predicted belongs, wherein the creating respectively predicting models for predicting the webpage classification based on the respective webpage elements comprises: mining a plurality of webpage classifications and queries which belong to the webpage classifications based on search logs; and creating the predicting models for predicting the webpage classification based on the respective webpage elements according to the webpage elements and the related queries in the search logs; wherein in a case that the webpage elements comprise a root domain name, the predicting models comprise a first predicting model for predicting the webpage classification based on the root domain name, and the creating the predicting models for predicting the webpage classification based on the respective webpage elements according to the webpage elements and the related queries in the search logs comprises: extracting root domain names of historical webpages accessed correspondingly in the search logs; recording queries corresponding to the respective root domain names according to the historical webpages and queries for triggering the historical webpages in the search logs; calculating a first probability of the root domain name belonging to respective webpage classifications according to webpage classifications to which the queries belongs; and create the first predicting model comprising a judgment condition that the root domain name belongs to respective webpage classifications based on the first probability; the predicting a candidate webpage classification to which the webpage to be predicted belongs according to the predicting models corresponding to the respective webpage elements comprises: extracting a root domain name of the target webpage and inputting the root domain name of the target webpage into the first predicting model; and taking a webpage classification as the candidate webpage classification to which the target webpage belongs, in a case that the first predicting model determines that the first probability of which the root domain name of the target webpage belongs to the webpage classifications is greater than a first target probability.

2. A device for classifying webpages, comprising: one or more processors; and a memory; wherein one or more programs are stored in the memory, and when executed by the one or more processors, the one or more programs cause the one or more processors to implement the following operations: creating respectively predicting models for predicting a webpage classification based on respective webpage elements; parsing a plurality of webpage elements from a webpage to be predicted; predicting a candidate webpage classification to which the webpage to be predicted belongs according to the predicting models corresponding to the respective webpage elements; and extracting a candidate webpage classification for which text similarity compared with the webpage to be predicted meets a precondition as a final webpage classification to which the webpage to be predicted belongs, wherein the creating respectively predicting models for predicting the webpage classification based on the respective webpage elements comprises: mining a plurality of webpage classifications and queries which belong to the webpage classifications based on search logs; and creating the predicting models for predicting the webpage classification based on the respective webpage elements according to the webpage elements and the related queries in the search logs; wherein in a case that the webpage elements comprise a webpage title, the predicting models comprise a second predicting model for predicting the webpage classification based on the webpage title, and the creating the predicting models for predicting the webpage classification based on the respective webpage elements according to the webpage elements and the related queries in the search logs comprises: generating a first inverted index for retrieving the webpage classification based on a query according to the webpage classification and the query belonging to the webpage classification, and creating the second predicting model comprising the first inverted index; the predicting a candidate webpage classification to which the webpage to be predicted belongs according to the predicting models corresponding to the respective webpage element comprises: the candidate predicting module comprises: extracting a query comprised in the webpage title of the target webpage and inputting the query comprised in the webpage title of the target webpage into the second predicting model; and finding by the second predicting model a webpage classification corresponding to the query according to the first inverted index, and taking the found webpage classification as the candidate webpage classification to which the target webpage belongs.

3. A device for classifying webpages, comprising: one or more processors; and a memory; wherein one or more programs are stored in the memory, and when executed by the one or more processors, the one or more programs cause the one or more processors to implement the following operations: creating respectively predicting models for predicting a webpage classification based on respective webpage elements; parsing a plurality of webpage elements from a webpage to be predicted; predicting a candidate webpage classification to which the webpage to be predicted belongs according to the predicting models corresponding to the respective webpage elements; and extracting a candidate webpage classification for which text similarity compared with the webpage to be predicted meets a precondition as a final webpage classification to which the webpage to be predicted belongs, wherein the creating respectively predicting models for predicting the webpage classification based on the respective webpage elements comprises: mining a plurality of webpage classifications and queries which belong to the webpage classifications based on search logs; and creating the predicting models for predicting the webpage classification based on the respective webpage elements according to the webpage elements and the related queries in the search logs; wherein in a case that the webpage elements comprise a webpage title, the predicting models comprise a second predicting model for predicting the webpage classification based on the webpage title, and the creating the predicting models for predicting the webpage classification based on the respective webpage elements according to the webpage elements and the related queries in the search logs comprises: adding pre-collected queries into respective webpage classifications according to the queries belonging to the webpage classification, generating a second inverted index for retrieving the webpage classification based on the added queries, and creating the second predicting model comprising the second inverted index; the predicting a candidate webpage classification to
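
The two per-element models named in the claims can be illustrated with a short sketch: a root-domain model that estimates, from search-log records, the probability that a domain belongs to each classification and accepts a classification only above a target probability, and a title model backed by an inverted index from queries to classifications. Everything concrete here is an assumption made for illustration: the `CLASS_QUERIES` and `LOG` sample data, the function names, the relative-frequency probability estimate, the 0.5 threshold, and the substring match used to find a query inside a title. The claims do not specify these details.

```python
from collections import Counter, defaultdict

# Hypothetical data mined from search logs: classification -> queries.
CLASS_QUERIES = {
    "novel": ["fantasy novel", "read novel online"],
    "video": ["funny video", "movie trailer"],
}

# Hypothetical log records: (query, root domain of the page it triggered).
LOG = [
    ("fantasy novel", "books.example"),
    ("read novel online", "books.example"),
    ("funny video", "books.example"),
    ("movie trailer", "clips.example"),
]


def build_inverted_index(class_queries):
    """Invert classification -> queries into query -> classifications."""
    index = defaultdict(set)
    for cls, queries in class_queries.items():
        for query in queries:
            index[query].add(cls)
    return index


def build_domain_model(log, index):
    """Estimate P(classification | root domain) as a relative frequency
    over the classifications of the queries recorded for that domain."""
    counts = defaultdict(Counter)
    for query, domain in log:
        for cls in index.get(query, ()):
            counts[domain][cls] += 1
    model = {}
    for domain, counter in counts.items():
        total = sum(counter.values())
        model[domain] = {cls: n / total for cls, n in counter.items()}
    return model


def predict_by_domain(model, domain, target_probability=0.5):
    """Candidates whose probability exceeds the target probability."""
    probs = model.get(domain, {})
    return {cls for cls, p in probs.items() if p > target_probability}


def predict_by_title(index, title):
    """Candidates for every indexed query contained in the title.
    Assumption: a query is 'comprised in' a title if it is a substring."""
    title = title.lower()
    found = set()
    for query, classes in index.items():
        if query in title:
            found |= classes
    return found


index = build_inverted_index(CLASS_QUERIES)
domain_model = build_domain_model(LOG, index)
domain_candidates = predict_by_domain(domain_model, "books.example")
title_candidates = predict_by_title(index, "Best Fantasy Novel of 2016")
```

With the sample log, two of the three queries recorded for `books.example` belong to `novel`, so only that classification clears the 0.5 threshold. The candidates from both models would then go to the text-similarity step that selects the final classification.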

Assignees

Beijing Qihoo Technology Co, Qizhi Software Beijing Co Ltd, Beijing Qihoo Techology Company Ltd

Inventors

Classifications

  • Bayesian classification · CPC title

  • G06F16/951 · Primary

    Indexing; Web crawling techniques · CPC title

  • Distances to closest patterns, e.g. nearest neighbour classification · CPC title

  • Tree-structured documents (parsing G06F40/205; validation G06F40/226) · CPC title

  • Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10909427B2 cover?
A method and device for classifying webpages are provided. The method comprises: parsing a plurality of webpage elements from a webpage to be predicted; predicting a candidate webpage classification to which the webpage to be predicted belongs respectively according to respective webpage elements; and determining a final webpage classification of the webpage to be predicted by comparing the can…
Who is the assignee on this patent?
Beijing Qihoo Technology Co, Qizhi Software Beijing Co Ltd, Beijing Qihoo Techology Company Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 2, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).