Search system for providing web crawling query prioritization based on classification operation performance

US10949475B2 · US · B2

Patent metadata
Publication number: US-10949475-B2
Application number: US-201815979170-A
Country: US
Kind code: B2
Filing date: May 14, 2018
Priority date: May 14, 2018
Publication date: Mar 16, 2021
Grant date: Mar 16, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various methods and systems for processing web crawling queries using a web crawling prioritization model based on classification operation performance. A classification operation for organizing products in a product listing platform is accessed. A web crawling engine is accessed for the classification operation. The web crawling engine operates based on a web crawling query prioritization model that supports determining web crawling priority scores that indicate a predicted performance improvement for classification operations executed with known data and web crawled data to be retrieved from executing a web crawling query operation. Using the web crawling prioritization model, a web crawling priority score is determined for a web crawling query for the corresponding classification operation. The classification operation is associated with a product in a product listing platform and known data for the product. Based on the web crawling priority score, the web crawling query is executed to identify web crawled data.
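
The prioritization idea in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that a trained model has already produced one predicted-improvement number per candidate query and that only a fixed number of queries can be crawled. `CrawlQuery`, `predicted_improvement`, `select_queries`, and `budget` are illustrative names, not terms from the patent.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlQuery:
    """A candidate web crawling query for one product (hypothetical structure)."""
    product_id: str
    query_text: str
    # Assumed output of the prioritization model: the predicted gain in
    # classification performance if this query's crawled data is added.
    predicted_improvement: float


def select_queries(candidates, budget):
    """Return up to `budget` queries, highest predicted improvement first."""
    return heapq.nlargest(budget, candidates, key=lambda q: q.predicted_improvement)


# Made-up example values, for illustration only.
candidates = [
    CrawlQuery("p1", "acme phone x 64gb", 0.02),
    CrawlQuery("p2", "acme laptop 15 inch", 0.31),
    CrawlQuery("p3", "acme headphones", 0.12),
]

selected = select_queries(candidates, budget=2)
selected_ids = [q.product_id for q in selected]
print(selected_ids)  # ['p2', 'p3']
```

Ranking by predicted gain rather than crawling every product is one way to spend a limited crawl budget where extra data is expected to help the classifier most; the abstract does not say how the score is turned into a schedule.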

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented for processing web crawling queries, the method comprising: accessing a web crawling engine, the web crawling engine operates based on a web crawling query prioritization model that supports determining web crawling priority scores that indicate corresponding predicted performance improvements for classification operations executed with known data and web crawled data to be retrieved from executing a web crawling query operation; using the web crawling prioritization model, determining a web crawling priority score for a web crawling query for a corresponding classification operation associated with a performance improvement, wherein the predicted performance improvement for the classification operation is a predicted improvement of executing the classification operation with the known data and web crawled data to be retrieved by executing the web crawling query over the classification operation executed with only the known data, wherein the classification operation is associated with a product in a product listing platform and known data for the product; and based on the web crawling priority score, executing the web crawling query to identify web crawled data.

2. The method of claim 1, wherein, based on executing the web crawling query, the classification operation is executed with the known data and the web crawled data from executing the web crawling query.

3. The method of claim 1, wherein the web crawling priority score is determined as a function of a known data score and a known data and web crawled data score for the classification operation.

4. The method of claim 1, wherein the web crawling query prioritization model is a machine-learning model that is trained, for a selected classification operation, based on a classification-regression technique that implements a first classifier, a second classifier, and a regressor.

5. The method of claim 1, wherein the web crawling query prioritization model is a machine-learning model that is trained, for a selected classification operation, based on a classification-classification technique that implements a first classifier, a second classifier and a third classifier.

6. The method of claim 1, wherein web crawling priority score are further generated as a function of a hyperparameter and a ground-truth label.

7. The method of claim 1, wherein the classification operation is associated with organizing products in the product listing platform for one of the following: product deduplication, product adoption, product attribute extraction, or product quality determination.

8. One or more computer storage media having computer-executable instructions embodied thereon that, when executed, by one or more processors, cause the one or more processors to perform a method for processing web crawling queries, the method comprising: accessing a web crawling engine, the web crawling engine operates based on a web crawling query prioritization model that supports determining web crawling priority scores that indicate corresponding predicted performance improvements for classification operations executed with known data and web crawled data to be retrieved from executing a web crawling query operation; using the web crawling prioritization model, determining a web crawling priority score for a web crawling query for a corresponding classification operation associated with a performance improvement, wherein the predicted performance improvement for the classification operation is a predicted improvement of executing the classification operation with the known data and web crawled data to be retrieved by executing the web crawling query over the classification operation executed with only the known data, wherein the classification operation is associated with a product in a product listing platform and known data for the product; and based on the web crawling priority score, processing the web crawling query based on a web crawling query processing operation corresponding to the web crawling score.

9. The computer storage media of claim 8, wherein, for the web crawling priority score, the web crawling processing operation is configured to execute the web crawling query to identify web crawled data, such that the classification operation is executed with the known data and web crawled data from executing the web crawling query.

10. The computer storage media of claim 8, wherein, for the web crawling priority score, the web crawling processing operation is configured not to execute the web crawling query to identify web crawled data, such that the classification operation is executed with only the known data.

11. The computer storage media of claim 8, wherein the web crawling priority score is determined as a function of a known data score and a known data and web crawled data score for the classification operation.

12. The computer storage media of claim 8, wherein the web crawling query prioritization model is a machine-learning model that is trained, for a selected classification operation, based on a classification-regression technique that implements a first classifier, a second classifier, and a regressor.

13. The computer storage media of claim 8, wherein the web crawling query prioritization model is a machine-learning model that is trained, for a selected classification operation, based on a classification-classification technique that implements a first classifier, a second classifier and a third classifier.

14. The computer storage media of claim 8, wherein the classification operation is associated with organizing products in the product listing platform for one of the following: product deduplication, product adoption, product attribute extraction, or product quality determination.

15. A search system for processing web crawling queries, the system comprising: one or more processors; and one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to execute: a web crawling engine: access a web crawling query prioritization model that supports determining web crawling priority scores that indicate corresponding predicted performance improvements for classification operations executed with known data and web crawled data to be retrieved from executing a web crawling query operation; determine a web crawling priority score for a web crawling query for a corresponding classification operation associated with a performance improvement, wherein the predicted performance improvement for the classification operation is a predicted improvement of executing the classification operation with the known data and web crawled data to be retrieved by executing the web crawling query over the classification operation executed with only the known data, wherein the classification operation is associated with a product in a product listing platform and known data for the product; and based on the web crawling score, process the web crawling query based on a web crawling query processing operation corresponding to the web crawling priority score.

16. The system of claim 15, wherein, based on the web crawling priority score, the web crawling query processing operation is selected from the following: executing the web crawling query, not executing the web crawling, or delaying execution of the web crawling query.

17. The system of claim 15, wherein, for the web crawling priority score, the web crawling processing operation is configured to execute the web crawling query to identify web crawled data
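
Claims 3 and 16 can be read together as a two-step decision: derive a priority score from two classification scores, then map that score to a processing operation. The sketch below assumes the "function" in claim 3 is a plain difference and that fixed thresholds pick the operation. The claim text shown here specifies neither, so `priority_score`, `choose_action`, `CrawlAction`, and both threshold values are hypothetical.

```python
from enum import Enum


class CrawlAction(Enum):
    """The three processing operations listed in claim 16."""
    EXECUTE = "execute"
    DELAY = "delay"
    SKIP = "skip"


def priority_score(known_score, known_plus_crawled_score):
    """Assumed form of claim 3: predicted gain from adding web crawled data."""
    return known_plus_crawled_score - known_score


def choose_action(score, execute_at=0.10, delay_at=0.03):
    """Map a priority score to an operation using hypothetical thresholds."""
    if score >= execute_at:
        return CrawlAction.EXECUTE
    if score >= delay_at:
        return CrawlAction.DELAY
    return CrawlAction.SKIP


# Made-up scores: classifier performance with known data only, and the
# predicted performance once crawled data is added.
big_gain = choose_action(priority_score(0.70, 0.90))
small_gain = choose_action(priority_score(0.70, 0.75))
no_gain = choose_action(priority_score(0.70, 0.71))
print(big_gain, small_gain, no_gain)
```

With these example numbers the three queries fall into the execute, delay, and skip bands respectively. A real system would presumably tune the cutoffs per classification operation.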

Assignees

eBay Inc.

Inventors

Classifications

  • Combinations of networks · CPC title

  • Learning methods · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • of access to content, e.g. by caching · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10949475B2 cover?
Various methods and systems for processing web crawling queries using a web crawling prioritization model based on classification operation performance. A classification operation for organizing products in a product listing platform is accessed. A web crawling engine is accessed for the classification operation. The web crawling engine operates based on a web crawling query prioritization mode…
Who is the assignee on this patent?
eBay Inc.
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 16, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).