Search system for providing web crawling query prioritization based on classification operation performance
US-10949475-B2 · Mar 16, 2021 · US
US11636164B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11636164-B2 |
| Application number | US-202117200501-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 12, 2021 |
| Priority date | May 14, 2018 |
| Publication date | Apr 25, 2023 |
| Grant date | Apr 25, 2023 |
Official abstract text for this publication.
Various methods and systems for processing web crawling queries using a web crawling prioritization model based on classification operation performance. A classification operation for organizing products in a product listing platform is accessed. A web crawling engine is accessed for the classification operation. The web crawling engine operates based on a web crawling query prioritization model that supports determining web crawling priority scores that indicate a predicted performance improvement for classification operations executed with known data and web crawled data to be retrieved from executing a web crawling query operation. Using the web crawling prioritization model, a web crawling priority score is determined for a web crawling query for the corresponding classification operation. The classification operation is associated with a product in a product listing platform and known data for the product. Based on the web crawling priority score, the web crawling query is executed to identify web crawled data.
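As a rough illustration of the scoring idea in the abstract, the sketch below treats the priority score as the gap between a predicted classification score with known plus web crawled data and the predicted score with known data alone. The `CrawlQuery` type, the function names, and the subtraction form are assumptions made for illustration; the abstract only states that the score indicates a predicted performance improvement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlQuery:
    """A candidate web crawling query for one classification operation (hypothetical type)."""
    query_id: str
    known_score: float             # predicted performance with known data only
    known_plus_crawl_score: float  # predicted performance with known + crawled data


def priority_score(query: CrawlQuery) -> float:
    """Predicted improvement from crawling; plain subtraction is an assumed form."""
    return query.known_plus_crawl_score - query.known_score


def rank_queries(queries):
    """Order queries so the largest predicted improvement comes first."""
    return sorted(queries, key=priority_score, reverse=True)
```

Under these assumptions, a query whose classification is already confident on known data ranks low, because crawling is predicted to add little.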
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method for processing web crawling queries, the method comprising: accessing a web crawling engine, the web crawling engine operates based on a web crawling query prioritization model that supports determining web crawling priority scores for executing web crawling query operations; using the web crawling prioritization model, determining a web crawling priority score for a web crawling query for a corresponding classification operation, wherein the web crawling priority score indicates a predicted performance improvement for the classification operation, the performance improvement is a predicted improvement of executing the classification operation with the known data and web crawled data to be retrieved by executing the web crawling query over the classification operation executed with only the known data; and based on the web crawling priority score, executing the web crawling query to identify web crawled data.
2. The method of claim 1, wherein, based on executing the web crawling query, the classification operation is executed with the known data and the web crawled data from executing the web crawling query.
3. The method of claim 1, wherein the web crawling priority score is determined as a function of a known data score and a known data and web crawled data score for the classification operation.
4. The method of claim 1, wherein the web crawling query prioritization model is a machine-learning model that is trained, for a selected classification operation, based on a classification-regression technique that implements a first classifier, a second classifier, and a regressor.
5. The method of claim 1, wherein the web crawling query prioritization model is a machine-learning model that is trained, for a selected classification operation, based on a classification-classification technique that implements a first classifier, a second classifier and a third classifier.
6. The method of claim 1, wherein the classification operation is associated with a product in a product listing platform and known data for the product.
7. The method of claim 6, wherein the classification operation is associated with organizing products in the product listing platform for one of the following: product deduplication, product adoption, product attribute extraction, or product quality determination.
8. One or more computer storage media having computer-executable instructions embodied thereon that, when executed, by one or more processors, cause the one or more processors to perform a method for processing web crawling queries, the method comprising: accessing a web crawling engine, the web crawling engine operates based on a web crawling query prioritization model that supports determining web crawling priority scores for executing web crawling query operations; using the web crawling prioritization model, determining a web crawling priority score for a web crawling query for a corresponding classification operation, wherein the web crawling priority score indicates a predicted performance improvement for the classification operation, the performance improvement is a predicted improvement of executing the classification operation with the known data and web crawled data to be retrieved by executing the web crawling query over the classification operation executed with only the known data; and based on the web crawling priority score, processing the web crawling query based on a web crawling query processing operation corresponding to the web crawling score.
9. The computer storage media of claim 8, wherein, for the web crawling priority score, the web crawling processing operation is configured to execute the web crawling query to identify web crawled data, such that the classification operation is executed with the known data and web crawled data from executing the web crawling query.
10. The computer storage media of claim 8, wherein, for the web crawling priority score, the web crawling processing operation is configured not to execute the web crawling query to identify web crawled data, such that the classification operation is executed with only the known data.
11. The computer storage media of claim 8, wherein the web crawling priority score is determined as a function of a known data score and a known data and web crawled data score for the classification operation.
12. The computer storage media of claim 8, wherein the web crawling query prioritization model is a machine-learning model that is trained, for a selected classification operation, based on a classification-regression technique that implements a first classifier, a second classifier, and a regressor.
13. The computer storage media of claim 8, wherein the web crawling query prioritization model is a machine-learning model that is trained, for a selected classification operation, based on a classification-classification technique that implements a first classifier, a second classifier and a third classifier.
14. The computer storage media of claim 8, wherein the classification operation is associated with a product in a product listing platform and known data for the product, wherein the classification operation is associated with organizing products in the product listing platform for one of the following: product deduplication, product adoption, product attribute extraction, or product quality determination.
15. A search system for processing web crawling queries, the system comprising: one or more processors; and one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to execute: a web crawling engine: access a web crawling engine, the web crawling engine operates based on a web crawling query prioritization model that supports determining web crawling priority scores for executing web crawling query operations; use the web crawling prioritization model to determine a web crawling priority score for a web crawling query for a corresponding classification operation, wherein the web crawling priority score indicates a predicted performance improvement for the classification operation, the performance improvement is a predicted improvement of executing the classification operation with the known data and web crawled data to be retrieved by executing the web crawling query over the classification operation executed with only the known data; and based on the web crawling priority score, process the web crawling query based on a web crawling query processing operation corresponding to the web crawling priority score.
16. The system of claim 15, wherein, based on the web crawling priority score, the web crawling query processing operation is selected from the following: executing the web crawling query, not executing the web crawling, or delaying execution of the web crawling query.
17. The system of claim 15, wherein, for the web crawling priority score, the web crawling processing operation is configured to execute the web crawling query to identify web crawled data, such that the classification operation is executed with the known data and web crawled data from executing the web crawling query; or wherein, for the web crawling priority score, the web crawling processing operation is configured not to execute the web crawling query to identify web crawled data, such that the classification operation is executed with only the known data.
18. The system of claim 15, wherein the web crawling query prioritization model is a machine-learning model that is t
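Claim 16 lists three processing operations that can be selected from the priority score: executing the query, not executing it, or delaying it. A minimal sketch of one way such a selection could look follows. The two thresholds, their default values, and the names `CrawlAction` and `select_action` are hypothetical; the claims do not specify how a score maps to an operation.

```python
from enum import Enum


class CrawlAction(Enum):
    """The three processing operations named in claim 16."""
    EXECUTE = "execute"
    DELAY = "delay"
    SKIP = "skip"


def select_action(score: float,
                  execute_at: float = 0.10,
                  delay_at: float = 0.02) -> CrawlAction:
    """Map a priority score to an operation using assumed thresholds."""
    if delay_at > execute_at:
        raise ValueError("delay_at must not exceed execute_at")
    if score >= execute_at:
        return CrawlAction.EXECUTE  # large predicted gain: crawl now
    if score >= delay_at:
        return CrawlAction.DELAY    # modest gain: defer the crawl
    return CrawlAction.SKIP         # negligible gain: classify with known data only
```

With the default thresholds, `select_action(0.3)` selects execution, `select_action(0.05)` selects delay, and `select_action(0.0)` skips the crawl.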
Learning methods · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
of access to content, e.g. by caching · CPC title
Indexing; Web crawling techniques · CPC title