Extracting webpage features using coded data packages for page heuristics

US12299053B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12299053-B2
Application number: US-202117562779-A
Country: US
Kind code: B2
Filing date: Dec 27, 2021
Priority date: Dec 27, 2021
Publication date: May 13, 2025
Grant date: May 13, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

There are provided systems and methods for extracting webpage features using coded data packages for page heuristics. A service provider server may provide website agnostic tools that account for differences in webpage layouts. This may be done using coded data packages designed to consider webpage heuristics of different webpages. These data packages include entries that have a term, a weight, and an optional scope for searching or filtering webpage elements in webpage document code for webpages. Using multiple entries in a data package, a decision may be returned of whether a webpage includes a certain feature, data, or element, as well as data for the element. The identified feature may be used for data extraction and/or determination, which may allow one or more applications and/or browser extensions to provide services across multiple different websites without specifically formulating the data packages for certain website styles.
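The entry structure described in the abstract (a term, a weight, and an optional scope) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the patent: the `Entry` class, the `find_feature` function, the threshold-based decision, the use of a negative weight to penalize look-alike elements, and the sample page and package contents.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass
class Entry:
    term: str                    # text to look for
    weight: float                # added to an element's score when the term matches
    scope: Optional[str] = None  # attribute to search (e.g. "class"); None means element text


class ElementCollector(HTMLParser):
    """Collects tag, attributes, and direct text for each element (simplified)."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._stack = []

    def handle_starttag(self, tag, attrs):
        el = {"tag": tag, "attrs": dict(attrs), "text": ""}
        self.elements.append(el)
        self._stack.append(el)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag name.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if self._stack:
            self._stack[-1]["text"] += data.strip()


def score(element, entries):
    """Sum the weights of all entries whose term appears in the entry's scope."""
    total = 0.0
    for e in entries:
        if e.scope is None:
            haystack = element["text"]
        else:
            haystack = element["attrs"].get(e.scope) or ""
        if e.term.lower() in haystack.lower():
            total += e.weight
    return total


def find_feature(html, entries, threshold):
    """Return (decision, data): whether the page has the feature, and its text."""
    parser = ElementCollector()
    parser.feed(html)
    best = max(parser.elements, key=lambda el: score(el, entries), default=None)
    if best is None or score(best, entries) < threshold:
        return False, None
    return True, best["text"]


# Hypothetical data package for a "price" feature.
PRICE_PACKAGE = [
    Entry("price", 2.0, scope="class"),
    Entry("$", 1.0),
    Entry("shipping", -1.5),  # assumption: negative weights penalize look-alikes
]

PAGE = (
    '<div><h1 class="title">Blue Mug</h1>'
    '<span class="product-price">$12.99</span>'
    '<p class="note">Shipping price: $4.00</p></div>'
)

found, value = find_feature(PAGE, PRICE_PACKAGE, threshold=2.5)
print(found, value)
```

Because the entries refer to general page traits (a class name containing "price", a currency symbol in the text) rather than one site's exact markup, the same package could in principle be applied to differently structured pages, which is the website-agnostic behavior the abstract describes.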

First claim

Opening claim text (preview).

What is claimed is:

1. A service provider system comprising: a non-transitory memory; and one or more hardware processors coupled to the non-transitory memory and configured to read instructions from the non-transitory memory to cause the service provider system to perform operations comprising: receiving webpage data for a webpage accessible by a software operation of the service provider system, wherein the webpage comprises at least one webpage feature usable to provide at least one notification to a user via the software operation during a browsing of the webpage by the user, and wherein the webpage data includes a plurality of terms of the webpage; determining an intent for searching the webpage for the at least one webpage feature based on the plurality of terms and a corresponding identifier of each of the at least one webpage feature; accessing a coded data package for the webpage based on the intent, wherein the coded data package comprises webpage heuristics for a layout of the at least one webpage feature on a plurality of webpages; determining a weighted term and a webpage element attribute for the at least one webpage feature from the coded data package, wherein the weighted term is searched for in the webpage data based on the webpage element attribute; determining the at least one webpage feature based on the weighted term, the webpage element attribute, and the webpage data; and extracting data for the at least one webpage feature from the webpage.

2. The service provider system of claim 1, wherein the webpage heuristics comprise a webpage shape corresponding to the intent that enables identification of the at least one webpage feature on the layout of the plurality of webpages using the weighted term and the webpage element attribute, and wherein the webpage heuristics enable the software operation to heuristically locate the at least one webpage feature using the webpage shape and HyperText Markup Language (HTML) code for the webpage.

3. The service provider system of claim 2, wherein the at least one webpage feature is associated with at least one of a product title, a product name, a product description, a product price, or a product discount.

4. The service provider system of claim 1, wherein the operations further comprise: detecting a navigation to another webpage of the plurality of webpages by a computing device of a user; and causing to be displayed, to the user on the computing device via the software operation, the extracted data with the other webpage.

5. The service provider system of claim 4, wherein the operations further comprise: providing, using the extracted data, the software operation, and the coded data package, a comparison of a first product on the webpage to a second product on the other webpage.

6. The service provider system of claim 1, wherein the coded data package further comprises filtering logic and at least one description of the at least one webpage feature on the plurality of webpages.

7. The service provider system of claim 1, wherein the software operation is associated with one of a web browser application extension or a dedicated mobile application provided by the service provider system.

8. The service provider system of claim 1, wherein the coded data package further comprises one or more operations to parse HTML code for the plurality of webpages to identify the at least one webpage feature on the plurality of webpages.

9. The service provider system of claim 1, wherein the coded data package uses regular expression (regex) for identifications of the at least one webpage feature using the webpage heuristics for the plurality of webpages.

10. The service provider system of claim 1, wherein the operations further comprise: validating a data extract operation specific to the webpage using the coded data package, wherein the validating confirms whether the data extraction operation extracts the data from the at least one webpage feature.

11. A method comprising: identifying a webpage accessible, via one or more computing devices, by one or more users during one or more uses of a service provider application or a service provider browser extension with the webpage, wherein the webpage comprises one or more items purchasable by the one or more users; determining an intent for searching the webpage for the one or more items based on the one or more uses; determining one or more webpage shape heuristics utilizable to search the webpage based on the intent, wherein the one or more shape heuristics each comprise a weighted term and a webpage element attribute associated with identifying the one or more items on the webpage; determining webpage feature layout data for the webpage from webpage computing code for the webpage; determining, using the one or more webpage shape heuristics, item data for the one or more items on the webpage; and storing the item data with an identifier for the webpage for the service provider application or the service provider browser extension.

12. The method of claim 11, wherein the one or more uses comprises a browsing session of at least one other webpage for the one or more items further purchasable via the at least one other webpage.

13. The method of claim 12, further comprising: presenting the item data to a user on a computing device of the user during the browsing session.

14. The method of claim 11, further comprising: accessing another webpage via the service provider application or the service provider browser extension; and causing the item data to be displayed via the service provider application or the service provider browser extension with the other webpage.

15. The method of claim 11, wherein the webpage comprises an online merchant marketplace for the one or more items, wherein the intent is associated with at least one of a title, a product, a description, or a price, and wherein the webpage feature layout data comprises one of Hypertext Markup Language (HTML) code, Extensible Markup Language (XML) code, or JavaScript code.

16. The method of claim 11, further comprising: verifying at least one data extraction tool specific to the webpage using the one or more webpage shape heuristics.

17. A non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations comprising: extracting data for an item on a website of a merchant using at least one of a plurality of webpage heuristic data packages, wherein the data comprises at least one descriptive parameter of the item; storing the extracted data for the item with an identification of the website; receiving a request to view the item on the website during a usage of an application extension of a browser application, wherein the application extension provides an item comparison of the item to one or more other items; determining an intent for searching the webpage for item data for the item comparison; determining a weighted term and a webpage element attribute associated with the intent based on the at least one of the plurality of webpage heuristic data packages; determining the extracted data using the at least one of the plurality of webpage heuristic data packages, wherein the determining the extracted data comprises identifying the item using the weighted term and the webpage element attribute; and presenting the extracted data via the application extension through the browser application.

18. The non-transitory machine-readable medium of claim 17, wherein the item is a first item and t
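Claims 9 and 10 mention regex-based identification and validation of a webpage-specific extraction operation. The following is a rough sketch of how those two ideas might fit together, not the patented implementation. The pattern list, the weights, the sample page, the function names, and the rule that validation means "the site-specific extractor returns the same value as the generic heuristic" are all assumptions made for illustration.

```python
import re
from typing import Callable, Optional

# Hypothetical package for a "price" intent: (compiled regex, weight) pairs.
PRICE_PATTERNS = [
    # A money amount directly inside an element whose class mentions "price".
    (re.compile(r'class="[^"]*price[^"]*"[^>]*>\s*([$€£]\s?\d+(?:\.\d{2})?)'), 2.0),
    # Any money amount anywhere in the markup.
    (re.compile(r'([$€£]\s?\d+(?:\.\d{2})?)'), 1.0),
]


def heuristic_extract(html: str) -> Optional[str]:
    """Return the candidate value with the highest summed pattern weight."""
    scores = {}
    for pattern, weight in PRICE_PATTERNS:
        for match in pattern.finditer(html):
            value = match.group(1)
            scores[value] = scores.get(value, 0.0) + weight
    if not scores:
        return None
    return max(scores, key=scores.get)


def validate_extractor(site_extractor: Callable[[str], Optional[str]], html: str) -> bool:
    """Check that a site-specific extractor agrees with the generic heuristic."""
    expected = heuristic_extract(html)
    return expected is not None and site_extractor(html) == expected


PAGE = '<span class="product-price">$12.99</span><p>Shipping: $4.00</p>'


def good_site_extractor(html: str) -> Optional[str]:
    # Hypothetical extractor written for this one page layout.
    m = re.search(r'product-price">([^<]+)<', html)
    return m.group(1) if m else None


def stale_site_extractor(html: str) -> Optional[str]:
    # Hypothetical extractor that now picks up the wrong element.
    m = re.search(r'Shipping: ([^<]+)<', html)
    return m.group(1) if m else None


print(heuristic_extract(PAGE))
print(validate_extractor(good_site_extractor, PAGE))
print(validate_extractor(stale_site_extractor, PAGE))
```

In this sketch the generic heuristic acts as a cross-check: a site-specific extractor that drifts after a layout change stops agreeing with it and fails validation.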

Assignees

Paypal Inc

Inventors

Classifications

  • Optimising the visualization of content, e.g. distillation of HTML documents · CPC title

  • Document structures and storage, e.g. HTML extensions · CPC title

  • G06F16/951 (Primary)

    Indexing; Web crawling techniques · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • utilising user interfaces specially adapted for shopping · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12299053B2 cover?
There are provided systems and methods for extracting webpage features using coded data packages for page heuristics. A service provider server may provide website agnostic tools that account for differences in webpage layouts. This may be done using coded data packages designed to consider webpage heuristics of different webpages. These data packages include entries that have a term, a weight,…
Who is the assignee on this patent?
Paypal Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date May 13, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).