Discovery of new business openings using web content analysis

US12175483B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12175483-B2
Application number  US-202318365775-A
Country             US
Kind code           B2
Filing date         Aug 4, 2023
Priority date       Mar 12, 2013
Publication date    Dec 24, 2024
Grant date          Dec 24, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In general, embodiments of the present invention provide systems, methods and computer readable media for identifying a new business based on programmatically analyzing content received from online sources and, as a result, discovering one or more references to the business. In embodiments, the system stores historical data representing previously identified new businesses and then uses attributes of those businesses in search queries to receive related content. Additionally or alternatively, the system stores data representing online sources that historically provided content containing references to new businesses and then continues to access those sources for additional content. In embodiments, the system performs content analysis on structured and/or unstructured content. In some embodiments, analysis of content received from a particular online source includes a source-specific algorithm that takes a source-specific representation of the content as input and produces a result indicating the likelihood that the content includes a new business reference.
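The last sentence of the abstract describes a source-specific algorithm that takes a source-specific representation of content and returns a likelihood that the content references a new business. The following Python sketch illustrates one way such a dispatch could look. The source names (`local_blog`, `license_registry`), the cue phrases, the `license_type` field, and the scoring weights are all hypothetical; the patent does not specify them.

```python
import re
from typing import Any, Callable, Dict

# Hypothetical cue phrases; the patent does not list specific patterns.
OPENING_CUES = re.compile(
    r"\b(grand opening|now open|opening soon|coming soon)\b", re.IGNORECASE
)


def score_blog_post(text: str) -> float:
    """Unstructured content: score by the number of distinct cue phrases, capped at 1.0."""
    cues = {m.group(1).lower() for m in OPENING_CUES.finditer(text)}
    return min(1.0, 0.5 * len(cues))


def score_license_record(record: Dict[str, Any]) -> float:
    """Structured content: a record marked as a new license is treated as a strong signal."""
    return 0.9 if record.get("license_type") == "new" else 0.1


# One analyzer per online source (assumed registry; keys are illustrative).
ANALYZERS: Dict[str, Callable[[Any], float]] = {
    "local_blog": score_blog_post,
    "license_registry": score_license_record,
}


def likelihood_of_new_business(source_id: str, content: Any) -> float:
    """Route content to the analyzer registered for its source."""
    return ANALYZERS[source_id](content)
```

Keeping one analyzer per source lets structured feeds and free-text pages use different representations while still producing a comparable score.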

First claim

Opening claim text (preview).

That which is claimed:

1. An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: receive content data from an online source; determine, based at least in part on parsing the content data, that the content data contains one or more provider references associated with a provider that is a new business; responsive to determining that data representing the new business is not stored in a repository, store the data in the repository; and generate a confidence rating associated with the online source based at least in part on the one or more provider references and whether the new business was verified; and update a source search index based at least in part on the confidence rating.
2. The apparatus of claim 1, wherein the parsing is based at least in part on a pattern recognition algorithm.
3. The apparatus of claim 2, wherein the pattern recognition algorithm is selected based in part on determining whether the content data is structured content or unstructured content.
4. The apparatus of claim 2, wherein the pattern recognition algorithm is a trainable function generated using machine learning.
5. The apparatus of claim 1, wherein the content data is received based at least in part on the source search index.
6. The apparatus of claim 1, wherein the source search index is updated based at least in part on based on source data quality signals.
7. The apparatus of claim 1, further caused to maintain the source search index by pruning the source search index to remove online sources that have not included any further new business references within a predetermined period of time.
8. A non-transitory computer readable storage medium including computer program code that, when executed by a processor of an apparatus, cause the apparatus to: receive content data from an online source; determine, based at least in part on parsing the content data, that the content data contains one or more provider references associated with a provider that is a new business; responsive to determining that data representing the new business is not stored in a repository, store the data in the repository; and generate a confidence rating associated with the online source based at least in part on the one or more provider references and whether the new business was verified; and update a source search index based at least in part on the confidence rating.
9. The non-transitory computer readable storage medium of claim 8, wherein the parsing is based at least in part on a pattern recognition algorithm.
10. The non-transitory computer readable storage medium of claim 9, wherein the pattern recognition algorithm is selected based in part on determining whether the content data is structured content or unstructured content.
11. The non-transitory computer readable storage medium of claim 9, wherein the pattern recognition algorithm is a trainable function generated using machine learning.
12. The non-transitory computer readable storage medium of claim 8, wherein the content data is received based at least in part on the source search index.
13. The non-transitory computer readable storage medium of claim 8, wherein the source search index is updated based at least in part on based on source data quality signals.
14. The non-transitory computer readable storage medium of claim 8, wherein the apparatus is further caused to maintain the source search index by pruning the source search index to remove online sources that have not included any further new business references within a predetermined period of time.
15. A computer-implemented method, comprising: receiving content data from an online source; determining, based at least in part on parsing the content data, that the content data contains one or more provider references associated with a provider that is a new business; responsive to determining that data representing the new business is not stored in a repository, storing the data in the repository; generating a confidence rating associated with the online source based at least in part on the one or more provider references and whether the new business was verified; and updating a source search index based at least in part on the confidence rating.
16. The computer-implemented method of claim 15, wherein the parsing is based at least in part on a pattern recognition algorithm.
17. The computer-implemented method of claim 16, wherein the pattern recognition algorithm is selected based in part on determining whether the content data is structured content or unstructured content.
18. The computer-implemented method of claim 16, wherein the pattern recognition algorithm is a trainable function generated using machine learning.
19. The computer-implemented method of claim 15, wherein the content data is received based at least in part on the source search index.
20. The computer-implemented method of claim 15, wherein the source search index is updated based at least in part on based on source data quality signals.
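The steps recited in the independent claims (receive content, parse for new business references, store unseen businesses, rate the source, update the source index) and the pruning step of claims 7 and 14 can be illustrated with a minimal Python sketch. This is not the patented implementation. The regular expression stands in for the claimed pattern recognition algorithm, the smoothed verified-share formula for the confidence rating is an assumption, and the names `Discovery`, `SourceStats`, `process`, and `prune` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Set

# Hypothetical pattern standing in for the claimed "pattern recognition algorithm".
NEW_BUSINESS_PATTERN = re.compile(
    r"(?:grand opening of|now open:)\s+([A-Z][\w'&]*(?: [A-Z][\w'&]*)*)"
)


@dataclass
class SourceStats:
    last_reference: datetime
    references: int = 0
    verified: int = 0

    @property
    def confidence(self) -> float:
        # Assumed formula: smoothed share of references that were verified.
        return (self.verified + 1) / (self.references + 2)


@dataclass
class Discovery:
    repository: Set[str] = field(default_factory=set)
    source_index: Dict[str, SourceStats] = field(default_factory=dict)

    def process(self, source: str, content: str,
                verified_names: Set[str], now: datetime) -> List[str]:
        """Parse content, store unseen businesses, and update the source's entry."""
        names = NEW_BUSINESS_PATTERN.findall(content)
        if not names:
            return []
        stats = self.source_index.setdefault(source, SourceStats(last_reference=now))
        stored = []
        for name in names:
            if name not in self.repository:  # store only if not already present
                self.repository.add(name)
                stored.append(name)
            stats.references += 1
            if name in verified_names:  # verification result is supplied by the caller
                stats.verified += 1
        stats.last_reference = now
        return stored

    def prune(self, now: datetime, max_age: timedelta) -> List[str]:
        """Drop sources with no new business reference within max_age."""
        stale = [s for s, st in self.source_index.items()
                 if now - st.last_reference > max_age]
        for s in stale:
            del self.source_index[s]
        return stale
```

In this sketch the source index doubles as the store for the confidence rating, so a crawler could sort `source_index` by `confidence` to decide which sources to revisit first.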

Assignees

Bytedance Inc

Inventors

Classifications

  • Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals · CPC title

  • Indexing; Web crawling techniques · CPC title

  • Market modelling; Market analysis; Collecting market data · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12175483B2 cover?
In general, embodiments of the present invention provide systems, methods and computer readable media for identifying a new business based on programmatically analyzing content received from online sources and, as a result, discovering one or more references to the business. In embodiments, the system stores historical data representing previously identified new businesses and then uses attribu…
Who is the assignee on this patent?
Bytedance Inc
What technology area does this patent fall under?
Primary CPC classification G06Q30/0201. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 24, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).