Discovery of new business openings using web content analysis

US10489800B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10489800-B2
Application number: US-201715686724-A
Country: US
Kind code: B2
Filing date: Aug 25, 2017
Priority date: Mar 12, 2013
Publication date: Nov 26, 2019
Grant date: Nov 26, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In general, embodiments of the present invention provide systems, methods and computer readable media for identifying a new business based on programmatically analyzing content received from online sources and, as a result, discovering one or more references to the business. In embodiments, the system stores historical data representing previously identified new businesses and then uses attributes of those businesses in search queries to receive related content. Additionally or alternatively, the system stores data representing online sources that historically provided content containing references to new businesses and then continues to access those sources for additional content. In embodiments, the system performs content analysis on structured and/or unstructured content. In some embodiments, analysis of content received from a particular online source includes a source-specific algorithm that takes a source-specific representation of the content as input and produces a result indicating the likelihood that the content includes a new business reference.
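The content-analysis step described in the abstract can be illustrated with a minimal Python sketch, assuming unstructured text and a handful of hand-written regular expressions. The patterns, the function names `extract_references` and `split_by_repository`, and the in-memory repository are hypothetical illustrations, not taken from the patent, which does not disclose specific patterns and whose claims also contemplate trainable, source-specific algorithms.

```python
import re

# A run of capitalized words, used here as a rough stand-in for a business name.
NAME = r"(?P<name>[A-Z][\w'&]*(?: [A-Z][\w'&]*)*)"

# Hypothetical text patterns that hint at an opening (assumption, not from the patent).
OPENING_PATTERNS = [
    re.compile(NAME + r" (?:is )?(?:opening|opens) (?:soon|on|in)\b"),
    re.compile(r"(?:[Gg]rand opening of|[Nn]ow open:) " + NAME),
]


def extract_references(text):
    """Return candidate business names found in text, in order of discovery."""
    names = []
    for pattern in OPENING_PATTERNS:
        for match in pattern.finditer(text):
            name = match.group("name")
            if name not in names:  # skip duplicates across patterns
                names.append(name)
    return names


def split_by_repository(names, repository):
    """Partition names into those already stored in the repository and those not.

    `repository` is assumed to be a set of lower-cased business names.
    """
    known = [n for n in names if n.lower() in repository]
    unseen = [n for n in names if n.lower() not in repository]
    return known, unseen
```

For example, calling `extract_references` on a blog snippet and passing the result to `split_by_repository` separates names the repository already holds from names that would be candidates for newly discovered businesses. A production system would need far more robust name detection than capitalized-word matching.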

First claim

Opening claim text (preview).

What is claimed is:
1. An apparatus for maintaining a source search index for use in automatically identifying references to a new business within published content returned from an online source over a network, the apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: receive content from an online source; store data representing the online source in a source search index upon determining that the online source has provided content referencing at least one previously known new business, wherein determining that the online source has provided content referencing at least one previously known new business comprises identifying and extracting references to new businesses that are included in published content included in received search results by applying a pattern recognition algorithm that is configured to process one or more text patterns extracted from the published content, wherein a new business is one of a newly opened business or a business that is about to open; and verifying each of a set of new businesses in the extracted references by determining whether data representing the new business is already stored in a business repository; and update the source search index based on source data quality signals calculated as a result of new business verification by one of updating stored data in the source search index representing a known online source or storing data representing a newly discovered online source.
2. The apparatus of claim 1, wherein the business repository is instantiated by storing data representing previously identified new businesses, wherein the seed data describing each previously identified new business includes at least one of the group of business attributes including business name, type of business, and business location.
3. The apparatus of claim 1, wherein pattern recognition algorithm is selected based in part on determining whether the published content is structured content or unstructured content.
4. The apparatus of claim 3, further caused to recognize source-specific data representation patterns based on identifying at least one of a group of text patterns including particular keywords or phrases, dates, a name of a chef, and a particular location.
5. The apparatus of claim 3, wherein the pattern recognition algorithm is a trainable function generated using machine learning.
6. The apparatus of claim 1, wherein calculating at least one source data quality signal for each online source is based at least on the extracted references and whether the new business reference was verified.
7. The apparatus of claim 6, further caused to for each online source, update a confidence rating associated with the online source using the source data quality signal.
8. The apparatus of claim 1, further caused to maintain the source search index by pruning the source search index to remove online sources that have not included any further new business references within a predetermined period of time.
9. The apparatus of claim 1, further caused to calculate a confidence rating associated with an online source by: periodically receiving content data from the online source within a predetermined time period; calculating a total of references to different verified new businesses within the content data received within the time period; and calculating the confidence rating associated with the online source based in part on the total of references.
10. The apparatus of claim 1, wherein the apparatus is caused to periodically crawl the online sources stored in the search index and pull in content data being published by the online sources.
11. A system for maintaining a search index for use in automatically identifying references to a new business within published content returned from an online source over a network, the system comprising at least one repository and at least one server, the at least one server having at least one processor and at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the system to: receive content from an online source; store data representing the online source in a source search index upon determining that the online source has provided content referencing at least one previously known new business, wherein determining that the online source has provided content referencing at least one previously known new business comprises identifying and extracting references to new businesses that are included in published content included in received search results by applying a pattern recognition algorithm that is configured to process one or more text patterns extracted from the published content, wherein a new business is one of a newly opened business or a business that is about to open; and verifying each of a set of new businesses in the extracted references by determining whether data representing the new business is already stored in a business repository; and update the source search index based on source data quality signals calculated as a result of new business verification by one of updating stored data in the source search index representing a known online source or storing data representing a newly discovered online source.
12. The system of claim 11, wherein the business repository is instantiated by storing data representing previously identified new businesses, wherein the seed data describing each previously identified new business includes at least one of the group of business attributes including business name, type of business, and business location.
13. The system of claim 11, wherein pattern recognition algorithm is selected based in part on determining whether the published content is structured content or unstructured content.
14. The system of claim 13, further caused to recognize source-specific data representation patterns based on identifying at least one of a group of text patterns including particular keywords or phrases, dates, a name of a chef, and a particular location.
15. The system of claim 13, wherein the pattern recognition algorithm is a trainable function generated using machine learning.
16. The system of claim 11, wherein calculating at least one source data quality signal for each online source is based at least on the extracted references and whether the new business reference was verified.
17. The system of claim 16, further caused to for each online source, update a confidence rating associated with the online source using the source data quality signal.
18. The system of claim 11, further caused to maintain the source search index by pruning the source search index to remove online sources that have not included any further new business references within a predetermined period of time.
19. The system of claim 11, further caused to calculate a confidence rating associated with an online source by: periodically receiving content data from the online source within a predetermined time period; calculating a total of references to different verified new businesses within the content data received within the time period; and calculating the confidence rating associated with the online source based in part on the total of references.
20. The system of claim 11, caused to periodically crawl the online sources stored in the search index and pull
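Claims 8 and 9 (and their system counterparts, claims 18 and 19) describe a confidence rating based in part on the count of distinct verified new businesses seen within a time period, and pruning of sources that have gone quiet. The following Python sketch shows one way this could look. The `SourceRecord` layout, the `saturation` scaling, and the default window lengths are assumptions chosen for illustration; the claims do not specify a formula or any particular values.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class SourceRecord:
    """One entry in a hypothetical source search index."""
    url: str
    # (date seen, verified business name) pairs; this layout is an assumption.
    verified_refs: list = field(default_factory=list)


def confidence_rating(record, today, window_days=90, saturation=10):
    """Rate a source from 0.0 to 1.0 by distinct verified businesses in the window.

    The linear scaling capped at `saturation` is an illustrative choice only.
    """
    cutoff = today - timedelta(days=window_days)
    distinct = {name for seen, name in record.verified_refs if seen >= cutoff}
    return min(len(distinct) / saturation, 1.0)


def prune(index, today, max_idle_days=180):
    """Keep only sources with at least one verified reference since the cutoff."""
    cutoff = today - timedelta(days=max_idle_days)
    return {
        url: rec
        for url, rec in index.items()
        if any(seen >= cutoff for seen, _ in rec.verified_refs)
    }
```

Counting distinct names rather than raw mentions follows the claim language ("references to different verified new businesses"), so a source that repeats the same opening many times is not rated higher for it.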

Assignees

Groupon Inc

Inventors

Classifications

  • Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals

  • Indexing; Web crawling techniques

  • Market modelling; Market analysis; Collecting market data

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10489800B2 cover?
In general, embodiments of the present invention provide systems, methods and computer readable media for identifying a new business based on programmatically analyzing content received from online sources and, as a result, discovering one or more references to the business. In embodiments, the system stores historical data representing previously identified new businesses and then uses attribu…
Who is the assignee on this patent?
Groupon Inc
What technology area does this patent fall under?
Primary CPC classification G06Q30/0201. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 26, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).