Discovery of new business openings using web content analysis
US-9773252-B1 · Sep 26, 2017 · US
US10489800B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10489800-B2 |
| Application number | US-201715686724-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 25, 2017 |
| Priority date | Mar 12, 2013 |
| Publication date | Nov 26, 2019 |
| Grant date | Nov 26, 2019 |
Official abstract text for this publication.
In general, embodiments of the present invention provide systems, methods and computer readable media for identifying a new business based on programmatically analyzing content received from online sources and, as a result, discovering one or more references to the business. In embodiments, the system stores historical data representing previously identified new businesses and then uses attributes of those businesses in search queries to receive related content. Additionally or alternatively, the system stores data representing online sources that historically provided content containing references to new businesses and then continues to access those sources for additional content. In embodiments, the system performs content analysis on structured and/or unstructured content. In some embodiments, analysis of content received from a particular online source includes a source-specific algorithm that takes a source-specific representation of the content as input and produces a result indicating the likelihood that the content includes a new business reference.
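The abstract's two core ideas, seeding search queries from attributes of previously identified new businesses and scoring content for the likelihood of a new business reference, can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the `Business` type, the `build_queries` and `opening_likelihood` functions, and the `OPENING_CUES` keyword list are all hypothetical names and assumptions introduced here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Business:
    """Hypothetical record of a previously identified new business."""
    name: str
    kind: str
    location: str


def build_queries(seeds):
    """Turn attributes of known new businesses into search query strings."""
    return [f'"{b.name}" {b.kind} {b.location}' for b in seeds]


# Assumed opening cues; a real system might tune or learn these per source.
OPENING_CUES = [
    r"\bnow open\b",
    r"\bgrand opening\b",
    r"\bopening soon\b",
    r"\bcoming soon\b",
    r"\bopens? (?:on|this|next)\b",
]


def opening_likelihood(text):
    """Return the fraction of cue patterns found in the text.

    This is a crude stand-in for the likelihood result the abstract mentions.
    """
    lowered = text.lower()
    hits = sum(1 for cue in OPENING_CUES if re.search(cue, lowered))
    return hits / len(OPENING_CUES)
```

A source-specific variant could swap in a different cue list or a trained classifier per online source; the fixed keyword list here is only the simplest stand-in.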
Opening claim text (preview).
What is claimed is:

1. An apparatus for maintaining a source search index for use in automatically identifying references to a new business within published content returned from an online source over a network, the apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: receive content from an online source; store data representing the online source in a source search index upon determining that the online source has provided content referencing at least one previously known new business, wherein determining that the online source has provided content referencing at least one previously known new business comprises identifying and extracting references to new businesses that are included in published content included in received search results by applying a pattern recognition algorithm that is configured to process one or more text patterns extracted from the published content, wherein a new business is one of a newly opened business or a business that is about to open; and verifying each of a set of new businesses in the extracted references by determining whether data representing the new business is already stored in a business repository; and update the source search index based on source data quality signals calculated as a result of new business verification by one of updating stored data in the source search index representing a known online source or storing data representing a newly discovered online source.
2. The apparatus of claim 1, wherein the business repository is instantiated by storing data representing previously identified new businesses, wherein the seed data describing each previously identified new business includes at least one of the group of business attributes including business name, type of business, and business location.
3. The apparatus of claim 1, wherein pattern recognition algorithm is selected based in part on determining whether the published content is structured content or unstructured content.
4. The apparatus of claim 3, further caused to recognize source-specific data representation patterns based on identifying at least one of a group of text patterns including particular keywords or phrases, dates, a name of a chef, and a particular location.
5. The apparatus of claim 3, wherein the pattern recognition algorithm is a trainable function generated using machine learning.
6. The apparatus of claim 1, wherein calculating at least one source data quality signal for each online source is based at least on the extracted references and whether the new business reference was verified.
7. The apparatus of claim 6, further caused to for each online source, update a confidence rating associated with the online source using the source data quality signal.
8. The apparatus of claim 1, further caused to maintain the source search index by pruning the source search index to remove online sources that have not included any further new business references within a predetermined period of time.
9. The apparatus of claim 1, further caused to calculate a confidence rating associated with an online source by: periodically receiving content data from the online source within a predetermined time period; calculating a total of references to different verified new businesses within the content data received within the time period; and calculating the confidence rating associated with the online source based in part on the total of references.
10. The apparatus of claim 1, wherein the apparatus is caused to periodically crawl the online sources stored in the search index and pull in content data being published by the online sources.
11. A system for maintaining a search index for use in automatically identifying references to a new business within published content returned from an online source over a network, the system comprising at least one repository and at least one server, the at least one server having at least one processor and at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the system to: receive content from an online source; store data representing the online source in a source search index upon determining that the online source has provided content referencing at least one previously known new business, wherein determining that the online source has provided content referencing at least one previously known new business comprises identifying and extracting references to new businesses that are included in published content included in received search results by applying a pattern recognition algorithm that is configured to process one or more text patterns extracted from the published content, wherein a new business is one of a newly opened business or a business that is about to open; and verifying each of a set of new businesses in the extracted references by determining whether data representing the new business is already stored in a business repository; and update the source search index based on source data quality signals calculated as a result of new business verification by one of updating stored data in the source search index representing a known online source or storing data representing a newly discovered online source.
12. The system of claim 11, wherein the business repository is instantiated by storing data representing previously identified new businesses, wherein the seed data describing each previously identified new business includes at least one of the group of business attributes including business name, type of business, and business location.
13. The system of claim 11, wherein pattern recognition algorithm is selected based in part on determining whether the published content is structured content or unstructured content.
14. The system of claim 13, further caused to recognize source-specific data representation patterns based on identifying at least one of a group of text patterns including particular keywords or phrases, dates, a name of a chef, and a particular location.
15. The system of claim 13, wherein the pattern recognition algorithm is a trainable function generated using machine learning.
16. The system of claim 11, wherein calculating at least one source data quality signal for each online source is based at least on the extracted references and whether the new business reference was verified.
17. The system of claim 16, further caused to for each online source, update a confidence rating associated with the online source using the source data quality signal.
18. The system of claim 11, further caused to maintain the source search index by pruning the source search index to remove online sources that have not included any further new business references within a predetermined period of time.
19. The system of claim 11, further caused to calculate a confidence rating associated with an online source by: periodically receiving content data from the online source within a predetermined time period; calculating a total of references to different verified new businesses within the content data received within the time period; and calculating the confidence rating associated with the online source based in part on the total of references.
20. The system of claim 11 , caused to periodically crawl the online sources stored in the search index and pull
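The source-index maintenance described in the claims (verifying extracted references against a business repository, rating each source by its distinct verified references within a time window, and pruning stale sources) might look roughly like the sketch below. It is a sketch under stated assumptions, not the claimed implementation: `SourceRecord`, `SourceIndex`, the 30-day window, and the `n / (n + 1)` scoring formula are hypothetical, and a reference is treated as verified when its name matches the repository of previously identified new businesses, which is only one reading of the claim language.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class SourceRecord:
    url: str
    # Maps each verified business name to the time it was last seen here.
    verified_refs: dict = field(default_factory=dict)
    confidence: float = 0.0


class SourceIndex:
    """Hypothetical source search index keyed by source URL."""

    def __init__(self, known_businesses, window=timedelta(days=30)):
        # Assumed repository: lower-cased names of previously identified businesses.
        self.repository = {name.lower() for name in known_businesses}
        self.window = window
        self.sources = {}

    def record(self, url, extracted_names, seen_at):
        """Verify extracted references, then update or create the source entry."""
        verified = {n.lower() for n in extracted_names if n.lower() in self.repository}
        if not verified and url not in self.sources:
            return None  # nothing verified: do not index a new source
        rec = self.sources.setdefault(url, SourceRecord(url))
        for name in verified:
            rec.verified_refs[name] = seen_at
        self._rescore(rec, seen_at)
        return rec

    def _rescore(self, rec, now):
        recent = [t for t in rec.verified_refs.values() if now - t <= self.window]
        # Assumed scoring: count of distinct verified businesses, squashed into [0, 1).
        rec.confidence = len(recent) / (len(recent) + 1)

    def prune(self, now):
        """Remove sources with no verified reference inside the window."""
        stale = [url for url, rec in self.sources.items()
                 if all(now - t > self.window for t in rec.verified_refs.values())]
        for url in stale:
            del self.sources[url]
        return stale
```

For example, calling `record` with two names that match the repository yields a confidence of 2/3 under this assumed formula, and a later `prune` call made after the window has elapsed removes that source from the index.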
Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals · CPC title
Indexing; Web crawling techniques · CPC title
Market modelling; Market analysis; Collecting market data · CPC title