US10762437B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10762437-B2 |
| Application number | US-201615077563-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 22, 2016 |
| Priority date | Jun 19, 2015 |
| Publication date | Sep 1, 2020 |
| Grant date | Sep 1, 2020 |
Official abstract text for this publication.
Methods and systems for automatic information extraction by performing self-learning crawling and rule-based data mining are provided. The method determines whether a crawl policy exists within the input information and performs at least one of front-end crawling, assisted crawling, and recursive crawling. The downloaded dataset is pre-processed to remove noisy data and then subjected to classification rules and decision-tree-based data mining to extract meaningful information. These crawling techniques yield smaller, relevant datasets pertaining to a specific domain from the multi-dimensional datasets available in online and offline sources.
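The flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the in-memory `SITE` dictionary stands in for a real data source, and the names `Request`, `assisted_crawl`, `recursive_crawl`, `preprocess`, `classify`, and `extract` are hypothetical. The single "key: value" rule is a toy stand-in for the classification rules and decision-tree mining, which the abstract does not specify in detail.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical in-memory "site": page -> (outgoing links, raw text lines).
SITE: Dict[str, Tuple[List[str], List[str]]] = {
    "/": (["/catalog", "/about"], ["Welcome"]),
    "/catalog": (["/catalog/widget"], ["Catalog"]),
    "/catalog/widget": ([], ["widget", "price: 9.99", "", "  "]),
    "/about": ([], ["About us"]),
}


@dataclass
class Request:
    product: str
    # A stored path to the destination page; empty means no crawl policy.
    crawl_policy: List[str] = field(default_factory=list)


def assisted_crawl(policy: List[str]) -> List[str]:
    """Assisted crawling: reuse the stored path instead of searching."""
    return list(policy)


def recursive_crawl(product: str, start: str = "/") -> List[str]:
    """Recursive crawling: depth-first search for a page naming the product."""
    def visit(page: str, path: List[str], seen: Set[str]) -> Optional[List[str]]:
        if page in seen:
            return None
        seen.add(page)
        links, lines = SITE[page]
        path = path + [page]
        if product in lines:
            return path
        for link in links:
            found = visit(link, path, seen)
            if found:
                return found
        return None

    return visit(start, [], set()) or []


def preprocess(lines: List[str]) -> List[str]:
    """Remove noisy data; here that only means blank or whitespace-only lines."""
    return [line.strip() for line in lines if line.strip()]


def classify(lines: List[str]) -> Dict[str, str]:
    """Toy classification rule: a 'key: value' line becomes an extracted field."""
    fields: Dict[str, str] = {}
    for line in lines:
        if ": " in line:
            key, value = line.split(": ", 1)
            fields[key] = value
    return fields


def extract(request: Request) -> Tuple[str, List[str], Dict[str, str]]:
    """Choose the crawl mode from the presence of a policy, then mine the page."""
    if request.crawl_policy:
        mode, path = "assisted", assisted_crawl(request.crawl_policy)
    else:
        mode, path = "recursive", recursive_crawl(request.product)
    if not path:
        return mode, [], {}
    _, lines = SITE[path[-1]]
    return mode, path, classify(preprocess(lines))
```

In this sketch, the path returned by a recursive crawl can be passed back as the `crawl_policy` of a later request, which loosely mirrors the idea of building a new crawl policy from earlier crawling attempts.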
Opening claim text (preview).
What is claimed is:

1. A computer implemented method for automatic information extraction comprising: receiving a request for information extraction and retrieving input information from the request; determining existence of a crawl policy wherein such determination is performed on the input information retrieved from the request; performing assisted crawling, for the input information containing the crawl policy and performing recursive crawling for the input information not containing the crawl policy; computing valid paths and links for building a new crawl policy based on the assisted crawling and the recursive crawling, wherein the valid paths and links are computed recursively such that the links in a destination file and a web-page in a current crawling cycle matches with one or more previous crawling attempts; pre-processing dataset containing the new crawl policy obtained after the assisted crawling and the recursive crawling to remove noisy data and to obtain a pre-processed dataset; and subjecting the pre-processed relevant dataset to classification rules and decision tree based data mining to obtain extracted information.

2. The method of claim 1, wherein the request for automatic information extraction can be system based or user-based.

3. The method of claim 1, wherein the input information includes product information as at least one of search template and pattern template.

4. The method of claim 3, wherein the search template includes product names for which information is to be extracted.

5. The method of claim 3, wherein the pattern template includes patterns leading to destination web-pages.

6. The method of claim 1, wherein the assisted crawling is performed based on prioritization policy.

7. The method of claim 1, further comprising: determining whether the valid paths and links computed during recursive crawling need to be saved for future attempts of assisted crawling.

8. The method of claim 1, wherein a complexity of assisted crawling is determined according to an expression: Complexity: O(n×m + n×k), where O is a notation of complexity, n is a number of products to update, k is a number of web-pages downloaded and m is a complexity involved for crawling.

9. The method of claim 1, wherein a complexity of assisted crawling is a function of a number of products to update, a number of web-pages downloaded and a complexity involved for crawling.

10. The method of claim 1, wherein a complexity of recursive crawling is determined according to an expression: Complexity: O(n×v² + n×k), where O is a notation of complexity, n is a number of products to update, k is a number of web-pages downloaded and v is a number of hops required to reach the destination source for recursive crawling.

11. The method of claim 1, wherein complexity of recursive crawling is a function of the number of products to update, the number of web-pages downloaded and the number of hops required to reach the destination source for recursive crawling.

12. The method of claim 1, wherein the classification rules includes generic rules and specific rules.

13. The method of claim 1, wherein new rules can be formulated and added to classification rules.

14. The method of claim 1, further comprising intimating non-existence of the crawl policy.

15. The method of claim 1, further comprising: intimating any conflict caused due to one or more classification rules; and extracting information irrespective of type of file formats.

16. The method of claim 1, further comprising: receiving a request for information retrieval and retrieving input information from the request; providing the input information for performing front-end crawling; pre-processing dataset obtained after the front-end crawling to remove noisy data and to obtain the pre-processed dataset; and subjecting the pre-processed relevant dataset to classification rules and decision tree based data mining to extract information.

17. The method of claim 16, wherein the input information includes configuration files containing data dictionary for mapping data source.

18. The method of claim 16, wherein a complexity of front-end crawling is determined according to an expression: Complexity: O(n×m + n×k), where O is a notation of complexity, n is a number of products to update, k is a number of web-pages downloaded and m is a complexity involved for front-end crawling through website or a number of hops to arrive at destination source.

19. The method of claim 16, wherein a complexity of front-end crawling is a function of a number of products to update, a number of web-pages downloaded and a complexity involved for front-end crawling through website or a number of hops to arrive at destination source.

20. A computer implemented system for automatic information extraction comprising: an input module for receiving a request for information extraction and retrieving information from the request; a data source for information extraction; one or more processor configured to: responsive to the request for information extraction: determine the existence of a crawl policy, wherein such determination is performed within input information retrieved from the request; perform at least one of the front-end crawling, assisted crawling and recursive crawling on the input information; computing valid paths and links for building a new crawl policy based on the assisted crawling and the recursive crawling, wherein the valid paths and links are computed recursively such that the links in a destination file and a web-page in a current crawling cycle matches with one or more previous crawling attempts; pre-processing dataset containing the new crawl policy obtained after the assisted crawling and the recursive crawling to remove noisy data and to obtain a pre-processed dataset; an extractor to subject pre-processed data to classification rules and decision tree based data mining techniques; and an output module to provide extracted information.

21. The system of claim 20, wherein data source include online and offline sources.

22. A non-transitory computer readable medium embodying a program executable in a computing device for automatic information extraction, the program comprising: receiving a request for information extraction and retrieving input information from the request; determining existence of a crawl policy wherein such determination is performed on the input information retrieved from the request; performing assisted crawling, for the input information containing the crawl policy and performing recursive crawling for the input information not containing the crawl policy; computing valid paths and links for building a new crawl policy based on the assisted crawling and the recursive crawling, wherein the valid paths and links are computed recursively such that the links in a destination file and a web-page in a current crawling cycle matches with one or more previous crawling attempts; pre-processing dataset containing the new crawl policy obtained after the assisted crawling and the recursive crawling to remove noisy data and to obtain a pre-processed dataset; and subjecting the pre-processed relevant dataset to classification rules and decision tree based data mining to obtain extracted information.
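The complexity expressions in claims 8, 10, and 18 can be compared with a small worked example. This is only a sketch: big-O expressions describe growth rates, not exact operation counts, so the functions below simply evaluate the terms inside the O(...) to show how the two crawl modes scale relative to each other. The function names and the sample values of n, m, k, and v are assumptions chosen for illustration.

```python
def assisted_cost(n: int, m: int, k: int) -> int:
    """Evaluate n*m + n*k, the term inside O(...) for assisted crawling.

    n: number of products to update
    m: complexity involved for crawling
    k: number of web-pages downloaded
    """
    return n * m + n * k


def recursive_cost(n: int, v: int, k: int) -> int:
    """Evaluate n*v^2 + n*k, the term inside O(...) for recursive crawling.

    v: number of hops required to reach the destination source
    """
    return n * v ** 2 + n * k


# Illustrative values only (not from the patent).
n, m, k, v = 100, 3, 5, 4
print(assisted_cost(n, m, k))   # 100*3 + 100*5  = 800
print(recursive_cost(n, v, k))  # 100*16 + 100*5 = 2100
```

With these sample values the recursive expression is larger because the hop count enters quadratically. That is consistent with claim 7, which contemplates saving the paths found by recursive crawling for future attempts of assisted crawling.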
Indexing; Web crawling techniques · CPC title
Machine learning · CPC title
Retrieval from the web · CPC title
Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence · CPC title