System and method of search validation

US10073919B2 · US · B2

Patent metadata
Publication number: US-10073919-B2
Application number: US-29620808-A
Country: US
Kind code: B2
Filing date: Apr 10, 2008
Priority date: Apr 10, 2007
Publication date: Sep 11, 2018
Grant date: Sep 11, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of validating results of a host search engine (50), the method including the steps of scanning all data objects deliverable via a web interface with a scanning engine (25) and executing a matching engine (35) to generate a report set containing content missed by the host search engine (50).
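The scanning engine (25) in the abstract can be pictured as a crawler that records every data object a page delivers, together with the pages it appears on. The sketch below is an illustrative assumption, not the patented implementation: it assumes the pages have already been fetched into a dict mapping URL to HTML, treats words, image sources, and link targets as the data objects (the kinds named in claim 2), and uses the hypothetical names `DataObjectParser` and `build_index`. It relies only on Python's standard `html.parser` module.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

WORD_RE = re.compile(r"[a-z0-9]+")


class DataObjectParser(HTMLParser):
    """Hypothetical scanner: collects words, image sources, and link targets from one page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.images = []
        self.links = []
        self._skip = 0  # > 0 while inside <script> or <style>, whose text is not page content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.extend(WORD_RE.findall(data.lower()))


def build_index(pages):
    """Map each (kind, value) data object to the set of page URLs it appears on."""
    index = defaultdict(set)
    for url, html in pages.items():
        parser = DataObjectParser()
        parser.feed(html)
        parser.close()
        for word in parser.words:
            index[("word", word)].add(url)
        for src in parser.images:
            index[("image", src)].add(url)
        for href in parser.links:
            index[("link", href)].add(url)
    return index
```

Keying the index by `(kind, value)` keeps words, images, and links in one structure while still recording which pages each object was found on, which is the page association a matching step needs when it later checks what the host search engine returned.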

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of validating results of a client search engine, the method comprising: scanning, by a processor of a server, web pages of a client web-site; identifying, from the scanning, data objects including words on the web pages of the client web-site; creating, by the processor, an index of the data objects identified in the scanning of the web pages of the client web-site, wherein the index of the data objects includes the web pages of the client web-site associated with the data objects; selecting, by the processor, unique words from the index of the data objects by finding words in the index that have a low frequency of occurrence on the web pages compared to other words on the web pages and identifying the low-frequency words as the unique words; sending, by the processor, the unique words selected from the index of the data objects to the client search engine, wherein the client search engine inserts the unique words selected from the index of the data objects into a search field of the client search engine, and generates a search result, including web pages of the client web-site, based on the inserted unique words; receiving, by the processor, the search result including the web pages located by the client search engine; comparing, by the processor, the web pages in the search result located by the client search engine to the web pages of the client web-site recorded in the index of the data objects from which the unique words were selected; based on the comparison, identifying web pages of the client web-site recorded in the index that are not located by the client search engine; and generating a report containing the web pages of the client web-site that were not located by the client search engine.

2. The method according to claim 1, wherein the data objects further include images and links on the web pages of the client website.

3. The method according to claim 1, wherein the contents of the index are organized in an order according to the uniqueness of the data objects.

4. The method according to claim 3, wherein the uniqueness of the data objects is determined by a density analysis of the data objects.

5. The method according to claim 4, wherein the data objects are words and the density analysis further takes into account proximity to other topics.

6. The method according to claim 1, wherein the unique words executed by the client search engine comprise a set of the identified data objects.

7. The method according to claim 6, comprising: determining unique data objects as unique words from the data objects identified from the scanning; and submitting the unique words, including the set of the identified data objects, to the client search engine, wherein the set of the identified data objects is the unique data objects.

8. The method according to claim 1, wherein the report includes information to identify specific web site pages that contain missed data.

9. The method according to claim 1, wherein the processor is provided with keywords that are of particular importance to determine whether any web page containing the important keyword is missed by the client search engine.

10. A system for validating results of a client search engine including: a processing device; and a non-transitory computer readable medium storing computer instruction code executed by the processing device to cause the processing device to: scan web pages of a client web-site and identify data objects including words on the web pages of the client web-site; create an index of the data objects identified on the web pages of the client web-site; select unique words from the index of the data objects by finding words in the index that have a low frequency of appearance on the web pages and identifying the low-frequency words as the unique words; send the unique words selected from the index of the data objects to the client search engine, wherein the client search engine inserts the unique words selected from the index of the data objects into a search field of the client search engine, and generates a search result, including web pages of the client web-site, based on the inserted unique words; receive, from the client search engine, the search result including the web pages of the client web-site located by the client search engine; compare the web pages of the client web-site in the search result located by the client search engine to the web pages of the client web-site recorded in the index of the data objects from which the unique words were selected; and based on the comparison, identify web pages of the client web-site recorded in the index that are not located by the client search engine.

11. The system according to claim 10, wherein the computer instruction code further cause the processing device to generate a report of the web pages of the client web-site that are not located by the client search engine.

12. The system according to claim 11, wherein the report includes a link to a web page identified in the report.

13. A non-transitory computer readable medium storing computer instruction code, that when executed by a processor, cause the processor to: scan web pages of a client web-site; identify, from the scanning, data objects including words on the web pages of the client web-site; create an index of the data objects identified in the web pages of the client web-site, wherein the index of the data objects includes the web pages of the client web-site associated with the data objects; select unique words from the index of the data objects by finding words in the index that have a low frequency of appearance on the web pages and identifying the low-frequency words as the unique words; send the unique words selected from the index of the data objects to the client search engine, wherein the client search engine inserts the unique words selected from the index of the data objects into a search field of the client search engine, and generates a search result, including web pages of the client web-site, based on the inserted unique words; receive the search result including the web pages of the client web-site located by the client search engine; compare the web pages of the client web-site in the search result located by the client search engine to the web pages recorded in the index of the data objects from which the unique words were selected; based on the comparison, identify web pages of the client web-site recorded in the index that are not located by the host search engine; and generate a report containing the web pages of the client web-site that are not located by the client search engine based on the comparison.

14. The non-transitory computer readable medium according to claim 13, wherein the computer instruction code causes the processor to analyze the indexed data objects and pass only unique data objects as the unique words to the client search engine.

15. The non-transitory computer readable medium according to claim 14, wherein the computer instruction code causes the processor to determine the unique data objects by further conducting an analysis of the data objects according to relative importance of context.
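The steps of claim 1 (select low-frequency words, submit them to the client search engine, compare the results with the index, report the pages that were missed) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: pages are plain text that has already been fetched, "low frequency" is interpreted as each page's `per_page` rarest words by site-wide occurrence count (the claims give no threshold), and the client search engine is modelled as a Python callable `search(word)` that returns URLs instead of a real search form. All function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return WORD_RE.findall(text.lower())


def build_word_index(pages: Dict[str, str]) -> Tuple[Dict[str, Set[str]], Counter]:
    """Return (word -> URLs containing it, word -> site-wide occurrence count)."""
    index: Dict[str, Set[str]] = defaultdict(set)
    counts: Counter = Counter()
    for url, text in pages.items():
        words = tokenize(text)
        counts.update(words)
        for word in words:
            index[word].add(url)
    return index, counts


def select_unique_words(pages: Dict[str, str], counts: Counter, per_page: int = 1) -> Set[str]:
    """Pick each page's lowest-frequency words; ties are broken alphabetically."""
    unique: Set[str] = set()
    for text in pages.values():
        ranked = sorted(set(tokenize(text)), key=lambda w: (counts[w], w))
        unique.update(ranked[:per_page])
    return unique


def validate(pages: Dict[str, str], search: Callable[[str], Iterable[str]],
             per_page: int = 1) -> Dict[str, List[str]]:
    """Report each indexed page the search engine failed to return, with the words that missed it."""
    index, counts = build_word_index(pages)
    missed: Dict[str, Set[str]] = defaultdict(set)
    for word in select_unique_words(pages, counts, per_page):
        found = set(search(word))        # pages the client search engine located
        for url in index[word] - found:  # pages the index says it should have located
            missed[url].add(word)
    return {url: sorted(words) for url, words in sorted(missed.items())}


if __name__ == "__main__":
    site = {
        "/a": "Widget pricing catalogue widget",
        "/b": "Widget warranty",
        "/c": "Gizmo recall notice",
    }

    def stale_search(query: str) -> List[str]:
        # Hypothetical client search engine whose index never picked up /c.
        return [u for u, t in site.items() if u != "/c" and query in tokenize(t)]

    print(validate(site, stale_search))  # {'/c': ['gizmo']}
```

Rare words are chosen because they make targeted probes: `gizmo` occurs only on `/c`, so when the engine returns nothing for it, the most likely explanation is that `/c` is missing from the engine's index. A common word such as `widget` would be found through other pages and reveal little.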

Assignees

Inventors

Classifications

  • G06F16/00 (primary)

    Information retrieval; Database structures therefor; File system structures therefor · CPC title

  • G06F16/958 (primary)

    Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking · CPC title

  • Indexing; Web crawling techniques · CPC title

  • Automatic justification · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10073919B2 cover?
A method of validating results of a host search engine (50), the method including the steps of scanning all data objects deliverable via a web interface with a scanning engine (25) and executing a matching engine (35) to generate a report set containing content missed by the host search engine (50).
Who is the assignee on this patent?
Kirkby Stephen Denis, Kellett Peter, Accenture Global Services Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/00. Mapped technology areas include Physics.
When was this patent published?
Published Sep 11, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).