Systems and methods for paragraph-based document searching

US10002196B2 · US · B2

Patent metadata
Publication number: US-10002196-B2
Application number: US-201514748918-A
Country: US
Kind code: B2
Filing date: Jun 24, 2015
Priority date: Mar 31, 2011
Publication date: Jun 19, 2018
Grant date: Jun 19, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computerized method of searching a collection of electronic documents may include comparing search terms to sets of paragraph terms associated with paragraphs in the documents. Search terms and paragraph terms may be standardized, prior to the comparison. The method may also include generating paragraph scores for the paragraphs using term weight values associated with paragraph terms that match search terms, generating paragraph scores for the paragraphs, and using the paragraph scores to generate overall document scores. The method may also include using the overall document scores to determine a set of search results and providing the search results to a display.
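The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the synonym table, the term weights, the tokenizer, and every function name below are hypothetical, and the plain sum used to combine paragraph scores is an assumption, since the abstract does not say how the scores are combined.

```python
# Hypothetical synonym table: maps variant terms to one standardized term.
SYNONYMS = {"attorney": "lawyer", "counsel": "lawyer", "automobile": "car"}

# Hypothetical term weights for standardized terms (values are made up).
TERM_WEIGHTS = {"lawyer": 2.0, "car": 1.5, "negligence": 3.0}


def standardize(text):
    """Lowercase, split on whitespace, strip punctuation, and map synonyms."""
    terms = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?()\"'")
        if word:
            terms.append(SYNONYMS.get(word, word))
    return terms


def paragraph_score(paragraph, search_terms):
    """Sum the weights of standardized paragraph terms that match a search term."""
    wanted = set(search_terms)
    return sum(TERM_WEIGHTS.get(t, 0.0) for t in standardize(paragraph) if t in wanted)


def document_score(paragraphs, search_terms):
    """Combine paragraph scores into one document score (plain sum: an assumption)."""
    return sum(paragraph_score(p, search_terms) for p in paragraphs)


def search(documents, query):
    """Return the names of matching documents, highest overall score first."""
    search_terms = standardize(query)
    scored = {name: document_score(paras, search_terms)
              for name, paras in documents.items()}
    matches = [name for name, score in scored.items() if score > 0]
    return sorted(matches, key=lambda name: scored[name], reverse=True)


documents = {
    "a": ["The attorney drove an automobile.", "Nothing here."],
    "b": ["Counsel argued negligence.", "The lawyer cited negligence again."],
    "c": ["Unrelated text."],
}
results = search(documents, "counsel negligence")
```

Because both the query and the paragraphs pass through the same `standardize` step, the query term "counsel" matches paragraphs that say "attorney" or "lawyer".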

First claim

Opening claim text (preview).

What is claimed is:

1. A method of searching a collection of electronic documents, the method comprising: replacing a set of synonymous terms appearing in a paragraph with a set of standardized paragraph terms, wherein each standardized paragraph term has an associated term weight; generating standardized search terms in response to a search query; generating paragraph scores for paragraphs of a document based at least in part on the associated weights of standardized paragraph terms that match one or more of the standardized search terms; determining overall document scores for the electronic documents based at least in part on a combination of the paragraph scores; and determining a set of matching documents, wherein the set of matching documents is ordered using the overall document scores.

2. The method of claim 1, further comprising: generating sets of paragraph terms for the paragraphs of the electronic documents; standardizing the sets of paragraph terms to generate sets of standardized paragraph terms for the paragraphs; and associating a term weight with each standardized paragraph term.

3. The method of claim 2, wherein the term weights are based on inverse frequency scores.

4. The method of claim 1, wherein the paragraph scores are generated using a limit on the number of times a matching search term can be counted for a paragraph.

5. The method of claim 1, wherein the overall document score is determined using the formula: $W_d = \sum_{n=1}^{k} (W_n)^{P}$, where $W_d$ is the overall document score, $k$ is the number of paragraphs in a document, $W_n$ is the paragraph score of the nth paragraph in the document, and $P$ is a value.

6. The method of claim 5, wherein the value of P is within a range of 2.0 to 3.0.

7. The method of claim 1, further comprising: retrieving a text of a matching document in response to receiving a selection of the matching document; and providing the text to a display device.

8. The method of claim 1, wherein the standardized paragraph terms comprise legal terms.

9. A method of searching a collection of electronic documents comprising: replacing a set of synonymous terms within a paragraph with a set of standardized paragraph terms for paragraphs in electronic documents of a collection; associating term weight values with paragraph terms in the sets of standardized paragraph terms, wherein each term weight value is associated with an individual paragraph term; generating a set of search terms in response to receipt of a search query, wherein the search terms are based at least in part on a query string of the search query; replacing the search query with the set of standardized paragraph terms; comparing the set of search terms with the sets of paragraph terms; generating a paragraph score for the paragraphs using the term weight values of the standardized paragraph terms that match one or more of the search terms, wherein each paragraph score is associated with an individual paragraph; generating an overall document score for the electronic documents by combining the paragraph scores of the paragraphs in the electronic documents, wherein each overall document score is associated with an individual electronic document; determining, by a processor, a set of matching documents from the electronic documents associated with the collection based at least in part on the generated overall document scores, wherein the electronic documents within the set of matching documents are sorted by overall document score; and providing the set of matching documents for display.

10. The method of claim 9, wherein the term weight values are generated using inverse frequency scores.

11. The method of claim 9, wherein the paragraph scores are generated by limiting the number of times a paragraph term can be counted to generate a paragraph score.

12. The method of claim 9, wherein the overall document weights are computed by: $W_d = \sum_{n=1}^{k} (W_n)^{P}$, where $W_d$ is the overall document score, $k$ is the number of paragraphs in the document, $W_n$ is the paragraph score of the nth paragraph in a document, and $P$ is a value.

13. The method of claim 12, wherein the value of P is in a range of 2.0 to 3.0.

14. The method of claim 9, wherein the paragraph scores are generated for less than, or equal to, a maximum number of paragraphs.

15. The method of claim 9, wherein a number of paragraph scores that is used to generate an overall document score is less than the number of paragraphs in a document.
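The scoring refinements in the dependent claims (inverse frequency weights, a cap on repeated matches, the power-sum document score, and a limit on how many paragraph scores are used) can be sketched in Python as follows. This is an illustrative reading only. The claims do not give an inverse frequency formula, so `log(N / n_t)` is an assumption. Keeping the highest-scoring paragraphs when a limit applies is also an assumption, and all function and parameter names are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(paragraphs):
    """Weight each term by log(N / n_t), where N is the number of paragraphs
    and n_t is the number of paragraphs containing the term (assumed formula)."""
    total = len(paragraphs)
    containing = Counter(term for terms in paragraphs for term in set(terms))
    return {term: math.log(total / count) for term, count in containing.items()}


def paragraph_score(paragraph_terms, search_terms, weights, max_count=2):
    """Sum the weights of matching terms, counting each term at most max_count times."""
    counts = Counter(t for t in paragraph_terms if t in search_terms)
    return sum(weights.get(t, 0.0) * min(c, max_count) for t, c in counts.items())


def document_score(paragraph_scores, p=2.5, max_paragraphs=None):
    """Compute W_d as the sum of (W_n) ** P over the paragraph scores.

    If max_paragraphs is set, only the highest-scoring paragraphs contribute
    (an assumed selection rule; the claims only limit the number used).
    """
    scores = sorted(paragraph_scores, reverse=True)
    if max_paragraphs is not None:
        scores = scores[:max_paragraphs]
    return sum(s ** p for s in scores)


weights = inverse_frequency_weights([["a", "b"], ["a"]])
capped = paragraph_score(["lawyer", "lawyer", "lawyer", "car"],
                         {"lawyer"}, {"lawyer": 2.0}, max_count=2)
concentrated = document_score([4.0, 0.0], p=2.0)
spread = document_score([2.0, 2.0], p=2.0)
```

Raising each paragraph score to a power greater than 1 rewards documents whose matches are concentrated in a few paragraphs. With P = 2, one paragraph scoring 4 contributes 16, while two paragraphs scoring 2 each contribute 8 in total, even though both documents have the same sum of paragraph scores.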

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10002196B2 cover?
A computerized method of searching a collection of electronic documents may include comparing search terms to sets of paragraph terms associated with paragraphs in the documents. Search terms and paragraph terms may be standardized, prior to the comparison. The method may also include generating paragraph scores for the paragraphs using term weight values associated with paragraph terms that ma…
Who is the assignee on this patent?
LexisNexis Division of Reed Elsevier Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/30867. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 19, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).