Feature engineering and user behavior analysis

US10387462B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10387462-B2
Application number  US-201715408734-A
Country             US
Kind code           B2
Filing date         Jan 18, 2017
Priority date       Oct 4, 2005
Publication date    Aug 20, 2019
Grant date          Aug 20, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and techniques are disclosed to rank documents by analyzing a query log generated by a search engine. The query log includes data relating to user behavior, queries and documents. The systems and techniques distill query log information into surrogate documents and extract features from these surrogate documents to rank the documents.
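As a rough illustration of the distillation step described in the abstract, the sketch below groups query-log records by document so that each document gets a surrogate holding the queries and events logged against it. The record layout `(query, doc_id, event)`, the `SurrogateDocument` class, and the `build_surrogates` helper are assumptions made for this example; the page does not show the patent's actual data structures.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SurrogateDocument:
    """Hypothetical container: queries and events logged against one document."""
    doc_id: str
    # (query, event_type) -> number of times the event followed the query
    event_counts: Counter = field(default_factory=Counter)

    @property
    def queries(self):
        return {query for query, _ in self.event_counts}

    @property
    def events(self):
        return {event for _, event in self.event_counts}


def build_surrogates(query_log):
    """Group (query, doc_id, event) log records by document."""
    surrogates = {}
    for query, doc_id, event in query_log:
        surrogate = surrogates.setdefault(doc_id, SurrogateDocument(doc_id))
        # Light normalization so trivially different spellings share a key.
        surrogate.event_counts[(query.lower().strip(), event)] += 1
    return surrogates


# Made-up log records, for illustration only.
log = [
    ("adverse possession", "doc-1", "view"),
    ("adverse possession", "doc-1", "print"),
    ("Adverse Possession ", "doc-1", "view"),
    ("squatter rights", "doc-1", "view"),
    ("squatter rights", "doc-2", "view"),
]
surrogates = build_surrogates(log)
print(sorted(surrogates["doc-1"].queries))
print(surrogates["doc-1"].event_counts[("adverse possession", "view")])
```

Keying the counts by (query, event) pair keeps both the set of queries and the set of events recoverable from one structure, which is one simple way to mirror the "set of queries and a set of events" wording in the first claim.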

First claim

Opening claim text (preview).

What is claimed is:

1. An on-line legal research system comprising a data store having stored therein user activity data, the system further comprising: a server having an input for receiving a user input query and user query-related activity data and an output for transmitting a ranked set of documents, the server further comprising a processor, a memory adapted to store instructions, and a behavior module, when executed by the processor, adapted to identify feature values for use in improving query search results based on information associated with user activity, the behavior module adapted to: define a subset of queries in an event-centric surrogate document that are related to a user input query, the event-centric surrogate document comprising a set of queries and a set of events; determine, for each query in the subset of queries, a feature value for each associated event in the event-centric surrogate document; and aggregate the determined feature values for each event in the set of events and determine, based on the aggregated feature values, a final feature; wherein the final feature is determined based on the formula:

$$\mathrm{fea}(q_u, d) = \sum_{q_i \in Q_{ud}} f(q_i, d) \cdot g(q_i, q_u)$$

wherein $q_u$ is the user input query, $d$ is the event-centric surrogate document, $q_i$ is a query in the subset of queries $Q_{ud}$, $g(q_i, q_u)$ is a weight, and $f(q_i, d)$ is the query-document feature; wherein the server output is adapted to generate a signal representing a set of ranked documents and to transmit the generated signal, the set of ranked documents being ranked based at least in part on a set of determined feature values.

2. The system of claim 1 wherein the subset of queries is selected based on one or more of: selecting an exact match to the user input query; selecting a top-ranked subset of queries related to the user input query; and selecting queries for the subset of queries based on a predefined relatedness threshold.

3. The system of claim 1 wherein the behavior module is further adapted to weight feature values for each query in the subset of queries.

4. The system of claim 3 wherein the weighting is based on one or both of: the similarity to the user input query; and a normalized similarity to the user input query.

5. The system of claim 1 wherein the behavior module is further adapted to determine additional features for the event-centric surrogate document based on term-based similarity between a set of previous user input queries and the event-centric surrogate document.

6. The system of claim 5 wherein the additional features comprise one or more of: exact query-document similarity; query expansion; and document-document similarity.

7. The system of claim 6 further comprising a search engine adapted to identity a set of search results related to the user input query, and wherein the behavior module is further adapted to re-rank the set of search results based on the determined feature values.

8. The system of claim 7 further comprising a search engine adapted to identify a second set of search results related to the user input query and the determined feature values, and wherein the behavior module is further adapted to re-rank the set of search results based on the second set of search results and the determined feature values.

9. The system of claim 7 wherein the behavior module is further adapted to re-rank the set of search results based on the additional features and the final feature.

10. A method for identifying feature values for use in improving query search results based on information associated with user activity, the method comprising: defining a subset of queries in an event-centric surrogate document that are related to a user input query, the event-centric surrogate document comprising a set of queries and a set of events; determining, for each query in the subset of queries, a feature value for each associated event in the event-centric surrogate document; and aggregating the determined feature values for each event in the set of events and determining, based on the aggregated feature values, a final feature; wherein the final feature is determined based on the formula:

$$\mathrm{fea}(q_u, d) = \sum_{q_i \in Q_{ud}} f(q_i, d) \cdot g(q_i, q_u)$$

w
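The final-feature formula in claims 1 and 10 is a weighted sum over the related queries, and a small numeric sketch may make it easier to read. The code below is only an illustration under stated assumptions: Jaccard token overlap stands in for the weight g, the subset Q_ud is chosen with a relatedness threshold (one of the options listed in claim 2), and the weights are optionally normalized (one of the options listed in claim 4). The function names and the sample values of f(q_i, d) are hypothetical and do not come from the patent.

```python
def jaccard(a, b):
    """Token-overlap similarity; an illustrative stand-in for the weight g."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def select_related(user_query, queries, threshold=0.3):
    """Q_ud: logged queries whose similarity to q_u meets a threshold."""
    return [q for q in queries if jaccard(q, user_query) >= threshold]


def final_feature(user_query, f_values, threshold=0.3, normalize=True):
    """fea(q_u, d) = sum over q_i in Q_ud of f(q_i, d) * g(q_i, q_u).

    f_values maps each logged query q_i to its feature value f(q_i, d)
    for a single surrogate document d.
    """
    subset = select_related(user_query, f_values, threshold)
    weights = {q: jaccard(q, user_query) for q in subset}
    total = sum(weights.values())
    if normalize and total > 0:
        # Normalized similarity: weights over the subset sum to 1.
        weights = {q: w / total for q, w in weights.items()}
    return sum(f_values[q] * weights[q] for q in subset)


# Made-up values of f(q_i, d), e.g. click counts for one document d.
f_values = {
    "adverse possession": 4.0,
    "adverse possession elements": 2.0,
    "contract breach": 9.0,
}
print(final_feature("adverse possession", f_values, normalize=False))
print(final_feature("adverse possession", f_values))
```

With these sample values, "contract breach" shares no tokens with the user query and falls below the threshold, so its large feature value does not contribute. The unnormalized result is about 5.33 (4·1 + 2·2/3) and the normalized result is about 3.2 (4·0.6 + 2·0.4).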

Assignees

Thomson Reuters Global Resources Unlimited Co

Inventors

Classifications

  • Presentation of query results · CPC title

  • Query execution (filtering based on additional data G06F16/335) · CPC title

  • G06Q10/10 (Primary)

    Office automation; Time management · CPC title

  • Selection or weighting of terms from queries, including natural language queries · CPC title

  • Reformulation based on results of preceding query · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10387462B2 cover?
Systems and techniques are disclosed to rank documents by analyzing a query log generated by a search engine. The query log includes data relating to user behavior, queries and documents. The systems and techniques distill query log information into surrogate documents and extract features from these surrogate documents to rank the documents.
Who is the assignee on this patent?
Thomson Reuters Global Resources Unlimited Co
What technology area does this patent fall under?
Primary CPC classification G06Q10/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 20, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).