Clustering a set of natural language queries based on significant events

US11036776B2 · US · B2

Patent metadata
Publication number: US-11036776-B2
Application number: US-201916436882-A
Country: US
Kind code: B2
Filing date: Jun 10, 2019
Priority date: Nov 8, 2016
Publication date: Jun 15, 2021
Grant date: Jun 15, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Clustering a set of natural language queries (NLQs) based on a set of significant events retrieved from a corpus stored in a computer system is described. A set of NLQs is used by a search engine for searching a selected corpus to retrieve respective sets of significant events. The set of NLQs is clustered into a plurality of NLQ clusters according to a number of common significant events being returned by the search engine for respective members of an NLQ cluster.
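As a rough illustration of the grouping the abstract describes, the sketch below merges NLQs whose retrieved event sets share enough events. It is only a minimal sketch under stated assumptions: the `events_by_nlq` data, the `min_common` threshold, and the single-link (union-find) merging rule are all hypothetical choices for illustration, not details taken from the patent.

```python
from itertools import combinations


def cluster_by_common_events(events_by_nlq, min_common=2):
    """Group NLQs whose retrieved event sets share at least min_common events.

    Assumption: single-link merging via union-find; the abstract only says
    clustering depends on the number of common significant events.
    """
    parent = {q: q for q in events_by_nlq}

    def find(q):
        # Follow parent links to the cluster representative (path halving).
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for a, b in combinations(events_by_nlq, 2):
        if len(events_by_nlq[a] & events_by_nlq[b]) >= min_common:
            parent[find(a)] = find(b)

    clusters = {}
    for q in events_by_nlq:
        clusters.setdefault(find(q), []).append(q)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical search results: NLQ -> IDs of significant events returned.
events_by_nlq = {
    "who acquired the startup": {"e1", "e2", "e3"},
    "which company bought the startup": {"e2", "e3", "e4"},
    "when did the storm hit": {"e7", "e8"},
}
clusters = cluster_by_common_events(events_by_nlq, min_common=2)
```

Here the two acquisition queries share events `e2` and `e3`, so they land in one cluster, while the storm query stays on its own.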

First claim

Opening claim text (preview).

The invention claimed is:

1. An improved method for searching a selected corpus by clustering a set of natural language queries (NLQ) based on a set of significant events retrieved from a corpus stored in a computer system comprising: using a set of NLQs by a search engine for searching a selected corpus to retrieve respective sets of significant events from the selected corpus; for each NLQ in the set of NLQs, using a first set of entities from the NLQ and using the first set of entities to search for a first set of significant events in the selected corpus; using a second set of entities from the first set of significant events to search for a second set of significant events in the selected corpus; producing a distribution profile for each NLQ based on a number of common significant events retrieved using the first set of entities and a number of common significant events retrieved using the second set of entities; clustering the set of NLQs into NLQ clusters according to the distribution profiles; and using a respective NLQ cluster for a function in the search engine.

2. The method as recited in claim 1, wherein the first set of significant events in the selected corpus are determined in a first search pass and the second set of significant events in the selected corpus in a second search pass.

3. The method as recited in claim 1, wherein the clustering is also based in part on common linguistic and semantic features of respective NLQs.

4. The method as recited in claim 3, further comprising: from user input, receiving a threshold number of common significant events as a clustering criterion; and from user input, receiving a threshold number of common linguistic and semantic features in an NLQ as a clustering criterion.

5. The method as recited in claim 2, further comprising: building a knowledge graph based on a selected corpus stored in the computer system, the knowledge graph having a set of co-occurrence scores on edges of the knowledge graph between respective events in the selected corpus placed at the nodes of the knowledge graph, wherein the co-occurrence scores indicate co-occurrence of entities within respective pairs of events in the selected corpus; and using the knowledge graph to extract the second set of entities.

6. The method as recited in claim 2, further comprising: extracting a third set of entities from the second set of significant events and using the third set of entities to search for a third set of significant events in the selected corpus in a third search pass; and producing a distribution profile for each NLQ based on a number of significant events retrieved in the first search pass, the second search pass and the third search pass.

7. The method as recited in claim 2, further comprising: determining a significance score for respective events retrieved by the search system according to a metric of mutual information (MMI); and filtering the retrieved events according to respective significance scores to produce the first set of significant events.

8. The method as recited in claim 1, wherein the first and second sets of entities have no common members.

9. Apparatus, comprising: a processor; computer memory holding computer program instructions executed by the processor for improved searching of a selected corpus by clustering a set of natural language queries (NLQ), the computer program instructions comprising: program code, operative to use a set of NLQs by for searching a selected corpus to retrieve respective sets of significant events from the selected corpus; program code, operative for each NLQ in the set of NLQs to use a first set of entities from the NLQ and to use the first set of entities to search for a first set of significant events in the selected corpus; program code, operative to use a second set of entities from the first set of significant events to search for a second set of significant events in the selected corpus; program code, operative to produce a distribution profile for each NLQ based on a number of common significant events retrieved using the first set of entities and a number of common significant events retrieved using the second set of entities; program code, operative to cluster the set of NLQs into NLQ clusters according to the distribution profiles; and program code, operative to use a respective NLQ cluster for a function in the search engine.

10. The apparatus as recited in claim 9, wherein the first set of significant events in the selected corpus are determined in a first search pass and the second set of significant events in the selected corpus in a second search pass.

11. The apparatus as recited in claim 9, wherein the clustering is also based in part on common linguistic and semantic features of respective NLQs.

12. The apparatus as recited in claim 11, further comprising: program code, operative to receive a threshold number of common significant events as a clustering criterion; and program code, operative to receive a threshold number of common linguistic and semantic features in an NLQ as a clustering criterion.

13. The apparatus as recited in claim 10, further comprising: program code, operative to build a knowledge graph based on a selected corpus stored in the computer system, the knowledge graph having a set of co-occurrence scores on edges of the knowledge graph between in the selected corpus placed at the nodes of the knowledge graph, wherein the co-occurrence scores indicate co-occurrence of entities within respective pairs of events in the selected corpus; and program code, operative to use the knowledge graph to extract the second set of entities.

14. The apparatus as recited in claim 10, further comprising: program code, operative to extract a third set of entities from the second set of significant events and using the third set of entities to search for a third set of significant events in the selected corpus in a third search pass; and program code, operative to produce a distribution profile for each NLQ based on a number of significant events retrieved in the first search pass, the second search pass and the third search pass.

15. A computer program product in a non-transitory computer readable medium for use in a data processing system, the computer program product holding computer program instructions executed by the data processing system for improved searching of a selected corpus by performing clustering of natural language queries (NLQ), the computer program instructions comprising: program code, operative to use a set of NLQs by for searching a selected corpus to retrieve respective sets of significant events from the selected corpus; program code, operative for each NLQ in the set of NLQs to use a first set of entities from the NLQ and to use the first set of entities to search for a first set of significant events in the selected corpus; program code, operative to use a second set of entities from the first set of significant events to search for a second set of significant events in the selected corpus; program code, operative to produce a distribution profile for each NLQ based on a number of common significant events retrieved using the first set of entities and a number of common significant events retrieved using the second set of entities; program code, operative to cluster the set of NLQs into NLQ clusters according to the distribution profiles; and program code, operative to use a respective NLQ cluster for a function in the search engine.

16. The computer program product as recited in claim 15, wherein the first set of signi
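The two search passes recited in claim 1 can be pictured with a small sketch. Everything concrete here is an assumption made for illustration: the toy `CORPUS`, the `search` stand-in for the search engine, and the reading of "number of common significant events" as a per-pass overlap count between two NLQs. The claims do not specify these details, so this is not the patented implementation.

```python
# Toy corpus (hypothetical): event ID -> entities mentioned in that event.
CORPUS = {
    "e1": {"acme", "merger"},
    "e2": {"acme", "globex"},
    "e3": {"globex", "lawsuit"},
    "e4": {"storm", "coast"},
    "e5": {"coast", "evacuation"},
}


def search(entities):
    """Stand-in for the search engine: events mentioning any given entity."""
    return {eid for eid, ents in CORPUS.items() if ents & entities}


def two_pass_events(nlq_entities):
    """Return (first-pass events, second-pass events) for one NLQ."""
    first_events = search(nlq_entities)
    # Second entity set: entities found in the first-pass events, minus the
    # NLQ's own entities, so the two entity sets share no members (claim 8).
    found = set().union(*(CORPUS[e] for e in first_events))
    second_entities = found - nlq_entities
    second_events = search(second_entities)
    return first_events, second_events


def common_event_counts(events_a, events_b):
    """Per-pass counts of events that two NLQs have in common (assumed profile)."""
    return tuple(len(a & b) for a, b in zip(events_a, events_b))


q1 = two_pass_events({"acme"})    # entities from a hypothetical NLQ about Acme
q2 = two_pass_events({"merger"})  # entities from a hypothetical NLQ about a merger
q3 = two_pass_events({"storm"})   # entities from an unrelated NLQ

overlap_related = common_event_counts(q1, q2)
overlap_unrelated = common_event_counts(q1, q3)
```

In this toy data the first pass finds one shared event between the Acme and merger queries, and the second pass raises that to two, while the storm query shares none in either pass. Per-pass counts like these could then feed whatever clustering criterion is chosen.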

Assignees

IBM

Inventors

Classifications

  • using natural language analysis · CPC title

  • Clustering; Classification · CPC title

  • Clustering; Classification · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11036776B2 cover?
Clustering a set of natural language queries (NLQs) based on a set of significant events retrieved from a corpus stored in a computer system is described. A set of NLQs is used by a search engine for searching a selected corpus to retrieve respective sets of significant events. The set of NLQs is clustered into a plurality of NLQ clusters according to a number of common significant events being r…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/3344. Mapped technology areas include Physics.
When was this patent published?
Published on Jun 15, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).