Dynamic faceted search on a document corpus

US11275796B2 · US · B2

Patent metadata
Publication number: US-11275796-B2
Application number: US-201916399030-A
Country: US
Kind code: B2
Filing date: Apr 30, 2019
Priority date: Apr 30, 2019
Publication date: Mar 15, 2022
Grant date: Mar 15, 2022

Abstract

A query-focused faceted structure generation method, system, and computer program product for generating a query-focused faceted structure from a taxonomy for searching a document collection, including ingesting a document corpus, generating a vector space representation of a query and instances from a taxonomy of the document corpus, and producing a dynamic structure of relevant facet categories and facet values using a two-vector space representation from the generated vector space representation.
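The query-side vector step described in the abstract (and spelled out in claim 1: a query vector built as a weighted combination of token vectors in the topic model, then nearest-neighbor taxonomy instances) can be illustrated with a minimal sketch. It assumes cosine similarity as the nearest-neighbor measure and per-token weights supplied by the caller (for example IDF scores); the patent text only says "weighted combination" and "nearest neighbor instances". The functions `query_vector` and `nearest_instances` and the toy embeddings are hypothetical; in the claimed method the vectors would come from the trained topic model.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def query_vector(tokens: Sequence[str],
                 embeddings: Dict[str, Vector],
                 weights: Optional[Dict[str, float]] = None) -> Vector:
    """Weighted sum of the query-token vectors, normalized to unit length.

    Tokens absent from the (hypothetical) topic-model table are skipped;
    a token with no explicit weight gets weight 1.0.
    """
    weights = weights or {}
    dim = len(next(iter(embeddings.values())))
    acc = [0.0] * dim
    for tok in tokens:
        if tok in embeddings:
            w = weights.get(tok, 1.0)
            acc = [a + w * x for a, x in zip(acc, embeddings[tok])]
    return normalize(acc)


def nearest_instances(query: Sequence[float],
                      instance_vectors: Dict[str, Vector],
                      k: int) -> List[Tuple[str, float]]:
    """Return the k taxonomy instances most cosine-similar to the query."""
    scored = [(name, cosine(query, vec)) for name, vec in instance_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The names returned by `nearest_instances` play the role of the claim's "query-similar instances"; `k` stands in for the claim's "first parameter".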

Claims

What is claimed is:

1. A computer-implemented query-focused faceted structure generation method for generating a query-focused faceted structure from a taxonomy for searching a document collection, the method comprising: ingesting a document corpus including a pre-processing that filters parts of speech; generating a vector space representation of a query and instances from a taxonomy of the document corpus via at least two models, the taxonomy being loaded and including a graph of a type and instance nodes where the instance nodes have a consistent relationship to the type; and producing a dynamic structure of a relevant category and facet using a two-vector space representation from the generated vector space representation based on a separate two-vector space representation of the at least two models, wherein the ingesting ingests the document corpus by: extracting the terminology that includes noun words and phrases from the document corpus to: train a type model that generates a phrase embedding of the terminology in the document corpus; and train a topic model that generates a second phrase embedding of the terminology in the document corpus, wherein the generating generates a vector for a user query as a weighted combination of the vector for each query token in the topic model as a query vector, wherein the generating generates a list of the vectors for instances from the taxonomy in the topic model, and wherein the producing produces the dynamic structure of the relevant category and the facet by: selecting a first parameter of nearest neighbor instances to the query vector from the taxonomy instances using the topic model as query-similar instances; selecting a second parameter of types in the taxonomy with a most number of query-similar instances to use as categories; selecting a third parameter of facets from instances of the types corresponding to each of the categories for the second parameter; and expanding from the third parameter of the facets within each of the second parameter of the categories to obtain more category-similar instances from the document corpus using the type model.

2. The method of claim 1, further comprising returning the dynamic structure as a data file to a user.

3. The method of claim 1, wherein the facets are ranked within each of the first parameter of the categories by distance to both: the query vector in the topic model vector space, and a centroid of the third parameter of instances that correspond to the category.

4. The method of claim 1, embodied in a cloud-computing environment.

5. A computer program product for query-focused faceted structure generation, the computer program product comprising a computer-readable storage medium having program instructions embodied therewith for generating a query-focused faceted structure from a taxonomy for searching a document collection, the program instructions executable by a computer to cause the computer to perform: ingesting a document corpus including a pre-processing that filters parts of speech; generating a vector space representation of a query and instances from a taxonomy of the document corpus via at least two models, the taxonomy being loaded and including a graph of a type and instance nodes where the instance nodes have a consistent relationship to the type; and producing a dynamic structure of a relevant category and facet using a two-vector space representation from the generated vector space representation based on a separate two-vector space representation of the at least two models, wherein the ingesting ingests the document corpus by: extracting the terminology that includes noun words and phrases from the document corpus to: train a type model that generates a phrase embedding of the terminology in the document corpus; and train a topic model that generates a second phrase embedding of the terminology in the document corpus, wherein the generating generates a vector for a user query as a weighted combination of the vector for each query token in the topic model as a query vector, wherein the generating generates a list of the vectors for instances from the taxonomy in the topic model, and wherein the producing produces the dynamic structure of the relevant category and the facet by: selecting a first parameter of nearest neighbor instances to the query vector from the taxonomy instances using the topic model as query-similar instances; selecting a second parameter of types in the taxonomy with a most number of query-similar instances to use as categories; selecting a third parameter of facets from instances of the types corresponding to each of the categories for the second parameter; and expanding from the third parameter of the facets within each of the second parameter of the categories to obtain more category-similar instances from the document corpus using the type model.

6. The computer program product of claim 5, further comprising returning the dynamic structure as a data file to a user.

7. The computer program product of claim 5, wherein the facets are ranked within each of the first parameter of the categories by distance to both: the query vector in the topic model vector space, and a centroid of the third parameter of instances that correspond to the category.

8. A query-focused faceted structure generation system for generating a query-focused faceted structure from a taxonomy for searching a document collection, the system comprising: a processor; and a memory, the memory storing instructions to cause the processor to perform: ingesting a document corpus including a pre-processing that filters parts of speech; generating a vector space representation of a query and instances from a taxonomy of the document corpus via at least two models, the taxonomy being loaded and including a graph of a type and instance nodes where the instance nodes have a consistent relationship to the type; and producing a dynamic structure of a relevant category and facet using a two-vector space representation from the generated vector space representation based on a separate two-vector space representation of the at least two models, wherein the ingesting ingests the document corpus by: extracting the terminology that includes noun words and phrases from the document corpus to: train a type model that generates a phrase embedding of the terminology in the document corpus; and train a topic model that generates a second phrase embedding of the terminology in the document corpus, wherein the generating generates a vector for a user query as a weighted combination of the vector for each query token in the topic model as a query vector, wherein the generating generates a list of the vectors for instances from the taxonomy in the topic model, and wherein the producing produces the dynamic structure of the relevant category and the facet by: selecting a first parameter of nearest neighbor instances to the query vector from the taxonomy instances using the topic model as query-similar instances; selecting a second parameter of types in the taxonomy with a most number of query-similar instances to use as categories; selecting a third parameter of facets from instances of the types corresponding to each of the categories for the second parameter; and expanding from the third parameter of the facets within each of the second parameter of the categories to obtain more category-similar instances from the document corpus using the type model.

9. The system of claim 8, further comprising returning the dynamic structure as a data file to a user.

10. The system of claim 8, embodied in a cloud-computing environment.
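The "producing" step of claim 1 (pick the taxonomy types with the most query-similar instances as categories, pick facets from each category's instances, then expand with the type model), together with the ranking in claim 3, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: cosine similarity stands in for "distance", facets are ranked by an equal-weight sum of their similarity to the query vector and to the centroid of the category's query-similar instances, and expansion takes the corpus terms nearest the facets' centroid in the type-model space. The function `build_facets` and its parameters `instance_type`, `topic_vecs`, `type_vecs`, `n_categories`, `n_facets`, and `n_expand` are hypothetical; `n_categories` and `n_facets` correspond to the claim's "second" and "third" parameters, and the query-similar instances (the "first parameter") are assumed to be computed already.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_facets(query_vec: Vector,
                 similar: List[str],
                 instance_type: Dict[str, str],
                 topic_vecs: Dict[str, Vector],
                 type_vecs: Dict[str, Vector],
                 n_categories: int,
                 n_facets: int,
                 n_expand: int) -> Dict[str, List[str]]:
    """Map each selected category (taxonomy type) to its facet values."""
    usable = [i for i in similar if i in instance_type and i in topic_vecs]
    # Categories: the types that own the most query-similar instances.
    counts = Counter(instance_type[i] for i in usable)
    categories = [t for t, _ in counts.most_common(n_categories)]

    structure: Dict[str, List[str]] = {}
    for cat in categories:
        members = [i for i, t in instance_type.items() if t == cat and i in topic_vecs]
        seeds = [i for i in usable if instance_type[i] == cat]
        center = centroid([topic_vecs[i] for i in seeds])
        # Rank by closeness to the query AND to the category centroid in the
        # topic space (equal-weight sum of similarities: an assumption).
        members.sort(key=lambda i: cosine(topic_vecs[i], query_vec)
                     + cosine(topic_vecs[i], center), reverse=True)
        facets = members[:n_facets]
        # Expansion: corpus terms nearest the facets' centroid in type-model space.
        anchors = [type_vecs[f] for f in facets if f in type_vecs]
        if anchors:
            type_center = centroid(anchors)
            extra = [t for t in type_vecs if t not in facets]
            extra.sort(key=lambda t: cosine(type_vecs[t], type_center), reverse=True)
            facets += extra[:n_expand]
        structure[cat] = facets
    return structure
```

The equal-weight sum is only one reading of "distance to both"; a weighted sum or a product of the two similarities would fit the claim wording equally well, and the weighting is a tuning choice rather than something the claim text fixes.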

Assignees

IBM

Classifications

  • Combinations of networks

  • Feedforward networks

  • Machine learning

  • Knowledge engineering; Knowledge acquisition

  • G06F16/93 (primary): Document management systems

Frequently asked questions

What does patent US11275796B2 cover?
A method, system, and computer program product that generates a query-focused faceted structure from a taxonomy for searching a document collection: it ingests a document corpus, generates vector space representations of a query and of taxonomy instances, and produces a dynamic structure of relevant facet categories and facet values.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/93. Mapped technology areas include Physics.
When was this patent published?
Publication date: March 15, 2022 (kind code B2).