Dynamic faceted search on a document corpus

US11003701B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11003701-B2
Application number  US-201916399180-A
Country             US
Kind code           B2
Filing date         Apr 30, 2019
Priority date       Apr 30, 2019
Publication date    May 11, 2021
Grant date          May 11, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A query-focused faceted structure generation method, system, and computer program product for generating a query-focused faceted structure from a taxonomy for searching a document corpus, including augmenting taxonomy types with new instances where the instances comprise entities within a proximity of existing instances of taxonomy types in a local embedding of entities parsed from the document corpus, ranking each instance in the augmented taxonomy with respect to its type as a function of both a distance from an instance to a query in a global embedding vector space of the entities trained from the document corpus and a distance of an instance to a type in the local embedding, and ranking the taxonomy types using expanded instances in the document corpus for each type.
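The three steps named in the abstract (augment, rank instances, rank types) can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes cosine distance as the distance measure, a fixed `radius` as the proximity test, a type's position taken as the centroid of its seed instances, a linear blend weighted by `alpha` as the combining function, and the mean instance score as the type score. None of these choices is specified in the abstract. The entity names and the two-dimensional vectors are hypothetical toy data.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def centroid(vectors):
    """Component-wise mean; an assumed stand-in for a type's position."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def augment(taxonomy, local_emb, radius):
    """Add every entity within `radius` of an existing instance of a type."""
    augmented = {}
    for type_name, seeds in taxonomy.items():
        members = list(seeds)
        for entity, vec in local_emb.items():
            if entity not in members and any(
                cosine_distance(vec, local_emb[s]) <= radius for s in seeds
            ):
                members.append(entity)
        augmented[type_name] = members
    return augmented

def instance_scores(members, seeds, query_vec, global_emb, local_emb, alpha):
    """Blend query distance (global space) with type distance (local space)."""
    type_vec = centroid([local_emb[s] for s in seeds])
    return {
        m: alpha * cosine_distance(global_emb[m], query_vec)
        + (1 - alpha) * cosine_distance(local_emb[m], type_vec)
        for m in members
    }

def rank(taxonomy, local_emb, global_emb, query_vec, radius=0.05, alpha=0.5):
    """Return [(type, [instances best-first])], best type first."""
    augmented = augment(taxonomy, local_emb, radius)
    ranked = []
    for type_name, members in augmented.items():
        scores = instance_scores(
            members, taxonomy[type_name], query_vec, global_emb, local_emb, alpha
        )
        ordered = sorted(members, key=scores.get)  # smaller distance ranks first
        # Assumption: a type's score is the mean score of its expanded instances.
        type_score = sum(scores.values()) / len(scores)
        ranked.append((type_score, type_name, ordered))
    ranked.sort(key=lambda item: item[0])
    return [(name, ordered) for _, name, ordered in ranked]

# Hypothetical toy data: a seed taxonomy plus local and global embeddings.
TAXONOMY = {"drug": ["aspirin", "ibuprofen"], "symptom": ["headache"]}
LOCAL_EMB = {
    "aspirin": [1.0, 0.0], "ibuprofen": [0.9, 0.1], "naproxen": [0.95, 0.05],
    "headache": [0.0, 1.0], "migraine": [0.1, 0.9],
}
GLOBAL_EMB = {
    "aspirin": [1.0, 0.2], "ibuprofen": [1.0, 0.5], "naproxen": [1.0, 0.1],
    "headache": [0.1, 1.0], "migraine": [0.3, 1.0],
}
QUERY_VEC = [0.2, 1.0]  # hypothetical embedding of a symptom-oriented query

if __name__ == "__main__":
    for type_name, instances in rank(TAXONOMY, LOCAL_EMB, GLOBAL_EMB, QUERY_VEC):
        print(type_name, instances)
```

With this toy data, "naproxen" and "migraine" join the "drug" and "symptom" types because they sit close to existing instances in the local space, and the "symptom" type ranks first because its instances lie nearest the query in the global space.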

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented query-focused faceted structure generation method for generating a query-focused faceted structure from a taxonomy for searching a document corpus, the method comprising: augmenting taxonomy types with new instances where the instances comprise entities within a proximity of existing instances of taxonomy types in a local embedding of entities parsed from the document corpus; ranking each instance in the augmented taxonomy with respect to its type as a function of both a distance from an instance to a query in a global embedding vector space of the entities trained from the document corpus and a distance of an instance to a type in the local embedding; and ranking the taxonomy types using expanded instances in the document corpus for each type.

2. The method of claim 1, presenting a dynamic structure including a faceted structure for a narrowing search of the document corpus to a user, the faceted structure being generated by selecting the ranked taxonomy types as search categories and ranked instances within each type as search facets within each category.

3. The method of claim 1, further comprising returning the dynamic structure as a data file to a user.

4. The method of claim 2, further comprising returning the dynamic structure as a data file to a user.

5. The method of claim 1, further comprising ingesting the document corpus by: extracting the terminology that includes noun words and phrases from the document corpus to: train a type model that generates a phrase embedding of the terminology in the document corpus; and train a topic model that generates a second phrase embedding of the terminology in the document corpus.

6. The method of claim 1, wherein the taxonomy types are loaded and includes a graph of type and instance nodes where instances have a consistent relationship to type.

7. The method of claim 1, embodied in a cloud-computing environment.

8. A computer program product for query-focused faceted structure generation, the computer program product comprising a computer-readable storage medium having program instructions embodied therewith for generating a query-focused faceted structure from a taxonomy for searching a document corpus, the program instructions executable by a computer to cause the computer to perform: augmenting taxonomy types with new instances where the instances comprise entities within a proximity of existing instances of taxonomy types in a local embedding of entities parsed from the document corpus; ranking each instance in the augmented taxonomy with respect to its type as a function of both a distance from an instance to a query in a global embedding vector space of the entities trained from the document corpus and a distance of an instance to a type in the local embedding; and ranking the taxonomy types using expanded instances in the document corpus for each type.

9. The computer program product of claim 8, presenting a dynamic structure including a faceted structure for a narrowing search of the document corpus to a user, the faceted structure being generated by selecting the ranked taxonomy types as search categories and ranked instances within each type as search facets within each category.

10. The computer program product of claim 8, further comprising returning the dynamic structure as a data file to a user.

11. The computer program product of claim 9, further comprising returning the dynamic structure as a data file to a user.

12. The computer program product of claim 8, further comprising ingesting the document corpus by: extracting the terminology that includes noun words and phrases from the document corpus to: train a type model that generates a phrase embedding of the terminology in the document corpus; and train a topic model that generates a second phrase embedding of the terminology in the document corpus.

13. The computer program product of claim 8, wherein the taxonomy types are loaded and includes a graph of type and instance nodes where instances have a consistent relationship to type.

14. A query-focused faceted structure generation system for generating a query-focused faceted structure from a taxonomy for searching a document corpus, the system comprising: a processor; and a memory, the memory storing instructions to cause the processor to perform: augmenting taxonomy types with new instances where the instances comprise entities within a proximity of existing instances of taxonomy types in a local embedding of entities parsed from the document corpus; ranking each instance in the augmented taxonomy with respect to its type as a function of both a distance from an instance to a query in a global embedding vector space of the entities trained from the document corpus and a distance of an instance to a type in the local embedding; and ranking the taxonomy types using expanded instances in the document corpus for each type.

15. The system of claim 14, presenting a dynamic structure including a faceted structure for a narrowing search of the document corpus to a user, the faceted structure being generated by selecting the ranked taxonomy types as search categories and ranked instances within each type as search facets within each category.

16. The system of claim 14, further comprising returning the dynamic structure as a data file to a user.

17. The system of claim 15, further comprising returning the dynamic structure as a data file to a user.

18. The system of claim 14, further comprising ingesting the document corpus by: extracting the terminology that includes noun words and phrases from the document corpus to: train a type model that generates a phrase embedding of the terminology in the document corpus; and train a topic model that generates a second phrase embedding of the terminology in the document corpus.

19. The system of claim 14, wherein the taxonomy types are loaded and includes a graph of type and instance nodes where instances have a consistent relationship to type.

20. The system of claim 14, embodied in a cloud-computing environment.
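Claims 2 to 4 describe turning the ranked types into search categories, turning the ranked instances into facets, and returning the result as a data file. A minimal Python sketch of that assembly step follows. The function names, the `max_categories` and `max_facets` cut-offs, the JSON layout, and the sample ranking are all hypothetical and are not taken from the patent. The claims do not name a file format, so JSON is an assumption.

```python
import json

def build_faceted_structure(ranked, max_categories=3, max_facets=5):
    """Turn [(type, [instances best-first])] into categories with facets."""
    return {
        "categories": [
            {"category": type_name, "facets": instances[:max_facets]}
            for type_name, instances in ranked[:max_categories]
        ]
    }

def to_json(structure):
    """Serialize the structure; writing this string to disk yields a data file."""
    return json.dumps(structure, indent=2)

# Hypothetical ranking: types best-first, each with instances best-first.
ranked = [
    ("symptom", ["headache", "migraine"]),
    ("drug", ["ibuprofen", "aspirin", "naproxen"]),
]
structure = build_faceted_structure(ranked, max_categories=2, max_facets=2)
print(to_json(structure))
```

Truncating both lists keeps the facet panel short. Because the inputs are already ordered best-first, the cut-offs drop only the lowest-ranked types and instances.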

Assignees

IBM

Inventors

Classifications

  • Phrasal analysis, e.g. finite state techniques or chunking

  • Management therefor

  • Creation of semantic tools, e.g. ontology or thesauri

  • using natural language analysis

  • Indexing structures

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11003701B2 cover?
A query-focused faceted structure generation method, system, and computer program product for generating a query-focused faceted structure from a taxonomy for searching a document corpus, including augmenting taxonomy types with new instances where the instances comprise entities within a proximity of existing instances of taxonomy types in a local embedding of entities parsed from the document…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/3344. Mapped technology areas include Physics.
When was this patent published?
Publication date May 11, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).