Document content analysis based on topic modeling

US10558657B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-10558657-B1
Application numberUS-201615269458-A
CountryUS
Kind codeB1
Filing dateSep 19, 2016
Priority dateSep 19, 2016
Publication dateFeb 11, 2020
Grant dateFeb 11, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A mechanism for progressive topic modeling is disclosed to facilitate document content analysis. Input documents can be sorted and divided into multiple groups. Topic modeling is performed for each group, where the topic modeling for one group is based on the generated topic model from a previous group, if available. The vocabulary used in the topic modeling process can also be updated for each group of documents. The generated topics can be presented in a user interface to facilitate a user in analyzing the documents. The topic modeling mechanism can also be utilized to enhance a document search experience by generating topics from documents contained in search results and presenting topic words to a user as suggested search terms.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method for facilitating document search based on topic modeling, the method comprising: causing a display device to present a search section in a user interface, the search section configured to receive a search query and initiate a search on a collection of documents based on the search query; causing the display device to present the topic modeling section adjacent to the search section in the user interface, the topic modeling section configured to present one or more topics comprising one or more of a plurality of topic words extracted through topic modeling from one or more documents in search results generated based on the search query wherein presenting the one or more of a plurality of topic words comprises: determining dimensions of a topic modeling section in the user interface; determining a maximum number of topic words that can be presented in the topic modeling section based on the dimensions of the topic modeling section; selecting the one or more topic words from the plurality of topic words such that a number of the selected topic words does not exceed the maximum number of topic words; and causing the display device to present the selected topic words in the topic modeling section in the user interface; causing the display device to present a search result section adjacent to the topic modeling section in the user interface, the search result section configured to present the search results that comprise the one or more documents that satisfy the search query; receiving, from a user input device, user input indicating a user selection from the one or more topics presented in the topic modeling section of at least one of the topics and addition of the at least one of the topics to the search query in the search section to thereby form an updated search query; and in response to receiving the updated search query, initiating another search in the collection of documents based on the updated search query, receiving updated search results, updating the one or more topics based on the updated search results, causing the display device to present the updated search results in the search result section, and causing the display device to present the updated topics in the topic modeling section. 2. The computer-implemented method of claim 1 , wherein presenting the one or more topics comprises presenting one or more of the plurality of topic words that are not contained in the search query. 3. The computer-implemented method of claim 2 , wherein the updated search query contains at least one topic word displayed in the topic modeling section. 4. The computer-implemented method of claim 2 , further comprising: causing to be presented in the user interface a window configured to receive a selection of one or more stop words filtered from the updated search results. 5. The computer-implemented method of claim 1 , wherein the user interface is configured to enable the addition of the at least one of the topics to the search query to be performed by dragging and dropping a topic word of the at least one of the topics from the topic modeling section to the search section. 6. The computer-implemented method of claim 1 , wherein the one or more documents comprise at least one of customer reviews of products or services, blogs posted by blog authors, or news articles written by reporters reporting events. 7. The computer-implemented method of claim 1 , wherein causing the display device to present the updated topics comprises performing topic modeling on two or more sets of documents and the extracted topics are contained in one set of the one or more documents but not the second set of the one or more documents. 8. A non-transitory computer-readable storage media having instructions stored thereupon that are executable by one or more processors and which, when executed, cause the one or more processors to: cause a user interface for searching a collection of documents to be presented; receive a search query via the user interface; generate a search result by performing a search in the collection of documents based on the search query, the search result comprising one or more documents that satisfy the search query; extract one or more topics from the one or more documents through topic modeling, the one or more topics comprising a plurality of topic words; cause one or more of the plurality of topic words to be presented in the user interface as suggested topic words wherein causing one or more of the plurality of topic words to be presented in the user interface comprises: determining dimensions of a portion of the user interface for presenting the suggested topic words; determining a maximum number of topic words that can be presented in the portion of the user interface based on the dimensions of the portion of the user interface; selecting the suggested topic words from the plurality of topic words such that a number of the suggested topic words does not exceed the maximum number of topic words; and causing the suggested topic words to be presented in the portion of the user interface; receive a selection of at least one of the suggested topic words through the user interface; update the search result to generate an updated search result by performing another search in the collection of documents based on an updated search query that comprises the search query and the at least one of the suggested topic words; and cause the updated search result to be presented in the user interface. 9. The non-transitory computer-readable storage media of claim 8 , having further instructions stored thereupon to: extract topics from the updated search result to form extracted topics; and cause one or more topic words of the extracted topics that are not included in the updated search query to be presented in the user interface. 10. The non-transitory computer-readable storage media of claim 9 , wherein extracting topics from the updated search result is performed via topic modeling based on topic models generated from the previous search result and additional documents included in the updated search result. 11. The non-transitory computer-readable storage media of claim 9 , wherein extracting topics from the updated search result is performed via topic modeling on documents included in the updated search result. 12. The non-transitory computer-readable storage media of claim 8 , wherein the user interface is configured to allow the selection of a suggested topic word to be performed by dragging a user interface control representing the suggested topic word and dropping the user interface control to an input field of the user interface configured for entering the search query. 13. The non-transitory computer-readable storage media of claim 8 , wherein the suggested topic words presented in the user interface are organized based on respective topics that the suggested topic words belong to. 14. The non-transitory computer-readable storage media of claim 8 , further comprising: causing to be presented in the user interface a window configured to receive a selection of one or more stop words filtered from the updated search results. 15. A computer-implemented method for presenting topic models extracted from a collection of documents, the method comprising: generating a plurality of topics by performing a topic modeling process on a collection of documents, individual ones of the plurality of topics comprising a plurality of topic words; causing the plurality of topics to be presented in a user interface by displaying the plurality of topic words i

Assignees

Inventors

Classifications

  • G06F16/35Primary

    Clustering; Classification · CPC title

  • Document management systems · CPC title

  • of sub-queries or views · CPC title

  • Presentation of query results · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10558657B1 cover?
A mechanism for progressive topic modeling is disclosed to facilitate document content analysis. Input documents can be sorted and divided into multiple groups. Topic modeling is performed for each group, where the topic modeling for one group is based on the generated topic model from a previous group, if available. The vocabulary used in the topic modeling process can also be updated for each…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/35. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Feb 11 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).