System and methods for generating an enhanced output of relevant content to facilitate content analysis

US11609959B2 · US · B2

Patent metadata
Publication number: US-11609959-B2
Application number: US-201816148340-A
Country: US
Kind code: B2
Filing date: Oct 1, 2018
Priority date: Mar 3, 2018
Publication date: Mar 21, 2023
Grant date: Mar 21, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to methods and systems for ingesting content from data feeds and to generate an enhanced output of relevant content for presentation to a user to facilitate analysis of the relevant content. Content is received from data feeds, and filtered to identify relevant content with respect to a particular context. The relevant content is then processed, e.g., using natural language processes, to extract entities involved, and to also identify particular activities detailed in the relevant content. Activity-mining is applied to the identified relevant data to classify and assigned activity tags to the extracted entity. Based on the extracted and identified information, an enhanced output is generated for presentation to facilitate research operations. The enhanced output may include overlaid graphical annotations, indicators, and graphical controls over the relevant articles to provide a means for updating a database based on the relevant content.
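
The filtering step in the abstract is detailed in claims 1 and 4, which name duplicate removal, a keyword filter, and a pattern matching filter. The following is a minimal Python sketch of that step under stated assumptions, not the patented implementation. The keyword set, the regular expression, and the function names `is_relevant` and `filter_relevant` are hypothetical. The text on this page does not say whether an item must pass both filters or only one; this sketch accepts either.

```python
import re

# Hypothetical subject configuration: these keywords and patterns are
# illustrative assumptions, not values taken from the patent.
KEYWORDS = {"sanctions", "fraud", "bribery"}
PATTERNS = [
    re.compile(r"\b(charged|indicted|convicted)\s+(with|of|for)\b", re.IGNORECASE),
]


def is_relevant(text: str) -> bool:
    """Return True when the text passes the keyword filter or the pattern filter."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & KEYWORDS:
        return True
    return any(pattern.search(text) for pattern in PATTERNS)


def filter_relevant(items: list[str]) -> list[str]:
    """Drop duplicates (after case and whitespace normalization), keep relevant items."""
    seen = set()
    kept = []
    for text in items:
        key = " ".join(text.lower().split())
        if key in seen:
            continue  # assumed duplicate rule: identical normalized text
        seen.add(key)
        if is_relevant(text):
            kept.append(text)
    return kept
```

Calling `filter_relevant` on a list of article texts returns the first copy of each relevant item in its original order. Near-duplicate detection would need a similarity measure rather than the exact-match key assumed here.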

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of enhancing an output of relevant content to facilitate analysis of the relevant content, comprising:

    receiving content from data feeds;

    optimizing the content to be analyzed to generate optimized content, wherein the optimizing includes parsing the content to be analyzed to extract a text portion of the content to be analyzed for further analysis, while removing extraneous information from the content to be analyzed, wherein the extraneous information comprises advertisements;

    filtering out non-relevant content from the optimized content, the non-relevant content including content unrelated to a subject associated with the analysis, the filtering including retaining relevant content, wherein: filtering out the non-relevant content from the optimized content comprises filtering out duplicate content from the optimized content, in response to determining that an instance of the content is unavailable, retrieving the duplicate content via one or more links to the duplicate content, and the content to be analyzed includes one or more of a news article, a blog, a social media post, and a long form article;

    extracting entity data corresponding to entities mentioned in the relevant content, wherein the entity data includes entity metadata associated with the entities, and wherein extracting the entity data comprises: determining that a first entity of the entities is related to a second entity of the entities, and linking the first entity to the second entity;

    activity-mining the relevant content based on the extracted entity data and the entity metadata to associate the entities with activity tags corresponding to activities associated with the entities and extracted from the relevant content;

    matching the entities to entity profiles in a database to generate matched entity profiles, the matching based on the entity metadata and profile metadata of the entity profiles in the database;

    annotating the relevant content with at least one graphical indicator and at least one graphical user interface (GUI) control based at least in part on one or more of the entities, the matched entity profiles, and the activity tags; and

    rendering a GUI having at least a first area configured to display a list of the relevant content and a second area configured to display an enhanced output, wherein the enhanced output comprises an item selected from the list of relevant content, wherein original text of the item includes highlighting overlaid on the one or more of the entities to accentuate the one or more of the entities, each entity of the one or more of the entities highlighted with a distinct color to differentiate each entity of the one or more of the entities.

2. The method of claim 1, wherein the determining that the first entity is related to the second entity comprises calculating a confidence score, wherein the confidence score comprises a relation similarity score and a name similarity score.

3. The method of claim 1, further comprising: determining a number of sources that correspond to the duplicate content based on a number of links pointing to the duplicate content.

4. The method of claim 1, wherein the filtering out the non-relevant content includes applying a keyword filter and a natural expression pattern matching filter to the content to be analyzed, and wherein the rendering the GUI further comprises rendering the GUI having a third area configured to display the entity profiles extracted from the database.

5. The method of claim 1, wherein the filtering out the non-relevant content further includes clustering content elements based on a similarity of the content elements to each other, wherein the clustering comprises: generating a plurality of vectors, wherein each vector of the plurality of vectors corresponds to a content element and includes values representing a relative frequency of each word in the content element; and comparing each vector of the plurality of vectors to determine a similarity of content elements to each other.

6. The method of claim 5, wherein the clustering further includes applying a Term Frequency-Inverse Document Frequency (TF-IDF) algorithm that includes determining a frequency of a term within a content element, and determining a frequency of the term across all content elements of the content to be analyzed.

7. The method of claim 1, wherein the extracting the entity data from the relevant content includes: identifying entities mentioned in a content element; and obtaining the metadata associated with the mentioned entities.

8. The method of claim 7, wherein the identifying the entities mentioned in the content element includes applying a natural expression language model to the relevant content, and wherein a type of the entity mentioned in the item corresponds to whether the entity is a person, a country, or a location.

9. The method of claim 2, wherein the relation similarity score is calculated using a MinHash algorithm, and wherein the name similarity score is calculated using a Levenshtein distance algorithm.

10. The method of claim 7, wherein the metadata associated with a particular entity includes a location of the mention of the particular entity within the content element claim 1, the method further comprising aggregating the one or more data feeds of the data feeds by dynamically generating a list of the one or more data feeds likely to include content to be of interest, wherein the list is generated based on metadata included in the content.

11. The method of claim 1, wherein the matching the entities to entity profiles in the database to generate the matched entity profiles includes: identifying candidate profiles in the database based on a coarse similarity estimate of the candidate profiles to the extracted entities; calculating a refined candidate score for each candidate profile, the refined candidate score based on the entity metadata and the profile metadata of each entity profile; comparing the refined candidate score for each candidate profile with a threshold; and designating candidate profiles as a match to a particular extracted entity when the refined candidate score of the candidate profiles exceeds the threshold.

12. The method of claim 1, wherein the at least one graphical indicator includes an indicator indicating that a particular extracted entity is unmatched with a profile entity in the database, and wherein the at least one GUI control includes a GUI control for executing an update to the database to store the particular extracted entity in the database.

13. A system for enhancing an output of relevant articles to facilitate analysis of the relevant articles, comprising:

    at least one data feed for receiving one or more articles to be analyzed;

    a server comprising a memory and a processor, wherein the memory includes a classifier, the classifier configured to:

    optimize the articles to be analyzed to generate optimized content, wherein the classifier configured to optimize the articles to be analyzed further comprises the classifier configured to parse the articles to be analyzed to extract a text portion of the articles to be analyzed for further analysis, while removing extraneous information from the articles to be analyzed, wherein the extraneous information comprises advertisements;

    filter out non-relevant articles from the optimized content, the non-relevant articles including articles unrelated to a subject associated with the analysis, wherein relevant articles are retained, wherein the classifier configured to filter out the non-relevant content from the optimized content further comprises the classifier configur
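
Claims 5 and 6 describe clustering content elements by building one word-frequency vector per element, weighting it with TF-IDF, and comparing the vectors. The Python sketch below illustrates that idea under several assumptions that do not come from the patent text: the smoothed inverse-document-frequency formula, cosine similarity as the comparison, the greedy single-pass clustering, and the 0.5 threshold. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict of term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Relative term frequency times a smoothed IDF (assumed formula).
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(docs, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed document is similar enough."""
    vectors = tfidf_vectors(docs)
    clusters = []  # each cluster is a list of document indices; the first is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

`cluster` returns lists of document indices, so near-duplicate articles about the same event land in one group while unrelated articles start their own. A production system would likely tune the threshold and use a more robust clustering method than this single pass.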

Assignees

Refinitiv US Organization LLC

Inventors

Classifications

  • Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • using ranking · CPC title

  • Presentation of query results · CPC title

  • Data mining · CPC title

  • Query processing support for facilitating data mining operations in structured databases · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11609959B2 cover?
The present disclosure relates to methods and systems for ingesting content from data feeds and to generate an enhanced output of relevant content for presentation to a user to facilitate analysis of the relevant content. Content is received from data feeds, and filtered to identify relevant content with respect to a particular context. The relevant content is then processed, e.g., using natura…
Who is the assignee on this patent?
Refinitiv US Organization LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/335. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 21, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).