Scalable and Resource-Efficient Extraction of Data from Network-Accessible Documents
US-2021182343-A1 · Jun 17, 2021 · US
US11922469B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11922469-B2 |
| Application number | US-202217657709-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 1, 2022 |
| Priority date | Oct 11, 2019 |
| Publication date | Mar 5, 2024 |
| Grant date | Mar 5, 2024 |
Official abstract text for this publication.
A framework for an automated news recommendation system for financial analysis. The system includes the automated ingestion, relevancy, clustering, and ranking of news events for financial analysts in the capital markets. The framework is adaptable to any form of input news data and can seamlessly integrate with other data used for analysis like financial data.
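The clustering and ranking stages summarized above can be illustrated with a minimal sketch. It assumes, following the claim text, that each article is reduced to a binary entity vector and that clusters are ranked by size. The entity lists, the Jaccard distance, the single-link grouping, and the 0.5 threshold are illustrative assumptions, not details taken from the patent.

```python
from itertools import combinations


def one_hot(entities, vocabulary):
    """Binary vector marking which vocabulary entities appear in an article."""
    present = set(entities)
    return [1 if name in present else 0 for name in vocabulary]


def jaccard_distance(u, v):
    """Assumed distance measure: 1 minus the share of entities both articles mention."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 1.0 if either == 0 else 1.0 - both / either


def cluster_articles(vectors, threshold=0.5):
    """Single-link grouping: join articles whose pairwise distance is <= threshold."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if jaccard_distance(vectors[i], vectors[j]) <= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def rank_by_size(clusters):
    """Largest clusters first."""
    return sorted(clusters, key=len, reverse=True)


# Hypothetical, already-extracted named entities for three articles.
articles = [
    ["Acme Corp", "Jane Doe"],
    ["Acme Corp", "Jane Doe", "SEC"],
    ["Globex"],
]
vocabulary = sorted({name for entities in articles for name in entities})
vectors = [one_hot(entities, vocabulary) for entities in articles]
ranked = rank_by_size(cluster_articles(vectors))
```

Here the first two articles share two of three entities and fall into one cluster, which then ranks ahead of the single-article cluster.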
Opening claim text (preview).
What is claimed is:

1. A computer implemented method for recommending news articles, the computer implemented method comprising: ingesting, by a computer system, the news articles from a plurality of news sources; extracting, by the computer system, named entities from each news article to generate a one-hot vector for each news article using a statistical model; clustering, by the computer system, the news articles into clusters based on the one-hot vectors for the news articles; selecting, by the computer system, a representative news article for each cluster in the clusters; converting, by the computer system using a machine learning model, each word of each representative news article into a word representation based on character embeddings; modeling, by the computer system using the machine learning model, characteristics of use and characteristics of use across linguistic context for each word of each representative news article; inputting, by the computer system, word representations into a convolutional layer followed by a max-pool layer in the machine learning model to generate an input representation for each representative news article; generating, by the computer system using the machine learning model, a sentence representation for each representative news article based on the input representations for each news article; merging, by the computer system, clusters in the clusters based on semantic of each representative news article in each cluster to form merged clusters using the sentence representation for each representative news article; generating, by the computer system, a set of ranked clusters using the merged clusters and the sentence representations of each news article; digitally displaying, by the computer system, the set of ranked clusters in a graphical user interface; and manipulating, by the computer system, a number of controls in the graphical user interface to perform an action to the set of ranked clusters on the graphical user interface.

2. The computer implemented method of claim 1, wherein generating, by the computer system, a set of ranked clusters using the merged clusters comprises: ranking, by the computer system, the news articles within each cluster; ranking, by the computer system, clusters within the set of ranked clusters based on cluster size; and storing, by the computer system, relational information between the set of ranked clusters, news stories of the news articles, and subscriptions of a user to a database.

3. The computer implemented method of claim 2, wherein ranking of news articles within each cluster is based on trustworthiness and linking volume for each news source of news articles.

4. The computer implemented method of claim 2, wherein the weighted average for each word of each news article is generated by a multi-layer bidirectional language model through learning the word embeddings for each news article.

5. The computer implemented method of claim 1, wherein ingesting, by the computer system, news articles from a plurality of news sources comprises: creating, by the computer system, a portfolio for a user, wherein the portfolio comprises subscriptions of the user to different entities; and ingesting, by the computer system, the news articles from the plurality of news sources associated with the subscriptions of the user.

6. The computer implemented method of claim 1 further comprising: receiving, by the computer system, in response to an input from a user, feedback on the set of ranked clusters from the user.

7. The computer implemented method of claim 1, wherein the news articles are clustered based on distances of pairwise news articles.

8. The computer implemented method of claim 1, wherein the representative news article for each cluster is selected based on news publication date and news source significance.

9. The computer implemented method of claim 1, wherein merging, by the computer system, clusters from the clusters based on semantic of each representative news article in each cluster to form merged clusters comprises: determining, by the computer system, a vector for each representative news article by modeling characteristics of word use and change on word use across linguistic context, wherein the vector presents semantic of each representative news article; and merging, by the computer system, clusters with similar semantic based on the vectors determined for the representative news articles of clusters.

10. A computer implemented method of claim 1, wherein generating, by the computer system using the machine learning model, a sentence representation for each representative news article based on the input representations for each news article comprises: generating, by the computer system using the machine learning model, a weighted average for each input representation based on normalization; and generating, by the computer system using the machine learning model, a sentence representation for each news article based on weighted input representations for each news article.

11. A computer implemented method of claim 1, wherein the input representations represent semantic of each news articles.

12. The computer implemented method of claim 1 further comprising: learning, by the computer system using the machine learning model, morphological features of words in each news article to form representation for out-of-vocabulary words in each news article.

13. The computer implemented method of claim 1, wherein the word representation of each word distinguishes each word in a news article from other words in the new article.

14. A computer system comprising: a number of processor units, wherein the number of processor units executes program instructions to: ingest news articles from a plurality of news sources; extracting named entities from each news article to generate a one-hot vector for each news article using a statistical model; cluster the news articles into clusters based on the one-hot vectors for the news articles; select a representative news article for each cluster in the clusters; convert each word of each representative news article into a word representation based on character embeddings using a machine learning model; model characteristics of use and characteristics of use across linguistic context for each word of each representative news article using the machine learning model; input word representations into a convolutional layer followed by a max-pool layer in the machine learning model to generate an input representation for each representative news article; generate a sentence representation for each representative news article based on the input representations for each representative news article using the machine learning model; merge clusters in the clusters based on semantic of each representative news article in each cluster to form merged clusters using the sentence representation for each representative news article; generate a set of ranked clusters using the merged clusters; digitally display the set of ranked clusters in a graphical user interface; and manipulate a number of controls in the graphical user interface to perform an action to the set of ranked clusters on the graphical user interface.

15. The computer system of claim 14, wherein in generating a set of ranked clusters using the merged clusters, the number of processor units executes program instructions to: rank news articles within each cluster; rank clusters within the set of ranked clusters based on cluster size; and store relational information between the set of ranked clusters, n
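The sentence-representation and cluster-merging steps recited in the method claims (a normalized weighted average of input representations, then merging clusters whose representative articles are semantically similar) might be sketched as follows. This is a hedged illustration only: the softmax normalization, the cosine similarity measure, the greedy merge order, the 0.9 threshold, and all function names are assumptions, and the convolutional model that would produce the input representations is replaced here by hand-written vectors.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1 (assumed normalization)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentence_representation(input_reprs, scores):
    """Weighted average of input representations using normalized weights."""
    weights = softmax(scores)
    dim = len(input_reprs[0])
    return [
        sum(w * vec[k] for w, vec in zip(weights, input_reprs))
        for k in range(dim)
    ]


def cosine_similarity(u, v):
    """Assumed similarity measure between two sentence representations."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if norm == 0 else dot / norm


def merge_clusters(clusters, representative_vectors, threshold=0.9):
    """Greedily fold each cluster into the first earlier cluster whose
    representative vector is at least `threshold` similar to its own."""
    merged = []  # pairs of (representative vector, member list)
    for members, rep in zip(clusters, representative_vectors):
        for kept_rep, kept_members in merged:
            if cosine_similarity(rep, kept_rep) >= threshold:
                kept_members.extend(members)
                break
        else:
            merged.append((rep, list(members)))
    return [members for _, members in merged]


# Hypothetical inputs: two equally weighted input representations,
# and three clusters with one representative vector each.
sentence = sentence_representation([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
merged = merge_clusters(
    [[0, 1], [2], [3]],
    [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]],
)
```

With these inputs the first two clusters have nearly parallel representative vectors and are merged, while the third stays separate.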
Feedforward networks · CPC title
Supervised learning · CPC title
Rating or review of business operators or products · CPC title
Aggregation; Duplicate elimination · CPC title
using ranking · CPC title