Model-driven data insights for latent topic materiality

US2024248963A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2024248963-A1
Application number  US-202418418171-A
Country             US
Kind code           A1
Filing date         Jan 19, 2024
Priority date       Jan 21, 2023
Publication date    Jul 25, 2024
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are machine learning methods and systems for locating and tracking performance of latent themes in changing data from disparate sources. Themes may be indirect goals or consequential impacts indicated by latent topics. Identifying performance indicators of latent themes in large changing data sets uncovers underlying trends or previously concealed behaviors that may be accelerating or undermining goals.

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising:
    ingesting data from a from disparate data sources to identify data portions of multiple data sets with information that relate to the one or more issues;
    evaluating the information for each data set against an ensemble model to determine a set of topics for the one or more issues;
    generate a topic materiality score for each data portion that indicate relevancy of the information to a particular topic in the set of topics;
    training a time series aggregation model with the topic materiality scores to the one or more issues over time for the data portions of multiple data sets;
    utilizing the time series aggregation model to generate insights analysis for a selected issue of the one or more issues;
    applying the insight analysis to updated data from the disparate data sources to output a prediction or recommend for the selected issue.

2. The method of claim 1, wherein the data include at least one data point selected comprising a measurement metric related to the one or more topics of the set of topics, and a performance measure related to the one or more topics of the set of topics.

3. The method of claim 1, wherein the time series aggregation model is configured to determine a respective performance score for each respective topic of the associated of topics, wherein the respective performance score indicates a contribution of the initiative to overall performance for the respective issue.

4. The method of claim 1 wherein ingesting data from a from disparate data sources uses canonical type declarations with metadata to normalize data portions from different data sources.

5. The method of claim 1, wherein the disparate data sources include enterprise data sources of an organization, wherein the set of topics are latent, and wherein the one or more issues comprises Environmental, Social, and Governance (ESG) issues for the organization.

6. A method comprising:
    determining, by an ensemble machine-learning model, an intensity value indicating an amount that one or more targets discuss one or more Environmental, Social, and Governance (ESG) issues, the one or more targets belonging to an entity;
    determining, using one or more models, a sentiment value associated with the target regarding the entity of the target in relation to the one or more ESG issues;
    analyzing, using one or more models based on the intensity value and the sentiment value, a plurality of data records.

7. The method of claim 6, wherein the one or more models comprise any of one or more natural language processing models, one or more large language models, and one or more machine learning models.

8. The method of any of claim 6, wherein the analyzing is performed in-app with an intuitive user interface.

9. The method of any of claim 6, further comprising fine-tuning one or more large language model prompts for the models and providing explainability for the determination of intensity.

10. The method of any of claim 6, wherein the intensity value is determined based on context provided to the one or more models, wherein the context is based on the target, the entity, and the one or more ESG issues.

11. A method comprising:
    evaluating information associated with a set of data record portions from a corpus of data records against an ensemble model to generate a set of topic scores, wherein the ensemble model is configured to assign a plurality of labels to each data record portion of the set of data record portions, the plurality of labels corresponding to a different topic of a set of topics, and wherein the set of topic scores comprises a score for each label of the plurality of labels;
    generating data record scores for each data record of the corpus of data records, wherein a data record score for an associated data record is generated based on one or more topic scores of the set of topic scores associated with one or more data record portions of the set of data record portions corresponding to the associated data record; and
    outputting a set of data record scores that includes a data record score for each data record of the corpus of data records, wherein the set of data record scores indicate a set of topics identified within the corpus of data records.

12. The method of claim 11, wherein the ensemble model is configured to generate the set of topic scores based on a first set of predictions and a second set of predictions output by a first model and a second model, respectively.

13. The method of claim 12, wherein the first model is configured to:
    search the set of data record portions for one or more keywords associated with the set of topics; and
    generate the first set of predictions based on identification of keywords associated with the set of topics for each of the data record portions.

14. The method of claim 12, further comprising extracting, using the first model, the second model, or both, a set of embedding vectors corresponding to the set of data record portions.

15. The method of claim 14, further comprising:
    determining a similarity score for each embedding vector of the set of embedding vectors based on a set of centroid embedding vectors corresponding to the set of topics, wherein the ensemble model is configured to generate the set of topic scores based at least in part on the similarity scores determined for the set of embedding vectors, wherein the first set of predictions or the second set of predictions includes the set of topic scores.

16. The method of claim 14, further comprising:
    determining, for each data record portion, one or more nearest neighbors based on the embedding.

17. The method of claim 16, further comprising:
    searching each nearest neighbor for a particular data record portion based on one or more keywords associated with the set of topics;
    determining one or more topic predictions associated with each data record portion of the set of data record portions based on the searching; and
    outputting the one or more topic predictions to the ensemble as the second set of predictions.

18. The method of claim 17, further comprising determining, based on the searching, whether a particular keyword is present in a threshold number of nearest neighbors, wherein a particular topic of the set of topics is identified for a particular data record portion when one or more keywords are identified within at least the threshold number of nearest neighbors.

19. The method of claim 16, wherein the one or more nearest neighbors are identified based on training data, the method further comprising:
    determining candidate topics corresponding to each data record portion based on labels associated with the nearest neighbors identified within the training data; and
    outputting predicted topics determined for each data record portion as the second set of predictions based on the candidate topics.

20. The method of claim 11, further comprising:
    converting a set of data record scores into timeseries data representing relevance for each topic associated with an initiative, wherein a timescale of the timeseries data can re-configure significance of topics over time based on an output target; and
    normalizing the data record scores, wherein the normalizing is based at least in part on a data record type associated with data records corresponding to each of the data record scores.
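
Illustrative sketch (not part of the patent text). Claims 11 to 13 and 15 describe an ensemble that combines a keyword-based model with an embedding-similarity model to score each data record portion per topic, then rolls portion scores up into data record scores. The claims do not specify how the two predictions are combined or aggregated, so everything concrete below is an assumption made for illustration: the topic names, keywords, two-dimensional centroid vectors, the equal weighting, the whitespace tokenizer, and the use of a maximum over portions as the record score.

```python
import math
from collections import defaultdict

# Hypothetical topics, keywords, and centroid embeddings (illustrative values only).
TOPIC_KEYWORDS = {"emissions": {"carbon", "emissions"}, "safety": {"injury", "safety"}}
TOPIC_CENTROIDS = {"emissions": [1.0, 0.0], "safety": [0.0, 1.0]}


def keyword_scores(text):
    """First model: 1.0 if any topic keyword appears in the portion, else 0.0.

    Uses naive whitespace tokenization; punctuation is not stripped.
    """
    words = set(text.lower().split())
    return {t: 1.0 if words & kws else 0.0 for t, kws in TOPIC_KEYWORDS.items()}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_scores(embedding):
    """Second model: cosine similarity to each topic centroid, clipped at zero."""
    return {t: max(0.0, cosine(embedding, c)) for t, c in TOPIC_CENTROIDS.items()}


def ensemble_topic_scores(text, embedding, weight=0.5):
    """Assumed combination rule: weighted average of the two models' scores."""
    kw = keyword_scores(text)
    sim = similarity_scores(embedding)
    return {t: weight * kw[t] + (1 - weight) * sim[t] for t in TOPIC_KEYWORDS}


def record_scores(portions):
    """Aggregate portion scores per record.

    portions: iterable of (record_id, text, embedding) tuples.
    Assumed aggregation: a record's score for a topic is the maximum
    score over that record's portions.
    """
    out = defaultdict(dict)
    for record_id, text, embedding in portions:
        for topic, score in ensemble_topic_scores(text, embedding).items():
            out[record_id][topic] = max(out[record_id].get(topic, 0.0), score)
    return dict(out)
```

Calling `record_scores` with a list of `(record_id, text, embedding)` tuples returns one dictionary of topic scores per record. In a real system the embeddings would come from a trained model rather than hand-written vectors.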
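
Illustrative sketch (not part of the patent text). Claims 16 to 18 describe finding the nearest neighbors of each data record portion in embedding space, searching those neighbors for topic keywords, and assigning a topic when keywords appear in at least a threshold number of neighbors. The sketch below is one possible reading under stated assumptions: Euclidean distance, brute-force search, whitespace tokenization, a neighbor counted as a hit when it contains any keyword of the topic, and hypothetical parameter names `k` and `threshold`.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query_idx, embeddings, k):
    """Indices of the k portions closest to embeddings[query_idx], excluding itself."""
    others = [i for i in range(len(embeddings)) if i != query_idx]
    others.sort(key=lambda i: euclidean(embeddings[query_idx], embeddings[i]))
    return others[:k]


def knn_topic_predictions(texts, embeddings, topic_keywords, k=2, threshold=2):
    """Predict a set of topics for each portion from keywords found in its neighbors.

    A topic is assigned to a portion when at least `threshold` of its k nearest
    neighbors contain one or more of that topic's keywords.
    """
    predictions = []
    for i in range(len(texts)):
        neighbors = nearest_neighbors(i, embeddings, k)
        topics = set()
        for topic, keywords in topic_keywords.items():
            hits = sum(
                1 for j in neighbors if keywords & set(texts[j].lower().split())
            )
            if hits >= threshold:
                topics.add(topic)
        predictions.append(topics)
    return predictions
```

Because the prediction depends on the neighbors rather than on the portion's own text, a portion with no keywords of its own can still receive a topic when it sits close to keyword-bearing portions in embedding space.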

Assignees

C3 Ai Inc

Inventors

Classifications

  • Government or public services (business processes related to the transportation industry G06Q50/40) · CPC title

  • based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024248963A1 cover?
Machine learning methods and systems for locating and tracking the performance of latent themes in changing data from disparate sources. See the Abstract section above for the full official text.
Who is the assignee on this patent?
C3 Ai Inc
What technology area does this patent fall under?
Primary CPC classification G06F18/2415. Mapped technology areas include Physics.
When was this patent published?
Published Jul 25, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).