Self-supervised system generating embeddings representing sequenced activity

US12591903B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12591903-B2
Application number  US-202418740485-A
Country             US
Kind code           B2
Filing date         Jun 11, 2024
Priority date       May 25, 2020
Publication date    Mar 31, 2026
Grant date          Mar 31, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosure herein describes a system for generating embeddings representing sequential human activity by self-supervised, deep learning models capable of being utilized by a variety of machine learning prediction models to create predictions and recommendations. An encoder-decoder is provided to create user-specific journeys, including sequenced events, based on human activity data from a plurality of tables, a customer data platform, or other sources. Events are represented by sequential feature vectors. A user-specific embedding representing user activities in relationship to activities of one or more other users is created for each user in a plurality of users. The embeddings are updated in real-time as new activity data is received. The embeddings can be fine-tuned using labeled data to customize the embeddings for a specific predictive model. The embeddings are utilized by predictive models to create product recommendations and predictions, such as customer churn, next steps in a customer journey, etc.
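The journey-building step summarized in the abstract can be illustrated with a minimal sketch. It assumes three small in-memory tables; the table names, column names (`user_id`, `ts`), and the `build_journeys` helper are hypothetical and are not taken from the patent. It shows only the merging of events from several sources into one chronologically ordered sequence per user, not the patented system itself.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows from three activity tables (names and columns are illustrative).
TRANSACTIONS = [
    {"user_id": "u1", "ts": "2024-01-05T10:00:00", "amount": 40.0},
    {"user_id": "u2", "ts": "2024-01-03T09:30:00", "amount": 15.5},
]
SUPPORT = [
    {"user_id": "u1", "ts": "2024-01-02T08:15:00", "channel": "chat"},
]
WEB_ACTIVITY = [
    {"user_id": "u1", "ts": "2024-01-04T21:45:00", "page": "/pricing"},
    {"user_id": "u2", "ts": "2024-01-06T12:00:00", "page": "/cancel"},
]


def build_journeys(tables):
    """Merge rows from several tables into one time-ordered event list per user."""
    journeys = defaultdict(list)
    for event_type, rows in tables.items():
        for row in rows:
            journeys[row["user_id"]].append(
                {
                    "type": event_type,
                    "time": datetime.fromisoformat(row["ts"]),
                    "row": row,
                }
            )
    # Order each user's events chronologically using the time indicator.
    for events in journeys.values():
        events.sort(key=lambda event: event["time"])
    return dict(journeys)


journeys = build_journeys(
    {"transaction": TRANSACTIONS, "support": SUPPORT, "web": WEB_ACTIVITY}
)
u1_types = [event["type"] for event in journeys["u1"]]
u2_types = [event["type"] for event in journeys["u2"]]
```

In this sketch each event keeps a reference to its source row, so a later step could turn the ordered events into sequential feature vectors.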

First claim

Opening claim text (preview).

What is claimed is:

1. A system for generating embeddings representing sequential human activity using a self-supervised activity sequencing model, the system comprising:
    a computer-readable medium storing instructions that are operative upon execution by a processor to:
    obtain different types of unlabeled activity data from a plurality of data tables, the different types of unlabeled activity data associated with a plurality of users and a set of time indicators;
    for each user in the plurality of users, generate a user-specific journey based on the different types of unlabeled activity data stored across different platforms, wherein the user-specific journey comprises a set of sequenced events obtained from the different types of unlabeled activity data in a chronological order in accordance with the set of time indicators;
    generate, by the self-supervised activity sequencing model, a standardized user-specific embedding for the each user in the plurality of users based on the user-specific journey, wherein the user-specific journey is combined with non-sequential feature vectors associated with the each user in the plurality of users to generate the standardized user-specific embedding, wherein the self-supervised activity sequencing model is trained by back propagation, the back propagation comprising identifying a set of errors in regenerated data, the regenerated data generated by decoding the standardized user-specific embedding, wherein a set of weights used to generate the standardized user-specific embedding are adjusted based on the set of errors, and wherein the standardized user-specific embedding is compatible for utilization by a plurality of different prediction models configured to generate user-specific activity predictions and recommendations;
    receive new activity data for a first user of the plurality of users in real-time, wherein the new activity data represents a set of new activities by the first user;
    update the standardized user-specific embedding for the first user in real-time, wherein the updated standardized user-specific embedding represents human activities associated with the first user, including the set of new activities described in the new activity data; and
    output the updated standardized user-specific embedding to the plurality of different prediction models, wherein the updated standardized user-specific embedding serves as a common base layer for the plurality of different prediction models without customization for each of the plurality of different prediction models.

2. The system of claim 1, wherein the different types of unlabeled activity data comprise visit logs, support requests, subscription data, transaction data, product return data, and web activity data.

3. The system of claim 2, wherein the support requests comprise phone calls, emails, chat support lines, and online customer support lines.

4. The system of claim 1, wherein the plurality of data tables comprise two or more of: a transaction table, a support table, a returns table, a web activity table, a loyalty table, and a subscription table.

5. The system of claim 1, wherein the plurality of data tables are implemented as structural query language (SQL) tables or relational tables.

6. The system of claim 1, wherein the instructions are further operative to: generate sequential feature vectors from the user-specific journey; and combine the sequential feature vectors with the non-sequential feature vectors to generate the standardized user-specific embedding.

7. The system of claim 1, wherein the non-sequential feature vectors represent non-sequential user-specific data, the non-sequential user-specific data comprising demographic data and user profile data.

8. The system of claim 1, wherein the set of errors comprises mistakes or mismatches between the unlabeled activity data and the regenerated data, and wherein the set of weights are adjusted iteratively until the regenerated data is generated with a certain degree of accuracy.

9. A method for generating embeddings representing sequential human activity using a self-supervised activity sequencing model, the method comprising:
    obtaining different types of unlabeled activity data from a plurality of data tables stored across different platforms, the different types of unlabeled activity data associated with a plurality of users and a set of time indicators;
    for each user in the plurality of users, generating a user-specific journey based on the different types of unlabeled activity data in the plurality of data tables, wherein the user-specific journey comprises a set of sequenced events taken from the different types of unlabeled activity data in a chronological order in accordance with the set of time indicators;
    generating, by the self-supervised activity sequencing model, a standardized user-specific embedding for the each user in the plurality of users based on the user-specific journey, wherein the user-specific journey is combined with non-sequential feature vectors associated with the each user in the plurality of users to generate the standardized user-specific embedding, wherein the self-supervised activity sequencing model is trained by back propagation, the back propagation comprising identifying a set of errors in regenerated data, the regenerated data generated by decoding the standardized user-specific embedding, wherein a set of weights used to generate the standardized user-specific embedding are adjusted based on the set of errors, and wherein the standardized user-specific embedding is compatible for utilization by a plurality of different prediction models configured to generate user-specific activity predictions and recommendations;
    receiving new activity data for a first user of the plurality of users in real-time, wherein the new activity data represents a set of new activities by the first user;
    updating the standardized user-specific embedding for the first user in real-time, wherein the updated standardized user-specific embedding represents human activities associated with the first user, including the set of new activities described in the new activity data; and
    outputting the updated standardized user-specific embedding to the plurality of different prediction models, wherein the updated standardized user-specific embedding serves as a common base layer for the plurality of different prediction models without customization for each of the plurality of different prediction models.

10. The method of claim 9, wherein the different types of unlabeled activity data comprise visit logs, support requests, subscription data, transaction data, product return data, and web activity data.

11. The method of claim 10, wherein the support requests comprise phone calls, emails, chat support lines, and online customer support lines.

12. The method of claim 9, wherein the plurality of data tables comprises two or more of: a transaction table, a support table, a returns table, a web activity table, a loyalty table, and a subscription table.

13. The method of claim 9, wherein the plurality of data tables are implemented as structural query language (SQL) tables or relational tables.

14. The method of claim 9, further comprising: generating sequential feature vectors from the user-specific journey; and combining the sequential feature vectors with the non-sequential feature vectors to generate the standardized user-specific embedding.

15. The method of claim 9, wherein the non-sequential feature vectors represent non-sequential user-specific data, the non-sequential user-specific data
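The training loop recited in the claims (encode, decode, measure the errors in the regenerated data, adjust the weights) can be sketched with a tiny linear autoencoder. This is an illustrative toy under stated assumptions, not the claimed model: the feature values, the two-dimensional embedding, the learning rate, and all function names are hypothetical, and each input row is assumed to be a sequential feature vector concatenated with a non-sequential (profile) feature vector.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def reconstruct(w_enc, w_dec, x):
    """Encode x into an embedding, then decode it into regenerated data."""
    z = matvec(w_enc, x)
    return z, matvec(w_dec, z)


def mean_error(w_enc, w_dec, data):
    """Average squared mismatch between inputs and regenerated data."""
    total = 0.0
    for x in data:
        _, x_hat = reconstruct(w_enc, w_dec, x)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def train(data, embed_dim=2, epochs=300, lr=0.05, seed=0):
    """Adjust encoder and decoder weights from reconstruction errors."""
    rng = random.Random(seed)
    n = len(data[0])
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(embed_dim)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(embed_dim)] for _ in range(n)]
    start = mean_error(w_enc, w_dec, data)
    for _ in range(epochs):
        for x in data:
            z, x_hat = reconstruct(w_enc, w_dec, x)
            err = [a - b for a, b in zip(x_hat, x)]
            # Propagate the error back through the decoder to the embedding.
            grad_z = [
                sum(w_dec[i][k] * err[i] for i in range(n)) for k in range(embed_dim)
            ]
            for i in range(n):
                for k in range(embed_dim):
                    w_dec[i][k] -= lr * err[i] * z[k]
            for k in range(embed_dim):
                for j in range(n):
                    w_enc[k][j] -= lr * grad_z[k] * x[j]
    return w_enc, w_dec, start, mean_error(w_enc, w_dec, data)


# Hypothetical inputs: two sequential features followed by two profile features.
sequential = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
profile = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
data = [s + p for s, p in zip(sequential, profile)]

w_enc, w_dec, start_error, end_error = train(data)
embedding, _ = reconstruct(w_enc, w_dec, data[0])
```

After training, `embedding` is a fixed-length vector for the first user. Because its shape does not depend on any downstream task, the same vector could in principle be passed to several different prediction models, which is the "common base layer" idea in the claims.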

Assignees

Microsoft Technology Licensing, LLC

Classifications

  • Backpropagation, e.g. using gradient descent

  • Architecture, e.g. interconnection topology

  • Machine learning

  • Auto-encoder networks; Encoder-decoder networks

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12591903B2 cover?
The disclosure herein describes a system for generating embeddings representing sequential human activity by self-supervised, deep learning models capable of being utilized by a variety of machine learning prediction models to create predictions and recommendations. An encoder-decoder is provided to create user-specific journeys, including sequenced events, based on human activity data from a p…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06Q30/0202. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 31, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).