Self-supervised system generating embeddings representing sequenced activity

US12062059B2 · US · B2

Patent metadata
Publication number: US-12062059-B2
Application number: US-202016930279-A
Country: US
Kind code: B2
Filing date: Jul 15, 2020
Priority date: May 25, 2020
Publication date: Aug 13, 2024
Grant date: Aug 13, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosure herein describes a system for generating embeddings representing sequential human activity by self-supervised, deep learning models capable of being utilized by a variety of machine learning prediction models to create predictions and recommendations. An encoder-decoder is provided to create user-specific journeys, including sequenced events, based on human activity data from a plurality of tables, a customer data platform, or other sources. Events are represented by sequential feature vectors. A user-specific embedding representing user activities in relationship to activities of one or more other users is created for each user in a plurality of users. The embeddings are updated in real-time as new activity data is received. The embeddings can be fine-tuned using labeled data to customize the embeddings for a specific predictive model. The embeddings are utilized by predictive models to create product recommendations and predictions, such as customer churn, next steps in a customer journey, etc.
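The journey-building step described in the abstract can be illustrated with a minimal sketch. It assumes events arrive as `(user_id, timestamp, event_type)` rows and that each event is encoded as a one-hot vector. The event vocabulary, the row layout, and the function names are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

# Hypothetical event vocabulary; the patent does not enumerate event types.
EVENT_TYPES = ["view", "add_to_cart", "purchase"]

def one_hot(event_type):
    """Encode one event as a sequential feature vector (one-hot, an assumption)."""
    return [1.0 if event_type == t else 0.0 for t in EVENT_TYPES]

def build_journeys(rows):
    """Group (user_id, timestamp, event_type) rows into time-ordered journeys."""
    by_user = defaultdict(list)
    for user_id, timestamp, event_type in rows:
        by_user[user_id].append((timestamp, event_type))
    journeys = {}
    for user_id, events in by_user.items():
        events.sort(key=lambda e: e[0])  # order events by their time indicator
        journeys[user_id] = [one_hot(event_type) for _, event_type in events]
    return journeys

# Rows may come from several tables in any order; sorting restores the sequence.
rows = [
    ("u1", 3, "purchase"),
    ("u2", 1, "view"),
    ("u1", 1, "view"),
    ("u1", 2, "add_to_cart"),
]
journeys = build_journeys(rows)
```

Each value in `journeys` is a list of feature vectors in event order, which is the shape of input a sequence encoder would consume.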

First claim

Opening claim text (preview).

What is claimed is:

1. A system for generating embeddings representing sequential human activity, the system comprising: a plurality of data sources associated with at least one data storage device storing unlabeled activity data associated with a plurality of users and a set of time indicators, the unlabeled activity data describing human activity-related events having order over time, and unlabeled non-sequential data associated with the plurality of users, the unlabeled non-sequential data representing non-sequential user-specific data; a computer-readable medium storing instructions that are operative upon execution by a processor to: create a plurality of non-sequential feature vectors based on the unlabeled non-sequential data; create, by a sequencing component associated with a neural network, a plurality of user-specific journeys based on the unlabeled activity data, a user-specific journey in the plurality of user-specific journeys comprising a plurality of sequential feature vectors corresponding to a set of events associated with selected user placed into a sequence in accordance with the set of time indicators; combine, by an embedding component, the plurality of sequential feature vectors in the plurality of user-specific journeys and the plurality of non-sequential feature vectors; and generate, by the embedding component, a plurality of embeddings based on the combination of the plurality of sequential feature vectors in the plurality of user-specific journeys and the plurality of non-sequential feature vectors and a set of weights, an embedding in the plurality of embeddings comprising a set of fixed length vectors representing sequential human activity of the selected user, wherein the plurality of embeddings are suitable for utilization by a plurality of prediction models configured to generate user-specific activity predictions and recommendations; generate, by a decoder component, a plurality of regenerated sequential feature vectors and a plurality of regenerated non-sequential feature vectors based on the plurality of embeddings; compare, by a comparison component, the plurality of regenerated sequential feature vectors to the plurality of sequential feature vectors and the plurality of regenerated non-sequential feature vectors to the plurality of non-sequential feature vectors to identify a set of errors, wherein the set of errors are used to update the set of weights; and generate explainability data, the explainability data indicating contribution of a dimension value of the embedding to the user-specific predictions generated by the plurality of prediction models.

2. The system of claim 1, further comprises: an encoder-decoder framework of the neural network providing a self-supervised activity sequencing model configured to generate the plurality of embeddings based on the unlabeled activity data obtained from the plurality of data sources, wherein the sequencing component and the embedding component are part of the self-supervised activity sequencing model.

3. The system of claim 2, wherein the instructions are further operative to: analyze, by an encoder component, unlabeled input data for training the self-supervised activity sequencing model, wherein the unlabeled input data comprises historical human activity data; generate, by the encoder component, an embedding representing sequenced human activity; and perform, by the comparison component, back propagation.

4. The system of claim 1, wherein the selected user is a first user and wherein instructions are further operative to: receive, from a customer data platform, updated activity data representing a set of new activities by a second user in the plurality of users in real-time; generate, by the sequencing component, an updated user-specific journey for the second user; and create, by the embedding component, an updated user-specific embedding for the second user, wherein the updated user-specific embedding comprises a set of fixed length embeddings representing human activities associated with the second user, including the set of new activities described in the updated activity data.

5. The system of claim 1, wherein the explainability data is presented as a separate output from the plurality of embeddings.

6. The system of claim 1, wherein the embedding component comprises at least one long short-term memory (LSTM) artificial recurrent neural network architecture for generating the plurality of embeddings.

7. The system of claim 1, wherein the instructions are further operative to: fine-tune the embedding component using labeled input data to generate embeddings for a predictive model selected from a set of machine learning (ML) prediction models.

8. A method of generating embeddings representing sequential human activity, the method comprising: creating, by a sequencing component, a plurality of user-specific journeys based on unlabeled activity data obtained from a plurality of data sources, a user-specific journey in the plurality of user-specific journeys comprising a plurality of sequential feature vectors corresponding to a set of events associated with selected user placed into a sequence in accordance with a set of time indicators; creating a plurality of non-sequential feature vectors based on unlabeled non-sequential data obtained from the plurality of data sources, the unlabeled non-sequential data representing non-sequential user-specific data; combining, by an embedding component, the plurality of sequential feature vectors in the plurality of user-specific journeys and the plurality of non-sequential feature vectors; generating, by the embedding component, a plurality of embeddings based on the combination of the plurality of sequential feature vectors in the plurality of user-specific journeys and the plurality of non-sequential feature vectors and a set of weights, an embedding in the plurality of embeddings comprising a set of fixed length vectors representing sequential human activity of the selected user; outputting the plurality of embeddings to a set of machine learning prediction models for generating user-specific activity predictions and recommendations based on the unlabeled activity data and the unlabeled non-sequential data associated with a plurality of users; generating, by a decoder component, a plurality of regenerated sequential feature vectors and a plurality of regenerated non-sequential feature vectors based on the plurality of embeddings; comparing, by a comparison component, the plurality of regenerated sequential feature vectors to the plurality of sequential feature vectors and the plurality of regenerated non-sequential feature vectors to the plurality of non-sequential feature vectors to identify a set of errors, wherein the set of errors are used to update the set of weights; and generating explainability data, the explainability data indicating contribution of a dimension value of the embedding to the user-specific predictions generated by the plurality of prediction models.

9. The method of claim 8, further comprising: analyzing, by an encoder component, input data for training a self-supervised activity sequencing model, wherein the input data comprises unlabeled historical human activity data; generating, by the encoder component, an embedding representing sequenced human activity; and performing, by the comparison component, back propagation.

10. The method of claim 8, further comprising: receiving, from a customer data platform, updated activity data representing a set of new activities by a second user in the plurality of users in real-time; generating, by the sequencing component, an updated user-specific journey for the second user; and
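The encode, regenerate, compare, and update loop recited in claims 1 and 8 can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the patented implementation: mean pooling and a single linear layer stand in for the LSTM of claim 6, the error is a squared difference, and the weights are updated by plain stochastic gradient descent. All names, data values, and hyperparameters are hypothetical.

```python
import random

def mean_pool(journey):
    """Collapse a variable-length journey into one vector (stand-in for an LSTM)."""
    return [sum(col) / len(journey) for col in zip(*journey)]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def train(inputs, dim, epochs=300, lr=0.05, seed=0):
    """Fit a linear encoder-decoder by minimizing squared reconstruction error."""
    rng = random.Random(seed)
    d = len(inputs[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(dim)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(d)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in inputs:
            z = matvec(enc, x)        # fixed-length embedding
            x_hat = matvec(dec, z)    # regenerated feature vector
            err = [a - b for a, b in zip(x_hat, x)]  # comparison step
            total += sum(e * e for e in err)
            # Gradient with respect to the embedding, taken before dec changes.
            grad_z = [sum(dec[i][j] * err[i] for i in range(d)) for j in range(dim)]
            for i in range(d):
                for j in range(dim):
                    dec[i][j] -= lr * err[i] * z[j]
            for j in range(dim):
                for k in range(d):
                    enc[j][k] -= lr * grad_z[j] * x[k]
        history.append(total)  # summed squared error for this epoch
    return enc, dec, history

# Three toy users: one-hot event journeys plus two non-sequential features each.
journeys = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
]
static = [[0.2, 1.0], [0.9, 0.0], [0.5, 1.0]]
# Combine sequential and non-sequential features by concatenation (an assumption).
inputs = [mean_pool(j) + s for j, s in zip(journeys, static)]

enc, dec, history = train(inputs, dim=2)
embedding = matvec(enc, inputs[0])  # embedding for the first user
```

No labels appear anywhere in the loop: the training signal is the input itself, which is what makes the setup self-supervised. Mean pooling discards event order, so a recurrent encoder would be needed to capture the sequencing the claims describe.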

Assignees

Inventors

Classifications

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

  • Supervised learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12062059B2 cover?
The disclosure herein describes a system for generating embeddings representing sequential human activity by self-supervised, deep learning models capable of being utilized by a variety of machine learning prediction models to create predictions and recommendations. An encoder-decoder is provided to create user-specific journeys, including sequenced events, based on human activity data from a p…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06Q30/0202. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 13, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).