Dynamic word embeddings

US11068658B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11068658-B2
Application number: US-201715828884-A
Country: US
Kind code: B2
Filing date: Dec 1, 2017
Priority date: Dec 7, 2016
Publication date: Jul 20, 2021
Grant date: Jul 20, 2021

Abstract

Official abstract text for this publication.

Systems, methods, and articles of manufacture to perform an operation comprising deriving, based on a corpus of electronic text, a machine learning data model that associates words with corresponding usage contexts over a window of time, according to a diffusion process, wherein the machine learning data model comprises a plurality of skip-gram models, wherein each skip-gram model comprises a word embedding vector and a context embedding vector for a respective time step associated with the respective skip-gram model, generating a smoothed model by applying a variational inference operation over the machine learning data model, and identifying, based on the smoothed model and the corpus of electronic text, a change in a semantic use of a word over at least a portion of the window of time.
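
The objective the abstract describes can be sketched for a single word/context pair as follows. This is a minimal illustration, not the patented implementation: it assumes a Gaussian random walk as the diffusion process and the usual sigmoid skip-gram likelihood with positive and negative counts. All function names and the `drift_var` parameter are hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def skipgram_log_likelihood(u, v, n_pos, n_neg):
    """Log-likelihood of one (word, context) pair at one time step.

    u, v   : word and context embedding vectors for that time step
    n_pos  : how often the pair was observed together
    n_neg  : how often the pair was drawn as a negative (rejected) sample
    """
    score = dot(u, v)
    return n_pos * log_sigmoid(score) + n_neg * log_sigmoid(-score)

def random_walk_log_prior(trajectory, drift_var):
    """Assumed diffusion prior: each step is Gaussian around the previous one.

    Returned up to an additive constant; large jumps between adjacent
    time steps are penalized, which keeps the steps in a shared frame.
    """
    total = 0.0
    for prev, cur in zip(trajectory, trajectory[1:]):
        sq_dist = sum((c - p) ** 2 for p, c in zip(prev, cur))
        total -= sq_dist / (2.0 * drift_var)
    return total

def log_joint(word_traj, ctx_traj, pos_counts, neg_counts, drift_var=0.1):
    """Sum of per-time-step likelihoods plus the prior on both trajectories."""
    likelihood = sum(
        skipgram_log_likelihood(u, v, p, n)
        for u, v, p, n in zip(word_traj, ctx_traj, pos_counts, neg_counts)
    )
    return (likelihood
            + random_walk_log_prior(word_traj, drift_var)
            + random_walk_log_prior(ctx_traj, drift_var))
```

A variational inference step would then approximate the posterior over all trajectories rather than evaluate this quantity at fixed values; that step is not shown here.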

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: deriving, based on a corpus of electronic text, a machine learning data model that associates words with corresponding usage contexts over a window of time, wherein the machine learning data model comprises a plurality of skip-gram models, wherein each skip-gram model comprises a word embedding vector and a context embedding vector for a respective time step associated with the respective skip-gram model, wherein deriving the machine learning data model comprises applying a diffusion process to the word embedding vectors and the context embedding vectors of the plurality of skip-gram models such that the word embedding vectors and the context embedding vectors are aligned to a common frame of reference of time; generating a smoothed model by applying a variational inference operation; and identifying, based on the smoothed model and the corpus of electronic text, a change in a semantic use of a word over at least a portion of the window of time.

2. The method of claim 1, further comprising: prior to identifying the change in the semantic use of the word, receiving a request to monitor the semantic use of the word; monitoring the semantic use of the word based on the smoothed model and new text added to the corpus of electronic text; identifying the change in the semantic use of the word based on at least one of: (i) a distance between two of the word embedding vectors, or (ii) a distance between two of the context embedding vectors; generating an indication of the change in the semantic use of the word; and outputting the indication.

3. The method of claim 1, wherein the word embedding vectors comprise word embeddings for each word in the corpus of electronic text, wherein the context embedding vectors comprise context embeddings for each word in the corpus of electronic text, wherein the method further comprises segmenting each text element in the corpus of electronic text into a respective time step of a plurality of time steps based on a respective timestamp of each text element, wherein the plurality of time steps comprises each time step associated with the plurality of skip-gram models.

4. The method of claim 1, wherein the corpus of electronic text includes a plurality of pairs of words, wherein deriving the machine learning data model further comprises: generating a positive count matrix, wherein the positive count matrix specifies, for each of the plurality of pairs of words in the corpus of electronic text, a respective count of observed occurrences of each respective pair of words within a predefined window of words in a text element of each time step.

5. The method of claim 1, wherein deriving the machine learning data model further comprises: generating a negative count matrix based on a plurality of rejected pairs of words in a second training corpus of electronic text.

6. The method of claim 1, wherein the variational inference operation comprises a filtering algorithm comprising: initializing a plurality of variational parameters for the word embedding vectors; initializing a plurality of variational parameters for the context embedding vectors; and optimizing the plurality of variational parameters for the word embedding vectors and the plurality of variational parameters for the context embedding vectors using stochastic gradient descent.

7. The method of claim 1, wherein the variational inference operation comprises a smoothing algorithm comprising: initializing a plurality of variational parameters for the word embedding vectors; initializing a plurality of variational parameters for the context embedding vectors; optimizing the plurality of variational parameters for the word embedding vectors using a first bidiagonal matrix; and optimizing the plurality of variational parameters for the context embedding vectors using a second bidiagonal matrix.

8. A non-transitory computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code executable to perform an operation comprising: deriving, based on a corpus of electronic text, a machine learning data model that associates words with corresponding usage contexts over a window of time, wherein the machine learning data model comprises a plurality of skip-gram models, wherein each skip-gram model comprises a word embedding vector and a context embedding vector for a respective time step associated with the respective skip-gram model, wherein deriving the machine learning data model comprises applying a diffusion process to the word embedding vectors and the context embedding vectors of the plurality of skip-gram models such that the word embedding vectors and the context embedding vectors are aligned to a common frame of reference of time; generating a smoothed model by applying a variational inference operation over the machine learning data model; and identifying, based on the smoothed model and the corpus of electronic text, a change in a semantic use of a word over at least a portion of the window of time.

9. The computer-readable storage medium of claim 8, the operation further comprising: prior to identifying the change in the semantic use of the word, receiving a request to monitor the semantic use of the word; monitoring the semantic use of the word based on the smoothed model and new text added to the corpus of electronic text; identifying the change in the semantic use of the word based on at least one of: (i) a distance between two word embedding vectors of the plurality of skip-gram models, or (ii) a distance between two context embedding vectors of the plurality of skip-gram models; generating an indication of the change in the semantic use of the word; and outputting the indication.

10. The computer-readable storage medium of claim 8, wherein the word embedding vectors of the plurality of skip-gram models comprise word embeddings for each word in the corpus of electronic text, wherein the context embedding vectors of the plurality of skip-gram models comprise context embeddings for each word in the corpus of electronic text, wherein the operation further comprises segmenting each text element in the corpus of electronic text into a respective time step of a plurality of time steps based on a respective timestamp of each text element, wherein the plurality of time steps comprises each time step associated with the plurality of skip-gram models.

11. The computer-readable storage medium of claim 8, wherein the corpus of electronic text includes a plurality of pairs of words, wherein deriving the machine learning data model further comprises: generating a positive count matrix, wherein the positive count matrix specifies, for each of the plurality of pairs of words in the corpus of electronic text, a respective count of observed occurrences of each respective pair of words within a predefined window of words in a text element of each time step.

12. The computer-readable storage medium of claim 8, wherein deriving the machine learning data model further comprises: generating a negative count matrix based on a plurality of rejected pairs of words in a second training corpus of text.

13. The computer-readable storage medium of claim 8, wherein the variational inference operation comprises a filtering algorithm comprising: initializing a plurality of variational parameters for the word embedding vectors of the plurality of skip-gram models; initializing a plurality of variational parameters for the context embedding vectors of the plurality of skip-gram models; and optimizing the plurality of variational parameters for the word embedding vect
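
The segmentation and positive-count steps recited in the claims might look roughly like the sketch below. It assumes whitespace tokenization, time steps defined by calendar year, and a sparse dictionary in place of a dense matrix. The names `time_step_of`, `positive_counts`, `start_year`, and `years_per_step` are illustrative and do not come from the patent.

```python
from collections import Counter, defaultdict
from datetime import date

def time_step_of(timestamp, start_year, years_per_step=1):
    """Map a document timestamp to an integer time-step index."""
    return (timestamp.year - start_year) // years_per_step

def positive_counts(documents, start_year, window=2, years_per_step=1):
    """Count observed (word, context) pairs per time step.

    documents : iterable of (date, text) tuples
    window    : how many tokens to each side count as context
    Returns {time_step: Counter{(word, context): count}}.
    """
    counts = defaultdict(Counter)
    for timestamp, text in documents:
        step = time_step_of(timestamp, start_year, years_per_step)
        tokens = text.lower().split()  # assumed tokenizer: plain whitespace split
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[step][(word, tokens[j])] += 1
    return counts

# Toy corpus: two dated text elements, bucketed into ten-year steps.
docs = [
    (date(1990, 5, 1), "the mouse ate cheese"),
    (date(2010, 3, 9), "click the mouse button"),
]
counts = positive_counts(docs, start_year=1990, window=1, years_per_step=10)
```

Here `counts[0]` holds the pairs from the 1990 text and `counts[2]` those from the 2010 text. A negative count matrix would be built separately from sampled pairs, which this sketch omits.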

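The claims identify a change in semantic use from a distance between two embedding vectors, without naming a metric in the text shown here. The sketch below assumes cosine distance between consecutive time steps and a fixed threshold. Both choices, and the function names, are assumptions made for illustration.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

def detect_changes(trajectory, threshold):
    """Flag consecutive time steps whose embeddings moved far apart.

    trajectory : one word's embedding vector at each time step
    Returns a list of (earlier_step, later_step, distance) tuples.
    """
    changes = []
    for t in range(1, len(trajectory)):
        distance = cosine_distance(trajectory[t - 1], trajectory[t])
        if distance > threshold:
            changes.append((t - 1, t, distance))
    return changes

# Toy trajectory: nearly unchanged at first, then a sharp turn.
trajectory = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
changes = detect_changes(trajectory, threshold=0.5)
```

With this toy trajectory only the move from step 1 to step 2 is flagged. Comparing time steps this way is meaningful only if the embeddings share a frame of reference, which is the role the claims give to the diffusion process.
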
Assignees

Disney Entpr Inc

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • G06F40/284 · Primary

    Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Feedforward networks · CPC title

  • Inference or reasoning models · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11068658B2 cover?
Systems, methods, and articles of manufacture to perform an operation comprising deriving, based on a corpus of electronic text, a machine learning data model that associates words with corresponding usage contexts over a window of time, according to a diffusion process, wherein the machine learning data model comprises a plurality of skip-gram models, wherein each skip-gram model comprises a w…
Who is the assignee on this patent?
Disney Entpr Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/284. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 20, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).