Leveraging class information to initialize a neural network language model

US2018060730A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2018060730-A1
Application number: US-201615249872-A
Country: US
Kind code: A1
Filing date: Aug 29, 2016
Priority date: Aug 29, 2016
Publication date: Mar 1, 2018
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for language processing includes initializing a word embedding matrix based on pre-determined word classes, such that matrix entries associated with a class of which a word is a member are initialized to a non-zero value and other entries are initialized to zero. A neural network is trained based on the initialized word embedding matrix to generate a neural network language model. A language processing task is performed using the neural network language model.
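The initialization step in the abstract can be illustrated with a minimal sketch. It assumes each word belongs to exactly one class and that the non-zero value is 1.0; the function name `init_embedding_from_classes`, the toy vocabulary, and the class assignments are hypothetical and do not come from the patent text.

```python
def init_embedding_from_classes(vocab, word_to_class, num_classes, value=1.0):
    """Build one row per word, with one column per pre-determined class.

    The column of the class the word belongs to gets `value` (assumed 1.0
    here); every other column is initialized to zero.
    """
    matrix = []
    for word in vocab:
        row = [0.0] * num_classes
        row[word_to_class[word]] = value
        matrix.append(row)
    return matrix


# Toy data (hypothetical): class 0 = place names, class 1 = motion verbs.
vocab = ["paris", "london", "run", "walk"]
word_to_class = {"paris": 0, "london": 0, "run": 1, "walk": 1}

E = init_embedding_from_classes(vocab, word_to_class, num_classes=2)
for word, row in zip(vocab, E):
    print(word, row)
```

Words in the same class start with identical rows, so the network begins training with class members already close together in embedding space. Training would then update these rows like any other weights.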

First claim

Opening claim text (preview).

What is claimed is:
1. A method for language processing, comprising: initializing a word embedding matrix based on pre-determined word classes, such that matrix entries associated with a class of which a word is a member are initialized to a non-zero value and other entries are initialized to zero; training a neural network based on the initialized word embedding matrix to generate a neural network language model; and performing a language processing task using the neural network language model.
2. The method of claim 1, further comprising determining a first set of classes using a first classification method to use as the pre-determined word classes.
3. The method of claim 2, wherein determining the first set of classes comprises performing Brown clustering on the training corpus.
4. The method of claim 2, wherein determining the first set of classes comprises providing a named entity list.
5. The method of claim 2, further comprising determining a second set of classes using a second classification method that is different from the first classification method, said first and second sets of classes being used together as the pre-determined word classes.
6. The method of claim 5, wherein initializing the word embedding matrix comprises initializing one entry for each word for the first set of classes and one entry for each word for the second set of classes to the non-zero value.
7. The method of claim 1, further comprising reducing a dimensionality of the word embedding matrix if the dimensionality is greater than a maximum size of the neural network.
8. The method of claim 7, wherein reducing the dimensionality of the word embedding matrix is performed using principal component analysis.
9. The method of claim 1, wherein initializing the word embedding matrix further comprises randomly initializing entries that are not associated with the pre-determined word classes.
10. A non-transitory computer readable storage medium comprising a computer readable program for language processing, wherein the computer readable program when executed on a computer causes the computer to perform the steps of claim 1.
11. A method for language processing, comprising: determining a first set of classes using a first classification method to use as predetermined word classes; initializing a word embedding matrix based on the pre-determined word classes, such that matrix entries associated with a class of which a word is a member are initialized to a non-zero value, such that matrix entries associated with word classes of which the word is not a member are initialized to zero, and such that matrix entries not associated with a word class are randomly initialized; reducing a dimensionality of the word embedding matrix if the dimensionality is greater than a maximum size of a neural network; training the neural network based on the initialized word embedding matrix to generate a neural network language model; and performing a language processing task using the neural network language model.
12. A system for language processing, comprising: an initializing module configured to initialize a word embedding matrix based on pre-determined word classes, such that matrix entries associated with a class of which a word is a member are initialized to a non-zero value and other entries are initialized to zero; a training module configured to train a neural network based on the initialized word embedding matrix to generate a neural network language model; and a language processing module configured to perform a language processing task using the neural network language model.
13. The system of claim 12, further comprising a class module configured to determine a first set of classes using a first classification method to use as the pre-determined word classes.
14. The system of claim 13, wherein the class module is further configured to perform Brown clustering on the training corpus.
15. The system of claim 13, wherein the class module is further configured to provide a named entity list.
16. The system of claim 13, wherein the class module is further configured to determine a second set of classes using a second classification method that is different from the first classification method, said first and second sets of classes being used together as the pre-determined word classes.
17. The system of claim 16, wherein the initializing module is further configured to initialize one entry for each word for the first set of classes and one entry for each word for the second set of classes to the non-zero value.
18. The system of claim 12, wherein the initializing module is further configured to reduce a dimensionality of the word embedding matrix if the dimensionality is greater than a maximum size of the neural network.
19. The system of claim 18, wherein the initializing module is further configured to reduce the dimensionality of the word embedding matrix using principal component analysis.
20. The system of claim 12, wherein the initializing module is further configured to randomly initialize entries that are not associated with the pre-determined word classes.
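The claims on two class sets and on randomly initialized entries can be sketched together as one possible reading, not the patent's implementation. The sketch assumes the columns of the two class sets are simply concatenated, that the non-zero value is 1.0, and that extra entries are drawn uniformly from a small range. The function name `init_embedding`, the parameters `extra_dims` and `scale`, and the toy class assignments are all hypothetical.

```python
import random


def init_embedding(vocab, class_sets, extra_dims=0, value=1.0, scale=0.1, seed=0):
    """Initialize rows from several class sets plus randomly initialized entries.

    class_sets: list of (word_to_class, num_classes) pairs, one per
    classification method. Each word gets exactly one non-zero entry per
    class set; the trailing `extra_dims` entries belong to no class and are
    drawn uniformly from [-scale, scale] (an assumed distribution).
    """
    rng = random.Random(seed)

    # Column offset of each class set inside the concatenated row.
    offsets, total = [], 0
    for _, num_classes in class_sets:
        offsets.append(total)
        total += num_classes

    matrix = []
    for word in vocab:
        row = [0.0] * total
        for (word_to_class, _), offset in zip(class_sets, offsets):
            row[offset + word_to_class[word]] = value
        row += [rng.uniform(-scale, scale) for _ in range(extra_dims)]
        matrix.append(row)
    return matrix


vocab = ["paris", "london", "run"]
# Hypothetical first class set, e.g. cluster ids from a clustering method.
cluster_classes = ({"paris": 0, "london": 1, "run": 2}, 3)
# Hypothetical second class set, e.g. 0 = on a named entity list, 1 = not.
entity_classes = ({"paris": 0, "london": 0, "run": 1}, 2)

E = init_embedding(vocab, [cluster_classes, entity_classes], extra_dims=2)
for word, row in zip(vocab, E):
    print(word, row[:5], "+", len(row) - 5, "random entries")
```

Each row here has five class columns (three from the first set, two from the second) followed by two random entries. If the resulting width exceeded what the network accepts, the claims describe reducing it, for example with principal component analysis; that step is not shown in this sketch.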

Assignees

Inventors

Classifications

  • Combinations of networks · CPC title

  • Probabilistic or stochastic networks · CPC title

  • using artificial neural networks · CPC title

  • Probabilistic grammars, e.g. word n-grams · CPC title

  • G06N3/084 · Primary

    Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018060730A1 cover?
Methods and systems for language processing includes initializing a word embedding matrix based on pre-determined word classes, such that matrix entries associated with a class of which a word is a member are initialized to a non-zero value and other entries are initialized to zero. A neural network is trained based on the initialized word embedding matrix to generate a neural network language …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 1, 2018 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).