What technology area does this patent fall under?

Primary CPC classification G06N3/084. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 1, 2018 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Leveraging class information to initialize a neural network language model

US2018060730A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2018060730-A1
Application number	US-201615249872-A
Country	US
Kind code	A1
Filing date	Aug 29, 2016
Priority date	Aug 29, 2016
Publication date	Mar 1, 2018
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for language processing includes initializing a word embedding matrix based on pre-determined word classes, such that matrix entries associated with a class of which a word is a member are initialized to a non-zero value and other entries are initialized to zero. A neural network is trained based on the initialized word embedding matrix to generate a neural network language model. A language processing task is performed using the neural network language model.
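
The initialization described in the abstract can be illustrated with a small sketch. This is not the patent's implementation: it assumes one matrix column per word class, a non-zero value of 1.0, and a toy vocabulary. The names `init_class_embeddings` and `word_to_class` are hypothetical and chosen only for this example.

```python
from typing import Dict, List


def init_class_embeddings(
    vocab: List[str],
    word_to_class: Dict[str, int],
    num_classes: int,
    nonzero_value: float = 1.0,
) -> List[List[float]]:
    """Return a len(vocab) x num_classes matrix, one row per word.

    Assumption: each class owns exactly one column, and a word's row is
    non-zero only in the column of the class it belongs to.
    """
    matrix = []
    for word in vocab:
        row = [0.0] * num_classes                 # entries for other classes stay zero
        row[word_to_class[word]] = nonzero_value  # entry for the word's own class
        matrix.append(row)
    return matrix


# Toy data (assumed for illustration): two classes, four words.
vocab = ["paris", "tokyo", "run", "walk"]
word_to_class = {"paris": 0, "tokyo": 0, "run": 1, "walk": 1}
embeddings = init_class_embeddings(vocab, word_to_class, num_classes=2)
```

Words in the same class start with identical rows under this sketch, so any differences between them would only emerge during the subsequent neural network training that the abstract describes.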

First claim

Opening claim text (preview).

What is claimed is:
1. A method for language processing, comprising: initializing a word embedding matrix based on pre-determined word classes, such that matrix entries associated with a class of which a word is a member are initialized to a non-zero value and other entries are initialized to zero; training a neural network based on the initialized word embedding matrix to generate a neural network language model; and performing a language processing task using the neural network language model.
2. The method of claim 1, further comprising determining a first set of classes using a first classification method to use as the pre-determined word classes.
3. The method of claim 2, wherein determining the first set of classes comprises performing Brown clustering on the training corpus.
4. The method of claim 2, wherein determining the first set of classes comprises providing a named entity list.
5. The method of claim 2, further comprising determining a second set of classes using a second classification method that is different from the first classification method, said first and second sets of classes being used together as the pre-determined word classes.
6. The method of claim 5, wherein initializing the word embedding matrix comprises initializing one entry for each word for the first set of classes and one entry for each word for the second set of classes to the non-zero value.
7. The method of claim 1, further comprising reducing a dimensionality of the word embedding matrix if the dimensionality is greater than a maximum size of the neural network.
8. The method of claim 7, wherein reducing the dimensionality of the word embedding matrix is performed using principal component analysis.
9. The method of claim 1, wherein initializing the word embedding matrix further comprises randomly initializing entries that are not associated with the pre-determined word classes.
10. A non-transitory computer readable storage medium comprising a computer readable program for language processing, wherein the computer readable program when executed on a computer causes the computer to perform the steps of claim 1.
11. A method for language processing, comprising: determining a first set of classes using a first classification method to use as predetermined word classes; initializing a word embedding matrix based on the pre-determined word classes, such that matrix entries associated with a class of which a word is a member are initialized to a non-zero value, such that matrix entries associated with word classes of which the word is not a member are initialized to zero, and such that matrix entries not associated with a word class are randomly initialized; reducing a dimensionality of the word embedding matrix if the dimensionality is greater than a maximum size of a neural network; training the neural network based on the initialized word embedding matrix to generate a neural network language model; and performing a language processing task using the neural network language model.
12. A system for language processing, comprising: an initializing module configured to initialize a word embedding matrix based on pre-determined word classes, such that matrix entries associated with a class of which a word is a member are initialized to a non-zero value and other entries are initialized to zero; a training module configured to train a neural network based on the initialized word embedding matrix to generate a neural network language model; and a language processing module configured to perform a language processing task using the neural network language model.
13. The system of claim 12, further comprising a class module configured to determine a first set of classes using a first classification method to use as the pre-determined word classes.
14. The system of claim 13, wherein the class module is further configured to perform Brown clustering on the training corpus.
15. The system of claim 13, wherein the class module is further configured to provide a named entity list.
16. The system of claim 13, wherein the class module is further configured to determine a second set of classes using a second classification method that is different from the first classification method, said first and second sets of classes being used together as the pre-determined word classes.
17. The system of claim 16, wherein the initializing module is further configured to initialize one entry for each word for the first set of classes and one entry for each word for the second set of classes to the non-zero value.
18. The system of claim 12, wherein the initializing module is further configured to reduce a dimensionality of the word embedding matrix if the dimensionality is greater than a maximum size of the neural network.
19. The system of claim 18, wherein the initializing module is further configured to reduce the dimensionality of the word embedding matrix using principal component analysis.
20. The system of claim 12, wherein the initializing module is further configured to randomly initialize entries that are not associated with the pre-determined word classes.
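
Claims 5, 6, 9 and 11 describe combining two sets of classes and randomly initializing the entries that are not tied to any class. A rough sketch of how such a row layout might look is given below. It rests on several assumptions that the claims do not specify: each class set gets its own block of columns, the non-zero value is 1.0, and the remaining entries are drawn uniformly from [-0.1, 0.1]. The function `init_embeddings` and all parameter names are hypothetical. The dimensionality reduction of claims 7 and 8 is not shown.

```python
import random
from typing import Dict, List, Sequence


def init_embeddings(
    vocab: Sequence[str],
    class_sets: Sequence[Dict[str, int]],
    class_counts: Sequence[int],
    num_free_dims: int,
    nonzero_value: float = 1.0,
    seed: int = 0,
) -> List[List[float]]:
    """Build one row per word: a block per class set, then random entries."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    matrix = []
    for word in vocab:
        row: List[float] = []
        for word_to_class, count in zip(class_sets, class_counts):
            block = [0.0] * count                       # non-member classes: zero
            block[word_to_class[word]] = nonzero_value  # one entry per class set
            row.extend(block)
        # Entries not associated with any class (range is an assumption).
        row.extend(rng.uniform(-0.1, 0.1) for _ in range(num_free_dims))
        matrix.append(row)
    return matrix


# Toy data (assumed): the first set stands in for clustering output,
# the second for membership in a named entity list (1 = listed).
vocab = ["paris", "tokyo", "run"]
cluster_ids = {"paris": 0, "tokyo": 0, "run": 1}
entity_ids = {"paris": 1, "tokyo": 1, "run": 0}
matrix = init_embeddings(vocab, [cluster_ids, entity_ids], [2, 2], num_free_dims=3)
```

Each row here has four class-derived entries followed by three random ones. If the resulting width exceeded what the network can accept, the claims call for a reduction step such as principal component analysis before training.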

Assignees

IBM

Inventors

Kurata Gakuto

Classifications

G06N3/045
Combinations of networks · CPC title
G06N3/047
Probabilistic or stochastic networks · CPC title
G10L15/16
using artificial neural networks · CPC title
G10L15/197
Probabilistic grammars, e.g. word n-grams · CPC title
G06N3/084 (Primary)
Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

View patent family 61242889

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018060730A1 cover?: Methods and systems for language processing includes initializing a word embedding matrix based on pre-determined word classes, such that matrix entries associated with a class of which a word is a member are initialized to a non-zero value and other entries are initialized to zero. A neural network is trained based on the initialized word embedding matrix to generate a neural network language …
Who is the assignee on this patent?: IBM