Feature-augmented neural networks and applications of same

US9519858B2 · US · B2

Patent metadata
Publication number: US-9519858-B2
Application number: US-201313763701-A
Country: US
Kind code: B2
Filing date: Feb 10, 2013
Priority date: Feb 10, 2013
Publication date: Dec 13, 2016
Grant date: Dec 13, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system is described herein which uses a neural network having an input layer that accepts an input vector and a feature vector. The input vector represents at least part of input information, such as, but not limited to, a word or phrase in a sequence of input words. The feature vector provides supplemental information pertaining to the input information. The neural network produces an output vector based on the input vector and the feature vector. In one implementation, the neural network is a recurrent neural network. Also described herein are various applications of the system, including a machine translation application.
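
The data flow in the abstract can be illustrated with a minimal sketch: a word input vector and a separate feature vector each pass through their own matrix, and the combined result is turned into an output vector of word probabilities. The function names, the toy matrices `U`, `F`, and `V`, the `tanh` hidden activation, and the softmax output are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(word_vec, feature_vec, U, F, V):
    """One pass through the network with separate word and feature inputs."""
    from_word = matvec(U, word_vec)        # word input vector times its own matrix
    from_feature = matvec(F, feature_vec)  # feature vector times a separate matrix
    # tanh is an assumed activation, not taken from the patent text.
    hidden = [math.tanh(a + b) for a, b in zip(from_word, from_feature)]
    return softmax(matvec(V, hidden))      # output vector as word probabilities

# Toy sizes: 3-word vocabulary, 2 features, 2 hidden units (all values invented).
U = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
F = [[0.4, -0.6], [-0.1, 0.7]]
V = [[1.0, -1.0], [0.2, 0.3], [-0.5, 0.9]]

word = [0.0, 1.0, 0.0]   # one-hot encoding of the second vocabulary word
topics = [0.9, 0.1]      # hypothetical feature vector for the surrounding text
probs = forward(word, topics, U, F, V)
```

Calling `forward` with the same word but a different feature vector yields a different distribution, which is the point of supplying the supplemental information as a separate input.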

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed using one or more processing devices, the method comprising: receiving a word input vector at an input layer of a neural network, the word input vector representing an individual word from an input sequence of words; receiving a topic feature vector at the input layer of the neural network, the topic feature vector being separate from the word input vector and representing topics expressed in the input sequence of words; using the neural network to generate an output vector at an output layer of the neural network based at least on the word input vector and the topic feature vector, wherein using the neural network includes, by a hidden layer of the neural network: modifying the word input vector using a first learned matrix; and modifying the topic feature vector using a second learned matrix that is separate from the first learned matrix, wherein the output vector represents a word probability given the word input vector and the topic feature vector; and performing a natural language processing operation based at least on the word probability represented by the output vector.

2. The method of claim 1, wherein using the neural network includes: by the hidden layer of the neural network, modifying a time-delayed hidden-state vector with a third learned matrix, wherein the time-delayed hidden-state vector represents an output of the hidden layer in a prior time instance, wherein the word input vector, the topic feature vector, and the time-delayed hidden-state vector are separate vectors, and wherein the first learned matrix, the second learned matrix, and the third learned matrix are separate matrices.

3. The method of claim 2, wherein using the neural network includes, by the output layer of the neural network: modifying the output of the hidden layer with a fourth learned matrix; and modifying the topic feature vector with a fifth learned matrix, wherein the first learned matrix, the second learned matrix, the third learned matrix, the fourth learned matrix, and the fifth learned matrix are separate matrices.

4. The method of claim 2, wherein using the neural network includes, by the hidden layer: performing a first multiplication operation of the word input vector by the first learned matrix to generate a first multiplication output; performing a second multiplication operation of the topic feature vector by the second learned matrix to generate a second multiplication output; performing a third multiplication operation of the time-delayed hidden-state vector by the third learned matrix to generate a third multiplication output; and summing the first multiplication output, the second multiplication output, and the third multiplication output to generate the output of the hidden layer.

5. The method of claim 1, further comprising: generating the topic feature vector using a Latent Dirichlet Allocation (LDA) technique; and as subsequent words from the input sequence are processed using the neural network, incrementally generating next topic feature vectors based at least on previous feature vectors.

6. The method of claim 5, wherein said incrementally generating comprises applying a decay factor to previous topic feature vectors for previous words that have already been processed by the neural network.

7. The method of claim 1, wherein the input sequence of words is part of an input document.

8. A system comprising: at least one processing device; and at least one computer readable medium storing instructions which, when executed by the at least one processing device, cause the at least one processing device to: receive a word input vector at an input layer of a neural network, the word input vector representing an individual word from an input sequence of words; receive a topic feature vector at the input layer of the neural network, the topic feature vector being separate from the word input vector and representing topics expressed in the input sequence of words; use the neural network to generate an output vector at an output layer of the neural network based at least on the word input vector and the topic feature vector, wherein using the neural network includes, by a hidden layer of the neural network: modifying the word input vector using a first learned matrix; and modifying the topic feature vector using a second learned matrix that is separate from the first learned matrix, wherein the output vector represents a word probability given the word input vector and the topic feature vector; and perform a natural language processing operation based at least on the word probability represented by the output vector.

9. The system of claim 8, wherein the instructions, when executed by the at least one processing device, cause the at least one processing device to: by the hidden layer of the neural network, modify a time-delayed hidden-state vector with a third learned matrix, wherein the time-delayed hidden-state vector represents an output of the hidden layer in a prior time instance, wherein the word input vector, the topic feature vector, and the time-delayed hidden-state vector are separate vectors, and wherein the first learned matrix, the second learned matrix, and the third learned matrix are separate matrices.

10. The system of claim 9, wherein the instructions, when executed by the at least one processing device, cause the at least one processing device to: by the output layer of the neural network: modify the output of the hidden layer with a fourth learned matrix; and modify the topic feature vector with a fifth learned matrix, wherein the first learned matrix, the second learned matrix, the third learned matrix, the fourth learned matrix, and the fifth learned matrix are separate matrices.

11. The system of claim 9, wherein the instructions, when executed by the at least one processing device, cause the at least one processing device to: by the hidden layer of the neural network: perform a first multiplication operation of the word input vector by the first learned matrix to generate a first multiplication output; perform a second multiplication operation of the topic feature vector by the second learned matrix to generate a second multiplication output; perform a third multiplication operation of the time-delayed hidden-state vector by the third learned matrix to generate a third multiplication output; and sum the first multiplication output, the second multiplication output, and the third multiplication output to generate the output of the hidden layer.

12. The system of claim 8, wherein the instructions, when executed by the at least one processing device, cause the at least one processing device to: generate the topic feature vector using a Latent Dirichlet Allocation (LDA) technique; and as subsequent words from the input sequence are processed using the neural network, incrementally generate next topic feature vectors based at least on previous feature vectors.

13. The system of claim 12, wherein the instructions, when executed by the at least one processing device, cause the at least one processing device to: apply a decay factor to previous topic feature vectors for previous words that have already been processed by the neural network.

14. The system of claim 8, wherein the input sequence of words is part of an input document.

15. At least one computer readable storage medium storing instructions which, when executed by at least one processing device, cause the at least one processing device to perform acts comprising: receiving a word input vector at an input layer of a n
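
Claims 2 through 6 describe a recurrent step and an incremental topic update, which can be sketched as follows. This is only an illustration under stated assumptions. The hidden layer sums three separate matrix products exactly as claim 4 words it (a practical network would typically also apply a nonlinearity). The softmax at the output, the matrix names `U`, `F`, `W`, `V`, `G`, the linear blending rule in `decay_topics`, and the per-word topic table are hypothetical. The claims require a decay factor but this excerpt does not give the update formula, and no actual LDA model is run here.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def add(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(parts) for parts in zip(*vectors)]

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def step(word_vec, topic_vec, prev_hidden, U, F, W, V, G):
    """One time step with five separate matrices."""
    # Hidden layer: three separate products, then a sum (claims 2 and 4).
    hidden = add(matvec(U, word_vec), matvec(F, topic_vec), matvec(W, prev_hidden))
    # Output layer: hidden output and topic vector each get their own matrix (claim 3).
    scores = add(matvec(V, hidden), matvec(G, topic_vec))
    return hidden, softmax(scores)  # softmax is an assumed normalization

def decay_topics(prev_topic_vec, word_topic_vec, decay=0.9):
    """Hypothetical update: scale the previous topics by a decay factor and blend in the new word's topics."""
    mixed = [decay * p + (1.0 - decay) * w
             for p, w in zip(prev_topic_vec, word_topic_vec)]
    total = sum(mixed)
    return [m / total for m in mixed]  # renormalize so the vector sums to 1

# Toy sizes: 3-word vocabulary, 2 topics, 2 hidden units (all values invented).
U = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
F = [[0.4, -0.6], [-0.1, 0.7]]
W = [[0.1, 0.0], [0.0, 0.1]]
V = [[1.0, -1.0], [0.2, 0.3], [-0.5, 0.9]]
G = [[0.3, -0.3], [0.0, 0.2], [-0.2, 0.4]]

# Invented per-word topic distributions standing in for LDA output.
word_topics = {0: [0.8, 0.2], 1: [0.5, 0.5], 2: [0.1, 0.9]}

hidden = [0.0, 0.0]   # time-delayed hidden state before the first word
topics = [0.5, 0.5]   # starting topic feature vector
history = []
for word_id in [0, 2, 2, 1]:
    word = [1.0 if i == word_id else 0.0 for i in range(3)]
    hidden, probs = step(word, topics, hidden, U, F, W, V, G)
    history.append(probs)
    topics = decay_topics(topics, word_topics[word_id])
```

Keeping the topic vector as its own input, with its own matrices at both layers, lets the context signal reach the output directly as well as through the hidden state.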

Assignees

Microsoft Corp, Microsoft Technology Licensing LLC

Inventors

Classifications

  • G06N3/08 (Primary)

    Learning methods · CPC title

  • Statistical methods, e.g. probability models · CPC title

  • Architecture, e.g. interconnection topology · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Supervised learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9519858B2 cover?
A system is described herein which uses a neural network having an input layer that accepts an input vector and a feature vector. The input vector represents at least part of input information, such as, but not limited to, a word or phrase in a sequence of input words. The feature vector provides supplemental information pertaining to the input information. The neural network produces an output…
Who is the assignee on this patent?
Microsoft Corp, Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 13, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).