Method and apparatus for extending neural network
US-2016155049-A1 · Jun 2, 2016 · US
US10699191B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10699191-B2 |
| Application number | US-201615349901-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 11, 2016 |
| Priority date | Nov 12, 2015 |
| Publication date | Jun 30, 2020 |
| Grant date | Jun 30, 2020 |
Official abstract text for this publication.
This specification describes methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a larger neural network from a smaller neural network. One of the described methods includes obtaining data specifying an original neural network and generating a larger neural network from the original neural network. The larger neural network has a larger neural network structure than the original neural network structure. The values of the parameters of the original neural network units and the additional neural network units are initialized so that the larger neural network generates the same outputs from the same inputs as the original neural network, and the larger neural network is trained to determine trained values of the parameters of the original neural network units and the additional neural network units from the initialized values.
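To make the "same outputs from the same inputs" initialization concrete, here is a minimal sketch of widening one hidden layer, in the spirit of claims 4 to 6 below. It is an illustration under stated assumptions, not the patent's implementation: the two-layer ReLU network, the helper names (`relu`, `dense`, `forward`, `widen_hidden_layer`), and the rule of dividing outgoing weights by the number of copies are all choices made for this example. The claims only require that the next layer be initialized so its output is unchanged; dividing by the copy count is one way to achieve that.

```python
import random


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def dense(x, weights, biases):
    """Affine layer; weights[j] holds the input weights of unit j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def forward(x, w1, b1, w2, b2):
    """Two-layer network: ReLU hidden layer followed by a linear layer."""
    return dense(relu(dense(x, w1, b1)), w2, b2)


def widen_hidden_layer(w1, b1, w2, extra_units, rng):
    """Add extra_units hidden units without changing the network function.

    Each added unit copies the parameters of a randomly selected original
    unit. The next layer's incoming weights are then divided by the number
    of copies of each source unit (an assumption of this sketch), so the
    summed contribution of all copies equals the original contribution.
    """
    n = len(w1)
    # source[j] is the index of the original unit that widened unit j copies.
    source = list(range(n)) + [rng.randrange(n) for _ in range(extra_units)]
    copies = [source.count(i) for i in range(n)]
    new_w1 = [list(w1[i]) for i in source]
    new_b1 = [b1[i] for i in source]
    new_w2 = [[row[i] / copies[i] for i in source] for row in w2]
    return new_w1, new_b1, new_w2


# Hypothetical small network: 3 inputs, 4 hidden units, 2 outputs.
rng = random.Random(0)
w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
b1 = [rng.uniform(-1, 1) for _ in range(4)]
w2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
b2 = [rng.uniform(-1, 1) for _ in range(2)]

big_w1, big_b1, big_w2 = widen_hidden_layer(w1, b1, w2, extra_units=3, rng=rng)

x = [0.5, -1.2, 2.0]
small_out = forward(x, w1, b1, w2, b2)
big_out = forward(x, big_w1, big_b1, big_w2, b2)
# Largest difference between the two outputs; only rounding error is expected.
max_gap = max(abs(a - b) for a, b in zip(small_out, big_out))
```

After this initialization the widened network would be trained as usual. Exact copies receive identical gradients, so in practice some way of breaking the symmetry between them (for example, a little noise) is typically needed; that step is outside this sketch.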
Opening claim text (preview).
What is claimed is:

1. A method of generating a larger neural network from a smaller neural network, the method comprising: obtaining data specifying an original neural network, the original neural network being configured to generate neural network outputs from neural network inputs, the original neural network having an original neural network structure comprising a plurality of original neural network units, each original neural network unit having respective parameters, and each of the parameters of each of the original neural network units having a respective original value; generating a larger neural network from the original neural network, the larger neural network having a larger neural network structure comprising: (i) the plurality of original neural network units, and (ii) a plurality of additional neural network units not in the original neural network structure, each additional neural network unit having respective parameters; initializing values of the parameters of the original neural network units and the additional neural network units by setting the values of the parameters of the original neural network units and the additional neural network units to values that result in the larger neural network generating, for any particular neural network input, the same neural network output for the particular neural network input as would be generated by the original neural network by processing the particular neural network input in accordance with the original parameter values for the original neural network units; and training the larger neural network to determine trained values of the parameters of the original neural network units and the additional neural network units from the initialized values.

2. The method of claim 1, further comprising: training the original neural network to determine the original values of the parameters of the original neural network.

3. The method of claim 2, wherein the original neural network structure comprises a first original neural network layer having a first number of original units, and wherein generating the larger neural network comprises: adding a plurality of additional neural network units to the first original neural network layer to generate a larger neural network layer.

4. The method of claim 3, wherein initializing values of the parameters of the original neural network units and the additional neural network units so that the larger neural network generates the same neural network outputs from the same neural network inputs as the original neural network comprises: initializing the values of the parameters of the original neural network units in the larger neural network layer to the respective original values for the parameters; and for each additional neural network unit in the larger neural network layer: selecting an original neural network unit in the original neural network layer, and initializing the values of the parameters of the additional neural network unit to be the same as the respective original values for the selected original neural network unit.

5. The method of claim 4, wherein selecting an original neural network unit in the larger neural network layer comprises: randomly selecting an original neural network unit from the original neural network units in the original neural network layer.

6. The method of claim 4, wherein: in the original neural network structure, a second original neural network layer is configured to receive as input outputs generated by the first original neural network layer; in the larger neural network structure, the second original neural network layer is configured to receive as input outputs generated by the larger neural network layer; and initializing values of the parameters of the original neural network units and the additional neural network units so that the larger neural network generates the same neural network outputs from the same neural network inputs as the original neural network comprises: initializing the values of the parameters of the original neural network units in the second original neural network layer so that, for a given neural network input, the second neural network layer generates the same output in both the original neural network structure and the larger neural network structure.

7. The method of claim 6, wherein the original neural network structure comprises a third original neural network layer configured to receive a third original layer input and generate a third original layer output from the third layer input, and wherein generating the larger neural network comprises: replacing the third original neural network layer with a first additional neural network layer having additional neural network units and a second additional neural network layer having additional neural network units, wherein: the first additional neural network layer is configured to receive the third original layer input and generate a first additional layer output from the third original layer input, and the second additional neural network layer is configured to receive the first additional layer output and generate a second additional layer output from the first additional layer output.

8. The method of claim 7, wherein initializing values of the parameters of the original neural network units and the additional neural network units so that the larger neural network generates the same neural network outputs from the same neural network inputs as the original neural network comprises: initializing the values of the parameters of the additional neural network units in the first additional neural network layer and in the second additional neural network layer so that, for the same neural network input, the second additional layer output is the same as the third original layer output.

9. The method of claim 7, wherein initializing values of the parameters of the original neural network units and the additional neural network units so that the larger neural network generates the same neural network outputs from the same neural network inputs as the original neural network comprises: initializing the values of the parameters of the additional neural network units in the first additional neural network layer using the respective original values for the parameters of the original neural network units in the third original neural network layer.

10. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: obtaining data specifying an original neural network, the original neural network being configured to generate neural network outputs from neural network inputs, the original neural network having an original neural network structure comprising a plurality of original neural network units, each original neural network unit having respective parameters, and each of the parameters of each of the original neural network units having a respective original value; generating a larger neural network from the original neural network, the larger neural network having a larger neural network structure comprising: (i) the plurality of original neural network units, and (ii) a plurality of additional neural network units not in the original neural network structure, each additional neural network unit having respective parameters; initializing values of the parameters of the original neural network units and the additional neural network units by setting the values of the parameters of the original neural network units and the additional neural network units to values that result in the larger neural network generat
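Claims 7 to 9 describe replacing one layer with two layers whose combined output matches the original layer's output. The sketch below shows one way this could be satisfied, assuming a fully connected layer with a ReLU activation: the first new layer reuses the original layer's parameter values (as in claim 9), and the second is set to an identity weight matrix with zero biases. The identity initialization and the helper names (`layer`, `deepen_layer`) are assumptions of this example; the claims do not prescribe them, and the trick relies on ReLU leaving non-negative values unchanged.

```python
def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def dense(x, weights, biases):
    """Affine layer; weights[j] holds the input weights of unit j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def layer(x, weights, biases):
    """A fully connected layer followed by ReLU."""
    return relu(dense(x, weights, biases))


def deepen_layer(weights, biases):
    """Replace one ReLU layer with two layers that compute the same output.

    The first layer copies the original parameters. The second layer uses
    an identity matrix and zero biases (an assumption of this sketch), so
    it passes the already non-negative activations through unchanged.
    """
    n = len(weights)
    first = ([list(row) for row in weights], list(biases))
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    second = (identity, [0.0] * n)
    return first, second


# Hypothetical original layer: 2 inputs, 3 units.
w3 = [[0.4, -0.7], [1.5, 0.2], [-0.3, 0.9]]
b3 = [0.1, -0.5, 0.0]
(first_w, first_b), (second_w, second_b) = deepen_layer(w3, b3)

x = [1.0, -2.0]
original_out = layer(x, w3, b3)
deeper_out = layer(layer(x, first_w, first_b), second_w, second_b)
# Largest difference between the original and the two-layer output.
deepen_gap = max(abs(a - b) for a, b in zip(original_out, deeper_out))
```

With other activation functions the identity initialization would not in general preserve the output, so a different initialization for the second layer would be needed.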