Alternative training distribution based on density modification
US-2017011307-A1 · Jan 12, 2017 · US
US9858534B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9858534-B2 |
| Application number | US-201414451899-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 5, 2014 |
| Priority date | Nov 22, 2013 |
| Publication date | Jan 2, 2018 |
| Grant date | Jan 2, 2018 |
Official abstract text for this publication.
Technologies are generally described for systems, devices and methods relating to a machine learning environment. In some examples, a processor may identify a training distribution of a training data. The processor may identify information about a test distribution of a test data. The processor may identify a coordinate of the training data and the test data. The processor may determine, for the coordinate, differences between the test distribution and the training distribution. The processor may determine weights based on the differences. The weights may be adapted to cause the training distribution to conform to the test distribution when the weights are applied to the training distribution.
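The weighting idea in the abstract can be illustrated with a minimal sketch. It assumes a single numeric coordinate, equal-width bins, and a weight equal to the ratio of test frequency to training frequency in the bin that holds each training point. That ratio rule is a common importance-weighting heuristic chosen here for illustration; it is not taken from the patent text, and the names `bin_index`, `bin_frequencies`, and `density_ratio_weights` are hypothetical.

```python
from collections import Counter


def bin_index(value, low, high, num_bins):
    """Map a value in [low, high] to an equal-width bin index in [0, num_bins - 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    width = (high - low) / num_bins
    index = int((value - low) / width)
    # Clamp so that value == high falls into the last bin.
    return min(max(index, 0), num_bins - 1)


def bin_frequencies(points, low, high, num_bins):
    """Return the fraction of points that falls into each bin."""
    counts = Counter(bin_index(p, low, high, num_bins) for p in points)
    total = len(points)
    return [counts.get(b, 0) / total for b in range(num_bins)]


def density_ratio_weights(train, test, num_bins=4):
    """Assumed rule (not from the patent): weight = test frequency / training frequency.

    Each training point receives the ratio computed for its own bin, so the
    weighted training histogram matches the test histogram on every bin that
    contains at least one training point.
    """
    low = min(min(train), min(test))
    high = max(max(train), max(test))
    f_train = bin_frequencies(train, low, high, num_bins)
    f_test = bin_frequencies(test, low, high, num_bins)
    weights = []
    for x in train:
        b = bin_index(x, low, high, num_bins)
        # f_train[b] is never zero here because x itself lies in bin b.
        weights.append(f_test[b] / f_train[b])
    return weights


if __name__ == "__main__":
    train_points = [0.0, 1.0, 2.0, 9.0]   # mostly low values
    test_points = [0.0, 8.0, 9.0, 10.0]   # mostly high values
    print(density_ratio_weights(train_points, test_points, num_bins=2))
```

In this toy run the single high-valued training point receives a larger weight than the three low-valued ones, which is the qualitative effect the abstract describes: the reweighted training data looks more like the test data.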
Opening claim text (preview).
What is claimed is:
1. A method to improve predictive capability of a machine learning system, the method comprising: receiving, by a computer, training data that includes one or more points; identifying, by the computer, a training distribution of the one or more points of the training data; receiving, by the computer, test data that includes one or more points; identifying, by the computer, information about a test distribution of the one or more points of the test data; identifying, by the computer, one or more coordinates for the one or more points of the training data and the one or more points of the test data; determining, for each identified coordinate and by the computer, differences between the one or more points of the test data and the one or more points of the training data; determining, by the computer, weights for the one or more points of the training data based on the determined differences, wherein the weights are adapted to cause the training distribution to conform to the test distribution in response to the weights being applied to the training distribution; generating, by the computer, a weighted function based on the determined weights and the training data; and generating, by the computer, a first output based on an application of an input to the generated weighted function, wherein the first output is different than a second output generated by an application of the input to a non-weighted function, wherein the first output and the second output respectively correspond to a first predictive capability and a second predictive capability of the machine learning system, and wherein the first predictive capability is greater than the second predictive capability.
2. The method of claim 1, wherein generating the first output includes generating at least one of a recommendation, a classification, a prediction, and a determination.
3. The method of claim 1, wherein: the training data is generated at a first instance in time; and the test data is generated at a second instance in time, wherein the second instance in time is later than the first instance in time.
4. The method of claim 1, wherein determining the weights comprises: iteratively determining, for each identified coordinate, differences between the one or more points of the training data and the one or more points of the test data, wherein the weights for the one or more points of the training data are determined based on a convergent value of the differences between the one or more points of the training data and the one or more points of the test data.
5. The method of claim 1, wherein identifying the one or more coordinates includes identifying a range of values in a coordinate space, and wherein the method further comprises: dividing the range of values in the coordinate space into bins, wherein determining the weights is based on a number of the one or more points in the training data and a number of the bins.
6. The method of claim 1, wherein the one or more points of the test data and the one or more points of the training data include at least one first point and at least one second point, respectively, wherein the one or more coordinates include a range of values in a coordinate space, and wherein the method further comprises: dividing the range of values in the coordinate space into bins, wherein determining the weights is based on a number of the at least one first point and a number of the at least one second point which are located in the bins.
7. A method to improve predictive capability of a machine learning system, the method comprising, by a computer: identifying first points of training data; identifying information about test data, wherein the test data includes second points; identifying a coordinate of the first points and the second points, wherein the coordinate includes a range of values in a coordinate space; dividing the range of values in the coordinate space into bins, wherein the bins define subsets of the range of values; determining a first frequency, wherein the first frequency relates to a first percentage of the first points being located within a particular bin; determining a second frequency, wherein the second frequency relates to a second percentage of the second points being located within the particular bin; comparing the first frequency and the second frequency; determining a weight for the training data, based at least, in part, on the comparison of the first frequency and the second frequency, and on a number of the bins; generating a weighted function based on the determined weight and the training data; and generating a first output based on an application of an input to the generated weighted function, wherein the first output is different than a second output generated by an application of the input to a non-weighted function, wherein the first output and the second output respectively correspond to a first predictive capability and a second predictive capability of the machine learning system, and wherein the first predictive capability is greater than the second predictive capability.
8. The method of claim 7, wherein: the first points follow a training distribution, the second points follow a test distribution, and the weight is effective to conform a particular point in the training distribution to a particular point in the test distribution.
9. The method of claim 7, wherein comparing the first frequency and the second frequency includes: identifying a first comparison value; comparing frequency values of the test data and the training data in the bins to produce a difference value; updating the first comparison value to produce a second comparison value based on the difference value; and iteratively repeating the identifying the first comparison value, comparing frequency values of the test data and the training data in the bins to produce the difference value, and updating the first comparison value to produce the second comparison value based on the difference value, until the second comparison value converges to a convergent value.
10. The method of claim 9, wherein updating the first comparison value to produce the second comparison value based on the difference value comprises: adding a fraction of the difference value to the first comparison value to produce the second comparison value.
11. The method of claim 7, wherein determining the weight for the training data is based on a number of the first points.
12. The method of claim 7, wherein determining the weight for the training data is based on a number of the first points and a number of the second points which are located in the bins.
13. A computing device, comprising: a first processor; a second processor; and a memory configured to be in communication with the first processor and the second processor, the memory effective to store training data and test data, wherein the training data comprises first points and the test data comprises second points, and wherein: the first processor is effective to: identify a coordinate of the first points and the second points, wherein the coordinate includes a range of values in a coordinate space; divide the range of values in the coordinate space into bins, wherein the bins define subsets of the range of values; determine a first frequency, wherein the first frequency relates to a first percentage of the first points being located within a particular bin; determine a second frequency, wherein the second frequency relates to a second percentage of the second points being located within the particular bin; compare the first frequency and the second frequency; an
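The iterative comparison recited in claims 9 and 10 (produce a difference value, add a fraction of it to the comparison value, repeat until convergence) can be sketched as follows. The claims do not define the difference value precisely, so this sketch assumes one plausible reading: the comparison value is a per-bin weight, and the difference is the test frequency minus the weighted training frequency. The function name `iterate_bin_weights` and the parameters `fraction`, `tol`, and `max_iter` are hypothetical.

```python
def iterate_bin_weights(f_train, f_test, fraction=0.5, tol=1e-9, max_iter=10_000):
    """Iteratively refine one comparison value per bin.

    Assumed reading of claims 9-10 (not the patent's stated formula):
      difference = test frequency - (comparison value * training frequency)
      new value  = old value + fraction * difference
    For a bin with a nonzero training frequency the fixed point is
    test frequency / training frequency.
    """
    if len(f_train) != len(f_test):
        raise ValueError("frequency lists must have the same length")
    # Bins with no training points cannot be reweighted; their value stays 0.
    values = [0.0 if ftr == 0 else 1.0 for ftr in f_train]
    for _ in range(max_iter):
        largest_step = 0.0
        for b, (ftr, fte) in enumerate(zip(f_train, f_test)):
            if ftr == 0:
                continue
            difference = fte - values[b] * ftr
            step = fraction * difference  # claim 10: add a fraction of the difference
            values[b] += step
            largest_step = max(largest_step, abs(step))
        if largest_step < tol:  # all comparison values have converged
            break
    return values


if __name__ == "__main__":
    train_freq = [0.75, 0.25]  # fraction of training points in each bin
    test_freq = [0.25, 0.75]   # fraction of test points in each bin
    print(iterate_bin_weights(train_freq, test_freq))
```

With frequencies in [0, 1] and a fraction between 0 and 1, each update shrinks the remaining difference, so the loop settles on a stable value per bin. Each training point would then take the value of the bin it falls in.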
Selection of the most significant subset of features · CPC title
Physics · mapped topic
Extracting rules from data · CPC title
Machine learning · CPC title