Weight generation in machine learning

US9858534B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9858534-B2
Application number  US-201414451899-A
Country             US
Kind code           B2
Filing date         Aug 5, 2014
Priority date       Nov 22, 2013
Publication date    Jan 2, 2018
Grant date          Jan 2, 2018

Abstract

Official abstract text for this publication.

Technologies are generally described for systems, devices and methods relating to a machine learning environment. In some examples, a processor may identify a training distribution of a training data. The processor may identify information about a test distribution of a test data. The processor may identify a coordinate of the training data and the test data. The processor may determine, for the coordinate, differences between the test distribution and the training distribution. The processor may determine weights based on the differences. The weights may be adapted to cause the training distribution to conform to the test distribution when the weights are applied to the training distribution.
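The weighting idea in the abstract can be illustrated with a minimal sketch. It assumes a single numeric coordinate, equal-width bins, and a simple per-bin frequency ratio as the weight; the function name `binned_weights`, the parameter `n_bins`, and the synthetic data are hypothetical and are not taken from the patent, whose actual procedure may differ.

```python
import numpy as np

def binned_weights(train_x, test_x, n_bins=10):
    """Hypothetical helper: per-point training weights on one coordinate.

    Each training point gets the ratio (test frequency / training frequency)
    of the bin it falls in, so the weighted training histogram matches the
    test histogram in every bin that contains training points.
    """
    train_x = np.asarray(train_x, dtype=float)
    test_x = np.asarray(test_x, dtype=float)
    lo = min(train_x.min(), test_x.min())
    hi = max(train_x.max(), test_x.max())
    edges = np.linspace(lo, hi, n_bins + 1)

    train_counts, _ = np.histogram(train_x, bins=edges)
    test_counts, _ = np.histogram(test_x, bins=edges)
    train_freq = train_counts / len(train_x)
    test_freq = test_counts / len(test_x)

    # Bins without training points keep ratio 0: there is nothing to weight there.
    ratio = np.divide(test_freq, train_freq,
                      out=np.zeros_like(test_freq), where=train_counts > 0)

    # Map each training point to its bin; the top edge belongs to the last bin.
    bin_idx = np.clip(np.digitize(train_x, edges) - 1, 0, n_bins - 1)
    return ratio[bin_idx], edges

# Synthetic example: the test distribution is shifted relative to training.
rng = np.random.default_rng(0)
train_x = rng.normal(0.0, 1.0, 5000)
test_x = rng.normal(0.5, 1.0, 5000)

weights, edges = binned_weights(train_x, test_x)
weighted_freq = np.histogram(train_x, bins=edges, weights=weights)[0] / len(train_x)
test_freq = np.histogram(test_x, bins=edges)[0] / len(test_x)
covered = np.histogram(train_x, bins=edges)[0] > 0

# Largest per-bin gap between the weighted training and test frequencies.
print(np.abs(weighted_freq - test_freq)[covered].max())
```

In this sketch the weighted training frequencies agree with the test frequencies only in bins that hold at least one training point; test mass in bins with no training points cannot be recovered by reweighting.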

First claim

Opening claim text (preview).

What is claimed is:

1. A method to improve predictive capability of a machine learning system, the method comprising: receiving, by a computer, training data that includes one or more points; identifying, by the computer, a training distribution of the one or more points of the training data; receiving, by the computer, test data that includes one or more points; identifying, by the computer, information about a test distribution of the one or more points of the test data; identifying, by the computer, one or more coordinates for the one or more points of the training data and the one or more points of the test data; determining, for each identified coordinate and by the computer differences between the one or more points of the test data and the one or more points of the training data; determining, by the computer, weights for the one or more points of the training data based on the determined differences, wherein the weights are adapted to cause the training distribution to conform to the test distribution in response to the weights being applied to the training distribution; generating, by the computer, a weighted function based on the determined weights and the training data; and generating, by the computer, a first output based on an application of an input to the generated weighted function, wherein the first output is different than a second output generated by an application of the input to a non-weighted function, wherein the first output and the second output respectively correspond to a first predictive capability and a second predictive capability of the machine learning system, and wherein the first predictive capability is greater than the second predictive capability.

2. The method of claim 1, wherein generating the first output includes generating at least one of a recommendation, a classification, a prediction, and a determination.

3. The method of claim 1, wherein: the training data is generated at a first instance in time; and the test data is generated at a second instance in time, wherein the second instance in time is later than the first instance in time.

4. The method of claim 1, wherein determining the weights comprises: iteratively determining, for each identified coordinate, differences between the one or more points of the training data and the one or more points of the test data, wherein the weights for the one or more points of the training data are determined based on a convergent value of the differences between the one or more points of the training data and the one or more points of the test data.

5. The method of claim 1, wherein identifying the one or more coordinates includes identifying a range of values in a coordinate space, and wherein the method further comprises: dividing the range of values in the coordinate space into bins, wherein determining the weights is based on a number of the one or more points in the training data and a number of the bins.

6. The method of claim 1, wherein the one or more points of the test data and the one or more points of the training data include at least one first point and at least one second point, respectively, wherein the one or more coordinates include a range of values in a coordinate space, and wherein the method further comprises: dividing the range of values in the coordinate space into bins, wherein determining the weights is based on a number of the at least one first point and a number of the at least one second point which are located in the bins.

7. A method to improve predictive capability of a machine learning system, the method comprising, by a computer: identifying first points of training data; identifying information about test data, wherein the test data includes second points; identifying a coordinate of the first points and the second points, wherein the coordinate includes a range of values in a coordinate space; dividing the range of values in the coordinate space into bins, wherein the bins define subsets of the range of values; determining a first frequency, wherein the first frequency relates to a first percentage of the first points being located within a particular bin; determining a second frequency, wherein the second frequency relates to a second percentage of the second points being located within the particular bin; comparing the first frequency and the second frequency; determining a weight for the training data, based at least, in part, on the comparison of the first frequency and the second frequency, and on a number of the bins; generating a weighted function based on the determined weight and the training data; and generating a first output based on an application of an input to the generated weighted function, wherein the first output is different than a second output generated by an application of the input to a non-weighted function, wherein the first output and the second output respectively correspond to a first predictive capability and a second predictive capability of the machine learning system, and wherein the first predictive capability is greater than the second predictive capability.

8. The method of claim 7, wherein: the first points follow a training distribution, the second points follow a test distribution, and the weight is effective to conform a particular point in the training distribution to a particular point in the test distribution.

9. The method of claim 7, wherein comparing the first frequency and the second frequency includes: identifying a first comparison value; comparing frequency values of the test data and the training data in the bins to produce a difference value; updating the first comparison value to produce a second comparison value based on the difference value; and iteratively repeating the identifying the first comparison value, comparing frequency values of the test data and the training data in the bins to produce the difference value, and updating the first comparison value to produce the second comparison value based on the difference value, until the second comparison value converges to a convergent value.

10. The method of claim 9, wherein updating the first comparison value to produce the second comparison value based on the difference value comprises: adding a fraction of the difference value to the first comparison value to produce the second comparison value.

11. The method of claim 7, wherein determining the weight for the training data is based on a number of the first points.

12. The method of claim 7, wherein determining the weight for the training data is based on a number of the first points and a number of the second points which are located in the bins.

13. A computing device, comprising: a first processor; a second processor; and a memory configured to be in communication with the first processor and the second processor, the memory effective to store training data and test data, wherein the training data comprises first points and the test data comprises second points, and wherein: the first processor is effective to: identify a coordinate of the first points and the second points, wherein the coordinate includes a range of values in a coordinate space; divide the range of values in the coordinate space into bins, wherein the bins define subsets of the range of values; determine a first frequency, wherein the first frequency relates to a first percentage of the first points being located within a particular bin; determine a second frequency, wherein the second frequency relates to a second percentage of the second points being located within the particular bin; compare the first frequency and the second frequency; an
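The iterative comparison recited in claims 9 and 10 (update a comparison value by adding a fraction of a difference value until it converges) can be sketched as follows. This is one possible reading, not the patent's disclosed procedure: it assumes the comparison value is a per-bin frequency that starts at the training frequency and is relaxed toward the test frequency. The names `relax_to_test`, `fraction`, `tol`, and `max_iter`, and the example frequencies, are hypothetical.

```python
import numpy as np

def relax_to_test(train_freq, test_freq, fraction=0.5, tol=1e-9, max_iter=1000):
    """Hypothetical sketch: relax per-bin comparison values toward test frequencies.

    Each pass adds `fraction` of the current difference to the comparison
    value, so the remaining gap shrinks by a factor (1 - fraction) per pass.
    Returns the converged values and the number of passes performed.
    """
    value = np.asarray(train_freq, dtype=float).copy()  # first comparison value
    target = np.asarray(test_freq, dtype=float)
    step = 0
    for step in range(1, max_iter + 1):
        diff = target - value                 # difference value
        value = value + fraction * diff       # second comparison value
        if np.max(np.abs(fraction * diff)) < tol:
            break                             # the update no longer changes the value
    return value, step

# Assumed example frequencies for three bins (each set sums to 1).
train_freq = np.array([0.5, 0.3, 0.2])
test_freq = np.array([0.2, 0.3, 0.5])

converged, steps = relax_to_test(train_freq, test_freq)

# In this sketch the per-bin weight is the converged value divided by the
# training frequency, which approaches the plain frequency ratio.
bin_weights = converged / train_freq
print(steps, bin_weights)
```

With `0 < fraction <= 1` the update is a geometric relaxation, so it always converges here; a smaller fraction needs more passes but changes the values more gradually.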

Assignees

California Inst Of Techn

Inventors

Classifications

  • Selection of the most significant subset of features · CPC title

  • G06N99/005 · Primary

    Physics · mapped topic

  • Extracting rules from data · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9858534B2 cover?
Technologies are generally described for systems, devices and methods relating to a machine learning environment. In some examples, a processor may identify a training distribution of a training data. The processor may identify information about a test distribution of a test data. The processor may identify a coordinate of the training data and the test data. The processor may determine, for th…

Who is the assignee on this patent?
California Inst Of Techn

What technology area does this patent fall under?
The primary CPC classification is G06N99/005. Mapped technology areas include Physics.

When was this patent published?
It was published on Jan 2, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).