Probability mapping model for location of natural resources

US10318552B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10318552-B2
Application number: US-201414278433-A
Country: US
Kind code: B2
Filing date: May 15, 2014
Priority date: May 15, 2014
Publication date: Jun 11, 2019
Grant date: Jun 11, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer processor generates a topic-based dataset based on parsing content received from a plurality of information sources, which includes historical data and scientific data, associated with a location of a natural resource. The processor generates a plurality of clusters, respectively corresponding to like-topic data of the topic-based dataset. The processor determines a plurality of hypotheses, respectively corresponding to the plurality of clusters of the like-topic data, wherein the plurality of hypotheses are based on features associated with each of the plurality of clusters of the like-topic data. The processor combines pairs of clusters, based on a similarity heuristic applied to the one or more pairs of clusters, and the processor determines a plurality of probabilities respectively corresponding to a validity of each hypothesis of the plurality of hypotheses, associated with the location of a natural resource.
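The cluster-combining step in the abstract can be illustrated with a small sketch. The abstract does not say which similarity heuristic is used, so this example assumes cosine similarity between cluster centroids and a fixed merge threshold. The feature vectors, the threshold value, and the function names (`centroid`, `cosine`, `merge_clusters`) are hypothetical and are not taken from the patent.

```python
import math

def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; an assumed stand-in for the 'similarity heuristic'."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def merge_clusters(clusters, threshold):
    """Greedily merge the most similar pair until no pair reaches the threshold."""
    clusters = [list(c) for c in clusters]  # copy so the input is not mutated
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        if s < threshold:
            break  # remaining clusters are too dissimilar to combine
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical two-dimensional feature vectors for three like-topic clusters.
clusters = [
    [[1.0, 0.1], [0.9, 0.2]],
    [[0.8, 0.1]],
    [[0.1, 1.0], [0.0, 0.9]],
]
merged = merge_clusters(clusters, threshold=0.95)
print(len(merged), [len(c) for c in merged])
```

With these made-up inputs, the first two clusters point in nearly the same direction and are combined, while the third stays separate. A different heuristic or threshold would give a different aggregate clustering.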

First claim

Opening claim text (preview).

What is claimed is:

1. A method for predicting a location of a natural resource, the method comprising:

    generating, by a computer processor, a topic-based dataset based on parsing content received from a plurality of information sources, identifying topics and relevancy of data within the topic-based dataset by filtering out non-topic related content based on singular value decomposition and N-gram techniques applied to the received content, and annotating the topic-based dataset with numerical values wherein the topic-based dataset that is generated includes data associated with a plurality of locations of a natural resource;

    generating, by the computer processor, a plurality of clusters, respectively corresponding to like-topic data of the content of the topic-based dataset from the plurality of information sources, wherein numerated data of the content of the like-topic data included in each cluster of the plurality of clusters are extracted as features corresponding to characteristics of respective topics of the content associated with each respective cluster, and are represented as feature vectors having one or more dimensions, and stored in a proximity matrix, which is trained by unsupervised learning to determine a threshold of cluster aggregation and separation;

    determining, by the computer processor, a plurality of hypotheses corresponding respectively to the plurality of clusters, each hypothesis associated with a prediction of a particular location of the natural resource, the plurality of hypotheses respectively corresponding to the plurality of clusters of the content of the like-topic data, wherein each hypothesis is based on one or more features that are related and extracted respectively from the plurality of clusters of the like-topic data;

    determining, by the computer processors, a confidence level of each hypothesis based on the one or more features of a respective feature vector serving as dimensions of evidence;

    combining, by the computer processor, two or more clusters of the plurality of clusters into a plurality of aggregate clusters based, at least in part, on a similarity heuristic applied to the clusters;

    generating, by the computer processor, a sequence of regression models, wherein a regression model of the sequence of regression models is based on the proximity matrix storing feature vectors corresponding to respective aggregate clusters of the plurality of aggregate clusters, and the particular sequence of regression models through which the respective feature vectors and respective hypotheses of the plurality of aggregate clusters are routed is based on groups of related features as dimensions of evidence; and

    generating, by the computer processor, a level of validity, respectively, of the plurality of hypotheses associated with the prediction of the particular location of the natural resource by processing the hypotheses through the sequence of regression models and identifying a highest probability hypothesis.

2. The method of claim 1, wherein the topic-based dataset generated from the plurality of information sources includes multimedia data.

3. The method of claim 1, wherein the topic-based dataset generated from the plurality of information sources further includes substantially real-time data.

4. The method of claim 1, further comprising: transforming, by the computer processor, the topic-based dataset into a summarized format output, based, at least in part, on processing by one or more analytic engines.

5. The method of claim 1, wherein the topic-based dataset is weighted based on respective beta distributions of the topic-based data.

6. The method of claim 1, further comprising: combining, by the computer processor, hypotheses corresponding to the one or more clusters that are combined into the plurality of aggregate clusters.

7. The method of claim 1, further comprising:

    generating, by the computer processor, one or more feature vectors from the one or more features of the plurality of clusters;

    generating, by the computer processor, a plurality of cluster spaces, which include the one or more clusters of the plurality of clusters that are combined into the plurality of aggregate clusters, wherein each cluster space is based on a disparate threshold of similarity;

    determining, by the computer processor, a cluster space of the plurality of cluster spaces that is favorable, based on a score of the cluster space and the disparate threshold of similarity; and

    generating, by the computer processor, one or more proximity matrices from the one or more feature vectors, based on the cluster space that is favorable.

8. The method of claim 7, wherein determining the cluster space that is favorable further comprises: determining, by the computer processor, a limit of combining clusters of the plurality of clusters based on the disparate threshold of similarity which produces the score of the cluster space that is favorable.

9. The method of claim 7, wherein generating one or more proximity matrices from the one or more feature vectors, further comprises: generating, by the computer processor, the one or more proximity matrices, based on a probability density function of the one or more features of the plurality of clusters that are combined.

10. The method of claim 1, wherein generating the sequence of regression models associated with location of the natural resource, further comprises: training, by the computer processor, the sequence of regression models based on successive refinement of supervised learning performed on the plurality of aggregate clusters, wherein the sequence of regression models is based, at least in part, on the one or more proximity matrices.

11. The method of claim 1, wherein an output of a previous model of the sequence of regression models is used as input to a subsequent model of the sequence of regression models.

12. The method of claim 1, further comprising: representing, by the computer processor, the plurality of probabilities respectively corresponding to the validity of each hypothesis of the plurality of hypotheses as a heat map, wherein a first element of the heat map corresponds to a first probability of a hypothesis of the plurality of hypotheses, and disparate from a second element of the heat map corresponding to a second probability of a hypothesis of the plurality of hypotheses.

13. A computer program product for predicting a location of a natural resource, the computer program product comprising: a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer processor to cause the computer processor to perform a method comprising:

    generating a topic-based dataset based on parsing content received from a plurality of information sources, identifying topics and relevancy of data within the topic-based dataset by filtering out non-topic related content based on singular value decomposition and N-gram techniques applied to the received content, and annotating the topic-based dataset with numerical values, wherein the topic-based dataset that is generated includes data associated with a plurality of locations of a natural resource;

    generating a plurality of clusters respectively corresponding to like-topic data of the content of the topic-based dataset from the plurality of information sources, wherein numerated data of the content of the like-topic data included in each cluster of the plurality of clusters are extracted as features corresponding to characteristics of respective topics of the content associated with each respective cluster, and are represented as feature vectors having one or more
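
Claims 1 and 11 describe routing each hypothesis through a sequence of regression models, with the output of one model feeding the next, and then identifying the highest-probability hypothesis. The sketch below is one possible reading of that idea, not the patented implementation. It assumes logistic regression stages (the claims only say "regression models"), and it uses hand-set weights in place of the supervised training described in claim 10. The feature names, weights, site labels, and the helper `run_sequence` are all hypothetical.

```python
import math

def logistic(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def run_sequence(features, stages):
    """Pass one hypothesis through chained models.

    Each stage reads its own group of related features plus the
    previous stage's output, as in claim 11.
    """
    score = 0.5  # neutral starting value before the first stage (assumption)
    for stage in stages:
        z = stage["bias"] + stage["prev_weight"] * score
        for name, weight in stage["weights"].items():
            z += weight * features[name]
        score = logistic(z)
    return score

# Hypothetical stages, each tied to one group of related features.
stages = [
    {"bias": -1.0, "prev_weight": 0.0,
     "weights": {"seismic": 2.0, "core_sample": 1.5}},
    {"bias": -2.0, "prev_weight": 3.0,
     "weights": {"historical_reports": 1.0}},
]

# Hypothetical feature values for two candidate locations.
hypotheses = {
    "site_a": {"seismic": 0.9, "core_sample": 0.8, "historical_reports": 0.7},
    "site_b": {"seismic": 0.2, "core_sample": 0.1, "historical_reports": 0.9},
}

validity = {name: run_sequence(f, stages) for name, f in hypotheses.items()}
best = max(validity, key=validity.get)
print(best, round(validity[best], 3))
```

In this toy setup the first stage weighs the scientific features and the second stage combines that result with the historical evidence. The same per-hypothesis values in `validity` could also drive a heat map of the kind described in claim 12.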

Assignees

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • G06F16/285 (Primary)

    Clustering or classification · CPC title

  • G06N20/00 (Primary)

    Machine learning · CPC title

  • Information retrieval; Database structures therefor; File system structures therefor · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10318552B2 cover?
A computer processor generates a topic-based dataset based on parsing content received from a plurality of information sources, which includes historical data and scientific data, associated with a location of a natural resource. The processor generates a plurality of clusters, respectively corresponding to like-topic data of the topic-based dataset. The processor determines a plurality of hypo…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/285. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 11, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).