Information sensors for sensing web dynamics

US2016125083A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2016125083-A1
Application number  US-201314896339-A
Country             US
Kind code           A1
Filing date         Jun 7, 2013
Priority date       Jun 7, 2013
Publication date    May 5, 2016
Grant date          (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are techniques and systems for building “information sensors,” which are programmable “focused crawlers” that periodically discover, extract, analyze and aggregate structured information around a topic from the Web. A platform for building an information sensor allows a user to specify one or more data elements within a data source that the user desires to monitor, and an update frequency at which the data elements are to be extracted. Code may be generated based on the user specifications for creation and submission of the information sensor for storage in a database with metadata containing the code and update frequency. Once created, information sensors are scanned to check if running conditions are met, and if met, they may be executed by retrieving the metadata using a sensor identifier (ID). The code is executed to locate a data source, and periodically extract specified data elements therefrom to output structured time-series data.
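
The scan-and-execute cycle described in the abstract can be illustrated with a minimal sketch. Everything named here (`SensorMetadata`, `SensorStore`, `scan`, `run_due`, the in-memory dictionaries, and the use of elapsed time as the running condition) is a hypothetical stand-in chosen for illustration; the publication does not specify these names or this data layout.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SensorMetadata:
    """Hypothetical metadata record: extraction code plus update frequency."""
    extract: Callable[[], float]     # stands in for the user-editable code
    update_frequency: float          # seconds between extractions
    last_run: float = float("-inf")  # time of the most recent extraction


@dataclass
class SensorStore:
    """Hypothetical in-memory stand-in for the sensor database."""
    metadata: Dict[str, SensorMetadata] = field(default_factory=dict)
    series: Dict[str, List[Tuple[float, float]]] = field(default_factory=dict)

    def create(self, sensor_id: str, extract: Callable[[], float],
               update_frequency: float) -> None:
        """Store a new sensor with its metadata and an empty time series."""
        self.metadata[sensor_id] = SensorMetadata(extract, update_frequency)
        self.series[sensor_id] = []

    def scan(self, now: float) -> List[str]:
        """Return IDs of sensors whose running condition is met at `now`.

        Assumption: the running condition is simply that at least
        `update_frequency` seconds have passed since the last run.
        """
        return [
            sensor_id
            for sensor_id, meta in self.metadata.items()
            if now - meta.last_run >= meta.update_frequency
        ]

    def run_due(self, now: float) -> None:
        """Run each due sensor and append (time, value) to its series."""
        for sensor_id in self.scan(now):
            meta = self.metadata[sensor_id]  # retrieve metadata by sensor ID
            self.series[sensor_id].append((now, meta.extract()))
            meta.last_run = now


# Usage: a fake price source polled at most once every 60 seconds.
prices = iter([19.99, 18.49, 21.00])
store = SensorStore()
store.create("price-sensor", lambda: next(prices), update_frequency=60.0)
for tick in (0.0, 30.0, 60.0, 120.0):
    store.run_due(tick)
print(store.series["price-sensor"])
```

In this sketch the tick at 30 seconds is skipped because the running condition is not yet met, so the stored series holds three time-stamped data points rather than four.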

First claim

Opening claim text (preview).

1. A method comprising: scanning, by one or more processors, a set of information sensors to determine that a running condition is met for executing at least one information sensor in the set of information sensors; at least partly in response to a determination the running condition is met for the at least one information sensor, retrieving metadata associated with the at least one information sensor, the metadata including an update frequency and code to extract one or more data elements from a data source, the code being user-editable and providing predefined functions for at least extracting the one or more data elements from the data source; running, by the one or more processors, the code to: locate the data source, identify the one or more data elements within the data source, and periodically extract the one or more data elements from the data source according to the update frequency; and storing each extracted data element as a data point in a structured time series.
2. The method of claim 1, wherein the metadata further includes a number of versions to be kept, the method further comprising stopping the periodic extraction of the one or more data elements when a number of extracted data elements meets the number of versions to be kept.
3. The method of claim 1, wherein the data source is a website including a search engine, and wherein the identification of the one or more data elements within the data source comprises submitting a query to the search engine to identify a plurality of search results as the one or more data elements.
4. The method of claim 3, further comprising, collecting a predetermined number of the plurality of search results, analyzing each search result to determine a sentiment of each search result as being one of a positive, negative or neutral sentiment about the query, aggregating the search results according to the positive, negative and neutral sentiment to determine counts of positive, negative and neutral search results; and storing the counts of positive, negative and neutral search results as data points.
5. The method of claim 1, wherein the code specifies multiple data sources from which a plurality of data elements are to be extracted, the method further comprising aggregating each of the extracted data elements to obtain a single data point based on the aggregated data points.
6. The method of claim 1, further comprising publishing the structured time series.
7. The method of claim 1, further comprising: analyzing the data points to determine whether any two consecutive data points lie on either side of a threshold value indicating that the threshold value has been crossed; and transmitting a notification that the threshold value has been crossed to a user device.
8. The method of claim 1, further comprising: analyzing the data points to determine a maximum or minimum value among the data points indicative of a peak among the data points, and transmitting a notification of the peak to a user device.
9. The method of claim 1, further comprising analyzing the data points to forecast future data points to be obtained by the information sensor over a time period.
10. A system for executing an information sensor, the system comprising: one or more processors; one or more memories comprising: a sensor scheduler maintained in the one or more memories and executable by the one or more processors to periodically scan a set of information sensors to determine that a running condition is met for execution of at least one information sensor in the set of information sensors, the at least one information sensor having an identifier (ID); a sensor worker module maintained in the one or more memories and executable by the one or more processors to retrieve metadata associated with the ID and to assign a worker to the at least one information sensor to execute the information sensor, the metadata including an update frequency and code that is user-editable to provide predefined functions for at least extracting one or more data elements from a data source, the worker being configured to run the code to: locate the data source, identify the one or more data elements within the data source to be extracted, and periodically extract the one or more data elements according to the update frequency, and the sensor worker module being configured to store each extracted data element in a database in association with a time and a version number associated with each extracted data element.
11. The system of claim 10, wherein the data source is a website including a search engine, and wherein the identification of the one or more data elements within the data source comprises submitting a query to the search engine to identify a plurality of search results as the one or more data elements.
12. The system of claim 10, wherein the one or more data elements include at least one of hypertext markup language (HTML) content, hyperlinks, images, tables, search results, comments, posts, or rich site summary (RSS) feeds.
13. The system of claim 10, further comprising an analysis and publishing module maintained in the one or more memories and executable by the one or more processors to forecast future data points to be obtained by the information sensor over a time period based at least in part on the extracted data elements.
14. A computer-readable medium storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising: receiving, from a user, a specification of: a data element within a data source that the user desires to monitor using an information sensor, and an update frequency at which the information sensor is to extract the data element from the data source, generating code configured to extract the data element from the data source according to the update frequency, the code being further editable by the user by providing predefined functions for at least extracting the data element from the data source; and creating the information sensor by storing the information sensor in a database along with metadata specifying the code and the update frequency.
15. The computer-readable medium of claim 14, wherein the data source comprises a website, and wherein the receiving the specification of the data element further comprises receiving a selection of the data element from the user while the user is accessing the website.
16. The computer-readable medium of claim 15, wherein the generating the code comprises generating the code in response to the selection of the data element from the user.
17. The computer readable medium of claim 14, wherein the data element is a price of an item, and the data source is a website displaying the item for sale.
18. The computer readable medium of claim 17, wherein the code is further configured to determine at least one of a lowest price of the item over a period of time in the past, or an optimal time period in the future during which the price may be at a low point.
19. The computer readable medium of claim 14, wherein the receiving the specification of the update frequency further comprises receiving a selection of update frequency from the user via a wizard tool.
20. The computer readable medium of claim 14, wherein the receiving the specification of the data element further comprises receiving a specification of at least one of the following predefined functions: get a top subset of search results from a search engine for a …
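
The threshold-crossing and peak analyses recited in claims 7 and 8 can be sketched as follows. This is an illustrative reading only: the function names `threshold_crossings` and `peak_index`, the sample values, and the choice to ignore points that sit exactly on the threshold are assumptions, not details taken from the publication.

```python
from typing import List, Tuple


def threshold_crossings(points: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, i + 1) whose values lie on opposite sides of threshold.

    Assumption: a point exactly equal to the threshold is on neither side,
    so a pair containing such a point is not reported as a crossing.
    """
    crossings = []
    for i in range(len(points) - 1):
        before = points[i] - threshold
        after = points[i + 1] - threshold
        if before * after < 0:  # opposite signs: the threshold was crossed
            crossings.append((i, i + 1))
    return crossings


def peak_index(points: List[float]) -> int:
    """Return the index of the maximum value (first one if there are ties)."""
    return max(range(len(points)), key=points.__getitem__)


# Usage with hypothetical extracted prices and a hypothetical threshold.
series = [18.0, 19.5, 21.0, 20.2, 19.0]
print(threshold_crossings(series, threshold=20.0))  # [(1, 2), (3, 4)]
print(peak_index(series))                           # 2
```

A notification step, as in the claims, would then be triggered for each reported pair or for the peak; how such a notification is delivered is outside the scope of this sketch.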

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016125083A1 cover?
Disclosed herein are techniques and systems for building “information sensors,” which are programmable “focused crawlers” that periodically discover, extract, analyze and aggregate structured information around a topic from the Web. A platform for building an information sensor allows a user to specify one or more data elements within a data source that the user desires to monitor, and an updat…
Who is the assignee on this patent?
Dou Zhicheng, Wen Ji-Rong, Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date May 5, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).