Online active learning in user-generated content streams

US9967218B2 · US · B2

Patent metadata
Publication number: US-9967218-B2
Application number: US-201113282285-A
Country: US
Kind code: B2
Filing date: Oct 26, 2011
Priority date: Oct 26, 2011
Publication date: May 8, 2018
Grant date: May 8, 2018

Abstract

Official abstract text for this publication.

Software for online active learning receives content posted to an online stream at a website. The software converts the content into an elemental representation and inputs the elemental representation into a probit model to obtain a predictive probability that the content is abusive. The software also calculates an importance weight based on the elemental representation. And the software updates the probit model using the content, the importance weight, and an acquired label if a condition is met. The condition depends on an instrumental distribution. The software removes the content from the online stream if a condition is met. The condition depends on the predictive probability, if an acquired label is unavailable.
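
The scoring path described in the abstract (content, then elemental representation, then predictive probability, then removal) can be sketched as follows. This is a minimal illustration, not the patented implementation: the vocabulary, the function names, the 0.5 threshold, and the unit noise variance are all assumptions. It relies on the standard closed form for a probit model whose weight vector has a Gaussian distribution N(mean, cov), where the predictive probability is Φ(mean·x / sqrt(1 + xᵀ·cov·x)).

```python
import math
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Map text to a count vector over a fixed (hypothetical) vocabulary."""
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    return [float(counts[word]) for word in vocabulary]


def normal_cdf(z):
    """Standard normal CDF; erfc keeps precision in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def predictive_probability(x, mean, cov):
    """P(abusive | x) for a probit model with weights w ~ N(mean, cov).

    Assumes unit noise variance, which gives the closed form
    Phi(mean.x / sqrt(1 + x' cov x)).
    """
    m = sum(mi * xi for mi, xi in zip(mean, x))
    cov_x = [sum(row[j] * x[j] for j in range(len(x))) for row in cov]
    v = sum(xi * ci for xi, ci in zip(x, cov_x))
    return normal_cdf(m / math.sqrt(1.0 + v))


def filter_stream(posts, vocabulary, mean, cov, threshold=0.5):
    """Keep only posts whose predicted abuse probability is at most threshold."""
    kept = []
    for post in posts:
        x = bag_of_words(post, vocabulary)
        if predictive_probability(x, mean, cov) <= threshold:
            kept.append(post)
    return kept
```

A post containing none of the vocabulary words maps to the zero vector and scores exactly 0.5, so with the assumed threshold it stays in the stream. A production system would presumably tune the threshold to its tolerance for false removals.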

First claim

Opening claim text (preview).

What is claimed is:

1. A method for delivering modified user generated content for display on client devices, comprising the operations of:
    receiving, at one or more servers over a network, content that is user generated content from an online stream at a website, the content including text;
    converting, by a machine process executed at the one or more servers, the content into an elemental representation using a bag of words model;
    applying a probit model to the elemental representation to obtain a predictive probability that the content is abusive or not abusive, the machine process further includes, calculating an importance weight for the probit model based on the elemental representation, the importance weight is modeled as a multivariate Gaussian distribution with a mean and a covariance matrix;
    creating a probabilistic queue for delivering the content to a human labeler for acquiring a label for the content, wherein placement of the content within the probabilistic queue depends on the predictive probability that the content is abusive or not abusive;
    updating the probit model using the elemental representation, the importance weight, and the label acquired from the human labeler, the updating the probit model includes calculating an updated mean and an updated covariance matrix for the multivariate Gaussian distribution of the importance weight based on the label;
    receiving, at the one or more servers, a request from a client device for the online stream at the website, the online stream including the content;
    applying the probit model having been updated to the content and removing the content from the online stream to produce a modified online stream, the removing is based on the predictive probability that the content is abusive as calculated by the probit model having been updated; and
    sending, from the one or more servers, the modified online stream to the client device for display.

2. The method of claim 1, wherein the content further includes an image and wherein the elemental representation is further based at least in part on a bag of features model.

3. The method of claim 1, wherein the importance weight is further based on entropy or on function values.

4. The method of claim 1, wherein the probit model includes a memory loss factor.

5. The method of claim 1, wherein when the abusive content includes spam, fraudulent offers, illegal offers, offensive language, threatening language, or treasonous language.

6. The method of claim 1, wherein the delivering the content to the human labeler includes pushing the content to the human labeler or having the human labeler pull the content.

7. A method for preventing spread of abusive content to client devices, comprising:
    receiving content that is user generated from an online stream at a website, the content including text;
    converting the content into elemental representation;
    applying a probit model to the elemental representation to obtain a predictive probability that the content is abusive or not abusive, the applying the probit model includes calculating an importance weight based on the elemental representation, the probit model includes a memory loss factor associated with the importance weight, the importance weight is modeled as a multivariate Gaussian distribution with a mean and a covariance matrix;
    creating a probabilistic queue for delivering the content to a human labeler for acquiring a label for the content indicative of whether the content is abusive or not abusive, the content is inserted into the probabilistic queue depending on the predictive probability that the content is abusive or not abusive;
    updating the probit model using the elemental representation, the importance weight, the memory loss factor, and the label acquired from the human labeler, the updating includes calculating an updated mean and an updated covariance matrix for the multivariate Gaussian distribution of the importance weight based on the label;
    receiving a request from a client device for the online stream at the website, the online stream including the content;
    applying the probit model having been updated to the content, the applying the probit model having been updated includes, removing the content from the online stream if the predictive probability is determined to be abusive as calculated by the probit model having been updated, or not removing the content from the online stream if the predictive probability is determined to be not abusive as calculated by the probit model having been updated; and
    sending the online stream to the client device.

8. The method of claim 7, wherein the converting the content into the elemental representation includes using a bag of words model, wherein words of the content are represented as uni-grams, bi-grams, n-grams, or skip-grams in the bag of words model.

9. The method of claim 7, wherein the converting the content into the elemental representation includes generating a vector using term frequency-inverse document frequency.

10. The method of claim 7, wherein when the abusive content includes spam, fraudulent offers, illegal offers, offensive language, threatening language, or treasonous language.

11. The method of claim 7, wherein the delivering the content to the human labeler includes pushing the content to the human labeler or having the human labeler pull the content.

12. A computer-readable storage medium that is non-transitory and that stores a program for delivering modified user generated content for display on client devices, wherein the program, when executed, instructs a processor to perform the following operations:
    receive, at one or more servers, content posted to an online stream at a website;
    convert, by a machine process, the content into an elemental representation using a bag of words model;
    apply the probit model to the elemental representation to obtain a predictive probability that the content is abusive, the machine process further includes, calculate an importance weight for the probit model based on the elemental representation, the importance weight is defined by a multivariate Gaussian distribution defined by a mean and a covariance matrix;
    create a probabilistic queue for delivering the content to a human labeler for acquiring a label for the content, wherein placement of the content within the probabilistic queue depends on the predictive probability that the content is abusive or not abusive;
    update the probit model with the elemental representation and the importance weight, and the label acquired from the human labeler, said update the probit model includes calculating, based on the label, an updated mean and an updated covariance matrix that define the multivariate Gaussian distribution that defines the importance weight;
    receive, at the one or more servers, a request from a client device for the online stream at the website, the online stream including the content;
    apply the probit model having been updated to the content to remove the content from the online stream to produce a modified online stream, said remove the content is based on the predictive probability that the content is abusive as calculated by the probit model having been updated; and
    send, from the one or more servers, the modified online stream to the client device for display.

13. The computer-readable storage medium of claim 12, wherein the content further includes an image and wherein the elemental representation is further based at least in part on a bag of features model.

14. The computer-readable storage medium of claim 12, wherein the importance weight is further based on entropy or on function val
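
The claims recite a probabilistic queue for human labeling and an update of the Gaussian mean and covariance once a label arrives. The sketch below shows one plausible reading and is not taken from the patent text: the update is the textbook moment-matching (assumed density filtering) step for a probit likelihood, the `forgetting` parameter is a hypothetical stand-in for the claimed memory loss factor, and the entropy-based sampling rule with its `floor` value is an assumption loosely motivated by the reference to entropy in claim 3. Labels are encoded as +1 (abusive) and -1 (not abusive).

```python
import math
import random


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    # erfc keeps precision for strongly negative z
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def update_probit(mean, cov, x, label, forgetting=1.0):
    """One moment-matching update of w ~ N(mean, cov) for label in {-1, +1}.

    forgetting in (0, 1] is a hypothetical memory loss factor: dividing the
    covariance by it widens the distribution before the new label is absorbed.
    """
    n = len(x)
    cov = [[c / forgetting for c in row] for row in cov]
    cov_x = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
    m = sum(mean[i] * x[i] for i in range(n))
    v = sum(x[i] * cov_x[i] for i in range(n))
    z = label * m / math.sqrt(1.0 + v)
    ratio = normal_pdf(z) / normal_cdf(z)
    step = label * ratio / math.sqrt(1.0 + v)
    shrink = ratio * (ratio + z) / (1.0 + v)
    new_mean = [mean[i] + step * cov_x[i] for i in range(n)]
    new_cov = [[cov[i][j] - shrink * cov_x[i] * cov_x[j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


def query_probability(p, floor=0.1):
    """Assumed sampling rule: binary entropy of p (in bits), floored."""
    if p <= 0.0 or p >= 1.0:
        return floor
    entropy = -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
    return max(floor, entropy)


def maybe_enqueue(queue, item, p, rng=random):
    """Send item to the labeling queue with probability q; record weight 1/q."""
    q = query_probability(p)
    if rng.random() < q:
        queue.append((item, 1.0 / q))  # 1/q offsets the sampling bias
        return True
    return False
```

Under this rule, content the model is unsure about (probability near 0.5) is almost always sent for labeling, while confidently scored content is sampled rarely and carries a larger weight when it is. How the patent folds that weight into the update is not shown in the claim preview, so the sketch only records it alongside the queued item.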

Classifications

  • G06Q50/20 (Primary)

    Education · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs {(coordinating program control therefor G06F9/52; in regulating and control system G05B)} · CPC title

  • Knowledge representation; Symbolic representation · CPC title

  • Computing arrangements based on specific mathematical models · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9967218B2 cover?
Software for online active learning receives content posted to an online stream at a website. The software converts the content into an elemental representation and inputs the elemental representation into a probit model to obtain a predictive probability that the content is abusive. The software also calculates an importance weight based on the elemental representation. And the software update…
Who is the assignee on this patent?
Chu Wei, Zinkevich Martin, Li Lihong, and 3 more
What technology area does this patent fall under?
Primary CPC classification G06Q50/20. Mapped technology areas include Physics.
When was this patent published?
Publication date May 8, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).