Quality-based scoring and inhibiting of user-generated content

US10489712B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10489712-B2
Application number  US-201615055220-A
Country             US
Kind code           B2
Filing date         Feb 26, 2016
Priority date       Feb 26, 2016
Publication date    Nov 26, 2019
Grant date          Nov 26, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and devices for assessing the quality of user-generated content are described. In one embodiment, a method is disclosed for measuring the quality of a user-generated answer to a question by combining various factors, including question-answer surface word vector similarity, question-answer explicit semantic analysis vector similarity, answer-answer explicit semantic analysis vector similarity, query performance predictor, sentiment analysis, textual analysis of the answer, and reputation of the answerer. The method uses a learning procedure to determine the best algorithm for measuring the overall quality of the answer based on these factors.
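The question-answer vector similarity named in the abstract can be illustrated with a minimal sketch. It assumes the explicit semantic analysis vectors have already been computed and are stored as sparse `{concept: weight}` dictionaries; that representation, the function names `cosine` and `rank_by_question_similarity`, and the sample weights are all hypothetical and not taken from the patent. Cosine is used as the overlap measure, which is the option claim 2 recites.

```python
import math

def cosine(u, v):
    """Cosine of two sparse vectors stored as {concept: weight} dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector overlaps with nothing
    return dot / (norm_u * norm_v)

def rank_by_question_similarity(question_vec, answer_vecs):
    """Return (answer indices from most to least similar, raw similarities)."""
    sims = [cosine(question_vec, a) for a in answer_vecs]
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return order, sims

# Hypothetical concept weights, for illustration only.
question = {"Python": 1.0, "Sorting": 1.0}
answers = [
    {"Cooking": 1.0},                  # no shared concepts
    {"Python": 1.0, "Sorting": 1.0},   # same concepts as the question
    {"Python": 1.0},                   # partial overlap
]
order, sims = rank_by_question_similarity(question, answers)
print(order)
```

In this sketch the ranking depends on question-answer similarity alone; the abstract describes combining it with several other factors.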

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for displaying answers to a question, comprising: acquiring a question text; retrieving N answer texts corresponding to N answers to the question, wherein N is an integer greater than one; computing a question vector representing an explicit semantic analysis vector for the question text; computing an explicit semantic analysis vector for each of the N answer texts to obtain N answer vectors; computing an overlap between the question vector and each of the N answer vectors to obtain a set of N measures of question-answer similarity each corresponding to a measure of similarity between the question vector and each of the N answer vectors; determining a quality score for each of the N answer texts and a quality ranking of the N answer texts based at least partially on the set of N measures of question-answer similarity; and displaying one or more of the N answer texts on a user interface in an order based on the quality ranking.

2. The method of claim 1, wherein an overlap between two explicit semantic analysis vectors is determined by computing the cosine of the two explicit semantic analysis vectors.

3. The method of claim 1, wherein retrieving N answer texts corresponding to N answers to the question comprises: querying a database containing question-answer pairs; and receiving, from the database, the N answer texts.

4. The method of claim 1, wherein retrieving N answer texts corresponding to N answers to the question comprises: querying an internet-based search engine using the question text; and receiving, from the search engine, the N answer texts.

5. The method of claim 1, further comprising: for each of the N answer texts, computing a corresponding subset of N−1 measures of answer-answer similarity each corresponding to an overlap between the answer vector of the each of the N answer texts and each of the other N−1 answer vectors; and calculating an average of the subset of N−1 measures of answer-answer similarity for each of the N answer text to obtain a set of N average measures of similarity each corresponding to a measure of similarity between each of the N answer vectors and the rest of the N−1 answer vectors, wherein the determining of a quality score for each of the N answer texts and a quality ranking of the answer texts is further based at least partially on the set of N average measures of similarity.

6. The method of claim 5, wherein the set of N measures of question-answer similarity are weighed more heavily than the set of N average measures of similarity in determining the quality score for each of the N answer texts and the ranking for the answer texts.

7. The method of claim 5, wherein the first set of N measures of question-answer similarity and the set of N average measures of similarity are weighed using a predetermined proportion in determining the quality score for each the N answer texts and the quality ranking of the answer texts.

8. The method of claim 5, wherein at least some of the set of N measures of question-answer similarity and at least some of the set of N average measures of similarity are combined to determine the quality score for each of the N answer texts in an automatic quality scoring process using a learning procedure comprising: constructing a gold standard classification of quality for a plurality of answers of a corresponding set of questions; fitting a model to best match the gold standard classification; comparing a quality classification of one or more answers from a plurality of classification algorithms to the gold standard classification; and choosing a best-performing classifier as a classifier for the automatic quality scoring process.

9. The method of claim 1, further comprising: using each of the N answer texts as a search term to query a search engine; receiving a set of documents from the search engine for each of the N answer texts; computing a first set of N general language models for the set of documents; using answer texts from a random question-best answer pairs as search terms to query the search engine and obtain a general corps of documents from the search engine; computing a second general language model of the general corps; and computing a difference between the second general language model and each of the first set of N general language models to obtain a set of N language model differences, wherein the determining of the quality score for each of the N answer texts and the quality ranking for the answer texts is further based at least partially on the set of N language model differences.

10. The method of claim 1, further comprising: conducting a sentiment analysis of each of the N answer texts; and obtaining a set of N sentiment levels each corresponding to a sentiment level of one of the N answer texts, wherein the determining of the quality score for each of the N answer texts and the quality ranking for the N answer texts is further based at least partially on the set of N sentiment levels.

11. The method of claim 1, further comprising: Identifying a highest quality score of the N answer texts, wherein displaying one or more of the N answer texts on a user interface in an order based on the quality ranking comprises: displaying, based on the quality ranking, one or more of the answer texts having at least one quality score higher than a threshold value derived from the highest quality score of the N answer texts.

12. A community question-answering server, comprising: an input interface configured to acquire a question text from a user; a database storing N answer texts to the question, wherein N is an integer greater than one; a processing unit configured to: retrieve the N answer texts; compute a question vector representing an explicit semantic analysis vector for the question text; compute an explicit semantic analysis vector for each of the N answer texts to obtain N answer vectors; compute an overlap between the question vector and each of the N answer vectors to obtain a set of N measures of question-answer similarity each corresponding to a measure of similarity between the question vector and each of the N answer vectors; and determine a quality score for each of the N answer texts and a quality ranking of the N answer texts based at least partially on the set of N measures of question-answer similarity; and an output interface for causing a display of one or more of the N answer texts on a user device in an order based on the quality ranking.

13. The community question-answering server of claim 12, the processing unit is further configured to: for each of the N answer texts, compute a corresponding subset of N−1 measures of similarity each corresponding to an overlap between the answer vector of the each of the N answer texts and each of the other N−1 answer vectors; and calculate an average of the subset of N−1 measures of answer-answer similarity for each of the N answer text to obtain a set of N average measures of similarity each corresponding to a measure of similarity between each of the N answer vectors and the rest of the N−1 answer vectors, wherein the determining of the quality score for each of the N answer texts and the ranking of the answer texts is further based at least partially on the set of N average measures of similarity.

14. The community question-answering server of claim 12, the processing unit is further configured to: use each of the N answer text as a search term to query a search engine; receive a set of documents from the search engine for each of the N answer texts; compute a first set of N general lan
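Claims 5 to 7 and 11 describe averaging answer-answer similarity, weighting it against question-answer similarity, and displaying only answers that score above a threshold derived from the highest score. The sketch below is one possible reading, not the patented implementation. The weights `w_qa=0.7` and `w_aa=0.3`, the threshold `fraction=0.5`, the linear combination, and all function names are assumptions chosen for illustration; the claims only require that question-answer similarity weigh more (claim 6) or that a predetermined proportion be used (claim 7). Vectors are again assumed to be precomputed `{concept: weight}` dictionaries.

```python
import math

def cosine(u, v):
    """Cosine of two sparse vectors stored as {concept: weight} dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def score_answers(question_vec, answer_vecs, w_qa=0.7, w_aa=0.3):
    """Combine question-answer and average answer-answer similarity.

    The weights are hypothetical; w_qa > w_aa mirrors claim 6.
    """
    n = len(answer_vecs)
    if n < 2:
        raise ValueError("the claims require N greater than one")
    qa = [cosine(question_vec, a) for a in answer_vecs]
    aa = []
    for i in range(n):
        # Similarity of answer i to each of the other N-1 answers, averaged.
        others = [cosine(answer_vecs[i], answer_vecs[j])
                  for j in range(n) if j != i]
        aa.append(sum(others) / len(others))
    return [w_qa * q + w_aa * a for q, a in zip(qa, aa)]

def answers_to_display(scores, fraction=0.5):
    """Indices in ranked order whose score exceeds fraction * best score."""
    threshold = fraction * max(scores)  # threshold derived from the top score
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] > threshold]

# Hypothetical concept weights, for illustration only.
question = {"a": 1.0}
answers = [{"a": 1.0}, {"a": 1.0, "b": 1.0}, {"c": 1.0}]
scores = score_answers(question, answers)
shown = answers_to_display(scores)
print(shown)
```

With these sample vectors the third answer shares no concepts with the question or the other answers, so it scores zero and is filtered out. If every score is zero, the strict comparison means nothing is displayed; a real system would need to decide how to handle that case.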

Assignees

Yahoo Holdings Inc, Oath Inc

Classifications

  • G06N5/04 · Primary

    Inference or reasoning models · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10489712B2 cover?
Methods and devices for assessing the quality of user-generated content are described. In one embodiment, a method is disclosed for measuring the quality of a user-generated answer to a question by combining various factors, including question-answer surface word vector similarity, question-answer explicit semantic analysis vector similarity, answer-answer explicit semantic analysis vector simil…
Who is the assignee on this patent?
Yahoo Holdings Inc, Oath Inc
What technology area does this patent fall under?
Primary CPC classification G06N5/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 26, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).