Miscategorized outlier detection using unsupervised SLM-GBM approach and structured data

Patent metadata
Field	Value
Publication number	US-2017083602-A1
Application number	US-201514861746-A
Country	US
Kind code	A1
Filing date	Sep 22, 2015
Priority date	Sep 22, 2015
Publication date	Mar 23, 2017
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an example, one or more leaf category specific unsupervised statistical language model (SLM) models are trained using sample item listings corresponding to each of one or more leaf categories and structured data about the one or more leaf categories, the training including calculating an expected perplexity and a standard deviation for item listing titles. A perplexity for a title of a particular item listing is calculated and a perplexity deviation signal is generated based on a difference between the perplexity for the title of the particular item listing and the expected perplexity for item listing titles in a leaf category of the particular item listing and based on the standard deviation for item listing titles in the leaf category of the particular item listing. A gradient boosting machine (GBM) fuses the perplexity deviation signal with one or more other signals to generate a miscategorization classification score corresponding to the particular item listing.
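The perplexity deviation signal described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not in the patent text: a toy add-one-smoothed unigram model stands in for the leaf-category SLM, titles are tokenized on whitespace, and the signal is taken to be the difference from the expected perplexity divided by the standard deviation (the abstract says only that the signal is "based on" both quantities). All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def train_unigram(titles):
    """Toy stand-in for a leaf-category SLM: add-one-smoothed unigram model."""
    counts = Counter(tok for t in titles for tok in t.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen tokens

    def prob(token):
        return (counts.get(token, 0) + 1) / (total + vocab)

    return prob


def perplexity(prob, title):
    """Per-token perplexity computed from the sentence log probability."""
    tokens = title.lower().split()
    if not tokens:
        raise ValueError("title has no tokens")
    log_prob = sum(math.log(prob(tok)) for tok in tokens)
    return math.exp(-log_prob / len(tokens))


def category_stats(prob, titles):
    """Expected perplexity and standard deviation over the sample titles."""
    values = [perplexity(prob, t) for t in titles]
    return mean(values), pstdev(values)


def perplexity_deviation(prob, expected, std, title):
    """Assumed form of the signal: (perplexity - expected) / std."""
    return (perplexity(prob, title) - expected) / std


# Hypothetical sample listings for one leaf category (phone cases).
samples = [
    "iphone 6 case black",
    "samsung galaxy case blue",
    "iphone 6 plus case clear",
    "galaxy s6 case black leather",
]
slm = train_unigram(samples)
expected, std = category_stats(slm, samples)

in_category = perplexity_deviation(slm, expected, std, "iphone 6 case blue")
out_of_category = perplexity_deviation(slm, expected, std, "vintage oak dining table")
print(in_category, out_of_category)
```

In this toy setup a title made of words never seen in the category gets a much larger deviation than a title built from familiar words, which is the kind of outlier evidence the GBM would then combine with other signals.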

First claim

Opening claim text (preview).

What is claimed is:
1. A system comprising: a statistical language model (SLM) training component executable by one or more processors and configured to train one or more leaf-category-specific unsupervised statistical language model (SLM) models using sample item listings corresponding to each of one or more leaf categories and structured data about the one or more leaf categories, the training including calculating an expected perplexity and a standard deviation for item listing titles; a perplexity deviation signal generator configured to, in response to a request for a miscategorization classification score corresponding to a particular item listing: calculate a perplexity for a title of the particular item listing, and generate a perplexity deviation signal based on a difference between the perplexity for the title of the particular item listing and the expected perplexity for item listing titles in a leaf category of the particular item listing and based on the standard deviation for item listing titles in the leaf category of the particular item listing; and a gradient boosting machine (GBM) configured to fuse the perplexity deviation signal with one or more other signals to generate a miscategorization classification score corresponding to the particular item listing.
2. The system of claim 1, wherein the training further includes generating an SLM for each leaf category for structured data, an SLM for each leaf category's queries, and an SLM for each leaf category's titles, and interpolating the SLM for each leaf category for structured data, the SLM for each leaf category's queries, and the SLM for each leaf category's titles into an SLM for each leaf category.
3. The system of claim 2, wherein the training further includes generating an expected perplexity and a standard deviation for each leaf category based on the SLM for each leaf category and perplexity and standard deviation calculations for each sample item listing.
4. The system of claim 1, wherein the generating the perplexity deviation signal includes computing a sentence log probability.
5. The system of claim 1, further comprising: a GBM training component configured to: create a tuning set of item listings by labeling item listings as miscategorized or non-miscategorized based on application of filters to the item listings; and feed the tuning set of item listings to the GBM for tuning of a GBM model used by the GBM.
6. The system of claim 1, wherein the GBM takes a product type signal as input.
7. A method comprising: training one or more leaf-category-specific unsupervised statistical language model (SLM) models using sample item listings corresponding to each of one or more leaf categories and structured data about the one or more leaf categories, the training including calculating an expected perplexity and a standard deviation for item listing titles; in response to a request for a miscategorization classification score corresponding to a particular item listing, calculating a perplexity for a title of the particular item listing and generating a perplexity deviation signal based on a difference between the perplexity for the title of the particular item listing and the expected perplexity for item listing titles in a leaf category of the particular item listing and based on the standard deviation for item listing titles in the leaf category of the particular item listing; and using a gradient boosting machine (GBM) to fuse the perplexity deviation signal with one or more other signals to generate a miscategorization classification score corresponding to the particular item listing.
8. The method of claim 7, wherein the training comprises calculating a sentence perplexity $PP(S)$ for each sequence $S$ of $N$ words $\{w_1, w_2, \ldots, w_N\}$ in each title of each of the sample item listings according to the following formula: $$PP(S) = P(w_1 \ldots w_N)^{-1/N} = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1 \ldots w_{i-1})}}.$$
9. The method of claim 7, wherein the training further includes generating an SLM for each leaf category for structured data, an SLM for each leaf category's queries, and an SLM for each leaf category's titles, and interpolating the SLM for each leaf category for structured data, the SLM for each leaf category's queries, and the SLM for each leaf category's titles into an SLM for each leaf category.
10. The method of claim 9, wherein the training further includes generating an expected perplexity and a standard deviation for each leaf category based on the SLM for each leaf category and perplexity and standard deviation calculations for each sample item listing.
11. The method of claim 7, wherein the generating the perplexity deviation signal includes computing a sentence log probability.
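Claims 2 and 9 describe interpolating three per-category SLMs (structured data, queries, titles) into a single SLM. The sketch below assumes simple linear interpolation over toy unigram models. The claims do not specify the interpolation scheme, the model order, or the weights, so the weights and all names here are hypothetical.

```python
from collections import Counter


def unigram_model(texts):
    """Toy add-one-smoothed unigram model over whitespace tokens."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen tokens
    return lambda token: (counts.get(token, 0) + 1) / (total + vocab)


def interpolate(models, weights):
    """Linear interpolation: P(w) = sum_k weight_k * P_k(w), weights summing to 1."""
    if len(models) != len(weights):
        raise ValueError("need one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    def prob(token):
        return sum(w * m(token) for m, w in zip(models, weights))

    return prob


# Hypothetical component corpora for one leaf category.
structured_slm = unigram_model(["brand apple", "type case", "color black"])
query_slm = unigram_model(["iphone case", "black phone case"])
title_slm = unigram_model(["iphone 6 case black", "galaxy s6 case leather"])

# Hypothetical weights; in practice they would be tuned on held-out data.
category_slm = interpolate([structured_slm, query_slm, title_slm], [0.2, 0.3, 0.5])
print(category_slm("case"))
```

Because the weights sum to 1, the interpolated probability of any token always lies between the smallest and largest of the component probabilities.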

Assignees

Ebay Inc

Inventors

Liu Mingkuan

Classifications

G06F16/215 (Primary)
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
G06F40/284
Lexical analysis, e.g. tokenisation or collocates · CPC title
G06F40/211
Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title
G06F40/216
using statistical methods · CPC title
G06F16/285 (Primary)
Clustering or classification · CPC title

Patent family

Related publications grouped by family.

View patent family 58282868

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017083602A1 cover?: In an example, one or more leaf category specific unsupervised statistical language model (SLM) models are trained using sample item listings corresponding to each of one or more leaf categories and structured data about the one or more leaf categories, the training including calculating an expected perplexity and a standard deviation for item listing titles. A perplexity for a title of a parti…
Who is the assignee on this patent?: Ebay Inc
What technology area does this patent fall under?: Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 23, 2017 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).