Automatic lot classification

US11036780B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11036780-B2
Application number: US-201815916207-A
Country: US
Kind code: B2
Filing date: Mar 8, 2018
Priority date: Mar 8, 2018
Publication date: Jun 15, 2021
Grant date: Jun 15, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and media for lot classification are disclosed. In one example, a classification system for identifying lot listings receives a description for a listing in a publication system, identifies a string in the listing, identifies a quantity word or digit in the string, and converts an identified quantity word into digit form. A normalized string is tokenized to produce tokens, the tokenizing of the normalized string including splitting the normalized string into a series of substrings using a sequence of delimiters. For each substring, an additional split is performed by separating any digit from any other adjacent character, unless that character is another digit, and maintaining an internal character order of each split substring to produce a flattened list of tokenized tokens.
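
The normalization and tokenization steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the patented implementation: the `QUANTITY_WORDS` table, the function names `normalize` and `tokenize`, and the choice of a single space as the only delimiter are all assumptions made for the example.

```python
import re

# Hypothetical quantity-word table; the source does not list the words it uses.
QUANTITY_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}


def normalize(title: str) -> str:
    """Lowercase the title, convert quantity words to digits, and replace
    every other character with a space (assumed here to be the delimiter)."""
    lowered = title.lower()
    # Replace only whole alphabetic words, so "5pk" is left untouched.
    converted = re.sub(
        r"\b[a-z]+\b",
        lambda m: QUANTITY_WORDS.get(m.group(0), m.group(0)),
        lowered,
    )
    return re.sub(r"[^a-z0-9]+", " ", converted).strip()


def tokenize(normalized: str, delimiters=(" ",)) -> list[str]:
    """Split on each delimiter in sequence, then separate digit runs from
    adjacent non-digit characters, preserving the original character order."""
    substrings = [normalized]
    for delimiter in delimiters:
        substrings = [
            part for s in substrings for part in s.split(delimiter) if part
        ]
    tokens: list[str] = []
    for substring in substrings:
        # Adjacent digits stay together; a digit next to a letter is split.
        tokens.extend(re.findall(r"\d+|\D+", substring))
    return tokens  # flattened list of tokens


tokens = tokenize(normalize("Lot of TWO 5pk AA-Batteries"))
```

With this sketch, the substring `5pk` is split into `5` and `pk`, while multi-digit runs such as `12` remain a single token.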

First claim

Opening claim text (preview).

What is claimed is:
1. A classification system comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the classification system to perform operations comprising, at least: receiving a description for a listing in a publication system; identifying a string in the listing; identifying a quantity word in the string; converting the identified quantity word into digit form; producing a normalized string including only lowercase characters and digits based at least in part on the converting; tokenizing the normalized string to produce tokens, the tokenizing of the normalized string including: splitting the normalized string into a series of substrings using a sequence of delimiters, a first substring of the series of substrings including a lowercase character and a digit; performing an additional split on the first substring by separating the digit from the lowercase character; maintaining an internal character order of each split substring; and producing a flattened list of tokenized tokens; based on a trained model, assigning a probability to at least one token as being indicative of a lot quantity; classifying the listing as a lot listing based on the assigned probability; and based on the classification, causing display of the listing as a lot listing.
2. The classification system of claim 1, wherein identifying the string in the listing includes identifying the string in a title of the listing.
3. The classification system of claim 1, wherein the sequence of delimiters includes white space delimiters.
4. The classification system of claim 1, further comprising a feature vector component for creating a feature vector for training the trained model, the feature vector component receiving as input a tokenized listing title including a position of a numerical token in a tokenized listing title and wherein, in relation to the position of the numerical token, the feature vector includes one or more of: a token after vector, a bigram after vector, a token before vector, a bigram before vector, a unit of measure vector, a token position ratio, and a token is divisible by k vector.
5. The classification system of claim 4, wherein the system further comprises a training component receiving as input the feature vector computed by the feature vector component, and wherein a training model per meta category is trained by the training component using a training set of listing titles for items which are listed under that meta category.
6. The classification system of claim 2, wherein only a raw listing title string and an assigned lot size value are extracted from the listing title and are included as an entry in a training set for the trained model.
7. A classification method at a classification system comprising one or more processors, comprising: receiving, at the one or more processors, a description for a listing in a publication system; identifying, at the one or more processors, a string in the listing; identifying, at the one or more processors, a quantity word in the string; converting, at the one or more processors, the identified quantity word into digit form; producing, at the one or more processors, a normalized string including only lowercase characters and digits based at least in part on the converting; tokenizing, at the one or more processors, the normalized string to produce tokens, the tokenizing of the normalized string including; splitting the normalized string into a series of substrings using a sequence of delimiters, a first substring of the series of substrings including a lowercase character and a digit; performing an additional split on the first substring by separating the digit from the lowercase character; maintaining an internal character order of each split substring; and producing a flattened list of tokenized tokens; based on a trained model, assigning, at the one or more processors, a probability to at least one token as being indicative of a lot quantity; classifying, at the one or more processors, the listing as a lot listing based on the assigned probability; and based on the classification, causing, at the one or more processors, the display of the listing as a lot listing.
8. The classification method of claim 7, wherein identifying the string in the listing includes identifying the string in a title of the listing.
9. The classification method of claim 7, wherein the sequence of delimiters includes white space delimiters.
10. The classification method of claim 7, further comprising creating a feature vector for training the trained model by receiving a tokenized listing title including a position of a numerical token in a tokenized listing title and wherein, in relation to the position of the numerical token, the feature vector includes one or more of: a token after vector, a bigram after vector, a token before vector, a bigram before vector, a unit of measure vector, a token position ratio, and a token is divisible by k vector.
11. The classification method of claim 10, further comprising receiving the feature vector as input into a training component, and training a training model for a meta category using the training component and a training set of listing titles for items which are listed under that meta category.
12. The classification method of claim 8, further comprising extracting only a raw listing title string and an assigned lot size value from the listing title and including them as an entry in a training set for the trained model.
13. A non-transitory machine-readable medium containing instructions which, when read by a machine, cause the machine to perform operations comprising, at least: receiving a description for a listing in a publication system; identifying a string in the listing; identifying a quantity word in the string; converting the identified quantity word into digit form; producing a normalized string including only lowercase characters and digits based at least in part on the converting; tokenizing the normalized string to produce tokens, the tokenizing of the normalized string including; splitting the normalized string into a series of substrings using a sequence of delimiters, a first substring of the series of substrings including a lowercase character and a digit; performing an additional split on the first substring by separating the digit from the lowercase character; maintaining an internal character order of each split substring; and producing a flattened list of tokenized tokens; based on a trained model, assigning a probability to at least one token as being indicative of a lot quantity; classifying the listing as a lot listing based on the assigned probability; and based on the classification, causing the display of the listing as a lot listing.
14. The medium of claim 13, wherein identifying the string in the listing includes identifying the string in a title of the listing.
15. The medium of claim 13, wherein the sequence of delimiters includes white space delimiters.
16. The medium of claim 13, wherein the operations further comprise creating a feature vector for training the trained model by receiving a tokenized listing title including a position of a numerical token in a tokenized listing title and wherein, in relation to the position of the numerical token, the feature vector includes one or more of: a token after vector, a bigram after vector, a token before vector, a bigram before vector, a unit of measure vector, a token position ratio, and a token is divisible by k vector.
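
The features named in claims 4, 10, and 16 are all defined relative to the position of a numerical token in the tokenized title. One possible way to compute them is sketched below. The claims do not say how each "vector" is encoded, so this example returns raw values in a dictionary rather than encoded vectors; the function name `features_for_numeric_token`, the `UNITS_OF_MEASURE` set, the set of divisors `ks`, and the definition of the position ratio as index divided by token count are all assumptions.

```python
# Hypothetical unit-of-measure vocabulary; the source does not enumerate one.
UNITS_OF_MEASURE = {"oz", "ml", "lb", "in", "cm", "mm", "gb"}


def features_for_numeric_token(tokens, position, ks=(2, 5, 10)):
    """Compute context features for the numerical token at `position`.

    Returns raw feature values; turning them into numeric vectors
    (for example by one-hot encoding) is left out of this sketch.
    """
    value = int(tokens[position])
    before = tokens[:position]
    after = tokens[position + 1:]
    return {
        "token_before": before[-1] if before else None,
        "bigram_before": " ".join(before[-2:]) if len(before) >= 2 else None,
        "token_after": after[0] if after else None,
        "bigram_after": " ".join(after[:2]) if len(after) >= 2 else None,
        # True when the token right after the number looks like a unit.
        "unit_of_measure": bool(after) and after[0] in UNITS_OF_MEASURE,
        # Assumed definition: zero-based index over total token count.
        "position_ratio": position / len(tokens),
        "divisible_by": {k: value % k == 0 for k in ks},
    }


title_tokens = ["lot", "of", "2", "5", "pk", "aa", "batteries"]
features = features_for_numeric_token(title_tokens, 2)
```

A number followed by a unit such as `oz` is more likely to be a size than a lot quantity, which is presumably why a unit-of-measure feature appears in the claimed list; the trained model, not this sketch, decides how much weight that signal gets.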

Assignees

eBay Inc.

Inventors

Classifications

  • G06F16/35 (Primary)

    Clustering; Classification · CPC title

  • Machine learning · CPC title

  • Recognition of textual entities · CPC title

  • Handling of whitespace · CPC title

  • G06F16/358 (Primary)

    Browsing; Visualisation therefor · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11036780B2 cover?
Methods, systems, and media for lot classification are disclosed. In one example, a classification system for identifying lot listings receives a description for a listing in a publication system, identifies a string in the listing, identifies a quantity word or digit in the string, and converts an identified quantity word into digit form. A normalized string is tokenized to produce tokens, the…
Who is the assignee on this patent?
eBay Inc.
What technology area does this patent fall under?
Primary CPC classification G06F16/35. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 15, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).