System and method for topic extraction and opinion mining
US-2017068667-A1 · Mar 9, 2017 · US
US11036780B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11036780-B2 |
| Application number | US-201815916207-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 8, 2018 |
| Priority date | Mar 8, 2018 |
| Publication date | Jun 15, 2021 |
| Grant date | Jun 15, 2021 |
Official abstract text for this publication.
Methods, systems, and media for lot classification are disclosed. In one example, a classification system for identifying lot listings receives a description for a listing in a publication system, identifies a string in the listing, identifies a quantity word or digit in the string, and converts an identified quantity word into digit form. A normalized string is tokenized to produce tokens, the tokenizing of the normalized string including splitting the normalized string into a series of substrings using a sequence of delimiters. For each substring, an additional split is performed by separating any digit from any other adjacent character, unless that character is another digit, and maintaining an internal character order of each split substring to produce a flattened list of tokenized tokens.
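The normalization and tokenization steps described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the quantity-word table, the delimiter sequence, and the function names `normalize`, `split_digits`, and `tokenize` are all assumptions, since the source only says that quantity words are converted to digits and that the delimiters include white space.

```python
import re
from typing import List

# Hypothetical quantity-word table; the source does not list the words it handles.
QUANTITY_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "ten": "10", "dozen": "12",
}

# Assumed delimiter sequence, applied in order. The claims only state that
# the sequence includes white space delimiters.
DELIMITERS = [r"\s+", r"[,;/|()\[\]]+"]


def normalize(title: str) -> str:
    """Lowercase the title and rewrite known quantity words as digits."""
    lowered = title.lower()

    def swap(match):
        word = match.group(0)
        return QUANTITY_WORDS.get(word, word)

    # Each maximal run of letters is looked up as a whole word,
    # so "stone" is left alone even though it contains "one".
    return re.sub(r"[a-z]+", swap, lowered)


def split_digits(substring: str) -> List[str]:
    """Separate digit runs from adjacent non-digit characters, keeping order."""
    return re.findall(r"\d+|\D+", substring)


def tokenize(normalized: str) -> List[str]:
    """Split on each delimiter in turn, then split digits off, and flatten."""
    substrings = [normalized]
    for pattern in DELIMITERS:
        substrings = [
            part
            for s in substrings
            for part in re.split(pattern, s)
            if part  # drop empty pieces left by leading or trailing delimiters
        ]
    return [token for s in substrings for token in split_digits(s)]


tokens = tokenize(normalize("Lot of TWO 3pk AA-batteries, 10ct"))
print(tokens)
# ['lot', 'of', '2', '3', 'pk', 'aa-batteries', '10', 'ct']
```

Adjacent digits stay together ("10" is one token), while a digit next to a letter is split ("3pk" becomes "3" and "pk"), which matches the rule that a digit is separated from any adjacent character unless that character is another digit.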
Opening claim text (preview).
What is claimed is:
1. A classification system comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the classification system to perform operations comprising, at least: receiving a description for a listing in a publication system; identifying a string in the listing; identifying a quantity word in the string; converting the identified quantity word into digit form; producing a normalized string including only lowercase characters and digits based at least in part on the converting; tokenizing the normalized string to produce tokens, the tokenizing of the normalized string including: splitting the normalized string into a series of substrings using a sequence of delimiters, a first substring of the series of substrings including a lowercase character and a digit; performing an additional split on the first substring by separating the digit from the lowercase character; maintaining an internal character order of each split substring; and producing a flattened list of tokenized tokens; based on a trained model, assigning a probability to at least one token as being indicative of a lot quantity; classifying the listing as a lot listing based on the assigned probability; and based on the classification, causing display of the listing as a lot listing.
2. The classification system of claim 1, wherein identifying the string in the listing includes identifying the string in a title of the listing.
3. The classification system of claim 1, wherein the sequence of delimiters includes white space delimiters.
4. The classification system of claim 1, further comprising a feature vector component for creating a feature vector for training the trained model, the feature vector component receiving as input a tokenized listing title including a position of a numerical token in a tokenized listing title and wherein, in relation to the position of the numerical token, the feature vector includes one or more of: a token after vector, a bigram after vector, a token before vector, a bigram before vector, a unit of measure vector, a token position ratio, and a token is divisible by k vector.
5. The classification system of claim 4, wherein the system further comprises a training component receiving as input the feature vector computed by the feature vector component, and wherein a training model per meta category is trained by the training component using a training set of listing titles for items which are listed under that meta category.
6. The classification system of claim 2, wherein only a raw listing title string and an assigned lot size value are extracted from the listing title and are included as an entry in a training set for the trained model.
7. A classification method at a classification system comprising one or more processors, comprising: receiving, at the one or more processors, a description for a listing in a publication system; identifying, at the one or more processors, a string in the listing; identifying, at the one or more processors, a quantity word in the string; converting, at the one or more processors, the identified quantity word into digit form; producing, at the one or more processors, a normalized string including only lowercase characters and digits based at least in part on the converting; tokenizing, at the one or more processors, the normalized string to produce tokens, the tokenizing of the normalized string including; splitting the normalized string into a series of substrings using a sequence of delimiters, a first substring of the series of substrings including a lowercase character and a digit; performing an additional split on the first substring by separating the digit from the lowercase character; maintaining an internal character order of each split substring; and producing a flattened list of tokenized tokens; based on a trained model, assigning, at the one or more processors, a probability to at least one token as being indicative of a lot quantity; classifying, at the one or more processors, the listing as a lot listing based on the assigned probability; and based on the classification, causing, at the one or more processors, the display of the listing as a lot listing.
8. The classification method of claim 7, wherein identifying the string in the listing includes identifying the string in a title of the listing.
9. The classification method of claim 7, wherein the sequence of delimiters includes white space delimiters.
10. The classification method of claim 7, further comprising creating a feature vector for training the trained model by receiving a tokenized listing title including a position of a numerical token in a tokenized listing title and wherein, in relation to the position of the numerical token, the feature vector includes one or more of: a token after vector, a bigram after vector, a token before vector, a bigram before vector, a unit of measure vector, a token position ratio, and a token is divisible by k vector.
11. The classification method of claim 10, further comprising receiving the feature vector as input into a training component, and training a training model for a meta category using the training component and a training set of listing titles for items which are listed under that meta category.
12. The classification method of claim 8, further comprising extracting only a raw listing title string and an assigned lot size value from the listing title and including them as an entry in a training set for the trained model.
13. A non-transitory machine-readable medium containing instructions which, when read by a machine, cause the machine to perform operations comprising, at least: receiving a description for a listing in a publication system; identifying a string in the listing; identifying a quantity word in the string; converting the identified quantity word into digit form; producing a normalized string including only lowercase characters and digits based at least in part on the converting; tokenizing the normalized string to produce tokens, the tokenizing of the normalized string including; splitting the normalized string into a series of substrings using a sequence of delimiters, a first substring of the series of substrings including a lowercase character and a digit; performing an additional split on the first substring by separating the digit from the lowercase character; maintaining an internal character order of each split substring; and producing a flattened list of tokenized tokens; based on a trained model, assigning a probability to at least one token as being indicative of a lot quantity; classifying the listing as a lot listing based on the assigned probability; and based on the classification, causing the display of the listing as a lot listing.
14. The medium of claim 13, wherein identifying the string in the listing includes identifying the string in a title of the listing.
15. The medium of claim 13, wherein the sequence of delimiters includes white space delimiters.
16. The medium of claim 13, wherein the operations further comprise creating a feature vector for training the trained model by receiving a tokenized listing title including a position of a numerical token in a tokenized listing title and wherein, in relation to the position of the numerical token, the feature vector includes one or more of: a token after vector, a bigram after vector, a token before vector, a bigram before vector, a unit of measure vector, a token position ratio, and a token is divisible by k vector.
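The feature vector recited in claims 4, 10, and 16 is built around the position of a numerical token in a tokenized title. The sketch below shows one plausible way to compute the raw values behind each named feature. It is an assumption-laden illustration: the function name `numeric_token_features`, the unit-of-measure set, the choice of `k` values, and the dictionary layout are hypothetical, and the source does not say how these values are encoded into vectors before training.

```python
from typing import Dict, List, Sequence

# Hypothetical unit-of-measure vocabulary; the source does not enumerate one.
UNITS_OF_MEASURE = {"oz", "ml", "in", "inch", "cm", "lb", "gb"}


def numeric_token_features(
    tokens: List[str], pos: int, ks: Sequence[int] = (2, 5, 10)
) -> Dict[str, object]:
    """Describe the context of the numeric token at index `pos`."""
    value = int(tokens[pos])
    before = tokens[max(0, pos - 2):pos]   # up to two tokens to the left
    after = tokens[pos + 1:pos + 3]        # up to two tokens to the right
    return {
        "token_before": before[-1] if before else None,
        "bigram_before": " ".join(before) if len(before) == 2 else None,
        "token_after": after[0] if after else None,
        "bigram_after": " ".join(after) if len(after) == 2 else None,
        # True when the token right after the number looks like a unit,
        # which suggests a size ("10 oz") and not a lot quantity.
        "unit_of_measure": bool(after) and after[0] in UNITS_OF_MEASURE,
        # Relative position of the number within the title.
        "token_position_ratio": pos / len(tokens),
        # One flag per assumed k value.
        "divisible_by": {k: value % k == 0 for k in ks},
    }


title_tokens = ["lot", "of", "2", "3", "pk", "aa-batteries", "10", "ct"]
features = numeric_token_features(title_tokens, 2)
print(features["token_before"], features["bigram_before"])
# of lot of
print(features["token_position_ratio"])
# 0.25
```

In a training pipeline of the kind the claims describe, categorical values such as `token_before` would typically be encoded (for example, one-hot) before being passed to a model trained per meta category; that encoding step is left out here because the source does not specify it.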
Clustering; Classification · CPC title
Machine learning · CPC title
Recognition of textual entities · CPC title
Handling of whitespace · CPC title
Browsing; Visualisation therefor · CPC title