US12411884B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12411884-B2 |
| Application number | US-202418641994-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 22, 2024 |
| Priority date | Mar 8, 2018 |
| Publication date | Sep 9, 2025 |
| Grant date | Sep 9, 2025 |
Official abstract text for this publication.
Methods, systems, and media for lot classification are disclosed. In one example, a classification system for identifying lot listings receives a description for a listing in a publication system, identifies a string in the listing, identifies a quantity word or digit in the string, and converts an identified quantity word into digit form. A normalized string is tokenized to produce tokens, the tokenizing of the normalized string including splitting the normalized string into a series of substrings using a sequence of delimiters. For each substring, an additional split is performed by separating any digit from any other adjacent character, unless that character is another digit, and maintaining an internal character order of each split substring to produce a flattened list of tokenized tokens.
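The normalize-then-tokenize flow in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `QUANTITY_WORDS` table, the function names, and the sample title are assumptions made for the example, and a plain regular expression stands in for the digit/non-digit split (the claims recite a machine learning model for that step).

```python
import re
from typing import List

# Hypothetical, deliberately tiny quantity-word table (an assumption for this sketch).
QUANTITY_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4",
                  "five": "5", "six": "6", "ten": "10", "dozen": "12"}


def normalize(title: str) -> str:
    """Lowercase the title and convert whole quantity words to digit form."""
    words = title.lower().split()
    return " ".join(QUANTITY_WORDS.get(word, word) for word in words)


def tokenize(normalized: str) -> List[str]:
    """Split on whitespace runs, then separate digit runs from non-digit runs.

    Order inside each substring is preserved, and the result is one flat list.
    """
    tokens: List[str] = []
    for substring in normalized.split():
        # \d+ matches a run of digits, \D+ a run of non-digits;
        # findall returns the pieces in their original left-to-right order.
        tokens.extend(re.findall(r"\d+|\D+", substring))
    return tokens


if __name__ == "__main__":
    print(tokenize(normalize("Lot of TWO 16gb USB drives x10")))
    # ['lot', 'of', '2', '16', 'gb', 'usb', 'drives', 'x', '10']
```

Adjacent digits stay together ("16", "10"), while a digit next to a letter is split ("16gb" becomes "16" and "gb"), matching the behavior the abstract describes.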
Opening claim text (preview).
What is claimed is:
1. A computer implemented tokenization method comprising: receiving a normalized title string; tokenizing the normalized title string by: splitting, by one or more processors, the normalized title string into a plurality of substrings using a sequence of whitespaces as a delimiter; for each substring of the plurality of substrings, performing an additional split in each substring of the plurality of substrings where a digit character is separated from a non-digit character to create split substrings for each of the substrings of the plurality of substrings to create a plurality of tokens for the normalized title string, an individual substring of the plurality of substrings comprising the digit character adjacent to the non-digit character without the delimiter between the digit character and the non-digit character, the performing the additional split comprising processing the individual substring by a machine learning model to split the digit character from the non-digit character to create the two tokens of the plurality of tokens; and maintaining an order of the split substrings; and creating a flattened list of the plurality of tokens for the normalized title string.
2. The tokenization method of claim 1, wherein the normalized title string is normalized from a non-normalized text string and the method includes normalizing the non-normalized text string by: converting non-digit characters that are uppercase non-digit characters to lowercase non-digit characters; determining that one of the non-digit characters of the non-normalized text string corresponds to quantity words; and converting the non-digit characters that correspond to quantity words to digit characters.
3. The tokenization method of claim 2, wherein others of the non-digit characters of the non-normalized text string correspond to an item title in a listing for an item.
4. The tokenization method of claim 1, further comprising assigning a probability to a token of the plurality of tokens, the probability being indicative of a lot quantity, the machine learning model trained by performing training operations comprising: receiving a training set of listing titles having assigned lot size values greater than one; for each listing title in the training set: preprocessing the listing title to identify numerical tokens; computing a feature vector for each numerical token; assigning a positive label to the numerical token if the numerical token equals the assigned lot size value for the listing title; assigning a negative label to the numerical token if the numerical token does not equal the assigned lot size value for the listing title; and training a logistic regression binary classifier using the computed feature vectors and assigned labels to generate a trained model for identifying lot quantities.
5. The tokenization method of claim 4, wherein the feature vector includes one or more of: a token after vector indicating a token following the numerical token, a bigram after vector, a token before vector indicating a token preceding the numerical token, a bigram before vector, a unit of measure vector, a token position ratio indicating a ratio of the numerical token's position to a length of the listing title, and a token divisibility vector, and wherein the probability is based on a position of the token in the normalized title string and the method further comprises classifying a listing associated with the normalized title string as a lot listing based on the probability.
6. The tokenization method of claim 4, wherein the order is an internalized order of the split substrings.
7. The tokenization method of claim 1, wherein performing the additional split in each substring of the plurality of substrings comprises separating a character from an adjacent character based on a difference between the character and the adjacent character.
8. A system, comprising: at least one processor; and a memory device storing instructions which, when executed by the at least one processor, causes the system to perform operations comprising: receiving a normalized title string; tokenizing the normalized title string by: splitting the normalized title string into a plurality of substrings using a sequence of whitespaces as a delimiter; for each substring of the plurality of substrings, performing an additional split in each substring of the plurality of substrings where a digit character is separated from a non-digit character to create split substrings for each of the substrings of the plurality of substrings to create a plurality of tokens for the normalized title string, an individual substring of the plurality of substrings comprising the digit character adjacent to the non-digit character without the delimiter between the digit character and the non-digit character, the performing the additional split comprising processing the individual substring by a machine learning model to split the digit character from the non-digit character to create the two tokens of the plurality of tokens; and maintaining an order of the split substrings; and creating a flattened list of the plurality of tokens for the normalized title string.
9. The system of claim 8, wherein the normalized title string is normalized from a non-normalized text string and the processor, when executing the instructions, causes the system to perform operations comprising: converting non-digit characters that are uppercase non-digit characters to lowercase non-digit characters; determining that one of the non-digit characters of the non-normalized text string corresponds to quantity words; and converting the non-digit characters that correspond to quantity words to digit characters.
10. The system of claim 9, wherein others of the non-digit characters of the non-normalized text string correspond to an item title in a listing for an item.
11. The system of claim 8, the processor, when executing the instructions, causes the system to perform operations comprising assigning a probability to a token of the plurality of tokens, the probability being indicative of a lot quantity.
12. The system of claim 11, wherein the probability is based on a position of the token in the normalized title string and the processor, when executing the instructions, causes the system to perform operations comprising classifying a listing associated with the normalized title string as a lot listing based on the probability.
13. The system of claim 11, wherein the order is an internalized order of the split substrings.
14. The system of claim 8, wherein when performing the additional split in each substring of the plurality of substrings the processor, when executing the instructions, causes the system to perform operations comprising separating a character from an adjacent character based on a difference between the character and the adjacent character.
15. A non-transitory computer-readable medium comprising instructions which, when read by a machine, cause the machine to perform operations comprising: receiving a normalized title string; tokenizing the normalized title string by: splitting the normalized title string into a plurality of substrings using a sequence of whitespaces as a delimiter; for each substring of the plurality of substrings, performing an additional split in each substring of the plurality of substrings where a digit character is separated from a non-digit character to create split substrings for each of the substrings of the plurality of substrings to create a plurality of tokens for the normalized title string, an individual substring of the pl
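As a rough illustration of the labeling step recited in claim 4 and a few of the features listed in claim 5, the sketch below builds one labeled example per numerical token. The function name, the `<start>`/`<end>` sentinels, and the use of token index divided by token count as the position ratio are assumptions for this example; the remaining features (bigrams, unit of measure, divisibility) and the logistic regression training itself are omitted.

```python
from typing import Dict, List, Tuple


def numeric_token_examples(
    tokens: List[str], lot_size: int
) -> List[Tuple[Dict[str, object], int]]:
    """Return (features, label) pairs for every numerical token in a title.

    label is 1 when the token equals the assigned lot size, otherwise 0.
    """
    examples = []
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if not tok.isdecimal():
            continue  # only numerical tokens become training examples
        features = {
            # Neighboring tokens, with hypothetical sentinels at the edges.
            "token_before": tokens[i - 1] if i > 0 else "<start>",
            "token_after": tokens[i + 1] if i + 1 < n else "<end>",
            # Assumed reading of "token position ratio": index / token count.
            "position_ratio": i / n,
        }
        label = 1 if int(tok) == lot_size else 0
        examples.append((features, label))
    return examples


if __name__ == "__main__":
    tokens = ["lot", "of", "2", "16", "gb", "usb", "drives"]
    for features, label in numeric_token_examples(tokens, lot_size=2):
        print(features, label)
```

In a fuller pipeline, the categorical features would typically be one-hot encoded before being passed, together with the labels, to a binary classifier.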
Clustering; Classification · CPC title
Machine learning · CPC title
Transformation · CPC title
Handling of whitespace · CPC title
Recognition of textual entities · CPC title