Hybrid machine learning model for code classification

US12299586B2 · US · B2

Patent metadata
Publication number: US-12299586-B2
Application number: US-201917279439-A
Country: US
Kind code: B2
Filing date: Sep 11, 2019
Priority date: Sep 28, 2018
Publication date: May 13, 2025
Grant date: May 13, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An embodiment involves a hybrid machine learning classifier that uses a random forest of decision tree classifiers to predict a tariff code prefix, and uses a plurality of expert trees to predict a tariff code suffix from properties related to chemical components associated with the respective tariff code prefixes. The embodiment also involves: determining a proportion of a dominant chemical component in comparison to other chemical components in a new set of chemical components; calculating similarity scores for the new set of chemical components and words associated with the tariff code prefixes; generating a feature vector from the proportion and the similarity scores; and obtaining a predicted tariff code including a predicted tariff code prefix determined by applying the random forest to the feature vector, and a predicted tariff code suffix determined by traversing a particular expert tree in accordance with properties related to the new set of chemical components.
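The two-stage prediction described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The "random forest" is a majority vote over hand-written stand-in trees rather than trained ones, and the expert tree is built by hand. The proportion is assumed to be the dominant component's share of the total amount. All component names, the `density` property, and the tariff codes in the demo are hypothetical placeholders, not real classifications. The default lengths n = 10 and j = 4 follow claims 3 and 4.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def dominant_proportion(amounts: Dict[str, float]) -> Tuple[str, float]:
    """Pick the largest component and return (name, share of the total).

    Assumption: the proportion is taken as dominant amount / total amount.
    """
    name = max(amounts, key=amounts.get)
    total = sum(amounts.values())
    return name, amounts[name] / total


@dataclass
class ExpertNode:
    """Hand-built expert tree node: a leaf holds a suffix, an inner node a test."""
    prop: Optional[str] = None            # property tested at this node
    threshold: float = 0.0
    low: Optional["ExpertNode"] = None    # branch taken when value <= threshold
    high: Optional["ExpertNode"] = None   # branch taken when value > threshold
    suffix: Optional[str] = None          # set only on leaves

    def traverse(self, props: Dict[str, float]) -> str:
        node = self
        while node.suffix is None:
            node = node.low if props[node.prop] <= node.threshold else node.high
        return node.suffix


# Stand-in for a trained decision tree: any callable mapping features to a prefix.
Tree = Callable[[Sequence[float]], str]


def forest_predict(trees: List[Tree], x: Sequence[float]) -> str:
    """Majority vote over the trees' predicted prefixes."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]


def predict_tariff_code(trees: List[Tree], experts: Dict[str, ExpertNode],
                        x: Sequence[float], props: Dict[str, float],
                        n: int = 10, j: int = 4) -> str:
    prefix = forest_predict(trees, x)          # stage 1: forest predicts the prefix
    suffix = experts[prefix].traverse(props)   # stage 2: expert tree for that prefix
    if len(prefix) != j or len(prefix) + len(suffix) != n:
        raise ValueError("prefix/suffix lengths do not match n and j")
    return prefix + suffix                     # concatenate into the full code


# Hypothetical demo: three hand-written "trees" and two expert trees.
trees = [lambda x: "3901" if x[0] > 0.5 else "3907",
         lambda x: "3901" if x[1] > 0.3 else "3907",
         lambda x: "3907"]
experts = {
    "3901": ExpertNode(prop="density", threshold=0.94,
                       low=ExpertNode(suffix="100000"),
                       high=ExpertNode(suffix="200000")),
    "3907": ExpertNode(suffix="990000"),
}
name, share = dominant_proportion({"polyethylene": 85.0, "additive": 15.0})
x = [share, 0.4]  # toy feature vector: proportion plus one similarity score
print(name, predict_tariff_code(trees, experts, x, {"density": 0.96}))
```

Keeping one expert tree per prefix means the suffix model only ever sees products that the forest has already routed to that prefix, which is the division of labour the abstract describes.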

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a memory containing: (i) a hybrid machine learning classifier, wherein the hybrid machine learning classifier uses a random forest of decision tree classifiers to predict a tariff code prefix from an input feature vector, wherein the hybrid machine learning classifier uses a plurality of expert trees, one for each of a plurality of tariff code prefixes, to predict a tariff code suffix from properties related to chemical components associated with the tariff code prefixes, (ii) a plurality of product files, one for each of the tariff code prefixes, that contains information for products associated with the tariff code prefix, and (iii) a plurality of component files, one for each of the tariff code prefixes, that contain names of chemical components associated with the tariff code prefix; and a computing device including one or more processors that perform operations comprising: obtain a new set of chemical components associated with a product; determine, from the new set of chemical components, a dominant chemical component; determine a proportion of the dominant chemical component in comparison to other chemical components in the new set of chemical components; for each respective product file of the plurality of product files, calculate similarity scores for a product name associated with the product by determining edit distances between words used for the product name and words in the respective product file; for each respective component file of the plurality of component files, calculate similarity scores for the new set of chemical components by determining edit distances between words used to describe the new set of chemical components and words in the respective component file; generate a feature vector including elements corresponding to each of the calculated similarity scores for the product name and elements corresponding to each of the calculated similarity scores for the new set of chemical components; obtain a predicted tariff code prefix by applying the random forest of decision tree classifiers to the feature vector; select, from the plurality of expert trees, a particular expert tree associated with the predicted tariff code prefix; obtain a predicted tariff code suffix by traversing the particular expert tree in accordance with the properties related to the new set of chemical components; generate a tariff code for the new set of chemical components by concatenating the predicted tariff code prefix and the predicted tariff code suffix; and automatically labelling the product with the generated tariff code.

2. The system of claim 1, wherein the memory is disposed within a database device that is physically separate from the computing device.

3. The system of claim 1, wherein the tariff code is n digits long, the tariff code prefixes are j digits long, and the tariff code suffix is n-j digits long, wherein j is less than n.

4. The system of claim 3, wherein n is 10 and j is 4.

5. The system of claim 1, wherein the tariff code prefixes are defined by a single chapter of a Harmonized Tariff Schedule Code.

6. The system of claim 1, wherein calculating the similarity scores for the new set of chemical components comprises: determining words related to the new set of chemical components; and removing, from the determined words, punctuation and stopwords.

7. The system of claim 6, wherein calculating the similarity scores for the new set of chemical components further comprises: expanding any acronyms in the determined words.

8. The system of claim 1, wherein at least one of the words in the plurality of component files is represented as a regular expression.

9. The system of claim 1, wherein determining the edit distances for the product name comprises calculating normalized Levenshtein distances between the words used for the product name and words in the respective product file, wherein determining the edit distances for the new set of chemical components comprises calculating normalized Levenshtein distances between the words used to describe the new set of chemical components and names in the respective component file.

10. The system of claim 1, wherein the similarity scores for the product name and the new set of chemical components are calculated based on a sum of the respective edit distances.

11. A computer-implemented method comprising: obtaining, by a computing device, a new set of chemical components associated with a product, wherein the computing device has access to: (i) a hybrid machine learning classifier, wherein the hybrid machine learning classifier uses a random forest of decision tree classifiers to predict a tariff code prefix from an input feature vector, wherein the hybrid machine learning classifier uses a plurality of expert trees, one for each of a plurality of tariff code prefixes, to predict a tariff code suffix from properties related to chemical components associated with the tariff code prefixes, (ii) a plurality of product files, one for each of the tariff code prefixes, that contains information for products associated with the tariff code prefix, and (iii) a plurality of component files, one for each of the tariff code prefixes, that contain names of chemical components associated with the tariff code prefixes; determining, by the computing device and from the new set of chemical components, a dominant chemical component; determining, by the computing device, a proportion of the dominant chemical component in comparison to other chemical components in the new set of chemical components; for each respective product file of the plurality of product files, calculating similarity scores for a product name associated with the product by determining edit distances between words used for the product name and words in the respective product file; for each respective component file of the plurality of component files, calculating, by the computing device, similarity scores for the new set of chemical components by determining edit distances between words used to describe the new set of chemical components; generating, by the computing device, a feature vector including elements corresponding to each of the calculated similarity scores for the product name and elements corresponding to each of the calculated similarity scores for the new set of chemical components from the proportion of the dominant chemical component and the similarity scores; obtaining, by the computing device, a predicted tariff code prefix by applying the random forest of decision tree classifiers to the feature vector; selecting, by the computing device and from the plurality of expert trees, a particular expert tree associated with the predicted tariff code prefix; obtaining, by the computing device, a predicted tariff code suffix by traversing the particular expert tree in accordance with the properties related to the new set of chemical components; generating, by the computing device, a tariff code for the new set of chemical components by concatenating the predicted tariff code prefix and the predicted tariff code suffix; and automatically labelling the product with the generated tariff code.

12. The computer-implemented method of claim 11, wherein the tariff code is n digits long, the tariff code prefixes are j digits long, and the tariff code suffix is n-j digits long, wherein j is less than n.

13. The computer-implemented method of claim 12, wherein n is 10 and j is 4.

14. The computer-implemented method of claim 11, wherein the tariff code prefixes are defined by a single chapter of a Harmonized Tariff Schedule Code.

15
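The text-matching steps in claims 6 to 10 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The stopword list, acronym table, and file contents are hypothetical. Normalization divides the Levenshtein distance by the longer word's length. A regular-expression entry (claim 8) scores distance 0 on a full match and 1 otherwise. Each per-file similarity score is assumed to be the sum, over query words, of one minus the closest normalized distance in that file. Following claim 11, the dominant-component proportion is appended to the feature vector.

```python
import re
import string
from typing import Dict, List, Sequence, Union

STOPWORDS = {"and", "of", "the", "with", "in", "for"}         # hypothetical list
ACRONYMS = {"hdpe": "high density polyethylene",                # hypothetical map
            "pet": "polyethylene terephthalate"}
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

Entry = Union[str, re.Pattern]  # a file word is plain text or a compiled regex


def preprocess(text: str) -> List[str]:
    """Lowercase, replace punctuation with spaces, expand acronyms, drop stopwords."""
    words: List[str] = []
    for token in text.lower().translate(_PUNCT_TO_SPACE).split():
        words.extend(ACRONYMS.get(token, token).split())
    return [w for w in words if w not in STOPWORDS]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalized_distance(word: str, entry: Entry) -> float:
    """Distance in [0, 1]; a regex entry scores 0 on a full match, else 1."""
    if isinstance(entry, re.Pattern):
        return 0.0 if entry.fullmatch(word) else 1.0
    longest = max(len(word), len(entry)) or 1
    return levenshtein(word, entry) / longest


def similarity(words: Sequence[str], file_entries: Sequence[Entry]) -> float:
    """Sum over query words of (1 - closest normalized distance) in one file."""
    return sum(1.0 - min((normalized_distance(w, e) for e in file_entries), default=1.0)
               for w in words)


def feature_vector(product_name: str, components: Sequence[str], proportion: float,
                   product_files: Dict[str, Sequence[Entry]],
                   component_files: Dict[str, Sequence[Entry]]) -> List[float]:
    """One score per product file, one per component file, then the proportion."""
    name_words = preprocess(product_name)
    comp_words = [w for c in components for w in preprocess(c)]
    vec = [similarity(name_words, product_files[p]) for p in sorted(product_files)]
    vec += [similarity(comp_words, component_files[p]) for p in sorted(component_files)]
    vec.append(proportion)  # claim 11: vector also built from the proportion
    return vec


# Hypothetical per-prefix files.
product_files = {"3901": ["polyethylene", "film", "resin"],
                 "3907": ["polyester", "bottle", "resin"]}
component_files = {"3901": ["polyethylene", re.compile(r"ethylene.*copolymer")],
                   "3907": ["terephthalate", "polycarbonate", "polyester"]}
print(feature_vector("HDPE film", ["polyethylene"], 0.85, product_files, component_files))
```

Summing per-word similarities rewards product names that share several words with a prefix's file, while the min over file entries keeps one close match from being diluted by unrelated entries.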

Assignees

Dow Global Technologies Llc, Dow Chemical Co

Inventors

Classifications

  • Machine learning (CPC title)

  • G06N5/01 (primary)

    Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound (CPC title)

  • G06N20/20 (primary)

    Ensemble learning (CPC title)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12299586B2 cover?
It covers a hybrid machine learning classifier that assigns tariff codes to chemical products. A random forest of decision tree classifiers predicts the tariff code prefix from a feature vector built from the dominant chemical component's proportion and text-similarity scores; an expert tree selected for that prefix predicts the suffix from properties of the chemical components; and the prefix and suffix are concatenated into the full tariff code.
Who is the assignee on this patent?
Dow Global Technologies Llc, Dow Chemical Co
What technology area does this patent fall under?
Primary CPC classification G06N5/01. Mapped technology areas include Physics.
When was this patent published?
Published May 13, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).