Feature generation for asset classification
US-2023076569-A1 · Mar 9, 2023 · US
US2022019918A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022019918-A1 |
| Application number | US-202117330073-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 25, 2021 |
| Priority date | Jul 17, 2020 |
| Publication date | Jan 20, 2022 |
| Grant date | — |
Abstract
A pre-trained model trained to predict a measure of expected model performance based at least in part on a feature relevance score associated with a text field data type is generated. A specification of a desired target field for machine learning prediction and one or more text fields storing input content is received. A corresponding feature relevance score for each of the one or more text fields storing the input content is calculated. Based on the corresponding calculated feature relevance scores, a corresponding measure of expected model performance for each of the one or more text fields storing the input content is predicted using the pre-trained model. The predicted measures of expected model performance are provided for use in feature selection among the one or more text fields storing the input content for generating a machine learning model to predict the desired target field.
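The abstract describes a pipeline: score each candidate text field for relevance to the target field, feed that score to a pre-trained model that predicts expected model performance, then use the predictions for feature selection. The sketch below illustrates that flow under loudly stated assumptions. The publication does not specify the model type, so the "pre-trained model" here is a hypothetical one-variable least-squares line (`PerformancePredictor`) fitted on made-up historical pairs, and `token_purity` is a toy relevance score standing in for the TF-IDF or Relief-based scores named in the claims. All function, class, and field names are illustrative, not taken from the patent.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


def token_purity(values: List[str], labels: List[str]) -> float:
    """Toy relevance score (hypothetical): the share of token occurrences that
    fall under each token's most common label. 1.0 means every token points
    to a single label; lower values mean tokens are shared across labels."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for text, label in zip(values, labels):
        for token in text.lower().split():
            counts[token][label] += 1
    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0
    return sum(c.most_common(1)[0][1] for c in counts.values()) / total


@dataclass
class PerformancePredictor:
    """Hypothetical stand-in for the pre-trained model: a least-squares line
    mapping a relevance score to an expected performance gain."""
    slope: float
    intercept: float

    @classmethod
    def fit(cls, scores: Sequence[float], gains: Sequence[float]) -> "PerformancePredictor":
        n = len(scores)
        mean_x = sum(scores) / n
        mean_y = sum(gains) / n
        sxx = sum((x - mean_x) ** 2 for x in scores)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, gains))
        slope = sxy / sxx if sxx else 0.0
        return cls(slope, mean_y - slope * mean_x)

    def predict(self, score: float) -> float:
        return self.slope * score + self.intercept


def rank_text_fields(
    rows: List[Dict[str, str]],
    target_field: str,
    text_fields: Sequence[str],
    relevance_fn: Callable[[List[str], List[str]], float],
    predictor: PerformancePredictor,
) -> List[Tuple[str, float]]:
    """Score each candidate text field, predict its expected performance with
    the pre-trained predictor, and return fields from most to least promising."""
    labels = [row[target_field] for row in rows]
    predictions = []
    for field in text_fields:
        values = [row.get(field, "") for row in rows]
        predictions.append((field, predictor.predict(relevance_fn(values, labels))))
    return sorted(predictions, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative, made-up history of (relevance score, observed gain) pairs.
    predictor = PerformancePredictor.fit([0.2, 0.5, 0.8], [0.0, 0.1, 0.2])
    tickets = [
        {"subject": "password reset", "notes": "user called", "category": "access"},
        {"subject": "printer jam", "notes": "user called", "category": "hardware"},
        {"subject": "reset password", "notes": "user emailed", "category": "access"},
        {"subject": "printer offline", "notes": "user emailed", "category": "hardware"},
    ]
    print(rank_text_fields(tickets, "category", ["notes", "subject"], token_purity, predictor))
```

In this toy data the `subject` field separates the two categories cleanly while `notes` does not, so `subject` ranks first. Returning a sorted list mirrors the ranking step recited in claim 12; the relevance function is passed in so a different scoring method can be swapped in without touching the ranking logic.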
Claims (preview)
What is claimed is:
1. A method, comprising: generating a pre-trained model trained to predict a measure of expected model performance based at least in part on a feature relevance score associated with a text field data type; receiving a specification of a desired target field for machine learning prediction and one or more text fields storing input content; calculating a corresponding feature relevance score for each of the one or more text fields storing the input content; based on the corresponding calculated feature relevance scores, predicting a corresponding measure of expected model performance for each of the one or more text fields storing the input content using the pre-trained model; and providing the predicted measures of expected model performance for use in feature selection among the one or more text fields storing the input content for generating a machine learning model to predict the desired target field.
2. The method of claim 1, wherein calculating the corresponding feature relevance score for each of the one or more text fields storing the input content includes determining a statistical measurement for each of the one or more text fields.
3. The method of claim 2, wherein the statistical measurement is based at least in part on a term frequency-inverse document frequency (TF-IDF) metric.
4. The method of claim 1, wherein calculating the corresponding feature relevance score for each of the one or more text fields storing the input content includes generating one or more sample data sets of each of the one or more text fields storing input content.
5. The method of claim 4, wherein the one or more generated sample data sets of each of the one or more text fields storing input content are stratified samples.
6. The method of claim 4, further comprising determining a relevance score for each of the one or more generated sample data sets.
7. The method of claim 1, wherein calculating the corresponding feature relevance score for each of the one or more text fields includes averaging for each of the one or more text fields one or more sampled relevance scores.
8. The method of claim 1, wherein predicting the corresponding measure of the expected model performance for each of the one or more text fields storing the input content using the pre-trained model includes applying the pre-trained model to one or more information metrics for each of the one or more text fields.
9. The method of claim 8, wherein the one or more information metrics includes a text field density metric.
10. The method of claim 1, wherein the calculated feature relevance score for each of the one or more text fields storing the input content is a weighted and normalized relief score.
11. The method of claim 1, wherein the corresponding measure of expected model performance for each of the one or more text fields storing the input content is based on an increased amount of an area under a precision-recall curve associated with the machine learning model as compared to a baseline model to predict the desired target field.
12. The method of claim 1, further comprising ranking the one or more text fields storing the input content based on the predicted measures of expected model performance for use in the feature selection for generating the machine learning model to predict the desired target field.
13. The method of claim 1, wherein the one or more text fields storing the input content include text gathered from an input text field, an email subject, an email body, or a chat dialogue.
14. A system, comprising: one or more processors; and memory coupled to the one or more processors, wherein the memory is configured to provide the one or more processors with instructions which when executed cause the one or more processors to: generate a pre-trained model trained to predict a measure of expected model performance based at least in part on a feature relevance score associated with a text field data type; receive a specification of a desired target field for machine learning prediction and one or more text fields storing input content; calculate a corresponding feature relevance score for each of the one or more text fields storing the input content; based on the corresponding calculated feature relevance scores, predict a corresponding measure of expected model performance for each of the one or more text fields storing the input content using the pre-trained model; and provide the predicted measures of expected model performance for use in feature selection among the one or more text fields storing the input content for generating a machine learning model to predict the desired target field.
15. The system of claim 14, wherein causing the one or more processors to calculate the corresponding feature relevance score for each of the one or more text fields storing the input content includes causing the one or more processors to determine a statistical measurement for each of the one or more text fields, and wherein the statistical measurement is based at least in part on a term frequency-inverse document frequency (TF-IDF) metric.
16. The system of claim 14, wherein the memory is further configured to provide the one or more processors with instructions which when executed cause the one or more processors to: generate one or more sample data sets of each of the one or more text fields storing input content; determine a sampled relevance score for each of the one or more generated sample data sets; and for each of the one or more text fields, average one or more determined sampled relevance scores.
17. The system of claim 14, wherein causing the one or more processors to predict the corresponding measure of the expected model performance for each of the one or more text fields storing the input content using the pre-trained model includes causing the one or more processors to apply the pre-trained model to one or more information metrics for each of the one or more text fields, and wherein the one or more information metrics includes a text field density metric.
18. The system of claim 14, wherein the calculated feature relevance score for each of the one or more text fields storing the input content is a weighted and normalized relief score.
19. The system of claim 14, wherein the corresponding measure of expected model performance for each of the one or more text fields storing the input content is based on an increased amount of an area under a precision-recall curve associated with the machine learning model as compared to a baseline model to predict the desired target field.
20. A computer program product, the computer program product being embodied in a non-transitory computer readable medium and comprising computer instructions for: generating a pre-trained model trained to predict a measure of expected model performance based at least in part on a feature relevance score associated with a text field data type; receiving a specification of a desired target field for machine learning prediction and one or more text fields storing input content; calculating a corresponding feature relevance score for each of the one or more text fields storing the input content; based on the corresponding calculated feature relevance scores, predicting a corresponding measure of expected model performance for each of the one or more text fields storing the input content using the pre-trained model; and providing the predicted measures of expected model p
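Several dependent claims name building blocks without giving formulas: a TF-IDF-based statistical measurement (claims 3 and 15), relevance scores computed on stratified samples and then averaged (claims 4 to 7 and 16), a text field density metric (claims 9 and 17), and performance measured as the increase in area under the precision-recall curve over a baseline model (claims 11 and 19). The sketch below shows one plausible reading of each, under stated assumptions. The exact TF-IDF formulation (each label's text treated as one "document"), the density definition (share of non-blank records), and the use of average precision as the AUPRC estimator are all hypothetical choices, as are every function name. The weighted and normalized Relief score of claims 10 and 18 is not sketched.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def tfidf_relevance(texts: Sequence[str], labels: Sequence[str]) -> float:
    """Hypothetical TF-IDF-based relevance score in [0, 1].

    Each label's concatenated text is one "document". A term seen under every
    label gets IDF log(1) = 0, so only label-specific terms add weight. The
    weighted total is normalized by token count and by log(number of labels)."""
    class_tf: Dict[str, Counter] = defaultdict(Counter)
    for text, label in zip(texts, labels):
        class_tf[label].update(text.lower().split())
    n_classes = len(class_tf)
    total_tokens = sum(sum(tf.values()) for tf in class_tf.values())
    if n_classes < 2 or total_tokens == 0:
        return 0.0
    df = Counter(term for tf in class_tf.values() for term in tf)  # labels containing each term
    weighted = sum(
        count * math.log(n_classes / df[term])
        for tf in class_tf.values()
        for term, count in tf.items()
    )
    return weighted / (total_tokens * math.log(n_classes))


def stratified_sample(rows: List[dict], target_field: str, fraction: float,
                      rng: random.Random) -> List[dict]:
    """Draw the same fraction of rows from every label (at least one each)."""
    by_label: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        by_label[row[target_field]].append(row)
    sample: List[dict] = []
    for label_rows in by_label.values():
        k = max(1, round(len(label_rows) * fraction))
        sample.extend(rng.sample(label_rows, k))
    return sample


def sampled_relevance(rows: List[dict], field: str, target_field: str,
                      n_samples: int = 5, fraction: float = 0.5, seed: int = 0) -> float:
    """Average the relevance score of one text field over several stratified samples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        sample = stratified_sample(rows, target_field, fraction, rng)
        scores.append(tfidf_relevance([r.get(field, "") for r in sample],
                                      [r[target_field] for r in sample]))
    return sum(scores) / len(scores)


def text_field_density(values: Sequence[str]) -> float:
    """Hypothetical density metric: share of records whose text is non-blank."""
    return sum(1 for v in values if v.strip()) / len(values) if values else 0.0


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a step-wise estimate of the area under the
    precision-recall curve. Tied scores keep input order (a simplification)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def auprc_lift(y_true: Sequence[int], model_scores: Sequence[float],
               baseline_scores: Sequence[float]) -> float:
    """Increase in average precision of a candidate model over a baseline model."""
    return average_precision(y_true, model_scores) - average_precision(y_true, baseline_scores)
```

Averaging over several stratified samples keeps every label represented in each draw, so a rare label cannot silently drop out of a single sample and distort the score; the fixed `seed` makes the averaged score reproducible.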