Machine learning feature recommendation

US2022019918A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2022019918-A1
Application number: US-202117330073-A
Country: US
Kind code: A1
Filing date: May 25, 2021
Priority date: Jul 17, 2020
Publication date: Jan 20, 2022
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A pre-trained model trained to predict a measure of expected model performance based at least in part on a feature relevance score associated with a text field data type is generated. A specification of a desired target field for machine learning prediction and one or more text fields storing input content is received. A corresponding feature relevance score for each of the one or more text fields storing the input content is calculated. Based on the corresponding calculated feature relevance scores, a corresponding measure of expected model performance for each of the one or more text fields storing the input content is predicted using the pre-trained model. The predicted measures of expected model performance are provided for use in feature selection among the one or more text fields storing the input content for generating a machine learning model to predict the desired target field.
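The flow in the abstract can be sketched roughly as follows: each candidate text field is reduced to a few metrics, a pre-trained model maps those metrics to an expected performance figure, and the fields are ordered by that figure for feature selection. This is only an illustration. `FieldMetrics`, `toy_pretrained_model`, `rank_fields`, the linear weights, the field names, and the reading of "density" as the fraction of non-empty rows are all assumptions made for the example, not details taken from the publication.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FieldMetrics:
    """Per-field inputs to the meta-model (hypothetical structure)."""
    relevance: float  # feature relevance score, assumed to lie in [0, 1]
    density: float    # assumed meaning: fraction of rows where the field is non-empty


def toy_pretrained_model(metrics: FieldMetrics) -> float:
    """Stand-in for the pre-trained model: a fixed linear rule, not a fitted model."""
    return 0.6 * metrics.relevance + 0.1 * metrics.density


def rank_fields(
    field_metrics: Dict[str, FieldMetrics],
    model: Callable[[FieldMetrics], float],
) -> List[Tuple[str, float]]:
    """Predict expected performance per field and sort best-first."""
    predicted = {name: model(m) for name, m in field_metrics.items()}
    return sorted(predicted.items(), key=lambda item: item[1], reverse=True)


# Made-up metrics for three candidate text fields.
field_metrics = {
    "email_body": FieldMetrics(relevance=0.5, density=0.9),
    "short_description": FieldMetrics(relevance=0.8, density=1.0),
    "chat_dialogue": FieldMetrics(relevance=0.2, density=0.4),
}

ranking = rank_fields(field_metrics, toy_pretrained_model)
for name, expected in ranking:
    print(f"{name}: {expected:.2f}")
```

In a real system the toy rule would be replaced by a model trained beforehand on many data sets, so that a new field's metrics can be scored without training a full model per field.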

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: generating a pre-trained model trained to predict a measure of expected model performance based at least in part on a feature relevance score associated with a text field data type; receiving a specification of a desired target field for machine learning prediction and one or more text fields storing input content; calculating a corresponding feature relevance score for each of the one or more text fields storing the input content; based on the corresponding calculated feature relevance scores, predicting a corresponding measure of expected model performance for each of the one or more text fields storing the input content using the pre-trained model; and providing the predicted measures of expected model performance for use in feature selection among the one or more text fields storing the input content for generating a machine learning model to predict the desired target field.

2. The method of claim 1, wherein calculating the corresponding feature relevance score for each of the one or more text fields storing the input content includes determining a statistical measurement for each of the one or more text fields.

3. The method of claim 2, wherein the statistical measurement is based at least in part on a term frequency-inverse document frequency (TF-IDF) metric.

4. The method of claim 1, wherein calculating the corresponding feature relevance score for each of the one or more text fields storing the input content includes generating one or more sample data sets of each of the one or more text fields storing input content.

5. The method of claim 4, wherein the one or more generated sample data sets of each of the one or more text fields storing input content are stratified samples.

6. The method of claim 4, further comprising determining a relevance score for each of the one or more generated sample data sets.

7. The method of claim 1, wherein calculating the corresponding feature relevance score for each of the one or more text fields includes averaging for each of the one or more text fields one or more sampled relevance scores.

8. The method of claim 1, wherein predicting the corresponding measure of the expected model performance for each of the one or more text fields storing the input content using the pre-trained model includes applying the pre-trained model to one or more information metrics for each of the one or more text fields.

9. The method of claim 8, wherein the one or more information metrics includes a text field density metric.

10. The method of claim 1, wherein the calculated feature relevance score for each of the one or more text fields storing the input content is a weighted and normalized relief score.

11. The method of claim 1, wherein the corresponding measure of expected model performance for each of the one or more text fields storing the input content is based on an increased amount of an area under a precision-recall curve associated with the machine learning model as compared to a baseline model to predict the desired target field.

12. The method of claim 1, further comprising ranking the one or more text fields storing the input content based on the predicted measures of expected model performance for use in the feature selection for generating the machine learning model to predict the desired target field.

13. The method of claim 1, wherein the one or more text fields storing the input content include text gathered from an input text field, an email subject, an email body, or a chat dialogue.

14. A system, comprising: one or more processors; and memory coupled to the one or more processors, wherein the memory is configured to provide the one or more processors with instructions which when executed cause the one or more processors to: generate a pre-trained model trained to predict a measure of expected model performance based at least in part on a feature relevance score associated with a text field data type; receive a specification of a desired target field for machine learning prediction and one or more text fields storing input content; calculate a corresponding feature relevance score for each of the one or more text fields storing the input content; based on the corresponding calculated feature relevance scores, predict a corresponding measure of expected model performance for each of the one or more text fields storing the input content using the pre-trained model; and provide the predicted measures of expected model performance for use in feature selection among the one or more text fields storing the input content for generating a machine learning model to predict the desired target field.

15. The system of claim 14, wherein causing the one or more processors to calculate the corresponding feature relevance score for each of the one or more text fields storing the input content includes causing the one or more processors to determine a statistical measurement for each of the one or more text fields, and wherein the statistical measurement is based at least in part on a term frequency-inverse document frequency (TF-IDF) metric.

16. The system of claim 14, wherein the memory is further configured to provide the one or more processors with instructions which when executed cause the one or more processors to: generate one or more sample data sets of each of the one or more text fields storing input content; determine a sampled relevance score for each of the one or more generated sample data sets; and for each of the one or more text fields, average one or more determined sampled relevance scores.

17. The system of claim 14, wherein causing the one or more processors to predict the corresponding measure of the expected model performance for each of the one or more text fields storing the input content using the pre-trained model includes causing the one or more processors to apply the pre-trained model to one or more information metrics for each of the one or more text fields, and wherein the one or more information metrics includes a text field density metric.

18. The system of claim 14, wherein the calculated feature relevance score for each of the one or more text fields storing the input content is a weighted and normalized relief score.

19. The system of claim 14, wherein the corresponding measure of expected model performance for each of the one or more text fields storing the input content is based on an increased amount of an area under a precision-recall curve associated with the machine learning model as compared to a baseline model to predict the desired target field.

20. A computer program product, the computer program product being embodied in a non-transitory computer readable medium and comprising computer instructions for: generating a pre-trained model trained to predict a measure of expected model performance based at least in part on a feature relevance score associated with a text field data type; receiving a specification of a desired target field for machine learning prediction and one or more text fields storing input content; calculating a corresponding feature relevance score for each of the one or more text fields storing the input content; based on the corresponding calculated feature relevance scores, predicting a corresponding measure of expected model performance for each of the one or more text fields storing the input content using the pre-trained model; and providing the predicted measures of expected model p
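The claims mention three ingredients for the relevance score: a TF-IDF-based statistical measurement, stratified sample data sets, and averaging of the sampled scores. They do not give a formula. The sketch below shows one way these pieces could fit together, using only the Python standard library. The scoring rule in `sample_relevance` (the share of TF-IDF weight carried by terms seen under a single target class) is an invented stand-in, and the function names, sample fraction, sample count, and ticket texts are likewise assumptions for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def stratified_sample(rows, fraction, rng):
    """Draw the same fraction of (text, label) rows from every target class."""
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    sample = []
    for label in sorted(by_label):
        group = by_label[label]
        k = max(1, round(fraction * len(group)))
        sample.extend(rng.sample(group, k))
    return sample


def sample_relevance(sample):
    """Hypothetical score: share of TF-IDF weight on class-exclusive terms."""
    vectors = tfidf_vectors([text for text, _ in sample])
    classes_for_term = defaultdict(set)
    for vector, (_, label) in zip(vectors, sample):
        for term in vector:
            classes_for_term[term].add(label)
    total = sum(w for vector in vectors for w in vector.values())
    if total == 0:
        return 0.0
    exclusive = sum(
        w for vector in vectors for term, w in vector.items()
        if len(classes_for_term[term]) == 1
    )
    return exclusive / total


def field_relevance(rows, n_samples=5, fraction=0.5, seed=0):
    """Average the sampled relevance scores over several stratified samples."""
    rng = random.Random(seed)
    scores = [
        sample_relevance(stratified_sample(rows, fraction, rng))
        for _ in range(n_samples)
    ]
    return sum(scores) / len(scores)


labels = ["hardware"] * 4 + ["access"] * 4
informative_texts = [
    "printer jam error", "printer offline", "laptop screen broken",
    "keyboard broken", "password reset", "account locked",
    "login denied", "vpn login denied",
]
informative_rows = list(zip(informative_texts, labels))
noise_rows = [("please help", label) for label in labels]

informative_score = field_relevance(informative_rows)
noise_score = field_relevance(noise_rows)
print(informative_score, noise_score)
```

Sampling keeps the cost of scoring bounded on large tables, and stratifying by the target field keeps rare classes represented in every sample. Averaging then smooths out sample-to-sample variation.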

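Claims 11 and 19 tie the performance measure to an increase in area under the precision-recall curve relative to a baseline model. As a rough illustration, the sketch below approximates that area with the average-precision sum and reports the difference between a candidate model and a baseline. The labels and scores are made up, the function names are hypothetical, tied scores are not given special treatment, and the claims do not say how the area is actually computed.

```python
def average_precision(y_true, scores):
    """Approximate the area under the precision-recall curve.

    Rows are ranked by descending score, and precision is accumulated at
    the rank of each positive label (the average-precision sum).
    """
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / positives


def auprc_gain(y_true, baseline_scores, candidate_scores):
    """Increase in approximate AUPRC of the candidate over the baseline."""
    return (average_precision(y_true, candidate_scores)
            - average_precision(y_true, baseline_scores))


# Made-up binary labels and model scores for six rows.
y_true = [1, 0, 1, 0, 0, 1]
baseline_scores = [0.4, 0.5, 0.3, 0.6, 0.2, 0.1]   # e.g. model without the text field
candidate_scores = [0.9, 0.2, 0.8, 0.3, 0.1, 0.7]  # e.g. model that uses the text field

gain = auprc_gain(y_true, baseline_scores, candidate_scores)
print(f"AUPRC gain: {gain:.3f}")
```

A positive gain suggests the text field adds predictive signal beyond the baseline, while a gain near zero suggests the field may not be worth including.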
Assignees

Servicenow Inc

Inventors

Classifications

  • G06N20/00 · Primary

    Machine learning · CPC title

  • G06N5/04 · Primary

    Inference or reasoning models · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022019918A1 cover?
A pre-trained model trained to predict a measure of expected model performance based at least in part on a feature relevance score associated with a text field data type is generated. A specification of a desired target field for machine learning prediction and one or more text fields storing input content is received. A corresponding feature relevance score for each of the one or more text fie…
Who is the assignee on this patent?
Servicenow Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Published on Jan 20, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).