Framework for Managing Natural Language Processing Tools

US2025315613A1 · US · A1

Patent metadata
Publication number: US-2025315613-A1
Application number: US-202519085100-A
Country: US
Kind code: A1
Filing date: Mar 20, 2025
Priority date: Dec 31, 2019
Publication date: Oct 9, 2025
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system performs operations that include receiving, via a first computing environment, a request to process text data using a first natural language processing (NLP) model. The operations further include accessing configuration data associated with the NLP model, where the configuration data is generated using a domain specific language that supports a plurality of preprocessing modules in a plurality of programming languages. The operations also include selecting, based on the configuration data, one or more preprocessing modules of the plurality of preprocessing modules, generating, based on the configuration data, a preprocessing pipeline using the one or more preprocessing modules, and generating preprocessed text data by inputting the text data into the preprocessing pipeline. The preprocessed text data is provided to the first NLP model.
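
As a rough illustration of the configuration-driven flow the abstract describes, the sketch below selects the modules named in a configuration and chains them into a pipeline. The configuration schema, the `REGISTRY` mapping, and the three module functions are assumptions made for this example. The patent's domain specific language and its modules written in several programming languages are not reproduced here; everything is kept in Python for brevity.

```python
from typing import Callable, Dict, List

# Hypothetical preprocessing modules; names and behavior are illustrative only.
def lowercase(text: str) -> str:
    return text.lower()

def strip_punctuation(text: str) -> str:
    # Keep only letters, digits, and whitespace.
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())

def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())

# Hypothetical registry mapping module names to implementations.
REGISTRY: Dict[str, Callable[[str], str]] = {
    "lowercase": lowercase,
    "strip_punctuation": strip_punctuation,
    "collapse_whitespace": collapse_whitespace,
}

def build_pipeline(config: dict) -> Callable[[str], str]:
    """Select the modules listed in the configuration, in the listed order."""
    steps: List[Callable[[str], str]] = [REGISTRY[name] for name in config["modules"]]

    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return run

# Assumed configuration layout: a model name plus an ordered module list.
config = {
    "model": "example-model",
    "modules": ["lowercase", "strip_punctuation", "collapse_whitespace"],
}

pipeline = build_pipeline(config)
preprocessed = pipeline("  Hello,   World!  ")
print(preprocessed)  # hello world
```

Keeping the module order in the configuration rather than in code means the same description can be used to rebuild the pipeline wherever the model is run.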

First claim

Opening claim text (preview).

1. (canceled)

2. A method, comprising: accessing a request to process text data using a Natural Language Processing (NLP) model; accessing a preprocessing pipeline that comprises at least a subset of preprocessing modules of a plurality of preprocessing modules, wherein the subset of preprocessing modules is implemented using a plurality of software toolkits or libraries that are written in a plurality of different computer programming languages, wherein the preprocessing pipeline is generated in a first computing environment; validating, in a second computing environment different from the first computing environment, the preprocessing pipeline at least in part based on configuration data associated with the NLP model; generating preprocessed text data at least in part by inputting the text data into the validated preprocessing pipeline; and executing the NLP model based on the preprocessed text data.

3. The method of claim 2, wherein: the first computing environment comprises an offline computing environment; and the second computing environment comprises an online computing environment.

4. The method of claim 2, wherein the configuration data is generated using a domain specific language that provides a uniform description of the plurality of different computer programming languages.

5. The method of claim 4, wherein the configuration data specifies which of the preprocessing modules of the plurality of preprocessing modules should be included in the subset.

6. The method of claim 4, wherein the configuration data specifies a sequence in which the subset of preprocessing modules of the preprocessing pipeline should be used to process the text data.

7. The method of claim 2, wherein the preprocessing pipeline is validated without computer code translation.

8. The method of claim 2, wherein the preprocessed text data is in a format that is recognizable by the NLP model.

9. The method of claim 2, wherein the validating comprises verifying that a result produced by the preprocessing pipeline in the second computing environment is consistent with a result produced by the preprocessing pipeline in the first computing environment.

10. The method of claim 2, wherein the subset of preprocessing modules comprises one or more of: an input module configured to receive the text data as an input; a language detection module configured to determine a language of the text data; a sentence detection module configured to identify one or more sentences within the text data; a tokenization module configured to generate one or more tokens from the text data; a cleaning module configured to filter out at least a subset of the one or more tokens generated by the tokenization module; an annotation module configured to categorize the text data into a plurality of different categories; a normalization module configured to normalize the text data into values in a desired value range; and an embedding module configured to convert the one or more tokens generated by the tokenization module into a format that is useable by the NLP model.

11. The method of claim 2, wherein before the accessing the request to process the text data, the NLP model is trained in the first computing environment based on the preprocessing pipeline.

12. The method of claim 11, wherein the NLP model is further trained based on model architecture information.

13. The method of claim 2, wherein one or more of the accessing the request, the accessing the preprocessing pipeline, the validating, the generating, and the executing is performed by one or more hardware processors of a service provider.

14. A system, comprising: one or more hardware processors; and a memory storing computer-executable instructions, that in response to execution by the one or more hardware processors, causes the system to perform operations comprising: receiving a request to process text data using a Natural Language Processing (NLP) model; accessing configuration data associated with the NLP model, the configuration data describing a pipeline formed by a plurality of preprocessing modules, wherein the plurality of preprocessing modules is implemented using a plurality of software toolkits or libraries that are written in a plurality of different computer programming languages; validating the pipeline at least in part based on the configuration data, wherein the validating comprises verifying that the pipeline generates same results in an offline computing environment and in an online computing environment; generating preprocessed text data at least in part by preprocessing the text data via the validated pipeline; and providing the preprocessed text data to the NLP model.

15. The system of claim 14, wherein the configuration data is generated using a domain specific language that provides a uniform description of the plurality of different computer programming languages.

16. The system of claim 14, wherein the validating is performed without translating computer code.

17. The system of claim 14, wherein: the configuration data specifies a sequence in which the plurality of preprocessing modules are arranged into the pipeline; and the text data is preprocessed by the pipeline according to the sequence specified by the configuration data.

18. The system of claim 14, wherein the plurality of preprocessing modules comprises one or more of: an input module configured to receive the text data as an input; a language detection module configured to determine a language of the text data; a sentence detection module configured to identify one or more sentences within the text data; a tokenization module configured to generate one or more tokens from the text data; a cleaning module configured to filter out at least a subset of the one or more tokens generated by the tokenization module; an annotation module configured to categorize the text data into a plurality of different categories; a normalization module configured to normalize the text data into values in a desired value range; and an embedding module configured to convert the one or more tokens generated by the tokenization module into a format that is useable by the NLP model.

19. A non-transitory computer readable medium storing computer-executable instructions that in response to execution by one or more hardware processors, causes a service provider system to perform operations comprising: accessing a request to process text data using a Natural Language Processing (NLP) model; accessing a preprocessing pipeline that comprises a plurality of preprocessing modules that are arranged in a manner specified by configuration data associated with the NLP model, wherein the plurality of preprocessing modules is implemented using a plurality of software toolkits or libraries that are written in a plurality of different computer programming languages; verifying that the preprocessing pipeline is capable of generating same results in an offline computing environment and in an online computing environment; generating, after the verifying, preprocessed text data at least in part by feeding the text data into the preprocessing pipeline; and causing the NLP model to process the preprocessed text data via the NLP model.

20. The non-transitory computer readable medium of claim 19, wherein the configuration data is generated using a domain specific language that provides a uniform description of the plurality of different computer programming languages.
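
Claims 9, 14, and 19 describe validation as checking that the pipeline produces the same results in the offline and online environments. One way to picture that check, as a minimal sketch only, is to run both versions of a pipeline over a set of sample texts and report any differences. The function names, the sample texts, and the two stand-in pipelines below are hypothetical; the claims do not specify how the comparison is carried out.

```python
from typing import Callable, Iterable, List, Tuple

Pipeline = Callable[[str], List[str]]

# Stand-in for the pipeline as built in the offline (training) environment.
def offline_pipeline(text: str) -> List[str]:
    return text.lower().split()

# Stand-in for the same pipeline as deployed online, written differently.
def online_pipeline(text: str) -> List[str]:
    return [token.lower() for token in text.split()]

# Deliberately inconsistent variant: it skips lowercasing.
def broken_pipeline(text: str) -> List[str]:
    return text.split()

def find_mismatches(
    reference: Pipeline, candidate: Pipeline, samples: Iterable[str]
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (sample, reference output, candidate output) for each disagreement."""
    mismatches = []
    for sample in samples:
        expected = reference(sample)
        actual = candidate(sample)
        if expected != actual:
            mismatches.append((sample, expected, actual))
    return mismatches

def is_consistent(reference: Pipeline, candidate: Pipeline, samples: Iterable[str]) -> bool:
    """True when both pipelines agree on every sample."""
    return not find_mismatches(reference, candidate, samples)

samples = ["Hello World", "NLP models need CLEAN input"]

print(is_consistent(offline_pipeline, online_pipeline, samples))  # True
print(is_consistent(offline_pipeline, broken_pipeline, samples))  # False
```

A check of this kind only covers the samples it is given, so the choice of sample texts determines how much confidence a passing result provides.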

Assignees

Paypal Inc

Inventors

Classifications

  • Software reuse · CPC title

  • model driven · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Phrasal analysis, e.g. finite state techniques or chunking · CPC title

  • Language identification · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025315613A1 cover?
A system performs operations that include receiving, via a first computing environment, a request to process text data using a first natural language processing (NLP) model. The operations further include accessing configuration data associated with the NLP model, where the configuration data is generated using a domain specific language that supports a plurality of preprocessing modules in a plural…
Who is the assignee on this patent?
Paypal Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 9, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).