Framework for managing natural language processing tools

US12282736B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12282736-B2
Application number  US-202117361073-A
Country             US
Kind code           B2
Filing date         Jun 28, 2021
Priority date       Dec 31, 2019
Publication date    Apr 22, 2025
Grant date          Apr 22, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system performs operations that include receiving, via first computing environment, a request to process text data using a first natural language processing (NLP) model. The operations further include accessing configuration data associated with the NLP model, where the configuration data generated using a domain specific language that supports a plurality of preprocessing modules in a plurality of programming languages. The operations also include selecting, based on the configuration data, one or more preprocessing modules of the plurality of preprocessing modules, generating, based on the configuration data, a preprocessing pipeline using the one or more preprocessing modules, and generating preprocessed text data by inputting the text data into the preprocessing pipeline. The preprocessed text data is provided to the first NLP model.
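
The flow in the abstract (look up configuration for a model, select modules, assemble a pipeline, preprocess the text, hand the result to the model) can be illustrated with a minimal sketch. Everything named here is hypothetical: the `REGISTRY` dictionary, the `CONFIG` layout, the three toy modules, and `toy_model` are stand-ins chosen for illustration. The patent's actual domain specific language and module implementations are not shown on this page, so a plain Python dictionary is assumed in place of the configuration data.

```python
from typing import Callable, Dict, List

# Hypothetical preprocessing modules (illustrative only, not from the patent).
def normalize(text: str) -> str:
    """Lowercase the text."""
    return text.lower()

def clean(text: str) -> str:
    """Drop every character that is not alphanumeric or whitespace."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())

def tokenize(text: str) -> List[str]:
    """Split on whitespace."""
    return text.split()

# Hypothetical registry mapping module names to implementations.
REGISTRY: Dict[str, Callable] = {
    "normalize": normalize,
    "clean": clean,
    "tokenize": tokenize,
}

# Assumed stand-in for configuration data associated with one NLP model.
CONFIG = {"model": "toy_model", "pipeline": ["normalize", "clean", "tokenize"]}

def build_pipeline(config: dict) -> Callable:
    """Select the modules named in the config and chain them in order."""
    steps = [REGISTRY[name] for name in config["pipeline"]]

    def run(data):
        for step in steps:
            data = step(data)
        return data

    return run

def toy_model(tokens: List[str]) -> int:
    """Placeholder for an NLP model: it just counts tokens."""
    return len(tokens)

pipeline = build_pipeline(CONFIG)
preprocessed = pipeline("Hello, World! NLP is fun.")
prediction = toy_model(preprocessed)
```

Under these assumptions, changing the module list in `CONFIG` changes the pipeline without touching the module code, which is the separation the abstract describes.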

First claim

Opening claim text (preview).

The invention claimed is:

1. A system, comprising: one or more hardware processors; and a memory storing computer-executable instructions, that in response to execution by the one or more hardware processors, causes the system to perform operations comprising: generating, in an offline computing environment and using a domain specific language that provides a uniform way to designate a plurality of computer programming languages usable to create preprocessing modules, first model configuration data associated with a first natural language processing (NLP) model and second model configuration data associated with a second NLP model, wherein the first model configuration data or the second model configuration data is usable to: 1) Define a preprocessing pipeline that includes a subset of the preprocessing modules; and 2) indicate one or more software toolkits or libraries usable to implement each of the preprocessing modules; receiving, via an online computing environment that is different than the offline computing environment, a first request to process first data using the first NLP model and receiving a second request to process second data using the second NLP model; accessing the first model configuration data and the second model configuration data; responsive to the first request, executing the first NLP model at least in part by determining, based on the first model configuration data, a first set of preprocessing modules from the preprocessing modules to preprocess the first data; and responsive to the second request, executing the second NLP model at least in part by determining, based on the second model configuration data, a second set of preprocessing modules from the preprocessing modules to preprocess the second data, the second set of preprocessing modules different than the first set of preprocessing modules.

2. The system of claim 1, wherein the first set of preprocessing modules are implemented using the one or more software toolkits or libraries of a first computer programming language of the plurality of computer programming languages, and the second set of preprocessing modules are implemented using the one or more software toolkits or libraries of a second computer programming language of the plurality of computer programming languages that is different than the first computer programming language.

3. The system of claim 1, wherein the first set of preprocessing modules and the second set of preprocessing modules are different subsets of the preprocessing modules, the preprocessing modules comprising at least one of a language detection module, a sentence detection module, a tokenization module, a cleaning module, an annotation module, a normalization module, or an embedding module.

4. The system of claim 1, wherein the preprocessing pipeline includes a first preprocessing pipeline and a second preprocessing pipeline, wherein the first model configuration data defines the first preprocessing pipeline that specifies a first execution sequence of the first set of preprocessing modules, and the second model configuration data defines the second preprocessing pipeline that specifies a second execution sequence of the second set of preprocessing modules.

5. The system of claim 4, wherein the first set of preprocessing modules includes the same modules as the second set of preprocessing modules, and wherein the first execution sequence is different from the second execution sequence.

6. The system of claim 1, wherein the executing the first set of preprocessing modules for the first NLP model further comprises inputting the first model configuration data into a software engine that is programmed in the domain specific language.

7. The system of claim 6, wherein the operations further comprise: automatically validating, by the software engine programmed in the domain specific language, the first set of preprocessing modules with sample data.

8. The system of claim 1, wherein the online computing environment comprises at least one different computing characteristic than the offline computing environment, the at least one different computing characteristic including at least one of different computer hardware, a different operating system, a different software version for one or more software libraries, a different database topography, or access to different data.

9. The system of claim 1, wherein the operations further comprise: training the first NLP model and the second NLP model in the offline computing environment.

10. A method, comprising: receiving, via a first computing environment, a request to process text data using a first natural language processing (NLP) model; accessing configuration data associated with the NLP model, the configuration data generated using a domain specific language that supports a plurality of preprocessing modules in a plurality of computer programming languages, wherein the domain specific language provides a uniform description of the plurality of computer programming languages, wherein the configuration data indicates one or more of the plurality of preprocessing modules to be included as a part of a preprocessing pipeline, and one or more specific software toolkits or libraries associated with a first computer programming language of the plurality of computer programming languages usable to generate the preprocessing pipeline; generating, based on the configuration data, the preprocessing pipeline using the one or more of the plurality of preprocessing modules; generating preprocessed text data at least in part by inputting the text data into the preprocessing pipeline; and providing the preprocessed text data to the first NLP model.

11. The method of claim 10, wherein the preprocessing pipeline defines a sequence of the one or more of the plurality of preprocessing modules.

12. The method of claim 10, wherein the one or more of the plurality of preprocessing modules comprise at least one of a language detection module, a sentence detection module, a tokenization module, a cleaning module, an annotation module, a normalization module, or an embedding module.

13. The method of claim 10, wherein the configuration data is generated in a second computing environment that is different from the first computing environment.

14. The method of claim 13, wherein at least one of the first computing environment or the second computing environment includes a processing engine configured to process the configuration data in the domain specific language.

15. The method of claim 10, wherein the preprocessing pipeline comprises a first preprocessing pipeline, and wherein the method further comprises: generating a second preprocessing pipeline for a second NLP model based on second configuration data, the second preprocessing pipeline including a different set of the one or more preprocessing modules that are programmed in a second computer programming language of the plurality of computer programming languages, wherein the second preprocessing pipeline includes at least one different type of preprocessing module than the first preprocessing pipeline.

16. The method of claim 15, wherein the domain specific language provides a uniform way to describe or designate the one or more specific software toolkits or libraries usable to generate the preprocessing pipeline.

17. A non-transitory computer readable medium storing computer-executable instructions that in response to execution by one or more hardware processors, causes a service provider system to perform operations comprising: receiving, via a first computing environ
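
Claims 1, 4, 5, and 7 describe per-model configuration data that selects and orders preprocessing modules, plus validation against sample data. The sketch below illustrates those ideas under stated assumptions. The configuration layout, the module names, the `language` and `toolkit` fields, and the `validate` helper are all hypothetical; the fields only record a designation, and no cross-language dispatch is attempted here. It is not the claimed engine or the patent's domain specific language.

```python
from typing import Callable, Dict

# Hypothetical module implementations (illustrative only).
def normalize(text: str) -> str:
    """Lowercase the text."""
    return text.lower()

def clean(text: str) -> str:
    """Remove the stopword "the" (case-sensitive match)."""
    return " ".join(word for word in text.split() if word != "the")

REGISTRY: Dict[str, Callable[[str], str]] = {
    "normalize": normalize,
    "clean": clean,
}

# Assumed per-model configuration data: same modules, different sequences.
# "language" and "toolkit" are recorded as metadata only in this sketch.
MODEL_CONFIGS = {
    "model_a": [
        {"module": "normalize", "language": "python", "toolkit": "stdlib"},
        {"module": "clean", "language": "python", "toolkit": "stdlib"},
    ],
    "model_b": [
        {"module": "clean", "language": "python", "toolkit": "stdlib"},
        {"module": "normalize", "language": "python", "toolkit": "stdlib"},
    ],
}

def preprocess(model_name: str, text: str) -> str:
    """Run the modules configured for the given model, in configured order."""
    for step in MODEL_CONFIGS[model_name]:
        text = REGISTRY[step["module"]](text)
    return text

def validate(model_name: str, sample: str) -> bool:
    """Return True if the model's configured modules all exist and run on the sample."""
    try:
        return isinstance(preprocess(model_name, sample), str)
    except KeyError:
        return False

out_a = preprocess("model_a", "The cat saw the dog")
out_b = preprocess("model_b", "The cat saw the dog")
```

With this toy input the two orderings give different results: lowercasing first lets the case-sensitive stopword filter catch the leading "The", while filtering first leaves it in place. That is one way an execution sequence can matter even when two pipelines share the same modules.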

Assignees

Paypal Inc

Classifications

  • Software reuse · CPC title

  • model driven · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Phrasal analysis, e.g. finite state techniques or chunking · CPC title

  • Language identification · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12282736B2 cover?
A system performs operations that include receiving, via first computing environment, a request to process text data using a first natural language processing (NLP) model. The operations further include accessing configuration data associated with the NLP model, where the configuration data generated using a domain specific language that supports a plurality of preprocessing modules in a plural…
Who is the assignee on this patent?
Paypal Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 22, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).