Machine learning collaboration techniques
US-2024420212-A1 · Dec 19, 2024 · US
US2025315613A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025315613-A1 |
| Application number | US-202519085100-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 20, 2025 |
| Priority date | Dec 31, 2019 |
| Publication date | Oct 9, 2025 |
| Grant date | — |
Official abstract text for this publication.
A system performs operations that include receiving, via a first computing environment, a request to process text data using a first natural language processing (NLP) model. The operations further include accessing configuration data associated with the NLP model, where the configuration data is generated using a domain specific language that supports a plurality of preprocessing modules in a plurality of programming languages. The operations also include selecting, based on the configuration data, one or more preprocessing modules of the plurality of preprocessing modules, generating, based on the configuration data, a preprocessing pipeline using the one or more preprocessing modules, and generating preprocessed text data by inputting the text data into the preprocessing pipeline. The preprocessed text data is provided to the first NLP model.
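The flow in the abstract (configuration data selects modules, the selected modules form a pipeline, the pipeline output goes to the model) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the dictionary `config` stands in for configuration data that the patent says is written in a domain specific language, the `MODULES` registry and its three functions are hypothetical, and all modules are plain Python rather than toolkits in several programming languages.

```python
from typing import Any, Callable, Dict, List

# Hypothetical preprocessing modules; the names are illustrative only.
def strip_whitespace(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(text.split())

def lowercase(text: str) -> str:
    """Lowercase the text."""
    return text.lower()

def tokenize(text: str) -> List[str]:
    """Split the text into whitespace-delimited tokens."""
    return text.split()

# Hypothetical registry mapping module names to implementations.
MODULES: Dict[str, Callable[[Any], Any]] = {
    "strip_whitespace": strip_whitespace,
    "lowercase": lowercase,
    "tokenize": tokenize,
}

def build_pipeline(config: Dict[str, Any]) -> Callable[[Any], Any]:
    """Select the modules named in the config and chain them in order."""
    steps = [MODULES[name] for name in config["modules"]]

    def run(data: Any) -> Any:
        for step in steps:
            data = step(data)
        return data

    return run

# Assumed stand-in for configuration data associated with an NLP model.
config = {
    "model": "example-nlp-model",
    "modules": ["strip_whitespace", "lowercase", "tokenize"],
}

pipeline = build_pipeline(config)
preprocessed = pipeline("  Hello   World ")
print(preprocessed)
```

In this sketch the order of names in `config["modules"]` determines the order in which the modules run, which mirrors the claim language about configuration data specifying a sequence.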
Opening claim text (preview).
1. (canceled)

2. A method, comprising: accessing a request to process text data using a Natural Language Processing (NLP) model; accessing a preprocessing pipeline that comprises at least a subset of preprocessing modules of a plurality of preprocessing modules, wherein the subset of preprocessing modules is implemented using a plurality of software toolkits or libraries that are written in a plurality of different computer programming languages, wherein the preprocessing pipeline is generated in a first computing environment; validating, in a second computing environment different from the first computing environment, the preprocessing pipeline at least in part based on configuration data associated with the NLP model; generating preprocessed text data at least in part by inputting the text data into the validated preprocessing pipeline; and executing the NLP model based on the preprocessed text data.

3. The method of claim 2, wherein: the first computing environment comprises an offline computing environment; and the second computing environment comprises an online computing environment.

4. The method of claim 2, wherein the configuration data is generated using a domain specific language that provides a uniform description of the plurality of different computer programming languages.

5. The method of claim 4, wherein the configuration data specifies which of the preprocessing modules of the plurality of preprocessing modules should be included in the subset.

6. The method of claim 4, wherein the configuration data specifies a sequence in which the subset of preprocessing modules of the preprocessing pipeline should be used to process the text data.

7. The method of claim 2, wherein the preprocessing pipeline is validated without computer code translation.

8. The method of claim 2, wherein the preprocessed text data is in a format that is recognizable by the NLP model.

9. The method of claim 2, wherein the validating comprises verifying that a result produced by the preprocessing pipeline in the second computing environment is consistent with a result produced by the preprocessing pipeline in the first computing environment.

10. The method of claim 2, wherein the subset of preprocessing modules comprises one or more of: an input module configured to receive the text data as an input; a language detection module configured to determine a language of the text data; a sentence detection module configured to identify one or more sentences within the text data; a tokenization module configured to generate one or more tokens from the text data; a cleaning module configured to filter out at least a subset of the one or more tokens generated by the tokenization module; an annotation module configured to categorize the text data into a plurality of different categories; a normalization module configured to normalize the text data into values in a desired value range; and an embedding module configured to convert the one or more tokens generated by the tokenization module into a format that is useable by the NLP model.

11. The method of claim 2, wherein before the accessing the request to process the text data, the NLP model is trained in the first computing environment based on the preprocessing pipeline.

12. The method of claim 11, wherein the NLP model is further trained based on model architecture information.

13. The method of claim 2, wherein one or more of the accessing the request, the accessing the preprocessing pipeline, the validating, the generating, and the executing is performed by one or more hardware processors of a service provider.

14. A system, comprising: one or more hardware processors; and a memory storing computer-executable instructions, that in response to execution by the one or more hardware processors, causes the system to perform operations comprising: receiving a request to process text data using a Natural Language Processing (NLP) model; accessing configuration data associated with the NLP model, the configuration data describing a pipeline formed by a plurality of preprocessing modules, wherein the plurality of preprocessing modules is implemented using a plurality of software toolkits or libraries that are written in a plurality of different computer programming languages; validating the pipeline at least in part based on the configuration data, wherein the validating comprises verifying that the pipeline generates same results in an offline computing environment and in an online computing environment; generating preprocessed text data at least in part by preprocessing the text data via the validated pipeline; and providing the preprocessed text data to the NLP model.

15. The system of claim 14, wherein the configuration data is generated using a domain specific language that provides a uniform description of the plurality of different computer programming languages.

16. The system of claim 14, wherein the validating is performed without translating computer code.

17. The system of claim 14, wherein: the configuration data specifies a sequence in which the plurality of preprocessing modules are arranged into the pipeline; and the text data is preprocessed by the pipeline according to the sequence specified by the configuration data.

18. The system of claim 14, wherein the plurality of preprocessing modules comprises one or more of: an input module configured to receive the text data as an input; a language detection module configured to determine a language of the text data; a sentence detection module configured to identify one or more sentences within the text data; a tokenization module configured to generate one or more tokens from the text data; a cleaning module configured to filter out at least a subset of the one or more tokens generated by the tokenization module; an annotation module configured to categorize the text data into a plurality of different categories; a normalization module configured to normalize the text data into values in a desired value range; and an embedding module configured to convert the one or more tokens generated by the tokenization module into a format that is useable by the NLP model.

19. A non-transitory computer readable medium storing computer-executable instructions that in response to execution by one or more hardware processors, causes a service provider system to perform operations comprising: accessing a request to process text data using a Natural Language Processing (NLP) model; accessing a preprocessing pipeline that comprises a plurality of preprocessing modules that are arranged in a manner specified by configuration data associated with the NLP model, wherein the plurality of preprocessing modules is implemented using a plurality of software toolkits or libraries that are written in a plurality of different computer programming languages; verifying that the preprocessing pipeline is capable of generating same results in an offline computing environment and in an online computing environment; generating, after the verifying, preprocessed text data at least in part by feeding the text data into the preprocessing pipeline; and causing the NLP model to process the preprocessed text data via the NLP model.

20. The non-transitory computer readable medium of claim 19, wherein the configuration data is generated using a domain specific language that provides a uniform description of the plurality of different computer programming languages.
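The validation step recited in claims 9, 14, and 19 (checking that the pipeline produces consistent results in the offline and online environments) could be sketched as a simple parity check. This is an illustrative assumption, not the patented method: the function `validate_pipeline`, the sample inputs, and the three toy pipelines are all hypothetical, and both "environments" are simulated as ordinary Python callables in one process.

```python
from typing import Any, Callable, Iterable, List

def validate_pipeline(
    offline_pipeline: Callable[[str], Any],
    online_pipeline: Callable[[str], Any],
    samples: Iterable[str],
) -> List[str]:
    """Return the sample inputs for which the two pipelines disagree."""
    mismatches = []
    for sample in samples:
        if offline_pipeline(sample) != online_pipeline(sample):
            mismatches.append(sample)
    return mismatches

# Hypothetical pipeline as built in the offline (training) environment.
def offline_pipeline(text: str) -> List[str]:
    return text.lower().split()

# Hypothetical online pipeline that matches the offline one.
def online_ok(text: str) -> List[str]:
    return text.lower().split()

# Hypothetical online pipeline that drifted: it skips lowercasing.
def online_drifted(text: str) -> List[str]:
    return text.split()

samples = ["Hello World", "already lower"]

ok_mismatches = validate_pipeline(offline_pipeline, online_ok, samples)
drift_mismatches = validate_pipeline(offline_pipeline, online_drifted, samples)

print(ok_mismatches)     # empty list: the pipelines agree on every sample
print(drift_mismatches)  # inputs where the online result differs
```

An empty result is treated here as a passing validation. A real check would need representative samples and a comparison suited to the output type, for example a tolerance for floating-point embeddings.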
Software reuse · CPC title
model driven · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
Phrasal analysis, e.g. finite state techniques or chunking · CPC title
Language identification · CPC title