US2025053453A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025053453-A1 |
| Application number | US-202418798221-A |
| Country | US |
| Kind code | A1 |
| Filing date | Aug 8, 2024 |
| Priority date | Aug 8, 2023 |
| Publication date | Feb 13, 2025 |
| Grant date | — |
Official abstract text for this publication.
The invention relates to computer-implemented systems and methods that implement an innovative generative AI service based on proprietary expertise and industry knowledge. The generative AI service provides unique autonomous features, such as combining separate and distinct LLM responses and prompts to create unique results. Other autonomous features may include an ability to handle scaling and auto deployment of models and rerouting requests autonomously to ensure user load is balanced across the entire globally distributed Generative AI infrastructure estate. The generative AI service may further deploy new Production instances of models on demand by predefined system criteria as well as by explicit user request based on projected demand increase and/or the need for specific instance for further model fine-tuning.
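The load-balancing and on-demand deployment behaviour described in the abstract could be sketched roughly as follows. This is an illustration only, not the method disclosed in the publication: the `Instance` class, the region names, the capacity figures, the least-loaded routing rule, and the 0.8 threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical model deployment in one region (not from the patent)."""
    region: str
    capacity: int      # assumed: maximum concurrent requests
    active: int = 0    # requests currently in flight

    @property
    def load(self) -> float:
        # Fraction of capacity in use.
        return self.active / self.capacity


def route(instances: list[Instance]) -> Instance:
    """Send the next request to the least-loaded instance (assumed policy)."""
    target = min(instances, key=lambda inst: inst.load)
    target.active += 1
    return target


def needs_new_instance(instances: list[Instance], threshold: float = 0.8) -> bool:
    """Assumed criterion: deploy more capacity once every instance
    is at or above the load threshold."""
    return all(inst.load >= threshold for inst in instances)


# Hypothetical fleet spread over three regions.
fleet = [Instance("us-east", 4), Instance("eu-west", 2), Instance("ap-south", 2)]
assigned = [route(fleet).region for _ in range(6)]
```

Here six requests are spread across the fleet in proportion to spare capacity, and `needs_new_instance(fleet)` would be the hook where a "predefined system criterion" triggers a new deployment. A real system would also have to handle request completion, health checks, and explicit user requests for new instances, none of which are modelled here.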
Opening claim text (preview).
What is claimed is:
1. A computer-implemented system for implementing a middleware platform that provides a series of generative AI services, the system comprising: a user interface that is configured to receive one or more requests from a user, via a communication network; a Digital Matrix Application Gateway that is configured to receive the one or more requests and route the one or more requests to a set of API Compute Resources, wherein the set of API Compute Resources comprises one or more API Apps and one or more API App Functions, wherein each of the set of API Compute Resources is configured to make calls to one of a plurality of APIs, wherein the set of API Compute Resources interacts with a plurality of different large language models (LLMs) that collectively generate a response to the one or more requests; a data storage component that reads and writes configuration and operational data associated with the API Compute Resources; an insights analytics processing component that is configured to receive application telemetry data from one or more API Compute Resources; and a log analytics component that stores log data from the insights component.
2. The system of claim 1, wherein the response represents a combination of LLM responses from the plurality of LLMs.
3. The system of claim 1, wherein the plurality of LLMs have access to a proprietary knowledgebase.
4. The system of claim 1, wherein user load balancing is applied across the plurality of different LLMs on an entire globally distributed generative AI infrastructure.
5. The system of claim 1, wherein the user interface comprises a generative AI chat interface.
6. The system of claim 1, wherein the user interface comprises a cognitive search interface.
7. The system of claim 1, wherein each of the plurality of LLMs is independently run and secured in a virtual container.
8. The system of claim 1, wherein a copy of a LLM from the plurality of LLMs is created to respond in a predetermined way through a training process to create a tuned version of the LLM.
9. The system of claim 1, wherein the log data is globally and centrally managed to determine model usage and performance metrics across the plurality of different LLMs.
10. The system of claim 1, wherein the plurality of LLMs are selected based on model optimization.
11. A computer-implemented method for implementing a middleware platform that provides a series of generative AI services, the method comprising the steps of: receiving, via a user interface, one or more requests from a user, via a communication network; receiving, via a Digital Matrix Application Gateway, the one or more requests and routing the one or more requests to a set of API Compute Resources, wherein the set of API Compute Resources comprises one or more API Apps and one or more API App Functions, wherein each of the set of API Compute Resources is configured to make calls to one of a plurality of APIs, wherein the set of API Compute Resources interacts with a plurality of different large language models (LLMs) that collectively generate a response to the one or more requests; reading and writing, via a data storage component, configuration and operational data associated with the API Compute Resources; receiving, via an insights analytics processing component, application telemetry data from one or more API Compute Resources; storing, via a log analytics component, log data from the insights component; and transmitting, via user interface, the response.
12. The method of claim 11, wherein the response represents a combination of LLM responses from the plurality of LLMs.
13. The method of claim 11, wherein the plurality of LLMs have access to a proprietary knowledgebase.
14. The method of claim 11, wherein user load balancing is applied across the plurality of different LLMs on an entire globally distributed generative AI infrastructure.
15. The method of claim 11, wherein the user interface comprises a generative AI chat interface.
16. The method of claim 11, wherein the user interface comprises a cognitive search interface.
17. The method of claim 11, wherein each of the plurality of LLMs is independently run and secured in a virtual container.
18. The method of claim 11, wherein a copy of a LLM from the plurality of LLMs is created to respond in a predetermined way through a training process to create a tuned version of the LLM.
19. The method of claim 11, wherein the log data is globally and centrally managed to determine model usage and performance metrics across the plurality of different LLMs.
20. The method of claim 11, wherein the plurality of LLMs are selected based on model optimization.
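As a reading aid, the request flow recited in claims 1 and 11 (gateway, API compute resource, several LLMs, telemetry, log storage) might be sketched as below. This is a minimal illustration under stated assumptions, not the claimed implementation: every class name, the route key `"chat"`, the telemetry fields, and the two toy "models" (plain string functions standing in for real LLM calls) are hypothetical, and the data storage component is omitted.

```python
from typing import Callable

LLM = Callable[[str], str]  # stand-in for a real model call (assumption)


class LogAnalytics:
    """Stores log records forwarded by the insights component."""
    def __init__(self) -> None:
        self.records: list[dict] = []

    def store(self, record: dict) -> None:
        self.records.append(record)


class Insights:
    """Receives telemetry from compute resources and forwards it to the log store."""
    def __init__(self, logs: LogAnalytics) -> None:
        self.logs = logs

    def track(self, resource: str, model: str, chars: int) -> None:
        # Hypothetical telemetry fields chosen for the example.
        self.logs.store({"resource": resource, "model": model, "response_chars": chars})


class ApiComputeResource:
    """Calls several models and combines their answers into one response."""
    def __init__(self, name: str, models: dict[str, LLM], insights: Insights) -> None:
        self.name = name
        self.models = models
        self.insights = insights

    def handle(self, request: str) -> str:
        parts = []
        for model_name, model in self.models.items():
            answer = model(request)
            self.insights.track(self.name, model_name, len(answer))
            parts.append(f"[{model_name}] {answer}")
        # Assumed combination rule: concatenate the labelled answers.
        return "\n".join(parts)


class Gateway:
    """Routes each incoming request to a compute resource by request kind."""
    def __init__(self, routes: dict[str, ApiComputeResource]) -> None:
        self.routes = routes

    def dispatch(self, kind: str, request: str) -> str:
        return self.routes[kind].handle(request)


logs = LogAnalytics()
chat_app = ApiComputeResource(
    "chat-app",
    {"model-a": lambda p: p.upper(), "model-b": lambda p: p[::-1]},  # toy models
    Insights(logs),
)
gateway = Gateway({"chat": chat_app})
response = gateway.dispatch("chat", "hello")
```

After the call, `response` holds one labelled line per model and `logs.records` holds one telemetry record per model call, which mirrors the claim language about per-model usage metrics in dependent claims 9 and 19.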
the resource being a machine, e.g. CPUs, Servers, Terminals · CPC title
Machine learning · CPC title
Knowledge engineering; Knowledge acquisition · CPC title