Summary generation for a distributed graph database
US-2024184827-A1 · Jun 6, 2024 · US
US12411879B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12411879-B2 |
| Application number | US-202418924763-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 23, 2024 |
| Priority date | Oct 24, 2023 |
| Publication date | Sep 9, 2025 |
| Grant date | Sep 9, 2025 |
Official abstract text for this publication.
In an example, a method for fine-tuning a Large Visual Language Model (LVLM) includes providing visual queries, each of the visual queries comprises at least an image and a textual query related to the image; processing, by the LVLM, the visual queries to extract visual embeddings from the visual queries, wherein the LVLM comprises a Visual Language Model (VLM), a first Large Language Model (LLM), and a linear projection layer interconnecting the VLM and the LLM; for visual queries: i) generating, by the LVLM, a response to the corresponding visual query based on the corresponding visual embedding; ii) evaluating, by a second LLM, the generated response to verify that the generated response satisfies predefined criteria; and iii) providing, by the second LLM, a feedback to the LVLM, in response to the evaluating the generated response; and fine-tuning the LVLM using aggregated feedback provided by the second LLM for the visual queries.
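The abstract describes a linear projection layer that connects the VLM to the first LLM. The sketch below illustrates what such a layer computes, namely an affine map `y = W x + b` from a visual embedding to a vector in the LLM's input space. The function name `project`, the toy dimensions, and the weight values are hypothetical and chosen only for illustration; the patent does not specify them.

```python
from typing import List

Vector = List[float]


def project(visual_embedding: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Apply y = W x + b, mapping a visual embedding into the LLM input space."""
    if any(len(row) != len(visual_embedding) for row in weights):
        raise ValueError("each weight row must match the embedding length")
    if len(bias) != len(weights):
        raise ValueError("bias length must match the number of weight rows")
    return [
        sum(w * x for w, x in zip(row, visual_embedding)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical toy sizes: a 3-dimensional visual embedding mapped to 2 dimensions.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
b = [0.5, 0.0]
projected = project([1.0, 2.0, 3.0], W, b)
```

In a real model the weights would be learned and the dimensions far larger; the point here is only that the layer is a single linear map between two embedding spaces.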
Opening claim text (preview).
What is claimed is:
1. A method for fine-tuning a Large Visual Language Model (LVLM), the method comprising: providing a plurality of visual queries, wherein each of the plurality of visual queries comprises at least an image and a textual query related to the image; processing, by the LVLM, the plurality of visual queries to extract one or more visual embeddings from each of the plurality of visual queries, wherein the LVLM comprises a Visual Language Model (VLM), a first Large Language Model (LLM), and a linear projection layer interconnecting the VLM and the LLM; for each of the plurality of visual queries: i) generating, by the LVLM, a response to the corresponding visual query based on the corresponding one or more visual embeddings; ii) evaluating, by a second LLM, the generated response to verify that the generated response satisfies one or more predefined criteria; and iii) providing, by the second LLM, a feedback to the LVLM, in response to the evaluating the generated response; and fine-tuning the LVLM using aggregated feedback provided by the second LLM for the plurality of visual queries.
2. The method of claim 1, wherein the one or more predefined criteria comprise at least one of helpfulness, honesty, and harmlessness.
3. The method of claim 1, wherein the feedback comprises a Natural Language Feedback (NLF).
4. The method of claim 3, wherein the feedback comprises at least a numerical score, critique feedback, and refinement feedback.
5. The method of claim 4, wherein the refinement feedback suggests improvements or modifications to the generated response.
6. The method of claim 3, further comprising training the LVLM using the NLF.
7. The method of claim 6, wherein training the LVLM further comprises: training the LVLM using the NLF incorporated into a conditional Reinforcement Learning (RL) algorithm.
8. A computing system for fine-tuning a Large Visual Language Model (LVLM), the computing system comprising: processing circuitry in communication with storage media, the processing circuitry configured to execute a machine learning system comprising the LVLM, the processing circuitry configured to: provide a plurality of visual queries, wherein each of the plurality of visual queries comprises at least an image and a textual query related to the image; process, by the LVLM, the plurality of visual queries to extract one or more visual embeddings from each of the plurality of visual queries, wherein the LVLM comprises a Visual Language Model (VLM), a first Large Language Model (LLM), and a linear projection layer interconnecting the VLM and the LLM; for each of the plurality of visual queries: i) generate, by the LVLM, a response to the corresponding visual query based on the corresponding one or more visual embeddings; ii) evaluate, by a second LLM, the generated response to verify that the generated response satisfies one or more predefined criteria; and iii) provide, by the second LLM, a feedback to the LVLM, in response to the evaluating the generated response; and fine-tune the LVLM using aggregated feedback provided by the second LLM for the plurality of visual queries.
9. The system of claim 8, wherein the one or more predefined criteria comprise at least one of helpfulness, honesty, and harmlessness.
10. The system of claim 8, wherein the feedback comprises a Natural Language Feedback (NLF).
11. The system of claim 10, wherein the feedback comprises at least a numerical score, critique feedback, and refinement feedback.
12. The system of claim 11, wherein the refinement feedback suggests improvements or modifications to the generated response.
13. The system of claim 10, the processing circuitry further configured to: train the LVLM using the NLF.
14. The system of claim 13, wherein the processing circuitry configured to train the LVLM is further configured to: train the LVLM using the NLF incorporated into a conditional Reinforcement Learning (RL) algorithm.
15. Non-transitory computer-readable storage media having instructions encoded thereon, the instructions configured to cause processing circuitry to: provide a plurality of visual queries, wherein each of the plurality of visual queries comprises at least an image and a textual query related to the image; process, by a Large Visual Language Model (LVLM), the plurality of visual queries to extract one or more visual embeddings from each of the plurality of visual queries, wherein the LVLM comprises a Visual Language Model (VLM), a first Large Language Model (LLM), and a linear projection layer interconnecting the VLM and the LLM; for each of the plurality of visual queries: i) generate, by the LVLM, a response to the corresponding visual query based on the corresponding one or more visual embeddings; ii) evaluate, by a second LLM, the generated response to verify that the generated response satisfies one or more predefined criteria; and iii) provide, by the second LLM, a feedback to the LVLM, in response to the evaluating the generated response; and fine-tune the LVLM using aggregated feedback provided by the second LLM for the plurality of visual queries.
16. The storage media of claim 15, wherein the one or more predefined criteria comprise at least one of helpfulness, honesty, and harmlessness.
17. The storage media of claim 15, wherein the feedback comprises a Natural Language Feedback (NLF).
18. The storage media of claim 17, wherein the feedback comprises at least a numerical score, critique feedback, and refinement feedback.
19. The storage media of claim 18, wherein the refinement feedback suggests improvements or modifications to the generated response.
20. The storage media of claim 17, the instructions further configured to cause processing circuitry to: train the LVLM using the NLF.
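The per-query loop recited in the independent claims (generate a response, have a second LLM evaluate it, collect feedback, then aggregate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `VisualQuery`, `Feedback`, `collect_feedback`, `aggregate`, and the two toy stand-in functions are hypothetical names, the models are replaced by trivial stubs, and the actual fine-tuning update is deliberately left out because the claims do not fix a specific algorithm here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class VisualQuery:
    image: bytes  # raw image data
    text: str     # textual query related to the image


@dataclass
class Feedback:
    score: float      # numerical score
    critique: str     # critique feedback
    refinement: str   # suggested improvement to the response


def collect_feedback(
    queries: List[VisualQuery],
    generate: Callable[[VisualQuery], str],
    evaluate: Callable[[VisualQuery, str], Feedback],
) -> List[Feedback]:
    """For each query: generate a response, then have the evaluator judge it."""
    return [evaluate(q, generate(q)) for q in queries]


def aggregate(feedback: List[Feedback]) -> Dict[str, object]:
    """Combine per-query feedback into one summary for a fine-tuning step."""
    return {
        "mean_score": mean(f.score for f in feedback),
        "critiques": [f.critique for f in feedback],
        "refinements": [f.refinement for f in feedback],
    }


# Toy stand-ins (assumptions): a real system would call the LVLM and a second LLM.
def toy_generate(q: VisualQuery) -> str:
    return f"Answer to: {q.text}" if q.image else ""


def toy_evaluate(q: VisualQuery, response: str) -> Feedback:
    if response:
        return Feedback(1.0, "Addresses the query.", "None needed.")
    return Feedback(0.0, "Empty response.", "Describe the image content.")


queries = [
    VisualQuery(b"\x89PNG", "What is shown?"),
    VisualQuery(b"", "What color is it?"),
]
summary = aggregate(collect_feedback(queries, toy_generate, toy_evaluate))
```

The `summary` dictionary stands in for the "aggregated feedback" that would drive fine-tuning; how that feedback updates model weights (for example, through the conditional reinforcement learning mentioned in claims 7 and 14) is outside the scope of this sketch.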
using natural language analysis · CPC title
Query formulation, e.g. graphical querying · CPC title
Presentation of query results · CPC title