Retrieval-based, self-supervised augmentation using transformer models

US12360977B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12360977-B2
Application number  US-202318191896-A
Country             US
Kind code           B2
Filing date         Mar 29, 2023
Priority date       Mar 29, 2023
Publication date    Jul 15, 2025
Grant date          Jul 15, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the invention are directed to a computer system comprising a memory communicatively coupled to a processor system. The processor system is operable to perform processor system operations that include accessing query information associated with a to-be-augmented information set (TBAIS) having a TBAIS format. Query information sequence vectors (QISV) are generated that represent the query information and the TBAIS. Unannotated data repository information sequence vectors (UDRSV) are accessed that represent unannotated data repository information having a plurality of information formats. Matching UDRSV are identified, where the matching UDRSV include the UDRSV that match the QISV. A response to the query information is generated based at least in part on the matching UDRSV.
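The flow described in the abstract (build a sequence, turn it into vectors, match against repository vectors) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the `DIM` width, the hashed bag-of-words `embed` that stands in for the transformer, and the exact cosine ranking in `top_k` that stands in for an approximate nearest neighbor search. None of it is taken from the patent.

```python
import hashlib
import math
from collections import Counter

DIM = 64  # assumed toy vector width, not from the patent


def sequence(query, tbais_row, tbais_format):
    """Flatten the query, the TBAIS content, and its format tag into one text sequence."""
    cells = " ; ".join(f"{key} = {value}" for key, value in tbais_row.items())
    return f"[format] {tbais_format} [query] {query} [tbais] {cells}"


def embed(text):
    """Hashed bag-of-words vector, L2-normalised (hypothetical stand-in for a transformer)."""
    vec = [0.0] * DIM
    for token, count in Counter(text.lower().split()).items():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Dot product of two unit-length vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, repo_vecs, k=2):
    """Exact ranking by cosine similarity; a real system would use an ANN index here."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in repo_vecs.items()]
    return sorted(scored, reverse=True)[:k]
```

In this sketch the repository is a plain dictionary from a document identifier to a precomputed vector, so text extracted from a PDF, a table, or a database row would all be embedded the same way before matching.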

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system comprising a processor system coupled to a memory, wherein the processor system is configured to perform processor system operations comprising: accessing query information associated with a to-be-augmented information set (TBAIS) having a TBAIS format; applying the query information, the TBAIS, and the TBAIS format to a sequencer to generate a sequence; applying the sequence to a non-parametric transformer of the processor system to generate query information sequence vectors (QISV) that represent the query information and the TBAIS; accessing unannotated data repository information sequence vectors (UDRSV) that represent unannotated data repository information having a plurality of information formats; applying an approximate nearest neighbor (ANN) search technique of the processor system to identify matching UDRSV comprising the UDRSV that match the QISV; wherein the non-parametric transformer and the ANN search technique provide a traceable path to how the ANN search technique identified the matching UDRSV; and generating a response to the query information based at least in part on the matching UDRSV.

2. The computer system of claim 1, wherein the processor system operations further comprise incorporating the response into the TBAIS to generate an augmented version of the TBAIS.

3. The computer system of claim 1, wherein the TBAIS format is different from at least one of the plurality of information formats.

4. The computer system of claim 3, wherein the TBAIS format is selected from the group consisting of a database, a table, and a portable document format (PDF).

5. The computer system of claim 3, wherein the information format is selected from the group consisting of a database, a table, and a portable document format (PDF).

6. The computer system of claim 1, wherein the processor system operations further comprise using the traceable path to generate an explanation of how the ANN search technique identified the matching UDRSV.

7. The computer system of claim 1, wherein the processor system operations further comprise using the traceable path to generate accuracy confirmation information for the matching UDRSV.

8. A computer-implemented method comprising: accessing, using a processor system, query information associated with a to-be-augmented information set (TBAIS) having a TBAIS format; applying the query information, the TBAIS, and the TBAIS format to a sequencer to generate a sequence; applying the sequence to a non-parametric transformer of the processor system to generate query information sequence vectors (QISV) that represent the query information and the TBAIS; accessing, using the processor system, unannotated data repository information sequence vectors (UDRSV) that represent unannotated data repository information having a plurality of information formats; applying an approximate nearest neighbor (ANN) search technique of the processor system to identify matching UDRSV comprising the UDRSV that match the QISV; wherein the non-parametric transformer and the ANN search technique provide a traceable path to how the ANN search technique identified the matching UDRSV; and generating, using the processor system, a response to the query information based at least in part on the matching UDRSV.

9. The computer-implemented method of claim 8 further comprising incorporating the response into the TBAIS to generate an augmented version of the TBAIS.

10. The computer-implemented method of claim 8, wherein the TBAIS format is different from at least one of the plurality of information formats.

11. The computer-implemented method of claim 10, wherein the TBAIS format is selected from the group consisting of a database, a table, and a portable document format (PDF).

12. The computer-implemented method of claim 10, wherein the information format is selected from the group consisting of a database, a table, and a portable document format (PDF).

13. The computer-implemented method of claim 8 further comprising using the traceable path to generate an explanation of how the ANN search technique identified the matching UDRSV.

14. The computer-implemented method of claim 8 further comprising using the traceable path to generate accuracy confirmation information for the matching UDRSV.

15. A computer program product comprising a computer readable program stored on a computer readable storage medium, wherein the computer readable program, when executed on a processor system, causes the processor to perform processor system operations comprising: accessing query information associated with a to-be-augmented information set (TBAIS) having a TBAIS format; applying the query information, the TBAIS, and the TBAIS format to a sequencer to generate a sequence; applying the sequence to a non-parametric transformer of the processor system to generate query information sequence vectors (QISV) that represent the query information and the TBAIS; accessing unannotated data repository information sequence vectors (UDRSV) that represent unannotated data repository information having a plurality of information formats; applying an approximate nearest neighbor (ANN) search technique of the processor system to identify matching UDRSV comprising the UDRSV that match the QISV; wherein the non-parametric transformer and the ANN search technique provide a traceable path to how the ANN search technique identified the matching UDRSV; and generating a response to the query information based at least in part on the matching UDRSV.

16. The computer program product of claim 15, wherein the processor system operations further comprise incorporating the response into the TBAIS to generate an augmented version of the TBAIS.

17. The computer program product of claim 15, wherein the TBAIS format is different from at least one of the plurality of information formats.

18. The computer program product of claim 17, wherein the TBAIS format and the information format are each selected from the group consisting of a database, a table, and a portable document format (PDF).

19. The computer program product of claim 15, wherein the processor system operations further comprise using the traceable path to generate an explanation of how the ANN search technique identified the matching UDRSV.

20. The computer program product of claim 15, wherein the processor system operations further comprise using the traceable path to generate accuracy confirmation information for the matching UDRSV.
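The dependent claims on augmentation (2, 9, 16) and on explanation via the traceable path (6, 13, 19) can be pictured with a small sketch. This is one possible reading, not the patented implementation: the `Match` record, its fields, and the `explain` and `augment` helpers are hypothetical names introduced for illustration, and the "traceable path" is modelled simply as the identifier, source format, and similarity score kept for each match.

```python
from dataclasses import dataclass


@dataclass
class Match:
    doc_id: str         # hypothetical repository identifier
    source_format: str  # e.g. "pdf", "table", "database"
    score: float        # similarity reported by the search step
    text: str           # content retrieved from the repository


def explain(matches):
    """Render the retained retrieval trace as human-readable lines."""
    return [
        f"{rank}. {m.doc_id} ({m.source_format}), similarity {m.score:.2f}"
        for rank, m in enumerate(matches, start=1)
    ]


def augment(tbais_row, column, matches):
    """Return a copy of the row with the best match filled in, plus its provenance."""
    best = max(matches, key=lambda m: m.score)
    augmented = dict(tbais_row)  # leave the original row untouched
    augmented[column] = best.text
    augmented[f"{column}_source"] = best.doc_id
    return augmented
```

Keeping the source identifier next to the filled-in value is a design choice of this sketch: it lets a reader check any augmented cell against the repository entry it came from.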

Assignees

Inventors

Classifications

  • using ranking · CPC title

  • Data format conversion from or to a database · CPC title

  • Tablespace storage structures; Management thereof · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12360977B2 cover?
Embodiments of the invention are directed to a computer system comprising a memory communicatively coupled to a processor system. The processor system is operable to perform processor system operations that include accessing query information associated with a to-be-augmented information set (TBAIS) having a TBAIS format. Query information sequence vectors (QISV) are generated that represent th…
Who is the assignee on this patent?
IBM, Univ Of Massachusetts Amherst
What technology area does this patent fall under?
Primary CPC classification G06F16/2282. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 15, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).