Language model for processing a multi-mode query input

US2023350936A1 · US · A1

Patent metadata
| Field | Value |
|---|---|
| Publication number | US-2023350936-A1 |
| Application number | US-202318141337-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 28, 2023 |
| Priority date | Apr 28, 2022 |
| Publication date | Nov 2, 2023 |
| Grant date | |

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A query processing system is described which receives a query input comprising an input token string and also at least one data item having a second, different modality, and generates a corresponding output token string.
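The pure-Python sketch below is a minimal illustration of the data flow in the abstract and in claim 1. A modality network compresses a data item. A stack then interleaves gated cross-attention layers with token processing layers. It makes several assumptions that do not come from the patent. All function names are hypothetical. The inputs are toy vectors. The modality network is a mean-pooling stand-in, and the token layer is a fixed placeholder. The gate is a scalar passed through tanh. Claim 13's component-wise sum is modeled as "layer input plus gated cross-attention term". With every gate at zero, the stack reproduces the token layers alone. That is one possible way to preserve the behavior of the original token model (compare claim 2) while the cross-attention layers are still untrained.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, memory):
    """Dot-product attention: queries from the token stream, keys/values from memory."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in memory])
        out.append([sum(w * v[j] for w, v in zip(weights, memory))
                    for j in range(len(memory[0]))])
    return out

def modality_network(data_item):
    """Hypothetical stand-in: compress a data item (a list of feature vectors) to its mean."""
    n = len(data_item)
    return [[sum(vec[j] for vec in data_item) / n for j in range(len(data_item[0]))]]

def token_layer(x):
    """Hypothetical stand-in for one pretrained token processing layer (not trained here)."""
    return [[v + 0.1 * math.tanh(v) for v in row] for row in x]

def gated_xattn_layer(x, compressed, gate):
    """Output = layer input + tanh(gate) * cross-attention term (assumed gating form)."""
    g = math.tanh(gate)
    attn = cross_attend(x, compressed)
    return [[xi + g * ai for xi, ai in zip(xr, ar)] for xr, ar in zip(x, attn)]

def data_item_token_model(prompt, compressed, gates):
    """Interleave one gated cross-attention layer before each token layer."""
    x = prompt
    for gate in gates:
        x = gated_xattn_layer(x, compressed, gate)
        x = token_layer(x)
    return x

def token_only_model(prompt, depth):
    """The same token layers stacked without any cross-attention layers."""
    x = prompt
    for _ in range(depth):
        x = token_layer(x)
    return x

# Toy inputs: embedded prompt tokens and one data item given as feature vectors.
prompt = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
image = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0]]
compressed = modality_network(image)  # [[2.0, 0.5, 1.0]]

closed = data_item_token_model(prompt, compressed, gates=[0.0, 0.0])
opened = data_item_token_model(prompt, compressed, gates=[0.5, 0.5])
```

Here `closed` is identical to `token_only_model(prompt, 2)`. In `opened`, the nonzero gates let the compressed data item change the token-stream activations.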

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of generating an output token string based on a query input comprising an input token string and one or more data items, the input token string and output token string being strings of tokens selected from a token vocabulary, and the data items being of a modality other than tokens selected from the token vocabulary, the method comprising: inputting each data item of the query input into a modality network trained, upon receiving a data item of the modality, to generate one or more compressed representations of each data item; generating a prompt input comprising the input token string of the query input; and inputting the prompt input to a data-item-token processing model having a plurality of processing layers arranged as a stack, the output token string being an output of the data-item-token processing model, the processing layers including a plurality of token processing layers and a plurality of gated cross-attention layers, each gated cross-attention layer being arranged to receive at least one of the compressed representations, the token processing layers being interleaved with the gated cross-attention layers.

2. The computer-implemented method of claim 1 in which the token processing layers are operative to provide together, in the absence of the gated cross-attention layers, a token string processing model, to receive input token strings and to generate corresponding output token strings.

3. The computer-implemented method of claim 1, comprising: generating an output token string based on a query input; and at least once performing the set of steps of: based on the query input and the output token string, forming a new query input; and generating a new output token string based on the new query input.

4. A computer-implemented method of training a query processing system, the query processing system being for generating an output token string based on a query input comprising an input token string and one or more data items, the input token string and output token string being strings of tokens selected from a token vocabulary, and the data items being of a modality other than tokens selected from the token vocabulary, the method employing a token processing model comprising a stack of token processing layers, the stack of token processing layer being configured to receive input token strings and to generate corresponding output token strings, and a database of training examples, each training example comprising at least one data item and at least one token string; the method comprising: forming a data-item-token processing model by interleaving token processing layers from a token processing model with gated cross-attention layers, the data-item-token processing model being configured to generate an output token string upon receiving a prompt input which is a token string, the token processing model comprising a stack of the token processing layers, the stack of token processing layers being configured to receive input token strings and to generate corresponding output token strings, and a database of training examples, each training example comprising at least one data item and at least one token string; forming the query processing system, the query processing system comprising: (a) a modality network configured to receive the data items of the query input, to generate one or more compressed representations of each data item; and (b) the data-item-token processing model, the data-item-token processing model being configured to receive a prompt input comprising the input token string of the query input, and each gated cross-attention layer being arranged to receive at least one of the compressed representations; and using the training database, training: the modality network, and the plurality of gated cross-attention layers.

5. The computer-implemented method of claim 4 in which the training trains the query processing system, upon an encoder of the modality network receiving the at least one data item of any of the training examples, and the data-item-token processing model receiving a prompt input comprising a first portion of the token string of the training example, to generate an output of the query processing system which is positively statistically correlated with a subsequent portion of the token string of the training example.

6. The computer-implemented method of claim 4 in which the modality network comprises: an encoder configured to encode a data item received by the encoder to generate an encoded data item, and a compressed representation generation system arranged to receive the encoded data item and generate an output, the output of the modality network being based on the output of the compressed representation generation system.

7. The computer-implemented method of claim 6, in which the encoder has been trained to encode a data item received by the encoder to generate an encoded data item, and the training of the modality network and the plurality of gated cross-attention layers comprises training the compressed representation generation system without further training the encoder.

8. The computer-implemented method of claim 6, in which the compressed representation generation system comprises a stack of one or more resampler layers, each resampler layer being adapted to perform an attention operation which employs a key vector, a value vector and a query vector, a subset of the key vector, value vector and query vector being based on the encoded data item, and the remainder of the key vector, value vector and query vector being based on either an output of the preceding one of the resampler layers or, in the case of the first resampler layer of the stack, a set of input latent values, the output of the modality network being based on an output of the last resampler layer of the stack of resampler layers.

9. The computer-implemented method of claim 8 in which the key vector and value vector of each resampler layer are based on the encoded data item and a latent input which is either the output of the preceding one of the resampler layers or, in the case of the first resampler layer of the stack, the set of input latent values, and the query vector is based on the latent input.

10. The computer-implemented method of claim 8 in which each resampler layer further comprises a perceptron arranged to receive the output of the attention operation, and to generate an output, the output of the modality network being based on the output of the perceptron of the last resampler layer of the stack.

11. The computer-implemented method of claim 4, in which the prompt input further comprises one or more corresponding marker items for each data item in the query input, the one or more marker items being indicative of the presence of the data item in the query input.

12. The computer-implemented method of claim 11 in which a position of each marker item in the prompt input is indicative of a position of the corresponding data item in the query input.

13. The computer-implemented method of claim 4 in which each gated cross-attention layer generates its output as a component-wise sum of: a first input which is the output of the preceding processing layer in the stack of processing layers or, in the case that the gated cross-attention layer is the first processing layer of the stack of processing layers, the prompt input, and an interaction term based on the output of the compressed representation generation system received by the gated cross-attention layer, and at least part of the first input to the gated cross-attention lay
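Claims 8 to 10 describe the compressed representation generation system as a stack of resampler layers. In each layer, the keys and values are based on the encoded data item together with a latent input. The queries are based on the latent input alone, and a perceptron follows the attention operation. The pure-Python sketch below illustrates that arrangement under stated assumptions. There are no learned key/query/value projections, and the residual connections, ReLU, toy weights and all function names are hypothetical. It shows why the output is "compressed": the number of output vectors equals the number of input latents, whatever the length of the encoded data item.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def perceptron(vectors, weight, bias):
    """Single dense layer with ReLU, applied to each vector (hypothetical weights)."""
    return [[max(0.0, sum(w * x for w, x in zip(row, vec)) + b)
             for row, b in zip(weight, bias)] for vec in vectors]

def add(xs, ys):
    return [[a + b for a, b in zip(xr, yr)] for xr, yr in zip(xs, ys)]

def resampler_layer(encoded_item, latent_input, weight, bias):
    # Keys and values come from the encoded data item plus the latent input;
    # queries come from the latent input only (the arrangement in claim 9).
    kv = encoded_item + latent_input  # concatenate along the sequence axis
    h = add(latent_input, attention(latent_input, kv, kv))  # assumed residual
    return add(h, perceptron(h, weight, bias))  # perceptron per claim 10; assumed residual

def resampler(encoded_item, input_latents, layers):
    latents = input_latents  # the first layer receives the input latent values
    for weight, bias in layers:
        latents = resampler_layer(encoded_item, latents, weight, bias)
    return latents  # output of the last resampler layer

D = 3
weight = [[0.1 if i == j else 0.0 for j in range(D)] for i in range(D)]
bias = [0.0] * D
layers = [(weight, bias), (weight, bias)]
input_latents = [[0.1, 0.0, -0.1], [0.0, 0.2, 0.0]]

short_item = [[float(i), float(i % 3), 1.0] for i in range(5)]
long_item = [[float(i), float(i % 3), 1.0] for i in range(50)]
short_out = resampler(short_item, input_latents, layers)
long_out = resampler(long_item, input_latents, layers)
```

Both `short_out` and `long_out` contain exactly two vectors, one per input latent. A downstream gated cross-attention layer therefore sees a fixed-size memory regardless of how long the encoded data item is.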

Assignees

Deepmind Tech Ltd

Inventors

Classifications

  • Presentation of query results (CPC title)

  • Convolutional networks [CNN, ConvNet] (CPC title)

  • Learning methods (CPC title)

  • G06N3/045 (primary): Combinations of networks

  • Feedforward networks (CPC title)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2023350936A1 cover?
A query processing system is described which receives a query input comprising an input token string and also at least one data item having a second, different modality, and generates a corresponding output token string.
Who is the assignee on this patent?
Deepmind Tech Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 2, 2023 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).