Language model for processing a multi-mode query input

US2023350936A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2023350936-A1
Application number	US-202318141337-A
Country	US
Kind code	A1
Filing date	Apr 28, 2023
Priority date	Apr 28, 2022
Publication date	Nov 2, 2023
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A query processing system is described which receives a query input comprising an input token string and also at least one data item having a second, different modality, and generates a corresponding output token string.

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of generating an output token string based on a query input comprising an input token string and one or more data items, the input token string and output token string being strings of tokens selected from a token vocabulary, and the data items being of a modality other than tokens selected from the token vocabulary, the method comprising: inputting each data item of the query input into a modality network trained, upon receiving a data item of the modality, to generate one or more compressed representations of each data item; generating a prompt input comprising the input token string of the query input; and inputting the prompt input to a data-item-token processing model having a plurality of processing layers arranged as a stack, the output token string being an output of the data-item-token processing model, the processing layers including a plurality of token processing layers and a plurality of gated cross-attention layers, each gated cross-attention layer being arranged to receive at least one of the compressed representations, the token processing layers being interleaved with the gated cross-attention layers.

2. The computer-implemented method of claim 1 in which the token processing layers are operative to provide together, in the absence of the gated cross-attention layers, a token string processing model, to receive input token strings and to generate corresponding output token strings.

3. The computer-implemented method of claim 1, comprising: generating an output token string based on a query input; and at least once performing the set of steps of: based on the query input and the output token string, forming a new query input; and generating a new output token string based on the new query input.

4. A computer-implemented method of training a query processing system, the query processing system being for generating an output token string based on a query input comprising an input token string and one or more data items, the input token string and output token string being strings of tokens selected from a token vocabulary, and the data items being of a modality other than tokens selected from the token vocabulary, the method employing a token processing model comprising a stack of token processing layers, the stack of token processing layer being configured to receive input token strings and to generate corresponding output token strings, and a database of training examples, each training example comprising at least one data item and at least one token string; the method comprising: forming a data-item-token processing model by interleaving token processing layers from a token processing model with gated cross-attention layers, the data-item-token processing model being configured to generate an output token string upon receiving a prompt input which is a token string, the token processing model comprising a stack of the token processing layers, the stack of token processing layers being configured to receive input token strings and to generate corresponding output token strings, and a database of training examples, each training example comprising at least one data item and at least one token string; forming the query processing system, the query processing system comprising: (a) a modality network configured to receive the data items of the query input, to generate one or more compressed representations of each data item; and (b) the data-item-token processing model, the data-item-token processing model being configured to receive a prompt input comprising the input token string of the query input, and each gated cross-attention layer being arranged to receive at least one of the compressed representations; and using the training database, training: the modality network, and the plurality of gated cross-attention layers.

5. The computer-implemented method of claim 4 in which the training trains the query processing system, upon an encoder of the modality network receiving the at least one data item of any of the training examples, and the data-item-token processing model receiving a prompt input comprising a first portion of the token string of the training example, to generate an output of the query processing system which is positively statistically correlated with a subsequent portion of the token string of the training example.

6. The computer-implemented method of claim 4 in which the modality network comprises: an encoder configured to encode a data item received by the encoder to generate an encoded data item, and a compressed representation generation system arranged to receive the encoded data item and generate an output, the output of the modality network being based on the output of the compressed representation generation system.

7. The computer-implemented method of claim 6, in which the encoder has been trained to encode a data item received by the encoder to generate an encoded data item, and the training of the modality network and the plurality of gated cross-attention layers comprises training the compressed representation generation system without further training the encoder.

8. The computer-implemented method of claim 6, in which the compressed representation generation system comprises a stack of one or more resampler layers, each resampler layer being adapted to perform an attention operation which employs a key vector, a value vector and a query vector, a subset of the key vector, value vector and query vector being based on the encoded data item, and the remainder of the key vector, value vector and query vector being based on either an output of the preceding one of the resampler layers or, in the case of the first resampler layer of the stack, a set of input latent values, the output of the modality network being based on an output of the last resampler layer of the stack of resampler layers.

9. The computer-implemented method of claim 8 in which the key vector and value vector of each resampler layer are based on the encoded data item and a latent input which is either the output of the preceding one of the resampler layers or, in the case of the first resampler layer of the stack, the set of input latent values, and the query vector is based on the latent input.

10. The computer-implemented method of claim 8 in which each resampler layer further comprises a perceptron arranged to receive the output of the attention operation, and to generate an output, the output of the modality network being based on the output of the perceptron of the last resampler layer of the stack.

11. The computer-implemented method of claim 4, in which the prompt input further comprises one or more corresponding marker items for each data item in the query input, the one or more marker items being indicative of the presence of the data item in the query input.

12. The computer-implemented method of claim 11 in which a position of each marker item in the prompt input is indicative of a position of the corresponding data item in the query input.

13. The computer-implemented method of claim 4 in which each gated cross-attention layer generates its output as a component-wise sum of: a first input which is the output of the preceding processing layer in the stack of processing layers or, in the case that the gated cross-attention layer is the first processing layer of the stack of processing layers, the prompt input, and an interaction term based on the output of the compressed representation generation system received by the gated cross-attention layer, and at least part of the first input to the gated cross-attention lay
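The data flow described in claims 1, 9 and 13 can be illustrated with a small sketch. This is not the patent's implementation: it uses plain Python lists, single-head attention with no learned projections, and omits the perceptron of claim 10. The `tanh` gate is an assumption, since the preview text does not say how the interaction term is gated, and `token_layer` is a hypothetical stand-in for a pretrained token processing layer. All function and variable names are invented for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, context):
    """Single-head attention; keys and values are the context rows themselves."""
    dim = len(queries[0])
    scores = matmul(queries, [list(col) for col in zip(*context)])  # queries @ context^T
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, context)

def resampler_layer(latents, encoded):
    """Claim 9 pattern: queries from the latents, keys/values from encoded item plus latents."""
    return attend(latents, encoded + latents)

def gated_cross_attention(tokens, compressed, gate):
    """Claim 13 pattern: component-wise sum of the layer input and an interaction term."""
    interaction = attend(tokens, compressed)
    g = math.tanh(gate)  # assumption: tanh gating, not stated in the preview text
    return [[t + g * i for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(tokens, interaction)]

def token_layer(tokens):
    """Hypothetical stand-in for one pretrained token processing layer."""
    return [[2.0 * v for v in row] for row in tokens]

def run_stack(tokens, compressed, gates):
    """Interleave one gated cross-attention layer before each token processing layer."""
    for gate in gates:
        tokens = gated_cross_attention(tokens, compressed, gate)
        tokens = token_layer(tokens)
    return tokens

# Toy inputs: five encoded feature vectors are compressed to two latent vectors.
encoded = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0], [2.0, 0.0], [-1.0, 1.0]]
latents = [[1.0, 0.0], [0.0, 1.0]]
compressed = resampler_layer(latents, encoded)

prompt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three token embeddings
closed = run_stack(prompt, compressed, gates=[0.0, 0.0])
opened = run_stack(prompt, compressed, gates=[1.0, 1.0])
print(len(compressed), closed, opened != closed)
```

Under the assumed gating, a gate value of zero makes each gated cross-attention layer pass its input through unchanged, so the stack behaves like the token processing layers alone, which is consistent with the arrangement in claim 2. The resampler step returns as many vectors as there are latents, regardless of how many encoded vectors it receives.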

Assignees

Deepmind Tech Ltd

Inventors

Classifications

G06F16/438
Presentation of query results · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/08
Learning methods · CPC title
G06N3/045 · Primary
Combinations of networks · CPC title
G06N3/0499
Feedforward networks · CPC title
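CPC symbols such as those listed above follow a fixed layout: a section letter, a two-digit class, a subclass letter, a main group number, a slash, and a subgroup number. The sketch below splits a symbol into those parts and looks up the section title, which is why a G06N code maps to Physics. The function name `parse_cpc`, the returned field names, and the regular expression are illustrative assumptions, and section Y is left out of the lookup table.

```python
import re

# Titles of CPC sections A-H (section Y, used for cross-sectional tagging, is omitted).
CPC_SECTIONS = {
    "A": "Human necessities",
    "B": "Performing operations; transporting",
    "C": "Chemistry; metallurgy",
    "D": "Textiles; paper",
    "E": "Fixed constructions",
    "F": "Mechanical engineering; lighting; heating; weapons; blasting",
    "G": "Physics",
    "H": "Electricity",
}

# section letter, 2-digit class, subclass letter, main group, "/", subgroup
CPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")

def parse_cpc(symbol):
    """Split a CPC symbol into its hierarchical parts (hypothetical helper)."""
    match = CPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }

# The five classifications listed for this publication.
for symbol in ["G06F16/438", "G06N3/0464", "G06N3/08", "G06N3/045", "G06N3/0499"]:
    info = parse_cpc(symbol)
    print(symbol, info["subclass"], info["section_title"])
```

All five symbols on this page fall in section G, split between subclasses G06F and G06N.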

Patent family

Related publications grouped by family.

View patent family 86330887

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2023350936A1 cover?: A query processing system is described which receives a query input comprising an input token string and also at least one data item having a second, different modality, and generates a corresponding output token string.
Who is the assignee on this patent?: Deepmind Tech Ltd
What technology area does this patent fall under?: Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?: Publication date Nov 2, 2023 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Unified data processing across streaming and indexed data sets

Routing data between processing pipelines via a user defined data stream

Token Packing for Sequence Models
