Constrained prefix matching for generating next token predictions

US12014155B2 · US · B2

Patent metadata
Publication number: US-12014155-B2
Application number: US-202217847115-A
Country: US
Kind code: B2
Filing date: Jun 22, 2022
Priority date: Jun 22, 2022
Publication date: Jun 18, 2024
Grant date: Jun 18, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Prefix matching may constrain the generation of next token predictions. Input text to perform a next token prediction may be received. Multiple tokens may be determined from the input text, including a partial token. From possible tokens, one or more possible tokens that match the partial token may be identified. Next token predictions may then be filtered using the identified possible tokens in order to ensure that the partial token is matched.
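The filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the vocabulary, the scores, and the function names `compatible_tokens` and `filter_predictions` are hypothetical. The sketch also assumes that a token "matches" the partial token when it either starts with the partial token or is itself a prefix of it.

```python
def compatible_tokens(vocab, partial):
    """Return tokens that could continue text ending in `partial`.

    Assumption for this sketch: a token is compatible if it starts with
    the partial token, or if the partial token starts with it.
    """
    return [tok for tok in vocab
            if tok and (tok.startswith(partial) or partial.startswith(tok))]


def filter_predictions(scores, allowed):
    """Keep only predictions whose token is in the allowed set."""
    allowed_set = set(allowed)
    return {tok: s for tok, s in scores.items() if tok in allowed_set}


# Hypothetical vocabulary and model scores, for illustration only.
vocab = ["pr", "print", "printf", "pri", "return", "p", "x"]
scores = {"return": 0.5, "print": 0.3, "printf": 0.1, "x": 0.1}

allowed = compatible_tokens(vocab, "pri")
filtered = filter_predictions(scores, allowed)
best = max(filtered, key=filtered.get)
```

Here the unconstrained top prediction would be `return`, but it cannot complete the partial token `pri`, so the filter leaves only `print` and `printf` and the sketch selects `print`.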

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: at least one processor; and a memory storing program instructions that, when executed by the at least one processor, cause the at least one processor to implement a code generation system, the code generation system configured to: receive input programming code to perform a next token prediction for the input programming code; determine word boundaries with respect to a tokenizer for the input programming code where rightmost boundary contains a partial token, the partial token being used as a prompt suffix; identify, from a plurality of tokens, one or more tokens that are a match with the prompt suffix and that start with the prompt suffix or end with the prompt suffix; filter next token predictions according to the one or more tokens, wherein the next token predictions are generated by applying a machine learning model, trained to predict next tokens for a programming code, to a remaining portion of the input programming code that does not include a number of backtrack tokens corresponding to a pre-token, wherein the filtering is performed for one or more iterations to remove, after each iteration, one or more characters from left side of the partial token until there are no remaining characters in the partial token, wherein the one or more characters match one of the next token predictions; and provide a last one of the next token predictions as the next token prediction for the input programming code.

2. The system of claim 1, wherein to identify, from the plurality of tokens, the one or more tokens that are the match with the partial token, the code generation system is configured to access a trie data structure that stores the plurality of tokens.

3. The system of claim 1, wherein to identify, from the plurality of tokens, the one or more tokens that are the match with the partial token, the code generation system is configured to access a mask cache that stores the plurality of tokens.

4. The system of claim 1, wherein the code generation system is implemented as part of a code development service offered by a provider network, and wherein the input programming code is received as part of a request to generate a code suggestion for a code file received from a client of the provider network.

5. A method comprising: receiving, by a text generation system, input text to perform a next token prediction for the input text; determining, by the text generation system for the input text, word boundaries with respect to a tokenizer for the input text where rightmost boundary contains a partial token, the partial token being used as a prompt suffix; identifying, by the text generation system, from a plurality of tokens one or more tokens, that are a match with the prompt suffix and that start with the prompt suffix or end with the prompt suffix; filtering, by the text generation system, next token predictions according to the one or more tokens, wherein the next token predictions are generated by applying a machine learning model, trained to predict next tokens for a text, to a remaining portion of the input text that does not include a number of backtrack tokens corresponding to a pre-token, wherein the filtering is performed for one or more iterations to remove, after each iteration, one or more characters from left side of the partial token until there are no remaining characters in the partial token, wherein the one or more characters match one of the next token predictions; and providing, by the text generation system, a last one of the next token predictions as the next token prediction.

6. The method of claim 5, wherein the identifying, from the plurality of tokens, the one or more tokens that are the match with the partial token comprises accessing a trie data structure that stores the plurality of tokens.

7. The method of claim 6, wherein the trie data structure is used to generate the next token predictions using different machine learning models.

8. The method of claim 5, wherein the identifying, from the plurality of tokens, the one or more tokens that are the match with the partial token comprises accessing a mask cache that stores the plurality of tokens.

9. The method of claim 5, further comprising determining, by the text generation system, the number of backtrack tokens up to a maximum number of backtrack tokens.

10. The method of claim 5, wherein the input text is received as part of a request to provide the next token prediction and wherein the next token prediction is provided as a response to the request.

11. The method of claim 5, wherein the text generation system is implemented as part of an auto completion application.

12. The method of claim 5, wherein the text generation system is implemented as part of a code development service offered by a provider network and wherein the input text is received as part of a request to generate a code suggestion for a code file received from a client of the provider network.

13. One or more non-transitory computer-readable storage media storing program instructions, that when executed on or across one or more computing devices, cause the one or more computing devices to implement: receiving input text to perform a next token prediction for the input text; determining word boundaries with respect to a tokenizer for the input text where rightmost boundary contains a partial token, the partial token being used as a prompt suffix; identifying, from a plurality of tokens, one or more tokens that are a match with the prompt suffix and that start with the prompt suffix or end with the prompt suffix; filtering next token predictions according to the one or more tokens, wherein the next token predictions are generated by applying a machine learning model, trained to predict next tokens for a text, to a remaining portion of the input text that does not include a number of backtrack tokens corresponding to a pre-token, wherein the filtering is performed for one or more iterations to remove, after each iteration, one or more characters from left side of the partial token until there are no remaining characters in the partial token, wherein the one or more characters match one of the next token predictions; and providing a last one of the next token predictions as the next token prediction for the input text.

14. The one or more non-transitory computer-readable storage media of claim 13, wherein, in the identifying, from the plurality of tokens, the one or more tokens that are the match with the partial token, the program instructions further cause the one or more computing devices to implement accessing a trie data structure that stores the plurality of tokens.

15. The one or more non-transitory computer-readable storage media of claim 14, wherein the trie data structure is used to generate the next token predictions using different machine learning models.

16. The one or more non-transitory computer-readable storage media of claim 13, wherein, in the identifying, from the plurality of tokens, the one or more tokens that are the match with the partial token, the program instructions further cause the one or more computing devices to implement accessing a mask cache that stores the plurality of tokens.

17. The one or more non-transitory computer-readable storage media of claim 13, storing further program instructions that, when executed on or across the one or more computing devices, cause the one or more computing devices to further implement determining the number of backtrack tokens up to a maximum number of backtrack tokens.

18. The one or more
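As a rough illustration of the trie lookup (claims 2, 6 and 14) combined with the iterative filtering in the independent claims, the following Python sketch may help. It is an interpretation, not the patented implementation: `TokenTrie`, `constrained_decode`, `toy_predict`, the vocabulary and the scores are all hypothetical, tokens are assumed to be non-empty strings, and the backtracking step is reduced to passing in an already-trimmed `context` plus the `partial` text to be matched.

```python
class TokenTrie:
    """Character trie over a token vocabulary (illustrative only)."""

    def __init__(self, tokens):
        self.root = {}
        for tok in tokens:
            node = self.root
            for ch in tok:
                node = node.setdefault(ch, {})
            node[None] = tok  # the None key marks the end of a token

    def matches(self, partial):
        """Tokens that are a prefix of `partial` or that start with it."""
        found = []
        node = self.root
        for ch in partial:
            if None in node:
                found.append(node[None])  # token is a proper prefix
            node = node.get(ch)
            if node is None:
                return found
        # Every token at or below this node starts with `partial`.
        stack = [node]
        while stack:
            cur = stack.pop()
            for key, child in cur.items():
                if key is None:
                    found.append(child)
                else:
                    stack.append(child)
        return found


def constrained_decode(predict, trie, context, partial):
    """Choose tokens until every character of `partial` is consumed."""
    chosen = []
    while partial:
        allowed = set(trie.matches(partial))
        scores = {t: s for t, s in predict(context).items() if t in allowed}
        if not scores:
            raise ValueError("no prediction matches the partial token")
        tok = max(scores, key=scores.get)
        chosen.append(tok)
        context += tok
        partial = partial[len(tok):]  # drop matched characters from the left
    return chosen


def toy_predict(context):
    """Stand-in for a trained model; returns hypothetical token scores."""
    if context.endswith("."):
        return {"x": 0.5, "path": 0.4, "pa": 0.1}
    return {"x": 0.6, ".": 0.3, ".path": 0.1}


trie = TokenTrie(["sys", ".", ".path", "pa", "path", "th", "x"])
first_matches = sorted(trie.matches(".pa"))
result = constrained_decode(toy_predict, trie, "sys", ".pa")
```

With the partial text `.pa`, the first iteration keeps only `.` and `.path` and picks `.`, leaving `pa`. The second iteration picks `path`, which consumes the remaining characters, so `result` is `[".", "path"]` and the last token chosen is the one that extends past the typed text. Because the trie depends only on the vocabulary, it can be built once and reused across requests in this sketch.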

Assignees

Inventors

Classifications

  • G06F8/33 (Primary)

    Intelligent editors

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12014155B2 cover?
Prefix matching may constrain the generation of next token predictions. Input text to perform a next token prediction may be received. Multiple tokens may be determined from the input text, including a partial token. From possible tokens, one or more possible tokens that match the partial token may be identified. Next token predictions may then be filtered using the identified possible toke…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F8/33. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 18, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).