Computer implemented methods for the automated analysis or use of data, including use of a large language model

US12164868B2 · US · B2

Patent metadata
  • Publication number: US-12164868-B2
  • Application number: US-202418648788-A
  • Country: US
  • Kind code: B2
  • Filing date: Apr 29, 2024
  • Priority date: Aug 24, 2021
  • Publication date: Dec 10, 2024
  • Grant date: Dec 10, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract


There is provided a computer-implemented method for ensuring that a large language model (LLM) generates original text, including (i) providing or accessing a database of previous text that the LLM should not generate, wherein the database includes text used to train the LLM; (ii) checking potential continuations generated by the LLM against the database; (iii) when a potential continuation generated by the LLM matches text in the database, adjusting the potential continuation generated by the LLM to no longer match that text in the database, to produce an adjusted potential continuation, and (iv) storing the adjusted potential continuation.
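The four steps in the abstract can be illustrated with a minimal Python sketch. It assumes that a continuation "matches" the database when it shares any run of eight consecutive words with indexed text. It stores SHA-256 hashes of those word windows and, as the adjustment in step (iii), simply truncates the continuation before the first copied span. The window size, the hashing scheme, the truncation policy, and every function name (`build_database`, `first_match`, `adjust`, `check_and_store`) are hypothetical illustrations, not the method disclosed in the patent.

```python
from __future__ import annotations

import hashlib

# Assumption: "matching" means sharing any run of N consecutive words.
# The abstract does not fix a window size; 8 is purely illustrative.
N = 8


def window_hash(words: list[str]) -> str:
    """SHA-256 digest of a whitespace-joined word window."""
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def build_database(previous_texts: list[str], n: int = N) -> set[str]:
    """Step (i): hash every n-word window of text the LLM should not reproduce."""
    db: set[str] = set()
    for text in previous_texts:
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            db.add(window_hash(words[i:i + n]))
    return db


def first_match(continuation: str, db: set[str], n: int = N) -> int | None:
    """Step (ii): word index where the first database window starts, else None."""
    words = continuation.lower().split()
    for i in range(len(words) - n + 1):
        if window_hash(words[i:i + n]) in db:
            return i
    return None


def adjust(continuation: str, db: set[str], n: int = N) -> str:
    """Step (iii), hypothetical policy: cut the text just before the copied span."""
    i = first_match(continuation, db, n)
    if i is None:
        return continuation
    return " ".join(continuation.split()[:i])


def check_and_store(continuation: str, db: set[str], store: list[str]) -> str:
    """Steps (ii) to (iv): check, adjust if needed, then store the result."""
    adjusted = adjust(continuation, db)
    store.append(adjusted)
    return adjusted


# Usage with toy data standing in for training text and an LLM continuation.
db = build_database(["the quick brown fox jumps over the lazy dog again and again"])
stored: list[str] = []
check_and_store("I saw that the quick brown fox jumps over the lazy dog today", db, stored)
print(stored)  # ['I saw that']
```

Truncation is used here only because it is the simplest adjustment that provably removes the match: any window starting before the first matching position was already absent from the database. Claim 6 describes a different approach, selecting another sequence from a beam search.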

Claims

The invention claimed is:

  1. A computer-implemented method for ensuring that a large language model (LLM) generates original text, including the steps of: (i) providing or accessing a database of previous text that the LLM should not generate, wherein the database includes text used to train the LLM; (ii) checking potential continuations generated by the LLM against the database; (iii) when a potential continuation generated by the LLM matches text in the database, adjusting the potential continuation generated by the LLM to no longer match that text in the database, to produce an adjusted potential continuation, and (iv) storing the adjusted potential continuation.

  2. The method of claim 1, in which the database identifies text where copyright status or source makes risks of reproducing the text identified in the database especially significant.

  3. The method of claim 1, where hashes of text of different lengths are used to enable identification of whether a continuation is present in the database.

  4. The method of claim 1, in which the database includes copyright-protected material, and in which checking sections of the potential continuations generated by the LLM against the database avoids copyright infringement.

  5. The method of claim 1, including performing a beam search.

  6. The method of claim 5, wherein adjusting the potential continuation includes the step of selecting a different sequence from the beam search that doesn't resemble the previous text.

  7. The method of claim 1, further comprising the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts.

  8. The method of claim 2, further comprising the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts.

  9. The method of claim 3, further comprising the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts.

  10. The method of claim 4, further comprising the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts.

  11. The method of claim 5, further comprising the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts.

  12. The method of claim 6, further comprising the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts.

  13. The method of claim 7, wherein the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts, is performed using a GPU.

  14. The method of claim 1, where checking potential continuations against the database includes close but imperfect matches to the previous text.

  15. The method of claim 1, the method including a computer implemented method for automated analysis or use of data, comprising the steps of: (a) storing in a memory a structured, machine-readable representation of data that conforms to a machine-readable language; the structured, machine-readable representation of data including representations of social media postings; (b) automatically processing the structured representation of data to determine if the social media postings are compliant with requirements preventing copyright abusive social media postings.

  16. The method of claim 15, wherein the structured, machine-readable representation of data that conforms to a machine-readable language comprises semantic nodes and passages; and in which a semantic node represents an entity and is itself represented by an identifier; and a passage is either (i) a semantic node or (ii) a combination of semantic nodes; and where machine-readable meaning comes from choice of semantic nodes and a way they are combined and ordered as passages.

  17. The method of claim 15, wherein the processing includes determining whether the social media postings are factually true.

  18. The method of claim 15, wherein the processing includes determining whether the social media postings are illegal.

  19. The method of claim 15, wherein the machine-readable representation of data further includes at least a partial representation of the requirements preventing copyright abusive postings and the processing references the representation of the requirements.

  20. The method of claim 15, wherein the processing additionally generates a natural language explanation of why a social media posting of the social media postings is not compliant with the requirements.

  21. The method of claim 15, wherein the processing additionally applies statistical machine-learning models to the social media postings and uses results of the statistical machine-learning models.

  22. The method of claim 1, in which continuation data from the LLM is a partial continuation, namely an output made before the LLM has stopped generating or whilst the LLM is still generating.

  23. The method of claim 1, in which the LLM is an autoregressive language model.

  24. The method of claim 1, in which the LLM is an autoregressive language model, which is a Generative Pre-trained Transformer.

  25. The method of claim 1, when used for generation of program code.

  26. The method of claim 1, when used for any of the following: generation of poetry, lyrics, creative writing, generation of other forms of writing, writing essays, writing summaries of knowledge, writing summaries of longer texts, writing scientific papers.

  27. The method of claim 1, when used for internet search.

  28. The method of claim 1, including avoiding copyright infringement or other intellectual property breaches.

  29. The method of claim 1, wherein the checking against the database in step (ii) is performed using a CPU.

  30. The method of claim 1, wherein generation of potential continuations by the LLM is performed using a GPU.
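Claims 3 and 6 add two refinements: hashes of text of different lengths, and selecting a different beam-search sequence when the best one resembles the previous text. The Python sketch below is a hypothetical illustration of both. It assumes word windows of 4 and 8 words and checks each candidate with the longest window length that fits, so a short partial continuation (as in claim 22) can still be checked. A hard-coded list of (text, score) pairs stands in for real beam-search output. `MultiLengthIndex` and `pick_beam` are invented names, and the length-selection rule is an assumption, not something the claims specify.

```python
from __future__ import annotations

import hashlib


def window_hash(words: list[str]) -> str:
    """SHA-256 digest of a whitespace-joined word window."""
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


class MultiLengthIndex:
    """Hypothetical store of word-window hashes at several lengths (claim 3)."""

    def __init__(self, lengths: tuple[int, ...] = (4, 8)) -> None:
        self.lengths = sorted(lengths)
        self.hashes: dict[int, set[str]] = {n: set() for n in self.lengths}

    def add(self, text: str) -> None:
        """Index every window of every configured length in `text`."""
        words = text.lower().split()
        for n in self.lengths:
            for i in range(len(words) - n + 1):
                self.hashes[n].add(window_hash(words[i:i + n]))

    def contains(self, continuation: str) -> bool:
        """Check using the longest indexed length that fits the continuation."""
        words = continuation.lower().split()
        usable = [n for n in self.lengths if n <= len(words)]
        if not usable:
            return False  # too short to compare at any indexed length
        n = usable[-1]
        return any(
            window_hash(words[i:i + n]) in self.hashes[n]
            for i in range(len(words) - n + 1)
        )


def pick_beam(beams: list[tuple[str, float]], index: MultiLengthIndex) -> str | None:
    """Claim 6 sketch: best-scoring beam that does not match indexed text."""
    for text, _score in sorted(beams, key=lambda b: b[1], reverse=True):
        if not index.contains(text):
            return text
    return None  # every beam matched; a caller might regenerate instead


# Usage with toy data; the beams stand in for real beam-search output.
index = MultiLengthIndex()
index.add("the quick brown fox jumps over the lazy dog again and again")
beams = [
    ("the quick brown fox jumps over the lazy dog", 0.9),
    ("a quick red fox leaps across a sleepy hound", 0.7),
]
chosen = pick_beam(beams, index)
print(chosen)  # a quick red fox leaps across a sleepy hound
print(index.contains("quick brown fox jumps"))  # True (4-word partial continuation)
```

Using the longest window that fits is one possible trade-off. Short windows let partial outputs be checked early, while long windows avoid flagging common short phrases inside complete continuations.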

Assignees

  • Unlikely Artificial Intelligence Ltd

Classifications (CPC titles)

  • Natural language generation

  • using natural language analysis

  • G06F40/20 (primary): Natural language analysis (semantic analysis of natural language G06F40/30)

  • Semantic analysis

  • Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12164868B2 cover?
There is provided a computer-implemented method for ensuring that a large language model (LLM) generates original text, including (i) providing or accessing a database of previous text that the LLM should not generate, wherein the database includes text used to train the LLM; (ii) checking potential continuations generated by the LLM against the database; (iii) when a potential continuation gen…
Who is the assignee on this patent?
Unlikely Artificial Intelligence Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/3344. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 10, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).