US12164868B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12164868-B2 |
| Application number | US-202418648788-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 29, 2024 |
| Priority date | Aug 24, 2021 |
| Publication date | Dec 10, 2024 |
| Grant date | Dec 10, 2024 |
Official abstract text for this publication.
There is provided a computer-implemented method for ensuring that a large language model (LLM) generates original text, including (i) providing or accessing a database of previous text that the LLM should not generate, wherein the database includes text used to train the LLM; (ii) checking potential continuations generated by the LLM against the database; (iii) when a potential continuation generated by the LLM matches text in the database, adjusting the potential continuation generated by the LLM to no longer match that text in the database, to produce an adjusted potential continuation, and (iv) storing the adjusted potential continuation.
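Steps (i) to (iv) of the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `PreviousTextDatabase` class, the four-word hashing window, the whitespace normalization, and the adjustment strategy of falling back to the next-ranked candidate. The LLM itself is not modeled; `candidates` stands in for continuations it has already produced, ranked best first.

```python
import hashlib
from typing import Iterable, List, Optional, Set


def _normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace so trivial formatting changes do not hide a match."""
    return text.lower().split()


def _window_hashes(text: str, window: int) -> Set[str]:
    """Hash every run of `window` consecutive words in `text`."""
    words = _normalize(text)
    return {
        hashlib.sha256(" ".join(words[i:i + window]).encode("utf-8")).hexdigest()
        for i in range(len(words) - window + 1)
    }


class PreviousTextDatabase:
    """Step (i): a hypothetical store of hashed word windows from text the LLM should not reproduce."""

    def __init__(self, passages: Iterable[str], window: int = 4) -> None:
        self.window = window
        self.hashes: Set[str] = set()
        for passage in passages:
            self.hashes |= _window_hashes(passage, window)

    def matches(self, continuation: str) -> bool:
        """Step (ii): true if any word window of the continuation is in the database."""
        return not self.hashes.isdisjoint(_window_hashes(continuation, self.window))


def choose_original(candidates: List[str], db: PreviousTextDatabase) -> Optional[str]:
    """Step (iii), under an assumed strategy: return the first candidate with no match."""
    for candidate in candidates:
        if not db.matches(candidate):
            return candidate
    return None  # every candidate matched; a real system would need another fallback


db = PreviousTextDatabase(["It was the best of times, it was the worst of times"])
candidates = [
    "it was the best of times for the small town",  # reuses a four-word run
    "the town had rarely seen a summer so kind",
]

stored: List[str] = []  # step (iv): storage is just an in-memory list in this sketch
result = choose_original(candidates, db)
if result is not None:
    stored.append(result)
```

Storing hashes rather than raw text keeps lookups to a set-membership test, at the cost of detecting only exact runs of the chosen window length.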
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method for ensuring that a large language model (LLM) generates original text, including the steps of: (i) providing or accessing a database of previous text that the LLM should not generate, wherein the database includes text used to train the LLM; (ii) checking potential continuations generated by the LLM against the database; (iii) when a potential continuation generated by the LLM matches text in the database, adjusting the potential continuation generated by the LLM to no longer match that text in the database, to produce an adjusted potential continuation, and (iv) storing the adjusted potential continuation.
2. The method of claim 1, in which the database identifies text where copyright status or source makes risks of reproducing the text identified in the database especially significant.
3. The method of claim 1, where hashes of text of different lengths are used to enable identification of whether a continuation is present in the database.
4. The method of claim 1, in which the database includes copyright-protected material, and in which checking sections of the potential continuations generated by the LLM against the database avoids copyright infringement.
5. The method of claim 1, including performing a beam search.
6. The method of claim 5, wherein adjusting the potential continuation includes the step of selecting a different sequence from the beam search that doesn't resemble the previous text.
7. The method of claim 1, further comprising the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts.
8. The method of claim 2, further comprising the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts.
9. The method of claim 3, further comprising the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts.
10. The method of claim 4, further comprising the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts.
11. The method of claim 5, further comprising the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts.
12. The method of claim 6, further comprising the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts.
13. The method of claim 7, wherein the step of providing a classifier that can distinguish between prompts that require original text as a continuation and those that do not and applying the classifier to a prompt of the prompts, is performed using a GPU.
14. The method of claim 1, where checking potential continuations against the database includes close but imperfect matches to the previous text.
15. The method of claim 1, the method including a computer implemented method for automated analysis or use of data, comprising the steps of: (a) storing in a memory a structured, machine-readable representation of data that conforms to a machine-readable language; the structured, machine-readable representation of data including representations of social media postings; (b) automatically processing the structured representation of data to determine if the social media postings are compliant with requirements preventing copyright abusive social media postings.
16. The method of claim 15, wherein the structured, machine-readable representation of data that conforms to a machine-readable language comprises semantic nodes and passages; and in which a semantic node represents an entity and is itself represented by an identifier; and a passage is either (i) a semantic node or (ii) a combination of semantic nodes; and where machine-readable meaning comes from choice of semantic nodes and a way they are combined and ordered as passages.
17. The method of claim 15, wherein the processing includes determining whether the social media postings are factually true.
18. The method of claim 15, wherein the processing includes determining whether the social media postings are illegal.
19. The method of claim 15, wherein the machine-readable representation of data further includes at least a partial representation of the requirements preventing copyright abusive postings and the processing references the representation of the requirements.
20. The method of claim 15, wherein the processing additionally generates a natural language explanation of why a social media posting of the social media postings is not compliant with the requirements.
21. The method of claim 15, wherein the processing additionally applies statistical machine-learning models to the social media postings and uses results of the statistical machine-learning models.
22. The method of claim 1, in which continuation data from the LLM is a partial continuation, namely an output made before the LLM has stopped generating or whilst the LLM is still generating.
23. The method of claim 1, in which the LLM is an autoregressive language model.
24. The method of claim 1, in which the LLM is an autoregressive language model, which is a Generative Pre-trained Transformer.
25. The method of claim 1, when used for generation of program code.
26. The method of claim 1, when used for any of the following: generation of poetry, lyrics, creative writing, generation of other forms of writing, writing essays, writing summaries of knowledge, writing summaries of longer texts, writing scientific papers.
27. The method of claim 1, when used for internet search.
28. The method of claim 1, including avoiding copyright infringement or other intellectual property breaches.
29. The method of claim 1, wherein the checking against the database in step (ii) is performed using a CPU.
30. The method of claim 1, wherein generation of potential continuations by the LLM is performed using a GPU.
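Claims 3 and 14 describe hashing text of different lengths and accepting close but imperfect matches. The sketch below shows one possible reading of each, not the patented implementation. The window sizes in `LENGTHS`, the 0.8 similarity threshold, and the helper names are all hypothetical, and the standard-library `difflib.SequenceMatcher` is swapped in as a stand-in for whatever similarity measure a real system would use.

```python
import hashlib
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Set

LENGTHS = (3, 5, 8)  # hypothetical window sizes, in words


def words_of(text: str) -> List[str]:
    return text.lower().split()


def digest(words: List[str]) -> str:
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def build_index(passages: Iterable[str], lengths=LENGTHS) -> Dict[int, Set[str]]:
    """Map each window length to the set of hashes of all windows of that length."""
    index: Dict[int, Set[str]] = {n: set() for n in lengths}
    for passage in passages:
        words = words_of(passage)
        for n in lengths:
            for i in range(len(words) - n + 1):
                index[n].add(digest(words[i:i + n]))
    return index


def longest_match_length(continuation: str, index: Dict[int, Set[str]]) -> int:
    """Return the largest indexed window length found in the continuation, or 0."""
    words = words_of(continuation)
    for n in sorted(index, reverse=True):
        for i in range(len(words) - n + 1):
            if digest(words[i:i + n]) in index[n]:
                return n
    return 0


def near_match(continuation: str, passage: str, threshold: float = 0.8) -> bool:
    """Treat word-level similarity at or above `threshold` as a close but imperfect match."""
    ratio = SequenceMatcher(None, words_of(continuation), words_of(passage)).ratio()
    return ratio >= threshold


passage = "the quick brown fox jumps over the lazy dog"
index = build_index([passage])
altered = "the quick brown fox leaps over the lazy dog"  # one word changed

exact_len = longest_match_length(passage, index)    # verbatim copy hits the longest window
altered_len = longest_match_length(altered, index)  # only short windows survive the edit
is_near = near_match(altered, passage)
```

Checking the longest windows first lets a caller distinguish long verbatim runs from short shared phrases, while the similarity check catches single-word substitutions that break every long hash.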
Natural language generation · CPC title
using natural language analysis · CPC title
Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title
Semantic analysis · CPC title
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title