Methods and systems for modeling of design representation in a library of editing cassettes

US11566241B2 · US · B2

Patent metadata
Publication number: US-11566241-B2
Application number: US-202117492435-A
Country: US
Kind code: B2
Filing date: Oct 1, 2021
Priority date: Oct 2, 2020
Publication date: Jan 31, 2023
Grant date: Jan 31, 2023

Abstract

Disclosed systems and methods relate to predicting the relative representation of genomic variants in an edited cell population, based on the editing cassette design representation in an editing cassette design library used to generate the edited cell population. A library of editing cassette designs is generated, and a feature vector, or sequence embedding, is developed for each design using natural language processing techniques. The feature vector may be based upon sequence attributes and editing kinetics of each cassette design as well as attributes that describe the library context. Features may include sequence embeddings generated from a neural network, linguistic-type distances, and statistical distance summaries thereof. The feature vectors are classified using one or more machine learning models, and the classified feature vectors are used to predict the representation of each design in an edited cell population.
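
The abstract describes feature vectors that combine per-design sequence attributes with library-context features such as linguistic-type distances and statistical summaries of those distances. The following is an illustrative, standard-library-only sketch under stated assumptions, not the patented method: k-mers play the role of "words", Levenshtein edit distance stands in for the linguistic-type distance, and the six features (length, GC fraction, distinct k-mer count, and min/mean/max distance to the other designs) are hypothetical choices rather than the patent's actual feature set. It does not train a Word2vec, Doc2vec, or neural embedding.

```python
from statistics import mean


def kmers(seq: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers, treated here as 'words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two sequences (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def gc_fraction(seq: str) -> float:
    """Sequence-composition feature: fraction of G/C bases."""
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def feature_vectors(library: list[str], k: int = 3) -> list[list[float]]:
    """Build one hypothetical feature vector per cassette design.

    Per-design features: length, GC fraction, number of distinct k-mers.
    Library-context features: min/mean/max edit distance to the other designs.
    """
    if len(library) < 2:
        raise ValueError("need at least two designs for library-context features")
    vectors = []
    for i, design in enumerate(library):
        # Distances to every *other* design in the same library.
        dists = [levenshtein(design, other)
                 for j, other in enumerate(library) if j != i]
        vectors.append([
            float(len(design)),
            gc_fraction(design),
            float(len(set(kmers(design, k)))),
            float(min(dists)),
            float(mean(dists)),
            float(max(dists)),
        ])
    return vectors
```

Because the distance summaries depend on every other design, the same cassette can receive a different vector in a different library, which is one simple way to capture the library context the abstract mentions.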

First claim

Opening claim text (preview).

What is claimed is:

1. A method for modifying an editing cassette design library composition, comprising: receiving an editing cassette design library comprising a plurality of editing cassette designs, each of the plurality of editing cassette designs configured to modify a target sequence to produce a modified sequence when provided to an automated cell editing system; generating one or more feature vectors associated with each of the plurality of editing cassette designs, the feature vectors comprising cassette features predictive of a representation for each of the plurality of editing cassette designs upon editing with the editing cassette design library; generating a predictive representation of each modified sequence of each of the plurality of editing cassette designs using one or more trained machine learning (ML) models, the one or more trained ML models trained to generate the predictive representation based on the one or more feature vectors associated with each of the plurality of editing cassette designs; receiving a target representation of each modified target sequence; modifying the editing cassette design library to change one or more editing cassette designs of the editing cassette design library; generating a second predictive representation of each modified sequence of the modified editing cassette design library using the one or more trained machine learning models; and providing the modified editing cassette design library based on the second predictive representation and the target representation.

2. The method of claim 1, wherein modifying the editing cassette design library comprises removing at least one of the editing cassette designs from the editing cassette design library.

3. The method of claim 2, wherein modifying the editing cassette design library comprises placing at least one of the editing cassette designs in a second editing cassette design library.

4. The method of claim 1, wherein modifying the editing cassette design library comprises updating the editing cassette design library to include an additional instance of at least one of the editing cassette designs.

5. The method of claim 1 wherein generating the predictive representation comprises: classifying each feature vector based on features of each respective feature vector using a trained first machine learning (ML) model; and predicting relative representation of the plurality of editing cassette designs using regression, based on the classifying, using a trained second ML model.

6. The method of claim 5, wherein the cassette features comprise at least one of an edit type, an edit length, a sequence composition, an auxiliary edit position, an auxiliary edit number, manufacturing complexity of the editing cassette design library, edit type complexity of the editing cassette design library, and edit length complexity of the editing cassette design library.

7. The method of claim 6, wherein the generating the one or more feature vectors comprises encoding each of the editing cassette designs with one of Word2vec, Doc2vec, GloVe, or RandSent.

8. The method of claim 7, wherein the first ML model comprises one or more of a multivariate linear regressor, a support vector machine, a gradient boosting regressor, ensemble model, or a neural network.

9. A system comprising: one or more memory devices; a processor configured to execute computer-readable instructions comprising a method for adjusting a genome design library composition, that causes the processor to: receive an editing cassette design library comprising a plurality of editing cassette designs, each of the cassette designs configured to modify a target sequence to produce a modified sequence when provided to an automated cell editing system; generate one or more feature vectors associated with each of the plurality of editing cassette designs, the feature vectors comprising cassette features predictive of a representation for each of the plurality of editing cassette designs upon editing with the editing cassette design library; generate a predictive representation of each modified sequence of each of the plurality of editing cassette designs using one or more trained machine learning (ML) models, the one or more trained ML models trained to generate the predictive representation based on the one or more feature vectors associated with each of the plurality of editing cassette designs; receive a target representation of each modified target sequence; modify the editing cassette design library to change one or more cassette designs of the design library; generate a second predictive representation of each modified sequence of the modified editing cassette design library using the one or more trained machine learning models; and provide the modified editing cassette design library based on the second predictive representation and the target representation.

10. The system of claim 9, wherein the computer-readable instructions that cause the processor to modify the editing cassette design library comprises removing at least one of the editing cassette designs from the editing cassette design library.

11. The system of claim 10, wherein the computer-readable instructions that cause the processor to modify the editing cassette design library comprises placing at least one of the editing cassette designs in a second editing cassette design library.

12. The system of claim 9, wherein the computer-readable instructions that cause the processor to modify the editing cassette design library comprises updating the editing cassette design library to include an additional instance of at least one of the editing cassette designs.

13. The system of claim 9, wherein the computer-readable instructions that cause the processor to generate the predictive representation further causes the processor to: classify each feature vector based on features of each respective feature vector, using a trained first machine learning (ML) model; and predict a relative representation of the plurality of editing cassette designs using regression, based on the classifying, using a trained second ML model.

14. The system of claim 13, wherein the cassette features comprise at least one of an edit type, an edit length, a sequence composition, an auxiliary edit position, an auxiliary edit number, manufacturing complexity of the editing cassette design library, edit type complexity of the editing cassette design library, and edit length complexity of the editing cassette design library.

15. The system of claim 14 wherein the computer-readable instructions that cause the processor to generate the one or more feature vectors comprises encoding each of the editing cassette designs with one of word2vec, doc2vec, GloVe, and RandSent.

16. The system of claim 14, wherein the first ML model comprises one or more of a multivariate linear regressor, a support vector machine, a gradient boosting regressor, ensemble model, or a neural network.

17. A non-transitory computer-readable medium comprising computer-readable instructions for a method for adjusting a genome design library composition, the computer readable instructions configured to cause a processor to: receive an editing cassette design library comprising a plurality of editing cassette designs, each of the cassette designs configured to modify a target sequence to produce a modified sequence when provided to an automated cell editing system; generate one or more feature vectors associated with each of the plurality of editing cassette designs, the feature vectors comprising cassette features pred…
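
Claim 1 recites an iterative loop: predict the representation of each modified sequence, compare it with a target representation, modify the library, re-predict, and provide the modified library. The following is a minimal sketch of that loop under assumptions the claims do not specify: a single `score` function (hypothetical) stands in for the trained ML models, the target is a dictionary of relative representations, L1 distance measures how far a prediction is from the target, and the library is modified greedily by adding an instance of the most under-represented design (cf. claim 4) or removing an instance of the most over-represented one (cf. claim 2, although this sketch always keeps at least one instance of each design). A change is kept only if the second prediction moves closer to the target. `toy_two_stage_score` is an invented stand-in for claim 5's classify-then-regress models; its rule and coefficients are made up for illustration.

```python
from collections import Counter
from typing import Callable

Score = Callable[[str], float]


def predict_representation(library: list[str], score: Score) -> dict[str, float]:
    """Predicted relative representation of each distinct design.

    `score` is a hypothetical stand-in for the trained ML model(s); it must
    return a positive per-instance weight. Duplicate instances add up.
    """
    counts = Counter(library)
    raw = {d: n * score(d) for d, n in counts.items()}
    total = sum(raw.values())
    return {d: w / total for d, w in raw.items()}


def l1_gap(pred: dict[str, float], target: dict[str, float]) -> float:
    """L1 distance between predicted and target representation (assumed metric)."""
    return sum(abs(pred.get(d, 0.0) - t) for d, t in target.items())


def rebalance(library: list[str], target: dict[str, float],
              score: Score, max_iter: int = 100) -> list[str]:
    """Greedy version of the claim-1 loop: modify, re-predict, keep improvements."""
    current = list(library)
    best = l1_gap(predict_representation(current, score), target)
    for _ in range(max_iter):
        pred = predict_representation(current, score)
        # Signed error per target design: negative means under-represented.
        error = {d: pred.get(d, 0.0) - t for d, t in target.items()}
        under = min(error, key=error.get)
        over = max(error, key=error.get)
        candidates = [current + [under]]      # add an instance (cf. claim 4)
        if current.count(over) > 1:
            trial = list(current)
            trial.remove(over)                # remove an instance (cf. claim 2)
            candidates.append(trial)
        # "Second predictive representation" for each candidate library.
        gap, lib = min(((l1_gap(predict_representation(c, score), target), c)
                        for c in candidates), key=lambda pair: pair[0])
        if gap >= best:
            break  # no single change moves the prediction closer to the target
        best, current = gap, lib
    return current


def toy_two_stage_score(design: str) -> float:
    """Invented stand-in for claim 5's classify-then-regress models."""
    gc = sum(base in "GC" for base in design) / len(design)
    is_long = len(design) > 10  # stage 1: toy "classifier" on edit length
    # Stage 2: class-specific linear "regressor" with made-up coefficients.
    return 0.5 + 0.2 * gc if is_long else 1.0 + 0.5 * gc
```

For example, with `target = {"AAAA": 0.5, "GCGC": 0.5}` the toy scorer weights "GCGC" above "AAAA", so `rebalance(["AAAA", "GCGC"], target, toy_two_stage_score)` adds one extra "AAAA" instance and then stops, because no further single addition or removal narrows the L1 gap. A real system would swap in trained models and a search strategy of its own choosing.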

Assignees

Inscripta Inc

Classifications

  • Design of libraries · CPC title

  • Supervised data analysis · CPC title

  • Design, preparation, screening or analysis of libraries using computer algorithms · CPC title

  • Screening of libraries · CPC title

Frequently asked questions

Who is the assignee on this patent?
Inscripta Inc
What technology area does this patent fall under?
Primary CPC classification C12N15/1089. Mapped technology areas include Chemistry & Metallurgy.
When was this patent published?
Published Jan 31, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).