Evolutionary models of multiple sequence alignments to predict offspring fitness prior to conception

US10658068B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10658068-B2
Application number  US-201414568456-A
Country             US
Kind code           B2
Filing date         Dec 12, 2014
Priority date       Jun 17, 2014
Publication date    May 19, 2020
Grant date          May 19, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system, device and method for receiving multiple aligned genetic sequences obtained from genetic samples of multiple organisms of one or more different species. A measure of evolutionary variation may be computed for one or more alleles at each of one or more aligned genetic loci. The aligned genetic loci in the multiple organisms may be derived from one or more common ancestral genetic loci or may be otherwise related. The measure of evolutionary variation may be a function of variation in alleles at corresponding aligned genetic loci in the multiple aligned genetic sequences. One or more likelihoods may be computed that an allele mutation at each of the one or more genetic loci in a simulated virtual progeny will be deleterious based on the measure of evolutionary variation of alleles at the corresponding aligned genetic loci for the multiple organisms.
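The abstract leaves the "measure of evolutionary variation" open. Claims 15 and 16 name Shannon entropy and average pairwise difference as two options, and the following is a minimal Python sketch of both, computed per column of a toy alignment. The function names, the gap character `-`, and the example sequences are assumptions made for illustration; none of them come from the patent.

```python
import math
from collections import Counter
from itertools import combinations


def column_entropy(column):
    """Shannon entropy (in bits) of the alleles in one alignment column."""
    counts = Counter(a for a in column if a != "-")  # assumption: "-" marks a gap
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def mean_pairwise_difference(column):
    """Fraction of sequence pairs that carry different alleles in one column."""
    alleles = [a for a in column if a != "-"]
    pairs = list(combinations(alleles, 2))
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def variation_profile(alignment):
    """Return (entropy, mean pairwise difference) for every aligned locus."""
    columns = zip(*alignment)  # transpose rows (sequences) into columns (loci)
    return [(column_entropy(c), mean_pairwise_difference(c)) for c in columns]


# Hypothetical four-sequence alignment, one string per organism.
alignment = ["ACGT", "ACGA", "ACTA", "ACTC"]
profile = variation_profile(alignment)
for locus, (entropy, pairwise) in enumerate(profile):
    print(locus, round(entropy, 3), round(pairwise, 3))
```

A column where every organism carries the same allele scores zero on both measures. Under claim 7, variants at such low-variation loci would be treated as relatively more likely to be deleterious; how the scores map to likelihoods is not specified in this preview.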

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for predicting variants being deleterious in virtual progenies, a deleterious variant associated with disease or reduced likelihood of surviving or reproducing, the method comprising:

  retrieving, from a computer memory, multiple aligned genetic sequences representing genetic material obtained from DNA samples of multiple organisms of one or more different species, the aligned genetic sequences aligning genetic sequences of the multiple organisms at one or more genetic loci;

  generating a plurality of virtual progenies of two potential parents by combining at least a portion of genetic information representing genetic material obtained from biological samples of the two potential parents, the generating of the virtual progenies comprising:

    (i) retrieving, for each parent, a diploid genetic sequence comprising two alleles at each genetic locus,

    (ii) selecting, for each parent, one of the two alleles for each genetic locus, the selection progressing locus-by-locus along the diploid genetic sequence and based at least partially on a stochastic process,

    (iii) forming a haploid genetic sequence based on the selected alleles for each parent, the haploid genetic sequence representing a genetic sequence of a virtual gamete,

    (iv) combining the haploid genetic sequences of the two parents to form a virtual progeny diploid genetic sequence associated with a virtual progeny, and

    (v) repeating at least steps (ii) to (iv) multiple times to generate the plurality of virtual progenies of the two potential parents;

  aligning the plurality of virtual progenies with the multiple sequence alignment of the multiple organisms;

  retrieving a machine learning computer model that is trained based on at least the aligned genetic sequences of the multiple organisms, the machine learning computing model comprises a phylogenetic tree generated from the multiple aligned genetic sequences, the phylogenetic tree modeling evolutionary variations of allele variants, each evolutionary variation of an allele variant corresponding to a prediction of a likelihood that the allele variant would be deleterious;

  inputting aligned sequences of the plurality of virtual progenies to the machine learning computer model to compute one or more likelihoods that a particular allele variant in the plurality of virtual progenies will be deleterious based on the evolutionary variation of the particular allele variant of the plurality of virtual progenies aligned with the multiple sequence alignment of the multiple organisms, the one or more likelihoods computed based on a frequency with which the particular allele variant has occurred and persisted in the multiple organisms according to the phylogenetic tree.

2. The method of claim 1, wherein an additional virtual progeny is generated by combining genetic information of one of the two potential parents and a reference genetic information data set.

3. The method of claim 1, wherein the multiple organisms are from multiple different species.

4. The method of claim 1, wherein the multiple organisms are from a single species.

5. The method of claim 1, further comprising computing one or more functions of variation in alleles at corresponding aligned genetic loci between a genetic sequence of an individual organism and one or more reference genetic information data sets.

6. The method of claim 1, further comprising comparing the one or more likelihoods to one or more thresholds or other statistical models to predict if the particular allele variant is deleterious.

7. The method of claim 1, wherein the likelihood that the particular allele variant is deleterious is relatively higher for one or more variants at corresponding aligned genetic loci that have a relatively lower measure of evolutionary variations in alleles.

8. The method of claim 1, further comprising weighing the evolutionary variations at different genetic loci based on a distribution of mutation rates at the different genetic loci in the multiple aligned genetic sequences.

9. The method of claim 1, further comprising weighing the evolutionary variations at different genetic loci to identify genetic loci in which mutations have been observed in evolutionary history lower than a threshold rate.

10. The method of claim 1, wherein the phylogenetic tree is used to generate a function of variation in alleles that have proliferated in the multiple organisms over evolutionary history to predict likelihoods that such variations in alleles are deleterious.

11. The method of claim 10, wherein at least one of the likelihoods that one of such variations in alleles is deleterious is based on a frequency with which the one of such variations has occurred and persisted in the multiple organisms over evolutionary history.

12. The method of claim 10, wherein at least one of the likelihoods that one of such variations in alleles is deleterious is based on a proximity in the phylogenetic tree representing an evolutionary timescale between a reference genetic sequence of the same species as the two potential parents and one or more other species in which the one of such variations has occurred.

13. The method of claim 1, wherein the phylogenetic tree is defined by a model of probabilities that an allele i will mutate to an allele j over an interval of evolutionary time.

14. The method of claim 1, wherein at least one of the evolutionary variations of a second particular allele variant is a score that quantifies a relative amount of sequence conservation at an aligned genetic loci corresponding to the second particular allele variant.

15. The method of claim 1, wherein at least one of the evolutionary variations of a second particular allele variant based on a Shannon entropy of alleles at an aligned genetic loci corresponding to the second particular allele variant.

16. The method of claim 1, wherein at least one of the evolutionary variations of a second particular allele variant based on an average pairwise difference between different alleles at an aligned genetic loci corresponding to the second particular allele variant.

17. The method of claim 1, wherein at least one of the evolutionary variations is a composite of results of multiple functions for computing multiple measures of evolutionary variation for multiple genetic loci.

18. The method of claim 1, wherein at least one of the evolutionary variations corresponding to multiple different aligned genetic loci is derived from multiple common ancestral genetic loci.

19. The method of claim 1, wherein the one or more likelihoods are computed further based on training the machine learning computer model to discriminate between variants predefined to be deleterious and variants predefined not to be deleterious.

20. The method of claim 1, wherein the one or more likelihoods are computed further based on training the machine learning computer model to assess a likelihood of a variant reaching a certain frequency in a population.

21. The method of claim 1, wherein at least one of the evolutionary variations corresponding to one or more genetic loci is based on a ratio w of a non-synonymous substitution rate to a synonymous substitution rate, wherein a non-synonymous substitution is an allele substitution in a codon that does not change an amino acid encoded by the codon and a synonymous substitution is an allele substitution in the codon that does change the amino acid.

22. A system for executing a machine learning comput…
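Steps (i) to (v) of claim 1 describe a simulated meiosis followed by fertilization. The sketch below is one possible Python reading of those steps, not the patented implementation. The representation of a diploid genome as a list of allele pairs, the `switch_prob` crossover parameter, and all function names are assumptions introduced for illustration.

```python
import random


def make_gamete(diploid, rng, switch_prob=0.01):
    """Steps (ii)-(iii): walk locus by locus and pick one allele per locus.

    diploid is a list of (allele_a, allele_b) tuples, one per locus.
    Assumption: the stochastic process is modeled as a random starting
    strand that switches with probability switch_prob at each locus,
    a crude stand-in for recombination.
    """
    strand = rng.randrange(2)
    gamete = []
    for pair in diploid:
        if rng.random() < switch_prob:
            strand = 1 - strand  # simulated crossover to the other strand
        gamete.append(pair[strand])
    return gamete


def make_progeny(parent_a, parent_b, rng, switch_prob=0.01):
    """Step (iv): combine one virtual gamete from each parent."""
    gamete_a = make_gamete(parent_a, rng, switch_prob)
    gamete_b = make_gamete(parent_b, rng, switch_prob)
    return list(zip(gamete_a, gamete_b))


def simulate_progenies(parent_a, parent_b, n, seed=0, switch_prob=0.01):
    """Step (v): repeat gamete formation and combination n times."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return [make_progeny(parent_a, parent_b, rng, switch_prob) for _ in range(n)]


# Step (i): hypothetical three-locus diploid sequences for two parents.
parent_a = [("A", "A"), ("C", "T"), ("G", "G")]
parent_b = [("A", "G"), ("C", "C"), ("G", "T")]
progenies = simulate_progenies(parent_a, parent_b, n=5, seed=42, switch_prob=0.5)
for child in progenies:
    print(child)
```

Each virtual progeny is again a list of allele pairs, so it can be scored locus by locus against whatever per-locus measure or model is used downstream. A realistic simulation would replace the single `switch_prob` with position-dependent recombination rates.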

Classifications

  • ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks · CPC title

  • G16B10/00 (Primary)

    ICT specially adapted for evolutionary bioinformatics, e.g. phylogenetic tree construction or analysis · CPC title

  • G16B20/00 (Primary)

    ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations · CPC title

  • Probabilistic models · CPC title

  • G16B20/20 (Primary)

    Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10658068B2 cover?
A system, device and method for receiving multiple aligned genetic sequences obtained from genetic samples of multiple organisms of one or more different species. A measure of evolutionary variation may be computed for one or more alleles at each of one or more aligned genetic loci. The aligned genetic loci in the multiple organisms may be derived from one or more common ancestral genetic loci …
Who is the assignee on this patent?
Ancestry.com DNA, LLC
What technology area does this patent fall under?
Primary CPC classification G16B10/00. Mapped technology areas include Physics.
When was this patent published?
Publication date May 19, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).