Who is the assignee on this patent?

Hubbell Earl A, Cawley Simon, Affymetrix Inc

What technology area does this patent fall under?

Primary CPC classification G06F19/18. Mapped technology areas include Physics.

When was this patent published?

Publication date: September 12, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System, method, and computer software product for genotype determination using probe array data

US9760675B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9760675-B2
Application number	US-201213468604-A
Country	US
Kind code	B2
Filing date	May 10, 2012
Priority date	May 18, 2007
Publication date	Sep 12, 2017
Grant date	Sep 12, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An embodiment of a method of analyzing data from processed images of biological probe arrays is described that comprises receiving a plurality of files comprising a plurality of intensity values associated with a probe on a biological probe array; normalizing the intensity values in each of the data files; determining an initial assignment for a plurality of genotypes using one or more of the intensity values from each file for each assignment; estimating a distribution of cluster centers using the plurality of initial assignments; combining the normalized intensity values with the cluster centers to determine a posterior estimate for each cluster center; and assigning a plurality of genotype calls using a distance of the one or more intensity values from the posterior estimate.
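The abstract's normalize, seed, estimate-centers, posterior and call-by-distance steps can be illustrated with a small pure-Python sketch. Quantile normalization across files is taken from dependent claim 2. Everything else is a hypothetical choice made for illustration and is not the patented procedure: the fixed-threshold seed rule, the pseudo-count posterior mean (a normal-normal conjugate update with the variance ratio folded into `prior_weight`), the prior centers, and all function names. The sketch also assumes that each sample already has a one-dimensional score derived from its normalized intensities.

```python
from statistics import mean


def quantile_normalize(files):
    """Give every file (an equal-length list of intensities) the same
    empirical distribution: the mean of the sorted values across files."""
    n = len(files[0])
    sorted_files = [sorted(f) for f in files]
    reference = [mean(s[r] for s in sorted_files) for r in range(n)]
    normalized = []
    for f in files:
        # Rank each value within its own file (ties broken by position).
        order = sorted(range(n), key=lambda k: f[k])
        out = [0.0] * n
        for rank, k in enumerate(order):
            out[k] = reference[rank]
        normalized.append(out)
    return normalized


def initial_assignment(scores, cuts=(-1 / 3, 1 / 3)):
    """Hypothetical seed rule: fixed thresholds on a 1-D per-sample score."""
    lo, hi = cuts
    return ["BB" if s < lo else "AB" if s <= hi else "AA" for s in scores]


def posterior_centers(scores, labels, prior_centers, prior_weight=2.0):
    """Posterior mean per genotype: the prior center counts as prior_weight
    pseudo-observations alongside the scores assigned to that genotype."""
    post = {}
    for g, prior in prior_centers.items():
        members = [s for s, lab in zip(scores, labels) if lab == g]
        n = len(members)
        if n == 0:
            post[g] = prior  # no data for this genotype: keep the prior
        else:
            post[g] = (prior_weight * prior + n * mean(members)) / (prior_weight + n)
    return post


def call_genotypes(scores, centers):
    """Call each sample as the genotype whose posterior center is nearest."""
    return [min(centers, key=lambda g: abs(s - centers[g])) for s in scores]


if __name__ == "__main__":
    scores = [-0.9, -0.8, 0.05, 0.1, 0.85]  # hypothetical per-sample scores
    labels = initial_assignment(scores)
    centers = posterior_centers(scores, labels, {"BB": -0.66, "AB": 0.0, "AA": 0.66})
    print(call_genotypes(scores, centers))  # ['BB', 'BB', 'AB', 'AB', 'AA']
```

In this sketch, `prior_weight` controls how strongly the prior centers resist shifts in the observed data. When a genotype receives no samples, its posterior center is simply the prior.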

First claim

Opening claim text (preview).

What is claimed is:

1. A method for genotyping a plurality of single nucleotide polymorphisms (SNPs) in a nucleic acid sample using seed genotype cluster estimates derived without requiring mismatch probe data, the method comprising: hybridizing a nucleic acid sample with a plurality of allele-specific perfect-match probes provided in an array of perfect-match probes for a plurality of target sequences which the array is designed to genotype, wherein, for substantially all of the plurality of target sequences, the array is without corresponding mismatch probes; acquiring intensity data associated with the hybridizing, wherein the intensity data comprises intensity values; summarizing the intensity values to obtain a signal value for each allele for each of the plurality of SNPs; transforming the signal values by discarding size information from the signal values, thereby generating transformed signal values represented in one-dimensional contrast space; evaluating all plausible divisions of the transformed signal values into seed genotypes by applying a Gaussian likelihood model; averaging the plausible divisions over most likely plausible divisions to derive a plurality of seed genotype clusters; and genotyping the plurality of SNPs, wherein genotyping comprises a comparison of the transformed signal values with a set of typical values for each genotype, wherein the set of typical values comprises prior values, wherein the prior values further comprise estimates of genotype cluster center locations and genotype cluster center variances of the plurality of seed genotype clusters determined from the clustering properties of the transformed signal values; wherein the steps of summarizing, transforming, evaluating, averaging, and genotyping are performed on a computer, and wherein the computer comprises a computer processor.

2. The method of claim 1, wherein summarizing comprises quantile normalization of the intensity values, and wherein summarizing does not include background adjustment.

3. The method of claim 1, wherein transforming comprises generating a contrast value for each of the signal values, wherein each of the contrast values is associated with a contrast value range.

4. The method of claim 3, wherein the contrast value range of each of the contrast values is between −1 and 1.

5. The method of claim 4, wherein the contrast values that correspond to heterozygous genotypes are stretched, and wherein the contrast values that correspond to homozygous genotypes are compressed.

6. The method of claim 1, wherein the plurality of seed genotype clusters are derived from the clustering properties of the data for each of the plurality of SNPs and not in reliance on initial cluster estimates from any mismatch probe analysis.

7. The method of claim 6, wherein the plurality of seed genotype clusters consists of three or fewer seed genotype clusters.

8. The method of claim 6, wherein the plurality of seed genotype clusters consists of two seed genotype clusters when evaluating SNPs of individuals with one X chromosome and one Y chromosome.

9. The method of claim 6, wherein each of the transformed signal values is assigned to exactly one of the plurality of seed genotype clusters, thereby generating a plurality of initial assignments.

10. The method of claim 9, wherein the plurality of initial assignments is evaluated under a Gaussian cluster model, thereby generating a plurality of final assignments.

11. The method of claim 10, wherein each of the plurality of final assignments is combined with the transformed signal values to generate a posterior distribution of genotype clusters.

12. The method of claim 1, wherein the plausible divisions are determined by restricting possible clusters to an expected number of genotypes of increasing contrast.

13. The method of claim 12, wherein the plausible divisions are determined by restricting possible clusters to an expected number of genotypes of increasing contrast, with a minimum distance allowed between cluster centers.

14. The method of claim 1, further comprising: evaluating the likelihood for each plausible assignment of seed genotypes based on a posterior likelihood of the clusters; from the likelihoods of all assignments, computing a relative probability assignment for each transformed signal value to be assigned as each genotype; computing a posterior distribution of centers and spread for each genotype using the resulting relative probability assignment to seed the final computation; and determining genotypes and confidences for each SNP using the posterior distribution of centers and spread for each genotype.

15. The method of claim 1, wherein transforming the signal values increases contrast between seed genotype clusters and provides a genotype order relation that holds between cluster centers.

16. The method of claim 15, wherein the genotype order relation is of the form BB left of AB left of AA, and wherein the genotype order relation is required of all fits to the data.

17. The method of claim 1, further comprising: by using a Bayesian procedure, combining a prior estimate of genotype cluster centers and variances for each SNP with the transformed signal values and genotype seed assignments for the transformed signal values determined from the cluster properties of the transformed signal values to obtain a posterior estimate of cluster centers and variances; calling genotypes of the transformed signal values for an SNP using the posterior estimate.

18. The method of claim 17, wherein determining genotype seed assignments for an SNP comprises: determining the likelihood for all plausible seed genotype clusters of the transformed signal values and averaging over the most likely seeds.

19. The method of claim 18, wherein determining the likelihood for all plausible seed genotype clusters of the transformed signal values and averaging over the most likely seeds further comprises: repeatedly assigning each of the transformed signal values to seed genotype clusters, wherein each transformed signal value is assigned to exactly one seed genotype cluster resulting in a plausible hard assignment for each transformed signal value; evaluating the likelihood of each plausible hard assignment under a Gaussian cluster model to evaluate the quality of the hard assignment; combining most likely plausible hard assignments into a soft assignment that allows transformed signal values to be partially assigned to more than one seed genotype cluster; and using this soft assignment of genotypes as a reliable seed.

20. The method of claim 18, wherein the plausible seed genotype clusters are assigned based on two dividing contrast values corresponding to vertical lines in contrast that determine the transitions between genotypes.

21. The method of claim 1, further comprising: restricting a weighting of cluster centers and variances from prior values in order to accommodate potential shifting of the cluster centers in the transformed signal values.

22. The method of claim 1, further comprising: controlling the amount of mixing between cluster centers by counting transformed signal values in each cluster more towards the estimate of that cluster, without requiring all clusters to have the same variance.

23. The method of claim 1, further comprising: applying a likelihood penalty for each cluster observed to reduce clusters from erroneously splitting.

24. The method of claim
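Several claims describe a seed search over one-dimensional contrast values:

- Claims 1, 3 and 4 cover the contrast space, with values between −1 and 1.
- Claims 12, 13 and 20 cover plausible divisions set by two dividing contrast values, with a minimum gap between cluster centers.
- Claim 16 fixes the order: BB left of AB left of AA.
- Claim 19 describes hard assignments scored under a Gaussian model and then combined into a soft assignment.
- Claim 23 adds a per-cluster likelihood penalty.

The brute-force Python sketch below illustrates that search under stated assumptions. It uses the plain contrast (A − B)/(A + B), so the heterozygote stretching of claim 5 is not modeled. It uses a single pooled, floored variance, although claim 22 allows unequal variances. It reads "averaging over most likely plausible divisions" as likelihood-weighted averaging of the `top_k` best divisions. All names and default values are hypothetical and do not come from the patent.

```python
import math
from itertools import combinations_with_replacement

GENOTYPES = ("BB", "AB", "AA")  # required left-to-right order in contrast


def contrast(a, b):
    """(A - B) / (A + B): drops overall size, keeps allele balance in [-1, 1]."""
    return (a - b) / (a + b)


def division_loglik(xs, i, j, var_floor, penalty, min_gap):
    """Score the hard division xs[:i] | xs[i:j] | xs[j:] of sorted contrasts
    under a Gaussian model with a pooled variance; None if implausible."""
    groups = [g for g in (xs[:i], xs[i:j], xs[j:]) if g]  # empty clusters allowed
    centers = [sum(g) / len(g) for g in groups]
    # Plausibility: adjacent cluster centers must be at least min_gap apart.
    if any(hi - lo < min_gap for lo, hi in zip(centers, centers[1:])):
        return None
    ss = sum((x - c) ** 2 for g, c in zip(groups, centers) for x in g)
    var = max(ss / len(xs), var_floor)  # floor avoids a degenerate zero variance
    loglik = -0.5 * len(xs) * math.log(2 * math.pi * var) - ss / (2 * var)
    return loglik - penalty * len(groups)  # per-cluster penalty discourages splits


def seed_soft_assignment(values, top_k=5, var_floor=1e-3, penalty=2.0, min_gap=0.2):
    """Enumerate every ordered BB | AB | AA division, keep the top_k by
    likelihood, and average them into per-sample genotype probabilities."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    xs = [values[k] for k in order]
    n = len(xs)
    scored = []
    for i, j in combinations_with_replacement(range(n + 1), 2):  # all i <= j
        ll = division_loglik(xs, i, j, var_floor, penalty, min_gap)
        if ll is not None:
            scored.append((ll, i, j))
    best = sorted(scored, reverse=True)[:top_k]
    weights = [math.exp(ll - best[0][0]) for ll, _, _ in best]  # stable softmax
    total = sum(weights)
    probs = [[0.0] * len(GENOTYPES) for _ in values]
    for w, (_, i, j) in zip(weights, best):
        for pos in range(n):
            g = 0 if pos < i else (1 if pos < j else 2)
            probs[order[pos]][g] += w / total  # map back to input order
    return probs


if __name__ == "__main__":
    signals = [(1.0, 9.0), (5.0, 5.0), (9.0, 1.0)]  # hypothetical (A, B) pairs
    print([contrast(a, b) for a, b in signals])  # [-0.8, 0.0, 0.8]
```

Because (A − B)/(A + B) is negative when the B allele dominates, the BB | AB | AA ordering holds by construction. The search visits O(n²) divisions and scores each one in O(n). That makes it a readable illustration for one SNP with a modest number of samples, not an efficient implementation.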

Assignees

Inventors

Classifications

G06F19/24 · Physics (mapped topic)
G06F19/18 (primary) · Physics (mapped topic)
G06F19/20 · Physics (mapped topic)
G16B20/10 · Ploidy or copy number detection (CPC title)
G16B40/30 · Unsupervised data analysis (CPC title)

Patent family

Related publications grouped by family.

Patent family ID: 40028104

