Systems and methods for multi-label cancer classification

US11527323B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11527323-B2
Application number: US-202015930234-A
Country: US
Kind code: B2
Filing date: May 12, 2020
Priority date: May 14, 2019
Publication date: Dec 13, 2022
Grant date: Dec 13, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided for identifying a diagnosis of a cancer condition for a somatic tumor specimen of a subject. The method receives sequencing information comprising analysis of a plurality of nucleic acids derived from the somatic tumor specimen. The method identifies a plurality of features from the sequencing information, including two or more of RNA, DNA, RNA splicing, viral, and copy number features. The method provides a first subset of features and a second subset of features from the identified plurality of features as inputs to a first classifier and a second classifier, respectively. The method generates, from two or more classifiers, two or more predictions of cancer condition based at least in part on the identified plurality of features. The method combines, at a final classifier, the two or more predictions to identify the diagnosis of the cancer condition for the somatic tumor specimen of the subject.
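The flow the abstract describes (feature subsets routed to separate classifiers, whose predictions are then combined by a final classifier) resembles what is commonly called a stacked ensemble. The following is a minimal sketch of that idea only, not the patented implementation. The nearest-centroid model, the column layout (two RNA columns, one copy number column, one DNA column), the toy values, and the labels `condition_A` and `condition_B` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class CentroidClassifier:
    """Toy nearest-centroid model; a stand-in for any real classifier."""

    def fit(self, rows, labels):
        sums, counts = {}, defaultdict(int)
        for row, label in zip(rows, labels):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] += 1
        self.classes = sorted(sums)
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in self.classes}
        return self

    def predict_proba(self, row):
        # Softmax over negative squared distances to each class centroid.
        scores = [-sum((a - b) ** 2 for a, b in zip(row, self.centroids[c]))
                  for c in self.classes]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, row):
        probs = self.predict_proba(row)
        return self.classes[probs.index(max(probs))]


class StackedEnsemble:
    """One intermediate classifier per feature subset, plus a final classifier."""

    def __init__(self, subsets):
        self.subsets = subsets  # hypothetical mapping: subset name -> column indices
        self.intermediate = {}
        self.final = CentroidClassifier()

    def _slice(self, row, name):
        return [row[i] for i in self.subsets[name]]

    def _stack(self, row):
        # Concatenate the intermediate predictions in a fixed (sorted) order.
        stacked = []
        for name in sorted(self.subsets):
            model = self.intermediate[name]
            stacked.extend(model.predict_proba(self._slice(row, name)))
        return stacked

    def fit(self, rows, labels):
        for name in self.subsets:
            subset_rows = [self._slice(r, name) for r in rows]
            self.intermediate[name] = CentroidClassifier().fit(subset_rows, labels)
        # Simplification: the final model is fit on in-sample intermediate
        # predictions; practical stacking usually uses held-out predictions.
        self.final.fit([self._stack(r) for r in rows], labels)
        return self

    def predict(self, row):
        return self.final.predict(self._stack(row))


# Toy data: columns 0-1 are "RNA", column 2 is "copy number", column 3 is "DNA".
rows = [
    [5.0, 0.1, 2.0, 0.0],
    [4.8, 0.2, 2.1, 0.1],
    [0.2, 5.1, 3.9, 0.9],
    [0.1, 4.9, 4.0, 1.0],
]
labels = ["condition_A", "condition_A", "condition_B", "condition_B"]

ensemble = StackedEnsemble({"rna": [0, 1], "cnv": [2], "dna": [3]}).fit(rows, labels)
prediction = ensemble.predict([4.9, 0.15, 2.05, 0.05])
```

In this sketch the final classifier never sees the raw features, only the class probabilities emitted by the intermediate classifiers, which is the property the abstract's "combines, at a final classifier, the two or more predictions" suggests.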

First claim

Opening claim text (preview).

What is claimed is:

1. A method for identifying a diagnosis of a cancer condition for a subject from among at least 50 different cancer conditions, the method comprising:

    sequencing a plurality of DNA molecules from a sample of a somatic tumor from the subject, thereby obtaining a first plurality of sequence reads of DNA from a somatic tumor of the subject;

    aligning each respective sequence read in the first plurality of sequence reads to a reference human genome, thereby generating a corresponding first plurality of aligned sequence reads;

    sequencing a plurality of mRNA molecules from the sample of the somatic tumor from the subject, thereby obtaining a second plurality of sequence reads of RNA from the somatic tumor of the subject;

    aligning each respective sequence read in the second plurality of sequence reads to a reference human transcriptome, thereby generating a corresponding second plurality of aligned sequence reads;

    identifying a plurality of features from the first plurality of aligned sequence reads and second plurality of aligned sequence reads, collectively, wherein the plurality of features comprises three or more subsets of features including a first subset of features comprising RNA expression features, a second subset of features comprising copy number features, and a third subset of features comprising DNA features, wherein:

        each RNA expression feature is associated with an expression level of a respective target region of the reference human transcriptome and represents a corresponding abundance of sequence reads, in the second plurality of aligned sequence reads, that map to the respective target region of the reference human transcriptome;

        each DNA feature is associated with a respective allele status in a respective target region of the reference human genome and represents a corresponding abundance of sequence reads with a corresponding reference or variant allele, in the first plurality of aligned sequence reads, that map to the respective target region of the reference human genome; and

        each copy number feature is associated with a target region of the reference human genome and represents a corresponding abundance of sequence reads, in the first plurality of aligned sequence reads, that map to the respective target region of the reference human genome; and

    evaluating the plurality of features using an ensemble classifier comprising (i) a set of intermediate classifiers that includes a first classifier, a second classifier, and a third classifier, and (ii) a final classifier, wherein the ensemble classifier uses the plurality of features to form:

        (a) for each respective classifier in the set of intermediate classifiers, a corresponding intermediate prediction by:

            obtaining a first intermediate prediction from among a first plurality of predictions for the cancer condition associated with the first classifier, by providing the first subset of features from the identified plurality of features as inputs to the first classifier, wherein the first classifier evaluates the first subset of features against each cancer condition in the at least 50 different cancer conditions to provide the first intermediate prediction;

            obtaining a second intermediate prediction from among a second plurality of predictions for the cancer condition associated with the second classifier, by providing the second subset of features from the identified plurality of features as inputs to the second classifier; and

            obtaining a third intermediate prediction from among a third plurality of predictions for the cancer condition associated with the third classifier, by providing the third subset of features from the identified plurality of features as inputs to the third classifier, thereby forming a plurality of intermediate predictions; and

        (b) a determination of the cancer condition of the subject by combining, at the final classifier, the plurality of intermediate predictions that includes the first, second, and third intermediate predictions to identify the cancer condition for the subject from among the at least 50 different cancer conditions, wherein the determination of the cancer condition of the subject formed by the ensemble classifier comprises differentiating between general sarcomas, ependymoma, ewing sarcoma, gliosarcoma, leiomyosarcoma, meningioma, mesothelioma, and Rosai-Dorfman.

2. The method of claim 1, wherein combining, at the final classifier, the plurality of intermediate predictions further comprises:

    scaling each intermediate prediction of the plurality of intermediate predictions based at least in part on a respective confidence level in each respective prediction to form a corresponding scaled prediction in a corresponding plurality of scaled predictions; and

    generating a combined prediction based at least in part on each scaled prediction by inputting each respective scaled prediction in the corresponding plurality of scaled predictions into the final classifier.

3. The method of claim 1, wherein: the target regions of the reference human transcriptome associated with each RNA expression feature collectively represent a plurality of genes, and the plurality of genes comprises ten or more genes selected from the group consisting of GPM6A, CDX1, SOX2, NAPSA, CDX2, MUC12, SLAMF7, HNF4A, ANXA10, TRPS1, GATA3, SLC34A2, NKX2-1, SLC22A31, ATP10B, STEAP2, CLDN3, SPATA6, NRCAM, USH1C, SOX17, TMPRSS2, MECOM, WT1, CDHR1, HOXA13, SOX10, SALL1, CPE, NPR1, CLRN3, THSD4, ARL14, SFTPB, COL17A1, KLHL14, EPS8L3, NXPE4, FOXA2, SYT11, SPDEF, GRHL2, GBP6, PAX8, ANO1, KRT7, HOXA9, TYR, DCT, LYPD1, MSLN, TP63, CDH1, ESR1, HNF1B, HOXA10, TJP3, NRG3, TMC5, PRLR, GATA2, DCDC2, INS, NDUFA4L2, TBX5, ABCC3, FOLH1, HIST1H3G, S100A1, PTHLH, ACER2, RBBP8NL, TACSTD2, C19orf77, PTPRZ1, BHLHE41, FAM155A, MYCN, DDX3Y, FMN1, HIST1H3F, UPK3B, TRIM29, TXNDC5, BCAM, FAM83A, TCF21, MIA, RNF220, AFAP1, KRT5, SOX21, KANK2, GPM6B, C1orf116, FOXF1, MEIS1, EFHD1, and XKRX.

4. The method of claim 1, wherein the first plurality of sequence reads was generated by low pass, whole genome sequencing.

5. The method of claim 1, wherein the second plurality of sequence reads was generated from sequencing of cDNA.

6. The method of claim 1, wherein the ensemble classifier is trained by a method comprising:

    obtaining, for each respective training subject in a plurality of training subjects, (i) the plurality of features, (ii) a respective training label for each respective classifier in the set of intermediate classifiers, and (iii) a respective label for the cancer condition of the respective training subject;

    training, for each respective classifier in the set of intermediate classifiers, a respective initial model for the respective classifier that provides a respective initial intermediate prediction for each respective training subject based on at least, for each respective training subject in the plurality of training subjects, (i) a respective subset of features in the three or more subsets of features, and (ii) the respective training label for the respective classifier;

    training a respective initial model for the final classifier that provides a corresponding initial diagnosis for the cancer condition based on at least, for each respective training subject in the plurality of training subjects, (i) for each respective classifier in the set of intermediate classifiers, a respective initial classification output from the respective initial model for the respective classifier for the respective training subject, and (ii) the respective label for the cancer condition of the respective training subject;

    calculating, for each respective training subject in the plurality of training subjects, a respective entropy score for the respective training subject based at least in part on a respective initial dia
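Claim 2 describes scaling each intermediate prediction by a confidence level before it reaches the final classifier, and claim 6 refers to a per-subject entropy score. The sketch below illustrates both ideas under stated assumptions: the confidence weights and probability vectors are invented, the `combine` function (sum and renormalise) is only a placeholder for whatever the final classifier does, and Shannon entropy is one common choice of entropy score. The claim preview is cut off before it says how the entropy score is computed or used, so none of this should be read as the patented method.

```python
import math


def scale_by_confidence(prediction, confidence):
    """Multiply each class probability by a confidence weight (assumed in [0, 1])."""
    return [p * confidence for p in prediction]


def combine(scaled_predictions):
    """Placeholder final step: sum the scaled predictions per class, then renormalise."""
    totals = [sum(column) for column in zip(*scaled_predictions)]
    norm = sum(totals)
    return [t / norm for t in totals]


def shannon_entropy(probs):
    """Shannon entropy in bits; higher values mean a less certain prediction."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical intermediate predictions over three cancer conditions,
# one row per intermediate classifier, with made-up confidence levels.
intermediate = [
    [0.7, 0.2, 0.1],
    [0.4, 0.4, 0.2],
    [0.1, 0.8, 0.1],
]
confidences = [0.9, 0.5, 0.2]

scaled = [scale_by_confidence(p, c) for p, c in zip(intermediate, confidences)]
combined = combine(scaled)
uncertainty = shannon_entropy(combined)
```

With these toy numbers, the first classifier's high confidence lets its preferred class dominate the combined prediction even though the third classifier strongly favours a different class.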

Assignees

Tempus Labs Inc

Inventors

Classifications

  • relating to pathologies · CPC title

  • ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding · CPC title

  • ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations · CPC title

  • Ploidy or copy number detection · CPC title

  • for data related to laboratory analysis, e.g. patient specimen analysis · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11527323B2 cover?
Systems and methods are provided for identifying a diagnosis of a cancer condition for a somatic tumor specimen of a subject. The method receives sequencing information comprising analysis of a plurality of nucleic acids derived from the somatic tumor specimen. The method identifies a plurality of features from the sequencing information, including two or more of RNA, DNA, RNA splicing, viral, …
Who is the assignee on this patent?
Tempus Labs Inc
What technology area does this patent fall under?
Primary CPC classification G16H50/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 13, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).