A Computational Imaging Approach to Characterizing Genomic Phenotypes of Pancreatic Ductal Adenocarcinoma in Resected PDAC Patients
Hsiyun Wei1, S. Dickinson2, L. Adriana Escobar Hoyos2, Marc A. Attiyeh2, Jayasree Chakraborty2, Caitlin A. McIntyre2, Christine A. Iacobuzio-Donahue2, Eileen M. O’Reilly2, Richard K. G. Do2, Qingling Duan1, Amber L. Simpson1
1Queen’s University
2Memorial Sloan Kettering Cancer Center
Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal cancers, with a five-year survival rate of less than 8%. For patients who undergo surgical resection, the rate improves only to 20%. A hallmark of PDAC that contributes to its aggressive biology is the variable and often extensive stromal involvement. Recent work in molecular subtyping may better capture the molecular landscape of PDAC [1], and cancer imaging provides insights into the intertumoral heterogeneity of PDAC [2,3]. However, the relationship between molecular subtyping and cancer imaging remains under-explored. Both genomic and imaging phenotypes are strongly correlated with PDAC survival outcomes. We therefore hypothesized that PDAC imaging phenotypes correlate with genomic phenotypes, and that combining them would help our model achieve high accuracy in predicting overall survival. Identifying subgroups of PDAC patients with poor prognosis would help select patients for surgery or chemotherapy. MSK-IMPACT next-generation sequencing and preoperative CT were performed prospectively on patients undergoing pancreatic resection at Memorial Sloan Kettering Cancer Center (n = 172). Genomic analysis revealed alterations in known PDAC driver genes as follows: KRAS in 157 (91%) patients, TP53 in 123 (72%), CDKN2A in 75 (44%), and SMAD4 in 34 (20%). Overall, we aim to investigate the relationship between CT imaging phenotypes and genomic phenotypes of PDAC patients, to create a model that predicts overall survival, and to identify subgroups of PDAC patients with enhanced response to chemotherapy.
Keywords Pancreatic neoplasm · Computational biology · Survival · Radiogenomics · Genomics · Machine learning · Pancreatic ductal adenocarcinoma · Patient survival prediction · Texture analysis
Introduction
Pancreatic ductal adenocarcinoma (PDAC) is the most common histological subtype of pancreatic cancer, with high mortality due to the lack of symptoms in the early phase of the disease and its aggressive progression [4,5]. PDAC has a poor prognosis: the 5-year survival rate for all stages is approximately 6%, whereas after surgical resection the 5-year survival rate can reach 25% [6,7,8,9]. On the one hand, a body of research has shown a significant impact of the genetic landscape on the overall survival of PDAC patients [10,11,12]. On the other hand, radiological imaging modalities such as CT have shown great potential for early prediction of the molecular landscape, response to various therapies, and overall survival in PDAC [13,14,15]. However, the relationship between PDAC genotype and imaging phenotype remains under-explored. In this study, we aimed to investigate the value of radiomic features extracted from contrast-enhanced CT, combined with next-generation sequencing (NGS) data and clinical information, for the preoperative prediction of PDAC overall survival, the status of well-known PDAC driver genes (SMAD4, TP53), and the number of mutated genes (grouped into two classes). We also aimed to divide PDAC cases into subgroups according to their radiomic profiles. Through these efforts, we hope to support early prediction in PDAC and to provide precise therapeutic options that improve patients' prognosis.
Materials and methods
Patients
This retrospective study was approved by Memorial Sloan Kettering Cancer Center. The study cohort consisted of 172 patients with histologically proven PDAC who underwent surgical resection from April 2015 to June 2016.
During the study period, all 172 patients underwent the preoperative CT angiogram required for image analysis, and their tumor sequencing data and clinical data were collected after informed consent.
Targeted sequencing
We used a targeted sequencing panel that analyzes all exons and selected introns of 390 cancer-associated genes. An established pipeline was used for DNA extraction, sequencing, and analysis, as previously described [16]. Data were analyzed through a custom bioinformatics pipeline. Point mutations were filtered for quality using the following criteria: tumor variant allele frequency (VAF)/normal VAF ≥ 5; tumor coverage ≥ 20; tumor VAF ≥ 0.02 (hotspot) or ≥ 0.05 (non-hotspot); and tumor mutant reads ≥ 8 (hotspot) or ≥ 10 (non-hotspot). Finally, all filtered mutation calls were manually reviewed by a bioinformatician to identify potential false positives. Copy number analysis was performed using FACETS, a software tool optimized for detecting copy number alterations (CNAs) while accounting for variations in tumor purity, ploidy, and clonal heterogeneity [17].
Tissue samples and immunohistochemistry
To validate the sequencing results, immunohistochemistry (IHC) staining for these cancer-related genes was performed on resected tumor specimens. All hematoxylin and eosin-stained slides of each case were examined by a gastrointestinal pathologist under a low-power (4×) objective to identify the most representative tumor section for IHC staining. Unstained 5-μm sections were then cut from paraffin-embedded tumor blocks and deparaffinized by standard techniques.
Mutation status determination
The point mutation, copy number, and IHC results were interpreted together to make a final determination of allele status. Each of the 390 genes was categorized as "mutated" or "not mutated".
CT image acquisition and segmentation
Contrast-enhanced CT images were used for quantitative image analysis.
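The point-mutation quality thresholds listed above can be expressed as a single predicate. The sketch below is illustrative only: the `Variant` record and its field names are hypothetical, the example calls are toy values, and treating a normal VAF of zero as satisfying the tumor/normal ratio criterion is our assumption rather than a documented rule of the pipeline.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-call record; field names are illustrative only."""
    gene: str
    tumor_vaf: float          # variant allele frequency in the tumor
    normal_vaf: float         # variant allele frequency in the matched normal
    tumor_coverage: int       # total tumor reads at the locus
    tumor_mutant_reads: int   # tumor reads supporting the variant
    is_hotspot: bool


def passes_quality_filter(v: Variant) -> bool:
    """Return True if a point mutation meets the quality thresholds in the text."""
    # Tumor/normal VAF ratio >= 5 (assumption: a normal VAF of 0 always passes).
    if v.normal_vaf > 0 and v.tumor_vaf / v.normal_vaf < 5:
        return False
    if v.tumor_coverage < 20:
        return False
    # Hotspots use looser VAF and read-count thresholds than non-hotspots.
    min_vaf, min_reads = (0.02, 8) if v.is_hotspot else (0.05, 10)
    return v.tumor_vaf >= min_vaf and v.tumor_mutant_reads >= min_reads


calls = [
    Variant("KRAS", 0.03, 0.0, 400, 12, is_hotspot=True),
    Variant("ARID1A", 0.03, 0.0, 400, 12, is_hotspot=False),
]
print([v.gene for v in calls if passes_quality_filter(v)])  # ['KRAS']
```

The same low VAF (0.03) passes at a hotspot but fails at a non-hotspot position, which is the intended effect of the two-tier thresholds.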
Following administration of 150 mL of iodinated contrast (Omnipaque 300, GE Healthcare) at 4.0 mL/s, CT images were obtained using a multidetector CT scanner (LightSpeed 16 and VCT, GE Healthcare) during the pancreatic parenchymal phase (scan delay 40 s) and the portal venous phase. The scan parameters for the portal venous phase were as follows: pitch/table speed 0.984–1.375/39.37–27.50 mm; autoMA 220–380; noise index 12.5–14; rotation time 0.7–0.8 s; scan delay 80–85 s. Axial slices reconstructed at 2.5-mm intervals were used.
Quantitative CT image analysis
The tumor imaged in the portal venous phase was manually segmented over the entire tumor volume using Scout Liver Software (Pathfinder Technologies Inc., Analogic Corporation) by research study assistants with prior experience in tumor segmentation. A diagnostic radiologist specializing in pancreatic tumors (with 8 years of experience on the pancreas tumor board) reviewed the segmentations and adjusted tumor contours to ensure accuracy of the tumor region (Online Resource 1), using the pancreatic parenchymal phase as a guide when necessary. All reviewers were blinded to clinical and genetic variables. The portal venous phase was chosen because dual-energy CT was used inconsistently for the pancreatic parenchymal phase. A total of 255 radiomic features describing image heterogeneity were extracted from the segmented volume by computer scientists, as described previously [references blinded for review]. Briefly, the features were extracted using gray-level co-occurrence matrices (GLCM), run-length matrices (RLM), local binary patterns (LBP), fractal dimension (FD), intensity histograms (IH), and angle co-occurrence matrices (ACM) [18–22]. ACMs describe the directional edge patterns present in a tumor, whereas the other feature types quantify intensity patterns. A set of statistical features was computed from each type: 19 from GLCM, 11 from RLM, 128 from LBP, 54 from FD, 5 from IH, and 38 from ACM (255 in total).
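To make the GLCM feature family concrete, the following sketch builds a symmetric, normalized co-occurrence matrix for one axial slice, computes three classic Haralick-style statistics (contrast, angular second moment, homogeneity), and then averages each statistic over slices, mirroring the per-slice averaging applied to the tumor volume. It is not the MATLAB implementation used in the study: the single horizontal pixel offset, the pre-quantized gray levels, the toy slices, and the choice of these three statistics (out of the 19 GLCM features) are assumptions for illustration. Pixels outside the tumor mask are marked `None`.

```python
from collections import Counter


def glcm(image, levels, offset=(0, 1)):
    """Symmetric, normalized gray-level co-occurrence matrix for one 2D slice.

    `image` holds integer gray levels in [0, levels); None marks pixels
    outside the tumor mask, and pairs touching them are skipped.
    """
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if not (0 <= r2 < rows and 0 <= c2 < cols):
                continue
            a, b = image[r][c], image[r2][c2]
            if a is None or b is None:
                continue
            counts[(a, b)] += 1  # count the pair in both directions
            counts[(b, a)] += 1  # so that the matrix is symmetric
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid pixel pairs inside the mask")
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def glcm_stats(p):
    """Three Haralick-style statistics from a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),  # angular second moment
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


def tumor_glcm_features(slices, levels):
    """Compute the statistics on every axial slice, then average over slices."""
    per_slice = [glcm_stats(glcm(s, levels)) for s in slices]
    return {k: sum(f[k] for f in per_slice) / len(per_slice) for k in per_slice[0]}


# Toy tumor with two 2x2 slices and 2 gray levels.
slices = [[[0, 1], [1, 0]], [[1, 1], [1, 1]]]
print(tumor_glcm_features(slices, levels=2))
# {'contrast': 0.5, 'asm': 0.75, 'homogeneity': 0.75}
```

The checkerboard slice has high contrast and low homogeneity, the uniform slice the opposite; averaging yields one value per statistic for the whole tumor.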
Radiomic features were extracted from each axial CT slice of the tumor region and averaged to a single value for the entire tumor volume. All image analysis was performed in MATLAB R2015a (MathWorks, Natick, MA).
Statistical analysis
To investigate the relationship between PDAC imaging phenotypes and genetic phenotypes, radiomic features extracted from CT images were analyzed for significance with respect to four outcomes: overall survival (months), SMAD4 status, TP53 status, and number of altered genes. Survival months were analyzed as a continuous variable, whereas SMAD4 status, TP53 status, and number of altered genes were analyzed as categorical variables. To identify the best feature selection method for each model, multiple feature selection algorithms were compared, namely univariate analysis and fuzzy minimum-redundancy-maximum-relevance (fMRMR).
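The idea behind minimum-redundancy-maximum-relevance selection can be illustrated with a greedy, correlation-based variant. This is a simplification and not the fuzzy MRMR used in the study: here both relevance (to the outcome) and redundancy (to already selected features) are measured with absolute Pearson correlation instead of (fuzzy) mutual information, and the feature names and values in the usage example are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mrmr(features, target, k):
    """Greedy mRMR (difference form) with |Pearson r| for relevance and redundancy.

    features: dict mapping feature name -> list of values (one per patient).
    Returns up to k feature names in the order they were selected.
    """
    relevance = {f: abs(pearson(v, target)) for f, v in features.items()}
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(abs(pearson(features[f], features[s])) for s in selected)
            return relevance[f] - redundancy / len(selected)

        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# "a_dup" duplicates "a"; "c" is less relevant but carries new information.
features = {"a": [1, 1, -1, -1], "a_dup": [1, 1, -1, -1], "c": [1, -1, 1, -1]}
target = [3, 1, -1, -3]
print(mrmr(features, target, k=2))  # ['a', 'c']
```

A purely univariate ranking would pick "a" and its duplicate; the redundancy penalty is what steers the second pick toward "c".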
The first predictive algorithm used radiomic features to predict each patient's overall survival in months. Overall survival (OS) was defined as the time interval between the date of operation and the date of death or the date last known alive. Features significantly associated with survival months were selected by univariate analysis. A multivariate linear regression with the selected features was then performed to assess their efficacy in predicting overall survival months.
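A minimal sketch of this two-stage procedure follows. It assumes a simple absolute-correlation threshold as the univariate screen (the study's exact significance test and cutoff are not reproduced here) and ordinary least squares with an intercept, solved through the normal equations, for the multivariate step. The mean absolute error (MAE) and root mean squared error (RMSE) used to report performance are included; `min_abs_r` and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def univariate_select(features, target, min_abs_r):
    """Keep features whose |Pearson r| with the outcome reaches the cutoff."""
    return [f for f, v in features.items() if abs(pearson(v, target)) >= min_abs_r]


def fit_ols(X, y):
    """Least squares with intercept: solve (X^T X) b = X^T y by Gauss-Jordan elimination."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))  # partial pivoting
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear features?)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]  # [intercept, coef_1, ...]


def predict(coef, X):
    return [coef[0] + sum(c * x for c, x in zip(coef[1:], row)) for row in X]


def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def rmse(y, yhat):
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


features = {"f1": [0, 1, 2, 3], "noise": [1, -1, -1, 1]}
months = [2, 5, 8, 11]
kept = univariate_select(features, months, min_abs_r=0.5)
X = [[features[f][i] for f in kept] for i in range(len(months))]
coef = fit_ols(X, months)
pred = predict(coef, X)
print(kept, coef, mae(months, pred), rmse(months, pred))
```

Because RMSE squares each error before averaging, it is always at least as large as MAE and grows faster when a few predictions are far off; reporting both gives a sense of how concentrated the errors are.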
In addition, three classification algorithms were built using radiomic features to predict SMAD4 status and TP53 status (mutated vs. not mutated) and the number of altered genes (group 1: < 4 genes altered; group 2: ≥ 4 genes altered). For each outcome, significant radiomic features were selected using univariate feature selection, and a Naive Bayes model was then implemented to predict the categorical variable.
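The classification step can be sketched with a Gaussian Naive Bayes model, which assumes that each radiomic feature is normally distributed within a class and independent of the other features given the class. The Gaussian likelihood, the small variance floor, and the toy one-feature data are assumptions for illustration; the helper `gene_count_group` (a hypothetical name) simply applies the < 4 versus ≥ 4 altered-gene grouping described above.

```python
from collections import defaultdict
from math import log, pi


def gene_count_group(n_altered):
    """Group 1: fewer than 4 altered genes; group 2: 4 or more."""
    return 1 if n_altered < 4 else 2


class GaussianNB:
    """Naive Bayes with a per-class, per-feature normal likelihood."""

    def fit(self, X, y, var_floor=1e-9):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.params = {}
        for label, rows in by_class.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Floor the variance so a constant feature cannot divide by zero.
            variances = [max(sum((v - m) ** 2 for v in c) / len(c), var_floor)
                         for c, m in zip(cols, means)]
            self.params[label] = (log(len(rows) / len(y)), means, variances)
        return self

    def log_scores(self, row):
        """Unnormalized log posterior: log prior + sum of per-feature log-likelihoods."""
        return {
            label: log_prior + sum(-0.5 * log(2 * pi * v) - (x - m) ** 2 / (2 * v)
                                   for x, m, v in zip(row, means, variances))
            for label, (log_prior, means, variances) in self.params.items()
        }

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Toy data: one radiomic feature, six patients with 1-6 altered genes.
X = [[0.0], [0.2], [0.1], [5.0], [5.2], [4.9]]
groups = [gene_count_group(n) for n in [1, 2, 3, 4, 6, 5]]
model = GaussianNB().fit(X, groups)
print(model.predict([0.3]), model.predict([4.5]))  # 1 2
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied together, which matters once more than a handful of radiomic features are used.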
In addition, to identify subgroups of PDAC patients with enhanced response to chemotherapy, the imaging phenotypes of all pancreatic cancer samples were analyzed by unsupervised hierarchical clustering, using Euclidean distance matrices and the average linkage method, to explore the optimal number of radiomic subgroups. First, overall survival was compared across the four resulting subgroups to investigate inter-group differences in survival months. Clinical information for the samples in each subgroup was then statistically analyzed, including diabetes status and the status (mutated vs. not mutated) of three well-known PDAC driver genes. The chemotherapy regimens (i.e., FOLFIRINOX-based and gemcitabine-based) in the different subgroups were also analyzed to further characterize each subgroup.
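The clustering setup can be illustrated with a from-scratch implementation of agglomerative clustering with Euclidean distance and average linkage that stops when a chosen number of clusters remains; for average linkage this is equivalent to cutting the dendrogram into that many clusters. It is a naive, cubic-time sketch rather than the software used in the study, the toy points are hypothetical, and feature scaling (for example, z-scoring each radiomic feature before clustering) is left to the caller as an assumption, not a documented step.

```python
from itertools import combinations
from math import dist


def average_linkage(points, n_clusters):
    """Agglomerative clustering (Euclidean distance, average linkage).

    Merges the two closest clusters until n_clusters remain and returns the
    clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster point pairs.
            total = sum(dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(clusters)


# Toy 2D "radiomic profiles" for five patients.
pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
print(average_linkage(pts, n_clusters=3))  # [[0, 1], [2, 3], [4]]
```

Average linkage sits between single linkage (prone to chaining) and complete linkage (favoring compact, similarly sized clusters), which makes it a common default for exploratory subgrouping.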
Results
Patient population
During the study period, preoperative CT angiograms for image analysis, tumor sequencing data, and clinical data were collected from 172 patients after informed consent. In 55 patients, the tumors were too small to be reliably detected, so the final cohort consisted of 117 patients. The enrolled subjects were split into a training set (70% of the dataset), used for radiomic feature selection and model development, and a test set (the remaining cases), used for model validation. The demographic and clinical characteristics of these participants are summarized in Table 1.
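The 70/30 split can be sketched as follows. Whether the original split was stratified is not stated; stratifying by the outcome label is an assumption shown here because several outcomes (for example, SMAD4 status) are imbalanced, and a purely random split of a small cohort can leave very few minority-class cases in the test set. The function name, seed, and toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices into train/test sets class by class, preserving class proportions."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)  # local RNG so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        train += idx[:n_train]
        test += idx[n_train:]
    return sorted(train), sorted(test)


labels = [1] * 20 + [0] * 80  # toy labels: 20% minority class
train, test = stratified_split(labels, train_frac=0.7, seed=42)
print(len(train), len(test), sum(labels[i] for i in train))  # 70 30 14
```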
Table 1. Cohort Demographics
Radiogenomic analysis
The results of the genetic analysis are presented in Table 2. As the table shows, 91% (157/172) of tumor samples harbored mutations in KRAS, 20% (34/172) had mutations in SMAD4, and 72% (123/172) had alterations in TP53. Because nearly all tumors harbored KRAS alterations, KRAS was not pursued in further predictive analyses.
To investigate the chromosomal regions in which the mutated genes are located, we selected, from the 390 genes, those mutated in more than 0.6% (1/172) of samples, leaving 68 genes for analysis. As Figure 1 shows, most mutated genes were located on chromosomes 17, 7, and 9. This matches the results reported in Amundadottir's “The current landscape of inherited pancreatic cancer risk variants”.
Table 2. Genetic Analysis
Figure 1. Chromosomal distribution of mutated genes and the current landscape of inherited pancreatic cancer risk variants.
Correlation of survival prediction and radiomic features
The performance of the linear regression model with the best univariately selected features for overall survival months is presented in Figure 2. Using cross-validation for algorithm assessment on the testing set, the linear regression model achieved its best performance, with a mean absolute error (MAE) of 15.2 months and a root mean squared error (RMSE) of 19.6 months.
Figure 2. Correlation heatmap of highly correlated radiomic features and overall survival months.
Correlation of radiomic features and immunohistochemistry
In this study, we investigated whether radiomic features of CT image data can accurately predict SMAD4 and TP53 status. Of all patients included, 34 (20%) were found to have an alteration in SMAD4. A total of 255 radiomic features were extracted from CT scans, and univariate feature selection yielded 12 significant features. Using cross-validation for algorithm assessment on the testing set, the Naive Bayes model achieved its best performance with an AUC of 64%, accuracy of 81%, specificity of 84%, and sensitivity of 7%.
In our cohort, 123 patients (72%) had alterations in TP53. Univariate analysis identified 11 significant features out of the 255 texture features. With the best features selected by univariate feature selection, the TP53 classifier achieved an AUC of 59%, accuracy of 75%, specificity of 11%, and sensitivity of 96%. In terms of accuracy, the selected features performed generally well; however, performance was very sensitive to the size of the training data. Note that although the AUCs were above chance level, performance on the minority class was poor because of the imbalanced data: specificity was low for TP53 (11%), and sensitivity was low for SMAD4 (7%).
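Why accuracy can look reasonable while sensitivity or specificity collapses is easy to see with a small calculation. The sketch below computes accuracy, sensitivity, and specificity from a confusion matrix, and AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counted as one half). The usage example applies a trivial classifier that always predicts "not mutated" to the SMAD4 class proportions of the full cohort (34 of 172 altered); it is a hypothetical baseline, not one of the study's models.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall of class 1), and specificity (recall of class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


def auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1] * 34 + [0] * 138          # SMAD4 proportions in the full cohort
always_negative = [0] * 172            # hypothetical trivial baseline
print(confusion_metrics(y_true, always_negative))
# accuracy about 0.80, sensitivity 0.0, specificity 1.0
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The baseline reaches roughly 80% accuracy while detecting no altered tumors at all, which is why sensitivity, specificity, and AUC are reported alongside accuracy for the imbalanced SMAD4 and TP53 outcomes.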
Table 3. Performance of the three classification models for SMAD4 status, TP53 status, and number of mutated genes.
Correlation of radiomic features and the numbers of mutated genes
The results also showed a significant relationship between radiomic features and the number of mutated genes. First, the number of altered genes was converted into a binary variable (patients with fewer than 4 mutated genes formed group 1; those with 4 or more formed group 2). A Naive Bayes classifier was then used to predict the gene-count group, achieving an AUC of 70%, accuracy of 75%, specificity of 87%, and sensitivity of 54%.
Using unsupervised hierarchical clustering to explore the optimal radiomic subgroups in PDAC
Unsupervised hierarchical clustering analysis of 172 PDAC patients was performed with the 255 texture radiomic features. The PDAC samples were divided into 4 subgroups based on differences in radiomic features: subgroup 1 contained 55 samples, subgroup 2 contained 22 samples, subgroup 3 contained 36 samples, and subgroup 4 contained 8 samples.
Subgroup-specific clinical features were confirmed
To confirm that the 4 subgroups were distinct, overall survival, clinical manifestations, and types of neoadjuvant chemotherapy were analyzed. Radiomic features significantly associated with overall survival were selected by univariate analysis. After clustering on the 10 selected radiomic features, the 117 patients were divided into 4 subgroups based on differences in radiomic features. First, the average overall survival of the 4 subgroups was analyzed. As Figure 3 shows, subgroup 4 had a longer average overall survival than the others. The chemotherapy regimens in the 4 subgroups were then analyzed (Figure 4). FOLFIRINOX-based chemotherapy was the most prevalent regimen in most of the subgroups.
Moreover, the proportion of patients receiving FOLFIRINOX-based chemotherapy was higher in subgroups 1 and 3 than in subgroup 2. Although the figure showed no specific pattern, the radiomic texture features of subgroup 4 warrant further investigation, as they may indicate that tumors with a particular texture are associated with longer overall survival. To investigate the molecular differences among the four subgroups, therapeutic indicators of the samples were analyzed (Figure 5), including diabetes status and the status of well-known PDAC driver genes (SMAD4, TP53, KRAS). However, no significant difference among the four subgroups was found.
Figure 3. Survival months among 4 subgroups.
Figure 4. Comparison of clinical features among 4 subgroups. (a) Neoadjuvant chemotherapy regimen. (b) Diabetes status.
Figure 5. Molecular differences among 4 subgroups, including 3 well-known PDAC driver genes: SMAD4, TP53, and KRAS.
Discussion
Our study demonstrated that CT texture features can serve as imaging biomarkers for the preoperative prediction of PDAC overall survival, the status of PDAC driver genes (SMAD4, TP53), and the number of mutated genes. After feature selection, the resulting set of imaging features predicted PDAC patients’ overall survival months, with the linear regression model reaching an MAE of 15.2 months and an RMSE of 19.6 months. In addition, univariate analyses identified CT texture features that predicted the status of SMAD4 and TP53 and the number of mutated genes, with classification accuracies of 81%, 75%, and 75%, respectively. However, the imbalanced dataset substantially affected the sensitivity and specificity of the models. Nevertheless, our radiogenomic exploration points to an impact of genetic variation on PDAC patients and on pancreatic structure, including associations between pancreatic morphology and PDAC genetic phenotypes [13].
To identify subgroups of PDAC samples with enhanced chemotherapy response, unsupervised hierarchical clustering was applied to the selected radiomic features of 117 patients. First, to exclude features that add noise to the clustering model, univariate feature selection was applied to the radiomic dataset. After feature selection, 10 radiomic features strongly correlated with survival months were selected, all of which were local binary pattern (LBP) features. Second, to choose an appropriate number of clusters, dendrograms were plotted [dendrograms’ graph] to visualize the samples. For the four radiomic subgroups of PDAC identified in the present study, subgroup-specific clinical features were analyzed, including diabetes status, the mutation status of PDAC driver genes, and the neoadjuvant chemotherapy regimens. There were no statistically significant differences in these clinical features among the four subgroups. However, overall survival was relatively distinct across the subgroups, with large variance.
There were several limitations to our study. First, no control group of healthy pancreatic samples was included in this study; as a result, our classification models may be affected by imbalanced data. Second, although survival differed significantly among the four subgroups identified by unsupervised hierarchical clustering, no significant trend in histological markers or clinical manifestations (PDAC driver genes, diabetes status, neoadjuvant therapies) was observed. This may reflect the limited number of subjects in each subgroup.
References
1. Zehir A, et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat Med. 2017;23:703–13.
2. Gatenby RA, Grove O, Gillies RJ. Quantitative imaging in cancer evolution and ecology. Radiology. 2013;269:8–15.
3. Attiyeh MA, et al. Survival prediction in pancreatic ductal adenocarcinoma by quantitative computed tomography image analysis. Ann Surg Oncol. 2018;25:1034–42.
4. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2018. CA Cancer J Clin. 2018;68:7–30. https://doi.org/10.3322/caac.21442
5. Hong S-M, Park JY, Hruban RH, Goggins M. Molecular signatures of pancreatic cancer. Arch Pathol Lab Med. 2011;135:716–27. https://doi.org/10.1043/2010-0566-RA.1
6. Hidalgo M. Pancreatic cancer. N Engl J Med. 2010;362:1605–17.
7. Zhang Q, Zeng L, Chen Y, Lian G, Qian C, Chen S, Li J, Huang K. Pancreatic cancer epidemiology, detection, and management. Gastroenterol Res Pract. 2016;2016:8962321.
8. Dimastromatteo J, Houghton JL, Lewis JS, Kelly KA. Challenges of pancreatic cancer. Cancer J. 2015;21:188–93.
9. Conroy T, Bachet JB, Ayav A, Huguet F, Lambert A, Caramella C, Marechal R, Van Laethem JL, Ducreux M. Current standards and new innovative approaches for treatment of pancreatic cancer. Eur J Cancer. 2016;57:10–22.
10. Smith JP, Whitcomb DC, Matters GL, Brand RE, Liao J, Huang YJ, Frazier ML. Distribution of cholecystokinin-B receptor genotype between patients with pancreatic cancer and controls and its impact on survival. Pancreas. 2015;44(2):236–42. https://doi.org/10.1097/MPA.0000000000000263
11. Waddell N, Pajic M, Patch AM, Chang DK, Kassahn KS, Bailey P, Johns AL, Miller D, Nones K, Quek K, Quinn MC, Robertson AJ, Fadlullah MZ, Bruxner TJ, Christ AN, Harliwong I, Idrisoglu S, Manning S, Nourse C, Nourbakhsh E, ... Grimmond SM. Whole genomes redefine the mutational landscape of pancreatic cancer. Nature. 2015;518(7540):495–501. https://doi.org/10.1038/nature14169
13. Almeida PP, Cardoso CP, de Freitas LM. PDAC-ANN: an artificial neural network to predict pancreatic ductal adenocarcinoma based on gene expression. BMC Cancer. 2020;20:82. https://doi.org/10.1186/s12885-020-6533-0
14. Eresen A, Yang J, Shangguan J, et al. MRI radiomics for early prediction of response to vaccine therapy in a transgenic mouse model of pancreatic ductal adenocarcinoma. J Transl Med. 2020;18:61. https://doi.org/10.1186/s12967-020-02246-7
15. Li K, Yao Q, Xiao J, et al. Contrast-enhanced CT radiomics for predicting lymph node metastasis in pancreatic ductal adenocarcinoma: a pilot study. Cancer Imaging. 2020;20:12. https://doi.org/10.1186/s40644-020-0288-3
16. Holmes L, LaHurd A, Wasson E, McClarin L, Dabney K. Racial and ethnic heterogeneity in the association between total cholesterol and pediatric obesity. Int J Environ Res Public Health. 2016;13(1):19. https://www.mdpi.com/1660-4601/13/1/19
Cheng DT, Mitchell TN, Zehir A, Shah RH, Benayed R, Syed A, Chandramohan R, Liu ZY, Won HH, Scott SN, Brannon AR, O’Reilly C, Sadowska J, Casanova J, Yannes A, Hechtman JF, Yao J, Song W, Ross DS, Oultache A, Dogan S, Borsu L, Hameed M, Nafa K, Arcila ME, Ladanyi M, Berger MF. Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): a hybridization capture-based next-generation sequencing clinical assay for solid tumor molecular oncology. J Mol Diagn. 2015;17(3):251–64. https://doi.org/10.1016/j.jmoldx.2014.12.006
Shen R, Seshan VE. FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucleic Acids Res. 2016;44(16):e131. https://doi.org/10.1093/nar/gkw520
Haralick RM, Shanmugam K, Dinstein I. Textural features for image classification. IEEE Trans Syst Man Cybern. 1973;SMC-3(6):610–21.
Tang X. Texture information in run-length matrices. IEEE Trans Image Process. 1998;7(11):1602–9. https://doi.org/10.1109/83.725367
Buczkowski S, Hildgen P, Cartilier L. Measurements of fractal dimension by box-counting: a critical analysis of data scatter. Physica A. 1998;252(1–2):23–34. https://doi.org/10.1016/S0378-4371(97)00581-5
Chakraborty J, Rangayyan RM, Banik S, Mukhopadhyay S, Desautels JEL. Statistical measures of orientation of texture for the detection of architectural distortion in prior mammograms of interval-cancer. J Electron Imaging. 2012;21(3):12.
Ojala T, Pietikäinen M, Harwood D. A comparative study of texture measures with classification based on feature distributions. Pattern Recognit. 1996;29(1):51–9. https://doi.org/10.1016/0031-3203(95)00067-4