The quantile normalization method was used to normalize expression values at the probe level. We then computed the Pearson correlation coefficient (PCC) based on the normalized expression values. Finally, following a previous study, we mapped the PCC values of all protein-protein pairs encoded by genes in the above microarray gene expression data set to the abovementioned PIN to build the CePIN.

Somatic mutations of the cancer cell lines

We downloaded the somatic mutations of 1,651 genes across approximately 1,000 cancer cell lines from the CCLE database web site. All mutations were determined through targeted, massively parallel sequencing, as described in a previous study.

Drug pharmacological data

We downloaded drug pharmacological data from two previous studies. First, Barretina et al.
tested the pharmacological profiles of 24 anticancer drugs across 504 cell lines. Second, Garnett et al. assayed 48,178 drug-cell line combinations, covering 130 anticancer drugs with 275 to 507 cell lines per drug. The pharmacological data across cell lines, based on the half maximal inhibitory concentration (IC50), were converted to natural log values. In addition, we compiled 458 genes from a previous study that respond with sensitivity or resistance to 130 anticancer drugs.

Inferring putative cancer genes

We wrote a computer program to analyze all the pocket mutations and to obtain the number of missense mutations inside each pocket region of each protein. The script also calculates the number of missense mutations outside of the pocket region of each protein by subtracting the pocket mutations from the somatic mutation dataset.
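The counting step described above can be sketched as follows. This is a minimal illustration in Python, not the R script from Additional file 2. The record layout (one `(protein, residue_position)` tuple per missense mutation), the `pockets` mapping, and the function name `count_pocket_mutations` are all assumptions made for the example.

```python
from collections import Counter


def count_pocket_mutations(mutations, pockets):
    """Count missense mutations inside and outside pocket regions.

    mutations: iterable of (protein, residue_position) tuples, one per
               missense mutation (hypothetical layout).
    pockets:   dict mapping protein -> set of pocket residue positions
               (hypothetical layout).
    Returns a dict: protein -> (n_in_pocket, n_outside_pocket).
    """
    total = Counter()
    in_pocket = Counter()
    for protein, position in mutations:
        total[protein] += 1
        if position in pockets.get(protein, set()):
            in_pocket[protein] += 1
    # Non-pocket count = all missense mutations minus pocket mutations.
    return {p: (in_pocket[p], total[p] - in_pocket[p]) for p in total}


# Toy input: protein P1 has a pocket at residues 10-12, P2 has no pocket.
example_mutations = [("P1", 10), ("P1", 12), ("P1", 200), ("P2", 5)]
example_pockets = {"P1": {10, 11, 12}}
counts = count_pocket_mutations(example_mutations, example_pockets)
```

Deriving the non-pocket count by subtraction mirrors the description in the text and ensures that the two counts always sum to the total number of missense mutations for each protein.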
This R script is provided in Additional file 2. In this study, the null hypothesis is that there is no significant association between the two categorical variables. The alternative hypothesis of our computational approach is that if a gene has more somatic mutations in its protein pocket region than in its non-pocket region, this gene is more likely to be cancer related. We defined the background mutation count as the total number of missense mutations in the non-pocket regions of all proteins. Then, we performed Fisher's exact test, based on the numbers in a 2 × 2 contingency table, for each protein. To identify the proteins that were significantly enriched with missense mutations in pocket regions relative to chance, we required that the proteins have an adjusted P value of less than 0.1 after applying the Benjamini-Hochberg correction for multiple testing. We performed the abovementioned Fisher's exact test for each protein harboring pocket mutations across all cancer types and again in each of the top 10 cancer types ranked by the number of somatic mutations in the pocket regions. All statistical analyses were performed using the R platform.
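The enrichment test and the multiple-testing correction can be illustrated with a short sketch. It is written in Python with the standard library only, purely for illustration, whereas the study used R. The layout of the 2 × 2 table is an assumption, because the text does not spell it out: here `a` and `b` are the pocket and non-pocket missense counts for one protein, and `c` and `d` are the corresponding background counts. The function names and the toy counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on the table [[a, b], [c, d]].

    Returns the probability of seeing `a` or more in the top-left cell
    under the hypergeometric null with all margins fixed.
    """
    row1 = a + b          # mutations in the protein of interest
    col1 = a + c          # pocket mutations overall
    n = a + b + c + d     # all mutations in the table
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, row1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy tables (a, b, c, d); the counts are invented for illustration.
tables = {"P1": (8, 2, 20, 70), "P2": (1, 9, 27, 63)}
pvals = [fisher_exact_greater(*t) for t in tables.values()]
adj = benjamini_hochberg(pvals)
# Keep proteins with an adjusted P value below the 0.1 threshold.
significant = [name for name, q in zip(tables, adj) if q < 0.1]
```

The one-sided alternative matches the stated hypothesis of more mutations in the pocket region. In R, `fisher.test(..., alternative = "greater")` and `p.adjust(..., method = "BH")` provide the same two operations.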