Title: | Differential Coexpressed Networks |
---|---|
Description: | Estimation of DIFferential COexpressed NETworks using diverse and user-defined metrics. This package is used for three tasks related to the estimation of differential coexpression. First, to estimate differential coexpression, where coexpression is estimated by Spearman correlation by default. For this, a metric to compare two correlation distributions is needed. The package includes 6 metrics, some of which need a threshold. A new metric can also be specified as a user function with specific parameters (see difconet.run). Significance is estimated by permutations. Second, to generate datasets with controlled differential correlation, either by adding noise or by adding a specific correlation structure. Third, to show the results of differential correlation analyses. Please see <http://bioinformatica.mty.itesm.mx/difconet> for further information. |
Authors: | Elpidio-Emmanuel Gonzalez-Valbuena [aut], Victor Trevino [aut, cre] |
Maintainer: | Victor Trevino <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0-4 |
Built: | 2024-10-27 05:34:53 UTC |
Source: | https://github.com/cran/difconet |
This function takes a normal dataset and generates simulated tumor stages by adding progressive levels of noise. It may also add artificial networks of genes connected at given correlations, whose level of correlation can progressively increase or decrease across stages.
difconet.build.controlled.dataset(data, noise.genes = round(nrow(data)*0.1), noise.sigma = c(0.0, 0.1, 0.2), nonoise.sigma = c(0.0, 0.01, 0.01), netcov = matrix(c( 0.90, 0.90, 0.75, 0.75, 0.60, 0.60, 0.45, 0.45, 0.30, 0.30, 0.15, 0.15, 0.30, 0.30, 0.45, 0.45, 0.60, 0.60, 0.75, 0.75, 0.95, 0.95, 0.80, 0.80, 0.65, 0.65, 0.50, 0.50, 0.35, 0.35, 0.10, 0.10, 0.25, 0.25, 0.40, 0.40, 0.55, 0.55, 0.70, 0.70, 1.00, 1.00, 0.85, 0.85, 0.70, 0.70, 0.55, 0.55, 0.40, 0.40, 0.05, 0.05, 0.20, 0.20, 0.35, 0.35, 0.50, 0.50, 0.65, 0.65 ), ncol=3), genes.nets = 10, corfunc=function(a,b) cor(a,b,method="spearman"), verbose = TRUE)
data |
data.frame or matrix representing the normal dataset. Rows are genes and columns are samples. |
noise.genes |
The number of genes (rows) of data to which noise will be added. |
noise.sigma |
Levels of Gaussian noise (zero mean) to be added to the selected genes, applied cumulatively across stages. |
nonoise.sigma |
Levels of Gaussian noise (zero mean) to be added to the remaining genes. |
netcov |
numeric matrix of correlation levels for networks, rows represent networks and columns represent stages. |
genes.nets |
The number of genes in each generated network. |
corfunc |
Function used to estimate correlations (Spearman by default). |
verbose |
Print progress. |
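The cumulative noise scheme governed by noise.genes, noise.sigma and nonoise.sigma can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name simulate_stages is hypothetical, the noise levels are assumed to be standard deviations, and noise.sigma and nonoise.sigma are assumed to have one entry per stage.

```r
# Hypothetical helper (not part of difconet): cumulative Gaussian noise per stage.
# Assumes noise.sigma / nonoise.sigma are standard deviations, one per stage (N, T1, ...).
simulate_stages <- function(data, noise.genes = round(nrow(data) * 0.1),
                            noise.sigma = c(0.0, 0.1, 0.2),
                            nonoise.sigma = c(0.0, 0.01, 0.01)) {
  data <- as.matrix(data)
  stopifnot(length(noise.sigma) == length(nonoise.sigma))
  noised <- sample.int(nrow(data), noise.genes)     # rows receiving the larger noise
  others <- setdiff(seq_len(nrow(data)), noised)    # rows receiving the small noise
  stages <- vector("list", length(noise.sigma))
  prev <- data
  for (s in seq_along(noise.sigma)) {
    cur <- prev                                     # each stage builds on the previous one
    cur[noised, ] <- cur[noised, ] +
      rnorm(length(noised) * ncol(cur), mean = 0, sd = noise.sigma[s])
    cur[others, ] <- cur[others, ] +
      rnorm(length(others) * ncol(cur), mean = 0, sd = nonoise.sigma[s])
    stages[[s]] <- cur
    prev <- cur
  }
  names(stages) <- c("N", paste0("T", seq_len(length(noise.sigma) - 1)))
  stages
}
```

Under these assumptions, with the default noise.sigma = c(0.0, 0.1, 0.2) the noise accumulated on a selected gene by stage T2 has variance 0.0^2 + 0.1^2 + 0.2^2 = 0.05, i.e. a standard deviation of about 0.22.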
This function generates a simulated tumor progression dataset based on normal data. The progression is done by stages, and the number of stages is given by the length of noise.sigma. Each stage has the same dimensions as data (plus the networks). The stages are N, T1, T2, and so on. N is meant to be the data itself with no noise but, for generality, the first element of noise.sigma specifies the level of noise for N (default 0). The next values of noise.sigma are used to generate T1, T2, and so on. Thus, the returned data are obtained as N = data + noise at level noise.sigma[1], T1 = N + noise at level noise.sigma[2], T2 = T1 + noise at level noise.sigma[3], and so on. Note that noise.sigma is applied only to the number of rows given by noise.genes. On top of that, nonoise.sigma specifies the level of noise added to the genes not selected to be noised. These levels are meant to be lower than noise.sigma, so that the data in a stage are not just a copy of the previous stage. The function also adds fully connected networks of genes whose correlations follow the netcov levels; the added data have mean 0 and sd 1. In netcov, rows represent the networks added and columns represent the stages. The value returned is a list of the generated matrices.
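One common way to build a fully connected network whose genes have mean 0, sd 1 and a common pairwise correlation r is a shared latent factor: each gene is sqrt(r) * z + sqrt(1 - r) * e, with z shared and e gene-specific. The sketch below uses that construction as an assumption for illustration; it is not necessarily how difconet.build.controlled.dataset generates its networks, and make_network and build_networks are hypothetical names. The construction targets Pearson correlation; for Gaussian data the Spearman coefficient (the default corfunc) is slightly lower, about (6/pi) * asin(r/2).

```r
# Hypothetical helper (not part of difconet): one fully connected network of
# n.genes genes over n.samples samples with common pairwise correlation r.
make_network <- function(n.genes, n.samples, r) {
  stopifnot(r >= 0, r <= 1)
  z <- rnorm(n.samples)                                  # signal shared by all genes
  shared <- matrix(z, n.genes, n.samples, byrow = TRUE)  # same z in every row
  e <- matrix(rnorm(n.genes * n.samples), n.genes)       # gene-specific noise
  sqrt(r) * shared + sqrt(1 - r) * e                     # variance r + (1 - r) = 1
}

# Hypothetical: one network per row of netcov, one stacked matrix per stage (column).
build_networks <- function(netcov, genes.nets, n.samples) {
  lapply(seq_len(ncol(netcov)), function(s) {
    do.call(rbind, lapply(seq_len(nrow(netcov)), function(i)
      make_network(genes.nets, n.samples, netcov[i, s])))
  })
}
```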
List of stages.
Elpidio Gonzalez and Victor Trevino [email protected]
Gonzalez-Valbuena and Trevino 2017 Metrics to Estimate Differential Co-Expression Networks Journal Pending volume 00–10
See also: difconet.noise.inspection, difconet.run.
## Not run:
difconet.noise.inspection(normaldata, tumordata, sigma=0:15/10)
## End(Not run)
Plots the estimated correlation distribution of a normal dataset after adding different levels of Gaussian noise. It is used to estimate the level of noise that needs to be added to a normal dataset to match the correlation distribution of a tumor dataset. This assumes that the correlation distribution of the tumor dataset is more sharply concentrated around zero.
difconet.noise.inspection(ndata, tdata, sigma=c(0.5, 0.75, 1.25), maxgenes=5000, corfunc=function(a,b) cor(a,b,method="spearman"))
ndata |
The normal dataset. Rows are genes and columns are samples. |
tdata |
The tumor dataset. Rows are genes and columns are samples. Rows of tumor and normal datasets should be the same. |
sigma |
Levels of Gaussian noise (zero mean) to be added to the normal dataset. |
maxgenes |
Number of genes used to estimate the correlation distribution. If the normal/tumor datasets have more rows than maxgenes, a random subset of maxgenes genes is used. |
corfunc |
Function used to estimate correlations (Spearman by default). |
Plots the estimated density of correlation distributions of normal, tumor, and normal after adding sigma levels of noise.
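The comparison described above can be sketched in base R. This is a hedged illustration, not the package code: inspect_noise is a hypothetical name, and it returns the gene-gene correlation values (upper triangle only) so that the densities can be compared or plotted.

```r
# Hypothetical sketch (not part of difconet) of the noise inspection idea.
inspect_noise <- function(ndata, tdata, sigma = c(0.5, 0.75, 1.25), maxgenes = 5000,
                          corfunc = function(a, b) cor(a, b, method = "spearman"),
                          plot = TRUE) {
  ndata <- as.matrix(ndata)
  tdata <- as.matrix(tdata)
  stopifnot(nrow(ndata) == nrow(tdata))
  # Random subset of genes when there are more rows than maxgenes
  idx <- if (nrow(ndata) > maxgenes) sample.int(nrow(ndata), maxgenes) else seq_len(nrow(ndata))
  # Gene-gene correlations; upper triangle skips self-pairs and duplicates
  gene_cors <- function(m) {
    g <- t(m[idx, , drop = FALSE])              # samples x genes
    cc <- corfunc(g, g)
    cc[upper.tri(cc)]
  }
  noisy <- lapply(sigma, function(s) gene_cors(ndata + rnorm(length(ndata), 0, s)))
  names(noisy) <- paste0("sigma=", sigma)
  cors <- c(list(normal = gene_cors(ndata), tumor = gene_cors(tdata)), noisy)
  if (plot) {
    dens <- lapply(cors, density)
    plot(dens$normal, main = "Correlation densities", xlab = "correlation",
         ylim = c(0, max(sapply(dens, function(d) max(d$y)))))
    for (i in seq_along(dens)[-1]) lines(dens[[i]], col = i)
    legend("topright", legend = names(dens), col = seq_along(dens), lty = 1)
  }
  invisible(cors)
}
```

The sigma level whose density best overlaps the tumor density is the candidate noise level.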
Nothing.
Elpidio Gonzalez and Victor Trevino [email protected]
Gonzalez-Valbuena and Trevino 2017 Metrics to Estimate Differential Co-Expression Networks Journal Pending volume 00–10
See also: difconet.build.controlled.dataset, difconet.run.
## Not run:
difconet.noise.inspection(normaldata, tumordata, sigma=0:15/10)
## End(Not run)
Draws density or scatter plots of the correlations of a specific gene across stages.
difconet.plot.gene.correlations(dObj, gene, stages=1:length(dObj$stages.data), type=c("density","scatter")[1], main=rownames(dObj$stages.data[[1]])[gene], legends=TRUE, plot=TRUE, ... )
dObj |
The difconet object. |
gene |
Numeric or character. The gene index/rowname whose correlations will be drawn. |
stages |
Numeric or character. The stages to be included. If type="scatter" and more than two stages are selected, pairs is called instead of plot. |
type |
Character. The type of plot: "density" or "scatter". |
main |
Character. The main title passed to plot. |
legends |
Logical. Specifies whether the legends are drawn when type="density". |
plot |
Logical. Whether the plots are actually drawn; set to FALSE to obtain the correlations without plotting. |
... |
Further parameters passed to plot/pairs. |
For each selected stage, the correlations of gene are extracted and drawn either as one density curve per stage (type="density") or as scatter plots comparing stages (type="scatter").
The correlations of the gene across stages (invisible).
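A minimal base R sketch of the values involved, assuming the gene's correlations in each stage are those with the remaining genes of that stage's data. gene_correlations is a hypothetical helper operating on a plain list of stage matrices (shaped like dObj$stages.data), not the package's implementation.

```r
# Hypothetical sketch (not part of difconet): correlations of one gene with the
# remaining genes, computed separately in each stage matrix (genes x samples).
gene_correlations <- function(stages.data, gene,
                              corfunc = function(a, b) cor(a, b, method = "spearman")) {
  lapply(stages.data, function(m) {
    m <- as.matrix(m)
    g <- if (is.character(gene)) match(gene, rownames(m)) else gene
    r <- as.vector(corfunc(t(m), m[g, ]))   # every gene vs. the chosen gene
    names(r) <- rownames(m)
    r[-g]                                   # drop the self-correlation
  })
}

# Usage idea: one density per stage, or pairs() for a scatter view.
# cg <- gene_correlations(stages, 3)
# d <- lapply(cg, density); plot(d[[1]]); for (i in seq_along(d)[-1]) lines(d[[i]], col = i)
# pairs(do.call(cbind, cg))
```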
Elpidio Gonzalez and Victor Trevino [email protected]
Gonzalez-Valbuena and Trevino 2017 Metrics to Estimate Differential Co-Expression Networks Journal Pending volume 00–10
xdata <- matrix(rnorm(1000), ncol=100)
xpredictor <- sample(c("A","B","C","D"), 100, replace=TRUE)
dObj <- difconet.run(xdata, xpredictor, metric = 4, num_perms = 10,
                     comparisons = list(c("A","D"), c("A","B"), c("B","D")), perm_mode = "columns")
# Highest metric in the first comparison, showing correlations in only 3 stages
difconet.plot.gene.correlations(dObj, order(dObj$combstats[[1]][,"M4.dist"], decreasing=TRUE)[1], type="s", stages=1:3)
# Lowest metric in the second comparison, showing all stages
difconet.plot.gene.correlations(dObj, order(dObj$combstats[[2]][,"M4.dist"], decreasing=FALSE)[1], type="d")
# Another specific gene (3), showing densities of correlations
difconet.plot.gene.correlations(dObj, 3, type="d")
Draws a heatmap whose rows are genes and whose columns are segments (bins) of the histogram of each gene's correlation distribution. The height (density) of each histogram bin is shown in color.
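The matrix behind such a heatmap can be sketched in base R: one row per gene and one column per histogram bin of that gene's correlations. This is a hypothetical illustration for a single stage; it colors raw bin densities and does not reproduce the qprobs color mapping of difconet.plot.histograms.heatmap2. correlation_histogram_matrix is a hypothetical name.

```r
# Hypothetical sketch (not part of difconet): per-gene histogram densities of
# correlations with the other genes, for one stage matrix (genes x samples).
correlation_histogram_matrix <- function(m, genes, breaks = seq(-1, 1, length.out = 21),
                                         corfunc = function(a, b) cor(a, b, method = "spearman")) {
  m <- as.matrix(m)
  cc <- corfunc(t(m), t(m))                     # gene x gene correlations
  t(sapply(genes, function(g)
    hist(cc[g, -g], breaks = breaks, plot = FALSE)$density))
}

# Usage idea (bin midpoints on x, genes on y):
# br <- seq(-1, 1, length.out = 21)
# H <- correlation_histogram_matrix(m, genes = 1:10, breaks = br)
# image(x = head(br, -1) + diff(br) / 2, y = seq_len(nrow(H)), z = t(H))
```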
difconet.plot.histograms.heatmap2(dObj, genes=1:10, stages=1:length(dObj$stages.data), qprobs=c(0,.50,.975,.995), ...)
dObj |
The difconet object. |
genes |
Numeric or character. The gene indexes/rownames included. |
stages |
Numeric or character. The stages to be included. |
qprobs |
The quantiles used to draw the heatmap. Must contain 4 values, each associated with a specific color. |
... |
Further parameters passed to plot/pairs. |
A heatmap is drawn representing the distribution of correlations of several genes across stages.
Nothing.
Elpidio Gonzalez and Victor Trevino [email protected]
Gonzalez-Valbuena and Trevino 2017 Metrics to Estimate Differential Co-Expression Networks Journal Pending volume 00–10
xdata <- matrix(rnorm(1000), ncol=100)
xpredictor <- sample(c("A","B","C","D"), 100, replace=TRUE)
dObj <- difconet.run(xdata, xpredictor, metric = 4, num_perms = 10,
                     comparisons = list(c("A","D"), c("A","B"), c("B","D")), perm_mode = "columns")
# Highest metric in the first comparison, showing correlations in only 3 stages
difconet.plot.gene.correlations(dObj, order(dObj$combstats[[1]][,"M4.dist"], decreasing=TRUE)[1], type="s", stages=1:3)
# Lowest metric in the second comparison, showing all stages
difconet.plot.gene.correlations(dObj, order(dObj$combstats[[2]][,"M4.dist"], decreasing=FALSE)[1], type="d")
# Another specific gene (1), showing densities of correlations
difconet.plot.gene.correlations(dObj, 1, type="d")
# Heatmap of the correlation histograms of the 10 genes across all stages
difconet.plot.histograms.heatmap2(dObj, genes=1:10)
Runs the DIFferential COexpressed NETworks analysis on a given dataset.
difconet.run(data, predictor, metric=c(1,2,3,4,5,6), cutoff=0.3, blocs=5000, num_perms=10, comparisons="all", perm_mode="columns", use_all_perm = TRUE, save_perm=FALSE, speedup=0, verbose=TRUE, metricfunc=NULL, corfunc=function(a,b) cor(a,b,method="spearman") )
data |
data.frame or matrix representing the dataset, with genes in rows and samples in columns. |
predictor |
Factor or numeric vector representing the classes of each column in data. The correlations will be estimated for each class separately. |
metric |
The metrics to be calculated. Valid values are 1 to 6 and 8: metrics 1 to 6 are built in, and 8 denotes a user-defined metric given in metricfunc. |
cutoff |
Cutoff value(s) used by metrics 1 and/or 3. |
blocs |
Number of rows per block. To limit memory use, correlations are estimated by blocks of genes, and this value sets the block size. Larger values require more memory, whereas smaller values require more iterations and are therefore slower, but keep the computation feasible for a given dataset size and available memory. |
num_perms |
Number of permutations. |
comparisons |
Character or list. If character, it could be "all" to specify all possible combinations of classes. If set to "seq", classes are taken in order and comparisons are done by first versus second, second versus third, and so on. If this is a list containing vectors of two elements, the estimations are done for the specific comparisons included (numeric or character). |
perm_mode |
Character. Determines how the permuted data are generated: "columns" permutes columns, "rows" permutes rows across all classes/stages, "rows.class" permutes rows within each class separately, and "all" shuffles all the data. |
use_all_perm |
Logical. If TRUE, the permuted values of all rows are used to estimate the p-value; otherwise only the permutations of the same row are used, which requires many more permutations. |
save_perm |
Logical. If TRUE, all permuted data are saved, which may require more memory. |
speedup |
Numeric. Experimental option to speed up the calculation. The value specifies which metric is used for the speedup, which works by modeling the dependency between that metric and the p-value using 1 percent of the rows. |
verbose |
Logical. Whether to print progress information. |
metricfunc |
Function. Specifies the function used when metric 8 is included. The function should receive dObj, a, and b, which correspond to the difconet object and the two vectors of correlations needed to estimate the value of the metric. The result is assumed to be a distance-like (non-negative) measure in which values close to 0 mean no difference and larger values represent more dissimilar correlations. |
corfunc |
Function. Specifies the function that estimates the correlations, similar to cor. The default uses cor with Spearman coefficients. |
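A user-defined metric only has to follow the metricfunc contract above: accept dObj, a and b, and return a non-negative value that is 0 when there is no difference. The sketch below uses the two-sample Kolmogorov-Smirnov statistic purely as an example (ks_metric is a hypothetical name, and this choice may or may not coincide with one of the built-in metrics).

```r
# Hypothetical user-defined metric for metric = 8 (not part of difconet).
# a, b: correlation vectors of one gene in the two compared classes.
ks_metric <- function(dObj, a, b) {
  # dObj is accepted to match the metricfunc contract but is not used here
  d <- suppressWarnings(ks.test(a, b)$statistic)  # warnings arise from tied values
  unname(as.numeric(d))                           # in [0, 1]; 0 = identical distributions
}

# Usage (requires difconet; not run here):
# dObj <- difconet.run(xdata, xpredictor, metric = c(4, 8), metricfunc = ks_metric)
```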
Runs the whole process of estimating differences in correlations for a given dataset. The estimations are done for all metrics and all cutoff values, across all comparisons.
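The overall idea for a single comparison can be sketched in base R: compute a per-gene metric between the correlation vectors of two classes, build a null distribution by shuffling the class labels of the columns, and convert to p- and q-values. Several pieces are assumptions rather than the package's method: difcor_sketch is a hypothetical name, the metric (absolute difference of mean correlations) is a stand-in and not one of the six built-in metrics, "columns" permutation is assumed to mean shuffling sample labels, pooling permuted values across genes is assumed to mirror use_all_perm = TRUE, and the Benjamini-Hochberg adjustment is assumed for q-values.

```r
# Hypothetical sketch of one comparison (not the difconet implementation).
difcor_sketch <- function(data, predictor, A, B, num_perms = 10,
                          metric = function(a, b) abs(mean(a) - mean(b)),
                          corfunc = function(a, b) cor(a, b, method = "spearman")) {
  data <- as.matrix(data)
  # Metric for every gene given a vector of class labels
  per_gene <- function(labels) {
    xa <- t(data[, labels == A, drop = FALSE])     # samples of class A x genes
    xb <- t(data[, labels == B, drop = FALSE])     # samples of class B x genes
    ca <- corfunc(xa, xa)
    cb <- corfunc(xb, xb)
    vapply(seq_len(nrow(data)),
           function(g) metric(ca[g, -g], cb[g, -g]), numeric(1))
  }
  observed <- per_gene(predictor)
  # Shuffle sample labels; pool permuted values of all genes into one null
  null <- unlist(lapply(seq_len(num_perms), function(i) per_gene(sample(predictor))))
  p <- vapply(observed, function(o) (sum(null >= o) + 1) / (length(null) + 1), numeric(1))
  data.frame(dist = observed, p = p, q = p.adjust(p, method = "BH"),
             row.names = rownames(data))
}
```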
A difconet object represented as a list with the following items:
stage |
Vector. A copy of predictor (classes). |
labels |
Vector. The levels or values of the different classes. |
comparisons |
The specified comparisons parameter. |
num_perms |
The specified number of permutations (num_perms parameter). |
perm_mode |
The specified perm_mode parameter. |
use_all_perm |
The specified use_all_perm parameter. |
speedup |
The specified speedup parameter. |
verbose |
The specified verbose parameter. |
metricfunc |
The specified metricfunc parameter. |
combinations |
A data.frame of the combinations that were compared. |
stages.data |
A list of datasets: the original data split by class. |
combstats |
A list of all comparisons made. Each element contains a matrix whose rows represent the genes and columns represent the results of all metrics (metric.dist : metric value, metric.p : p-value, metric.q : q-value, metric.expr.p : p-value of differential expression for comparison purposes, metric.expr.q : q-value of differential expression.) |
combdens |
A list of the densities of the metric for observed data and permutations. This can be used to compare the estimated metric statistics. |
permutations |
List. If save_perm is TRUE, all permuted data are saved here. |
Elpidio Gonzalez and Victor Trevino [email protected]
Gonzalez-Valbuena and Trevino 2017 Metrics to Estimate Differential Co-Expression Networks Journal Pending volume 00–10
See also: difconet.build.controlled.dataset.
xdata <- matrix(rnorm(1000), ncol=100)
xpredictor <- sample(c("A","B","C","D"), 100, replace=TRUE)
dObj <- difconet.run(xdata, xpredictor, metric = 4, num_perms = 10,
                     comparisons = list(c("A","D"), c("A","B"), c("B","D")), perm_mode = "columns")
## Not run:
# xpredictor contains A, B, C, and D.
# xdata contains the data matrix.
dObj <- difconet.run(xdata, xpredictor, metric = c(1,2,4), cutoff = 0.6, blocs = 7000, num_perms = 10,
                     comparisons = list(c("A","D"), c("A","B"), c("B","D")), perm_mode = "columns")
## End(Not run)