API Reference
Core Functions
lmFit(object, design=None, **kwargs)
Linear model fitting function, the first step in the limma analysis pipeline.
Parameters:
- object: Gene expression data matrix with shape (genes × samples); can be a NumPy array or a pandas DataFrame
- design: Design matrix with shape (samples × coefficients); an intercept-only model is created if None
- ndups: Number of duplicates for each gene (used when the same gene is represented by multiple probes)
- spacing: Spacing between duplicates
- weights: Observation weights; can be a vector or a matrix matching the data dimensions
- method: Fitting method; currently only "ls" (least squares) is supported
Returns:
Dictionary containing linear model fitting results:
- coefficients: Coefficient matrix (genes × coefficients)
- stdev_unscaled: Unscaled standard errors matrix
- sigma: Residual standard deviations
- df_residual: Residual degrees of freedom
- cov_coefficients: Coefficient covariance matrix
- rank: Rank of the design matrix
- Amean: Average expression levels across samples
- design: Design matrix used for fitting
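As a rough illustration of what the returned fields represent, the sketch below fits an ordinary least-squares model to every gene using NumPy alone. It does not call lmFit: the data, the two-group design, and all variable names are hypothetical, and the library's actual implementation may differ (for example in how it handles weights, duplicates, or missing values).

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_samples = 5, 6

# Hypothetical inputs: expression matrix (genes x samples) and a design
# with an intercept column plus a 0/1 group indicator.
expr = rng.normal(size=(n_genes, n_samples))
design = np.column_stack([np.ones(n_samples), [0, 0, 0, 1, 1, 1]])

# Solve all per-gene least-squares problems at once: design @ beta ~= expr.T
beta, _, rank, _ = np.linalg.lstsq(design, expr.T, rcond=None)
coefficients = beta.T  # genes x coefficients

# Residual degrees of freedom and per-gene residual standard deviation.
df_residual = n_samples - rank
residuals = expr - coefficients @ design.T
sigma = np.sqrt((residuals ** 2).sum(axis=1) / df_residual)

# Unscaled covariance (X'X)^-1; the square roots of its diagonal are the
# unscaled standard errors, identical for every gene in this unweighted case.
cov_coefficients = np.linalg.inv(design.T @ design)
stdev_unscaled = np.tile(np.sqrt(np.diag(cov_coefficients)), (n_genes, 1))

# Average expression of each gene across samples.
Amean = expr.mean(axis=1)
```

With this design, the first coefficient is the mean of the first group and the second is the difference between the two group means.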
make_contrasts(*args, contrasts=None, levels)
Generates a contrast matrix. A pandas DataFrame can be passed directly as the levels parameter.
Parameters:
- *args: Contrast expression strings passed positionally (e.g., "groupD - groupC"). Functionally the same as the contrasts parameter; use either one or both. Example: make_contrasts("A - B", "C - (A+B)/2", levels=levels)
- contrasts: Contrast expression strings passed by keyword (e.g., "groupD - groupC"). Functionally the same as *args; use either one or both.
- levels: Design matrix (a pd.DataFrame can be passed directly) or a list of column names
Returns:
- Contrast matrix: (np.ndarray) with shape (n_levels, n_contrasts), where:
  - n_levels is the number of levels (from the levels parameter)
  - n_contrasts is the number of contrast expressions (from the *args and contrasts parameters)
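To make the (n_levels, n_contrasts) layout concrete, here is a minimal sketch of how contrast strings can be turned into a matrix. The helper contrast_matrix is hypothetical and is not the library's make_contrasts; it assumes level names are valid Python identifiers and evaluates each expression with every level replaced by its indicator vector.

```python
import numpy as np

def contrast_matrix(expressions, levels):
    """Turn contrast strings into an (n_levels, n_contrasts) array."""
    n = len(levels)
    # Each level name stands for its one-hot indicator vector.
    indicators = {name: np.eye(n)[i] for i, name in enumerate(levels)}
    columns = [
        # Builtins are disabled so only the level names are visible.
        # eval is still unsuitable for untrusted input.
        eval(expr, {"__builtins__": {}}, indicators)
        for expr in expressions
    ]
    return np.column_stack(columns)

levels = ["A", "B", "C"]
cm = contrast_matrix(["A - B", "C - (A+B)/2"], levels)
# Rows follow the order of `levels`; columns follow the order of the expressions.
```

Each column holds the weight that one contrast assigns to every level, so "A - B" becomes the column (1, -1, 0).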
contrasts_fit(fit, contrasts=None, coefficients=None)
Extracts the results for specified group contrasts from a linear model fit.
Parameters:
- fit: Linear model fitting results dictionary; must contain the following keys:
  - coefficients: Coefficient matrix (genes × coefficients)
  - stdev_unscaled: Unscaled standard deviation matrix (genes × coefficients)
  - cov_coefficients: Coefficient covariance matrix (optional)
- contrasts: Contrast matrix (rows = number of coefficients in fit, columns = number of contrasts)
- coefficients: Column indices or names of coefficients to retain (mutually exclusive with contrasts)
Returns: Updated fitting results dictionary with core fields adjusted according to the contrast matrix.
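The adjustment of the core fields can be pictured with plain linear algebra. The sketch below is an assumption-laden illustration rather than the library's code: it uses a hypothetical unweighted fit with no missing values, in which the unscaled standard deviations are the same for every gene and follow directly from the transformed covariance matrix.

```python
import numpy as np

# Hypothetical fit: 2 genes, 3 group-mean coefficients (A, B, C), and
# 2 samples per group, so the unscaled covariance is diag(1/2, 1/2, 1/2).
coefficients = np.array([[1.0, 2.0, 4.0],
                         [0.5, 0.5, 3.0]])
cov_coefficients = np.diag([0.5, 0.5, 0.5])

# Contrast matrix: rows = coefficients in the fit, columns = contrasts
# (here "A - B" and "C - (A+B)/2").
contrasts = np.array([[ 1.0, -0.5],
                      [-1.0, -0.5],
                      [ 0.0,  1.0]])

# Coefficients become linear combinations of the original ones.
new_coefficients = coefficients @ contrasts

# The covariance transforms as C' V C; the square roots of its diagonal
# give the unscaled standard deviation of each contrast.
new_cov = contrasts.T @ cov_coefficients @ contrasts
new_stdev_unscaled = np.tile(np.sqrt(np.diag(new_cov)),
                             (coefficients.shape[0], 1))
```

After the transformation, every column of the fit refers to a contrast rather than to an original coefficient.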
eBayes(fit, proportion=0.01, trend=False, **kwargs)
Empirical Bayes moderation of standard errors for linear model fits.
Parameters:
- fit: Linear model fitting results dictionary from lmFit
- proportion: Prior proportion of genes expected to be differentially expressed, default 0.01
- trend: Whether to account for a trend in variances with expression level
- winsor_tail_p: Tail probabilities for Winsorization in robust estimation
Returns:
Updated fit dictionary containing empirical Bayes results:
- df_prior: Prior degrees of freedom
- s2_prior: Prior variance
- t: Moderated t-statistics
- p_value: Moderated p-values
- lods: Log-odds of differential expression
- F: F-statistics (if design permits)
- F_p_value: F-test p-values
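The relationship between df_prior, s2_prior, and the moderated t-statistic can be sketched for a single gene using the standard limma moderation formula. The helper moderated_t and the numeric values are hypothetical. In practice the prior parameters are estimated from all genes rather than supplied by hand, and whether this library computes the statistic in exactly this way is an assumption.

```python
import math

def moderated_t(coef, stdev_unscaled, sigma, df_residual, s2_prior, df_prior):
    """Shrink one gene's variance toward the prior and recompute t."""
    # Posterior variance: degrees-of-freedom-weighted average of the prior
    # variance and the gene's own residual variance.
    s2_post = (df_prior * s2_prior + df_residual * sigma ** 2) / (df_prior + df_residual)
    # Moderated t uses the posterior standard deviation in place of sigma.
    t = coef / (stdev_unscaled * math.sqrt(s2_post))
    # The moderated t gains the prior degrees of freedom.
    return s2_post, t, df_prior + df_residual

# Hypothetical gene with an unusually small residual standard deviation.
s2_post, t_mod, df_total = moderated_t(
    coef=1.2, stdev_unscaled=0.5, sigma=0.2,
    df_residual=4, s2_prior=0.1, df_prior=4,
)

# Ordinary t-statistic for comparison, using the gene's own sigma.
t_ordinary = 1.2 / (0.5 * 0.2)
```

Because this gene's own variance is smaller than the prior variance, the posterior variance is pulled upward and the moderated t-statistic is smaller in magnitude than the ordinary one.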
toptable(fit, coef=[0], number=10, adjust_method="BH", **kwargs)
Extract a table of the top-ranked genes from linear model fit.
Parameters:
- fit: Typically linear model fitting results dictionary from eBayes
- coef: Coefficient(s) to display, given as a column index or a list of indices. Note: unlike R, indices are zero-based; to display the first coefficient, use coef=[0] in this library (versus coef=1 in R)
- number: Number of top genes to display, default 10
- adjust_method: Multiple-testing correction method: "BH" (Benjamini-Hochberg), "BY" (Benjamini-Yekutieli), "bonferroni", "holm", "none"
- sort_by: Column to sort results by: "B" (log-odds), "logFC", "AveExpr", "P" (p-value), "t", "none"
- p_value: P-value cutoff for filtering, default 1.0 (no filtering)
- lfc: Log-fold-change cutoff (log2 scale)
- confint: Whether to compute confidence intervals
Returns:
Pandas DataFrame containing top-ranked genes with columns:
- logFC: Log-fold change
- AveExpr: Average expression level
- t: Moderated t-statistic
- P.Value: P-value
- adj.P.Val: Adjusted p-value (FDR under the default "BH" method)
- B: Log-odds of differential expression
- CI.L, CI.R: Confidence interval lower/upper bounds (if confint=True)
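For readers unfamiliar with the "BH" option of adjust_method, the following is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment. The function bh_adjust and the sample p-values are hypothetical; the sketch shows the procedure itself, not how toptable implements it.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down; the running minimum keeps the
    # adjusted values monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(pvals)
```

Each raw p-value is scaled by m / rank, so the smallest p-values receive the heaviest penalty, and no adjusted value is smaller than its raw p-value.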