Examples
Data Format Requirements
limma-py requires input data in CSV format with the following structure:
Gene,Sample1,Sample2,Sample3,Sample4,Sample5,Sample6,Sample7,Sample8,Sample9,Sample10
Gene_1,8.45,7.89,8.12,15.67,14.89,16.23,8.33,7.95,8.21,8.09
Gene_2,12.34,11.89,12.56,11.45,10.98,12.11,25.67,24.89,26.01,25.34
Gene_3,5.67,6.01,5.89,18.90,19.45,18.23,6.12,5.78,6.34,5.95
Format Specifications:
- First column: gene or protein names (any identifier)
- Subsequent columns: numeric expression matrix, one column per sample
- Input must be a CSV file; convert other formats to CSV first
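As a quick sanity check of the layout described above, a file can be loaded with pandas and inspected before it is passed on. This is a minimal sketch, not part of limma-py: the helper name `check_expression_csv` is hypothetical, and the inline CSV text is a shortened copy of the example rows (three samples only).

```python
import io

import pandas as pd


def check_expression_csv(source):
    """Load a CSV laid out as described above and return (gene_names, expr).

    Hypothetical helper for illustration; it is not part of limma-py.
    """
    data = pd.read_csv(source, header=0)
    if data.shape[1] < 2:
        raise ValueError("Expected an identifier column plus at least one sample column")
    gene_names = data.iloc[:, 0]
    expr = data.iloc[:, 1:]
    # Every sample column must hold numbers, not text
    non_numeric = [c for c in expr.columns if not pd.api.types.is_numeric_dtype(expr[c])]
    if non_numeric:
        raise ValueError(f"Non-numeric sample columns: {non_numeric}")
    return gene_names, expr


# Shortened copy of the example rows, used here in place of a real file
csv_text = """Gene,Sample1,Sample2,Sample3
Gene_1,8.45,7.89,8.12
Gene_2,12.34,11.89,12.56
Gene_3,5.67,6.01,5.89
"""
gene_names, expr = check_expression_csv(io.StringIO(csv_text))
print(expr.shape)
```

Passing a file path instead of the `io.StringIO` object works the same way, since `pd.read_csv` accepts both.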
Practical Example
import pandas as pd
import numpy as np
import limma_py
# Read CSV file, header=0 indicates first row contains column names
data = pd.read_csv("data/Harmine_iTSA.csv", header=0)
# Extract gene names (first column)
gene_names = data.iloc[:, 0].values
# Extract expression matrix (all columns from second column onward)
expr_data = data.iloc[:, 1:]
# Define experimental groups: first 5 samples as control (V_group), last 5 as treatment (D_group)
group = np.array(["V_group"] * 5 + ["D_group"] * 5)
# Create design matrix using one-hot encoding
design_df = pd.get_dummies(group, drop_first=False)[["V_group", "D_group"]]
# Ensure design matrix is integer type
design = design_df.astype(int)
# Create expression matrix copy and set gene names as row index
expr_matrix = expr_data.copy()
expr_matrix.index = gene_names
# Perform linear model fitting using limma
fit_python = limma_py.lmFit(expr_matrix, design)
# Set up contrast matrix: compare treatment group (D_group) vs control group (V_group)
contrasts = limma_py.make_contrasts('D_group - V_group', levels=design)
# Perform contrast analysis on fitted results
fit_python = limma_py.contrasts_fit(fit_python, contrasts)
# Apply empirical Bayes moderation of standard errors
eb_python = limma_py.eBayes(fit_python)
# Extract differential expression analysis results table
res = limma_py.toptable(eb_python)
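For intuition about what the `D_group - V_group` contrast estimates: with a one-hot design matrix like the one above, an ordinary least-squares fit gives one coefficient per group equal to that group's mean, so the contrast is the per-gene difference in group means. The sketch below checks this with plain NumPy on made-up numbers. It does not call limma-py, makes no claim about how the library computes the fit internally, and does not reproduce the moderated statistics from `eBayes`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up expression values: 4 genes x 10 samples (illustrative only)
expr = rng.normal(loc=10.0, scale=1.0, size=(4, 10))

# One-hot design: columns are [V_group, D_group]; first 5 samples are V_group
design = np.zeros((10, 2))
design[:5, 0] = 1
design[5:, 1] = 1

# Least-squares fit for all genes at once: solve design @ coef = expr.T
coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)  # shape (2, n_genes)

# Contrast D_group - V_group applied to the fitted coefficients
contrast = np.array([-1.0, 1.0])
effect = contrast @ coef

# Same quantity computed directly as a difference of group means
mean_diff = expr[:, 5:].mean(axis=1) - expr[:, :5].mean(axis=1)
print(np.allclose(effect, mean_diff))
```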
Common Issues
❌ Error message: "Data dimension mismatch"
- Check that the number of rows in the design matrix matches the number of columns in the expression data (the number of samples)
- Check that the length of the group vector matches the number of samples
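Both checks can be made before calling `lmFit`, which makes the source of a mismatch easier to spot. A minimal sketch, assuming the expression data is a pandas DataFrame with genes as rows and samples as columns, as in the practical example; the helper name `check_dimensions` is hypothetical and not part of limma-py.

```python
import numpy as np
import pandas as pd


def check_dimensions(expr_matrix, design, group):
    """Raise ValueError if the sample counts disagree (hypothetical helper)."""
    n_samples = expr_matrix.shape[1]
    if design.shape[0] != n_samples:
        raise ValueError(
            f"Design matrix has {design.shape[0]} rows, "
            f"but the expression data has {n_samples} sample columns"
        )
    if len(group) != n_samples:
        raise ValueError(
            f"Group vector has {len(group)} entries, "
            f"but the expression data has {n_samples} sample columns"
        )
    return n_samples


group = np.array(["V_group"] * 5 + ["D_group"] * 5)
design = pd.get_dummies(group)[["V_group", "D_group"]].astype(int)
# Placeholder expression values: 3 genes x 10 samples
expr_matrix = pd.DataFrame(
    np.zeros((3, 10)), columns=[f"Sample{i}" for i in range(1, 11)]
)
check_dimensions(expr_matrix, design, group)
```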
✅ Data Preprocessing Checklist
- First column of the CSV file contains gene names
- First row contains sample names
- All expression values are numeric
- No missing values, or missing values have been handled
- Sample order matches the group definition
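The missing-value item can be checked with pandas before fitting. A minimal sketch on made-up values; dropping incomplete rows is only one way to handle gaps and is an assumption here, not a limma-py recommendation.

```python
import numpy as np
import pandas as pd

# Made-up expression values with one gap (illustrative only)
expr_matrix = pd.DataFrame(
    {"Sample1": [8.45, 12.34, np.nan], "Sample2": [7.89, 11.89, 6.01]},
    index=["Gene_1", "Gene_2", "Gene_3"],
)

# Count missing values per gene (row) and list the affected genes
missing_per_gene = expr_matrix.isna().sum(axis=1)
print(missing_per_gene[missing_per_gene > 0])

# One simple option: keep only genes with complete data
complete = expr_matrix.dropna(axis=0, how="any")
print(complete.shape)
```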