Examples

Data Format Requirements

limma-py requires input data in CSV format with the following structure:

Gene,Sample1,Sample2,Sample3,Sample4,Sample5,Sample6,Sample7,Sample8,Sample9,Sample10
Gene_1,8.45,7.89,8.12,15.67,14.89,16.23,8.33,7.95,8.21,8.09
Gene_2,12.34,11.89,12.56,11.45,10.98,12.11,25.67,24.89,26.01,25.34
Gene_3,5.67,6.01,5.89,18.90,19.45,18.23,6.12,5.78,6.34,5.95

Format Specifications:

  • First column: gene or protein names (any identifier)

  • Subsequent columns: numeric expression values, one column per sample

  • The file must be in CSV format; convert other formats first
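The format rules above can be checked before any analysis is run. The following is a minimal sketch using only the Python standard library; `check_expression_csv` is a hypothetical helper written for this illustration and is not part of limma-py.

```python
import csv
import io


def check_expression_csv(text):
    """Return a list of format problems found in CSV text (empty if none).

    Hypothetical helper for illustration; not part of limma-py.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        return ["file needs a header row and at least one gene row"]

    problems = []
    header = rows[0]
    if len(header) < 2:
        problems.append("header needs an identifier column and at least one sample column")

    for line_no, row in enumerate(rows[1:], start=2):
        # Every data row must have as many fields as the header.
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, found {len(row)}")
            continue
        # First column holds the gene/protein identifier.
        if not row[0].strip():
            problems.append(f"line {line_no}: empty gene identifier")
        # Remaining columns must parse as numbers.
        for sample, value in zip(header[1:], row[1:]):
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: non-numeric value {value!r} in {sample}")
    return problems


good = "Gene,Sample1,Sample2\nGene_1,8.45,7.89\nGene_2,12.34,11.89\n"
bad = "Gene,Sample1,Sample2\nGene_1,8.45,n/a\nGene_2,12.34\n"

print(check_expression_csv(good))
print(check_expression_csv(bad))
```

An empty list means the text satisfies the three rules listed above. To check a file on disk, read it into a string and pass the string in.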

Practical Example

import pandas as pd
import numpy as np
import limma_py

# Read CSV file, header=0 indicates first row contains column names
data = pd.read_csv("data/Harmine_iTSA.csv", header=0)

# Extract gene names (first column)
gene_names = data.iloc[:, 0].values

# Extract expression matrix (all columns from second column onward)
expr_data = data.iloc[:, 1:]

# Define experimental groups: first 5 samples as control (V_group), last 5 as treatment (D_group)
group = np.array(["V_group"] * 5 + ["D_group"] * 5)

# Create design matrix using one-hot encoding
design_df = pd.get_dummies(group, drop_first=False)[["V_group", "D_group"]]

# Ensure design matrix is integer type
design = design_df.astype(int)

# Create expression matrix copy and set gene names as row index
expr_matrix = expr_data.copy()
expr_matrix.index = gene_names

# Perform linear model fitting using limma
fit_python = limma_py.lmFit(expr_matrix, design)

# Set up contrast matrix: compare treatment group (D_group) vs control group (V_group)
contrasts = limma_py.make_contrasts('D_group - V_group', levels=design)

# Perform contrast analysis on fitted results
fit_python = limma_py.contrasts_fit(fit_python, contrasts)

# Apply empirical Bayes moderation of standard errors
eb_python = limma_py.eBayes(fit_python)

# Extract differential expression analysis results table
res = limma_py.toptable(eb_python)
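To make the contrast 'D_group - V_group' concrete, here is a small NumPy-only sketch of the quantity it estimates. It assumes limma-py follows the usual limma semantics: with a one-hot design and no intercept, each fitted coefficient is a group mean, so the contrast is the difference of group means per gene. This is not limma-py's implementation, and it leaves out the empirical Bayes moderation step; the toy values are invented for illustration.

```python
import numpy as np

# Toy data (illustrative only): 2 genes x 6 samples,
# first 3 samples control (V_group), last 3 treatment (D_group).
expr = np.array([
    [8.0, 8.2, 7.8, 15.0, 15.5, 14.5],
    [5.0, 5.1, 4.9, 5.0, 4.8, 5.2],
])
group = np.array(["V_group"] * 3 + ["D_group"] * 3)

# One-hot design without intercept: column 0 = V_group, column 1 = D_group.
design = np.column_stack([group == "V_group", group == "D_group"]).astype(float)

# Ordinary least squares for all genes at once: design @ coef ~ expr.T
coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)  # shape: (groups, genes)

# The contrast 'D_group - V_group' as weights over the design columns.
contrast = np.array([-1.0, 1.0])
diff = contrast @ coef

# The same quantity computed directly from the group means.
direct = expr[:, group == "D_group"].mean(axis=1) - expr[:, group == "V_group"].mean(axis=1)

print(diff)
print(direct)
```

If the expression values are on a log scale, this difference is the log fold change between treatment and control.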

Common Issues

❌ Error message: "Data dimension mismatch"

  • Check that the number of rows in the design matrix matches the number of columns in the expression data (the number of samples)

  • Check if length of group vector matches number of samples
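Both checks can be run before fitting the model. The sketch below uses pandas and NumPy with toy data; `check_dimensions` is a hypothetical helper written for this illustration, not a limma-py function.

```python
import numpy as np
import pandas as pd


def check_dimensions(expr_matrix, design, group):
    """Raise ValueError if the sample counts disagree (hypothetical helper)."""
    n_samples = expr_matrix.shape[1]  # one column per sample
    if len(group) != n_samples:
        raise ValueError(
            f"group has {len(group)} labels but expression data has {n_samples} samples"
        )
    if design.shape[0] != n_samples:
        raise ValueError(
            f"design has {design.shape[0]} rows but expression data has {n_samples} samples"
        )
    return n_samples


# Toy expression matrix (illustrative values): 2 genes x 4 samples.
expr_matrix = pd.DataFrame(
    [[8.45, 7.89, 15.67, 14.89], [5.67, 6.01, 18.90, 19.45]],
    index=["Gene_1", "Gene_3"],
    columns=["Sample1", "Sample2", "Sample3", "Sample4"],
)
group = np.array(["V_group"] * 2 + ["D_group"] * 2)
design = pd.get_dummies(pd.Series(group))[["V_group", "D_group"]].astype(int)

print(check_dimensions(expr_matrix, design, group))
```

A mismatch raises a `ValueError` that names the two counts that disagree, which is usually enough to locate a miscounted group vector.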

✅ Data Preprocessing Checklist

  • CSV file first column contains gene names

  • First row contains sample names

  • All expression values are numerical

  • No missing values, or missing values have already been handled

  • Sample order matches group definition
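The "numerical values" and "missing values" items of the checklist can be inspected with pandas. This is a minimal sketch on toy data; dropping incomplete genes is only one possible way to handle missing values, and the source does not say how limma-py itself treats them.

```python
import numpy as np
import pandas as pd

# Toy expression matrix (illustrative values) with one NaN and one text entry.
expr_matrix = pd.DataFrame(
    {
        "Sample1": [8.45, 12.34, np.nan],
        "Sample2": ["7.89", "NA", "6.01"],  # read as text, e.g. from a messy file
        "Sample3": [8.12, 12.56, 5.89],
    },
    index=["Gene_1", "Gene_2", "Gene_3"],
)

# Coerce every column to numeric; entries that cannot be parsed become NaN.
expr_matrix = expr_matrix.apply(pd.to_numeric, errors="coerce")

# Count missing values per gene to see how much data is affected.
missing_per_gene = expr_matrix.isna().sum(axis=1)

# One option (an assumption, not a limma-py requirement): keep complete genes only.
complete = expr_matrix.dropna(axis=0, how="any")

print(missing_per_gene)
print(complete)
```

Imputation is an alternative when dropping genes would remove too much data; whichever approach is used, apply it before building the design matrix so the sample columns stay aligned with the group vector.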