1. Introduction
Enrichr provides a comprehensive collection of over 240 predefined gene set libraries compiled from diverse biological resources, ranging from gene ontologies and curated pathways to experimental perturbation datasets and disease associations. Each library is a collection of gene sets, and each gene set captures one unit of biological knowledge: a pathway, the targets of a transcription factor, the response to a drug, a disease signature, and so on.
These libraries let you perform enrichment analysis, which tests whether your input gene list overlaps a known biological category more than would be expected by chance. Understanding what each library represents, and when to use it, is key to interpreting results correctly.
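The overlap test behind enrichment analysis can be sketched in a few lines. The example below assumes a library stored as GMT-style text (term, description, then genes, tab-separated), similar to the layout of Enrichr's downloadable library files, and scores each gene set with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The gene sets, the function names (`parse_gmt`, `hypergeom_sf`, `enrich`), and the background size of 20,000 genes are hypothetical assumptions; this is not Enrichr's own implementation, and its p-values will differ depending on the background it uses.

```python
from math import comb


def parse_gmt(text):
    """Parse GMT-style lines: term <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    library = {}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        # Skip empty fields and drop an optional ",weight" suffix on gene names.
        library[fields[0]] = {g.split(",")[0] for g in fields[2:] if g}
    return library


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def enrich(input_genes, library, background_size=20000):
    """Rank gene sets by overlap p-value with the input list, smallest first."""
    query = set(input_genes)
    results = []
    for term, gene_set in library.items():
        overlap = len(query & gene_set)
        # Assumed background: background_size genes could have been drawn.
        p = hypergeom_sf(overlap, background_size, len(gene_set), len(query))
        results.append((term, overlap, p))
    return sorted(results, key=lambda row: row[2])


# Hypothetical two-term library for illustration only.
TOY_GMT = (
    "Hypothetical_cell_cycle\t\tCDK1\tCCNB1\tPLK1\tBUB1\tAURKA\n"
    "Hypothetical_immune\t\tCD4\tCD8A\tIL2\tIFNG\tGZMB\n"
)
library = parse_gmt(TOY_GMT)
for term, overlap, p in enrich(["CDK1", "CCNB1", "PLK1", "TP53"], library):
    print(f"{term}\toverlap={overlap}\tp={p:.2e}")
```

Passing the background size explicitly makes the dependence on the gene universe visible: the same overlap is more or less surprising depending on how many genes could have been drawn.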
2. Gene Ontology (GO) Libraries
Gene Ontology (GO) libraries group genes by their annotations in the three GO aspects: biological process, molecular function, and cellular component. They are well suited to broad biological interpretation.
- GO_Biological_Process_2021/2023/2025: Genes grouped by biological roles (e.g., cell cycle, immune response).
- GO_Molecular_Function_2021/2023/2025: Genes grouped by their activities at the molecular level (e.g., kinase activity).
- GO_Cellular_Component_2021/2023/2025: Genes grouped by subcellular localization (e.g., mitochondrion, ribosome).
When to use: For broad, unbiased exploration of gene function, especially in discovery-driven projects.
3. Pathway Libraries
These libraries contain curated gene sets representing canonical biological pathways from multiple databases.
- KEGG_2013-2021: Pathways from KEGG (metabolism, signaling, disease).
- Reactome_2022/2024: Detailed manually curated pathways with hierarchical structure.
- WikiPathways_2013-2024: Community-curated pathway gene sets.
- BioCarta_2013-2016: Classic signaling and regulatory pathways.
- Panther_2015/2016: Pathways from the PANTHER classification system.
- BioPlanet_2019: Broad coverage of metabolic and regulatory pathways.
- Elsevier_Pathway_Collection: Pathway definitions curated by Elsevier.
- HumanCyc_2015/2016: Human metabolic pathways.
- PFOCR_Pathways_2023: Pathways mined from pathway figures in the published literature.
When to use: For mechanistic, pathway-level interpretation of gene expression or differential expression results.
4. Transcription Factor Targets & Regulatory Networks
These libraries link transcription factors (TFs) to the genes they regulate, using evidence from ChIP-seq binding, motif prediction, co-expression, literature curation, or TF perturbation experiments.
- ChEA_2013-2022: Targets inferred from ChIP-X data.
- ENCODE_TF_ChIP-seq_2014/2015: Genome-wide TF binding from ENCODE.
- TRANSFAC_and_JASPAR_PWMs / JASPAR_PWM_Human_2025: Targets predicted from position weight matrix (PWM) motif matches.
- TRRUST_Transcription_Factors_2019: Manually curated TF-target interactions.
- TF_Perturbations_Followed_by_Expression: Expression changes following TF perturbation.
- TF-LOF_Expression_from_GEO: Loss-of-function TF perturbation signatures.
When to use: To identify upstream regulators of your gene list or infer transcriptional programs driving your data.
5. Perturbation Libraries
These collections capture transcriptional responses to various perturbations, such as drugs, ligands, gene knockouts, and environmental conditions.
- Drug_Perturbations_from_GEO / DrugMatrix: Expression changes after drug treatment.
- LINCS_L1000_Chem_Pert_*: Large-scale chemical perturbation signatures from the LINCS L1000 project.
- L1000_Kinase_and_GPCR_Perturbations_*: Perturbations targeting kinases or GPCRs.
- Kinase_Perturbations_from_GEO_*: Expression responses to kinase perturbations.
- Ligand_Perturbations_from_GEO_*: Ligand-induced gene expression changes.
- Virus_Perturbations_from_GEO_*: Host responses to viral infection.
- Microbe_Perturbations_from_GEO_*: Responses to microbial stimuli.
- MCF7_Perturbations_from_GEO_*: Drug and gene perturbations in the MCF7 cell line.
- RummaGEO_* and PerturbAtlas: Large integrated perturbation signature databases.
When to use: To identify compounds, genetic perturbations, or stimuli that mimic or reverse your gene signature.
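The mimic-versus-reverse idea can be made concrete with a simple overlap count. The sketch below assumes that both your signature and a perturbation signature are split into up- and down-regulated gene lists; the function name `connectivity_counts` and its scoring rule are hypothetical illustrations, not the statistic Enrichr computes. Concordant overlaps (up with up, down with down) suggest the perturbation mimics your signature, while discordant overlaps suggest it reverses it.

```python
def connectivity_counts(up, down, pert_up, pert_down):
    """Hypothetical heuristic: count concordant vs. discordant gene overlaps.

    Returns (mimic, reverse), where
      mimic   = |up & pert_up| + |down & pert_down|
      reverse = |up & pert_down| + |down & pert_up|
    """
    up, down = set(up), set(down)
    pert_up, pert_down = set(pert_up), set(pert_down)
    mimic = len(up & pert_up) + len(down & pert_down)
    reverse = len(up & pert_down) + len(down & pert_up)
    return mimic, reverse


# Hypothetical disease signature and drug response (placeholder gene names).
disease_up = ["GENE_A", "GENE_B", "GENE_C"]
disease_down = ["GENE_D", "GENE_E"]
drug_up = ["GENE_D", "GENE_E", "GENE_X"]
drug_down = ["GENE_A", "GENE_B"]

mimic, reverse = connectivity_counts(disease_up, disease_down, drug_up, drug_down)
print(mimic, reverse)  # 0 4: the drug's changes oppose the disease signature
```

Raw counts ignore list sizes, so a real analysis would test each overlap for statistical significance before ranking candidate compounds.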
6. Disease and Phenotype Libraries
These libraries group genes by disease association, phenotype, or clinical evidence.
- OMIM_Disease / OMIM_Expanded: Gene-disease associations from OMIM.
- DisGeNET: Curated and predicted gene-disease links.
- ClinVar_2019/2025: Genes with known clinical variants.
- Orphanet_Augmented_2021: Rare disease gene associations.
- Jensen_DISEASES / Jensen_DISEASES_Curated: Text-mined and curated disease associations.
- Human_Phenotype_Ontology: Genes grouped by phenotype annotations.
- KOMP2_Mouse_Phenotypes_2022: Mouse knockout phenotype associations.
- MGI_Mammalian_Phenotype_Level_4: Mouse gene-phenotype annotations from MGI, grouped by level-4 terms of the Mammalian Phenotype Ontology.
When to use: For linking gene lists to disease contexts, interpreting clinical relevance, or exploring phenotype-driven biology.
7. GWAS and Genetic Association Libraries
- GWAS_Catalog_2019-2025: Genes associated with genome-wide significant variants.
- PheWeb_2019: Gene-trait associations from phenome-wide association studies.
- UK_Biobank_GWAS_v1: Large-scale GWAS associations from the UK Biobank.
- MAGMA_Drugs_and_Diseases: MAGMA-based gene-level association results.
- PhenGenI_Association_2021: Gene-trait associations from NCBI's Phenotype-Genotype Integrator (PhenGenI).
When to use: To connect gene lists to genetic evidence and identify traits or diseases whose associated genes overlap your list.
8. Cell Type and Tissue Expression Libraries
- ARCHS4_Tissues / ARCHS4_Cell-lines: Expression-based gene sets across tissues and cell lines.
- GTEx_Tissues / GTEx_Aging_Signatures: Gene expression signatures across tissues and age groups.
- Allen_Brain_Atlas_* / Azimuth_* / Tabula_Muris / Tabula_Sapiens: Single-cell transcriptomics-derived cell-type signatures.
- CellMarker_2024: Curated cell-type marker genes.
- HuBMAP_ASCTplusB_*: Cell-type biomarker gene sets from the HuBMAP Anatomical Structures, Cell Types plus Biomarkers (ASCT+B) tables.
When to use: For identifying tissue or cell-type specificity of your gene list.
9. Protein-Protein Interactions, Complexes & PTMs
- CORUM: Experimentally verified protein complexes.
- PPI_Hub_Proteins: Interaction partners of highly connected (hub) proteins in interaction networks.
- BioPlex_2017: High-throughput human protein-protein interaction network.
- Phosphatase_Substrates_from_DEPOD: Known phosphatase-substrate relationships.
- Transcription_Factor_PPIs: Protein interaction partners of transcription factors.
- Virus-Host_PPI_P-HIPSTer: Host-virus protein-protein interactions.
When to use: For studying network topology or contextualizing your gene set within interaction maps.
10. Specialized and Emerging Libraries
- DSigDB: Drug-gene associations for repurposing studies.
- HMDB_Metabolites / Metabolomics_Workbench: Metabolite-related gene sets.
- TargetScan_microRNA / miRTarBase: Predicted (TargetScan) and experimentally validated (miRTarBase) miRNA-target interactions.
- GlyGen_Glycosylated_Proteins: Protein glycosylation data.
- HomoloGene: Cross-species orthology-based gene groupings.
- Pfam_Domains / InterPro_Domains: Protein domain annotations.
- Genome_Browser_PWMs: Predicted motif-based gene sets.
- RummaGene_*: Kinase- and TF-associated gene sets mined from supplementary tables of published articles.
- SysMyo_Muscle_Gene_Sets: Muscle-specific gene sets.
When to use: For niche analyses focused on drugs, metabolites, miRNA regulation, glycosylation, protein domains, or other specific regulatory and structural features.
11. Choosing the right library
With hundreds of libraries available, it's crucial to select those that match your biological question. For example:
- Mechanistic pathway discovery: KEGG, Reactome, WikiPathways
- Regulatory inference: ChEA, ENCODE_TF_ChIP-seq, JASPAR
- Disease context: DisGeNET, OMIM, ClinVar
- Drug repurposing: DSigDB, LINCS_L1000_Chem_Pert
- Cell-type specificity: ARCHS4_Tissues, CellMarker, Azimuth
- Genetic evidence: GWAS_Catalog, UK_Biobank_GWAS
The most informative analyses often combine results from several complementary libraries: for example, GO for functional context, ChEA for regulatory insights, and DSigDB for drug associations.
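A minimal sketch of combining libraries, assuming you already have a raw p-value for each term from each library (the values below are made up). It applies the Benjamini-Hochberg false discovery rate correction within each library, then pools the terms that pass a threshold into one ranked table. The function names and the `alpha` cutoff are illustrative assumptions; Enrichr reports its own adjusted p-values, which are preferable when available.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def combine_libraries(results_by_library, alpha=0.05):
    """Adjust p-values per library, keep terms with q <= alpha, rank by q."""
    merged = []
    for library, rows in results_by_library.items():
        qvalues = benjamini_hochberg([p for _, p in rows])
        for (term, p), q in zip(rows, qvalues):
            if q <= alpha:
                merged.append((library, term, p, q))
    return sorted(merged, key=lambda row: row[3])


# Made-up p-values for illustration only.
results = {
    "GO_Biological_Process_2023": [("term_a", 0.001), ("term_b", 0.02), ("term_c", 0.5)],
    "ChEA_2022": [("tf_x", 0.004), ("tf_y", 0.3)],
}
for library, term, p, q in combine_libraries(results):
    print(f"{library}\t{term}\tp={p:.3g}\tq={q:.3g}")
```

Adjusting within each library treats every library as a separate analysis; pooling all p-values before a single correction is the more conservative alternative when many libraries are queried at once.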