pathXcite: Literature-Based Gene Enrichment

pathXcite links scientific literature directly to functional interpretation. From curated articles, it identifies and ranks genes, then tests them against any gene set library, from pathways and phenotypes to transcription factor targets and drug signatures.

What pathXcite does

pathXcite is a standalone application for literature-based functional genomics. Instead of requiring expression data, it mines published articles to extract, rank, and analyze genes relevant to your topic, then performs enrichment against built-in or custom gene set libraries.

All data remain local in self-contained project folders, enabling reproducibility, portability, and offline analysis once literature is cached.

First analysis in five steps

  1. Create a project: Start a new project; a dedicated folder stores corpus, annotations, and results.
  2. Add articles: Paste PMIDs/PMCIDs or use the built-in browser to search and select papers.
  3. Extract genes: Annotate articles to identify gene mentions and map them to NCBI Gene IDs.
  4. Rank and filter: Choose absolute frequency to rank genes by prominence, or GF-IDF to rank them by specificity.
  5. Enrich: Select one or more gene set libraries, from pathways and phenotypes to drug responses, and run enrichment.

See the Quick Start guide for a full walkthrough.

Gene ranking strategies

Absolute Frequency

Most mentioned genes

Counts how many times a gene appears in your corpus. Highlights widely studied, canonical genes.

  • Intuitive and straightforward
  • Captures well-established core genes
  • Can overrepresent highly cited genes
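
As a rough illustration of absolute-frequency ranking, the sketch below counts gene mentions across a toy corpus and sorts genes by total count. The function name, the article labels, and the input layout (article label mapped to a list of NCBI Gene IDs) are assumptions made for this example; they do not describe pathXcite's internal data model.

```python
from collections import Counter

def rank_by_frequency(corpus):
    """Rank genes by total mention count.

    corpus: dict mapping an article label to the list of gene IDs
    mentioned in it (repeats allowed). Hypothetical layout for this sketch.
    """
    counts = Counter()
    for mentions in corpus.values():
        counts.update(mentions)
    # Highest count first; ties broken by gene ID string for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy corpus: two placeholder articles with NCBI Gene IDs as strings.
corpus = {
    "article_A": ["7157", "7157", "672"],
    "article_B": ["7157", "1956"],
}
ranking = rank_by_frequency(corpus)
```

Here the gene mentioned three times comes first, and the two genes mentioned once follow in ID order.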

GF-IDF

Topic-specific genes

Adjusts for how often each gene appears across the entire literature, emphasizing those unusually enriched in your corpus.

  • Highlights context-specific genes
  • Reduces literature popularity bias
  • May downrank broadly studied genes
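
The exact GF-IDF formula is not given here, so the following is only a sketch that assumes a TF-IDF-style definition: the gene's mention count in your corpus multiplied by a smoothed log inverse of how many background articles mention it. The function name, the smoothing, and the background numbers are all hypothetical.

```python
import math

def gf_idf(corpus_counts, background_doc_freq, n_background_docs):
    """Score genes with an assumed TF-IDF-style weighting.

    corpus_counts: gene -> mention count in the user's corpus
    background_doc_freq: gene -> number of background articles mentioning it
    n_background_docs: total number of background articles
    """
    scores = {}
    for gene, gf in corpus_counts.items():
        df = background_doc_freq.get(gene, 0)
        # Smoothed inverse document frequency (assumption, not the tool's formula).
        idf = math.log((1 + n_background_docs) / (1 + df))
        scores[gene] = gf * idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy numbers: GENE_A is mentioned more often in the corpus but is also
# ubiquitous in the background; GENE_B is rarer overall.
corpus_counts = {"GENE_A": 10, "GENE_B": 4}
background = {"GENE_A": 50_000, "GENE_B": 100}
gf_idf_ranking = gf_idf(corpus_counts, background, n_background_docs=1_000_000)
```

Under these assumed numbers, GENE_B outranks GENE_A even though it has fewer mentions, which is the kind of reordering the description above refers to.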

Running enrichment with both strategies often reveals complementary patterns: canonical and topic-specific gene landscapes.

Broad support for gene set libraries

pathXcite supports enrichment against a wide range of curated libraries. More than 240 are included, and additional libraries can be loaded via .gmt files.

See the tutorials on Gene Set Libraries for details.
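
For orientation, a .gmt file is tab-delimited text with one gene set per line: the set name, a description, then the member genes. The sketch below reads that layout into a dictionary; the function name and the example set names are hypothetical, and pathXcite's own loader may validate files differently.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into {set_name: set_of_genes}.

    lines: any iterable of strings, e.g. an open file handle.
    """
    library = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs.
        library[name] = {g for g in genes if g}
    return library

# Toy library with made-up set names.
example_lines = [
    "SET_ONE\texample description\tTP53\tBRCA1\tEGFR\n",
    "SET_TWO\tna\tMYC\tTP53\t\n",
    "\n",
]
library = parse_gmt(example_lines)
```

To read from disk, pass an open file instead of a list, for example `with open(path, encoding="utf-8") as fh: library = parse_gmt(fh)`.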

Inputs and outputs

Input

  • PMIDs or PMCIDs (or search results from the integrated browser)
  • (Optional) Species filters for gene annotation
  • (Optional) Custom gene set libraries (.gmt)

Output

  • Ranked gene lists (frequency and GF-IDF) with NCBI Gene IDs
  • Enrichment results (p-values, FDR, overlap counts)
  • Interactive visualizations (bubble plots, bar charts)
  • Self-contained project folder with databases and configurations
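
To make the enrichment columns (p-values, FDR, overlap counts) concrete, here is a minimal sketch that assumes a one-sided hypergeometric over-representation test with Benjamini-Hochberg correction. This section does not state which test pathXcite uses, so treat the method, the function names, and the toy universe size as assumptions.

```python
from math import comb

def hypergeom_pvalue(overlap, set_size, list_size, universe_size):
    """P(X >= overlap) when drawing list_size genes from the universe."""
    total = comb(universe_size, list_size)
    upper = min(set_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def enrich(gene_list, library, universe_size):
    """Return (set_name, overlap, p_value, fdr) for each gene set."""
    genes = set(gene_list)
    rows = []
    for name, members in library.items():
        overlap = len(genes & members)
        p = hypergeom_pvalue(overlap, len(members), len(genes), universe_size)
        rows.append((name, overlap, p))
    fdr = benjamini_hochberg([p for _, _, p in rows])
    return [(n, o, p, q) for (n, o, p), q in zip(rows, fdr)]

# Toy example: a 10-gene universe and two made-up gene sets.
toy_library = {"S1": {"a", "b", "c", "d", "e"}, "S2": {"x", "y"}}
results = enrich(["a", "b", "c", "d", "e"], toy_library, universe_size=10)
```

In this toy case S1 overlaps the input list completely and gets a small p-value, while S2 has no overlap and a p-value of 1.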

Analysis tips

System requirements

Get started with pathXcite: install it, run your first analysis, then explore the Tutorials.