What you'll achieve
- Read section-level gene distribution plots and interpret where and how gene mentions occur in a paper.
- Distinguish background mentions (e.g. in Abstracts or Abbreviations) from evidence-rich sections (Results, Discussion).
- Evaluate the depth and focus of a paper's gene content before including it in downstream over-representation analysis (ORA).
Section plots help you look beyond the raw list of extracted genes. They show where those genes occur in the text, revealing whether they appear in the scientific core of the article or only in superficial contexts. This distinction matters because genes mentioned in Results or Discussion are more likely to represent experimentally supported findings than incidental mentions. As a result, these plots let you prioritise signal-rich articles for pathway analysis, increasing the biological relevance of your enrichment results.
Steps: How to read the section plot
Open Document Insights and select a paper. Each vertical line in the plot represents a segmented section of the article, coloured by type (e.g. Abstract, Methods, Results, Discussion). The teal curve shows how many gene mentions occur in each segment.
Let's interpret five real examples to illustrate the range of patterns you might see:
- Plot 1: Heavy clustering of gene mentions in Results, with minimal mentions elsewhere.
- Plot 2: Few gene mentions overall, narrowly concentrated in a handful of Results subsections.
- Plot 3: Gene mentions are distributed across the Abstract, Methods, Results, Discussion, and even References, indicating a highly gene-centric article.
- Plot 4 (Article A): Mentions peak in the Discussion, showing strong interpretation around gene findings, with additional signals in the Introduction and Tables.
- Plot 5 (Article B): Nearly all mentions appear in the Abbreviations section; these are likely superficial and unsuitable for enrichment.
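Under the hood, a section plot can be thought of as an ordered list of segments, each carrying a section type and a gene-mention count. The sketch below assumes that shape as a hypothetical list of `(section_type, mention_count)` tuples (this is not a documented Document Insights export format, and the counts are invented) and totals the mentions per section type, which is roughly what you read off the plot by eye.

```python
from collections import Counter

# Hypothetical segment data in reading order: (section type, gene mentions in that segment).
# The layout and the numbers are invented for illustration only.
segments = [
    ("Abstract", 3),
    ("Introduction", 2),
    ("Methods", 1),
    ("Results", 14),
    ("Results", 22),
    ("Discussion", 9),
    ("References", 1),
]

def mentions_by_section(segments):
    """Sum gene mentions per section type across all of that type's segments."""
    totals = Counter()
    for section, count in segments:
        totals[section] += count
    return totals

totals = mentions_by_section(segments)
# Print section types from most to fewest mentions.
for section, count in totals.most_common():
    print(f"{section:<12} {count}")
```

Summing per section type loses the position of individual peaks, so treat these totals as a complement to the plot rather than a replacement for it.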
Look for where peaks occur.
- Peaks in Results and Discussion → likely reflect experimentally supported gene findings and biological interpretation (e.g. Plot 1, Plot 3, and Plot 4).
- Peaks in Abstract or Introduction → often background mentions or hypotheses, not necessarily experimental results (e.g. Plot 3's early peaks).
- Peaks only in Abbreviations or References → generally low-value for enrichment; they reflect definitions or citations, not evidence (e.g. Plot 5).
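The peak-location rules above can be sketched as a small lookup. This minimal example assumes the same hypothetical `(section_type, mention_count)` segment list, takes only the single highest segment as "the peak" (real plots may have several peaks worth inspecting), and uses illustrative category sets that mirror the three bullets; the counts are invented to resemble Plot 5.

```python
# Illustrative groupings that mirror the guidance above (an assumption, not a tool setting).
EVIDENCE = {"Results", "Discussion"}
BACKGROUND = {"Abstract", "Introduction"}
LOW_VALUE = {"Abbreviations", "References"}

def classify_peak(segments):
    """Return (section type of the highest-count segment, rough interpretation)."""
    if not segments:
        return None, "no gene mentions"
    section, count = max(segments, key=lambda seg: seg[1])
    if count == 0:
        return None, "no gene mentions"
    if section in EVIDENCE:
        label = "likely experimental findings or interpretation"
    elif section in BACKGROUND:
        label = "likely background or hypotheses"
    elif section in LOW_VALUE:
        label = "likely definitions or citations (low value for ORA)"
    else:
        label = "inspect manually"
    return section, label

# Invented counts shaped like Plot 5 (Article B): almost everything in Abbreviations.
plot5_like = [("Abstract", 1), ("Abbreviations", 18), ("Results", 1)]
print(classify_peak(plot5_like))
```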
Assess the distribution breadth.
- Broad distributions across many sections (Plot 3) indicate deeply gene-centric articles, which are potentially highly valuable for ORA.
- Sharp, isolated peaks (Plot 2) suggest that gene mentions are secondary or context-specific; include such papers only if they are relevant to your question.
- Peaks concentrated in the Discussion (Plot 4) signal interpretive strength; such papers are useful if you're looking for mechanistic insights.
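Breadth is something you judge visually, but if you want a single number to compare papers, one option (our choice here, not a metric the section plot itself reports) is the normalised Shannon entropy of mentions across a paper's section types. The sketch assumes the hypothetical `(section_type, mention_count)` segment list, with invented counts shaped like Plots 3 and 2.

```python
import math
from collections import Counter

def section_breadth(segments):
    """Normalised Shannon entropy of gene mentions over a paper's section types.

    0.0 means every mention falls in one section type; 1.0 means mentions are
    spread evenly over all section types present in the paper (zero-mention
    types still count towards the normalisation).
    """
    totals = Counter()
    for section, count in segments:
        totals[section] += count  # also registers section types with 0 mentions
    k = len(totals)
    n = sum(totals.values())
    if k < 2 or n == 0:
        return 0.0
    entropy = sum((c / n) * math.log(n / c) for c in totals.values() if c > 0)
    return entropy / math.log(k)

# Invented counts: evenly spread (Plot 3-like) versus a single isolated peak (Plot 2-like).
plot3_like = [("Abstract", 5), ("Methods", 5), ("Results", 5), ("Discussion", 5)]
plot2_like = [("Abstract", 0), ("Methods", 0), ("Results", 4), ("Discussion", 0)]
print(round(section_breadth(plot3_like), 2), round(section_breadth(plot2_like), 2))
```

Entropy says nothing about which sections hold the mentions, so a Discussion-heavy paper like Plot 4 can score low on breadth yet still be worth keeping; read it alongside the peak location.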
Use this insight in document selection.
When choosing which papers to include for over-representation analysis, prioritise those whose section plots show concentrated activity in Results and Discussion. Such patterns indicate that the extracted genes are discussed as part of the study's core findings. Avoid or down-weight documents such as Article B (Plot 5), where gene mentions are superficial.
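This selection rule can be sketched as a filter on the share of each paper's gene mentions that fall in Results or Discussion. The document IDs, counts, and the `min_core_fraction=0.5` cut-off below are all hypothetical; pick a threshold that suits your corpus.

```python
CORE_SECTIONS = {"Results", "Discussion"}

def core_fraction(segments):
    """Share of a paper's gene mentions that fall in Results or Discussion."""
    total = sum(count for _, count in segments)
    if total == 0:
        return 0.0
    core = sum(count for section, count in segments if section in CORE_SECTIONS)
    return core / total

def select_documents(documents, min_core_fraction=0.5):
    """Return IDs of papers meeting the (hypothetical) threshold, highest share first."""
    scored = [(doc_id, core_fraction(segs)) for doc_id, segs in documents.items()]
    kept = [item for item in scored if item[1] >= min_core_fraction]
    return [doc_id for doc_id, _ in sorted(kept, key=lambda item: item[1], reverse=True)]

# Invented counts loosely shaped like Plots 4, 5 and 3.
documents = {
    "plot4_like": [("Introduction", 4), ("Discussion", 15), ("Tables", 3)],
    "plot5_like": [("Abstract", 1), ("Abbreviations", 18), ("Results", 1)],
    "plot3_like": [("Abstract", 6), ("Methods", 8), ("Results", 20),
                   ("Discussion", 10), ("References", 4)],
}
print(select_documents(documents))
```

A share alone ignores volume: a Plot 2-style paper with a handful of Results mentions can score highly, so you may also want a minimum total mention count.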
Why section plots are powerful for enrichment analysis
When performing over-representation analysis (ORA), the biological relevance of your results depends heavily on the quality of the input gene list. If your gene set is contaminated with incidental mentions (e.g. from abbreviations, citations, or background text), enrichment results may reflect literature noise rather than true biological signal.
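ORA is commonly tested with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): given a background of N genes, a pathway of K genes, and your list of n genes, how likely is an overlap at least as large as the one observed? The toy sketch below uses placeholder gene IDs `G1`–`G20` and invented sets to show one way incidental mentions hurt: they enlarge the list without adding pathway hits, which weakens the enrichment. Incidental mentions can equally create spurious hits, and real ORA tools also correct for testing many pathways, which this sketch omits.

```python
from math import comb

def ora_pvalue(gene_list, pathway, background):
    """One-sided hypergeometric test; returns (overlap, P(X >= overlap))."""
    background = set(background)
    genes = set(gene_list) & background   # only genes in the background are counted
    members = set(pathway) & background
    N, K, n = len(background), len(members), len(genes)
    k = len(genes & members)
    # Sum the probabilities of every overlap from k up to the largest possible one.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, upper / comb(N, n)

background = {f"G{i}" for i in range(1, 21)}        # 20 placeholder genes
pathway = {"G1", "G2", "G3", "G4", "G5"}
clean = {"G1", "G2", "G3", "G4", "G10"}             # e.g. genes from Results/Discussion
noisy = clean | {"G11", "G12", "G13", "G14", "G15"}  # plus incidental mentions

for label, genes in [("clean", clean), ("noisy", noisy)]:
    k, p = ora_pvalue(genes, pathway, background)
    print(f"{label}: overlap={k}, p={p:.4f}")
```

With these toy sets the overlap is 4 in both cases, but the p-value rises from about 0.0049 for the clean list to about 0.15 once the five incidental genes are added.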
The section plot helps you avoid this by revealing context: not just which genes appear, but where and how they are used within a paper. This has several key advantages:
- Signal vs. noise discrimination: A gene mentioned 20 times in the Results likely reflects actual experimental findings, while one appearing once in a reference list is probably incidental.
- Document triage: You can quickly exclude low-value papers (e.g. Plot 5) or focus on gene-rich, evidence-driven ones (Plot 3) before enrichment, improving the signal-to-noise ratio.
- Context-aware curation: Articles like Article A (Plot 4), where mentions peak in the Discussion, are ideal for exploring mechanistic insights, while papers with a Plot 1 pattern highlight primary discoveries.
- Downstream reliability: ORA results built from contextually strong documents are more likely to reflect pathways and processes that are truly implicated in the underlying biology.
In short, section plots allow you to move from a “bag of genes” to a contextualised, evidence-weighted gene set, a critical step for producing biologically meaningful enrichment results.
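One way to move from a bag of genes to an evidence-weighted gene set is to score each gene by where it is mentioned. This sketch assumes hypothetical mention-level records of `(gene, section_type)`, invented example mentions, and hand-picked section weights (`SECTION_WEIGHTS`, `min_score=1.0`, `default_weight=0.2` are illustrative assumptions, not values from any tool); tune them to your own curation goals.

```python
from collections import defaultdict

# Hypothetical evidence weights per section type (illustrative only).
SECTION_WEIGHTS = {
    "Results": 1.0,
    "Discussion": 0.8,
    "Methods": 0.5,
    "Abstract": 0.3,
    "Introduction": 0.3,
    "Abbreviations": 0.0,
    "References": 0.0,
}

def evidence_weighted_genes(mentions, min_score=1.0, default_weight=0.2):
    """Sum section weights over each gene's mentions; keep genes scoring >= min_score."""
    scores = defaultdict(float)
    for gene, section in mentions:
        scores[gene] += SECTION_WEIGHTS.get(section, default_weight)
    return {gene: score for gene, score in scores.items() if score >= min_score}

# Invented mentions for illustration.
mentions = [
    ("TP53", "Results"), ("TP53", "Results"), ("TP53", "Discussion"),
    ("MYC", "Introduction"), ("MYC", "Abstract"),
    ("EGFR", "Abbreviations"), ("EGFR", "References"),
    ("KRAS", "Results"),
]
print(sorted(evidence_weighted_genes(mentions)))
```

Here TP53 and KRAS pass the cut-off while MYC (background only) and EGFR (abbreviations and citations only) are dropped; the resulting gene set, or its scores, can then feed into ORA.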
Next steps
Continue to these related tutorials to apply what you learned: