Development Guide

This guide covers contributing to MCBO, running quality control checks, and understanding the evaluation framework.

Repository Structure

mcbo/
├── ontology/           # MCBO ontology (TBox)
│   └── mcbo.owl.ttl
├── python/             # Python package (pip install -e python/)
│   ├── mcbo/           # Core library + CLI modules
│   │   ├── __init__.py      # Package exports
│   │   ├── namespaces.py    # Shared RDF namespaces
│   │   ├── graph_utils.py   # Graph loading/creation utilities
│   │   ├── csv_to_rdf.py    # CSV-to-RDF conversion
│   │   ├── build_graph.py   # Graph building
│   │   ├── run_eval.py      # SPARQL evaluation
│   │   └── stats_eval_graph.py  # Statistics
│   └── pyproject.toml  # Package metadata and entry points
├── scripts/            # Shell scripts
│   └── run_all_checks.sh    # Full QC + evaluation runner
├── eval/               # Competency question evaluation
│   └── queries/        # SPARQL query files (*.rq)
├── sparql/             # QC queries for ROBOT
├── reports/            # QC reports
│   └── robot/
├── data.sample/        # Demo data (public)
└── .data/              # Real data (git-ignored)

Quality Control

ROBOT QC Queries

MCBO uses ROBOT for ontology quality control.

Run all QC checks:

make qc

This executes three QC queries:

Query

Purpose

orphan_classes.rq

Finds classes without parent classes (orphans)

duplicate_labels.rq

Finds classes with duplicate rdfs:label values

missing_definitions.rq

Finds classes missing obo:IAO_0000115 (definition) annotations

Manual QC Commands

# Check for orphan classes
java -jar .robot/robot.jar query \
  --input ontology/mcbo.owl.ttl \
  --query sparql/orphan_classes.rq \
  reports/robot/orphan_classes.tsv

# Check for duplicate labels
java -jar .robot/robot.jar query \
  --input ontology/mcbo.owl.ttl \
  --query sparql/duplicate_labels.rq \
  reports/robot/duplicate_labels.tsv

# Check for missing definitions
java -jar .robot/robot.jar query \
  --input ontology/mcbo.owl.ttl \
  --query sparql/missing_definitions.rq \
  reports/robot/missing_definitions.tsv

Interpreting Results

  • QC passes if the output TSV contains only a header row (no data rows)

  • QC warns if the output TSV contains data rows (issues found)

View results:

# Count issues (subtract 1 for header)
wc -l reports/robot/*.tsv

# View specific issues
cat reports/robot/orphan_classes.tsv

Competency Question Evaluation

Query Directory

All 8 CQ queries are in eval/queries/:

CQ

Description

cq1

Culture conditions (pH, DO, temperature) for peak recombinant protein productivity

cq2

Cell lines engineered to overexpress gene Y

cq3

Nutrient concentrations associated with high viable cell density

cq4

Expression variation of gene X between clones

cq5

Differentially expressed pathways under Fed-batch vs Perfusion

cq6

Top genes correlated with recombinant protein productivity in stationary phase

cq7

Genes with highest fold change between high/low viability cells

cq8

Cell lines suited for specific glycosylation profiles

Evaluation Results

Real Data Statistics (724 cell culture processes):

Cell Culture Process Instances: 724
  Batch culture process: 518
  Fed-batch culture process: 135
  Perfusion culture process: 49
  Unknown culture process: 22

Bioprocess Sample Instances: 326

Demo vs Real Data Comparison:

CQ

Real Data (724)

Demo Data (10)

Notes

CQ1

161

13

Culture conditions for productivity

CQ2

3

2

Overexpression engineering

CQ3

0

4

Requires CollectionDay, ViableCellDensity

CQ4

0

144

Requires expression matrix

CQ5

4

3

Process type distribution

CQ6

0

38

Requires expression data

CQ7

0

7

Requires ViabilityPercentage

CQ8

0

3

Requires TiterValue, QualityType

CQs returning 0 on real data reflect ongoing curation; the queries are validated and functional. The demo data includes all required fields to demonstrate complete functionality.

Running Evaluations

# Demo data
mcbo-run-eval --data-dir data.sample

# Real data
mcbo-run-eval --data-dir .data

# Verify graph parses without running queries
mcbo-run-eval --data-dir data.sample --verify

Alternative Query Runners

ROBOT:

robot query \
  --input data.sample/graph.ttl \
  --query eval/queries/cq1.rq \
  --output data.sample/results/cq1.tsv

Apache Jena (arq):

arq --data data.sample/graph.ttl --query eval/queries/cq1.rq

CI/CD Pipeline

GitHub Actions Workflow

The repository includes a CI/CD workflow at .github/workflows/qc.yml that runs:

  1. Ontology parsing verification

  2. ROBOT QC queries

  3. Demo data build and evaluation

Run the CI pipeline locally:

make ci

This executes:

make install   # Install mcbo package
make qc        # Run ROBOT QC checks
make demo      # Build and evaluate demo data
make verify-demo  # Verify graph parses

Running All Checks

For a complete QC and evaluation run:

bash scripts/run_all_checks.sh

This runs:

  1. Ontology parsing verification

  2. All ROBOT QC queries

  3. Demo data build and evaluation

  4. Real data build and evaluation (if .data/ exists)

Contributing

Submitting Changes

  1. Fork the repository

  2. Create a feature branch

  3. Make your changes

  4. Run QC checks: make qc

  5. Run demo evaluation: make demo

  6. Submit a pull request

Term Requests

To request new ontology terms:

  1. Go to GitHub Issues

  2. Click “New Issue”

  3. Select “MCBO Term Request”

  4. Fill in the template

Coding Standards

  • Python code should follow PEP 8

  • Use type hints where practical

  • Add docstrings to public functions

  • Keep functions focused and small

Testing Changes

Before submitting:

# Install package in development mode
pip install -e python/

# Run full QC + demo
make all

# Verify no regressions
cat data.sample/results/SUMMARY.txt

Makefile Targets

Target

Description

make all

Run demo + qc (default)

make demo

Build and evaluate demo data

make real

Build and evaluate real data (.data/)

make qc

Run ROBOT QC checks on ontology

make clean

Remove generated files

make install

Install mcbo package

make robot

Download ROBOT jar

make ci

Run full CI pipeline

make docs

Build Sphinx documentation

See make help for the complete list.