Development Guide
This guide covers contributing to MCBO, running quality control checks, and understanding the evaluation framework.
Repository Structure
mcbo/
├── ontology/ # MCBO ontology (TBox)
│ └── mcbo.owl.ttl
├── python/ # Python package (pip install -e python/)
│ ├── mcbo/ # Core library + CLI modules
│ │ ├── __init__.py # Package exports
│ │ ├── namespaces.py # Shared RDF namespaces
│ │ ├── graph_utils.py # Graph loading/creation utilities
│ │ ├── csv_to_rdf.py # CSV-to-RDF conversion
│ │ ├── build_graph.py # Graph building
│ │ ├── run_eval.py # SPARQL evaluation
│ │ └── stats_eval_graph.py # Statistics
│ └── pyproject.toml # Package metadata and entry points
├── scripts/ # Shell scripts
│ └── run_all_checks.sh # Full QC + evaluation runner
├── eval/ # Competency question evaluation
│ └── queries/ # SPARQL query files (*.rq)
├── sparql/ # QC queries for ROBOT
├── reports/ # QC reports
│ └── robot/
├── data.sample/ # Demo data (public)
└── .data/ # Real data (git-ignored)
Quality Control
ROBOT QC Queries
MCBO uses ROBOT for ontology quality control.
Run all QC checks:
make qc
This executes three QC queries:
Query |
Purpose |
|---|---|
|
Finds classes without parent classes (orphans) |
|
Finds classes with duplicate |
|
Finds classes missing |
Manual QC Commands
# Check for orphan classes
java -jar .robot/robot.jar query \
--input ontology/mcbo.owl.ttl \
--query sparql/orphan_classes.rq \
reports/robot/orphan_classes.tsv
# Check for duplicate labels
java -jar .robot/robot.jar query \
--input ontology/mcbo.owl.ttl \
--query sparql/duplicate_labels.rq \
reports/robot/duplicate_labels.tsv
# Check for missing definitions
java -jar .robot/robot.jar query \
--input ontology/mcbo.owl.ttl \
--query sparql/missing_definitions.rq \
reports/robot/missing_definitions.tsv
Interpreting Results
QC passes if the output TSV contains only a header row (no data rows)
QC warns if the output TSV contains data rows (issues found)
View results:
# Count issues (subtract 1 for header)
wc -l reports/robot/*.tsv
# View specific issues
cat reports/robot/orphan_classes.tsv
Competency Question Evaluation
Query Directory
All 8 CQ queries are in eval/queries/:
CQ |
Description |
|---|---|
cq1 |
Culture conditions (pH, DO, temperature) for peak recombinant protein productivity |
cq2 |
Cell lines engineered to overexpress gene Y |
cq3 |
Nutrient concentrations associated with high viable cell density |
cq4 |
Expression variation of gene X between clones |
cq5 |
Differentially expressed pathways under Fed-batch vs Perfusion |
cq6 |
Top genes correlated with recombinant protein productivity in stationary phase |
cq7 |
Genes with highest fold change between high/low viability cells |
cq8 |
Cell lines suited for specific glycosylation profiles |
Evaluation Results
Real Data Statistics (724 cell culture processes):
Cell Culture Process Instances: 724
Batch culture process: 518
Fed-batch culture process: 135
Perfusion culture process: 49
Unknown culture process: 22
Bioprocess Sample Instances: 326
Demo vs Real Data Comparison:
CQ |
Real Data (724) |
Demo Data (10) |
Notes |
|---|---|---|---|
CQ1 |
161 |
13 |
Culture conditions for productivity |
CQ2 |
3 |
2 |
Overexpression engineering |
CQ3 |
0 |
4 |
Requires CollectionDay, ViableCellDensity |
CQ4 |
0 |
144 |
Requires expression matrix |
CQ5 |
4 |
3 |
Process type distribution |
CQ6 |
0 |
38 |
Requires expression data |
CQ7 |
0 |
7 |
Requires ViabilityPercentage |
CQ8 |
0 |
3 |
Requires TiterValue, QualityType |
CQs returning 0 on real data reflect ongoing curation; the queries are validated and functional. The demo data includes all required fields to demonstrate complete functionality.
Running Evaluations
# Demo data
mcbo-run-eval --data-dir data.sample
# Real data
mcbo-run-eval --data-dir .data
# Verify graph parses without running queries
mcbo-run-eval --data-dir data.sample --verify
Alternative Query Runners
ROBOT:
robot query \
--input data.sample/graph.ttl \
--query eval/queries/cq1.rq \
--output data.sample/results/cq1.tsv
Apache Jena (arq):
arq --data data.sample/graph.ttl --query eval/queries/cq1.rq
CI/CD Pipeline
GitHub Actions Workflow
The repository includes a CI/CD workflow at .github/workflows/qc.yml that runs:
Ontology parsing verification
ROBOT QC queries
Demo data build and evaluation
Run the CI pipeline locally:
make ci
This executes:
make install # Install mcbo package
make qc # Run ROBOT QC checks
make demo # Build and evaluate demo data
make verify-demo # Verify graph parses
Running All Checks
For a complete QC and evaluation run:
bash scripts/run_all_checks.sh
This runs:
Ontology parsing verification
All ROBOT QC queries
Demo data build and evaluation
Real data build and evaluation (if
.data/exists)
Contributing
Submitting Changes
Fork the repository
Create a feature branch
Make your changes
Run QC checks:
make qcRun demo evaluation:
make demoSubmit a pull request
Term Requests
To request new ontology terms:
Go to GitHub Issues
Click “New Issue”
Select “MCBO Term Request”
Fill in the template
Coding Standards
Python code should follow PEP 8
Use type hints where practical
Add docstrings to public functions
Keep functions focused and small
Testing Changes
Before submitting:
# Install package in development mode
pip install -e python/
# Run full QC + demo
make all
# Verify no regressions
cat data.sample/results/SUMMARY.txt
Makefile Targets
Target |
Description |
|---|---|
|
Run demo + qc (default) |
|
Build and evaluate demo data |
|
Build and evaluate real data (.data/) |
|
Run ROBOT QC checks on ontology |
|
Remove generated files |
|
Install mcbo package |
|
Download ROBOT jar |
|
Run full CI pipeline |
|
Build Sphinx documentation |
See make help for the complete list.