MarocRail Optimizer — ML-Driven Railway Scheduling
About this project
MarocRail Optimizer is a data-driven train scheduling system that predicts delays and optimizes railway operations using machine learning. The project was developed during my internship at ONCF, Morocco's national railway operator, as a complete full-stack ML application.
Backend Architecture
A Python / Flask REST API exposes more than ten endpoints for schedule management, delay analytics, and predictions. Data is stored in an 8.75 MB SQLite database whose normalized schema holds 34,160 delay records and 2,112 weekly schedules. A modular script layer covers synthetic-data generation, database management, and model training.
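To make the storage layer concrete, here is a minimal sketch of what a normalized delay schema could look like, using only Python's built-in sqlite3 module. The table names, column names, and sample rows are illustrative assumptions, not the project's actual schema or data.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE stations (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE routes (
    id             INTEGER PRIMARY KEY,
    origin_id      INTEGER NOT NULL REFERENCES stations(id),
    destination_id INTEGER NOT NULL REFERENCES stations(id)
);
CREATE TABLE delays (
    id            INTEGER PRIMARY KEY,
    route_id      INTEGER NOT NULL REFERENCES routes(id),
    departure     TEXT NOT NULL,   -- ISO 8601 timestamp
    delay_minutes REAL NOT NULL,
    cause         TEXT NOT NULL
);
-- Index the foreign key used by per-route analytics queries.
CREATE INDEX idx_delays_route ON delays(route_id);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO stations (id, name) VALUES (?, ?)",
        [(1, "Casablanca"), (2, "Rabat")],
    )
    conn.execute(
        "INSERT INTO routes (id, origin_id, destination_id) VALUES (1, 1, 2)"
    )
    conn.executemany(
        "INSERT INTO delays (route_id, departure, delay_minutes, cause) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "2024-01-08T07:30:00", 12.0, "passenger"),
            (1, "2024-01-08T18:10:00", 6.0, "weather"),
        ],
    )
    conn.commit()
    return conn


def average_delay_by_route(conn):
    """Return (origin, destination, mean delay in minutes) for each route."""
    query = """
        SELECT o.name, d.name, AVG(x.delay_minutes)
        FROM delays AS x
        JOIN routes   AS r ON r.id = x.route_id
        JOIN stations AS o ON o.id = r.origin_id
        JOIN stations AS d ON d.id = r.destination_id
        GROUP BY r.id
        ORDER BY r.id
    """
    return conn.execute(query).fetchall()
```

A delay-analytics endpoint could wrap a query like `average_delay_by_route` and return its rows as JSON. Keeping station names in their own table means each delay row stores only a route reference.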
Machine Learning Pipeline
A Random Forest classifier reaches approximately 79–80% accuracy on delay prediction using 31 engineered features. The design is dual-model: a classifier estimates the probability of a delay, and a regressor estimates its expected duration. Features encode temporal patterns (hour, day, season), weather conditions, route history, and cascade effects. The synthetic training data spans six months across 10 stations, 25 routes, and 80 trains.
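As an illustration of the temporal features mentioned above, the sketch below derives a few of them from a departure timestamp. The feature names, the cyclical hour encoding, the season mapping, and the `upstream_delay_min` cascade proxy are assumptions chosen for the example. They are not the project's actual 31 features.

```python
import math
from datetime import datetime


def season_of(month: int) -> str:
    """Map a month to a meteorological season (northern hemisphere, assumed)."""
    return ("winter", "spring", "summer", "autumn")[(month % 12) // 3]


def temporal_features(departure: datetime, upstream_delay_min: float = 0.0) -> dict:
    """Build a small, hypothetical feature row for one scheduled departure."""
    # Encode the hour on a circle so that 23:00 and 00:00 end up close together.
    hour_angle = 2 * math.pi * departure.hour / 24
    weekday = departure.weekday()  # Monday == 0
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "day_of_week": weekday,
        "is_weekend": int(weekday >= 5),
        # Peak windows taken as 06:00-09:00 and 17:00-20:00.
        "is_peak": int(6 <= departure.hour < 9 or 17 <= departure.hour < 20),
        "season": season_of(departure.month),
        # Delay already carried by the preceding train, as a simple cascade proxy.
        "upstream_delay_min": upstream_delay_min,
    }
```

Rows like these, once the categorical `season` value is numerically encoded, could feed a classifier for delay probability and a regressor for expected duration.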
Optimization Engine
A heuristic conflict-detection algorithm identifies platform conflicts, turnaround-time violations, and overlaps with maintenance windows. An automated resolver then adjusts departure times, reassigns platforms, and redistributes train capacity, reducing predicted delays by more than 15% after optimization.
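The platform-conflict part of this idea can be sketched as an interval-overlap check followed by a greedy resolver. This is a simplified illustration under assumed data structures (`Stop`, times in minutes after midnight), not the project's actual algorithm. It leaves out turnaround times, maintenance windows, and capacity redistribution.

```python
from dataclasses import dataclass, replace
from itertools import combinations


@dataclass(frozen=True)
class Stop:
    """One train occupying one platform (hypothetical structure)."""
    train: str
    platform: int
    arrive: int  # minutes after midnight
    depart: int


def overlaps(a: Stop, b: Stop) -> bool:
    """True if the two occupancy intervals intersect."""
    return a.arrive < b.depart and b.arrive < a.depart


def find_platform_conflicts(stops):
    """Return pairs of train ids that occupy the same platform at once."""
    return [
        (a.train, b.train)
        for a, b in combinations(stops, 2)
        if a.platform == b.platform and overlaps(a, b)
    ]


def resolve(stops, platforms):
    """Greedy resolver: try the planned platform, then any other free one,
    and as a last resort push the stop back until its platform clears."""
    placed = []
    for stop in sorted(stops, key=lambda s: s.arrive):
        candidates = [stop.platform] + [p for p in platforms if p != stop.platform]
        for p in candidates:
            trial = replace(stop, platform=p)
            if not any(q.platform == p and overlaps(trial, q) for q in placed):
                placed.append(trial)
                break
        else:
            # Every platform is busy: wait for the last departure on the
            # planned platform, keeping the dwell time unchanged.
            free_at = max(q.depart for q in placed if q.platform == stop.platform)
            shift = free_at - stop.arrive
            placed.append(
                replace(stop, arrive=stop.arrive + shift, depart=stop.depart + shift)
            )
    return placed
```

For example, with three trains planned on platform 1 at overlapping times and two platforms available, the resolver keeps the first train in place, moves the second to platform 2, and delays the third until platform 1 is free.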
Frontend Interface
The client is a responsive bilingual dashboard (English / French) built with vanilla JavaScript and Chart.js. Five modules provide system overview, schedule viewing, visual analytics, ML prediction, and optimization control.
Data Engineering
The synthetic dataset reproduces realistic railway dynamics: 45% passenger-related delays, 24% cascade, 16% weather, 11% technical, and 4% maintenance. Temporal modeling captures peak hours (6–9 AM, 5–8 PM), seasonal variation, and day-of-week patterns. A complete pipeline transforms raw JSON into indexed SQLite tables and model-ready feature matrices.
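A generator with this cause mix and peak-hour emphasis could look roughly like the sketch below. The cause shares and peak windows come from the description above. The peak weighting factor, the exponential delay distribution, its 10-minute mean, and the record layout are assumptions made for illustration only.

```python
import random

# Cause shares as stated in the text (they sum to 1.0).
CAUSE_SHARES = {
    "passenger": 0.45,
    "cascade": 0.24,
    "weather": 0.16,
    "technical": 0.11,
    "maintenance": 0.04,
}
# Peak windows 06:00-09:00 and 17:00-20:00.
PEAK_HOURS = set(range(6, 9)) | set(range(17, 20))
PEAK_WEIGHT = 3.0        # assumed: a peak hour is 3x as likely to see a delay
MEAN_DELAY_MIN = 10.0    # assumed mean of the exponential delay distribution


def generate_delays(n: int, seed: int = 0) -> list:
    """Generate n synthetic delay records with a reproducible seed."""
    rng = random.Random(seed)
    causes = list(CAUSE_SHARES)
    cause_weights = list(CAUSE_SHARES.values())
    hours = list(range(24))
    hour_weights = [PEAK_WEIGHT if h in PEAK_HOURS else 1.0 for h in hours]

    records = []
    for _ in range(n):
        records.append({
            "hour": rng.choices(hours, weights=hour_weights)[0],
            "cause": rng.choices(causes, weights=cause_weights)[0],
            "delay_minutes": round(rng.expovariate(1 / MEAN_DELAY_MIN), 1),
        })
    return records
```

Records in this shape can be dumped to JSON and then loaded into SQLite tables, which matches the pipeline direction described above. Seasonal and day-of-week effects would need additional weighting terms.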
Technology Stack
Python 3.9+ · Flask 3.0 · Pandas · NumPy · Scikit-learn · SQLite · Chart.js · Joblib
Key Outcomes
- ~79% delay-classification accuracy, with a mean absolute error (MAE) of 10.86 minutes on predicted delay duration.
- RESTful API supporting concurrent requests and session management.
- Schema scaling to over 100,000 records with sub-second query response.
- Bilingual interface (FR / EN).
- Complete architecture, API, and deployment documentation.
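For reference, the two headline metrics can be computed as in the minimal sketch below. The function names are illustrative and no project data is implied. Accuracy is the share of correct delayed / not-delayed predictions, and MAE is the average absolute gap between predicted and actual delay minutes.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between predicted and true values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```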
Demonstrated Skills
Full-stack ML deployment, relational database design, transportation-logistics modeling, and a modular architecture extensible toward real-time tracking and predictive maintenance.