Data Collection and Staging Process Automation Precision, Speed and Scalability for Machine Learning Modelling of Algorithmic Trading Stocks-Price Prediction
Abstract—
This presentation discusses an automated data collection and staging pipeline for high-frequency stock price prediction using machine learning. The system integrates scalable ELT processes, data deduplication, and distributed training with XGBoost on high-performance computing infrastructure. Designed for precision, speed, and scalability, the framework enables efficient handling of large financial time-series datasets while maintaining robust predictive performance and optimized resource utilization.
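As a rough illustration of the deduplication step mentioned in the abstract, the following Python sketch collapses repeated price bars before they are staged. It is not the project's code: the `Bar` record, the `deduplicate` function, the "last delivery wins" rule, and the placeholder symbols and prices are all assumptions made for this example.

```python
from datetime import datetime
from typing import NamedTuple


class Bar(NamedTuple):
    """Hypothetical record for one price bar (not the project's schema)."""
    symbol: str
    timestamp: datetime
    close: float


def deduplicate(bars):
    """Keep one bar per (symbol, timestamp); the last one seen wins."""
    latest = {}
    for bar in bars:
        latest[(bar.symbol, bar.timestamp)] = bar
    # Sort so downstream time-series code sees a stable, ordered feed.
    return sorted(latest.values(), key=lambda b: (b.symbol, b.timestamp))


# Placeholder values for illustration only.
raw = [
    Bar("AAA", datetime(2026, 1, 5, 9, 30), 100.0),
    Bar("AAA", datetime(2026, 1, 5, 9, 30), 100.5),  # re-delivered bar
    Bar("BBB", datetime(2026, 1, 5, 9, 30), 200.0),
    Bar("AAA", datetime(2026, 1, 5, 9, 31), 101.0),
]
clean = deduplicate(raw)
```

Keying on the symbol and timestamp pair is one common choice for time-series feeds; a real pipeline might instead deduplicate inside the warehouse during the ELT load.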
📢 Public Presentation Announcement
Join us for a live presentation on:
Data Collection and Staging Process Automation for Machine Learning in Algorithmic Trading
🗓 Wednesday, April 22
⏰ 9:00 AM – 11:00 AM
📍 Okanagan College E-301 / Hybrid
This project brings together three teams—Data Collection, Data Warehousing, and Machine Learning—into a unified, end-to-end system for high-frequency stock price prediction.
Learn how we designed a scalable pipeline using distributed computing and XGBoost, covering system architecture, data engineering, and real-world ML applications in algorithmic trading.
This work also establishes a foundation for ongoing research and extended large-scale evaluation.
Open to students, faculty, and anyone interested in machine learning, data systems, or fintech.
Location
- 1000 K. L. O. Rd
- Kelowna, British Columbia
- Canada V1Y 4X8
- Building: 1000 K. L. O. Rd
- Room Number: E-301
Contributors
This project was developed through a collaborative effort across three specialized teams:
📊 Data Collection Team
Responsible for sourcing, aggregating, and preprocessing raw financial and market data.
- Andrew Johnson
- Emilio Iturbide
- Reilly Mager
- Lian Heckrodt
- Cade Dempsey
- Kristina Cormier
https://github.com/KristinaCormier/2026COSC471DataCollection
Address: British Columbia, Canada
🏗️ Data Warehouse Team
Designed and implemented the data storage architecture, ETL/ELT pipelines, and database systems.
- Alex Anthony
- Hayden Nikkel
- Daemon Lewis
- John Cortez
- Jackson Rosco
https://github.com/Okanagan-College-Cosc471-Winter-2026/the-project-data-warehouse-team.git
🤖 Machine Learning Team (XGBoost)
Developed, trained, and evaluated machine learning models for predictive analytics and trading strategies.
- Harsh Saw
- Zane Tessmer
- Kavaljeet Singh
- Dante Bertolutti
- Guntash Brar
- Parag Jindal
https://github.com/Okanagan-College-Cosc471-Winter-2026/the-project-maverick
🤝 Acknowledgements
We thank all contributors and collaborators who supported the development, testing, and deployment of this system.