Data Collection and Staging Process Automation Precision, Speed and Scalability for Machine Learning Modelling of Algorithmic Trading Stocks-Price Prediction

#data #processing #warehouse #machine #learning #xgboost #OC #students #software #engineering

Abstract

This presentation discusses an automated data collection and staging pipeline for high-frequency stock price prediction using machine learning. The system integrates scalable ELT processes, data deduplication, and distributed training with XGBoost on high-performance computing infrastructure. Designed for precision, speed, and scalability, the framework enables efficient handling of large financial time-series datasets while maintaining robust predictive performance and optimized resource utilization.
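The abstract does not say how deduplication is carried out in the staging step. As a rough illustration only, the sketch below assumes that a duplicate is any record sharing the same symbol and timestamp, and that a later delivery supersedes an earlier one. The `Bar` record, its field names, and the `deduplicate` function are hypothetical and are not taken from the project repositories.

```python
from typing import Iterable, NamedTuple


class Bar(NamedTuple):
    """Hypothetical staged price record; field names are illustrative."""
    symbol: str
    timestamp: str  # ISO-8601 string in one fixed format, so it sorts correctly
    close: float


def deduplicate(bars: Iterable[Bar]) -> list[Bar]:
    """Keep one record per (symbol, timestamp); the last one seen wins."""
    latest: dict[tuple[str, str], Bar] = {}
    for bar in bars:
        # Assumption: a later record with the same key is a correction.
        latest[(bar.symbol, bar.timestamp)] = bar
    # Sort so downstream time-series code sees a stable order.
    return sorted(latest.values(), key=lambda b: (b.symbol, b.timestamp))


raw = [
    Bar("AAA", "2026-04-22T09:30:00Z", 10.0),
    Bar("AAA", "2026-04-22T09:30:00Z", 10.5),  # same key, delivered again
    Bar("BBB", "2026-04-22T09:30:00Z", 20.0),
]
clean = deduplicate(raw)
```

Keying on a dictionary keeps the pass linear in the number of records. A warehouse-side implementation would more likely express the same rule in SQL, which this sketch does not attempt to show.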

 

 

📢 Public Presentation Announcement

Join us for a live presentation on:

Data Collection and Staging Process Automation for Machine Learning in Algorithmic Trading

🗓 Wednesday, April 22
9:00 AM – 11:00 AM
📍 Okanagan College E-301 / Hybrid

This project brings together three teams (Data Collection, Data Warehousing, and Machine Learning) in a unified, end-to-end system for high-frequency stock price prediction.

Learn how we designed a scalable pipeline using distributed computing and XGBoost, covering system architecture, data engineering, and real-world ML applications in algorithmic trading.

This work also establishes a foundation for ongoing research and extended large-scale evaluation.

Open to students, faculty, and anyone interested in machine learning, data systems, or fintech.



  Location

  • 1000 K. L. O. Rd
  • Kelowna, British Columbia
  • Canada V1Y 4X8
  • Building: 1000 K. L. O. Rd
  • Room Number: E-301

  Registration

  • Starts 17 April 2026 07:00 AM UTC
  • Ends 22 April 2026 06:00 PM UTC
  • No Admission Charge


  Speakers

Contributors

This project was developed through a collaborative effort across three specialized teams:

📊 Data Collection Team

Responsible for sourcing, aggregating, and preprocessing raw financial and market data.

  • Andrew Johnson

  • Emilio Iturbide

  • Reilly Mager

  • Lian Heckrodt

  • Cade Dempsey

  • Kristina Cormier

https://github.com/KristinaCormier/2026COSC471DataCollection

Address: British Columbia, Canada

🏗️ Data Warehouse Team

Designed and implemented the data storage architecture, ETL/ELT pipelines, and database systems.

  • Alex Anthony

  • Hayden Nikkel

  • Daemon Lewis

  • John Cortez

  • Jackson Rosco

https://github.com/Okanagan-College-Cosc471-Winter-2026/the-project-data-warehouse-team.git


🤖 Machine Learning Team (XGBoost)

Developed, trained, and evaluated machine learning models for predictive analytics and trading strategies.

  • Harsh Saw

  • Zane Tessmer

  • Kavaljeet Singh

  • Dante Bertolutti

  • Guntash Brar

  • Parag Jindal

https://github.com/Okanagan-College-Cosc471-Winter-2026/the-project-maverick 

 


 

🤝 Acknowledgements

We thank all contributors and collaborators who supported the development, testing, and deployment of this system.