Network Management in the Machine Learning Era
Network management is essential for maintaining high performance, security, and reliability in large-scale networks. It involves measuring network state, making decisions for various management applications, and enforcing those decisions on individual devices. In light of recent progress in machine learning, one natural question is how we should build network management systems to better leverage these emerging techniques.
In this talk, I will discuss three key challenges in building future management systems driven by ML: how to collect a large amount of diverse data to feed into ML systems, how to enable fast and accurate management decisions using ML, and how to enable fast reaction to network events. I'll show a few example systems we built to address these challenges. The first system is DTA, a direct telemetry access framework that aggregates and moves hundreds of millions of reports per second from switches into queryable data structures in collectors’ memory. The second system is Teal, a learning-based algorithm that leverages the parallel processing power of GPUs to accelerate traffic engineering control. Finally, I'll discuss our preliminary ideas on enabling fast reactions to network events.
Speakers
Minlan Yu of Harvard University, USA
Biography:
Minlan Yu is a Gordon McKay Professor at the Harvard School of Engineering and Applied Sciences. She is the assistant director of the SRC/DARPA JUMP 2.0 ACE Center for Evolvable Computing. She received her B.A. in computer science and mathematics from Peking University and her M.A. and PhD in computer science from Princeton University. She has actively collaborated with companies such as Google, AT&T, Microsoft, Facebook, and Intel. Her research interests include data networking, distributed systems, enterprise and data center networks, and software-defined networking. She received the ACM-W Rising Star Award, the NSF CAREER Award, and the ACM SIGCOMM Doctoral Dissertation Award. She has served as PC co-chair for SIGCOMM, NSDI, HotNets, and several other conferences and workshops.