Getting the Best of Both Worlds (End Devices and Edge/Cloud) Using Hierarchical Inference - IEEE DLT


In the past decade, Deep Learning (DL) has achieved
unprecedented improvements in inference accuracy for several
hard-to-tackle applications, such as natural language processing, image
classification, and object detection and identification. The
state-of-the-art DL models that achieve close to 100% inference
accuracy are large, requiring gigabytes of memory to load. At the
other end of the spectrum, the tinyML community is pushing the limits
of compressing DL models so they can be embedded on memory-limited IoT
devices. Performing local inference for data samples on the end
devices reduces delay, saves network bandwidth, and improves the
energy efficiency of the system, but it suffers from low Quality of
Experience (QoE) because the small DL models have low inference
accuracy. To reap the benefits of local inference without compromising
on inference accuracy, we explore the idea of Hierarchical Inference
(HI), wherein a local inference is accepted only when it is correct;
otherwise, the data sample is offloaded. However, it is generally
impossible to know a priori whether a local inference is correct. In
this talk, for the prototypical image classification application, I
will present the HI online learning framework for identifying
incorrect local inferences. The resulting problem turns out to be a
novel partitioning-experts problem with a continuous action space. I
will present algorithms with sub-linear regret guarantees for both
adversarial and stochastic arrivals of experts, and use simulations to
demonstrate the efficacy of HI on the ImageNet and CIFAR-10 datasets.
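To make the offloading decision concrete, here is a minimal sketch of the HI control flow. Note the assumptions: the real HI framework *learns* the accept/offload rule online, whereas this sketch uses a fixed threshold on the maximum softmax probability as a common confidence proxy; the function name and threshold value are illustrative, not from the talk.

```python
import numpy as np

def hierarchical_inference(local_logits, threshold=0.8):
    """Decide whether to accept the on-device prediction or offload.

    Illustrative only: uses max softmax probability as a confidence
    proxy with a fixed threshold, whereas the HI framework learns the
    decision rule online from observed mistakes.
    """
    z = local_logits - np.max(local_logits)   # shift for numerical stability
    probs = np.exp(z) / np.sum(np.exp(z))     # softmax over class logits
    confidence = float(np.max(probs))
    if confidence >= threshold:
        # Confident: accept the local (on-device) prediction.
        return int(np.argmax(probs)), "accept-local"
    # Not confident: ship the sample to the edge/cloud model.
    return None, "offload-to-edge"

# A sharply peaked logit vector is accepted locally ...
pred, decision = hierarchical_inference(np.array([0.1, 5.0, 0.2]))
# ... while a nearly flat one is offloaded.
pred2, decision2 = hierarchical_inference(np.array([1.0, 1.1, 0.9]))
```

The appeal of this structure is that easy samples never leave the device (saving bandwidth and energy), while hard samples still get the large model's accuracy.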



  Date and Time

  Location

  • 2356 Main Mall
  • Vancouver, British Columbia
  • Canada
  • Building: MacLeod Building
  • Room Number: 3038

  Hosts

  Registration

  Speakers


Biography:

Jaya Prakash Champati is an Assistant Professor at IMDEA Networks
Institute, where he leads the Edge Networks group. His current
research focus is on efficient inference in Edge AI systems. Before
joining IMDEA, he was a post-doctoral researcher at EECS, KTH Royal
Institute of Technology, Sweden, where he made significant
contributions to the analysis and optimization of Age of Information.
He obtained his PhD in Electrical and Computer Engineering from the
University of Toronto, Canada, in 2017. His PhD work on
generalizations for scheduling on parallel processors was recognized
with the Doctoral Completion Award and the Paul Biringer Scholarship,
both awarded by the Department of Electrical and Computer Engineering,
University of Toronto. He obtained his Master of Technology degree
from the Indian Institute of Technology (IIT) Bombay, India, in 2010,
and worked at Broadcom Communications for two years, where he
contributed to 4G LTE MAC layer development. He was a Marie
Skłodowska-Curie Actions (MSCA) postdoctoral fellow and is a recipient
of the Best Paper Award at the IEEE National Conference on
Communications, India, 2011.