Getting the Best of Both Worlds (End Devices and Edge/Cloud) Using Hierarchical Inference - IEEE DLT


In the past decade, Deep Learning (DL) has achieved
unprecedented improvements in inference accuracy for several
hard-to-tackle applications, such as natural language processing, image
classification, and object detection and identification. The
state-of-the-art DL models that achieve close to 100% inference
accuracy are large, requiring gigabytes of memory to load. At the
other end of the spectrum, the tinyML community is pushing the limits
of compressing DL models so they can be embedded on memory-limited IoT
devices. Performing local inference for data samples on the end
devices reduces delay, saves network bandwidth, and improves the
energy efficiency of the system, but it suffers from low Quality of
Experience (QoE) because the small DL models have low inference
accuracy. To reap the benefits of local inference without compromising
on inference accuracy, we explore the idea of Hierarchical Inference
(HI), wherein a local inference is accepted only when it is correct;
otherwise, the data sample is offloaded. However, it is generally
impossible to know a priori whether a local inference is correct. In
this talk, for the prototypical image classification application, I
will present the HI online learning framework for identifying
incorrect local inferences. The resulting problem turns out to be a
novel partitioning-experts problem with a continuous action space. I
will present algorithms with sub-linear regret guarantees for both
adversarial and stochastic arrivals of experts, and use simulations to
demonstrate the efficacy of HI on the ImageNet and CIFAR-10 datasets.
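To make the offloading decision concrete, here is a minimal sketch of the HI control flow. Note the assumptions: the real HI framework *learns* the accept/offload rule online, whereas this sketch uses a fixed threshold on the maximum softmax probability as a common confidence proxy; the function name and threshold value are illustrative, not from the talk.

```python
import numpy as np

def hierarchical_inference(local_logits, threshold=0.8):
    """Decide whether to accept the on-device prediction or offload.

    Illustrative only: uses max softmax probability as a confidence
    proxy with a fixed threshold, whereas the HI framework learns the
    decision rule online from observed mistakes.
    """
    z = local_logits - np.max(local_logits)   # shift for numerical stability
    probs = np.exp(z) / np.sum(np.exp(z))     # softmax over class logits
    confidence = float(np.max(probs))
    if confidence >= threshold:
        # Confident: accept the local (on-device) prediction.
        return int(np.argmax(probs)), "accept-local"
    # Not confident: ship the sample to the edge/cloud model.
    return None, "offload-to-edge"

# A sharply peaked logit vector is accepted locally ...
pred, decision = hierarchical_inference(np.array([0.1, 5.0, 0.2]))
# ... while a nearly flat one is offloaded.
pred2, decision2 = hierarchical_inference(np.array([1.0, 1.1, 0.9]))
```

The appeal of this structure is that easy samples never leave the device (saving bandwidth and energy), while hard samples still get the large model's accuracy.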



  Date and Time

  Location

  • 2356 Main Mall
  • Vancouver, British Columbia
  • Canada
  • Building: MacLeod Building
  • Room Number: 3038

  Hosts

  Registration

  Speakers


Biography:

Jaya Prakash Champati is an Assistant Professor at IMDEA Networks
Institute, where he leads the Edge Networks group. His current
research focus is on efficient inference in Edge AI systems. Before
joining IMDEA, he was a post-doctoral researcher at EECS, KTH Royal
Institute of Technology, Sweden, where he made significant
contributions to the analysis and optimization of Age of Information.
He obtained his PhD in Electrical and Computer Engineering from the
University of Toronto, Canada, in 2017. His PhD work on
generalizations for scheduling on parallel processors was recognized
with the Doctoral Completion Award and the Paul Biringer Scholarship,
both awarded by the Department of Electrical and Computer Engineering,
University of Toronto. He obtained his Master of Technology degree
from the Indian Institute of Technology (IIT) Bombay, India, in 2010,
and worked at Broadcom Communications for two years, where he
contributed to 4G LTE MAC layer development. He was a Marie
Skłodowska-Curie Actions (MSCA) postdoctoral fellow and is a recipient
of the Best Paper Award at the IEEE National Conference on
Communications, India, 2011.