Transform Quantization for CNN Compression

Tags: deep learning, convolutional neural network (CNN), CNN compression, quantization, transform

In this work, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques either ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider these joint statistics only during training, and thus cannot efficiently compress already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) in a single framework, facilitating low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization and pose optimal quantization as a rate-distortion optimization problem. We then show that this problem can be solved by optimal bit-depth allocation following decorrelation with the End-to-end Learned Transform (ELT) derived in the paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates (1-2 bits).
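
As a concrete illustration of the pipeline the abstract describes, the sketch below decorrelates one layer's weight matrix, allocates per-coefficient bit-depths by the classical rate-distortion rule (more bits to higher-variance transform coefficients), and then uniformly quantizes. It is a minimal stand-in under stated assumptions, not the authors' implementation: a PCA/KLT decorrelation substitutes for the paper's learned ELT, and the function name and parameters are illustrative.

    # Minimal sketch of transform quantization on one CNN layer's weights.
    # A KLT (eigenbasis of the weight covariance) stands in for the paper's
    # End-to-end Learned Transform (ELT); names here are illustrative.
    import numpy as np

    def transform_quantize(W, avg_bits=2.0):
        """Decorrelate rows of W, allocate bit-depths, uniformly quantize."""
        # Decorrelate: eigendecomposition of the row covariance (KLT).
        C = np.cov(W)                          # (n, n) covariance of rows
        _, U = np.linalg.eigh(C)               # columns of U = transform basis
        Y = U.T @ W                            # transform-domain coefficients

        # Classical optimal bit allocation under a mean-squared-error metric:
        # b_i = avg_bits + 0.5 * log2(var_i / geometric-mean variance).
        var = Y.var(axis=1) + 1e-12
        gmean = np.exp(np.mean(np.log(var)))
        b = np.clip(np.round(avg_bits + 0.5 * np.log2(var / gmean)), 0, 16)
        b = b.astype(int)

        # Uniform scalar quantization of each transform row at its bit-depth.
        Y_hat = np.empty_like(Y)
        for i, bits in enumerate(b):
            lo, hi = Y[i].min(), Y[i].max()
            if bits == 0 or hi <= lo:
                Y_hat[i] = 0.0                 # row dropped: dimensionality reduction
            else:
                step = (hi - lo) / (2 ** bits - 1)
                Y_hat[i] = np.round((Y[i] - lo) / step) * step + lo

        return U @ Y_hat, b                    # reconstructed weights, bit-depths

    # Usage: quantize a random "layer" to roughly 2 bits per weight on average.
    W = np.random.randn(64, 576)               # e.g. 64 filters of 3x3x64
    W_hat, bits = transform_quantize(W, avg_bits=2.0)
    print("per-row bit-depths:", bits)
    print("MSE:", np.mean((W - W_hat) ** 2))

Note how transform rows whose allocation rounds down to zero bits are dropped entirely; this is the sense in which transform quantization subsumes dimensionality reduction within a single rate-distortion framework.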

This talk is based on joint published work with Zhe Wang, David Taubman and Bernd Girod. Preprint is available at https://arxiv.org/abs/2009.01174.

 



  Date and Time

  • Date: 28 Jun 2021
  • Time: 12:00 PM to 01:00 PM
  • All times are (GMT-08:00) Canada/Pacific

  Registration

  • Starts: 06 June 2021, 09:32 AM
  • Ends: 28 June 2021, 11:50 AM
  • No Admission Charge


  Speakers

Dr. Sean I. Young of Harvard Medical School

Topic:

Transform Quantization for CNN Compression


Biography:

Sean I. Young is currently a research fellow at Harvard Medical School, where he works on computational neuroimaging problems. Prior to this, he was a postdoctoral researcher at Stanford University, where he worked on the compression of neural networks and on algorithms for non-line-of-sight imaging. He holds a PhD in Electrical Engineering from the University of New South Wales. In 2018, he received the Australian Pattern Recognition Society Best Paper Award for his work entitled “Fast Optical Flow Extraction from Compressed Video”.