Deep Compression: compressing deep neural networks
speaker: Brahim
event: Papers We Love SG

** goal: bring deep neural networks to embedded systems by reducing power consumption and storage needed
** pruning: remove low-magnitude weights, retrain the remaining weights, repeat
** quantization: fewer bits per weight via scalar quantization (weights shared through centroids), then fine-tune the centroids
** Huffman encoding of the quantized weights
** design custom ASICs for the compressed network
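The pruning step can be sketched as magnitude pruning: zero out every weight below a threshold, keep a mask of the survivors, and use that mask during retraining so pruned positions stay zero. This is a minimal illustration, not the paper's exact training loop; the function name and threshold value are made up for the example.

```python
import numpy as np

def prune(weights, threshold):
    """Magnitude pruning sketch: zero out weights below the threshold.

    Returns the pruned weights and a boolean mask of surviving weights;
    during retraining, gradients would be multiplied by this mask so
    pruned connections stay removed.
    """
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

# Toy 2x3 weight matrix.
w = np.array([[0.8, -0.05, 0.3],
              [0.01, -0.9, 0.2]])
pruned, mask = prune(w, threshold=0.1)
# pruned: the two small weights (-0.05 and 0.01) are now zero.
```

After pruning, only the nonzero weights and their (sparse) indices need to be stored, which is what makes the later Huffman step pay off.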
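The quantization step clusters the surviving weight values so many weights share one centroid: only a small codebook plus a low-bit index per weight is stored (log2(n_clusters) bits instead of 32). A minimal 1-D k-means sketch, assuming linear initialization across the weight range; the function name and toy data are illustrative, and the paper's centroid fine-tuning (accumulating gradients per cluster) is not shown.

```python
import numpy as np

def kmeans_quantize(weights, n_clusters=4, n_iters=20):
    """Scalar quantization sketch: cluster weight values with 1-D k-means.

    Every weight is replaced by its nearest centroid, so the network
    stores a shared codebook plus one small index per weight.
    """
    flat = weights.ravel()
    # Linear initialization across the weight range.
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iters):
        # Assign each weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned weights.
        for k in range(n_clusters):
            members = flat[idx == k]
            if members.size:
                centroids[k] = members.mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, idx.reshape(weights.shape)

w = np.array([0.11, 0.09, -0.5, -0.48, 0.1, 0.92])
codebook, codes = kmeans_quantize(w, n_clusters=3)
quantized = codebook[codes]  # each weight snapped to its shared centroid
```

Fine-tuning then adjusts the centroids themselves rather than individual weights, so the compression is preserved while accuracy is recovered.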
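The final Huffman step losslessly shrinks the stream of quantization indices: frequent indices get short bit strings, rare ones longer. A self-contained sketch using the standard library; the toy index stream is made up for the example.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code table from symbol frequencies.

    Repeatedly merges the two least frequent subtrees, prefixing their
    codes with '0' and '1'; the result is a prefix-free variable-length
    code where common symbols get the shortest bit strings.
    """
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tiebreaker, partial code table).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, tick, merged))
        tick += 1
    return heap[0][2]

# Toy stream of quantization indices; index 0 dominates.
stream = [0, 0, 0, 0, 1, 1, 2, 0]
table = huffman_codes(stream)
encoded = "".join(table[s] for s in stream)
# 11 bits total vs 16 bits at a fixed 2 bits per index.
```

Because both the quantized weights and the sparse-index deltas have skewed distributions after pruning and quantization, Huffman coding buys a further size reduction at no accuracy cost.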