Distillation by duplication: The importance of layer selection
A downloadable research paper
Layers are chained together in a pipeline in which each layer knows how to decode the information passed to it by the previous layer and how to process it to add value that ultimately leads to a prediction. We therefore hypothesise that, on one hand, it may be beneficial to copy consecutive layers from the teacher to the student, since they can already decode each other's output. On the other hand, copying widely separated layers may transfer knowledge about different processing stages, while the connections between them can be learnt more easily.
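The two hypotheses above correspond to two layer-selection strategies. A minimal sketch of each, assuming a generic layer-indexed teacher (the function name, strategy labels, and centring/spacing choices are illustrative assumptions, not the paper's actual method):

```python
def select_layers(num_teacher_layers, num_student_layers, strategy):
    """Pick which teacher layer indices to copy into the student."""
    if strategy == "consecutive":
        # Adjacent layers already know how to decode each other's
        # output; here we arbitrarily take a centred block.
        start = (num_teacher_layers - num_student_layers) // 2
        return list(range(start, start + num_student_layers))
    if strategy == "spread":
        # Evenly spaced layers cover different processing stages,
        # but the student must learn the connections between them.
        step = (num_teacher_layers - 1) / (num_student_layers - 1)
        return [round(i * step) for i in range(num_student_layers)]
    raise ValueError(f"unknown strategy: {strategy}")

print(select_layers(12, 4, "consecutive"))  # [4, 5, 6, 7]
print(select_layers(12, 4, "spread"))       # [0, 4, 7, 11]
```

With either index list, the student would be initialised by copying the corresponding teacher weights before fine-tuning.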
Status | Released |
Category | Other |
Author | roksanagow |
Download
Neural_Network_Knowledge_Distillation_Importance_of_Layer_Selection (2).pdf 278 kB