Multi-teacher self-distillation based on adaptive weighting and activation pattern for enhancing lightweight arrhythmia recognition
Wang Z., Ma C., Zhang S., Zhao M., Liu Y., Zhao L., Zhu T., Li J., Liu C.
Wearable electrocardiogram (ECG) devices, with their outstanding comfort and portability, play a vital role in daily arrhythmia monitoring outside the hospital. However, the embedded CPUs used in most devices greatly limit the deployment of high-performance models, so developing lightweight neural networks with reduced computing requirements is increasingly important for edge deployment on wearable devices. Knowledge distillation (KD) offers a promising route to compressing and deploying lightweight neural networks by transferring knowledge from complex teacher models to enhance the performance of compact student models. However, conventional KD methods pay little attention to selecting strong and accessible teachers for students, which can lead to suboptimal outcomes. To mitigate these limitations, we propose a multi-teacher self-distillation (MTSD) framework to improve the performance of lightweight arrhythmia detection models in wearable ECG monitoring. Specifically, we first leverage representations from teacher models via the similarity of activation patterns in intermediate layers to capture inter-category and inter-channel relationships, and then incorporate adaptive weighting into the MTSD framework to ensure the correctness and acceptability of teacher supervision. Furthermore, the self-distillation framework facilitates knowledge sharing across layers within the model, thereby enhancing overall performance. Extensive experiments on three medical signal datasets demonstrate the superiority of the proposed method over existing state-of-the-art distillation methods, achieving AUCs of 0.922 and 0.908 and an accuracy of 87.05% on the respective datasets. Notably, the model processes a 12-lead, 10-s ECG signal in only 1 ms on an NVIDIA Jetson Orin NX.
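The core idea of weighting multiple teachers before distilling their soft targets into a student can be sketched as follows. This is a minimal, hypothetical NumPy illustration: the weighting rule here (each teacher's softened confidence on the true class) is an assumption for exposition, not the paper's actual adaptive weighting, which draws on activation-pattern similarity in intermediate layers.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_weights(teacher_logits, label, tau=2.0):
    """Weight each teacher by its softened confidence on the true
    class (a hypothetical stand-in for the paper's weighting rule)."""
    probs = [softmax(t / tau) for t in teacher_logits]
    scores = np.array([p[label] for p in probs])
    return scores / scores.sum()

def combined_soft_target(teacher_logits, label, tau=2.0):
    """Convex combination of the teachers' softened distributions,
    used as the student's distillation target for one sample."""
    w = adaptive_weights(teacher_logits, label, tau)
    probs = np.stack([softmax(t / tau) for t in teacher_logits])
    return (w[:, None] * probs).sum(axis=0)

# Example: teacher A is confidently correct on class 0, teacher B is
# wrong; the combined target is dominated by A but keeps B's dark
# knowledge about the non-target classes.
teachers = [np.array([4.0, 0.0, 0.0]), np.array([0.0, 3.0, 0.0])]
target = combined_soft_target(teachers, label=0)
```

Under this toy rule, a teacher that mispredicts a sample receives a small weight for that sample, so its supervision cannot dominate the student there; the temperature `tau` controls how soft the combined target is.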