CV

General Information

Full Name Sidong Zhang
Languages English, Mandarin

Education

  • Sep. 2020 - present
    Doctor of Philosophy
    UMass Amherst’s College of Information and Computer Sciences, Amherst, MA, USA
    • Teaching assistant for the graduate-level courses CS 589 (Machine Learning) and CS 651 (Optimization)
    • Research assistant in the Information Fusion Lab, currently funded by an NIH R03 grant
  • Sep. 2018 - Jan. 2021
    Master of Science
    UMass Amherst’s College of Information and Computer Sciences, Amherst, MA, USA
  • Sep. 2018 - Jan. 2021
    Bachelor of Engineering
    Nanjing University, Software Institute, Nanjing, China

Experience

  • Feb. 2024 - Sep. 2024
    Audio-Visual Speech Separation via Bottleneck Iterative Network
    UMass Amherst’s College of Information and Computer Sciences & Dolby Laboratories
    • Accepted to ICML 2025 Workshop on Machine Learning for Audio
    • We work on audio-visual speech separation over noisy audio mixtures from the NTCD-TIMIT and LRS3 datasets
    • We propose a novel multimodal fusion framework for the audio and video modalities built on a bottleneck iterative structure
    • Our proposed model outperforms state-of-the-art models on SI-SDRi while requiring only 50% of the SOTA training time
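SI-SDRi (scale-invariant signal-to-distortion ratio improvement) is the separation metric referenced above. As a minimal illustration of how the metric is defined, not the evaluation code used in the project, a NumPy sketch:

```python
import numpy as np

def si_sdr(est, ref):
    """Scale-invariant signal-to-distortion ratio in dB."""
    ref = ref - ref.mean()
    est = est - est.mean()
    # Project the estimate onto the reference to get the scaled target
    alpha = np.dot(est, ref) / np.dot(ref, ref)
    target = alpha * ref
    noise = est - target
    return 10 * np.log10(np.dot(target, target) / np.dot(noise, noise))

def si_sdri(est, mix, ref):
    """Improvement of the separated estimate over the unprocessed mixture."""
    return si_sdr(est, ref) - si_sdr(mix, ref)

# Toy usage: a cleaner estimate scores higher than the raw mixture
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)          # hypothetical clean source
mix = ref + 0.5 * rng.standard_normal(16000)   # noisy mixture
est = ref + 0.1 * rng.standard_normal(16000)   # model output stand-in
print(f"SI-SDRi: {si_sdri(est, mix, ref):.1f} dB")
```

A higher SI-SDRi means the model removed more of the mixture's interference relative to the input.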
  • Sep. 2020 - July 2025
    Longitudinal Multimodal Modeling for Alzheimer’s Early Detection in the Wild
    UMass Amherst’s College of Information and Computer Sciences
    • We introduce information-theoretic unsupervised representation learning on brain MRIs as a complement to risk factors, using an estimated mutual information value to measure the strength of the dependency between representations and risk factors
    • We both train a CNN from scratch and fine-tune foundation models as two representation learning approaches
    • We optimize the training procedure of the Alzheimer's forecasting model into two stages with partial freezing of model parameters, yielding more stable validation performance during training
    • We construct a representative subset of the ADNI dataset aligned with the TADPOLE challenge, consisting of 1,183 patients for training, 169 for validation, and 336 for testing
    • We evaluate forecasting performance using both the micro F1-score on CN/MCI/AD forecasting and the precision of capturing the timing of the MCI-to-AD transition, across 100 repeated experiments to ensure statistical significance
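As an illustration of the micro F1-score used above: true positives, false positives, and false negatives are pooled across the CN/MCI/AD classes before averaging. The integer label coding below is a hypothetical choice for the sketch, not taken from the project:

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then average."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp += sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical coding: CN=0, MCI=1, AD=2
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(micro_f1(y_true, y_pred))  # 0.666... (4 of 6 correct)
```

Note that for single-label multiclass problems like this one, micro F1 coincides with plain accuracy.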
  • Jan. 2025 – May 2025
    Encoding Domain Insights into Multi-modal Fusion: Improved Performance at the Cost of Robustness
    UMass Amherst’s College of Information and Computer Sciences
    • Accepted to ICML 2025 Workshop on Methods and Opportunities at Small Scale
    • We compare fusion methods with and without domain knowledge embedded in the model structure
    • We conduct experiments on the MMSD dataset with both the original sarcasm-detection task and a synthetic task that controls the level of domain knowledge, adding Gaussian noise to the inputs to examine robustness
    • We find that aligning fusion design with domain priors boosts clean-data accuracy when data is limited, but significantly diminishes robustness to noisy inputs
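The robustness check described above, adding Gaussian noise of increasing strength to the inputs of a fixed model and tracking accuracy, can be sketched on toy data. The linear model and synthetic two-class data below are stand-ins, not the MMSD setup:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a trained model: a fixed linear decision rule on 2-D inputs
w = np.array([1.0, -1.0])
def predict(x):
    return (x @ w > 0).astype(int)

# Synthetic, well-separated two-class data (hypothetical, not MMSD itself)
n = 500
x0 = rng.standard_normal((n, 2)) + [-2.0, 2.0]   # class 0
x1 = rng.standard_normal((n, 2)) + [2.0, -2.0]   # class 1
x = np.vstack([x0, x1])
y = np.array([0] * n + [1] * n)

# Robustness curve: accuracy as input noise grows
for sigma in [0.0, 0.5, 1.0, 2.0, 4.0]:
    x_noisy = x + sigma * rng.standard_normal(x.shape)
    acc = (predict(x_noisy) == y).mean()
    print(f"sigma={sigma:.1f}  accuracy={acc:.3f}")
```

Sweeping the noise scale like this turns "robustness" into a measurable curve rather than a single clean-data number.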
  • Feb. 2019 - May. 2019
    Clustered Vertical Attention for Irregular Time Series Modelling
    UMass Amherst’s College of Information and Computer Sciences
    • Results were submitted to the ICML 2019 Time Series Workshop
    • We worked on the PhysioNet Challenge 2012 dataset
    • We improved the accuracy of predicting in-hospital survival of ICU patients via existing imputation methods
    • We ran a Minimum Spanning Tree algorithm to determine the most correlated clusters of imputed features
    • We trained separate attention models per cluster and made final predictions with a Long Short-Term Memory attention model
    • The model achieved accuracy improvements of 1.5%, 1.3%, and 1.8% across 3 different imputation methods
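The MST-based clustering step above can be sketched as follows: treat one minus the absolute correlation as an edge cost, build a minimum spanning tree with Kruskal's algorithm, and cut the costliest edges to leave k feature clusters. The correlation matrix and cluster count below are toy assumptions, not the PhysioNet configuration:

```python
import numpy as np

def mst_edges(corr):
    """Kruskal's MST on a dense similarity matrix: (1 - |corr|) is the
    edge cost, so strongly correlated features are joined first."""
    n = corr.shape[0]
    edges = sorted(
        (1 - abs(corr[i, j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a
    mst = []
    for cost, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # keep only edges joining two components
            parent[ri] = rj
            mst.append((cost, i, j))
    return mst

def clusters_from_mst(mst, n, k):
    """Drop the k-1 costliest MST edges, leaving k connected components."""
    kept = sorted(mst)[: n - k]
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for _, i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())

# Toy correlation matrix: features {0,1} and {2,3} are highly correlated
corr = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])
print(clusters_from_mst(mst_edges(corr), 4, 2))  # [[0, 1], [2, 3]]
```

Each resulting cluster can then be handled by its own downstream model, as in the per-cluster attention models described above.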

Skills

  • {"Languages"=>"English (Proficient), Mandarin (Native)"}
  • {"Programming language"=>"Java, Python, C, Lisp, Markdown, Latex"}
  • {"Development tools"=>"Pytorch"}