Journal of Advanced Robotics, Autonomous Systems and Human-Machine Interaction
Self-Supervised Learning of Cardiac Dynamics Using Masked Volume Modeling (MVM)
Kunal Roy and Annavarapu Chandra Sekhara Rao
Abstract
Estimating cardiac function from echocardiographic data generally relies on supervised learning, which is costly in terms of clinical labels. In this work, we present a novel self-supervised learning scheme based on Masked Volume Modeling (MVM), motivated by Masked Autoencoders and SimMIM. The proposed framework learns latent representations from volume time-series derived from echocardiograms. In contrast to earlier works, which operate on raw 2D videos or static images, we treat cardiac volume data as 1D signals, masking parts of each time-series and reconstructing them with a Transformer encoder-decoder. This approach removes the need for labeled data and yields strong downstream performance on ejection fraction (EF) prediction, unsupervised clustering, and disease stratification. Our contribution is twofold: (1) we establish the mathematical underpinnings of MVM through theoretical bounds on reconstruction error and convergence guarantees, and (2) we set up a comparative platform for MVM and traditional signal processing methods, such as Fourier and wavelet transforms, to show the strengths of learned representations for cardiac signal reconstruction. Our model outperforms conventional CNNs and LSTMs and offers both physiological interpretability and computational efficiency. In addition, we compare against state-of-the-art approaches and highlight our further contributions, including phase-aware masking and SHAP-based interpretability.
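The masking objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic raised-cosine volume curve, the patch length, the mask ratio, and all function names are assumptions made for illustration, and the Transformer encoder-decoder is omitted. The sketch shows only how contiguous patches of a 1D volume signal might be hidden, how a reconstruction loss restricted to masked positions (as in Masked Autoencoders) might be computed, and how EF follows from the end-diastolic and end-systolic volumes via EF = 100 × (EDV − ESV) / EDV.

```python
import math
import random


def synthetic_volume_curve(n_frames=64, edv=120.0, esv=50.0):
    """Hypothetical one-beat LV volume curve in mL (raised cosine, EDV -> ESV -> EDV)."""
    amp = (edv - esv) / 2.0
    mid = (edv + esv) / 2.0
    return [mid + amp * math.cos(2.0 * math.pi * t / n_frames) for t in range(n_frames)]


def mask_series(series, mask_ratio=0.5, patch_len=4, seed=0):
    """Zero out randomly chosen contiguous patches; return (masked series, boolean mask)."""
    n_patches = len(series) // patch_len
    n_masked = int(round(mask_ratio * n_patches))
    rng = random.Random(seed)
    masked_patches = set(rng.sample(range(n_patches), n_masked))
    # A time step is masked if the patch it belongs to was selected.
    mask = [(i // patch_len) in masked_patches for i in range(len(series))]
    masked = [0.0 if m else v for v, m in zip(series, mask)]
    return masked, mask


def masked_mse(target, prediction, mask):
    """Mean squared error over masked positions only (assumed loss, not from the paper)."""
    errs = [(t - p) ** 2 for t, p, m in zip(target, prediction, mask) if m]
    return sum(errs) / len(errs)


def ejection_fraction(series):
    """EF in percent, taking EDV as the maximum and ESV as the minimum volume."""
    edv, esv = max(series), min(series)
    return 100.0 * (edv - esv) / edv


curve = synthetic_volume_curve()
masked, mask = mask_series(curve)
# Using the masked input itself as a trivial "prediction" gives a baseline loss
# that a trained reconstruction model would be expected to improve on.
baseline_loss = masked_mse(curve, masked, mask)
ef = ejection_fraction(curve)
```

In a full pipeline, `masked` would be fed to the encoder-decoder and its output would replace the trivial baseline prediction passed to `masked_mse`.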

