Presented Generative Action Tell-Tales: Assessing human motion in synthesized videos at the ML Collective DLCT reading group.
👋 Hi! I’m currently a PhD student at Boston University. I completed my undergrad at Manipal Institute of Technology, India, where I developed a keen interest in all things ML during my first year and gained research experience along the way. Prior to joining BU, I worked with the Content and User Understanding team at ShareChat and collaborated on projects with the Serre Lab (Brown University), the Human Dynamics Group (MIT Media Lab, Massachusetts Institute of Technology), ETS Montreal, and FOR.ai (now Cohere for AI).
At BU, I am fortunate to be advised by Prof. Deepti Ghadiyaram, and I’m currently exploring topics in computer vision, with a broad interest in representation learning and generative models.
CV / Email Me
Education
News
Generative Action Tell-Tales: Assessing human motion in synthesized videos accepted as an oral at the VGBE and PhysHuman Workshops, CVPR 2026.
Some Modalities are More Equal Than Others: Decoding and Architecting Multimodal Integration in MLLMs accepted to CVPR 2026 Findings.
Presented Generative Action Tell-Tales: Assessing human motion in synthesized videos at NECV 2025 (oral).
Started my PhD at Boston University, advised by Prof. Deepti Ghadiyaram.
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization and Revelio: Interpreting and leveraging semantic information in diffusion models accepted at ICCV 2025.
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization accepted at the VisCon Workshop, CVPR 2025.
Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models accepted as an oral at the AI4CC Workshop, CVPR 2025.
Revelio: Interpreting and leveraging semantic information in diffusion models accepted as an oral at the MIV Workshop, CVPR 2025.
Research
For more, see Google Scholar.
Experience
- Vision in Multimodal Large Language Models (MLLMs): Investigating limitations of visual understanding in MLLMs and developing methods to improve cross-modal alignment for robust multimodal reasoning.
- Evaluation of Video Generation Models: Designing and implementing novel evaluation metrics to assess human action fidelity, temporal consistency, and motion coherence in generative video models.
- Internal Representations of Diffusion Models: Analyzing diffusion models as representation learners by probing their intermediate states, and demonstrating their effectiveness for downstream tasks such as classification, multimodal reasoning, and domain generalization.