Home

Xavier Thomas (Rohan)


👋 Hi! I’m currently a grad student at Boston University. I completed my undergrad at Manipal Institute of Technology, India, where I developed a keen interest in all things ML during my first year and was fortunate to gain research experience along the way. Prior to joining BU, I worked with the Content and User Understanding team at ShareChat, and collaborated on projects with the Serre Lab (Brown University), the Human Dynamics Group (MIT Media Lab), ÉTS Montréal, and FOR.ai (now Cohere For AI).

At BU, I’m fortunate to be advised by Prof. Deepti Ghadiyaram. I’m currently exploring topics in computer vision, with a broad interest in representation learning and generative models.

Email Me

Education

Ph.D. in Computer Science, Boston University · 2025 – Present
M.S. in Artificial Intelligence, Boston University · 2023 – 2025
B.Tech. in Electronics and Instrumentation, Manipal Institute of Technology · Minor in Computational Intelligence · 2018 – 2022

Research

Generative Action Tell-Tales: Assessing human motion in synthesized videos
Xavier Thomas, Youngsun Lim, Ananya Srinivasan, Audrey Zheng, Deepti Ghadiyaram
Under Review · Code · Paper
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization
Xavier Thomas, Deepti Ghadiyaram
International Conference on Computer Vision (ICCV), 2025 · Code · Paper
Revelio: Interpreting and leveraging semantic information in diffusion models
Dahye Kim*, Xavier Thomas*, Deepti Ghadiyaram
International Conference on Computer Vision (ICCV), 2025 · Code · Paper
Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models
Ketan Suhaas Saichandran*, Xavier Thomas*, Prakhar Kaushik, Deepti Ghadiyaram
AI4CC Workshop, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025 (Oral) · Code · Paper
Diversity vs. Recognizability: Human-like generalization in one-shot generative models
Victor Boutin, Lakshya Singhal, Xavier Thomas, Thomas Serre
Neural Information Processing Systems (NeurIPS), 2022 · Code · Paper
Adaptive Methods for Aggregated Domain Generalization
Xavier Thomas, Dhruv Mahajan, Alex Pentland, Abhimanyu Dubey
Preprint · Code · Paper
MAViC: Multimodal Active Learning for Video Captioning
Gyanendra Das, Xavier Thomas, Anant Raj, Vikram Gupta
Preprint · Paper

For more, see Google Scholar.

Experience

Boston University
Graduate Researcher
Jun 2024 – Present
  • Vision in Multimodal Large Language Models (MLLMs): Investigating limitations of visual understanding in MLLMs and developing methods to improve cross-modal alignment for robust multimodal reasoning.
  • Evaluation of Video Generation Models: Designing and implementing novel evaluation metrics to assess human action fidelity, temporal consistency, and motion coherence in generative video models.
  • Internal Representations of Diffusion Models: Analyzing diffusion models as representation learners by probing their intermediate states; demonstrating their effectiveness for downstream tasks such as classification, multimodal reasoning, and domain generalization.
ShareChat
Machine Learning Engineer Intern
ShareChat | Content and User Understanding Team
Jul 2022 – Jun 2023
Integrated advanced computer vision pipelines into production, improving content classification and moderation capabilities on ShareChat (180M+ MAUs) and Moj (160M+ MAUs). Contributed to MAViC, a Multimodal Active Learning algorithm for Video Captioning that reduces annotation effort by integrating semantic similarity and uncertainty from visual and language modalities.
Brown University
Research Intern
Serre Lab, Brown University
Sep 2021 – May 2022
Developed a novel evaluation framework for one-shot generative models, introducing new metrics for recognizability (human interpretability) and diversity (concept coverage) to enable systematic comparisons. Benchmarked 4 representative generative architectures against human performance on the Omniglot dataset.
MIT Media Lab
Research Assistant
Jan 2021 – Nov 2021
Created a novel algorithm for privacy-preserving domain generalization that recovers domain information by removing class-specific noise from latent features, enabling the training of robust, domain-adaptive classifiers. Outperformed state-of-the-art methods that require domain supervision on multiple benchmarks.
ÉTS Montréal
Mitacs Globalink Research Intern
École de technologie supérieure (ÉTS), Montréal
Jul 2021 – Sep 2021
Extended sub-category exploration methods for Weakly Supervised Semantic Segmentation by clustering image features to generate more accurate pseudo-labels. Designed novel constraint-based refinements to enhance object localization in Class Activation Maps (CAMs), improving mIoU scores on PASCAL VOC 2012.
Advisor: Dr. Jose Dolz
FOR.ai
Researcher
FOR.ai (now Cohere For AI)
Oct 2020 – Aug 2021
Contributed to a large-scale benchmarking study of Out-of-Distribution (OOD) detection in computer vision models, establishing baselines for evaluating robustness under distribution shifts. Collaborated with researchers from Google Brain, University of Oxford, and Vector Institute.
Advisor: Sheldon Huang