Taeyoun Kim
I am a second-year MS student in Machine Learning at Carnegie Mellon University, funded by the Kwanjeong Educational Foundation. I am advised by Aditi Raghunathan and also work with Maarten Sap.
I believe that enhancing out-of-distribution / adversarial robustness and improving human alignment of foundation models make machine learning more accessible to the public. I am interested in understanding when models break out-of-distribution, so that robustness can be improved through better pretraining/fine-tuning and high-quality data curation. I am also interested in inference-time scaling for better human alignment: the training regime is saturating, but little is understood about the limits of leveraging compute during inference.
Before CMU, I earned my Bachelor's degree in Electrical & Electronic Engineering from Yonsei University. My undergraduate studies were funded through the National Science and Technology Scholarship of South Korea.
CV /
Google Scholar /
GitHub /
LinkedIn /
Email
Publications
My recent work is on testing the limits of defenses in jailbreaking, social bias mitigation in retrieval-augmented generation (RAG), and out-of-distribution performance estimation of foundation models using Agreement-on-the-Line.
Testing the Limits of Jailbreaking Defenses with the Purple Problem
Taeyoun Kim*, Suhas Kotha*, Aditi Raghunathan
NeurIPS Safe GenAI, 2024
arxiv /
code /
The Purple Problem: Can jailbreaking defenses succeed against the simplest possible definition of safety, preventing the output of the word "purple"? All defenses we consider fail to enforce even this definition. Moreover, adaptive attacks and increased compute reveal that existing defenses are weaker than reported.
Predicting the Performance of Foundation Models via Agreement-on-the-Line
Rahul Saxena*, Taeyoun Kim*, Aman Mehra*, Christina Baek, Zico Kolter, Aditi Raghunathan
NeurIPS, 2024
arxiv /
We apply Agreement-on-the-Line to predicting the OOD performance of foundation models. Interestingly, we find that randomly initializing the linear head before fine-tuning yields the highest diversity, enabling an ensemble of models to exhibit AGL. This even applies to linear probing.
The Application of Local Sub-voxel Shifting on Multi-echo GRE-based Myelin Water Imaging
Taeyoun Kim, Muyul Park, Jaeuk Yi, Dong-Hyun Kim
ICMRI (Oral), 2021
We apply local sub-voxel shifting to reduce Gibbs artifacts in multi-echo GRE-based Myelin Water Imaging. To do this, we create a new exponential saddle filter, which removes Gibbs artifacts while preserving image quality without the blurring introduced by Tukey filtering.
Projects
Ongoing and past projects
Mitigating Social Bias in RAG
Ongoing, 2024
We decompose a RAG system into three components: the LLM, the embedder, and the corpus. Each component can introduce its own bias, and these biases accumulate into complex bias in the overall system. We find that it is possible to mitigate bias simply by reverse-biasing the embedder. Furthermore, we empirically find a linear relationship between the embedder's bias and the RAG system's bias, with sensitivity that varies across LLMs. We investigate three methods for mitigating bias, fine-tuning, projecting, and stochastic ranking, and find that fine-tuning reduces bias while maintaining utility. We also find that a reverse-biased embedder makes the entire RAG system robust to variations in corpus bias.
Generalizing Point-and-Click Behavior to Vision
Ongoing, 2022
We model human point-and-click behavior with Soft Actor-Critic to understand human motor control within the BUMP process. We generalize point-and-click to perform any visual task as long as a target region and an avoidance region are provided.
Impact of Different Joints on Creating a 3D Hand Mesh
Taeyoun Kim*, Hoseok Tong*, Jinoh Lee*
Undergraduate Thesis, 2022
We use a PointNet to reconstruct hand meshes from 26 hand joints extracted from the Microsoft HoloLens 2. We study the impact of the fingertips and metacarpals, showing that fingertips are crucial for optimizing the network.