About
I am a Ph.D. candidate at SNU MILAB focusing on LLM evaluation—especially bias and robustness. I study how LLM-as-a-Judge behaves under framing and uncertainty cues, and I design methods and protocols that make judgments more reliable and interpretable.
Beyond evaluation, I work on automatic data generation for benchmarking and training (e.g., dialogue and evaluation datasets). I am also active in Korean NLP, where I build Korean-specific datasets and metrics and analyze model behavior in Korean settings.
Publications
Selected (recent)
We formalize instructional distraction (inputs that look like instructions) and show that even advanced LLMs frequently follow the distracting input rather than the user’s true instruction.
We systematically characterize visual biases in LVLM-based judgment and show how they distort alignment evaluations.
We demonstrate that persuasive perturbations can shift LLM-judge decisions and discuss practical defenses.
We measure how uncertainty expressions influence LLM-based evaluation outcomes.
Full list
- Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation — Y. Hwang*, D. Lee*, K. Min, T. Kang, Y. Kim, K. Jung. EMNLP 2025.
- Can You Trick the Grader? Adversarial Persuasion of LLM Judges — Y. Hwang, D. Lee, T. Kang, Y. Kim, K. Jung. Findings of EMNLP 2025.
- LLMs can be easily Confused by Instructional Distractions — Y. Hwang, Y. Kim, J. Koo, T. Kang, H. Bae, K. Jung. ACL 2025.
- Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the Effect of Epistemic Markers on LLM-based Evaluation — D. Lee*, Y. Hwang*, Y. Kim, J. Park, K. Jung. NAACL 2025.
- SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models — J. Koo, Y. Hwang, Y. Kim, T. Kang, H. Bae, K. Jung. Findings of NAACL 2025.
- MP2D: An Automated Topic Shift Dialogue Generation Framework Leveraging Knowledge Graphs — Y. Hwang, Y. Kim, Y. Jang, J. Bang, H. Bae, K. Jung. EMNLP 2024.
- Kosmic: Korean Text Similarity Metric Reflecting Honorific Distinctions — Y. Hwang, Y. Kim, J. Bang, H. Bae, H. Lee, K. Jung. COLING 2024.
- Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources — Y. Hwang*, Y. Kim*, H. Bae, H. Lee, J. Bang, K. Jung. EMNLP 2023.
- PR-MCS: Perturbation Robust Metric for Multilingual Image Captioning — Y. Kim, Y. Hwang, H. Yun, S. Yoon, T. Bui, K. Jung. Findings of EMNLP 2023.
- Injecting Comparison Skills in Task-Oriented Dialogue Systems for Database Search Results Disambiguation — Y. Kim*, Y. Hwang*, J. Shin, H. Bae, K. Jung. Findings of ACL 2023.
- Improving Cross-Modal Attention via Object Detection — Y. Kim, Y. Hwang, S. Yoon, H. Yun, K. Jung. NeurIPS Workshops 2022.
- Flowlogue: A Novel Framework for Synthetic Dialogue Generation with Structured Flow from Text Passages — Y. Kim, Y. Hwang, H. Bae, T. Kang, K. Jung. IEEE Access 2024.
- A Study on the Evaluation Consistency of Korean LLM-as-a-Judge Models in Mathematical Problems — Y. Hwang, D. Lee, J. Moon, K. Min, K. Jung. KCC 2025.
- Analysis of Stylistic Bias in Korean LLM-as-a-Judge — Y. Hwang, D. Lee, J. Moon, K. Min, K. Jung. KCC 2025.
- Evaluating the Robustness of LLM-Judges to Epistemic Markers in Korean — D. Lee, Y. Hwang, J. Moon, K. Min, K. Jung. KCC 2025.
- TSDG: A Framework for Generating Natural Topic-Shift Dialogue Data — Y. Hwang, D. Lee, Y. Kim, K. Jung. KSC 2024.
- Error-Correction Chain-of-Thought (ECOCoT): Enhancing Accuracy in Mathematical Reasoning through Error-Correction Framework — Y. Hwang, Y. Kim, D. Lee, T. Kang, H. Bae, K. Jung. KSC 2024.
- Reference-Centric QA Evaluation Leveraging Contrastive Decoding — D. Lee, K. Min, Y. Hwang, J. Park, K. Jung. KSC 2024.
- KLIPScore: A Highly Human-Correlated Korean Image Captioning Metric (Oral) — Y. Kim, Y. Hwang, Y. Chae, S. Yoon, K. Jung. KCC 2023.
- Thinking Fast and Slow in Multimodal Emotion Recognition Task — Y. Hwang, Y. Kim, Y. Chae, K. Jung. KCC 2023.
- Improving Cross-Modal Attention via Object Detection — Y. Kim, H. Yun, Y. Hwang, K. Jung. KCC 2022.
- COVID-19 Severity Prediction using Deep Transfer Learning — Y. Hwang, Y. Kim, K. Jung. KCC 2022.
* equal contribution.