Empirical Inference
Conference Paper
2024
What Makes Safety Fine-tuning Methods Safe? A Mechanistic Study
| Author(s): | Jain, S. and Lubana, E. S. and Oksuz, K. and Joy, T. and Torr, P. H. S. and Sanyal, A. and Dokania, P. K. |
| Book Title: | ICML 2024 Workshop on Mechanistic Interpretability (Spotlight) |
| Year: | 2024 |
| Month: | July |
| BibTeX Type: | Conference Paper (conference) |
| Event Place: | Vienna, Austria |
| State: | Published |
| URL: | https://openreview.net/forum?id=BS2CbUkJpy |
BibTeX
@conference{Jainetal24b,
title = {What Makes Safety Fine-tuning Methods Safe? A Mechanistic Study},
booktitle = {ICML 2024 Workshop on Mechanistic Interpretability (Spotlight)},
month = jul,
year = {2024},
author = {Jain, S. and Lubana, E. S. and Oksuz, K. and Joy, T. and Torr, P. H. S. and Sanyal, A. and Dokania, P. K.},
url = {https://openreview.net/forum?id=BS2CbUkJpy},
month_numeric = {7}
}