When safety fine-tuning is removed from language models ('abliteration'), the resulting behavior reveals important distinctions between learned genre conventions and genuine ethical reasoning. This paper analyzes how abliterated models respond to adversarial prompts, demonstrating that much of their apparent 'alignment' reflects surface-level pattern matching rather than robust ethical judgment.
@techreport{farzulla2025genremimicry,
  author      = {Farzulla, Murad},
  title       = {Genre Mimicry vs. Ethical Reasoning in Abliterated Language Models},
  institution = {Farzulla Research},
  type        = {Discussion Paper},
  number      = {DP-2503},
  year        = {2025},
  doi         = {10.5281/zenodo.17957694},
  url         = {https://farzulla.org/papers/genre-mimicry},
}