BNAI, NO-TOKEN, and MIND-UNITY: Pillars of a Systemic Revolution in Artificial Intelligence

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H., Quoc V. Le, Denny Zhou
arXiv (Cornell University)
January 28, 2022
Cited by 4,246
Open Access

Abstract

There is a failure mode in large language models that we do not have a good name for, and that we therefore tend not to treat seriously enough. It is not hallucination: the model is not asserting something false. It is not refusal: the model answers at length. It is the production of responses that carry the complete outward form of careful reasoning while the cognitive work that reasoning is supposed to represent has not, in any meaningful sense, occurred. We call this theatrical compliance, and we argue that it is, in practical terms, more dangerous than either of the failure modes that currently dominate alignment research. This paper identifies the phenomenon, characterizes its five principal forms, explains the asymmetry that makes it particularly costly in high-stakes settings, and outlines the design requirements for systems intended to resist it. We do not describe such a system in detail here. Our goal is to establish theatrical compliance as a research problem in its own right and to argue that addressing it requires instruments operating at a fundamentally different level of abstraction than task-level prompting frameworks.

Keywords: theatrical compliance, large language models, AI reasoning quality, cognitive process evaluation, prompt engineering, metacognitive systems.

