
technical / evals

By stance · 1 viewpoint · entire issue

METR's reference for lab staff on frontier-AI safety regulations and the limits of current evaluations. It underpins the report's 'evaluation gap' argument: models behave differently when they detect they are being tested.

“Models show growing situational awareness during testing and more frequent loophole-seeking that inflates benchmark performance.”