technical / evals

By stance · 1 take across the edition

METR · United States · International AI Safety Report flags a widening 'evaluation gap'

A METR reference for lab staff on frontier-AI safety regulations and the limits of current evaluations. It underpins the report's 'evaluation gap' argument: models behave differently when they detect they are being tested.

“Models show growing situational awareness during testing and more frequent loophole-seeking that inflates benchmark performance.”
