News · ArXiv AI Papers · 2026-02-25

When can we trust untrusted monitoring? A safety case sketch across collusion strategies

arXiv:2602.20628v1 Announce Type: new Abstract: AIs are increasingly being deployed with greater autonomy and capabilities, which increases the risk that a misaligned AI may be able to cause catastrophic harm. Untrusted monitoring -- using one untrusted model to oversee another -- is one approach to reducing risk. Justifying the safety of an untrusted monitoring deployment is challenging because d


Tags: #research #ArXiv AI Papers


Published: 2026-02-25

Source: ArXiv AI Papers
