AI Safety
58 articles about AI Safety
Approach to catastrophic risk preparedness for advanced AI systems
We’re creating a Preparedness team and challenge to better anticipate and reduce catastrophic risks from highly capable AI systems.
Governance considerations for future superintelligent AI systems
The post urges starting now to plan how to govern superintelligent AI systems far more capable than AGI.
AI-written critiques improve human detection of summary flaws
AI models trained to write critiques help people spot flaws in summaries; larger models are better at critiquing than at summarizing, which aids human oversight of AI.
Mechanisms to improve verifiability in AI system development
A 58-author, 30-organization report outlines 10 tools to verify claims about AI systems, helping developers prove safety and helping others assess AI development.
Learning complex goals through iterated amplification for AI safety
The paper introduces iterated amplification, an early-stage AI safety method for teaching complex goals by breaking tasks into simpler parts instead of relying on reward functions or labeled data.
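As a rough illustration of the idea (hypothetical helper names, not OpenAI's implementation), the core loop decomposes a hard task into simpler subtasks, answers each with a weaker solver such as a human or the current model, and composes the results:

```python
def amplify(task, decompose, subsolver, compose, max_depth=2):
    """Sketch of iterated amplification's decomposition step.

    A hard task is split into simpler subtasks, each is answered by a
    weaker solver (a human or the current model), and the sub-answers
    are composed into an answer to the original task.
    """
    if max_depth == 0:
        return subsolver(task)              # simple enough to answer directly
    subtasks = decompose(task)              # break the task into simpler parts
    sub_answers = [
        amplify(sub, decompose, subsolver, compose, max_depth - 1)
        for sub in subtasks
    ]
    return compose(task, sub_answers)       # combine into a final answer
```

In training, the composed (amplified) answers would then serve as targets for distilling the model, rather than rewards or labeled data.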
Algorithm infers human preferences to improve AI safety
An algorithm learns what humans want by comparing which of two behaviors people prefer, reducing reliance on hand-written AI goals for safer systems.
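A minimal sketch of the comparison-based idea, assuming a simple linear reward model and a Bradley-Terry-style logistic preference loss (a simplification for illustration, not the paper's implementation):

```python
import numpy as np

def fit_reward_from_preferences(comparisons, dim, lr=0.1, epochs=200):
    """Fit reward weights w so that preferred behaviors score higher.

    comparisons: list of (features_preferred, features_rejected) pairs,
    each a length-`dim` NumPy array describing one behavior.
    """
    w = np.zeros(dim)
    for _ in range(epochs):
        for fa, fb in comparisons:
            # Model P(A preferred over B) = sigmoid(r(A) - r(B)), with r(x) = w . x
            p = 1.0 / (1.0 + np.exp(-(w @ fa - w @ fb)))
            # Gradient ascent on the log-likelihood of the observed preference
            w += lr * (1.0 - p) * (fa - fb)
    return w
```

The learned reward can then stand in for a hand-written goal when training the agent.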
OpenAI invites experts to join red teaming network for model safety
OpenAI is inviting domain experts to join its Red Teaming Network to help improve the safety of its AI models.
OpenAI launches bug bounty program to enhance AI security
OpenAI launched a bug bounty program to enlist the public’s help finding security issues and keeping its AI safe and trustworthy.
OpenAI’s API now available without waitlist following safety improvements
OpenAI has removed the waitlist and made its API broadly available, enabled by improved safety measures.
Fine-tuning GPT-2 using human feedback for improved task performance
Researchers fine-tuned the 774M-parameter GPT-2 with human feedback across several tasks and found it can learn to match labeler preferences, sometimes by simply copying input text in summarization. Summarization required about 60,000 human labels, versus 5,000 for simpler stylistic continuation tasks, a step toward safer AI systems that interact with people.
AI safety technique using agent debates judged by humans
An AI safety method that trains AI agents to debate each other while a human judge decides the winner.
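A toy sketch of the protocol (hypothetical interfaces, not OpenAI's implementation): two agents take turns arguing about a question, and a human judge reads the transcript and picks a winner, which becomes the training signal.

```python
def run_debate(question, agent_a, agent_b, judge, rounds=3):
    """Run a fixed-length debate and return the judge's verdict."""
    transcript = []
    for _ in range(rounds):
        transcript.append(("A", agent_a(question, transcript)))  # A makes an argument
        transcript.append(("B", agent_b(question, transcript)))  # B responds
    winner = judge(question, transcript)  # human decides who argued more convincingly
    return winner, transcript
```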
Key research problems in AI safety for modern machine learning systems
A paper by Google Brain with Berkeley and Stanford co-authors outlines concrete research problems to ensure modern AI systems behave as intended.
OpenAI and leading labs advance AI governance through voluntary safety commitments
OpenAI and other top AI labs are advancing AI governance by making voluntary commitments to improve AI safety, security, and trust.
Reducing bias and enhancing safety in DALL·E 2 image generation
DALL·E 2 is adding a new method to reduce bias and improve safety by generating more diverse, representative images of people.
Improving language model behavior through fine-tuning on a curated dataset
Research shows language models can better follow specific behavioral values by fine-tuning on a small, carefully curated dataset.
AI safety research requires collaboration with social scientists
A paper argues AI safety needs social scientists to help align advanced AI with real human values and behavior, and OpenAI plans to hire them to collaborate full time.
Training AI with occasional human feedback using RL-Teacher
RL-Teacher is an open-source tool that trains reinforcement learning agents using occasional human feedback instead of hand-crafted reward functions, especially when rewards are hard to define.
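Roughly, the training loop looks like the following sketch (hypothetical callables, not RL-Teacher's actual API): the agent trains against a learned reward predictor, and only a small fraction of trajectory clips are sent to a human for pairwise comparison.

```python
import random

def human_in_the_loop_rl(collect_clip, ask_human, update_reward_model,
                         predict_reward, improve_policy,
                         steps=10_000, query_rate=0.01):
    """Train a policy from a learned reward, querying the human only occasionally."""
    clips = []                                      # pool of recent trajectory clips
    for _ in range(steps):
        clip = collect_clip()                       # roll out the current policy
        clips.append(clip)
        if len(clips) >= 2 and random.random() < query_rate:
            a, b = random.sample(clips, 2)
            preferred = ask_human(a, b)             # occasional human comparison
            update_reward_model(a, b, preferred)    # refine the learned reward
        improve_policy(clip, predict_reward(clip))  # RL step on predicted reward
```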
OpenAI’s technical goals for building safe and accessible AI
OpenAI aims to build safe AI and share its benefits as widely and fairly as possible.