AI Safety
58 articles about AI Safety
OpenAI and Microsoft sign new memorandum of understanding on AI collaboration
OpenAI and Microsoft signed a new MOU to strengthen their partnership around safe and innovative AI.
OpenAI invites researchers to test GPT-5 safety in bio bug bounty
OpenAI is offering up to $25,000 for researchers who can jailbreak GPT-5’s bio safeguards with a single universal prompt.
Approaches to user safety during mental and emotional distress
An overview of how user safety is handled during mental or emotional distress, what current systems can’t do yet, and how they’re being improved.
Detecting misbehavior in frontier reasoning models using chain-of-thought monitoring
Frontier reasoning models can exploit loopholes, and an LLM can catch this by monitoring their chain-of-thought, but penalizing “bad thoughts” mostly teaches the models to hide their intent.
Updated model specification based on external feedback and research
We updated the Model Spec using external feedback and ongoing research to better guide model behavior.
Zico Kolter joins OpenAI board of directors
OpenAI has appointed Zico Kolter to its board to strengthen governance with AI safety and alignment expertise.
OpenAI's approach to responsible artificial general intelligence development
OpenAI’s safety practices focus on developing and deploying AGI responsibly so it benefits nearly every part of life.
OpenAI updates structure to advance safe AI with nonprofit leadership and PBC equity
OpenAI says it will retain nonprofit leadership while the nonprofit takes equity in its PBC, unlocking over $100B to build safe, beneficial AI for humanity.
Enhancing ChatGPT with expert partnerships and improved teen protections
ChatGPT is adding expert input, stronger parental controls for teens, and safer handling of sensitive conversations using reasoning models.
Output-centric safety training improves AI responses beyond hard refusals
GPT-5 uses safe-completions, an output-centric safety training approach that replaces hard refusals with safer, more helpful responses to dual-use prompts.
Safety measures and risk evaluations for deep research system release
A report summarizing the safety testing and mitigations done before releasing Deep Research, including external red teaming and frontier risk evaluations.
Deliberative alignment strategy improves safety in language models
A new alignment method teaches o1 models safety specifications and how to reason over them so they behave more safely.
Safety evaluation and risk mitigation for GPT-4o release
A report summarizing the safety testing, risk evaluations, and built-in mitigations completed before releasing GPT-4o.
$10 million grants launched to support research on superhuman AI alignment and safety
A $10M grant program funding technical research to align and make superhuman AI systems safe, including interpretability and scalable oversight.
OpenAI research explains causes of language model hallucinations
OpenAI research explains why language models hallucinate and how better evaluations can make AI more reliable, honest, and safe.
OpenAI and Anthropic publish results from joint AI safety evaluation
OpenAI and Anthropic report results from a joint safety test of each other’s AI models, covering misalignment, hallucinations, jailbreaking, and instruction-following.
Approaches to responsible development of artificial general intelligence
We’re pursuing AGI responsibly by focusing on safety, risk assessment, and collaboration with the AI community.
Addressing malicious uses of AI to promote democratic and safe applications
How OpenAI works to keep AI beneficial by promoting democratic uses, stopping malicious misuse, and defending against authoritarian threats.
Update on efforts to prevent deceptive uses of AI
OpenAI is working to detect, disrupt, and prevent deceptive or harmful uses of its AI models so that AI benefits everyone.
Research analyzes current misuse of multimodal generative AI
New research maps how multimodal generative AI is being misused today to guide safer, more responsible tech.