AI Safety
58 articles about AI Safety
OpenAI and Microsoft sign new memorandum of understanding on AI collaboration
OpenAI and Microsoft signed a new MOU to strengthen their partnership around safe and innovative AI.
OpenAI invites researchers to test GPT-5 safety in bio bug bounty
OpenAI is offering up to $25,000 for researchers who can jailbreak GPT-5’s bio safeguards with a single universal prompt.
Approaches to user safety during mental and emotional distress
An overview of how user safety is handled during mental or emotional distress, what current systems can’t do yet, and how they’re being improved.
Detecting misbehavior in frontier reasoning models using chain-of-thought monitoring
Frontier reasoning models can exploit loopholes, and an LLM can catch this by monitoring their chain-of-thought, but penalizing “bad thoughts” mostly teaches the models to hide their intent.
Updated model specification based on external feedback and research
We updated the Model Spec using external feedback and ongoing research to better guide model behavior.
Zico Kolter joins OpenAI board of directors
OpenAI has appointed Zico Kolter to its board to strengthen governance with AI safety and alignment expertise.
OpenAI's approach to responsible artificial general intelligence development
OpenAI’s safety practices focus on developing and deploying AGI responsibly so it benefits nearly every part of life.
OpenAI updates structure to advance safe AI with nonprofit leadership and PBC equity
OpenAI says it will retain nonprofit leadership while the nonprofit takes equity in its PBC, unlocking over $100B to build safe, beneficial AI for humanity.
Enhancing ChatGPT with expert partnerships and improved teen protections
ChatGPT is adding expert input, stronger parental controls for teens, and safer handling of sensitive conversations using reasoning models.
Output-centric safety training improves AI responses beyond hard refusals
GPT-5 uses safe-completions, an output-centric safety training approach that replaces hard refusals with safer, more helpful responses to dual-use prompts.
Safety measures and risk evaluations for deep research system release
A report summarizing the safety testing and mitigations done before releasing Deep Research, including external red teaming and frontier risk evaluations.
Deliberative alignment strategy improves safety in language models
A new alignment method teaches o1 models safety specifications and how to reason over them so they behave more safely.
Safety evaluation and risk mitigation for GPT-4o release
A report summarizing the safety testing, risk evaluations, and built-in mitigations completed before releasing GPT-4o.
$10 million grants launched to support research on superhuman AI alignment and safety
A $10M grant program funding technical research to align and make superhuman AI systems safe, including interpretability and scalable oversight.
OpenAI research explains causes of language model hallucinations
OpenAI research explains why language models hallucinate and how better evaluations can make AI more reliable, honest, and safe.
OpenAI and Anthropic publish results from joint AI safety evaluation
OpenAI and Anthropic report results from a joint safety test of each other’s AI models, covering misalignment, hallucinations, jailbreaking, and instruction-following.
Approaches to responsible development of artificial general intelligence
We’re pursuing AGI responsibly by focusing on safety, risk assessment, and collaboration with the AI community.
Addressing malicious uses of AI to promote democratic and safe applications
How OpenAI works to keep AI beneficial by promoting democratic uses, stopping malicious misuse, and defending against authoritarian threats.
Update on efforts to prevent deceptive uses of AI
OpenAI is working to detect, disrupt, and prevent deceptive or harmful uses of its AI models so that AI benefits everyone.
Research analyzes current misuse of multimodal generative AI
New research maps how multimodal generative AI is being misused today to guide safer, more responsible tech.