AI Safety

58 articles about AI Safety

Approach to catastrophic risk preparedness for advanced AI systems

OpenAI
Insight
Future of Work & AI Automation

We’re creating a Preparedness team and challenge to better anticipate and reduce catastrophic risks from highly capable AI systems.

Governance considerations for future superintelligent AI systems

OpenAI
Insight
Tech Policy & Startups Regulation

The post urges planning now for how to govern superintelligent AI systems far more capable than AGI.

AI-written critiques improve human detection of summary flaws

OpenAI
Analysis
AI & Machine Learning

AI critique-writing models help people spot errors in summaries, and bigger models critique better than they summarize, aiding human oversight of AI.

Mechanisms to improve verifiability in AI system development

OpenAI
Report
Tech Policy & Startups Regulation

A 58-author, 30-organization report outlines 10 tools to verify claims about AI systems, helping developers prove safety and helping others assess AI development.

Learning complex goals through iterated amplification for AI safety

OpenAI
Article
AI & Machine Learning

The post introduces iterated amplification, an early-stage AI safety technique for teaching complex goals by breaking tasks into simpler parts rather than relying on rewards or labeled data.

Algorithm infers human preferences to improve AI safety

OpenAI
Insight
AI & Machine Learning

An algorithm learns what humans want by comparing which of two behaviors people prefer, reducing reliance on hand-written AI goals for safer systems.

OpenAI invites experts to join red teaming network for model safety

OpenAI
News
Cybersecurity

OpenAI is inviting domain experts to join its Red Teaming Network to help improve the safety of its AI models.

OpenAI launches bug bounty program to enhance AI security

OpenAI
News
Cybersecurity

OpenAI launched a bug bounty program to enlist the public’s help finding security issues and keeping its AI safe and trustworthy.

OpenAI’s API now available without waitlist following safety improvements

OpenAI
News
Tech News & Trends

OpenAI has removed the waitlist and made its API broadly available, enabled by improved safety measures.

Fine-tuning GPT-2 using human feedback for improved task performance

OpenAI
Case Study
AI & Machine Learning

Researchers fine-tuned the 774M-parameter GPT-2 with human feedback across several tasks, finding it can match labeler preferences (though in summarization it sometimes does so by copying text). Summarization required 60k human labels versus 5k for simpler stylistic continuation tasks, advancing safer human-facing AI.

AI safety technique using agent debates judged by humans

OpenAI
Article
AI Agents & Autonomous Workflows

An AI safety method that trains AI agents to debate each other while a human judge decides the winner.

Key research problems in AI safety for modern machine learning systems

OpenAI
Report
AI & Machine Learning

A paper by Google Brain with Berkeley and Stanford co-authors outlines concrete research problems to ensure modern AI systems behave as intended.

OpenAI and leading labs advance AI governance through voluntary safety commitments

OpenAI
News
Tech Policy & Startups Regulation

OpenAI and other top AI labs are advancing AI governance by making voluntary commitments to improve AI safety, security, and trust.

Reducing bias and enhancing safety in DALL·E 2 image generation

OpenAI
News
AI & Machine Learning

DALL·E 2 is adding a new method to reduce bias and improve safety by generating more diverse, representative images of people.

Improving language model behavior through fine-tuning on a curated dataset

OpenAI
Report
AI & Machine Learning

Research shows language models can better follow specific behavioral values by fine-tuning on a small, carefully curated dataset.

AI safety research requires collaboration with social scientists

OpenAI
Article
AI & Machine Learning

A paper argues AI safety needs social scientists to help align advanced AI with real human values and behavior, and OpenAI plans to hire them to collaborate full time.

Training AI with occasional human feedback using RL-Teacher

OpenAI
Article
AI & Machine Learning

RL-Teacher is an open-source tool that trains reinforcement learning agents from occasional human feedback rather than hand-crafted reward functions, which is especially useful when a reward is hard to specify.

OpenAI's technical goals for building safe and accessible AI

OpenAI
Article
AI & Machine Learning

OpenAI aims to build safe AI and share its benefits as widely and fairly as possible.
