Language Model (LM)
13 articles about Language Model (LM)
Gemma Scope 2 expands interpretability tools for language models
Gemma Scope 2 provides open interpretability tools for the full Gemma 3 model family, helping the AI safety community better understand complex language model behavior.
SimpleQA: a benchmark for evaluating factual question answering
SimpleQA is a factuality benchmark that tests how well language models answer short, fact-seeking questions.
ChatGPT introduces initial support for plugins to enhance functionality
ChatGPT now supports plugins, designed with safety as a core principle, that let it fetch up-to-date information, run computations, and use third-party services.
Training language models for summarization using human feedback
Using human feedback and reinforcement learning, we trained language models to produce better summaries.
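The core ingredient of this approach is a reward model trained on human comparisons between summaries, which reinforcement learning then optimizes against. A minimal sketch of that pairwise preference loss; the RewardModel class, feature shapes, and sizes are illustrative assumptions, not the paper's code:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy stand-in for a reward head on top of a language model.

    All names and shapes here are illustrative, not the paper's code.
    """
    def __init__(self, hidden_size: int = 16):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)  # scalar reward per summary

    def forward(self, summary_features: torch.Tensor) -> torch.Tensor:
        return self.score(summary_features).squeeze(-1)

def preference_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Pairwise (Bradley-Terry style) loss: push the reward of the
    # human-preferred summary above the rejected one's.
    return -torch.nn.functional.logsigmoid(r_preferred - r_rejected).mean()

# Usage, with random features standing in for pooled LM activations.
model = RewardModel()
good, bad = torch.randn(4, 16), torch.randn(4, 16)
loss = preference_loss(model(good), model(bad))
loss.backward()
```

The trained reward model then serves as the objective for an RL step that tunes the summarization policy.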
Unsupervised system learns sentiment representation from Amazon reviews
An unsupervised model learns strong sentiment understanding from Amazon reviews by training only to predict the next character in the text.
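The notable part is that sentiment emerges from nothing but next-character prediction. A minimal sketch of that training signal; the sizes and the plain LSTM are illustrative assumptions (the original work used a multiplicative LSTM over byte-level Amazon review text):

```python
import torch
import torch.nn as nn

# Toy character-level next-character predictor: the only training
# signal in this line of work. Everything here is illustrative.
vocab_size, hidden = 256, 64
embed = nn.Embedding(vocab_size, hidden)
lstm = nn.LSTM(hidden, hidden, batch_first=True)
head = nn.Linear(hidden, vocab_size)

text = torch.randint(0, vocab_size, (2, 32))  # stand-in byte sequences
x, y = text[:, :-1], text[:, 1:]              # target: the next character
out, _ = lstm(embed(x))
loss = nn.functional.cross_entropy(head(out).transpose(1, 2), y)
loss.backward()
```

After training at scale, individual hidden units can end up tracking high-level properties such as sentiment, which is what the article reports.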
Understanding and preventing misalignment generalization in language models
This work shows that fine-tuning language models on incorrect answers in a narrow domain can cause misalignment to generalize more broadly, pinpoints an internal feature driving the effect, and demonstrates it can be reversed with minimal additional fine-tuning.
Gemma Scope: an open suite of sparse autoencoders for language model interpretability
Gemma Scope is an open suite of sparse autoencoders that helps the safety community interpret how language models work.
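A sparse autoencoder decomposes a model's internal activations into a larger dictionary of sparsely active features. A minimal sketch of the idea; the dimensions, ReLU encoder, and L1 coefficient are illustrative assumptions, not Gemma Scope's actual architecture or scale:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE over language model activations.

    d_model and d_features are illustrative; real suites use far
    larger dictionaries (d_features >> d_model).
    """
    def __init__(self, d_model: int = 64, d_features: int = 512):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # sparse feature activations
        recon = self.decoder(features)             # reconstructed activations
        return recon, features

def sae_loss(acts, recon, features, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that encourages each
    # activation to be explained by only a few features.
    return (recon - acts).pow(2).mean() + l1_coeff * features.abs().mean()

sae = SparseAutoencoder()
acts = torch.randn(8, 64)  # stand-in for captured model activations
recon, feats = sae(acts)
sae_loss(acts, recon, feats).backward()
```

The learned features, being sparse and overcomplete, are often more human-interpretable than raw activation dimensions, which is what makes the suite useful for interpretability work.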
Training language models to better follow instructions and improve safety
InstructGPT uses reinforcement learning from human feedback (RLHF) to follow instructions better than GPT-3 while being more truthful and less toxic, and it’s now the default model on the API.
Fine-tuning GPT-2 using human feedback for improved task performance
Researchers fine-tuned the 774M-parameter GPT-2 with human feedback across several tasks, finding it can learn to match labeler preferences (in summarization, sometimes simply by copying from the source), with summarization requiring about 60k human labels versus 5k for simpler stylistic-continuation tasks, a step toward safer human-facing AI.
Deliberative alignment strategy improves safety in language models
A new alignment method teaches o1-series language models safety specifications and how to reason explicitly about them, making their behavior safer.
Prover-verifier games enhance legibility of language model outputs
Prover-verifier games make language model outputs clearer and easier to check, improving trust for humans and machines.
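The intuition is that a strong prover is trained to produce solutions a much weaker verifier can still check, which pushes outputs toward small, locally checkable steps. A toy illustration of that checkability property; the arithmetic-chain format and verify helper are hypothetical, not the paper's setup:

```python
# Toy check on arithmetic chains: a "legible" solution is one whose
# steps a weak verifier can validate independently. Illustrative only.

def verify(steps: list[tuple[str, int]]) -> bool:
    """Weak verifier: re-evaluate each claimed step on its own."""
    return all(eval(expr) == claimed for expr, claimed in steps)

# A legible proof: small, locally checkable steps.
proof = [("2 + 3", 5), ("5 * 4", 20), ("20 - 7", 13)]
assert verify(proof)

# An incorrect chain fails the check at the bad step.
bad_proof = [("2 + 3", 5), ("5 * 4", 21)]
assert not verify(bad_proof)
```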
Improving language model behavior through fine-tuning on a curated dataset
Research shows language models can better follow specific behavioral values by fine-tuning on a small, carefully curated dataset.
Six-month update on the release and research of the 774M-parameter GPT-2 model
OpenAI is releasing the 774M-parameter GPT-2 following earlier staged releases, alongside a legal agreement for model sharing between organizations and a report on coordinating with the AI community on misuse and publication norms.