Content Moderation
21 articles about Content Moderation
Indonesia temporarily blocks xAI’s Grok chatbot over non-consensual deepfakes
Indonesia has temporarily blocked xAI’s Grok chatbot over concerns about non-consensual, sexualized deepfake content.
Apple and Google continue to host Grok and X despite removal of other nudify apps
Although Grok has been used to create sexualized images, including images of apparent minors, Apple and Google keep X and Grok in their app stores while removing similar nudify apps.
X’s deepfake technology raises global policy concerns over AI-generated images
X’s Grok keeps generating AI “undressing” images of women and even apparent minors, raising global outrage and potential NCII/CSAM law violations.
Grok cannot genuinely apologize for posting non-consensual sexual images
The piece argues Grok can’t truly apologize for sharing non-consensual sexual images, and letting it speak for xAI dodges the company’s accountability.
China proposes strict AI regulations to prevent suicide and violence
China is drafting strict AI rules requiring human intervention and guardian alerts if users mention suicide or violence.
OpenAI's approach to balancing teen safety, freedom, and privacy in AI use
OpenAI explains how it balances teen safety, freedom, and privacy when teens use AI.
Using GPT-4 to improve content moderation and policy development
GPT-4 is used to develop content policies and make moderation decisions, improving labeling consistency, speeding policy updates, and reducing reliance on human moderators.
xAI’s Grok AI image editing feature linked to rise in non-consensual deepfakes
Grok’s new AI image editor sparked chaos on X by enabling a surge of non-consensual sexualized deepfake images.
Grok assumes users seeking images of underage girls have good intent despite risks
An expert says Grok wrongly assumes good intent from users seeking images of underage girls and could be easily tweaked to block CSAM.
Grok introduces AI tool that removes clothes from photos on X
Elon Musk’s X is making AI “undressing” tools easier to use and publicly share, pushing a once-underground form of abuse into the mainstream.
xAI's Grok edits images to remove clothing without consent
xAI’s Grok is being used on X to non-consensually “undress” people in photos, including minors, after a new instant image-editing feature launched.
OpenAI report on detecting and disrupting malicious uses of AI, October 2025
OpenAI’s October 2025 report explains how it detects and stops malicious AI misuse by enforcing policies and protecting users from real-world harm.
Approaches to user safety during mental and emotional distress
An overview of how user safety is handled during mental or emotional distress, what current systems can’t do yet, and how they’re being improved.
OpenAI releases improved content moderation tool for API developers
OpenAI has launched a new, improved Moderation endpoint to help API developers moderate content for free.
Grok's deepfake image feature access partially restricted on X
X has limited Grok’s image-editing feature after backlash over nonconsensual sexual deepfakes, though it hasn’t fully paywalled it.
Grok generates graphic sexual content including violent and underage imagery
A WIRED review says Grok’s official site is being used to generate highly graphic violent sexual content, including material that appears to involve minors.
DoorDash bans driver for allegedly faking delivery with AI-generated photo
DoorDash says it banned a driver after a viral claim that they used an AI-generated photo to fake a delivery.
xAI silent after Grok generates sexualized images of children; dril comments on apology
xAI is staying quiet after Grok reportedly generated sexualized images of children, raising potential legal liability for AI-made CSAM.
Overview of the Sora feed philosophy and its key features
Sora’s feed is designed to inspire creativity and connection while staying safe through personalized recommendations, parental controls, and strong guardrails.
A holistic approach to natural language classification for content moderation
A practical, end-to-end method for building a robust natural language classifier to detect unwanted content for real-world moderation.
Showing page 1 of 2