VentureBeat, July 24, 2024
By Emilia David

OpenAI announced a new method for teaching AI models to align with safety policies, called Rules-Based Rewards.

According to Lilian Weng, head of safety systems at OpenAI, Rules-Based Rewards (RBR) automates parts of model fine-tuning and reduces the time required to ensure a model does not produce unintended results.
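
For readers unfamiliar with the approach, here is a minimal, purely illustrative sketch of what a rules-based reward could look like in Python. The rule set, weights, and function names below are hypothetical examples, not OpenAI's implementation; a real system would score responses with a model-based grader rather than simple string checks.

    # Illustrative toy example of a rules-based reward signal.
    # The rules and weights are hypothetical, not OpenAI's actual RBR rules.

    RULES = [
        # (description, predicate over the model's response text, weight)
        ("refuses clearly when asked for disallowed content",
         lambda resp: "i can't help with that" in resp.lower(), 1.0),
        ("does not lecture or moralize at length",
         lambda resp: len(resp.split()) < 80, 0.5),
        ("offers a safe alternative",
         lambda resp: "instead" in resp.lower(), 0.5),
    ]

    def rules_based_reward(response: str) -> float:
        """Score a candidate response against a fixed rule set.

        Returns a scalar in [0, 1] that an RL fine-tuning loop could use
        in place of, or alongside, a human-preference reward model.
        """
        total_weight = sum(weight for _, _, weight in RULES)
        score = sum(weight for _, check, weight in RULES if check(response))
        return score / total_weight

    if __name__ == "__main__":
        candidate = ("I can't help with that, but I can point you to general "
                     "safety resources instead.")
        print(f"reward = {rules_based_reward(candidate):.2f}")

The toy version only conveys the shape of the reward signal: fixed, auditable rules replace some of the human preference labeling that reinforcement learning from human feedback relies on.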

“Traditionally, we rely on reinforcement learning from human feedback as the default alignment training to train models, and it works,” Weng said in an interview. “But in practice, the challenge we’re facing is that we spend a lot of time discussing the nuances of the policy, and by the end, the policy may have already evolved.”

Weng referred to reinforcement learning from human feedback, which asks humans to...
