MIT Technology Review November 21, 2024
The company really wants you to know that it’s trying to make its models safer.
OpenAI is once again lifting the lid (just a crack) on its safety-testing processes. Last month the company shared the results of an investigation that looked at how often ChatGPT produced a harmful gender or racial stereotype based on a user’s name. Now it has put out two papers describing how it stress-tests its powerful large language models to try to identify potentially harmful or otherwise unwanted behavior, an approach known as red-teaming.
Large language models are now being used by millions of people for many different things. But as OpenAI itself points out, these models are known to produce racist, misogynistic and hateful...