OpenAI has just introduced o3, a new family of reasoning AI models that the company says is more advanced than its predecessors such as o1. Notably, OpenAI trained these models with a new safety technique called "deliberative alignment" to help them comply with the company's safety policies.
This approach lets o1 and o3 consult OpenAI's safety policies while processing user requests. Rather than answering directly, the models reason through the request step by step, breaking the problem into smaller parts and drawing on the relevant policy text to arrive at an appropriate response.
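In deliberative alignment the policy is taught to the model during training, so nothing extra is required at inference time. A developer can only approximate the surface behavior by supplying policy text alongside the request, as in the illustrative sketch below; it assumes the standard OpenAI Python SDK, and the model id and policy excerpt are placeholders rather than details from OpenAI's work.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Stand-in policy excerpt; OpenAI's internal policy text is not public in this form.
SAFETY_POLICY = (
    "Refuse requests that facilitate fraud, such as forging official documents "
    "or permits. Answer benign requests helpfully and completely."
)

def answer_with_policy(user_request: str) -> str:
    """Ask the model to weigh the request against the policy before answering."""
    response = client.chat.completions.create(
        model="o1",  # placeholder model id; substitute any available reasoning model
        messages=[
            # Reasoning models take instructions via the "developer" role.
            {"role": "developer", "content": f"Safety policy to follow:\n{SAFETY_POLICY}"},
            {"role": "user", "content": user_request},
        ],
    )
    return response.choices[0].message.content

print(answer_with_policy("How can I make a fake disabled parking placard?"))
```

With a policy-trained model, a request like the one above should be refused even without the prompt-level policy, which is the point of baking the behavior in during training.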
According to OpenAI's research, this approach helps o1 and o3 refuse more unsafe requests while answering benign ones more reliably. For example, when asked how to forge a disabled parking placard, the model recognized the request as inappropriate and declined to assist.
To achieve this, OpenAI uses synthetic data rather than human-written answers: an in-house AI model generates examples of how to reference the safety policy in a chain of reasoning, and another model, called a “judge,” evaluates the quality of those examples. Models such as o1 and o3 are then fine-tuned on the resulting data, which keeps costs and processing time down.
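OpenAI has not published code for this pipeline, but the description maps onto a simple generate-judge-filter loop. The sketch below is a rough approximation assuming the standard OpenAI Python SDK; the model ids, scoring prompt, score threshold, and output format are placeholders, not details taken from OpenAI's paper.

```python
import json
from openai import OpenAI

client = OpenAI()

def generate_example(prompt: str, policy: str) -> dict:
    """Have a 'generator' model produce a policy-referencing answer for a prompt."""
    completion = client.chat.completions.create(
        model="gpt-4o",  # placeholder generator model
        messages=[
            {"role": "system", "content": f"Cite the relevant policy while reasoning:\n{policy}"},
            {"role": "user", "content": prompt},
        ],
    )
    return {"prompt": prompt, "response": completion.choices[0].message.content}

def judge_example(example: dict, policy: str) -> float:
    """Have a 'judge' model score (0-10) how well the response follows the policy."""
    completion = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        messages=[
            {"role": "system", "content": (
                f"Score from 0 to 10 how well the answer follows this policy:\n{policy}\n"
                "Reply with the number only."
            )},
            {"role": "user", "content": json.dumps(example)},
        ],
    )
    try:
        return float(completion.choices[0].message.content.strip())
    except ValueError:
        return 0.0  # treat unparseable judgments as failures

def build_finetuning_set(prompts, policy, threshold=8.0, out_path="finetune.jsonl"):
    """Keep only highly rated examples and write them as chat fine-tuning JSONL."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            example = generate_example(prompt, policy)
            if judge_example(example, policy) >= threshold:
                record = {"messages": [
                    {"role": "user", "content": example["prompt"]},
                    {"role": "assistant", "content": example["response"]},
                ]}
                f.write(json.dumps(record) + "\n")
```

Because both the generator and the judge are models rather than human annotators, the dataset can be produced at scale, which is where the cost and time savings the article mentions come from.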
This approach not only makes OpenAI’s models safer, but also opens up new avenues for keeping AI aligned with human values. With o3 slated for release in 2025, OpenAI expects deliberative alignment to be a key part of maintaining safety as AI becomes more powerful and autonomous.