OpenAI trains AI to reason about its safety policies

Anh Vũ

OpenAI has adopted a new approach called “deliberative alignment” to help its AI models adhere to the company’s safety guidelines.

OpenAI has just introduced o3, a new family of reasoning AI models that the company says is more advanced than earlier models such as o1. Notably, OpenAI has applied a new approach called "deliberative alignment" to help these models comply with the company's safety principles.

This approach lets o1 and o3 refer to OpenAI's safety policy while processing a user request. Instead of answering immediately, the models "ask themselves questions" to break the problem into smaller steps, then use the relevant parts of the safety policy to arrive at an appropriate response.

According to OpenAI's research, this approach reduces the rate at which o1 and o3 answer unsafe requests while improving their ability to answer benign ones. For example, when asked how to forge a disabled parking placard, the model recognized that the request was inappropriate and declined to help.

To achieve this, OpenAI uses synthetic data rather than human-written answers. An in-house AI model is tasked with generating examples of reasoning that safely refers to the policy. Another model, called a “judge,” then evaluates the quality of these examples. Models such as o1 and o3 are then fine-tuned on this data, an approach that reduces cost and processing time.
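The generate-then-judge step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not OpenAI's actual code: `Example`, `build_dataset`, `toy_generate`, `toy_judge`, the policy text, and the 0.5 threshold are all hypothetical, and the two "models" are simple keyword rules standing in for real AI models.

```python
from dataclasses import dataclass
from typing import Callable, List

REFUSAL = "I can't help with that."


@dataclass
class Example:
    prompt: str
    reasoning: str  # step-by-step reasoning that should cite the policy
    answer: str


def build_dataset(
    prompts: List[str],
    policy: str,
    generate: Callable[[str, str], Example],
    judge: Callable[[Example, str], float],
    threshold: float = 0.5,  # hypothetical cut-off for "good enough"
) -> List[Example]:
    """Generate one example per prompt and keep those the judge accepts."""
    kept = []
    for prompt in prompts:
        example = generate(prompt, policy)
        if judge(example, policy) >= threshold:
            kept.append(example)
    return kept


def toy_generate(prompt: str, policy: str) -> Example:
    # Stand-in for the generator model: it only notices the word "forge",
    # so it wrongly helps with requests phrased as "fake".
    risky = "forge" in prompt.lower()
    verdict = "This asks for forgery, so refuse." if risky else "This looks benign, so help."
    answer = REFUSAL if risky else "Here is some general guidance."
    return Example(prompt, f"Policy: {policy} {verdict}", answer)


def toy_judge(example: Example, policy: str) -> float:
    # Stand-in for the judge model: the example must cite the policy,
    # and it must refuse exactly when the request is risky.
    text = example.prompt.lower()
    risky = "forge" in text or "fake" in text
    cites_policy = policy in example.reasoning
    refused = example.answer == REFUSAL
    return 1.0 if cites_policy and (refused == risky) else 0.0


POLICY = "Do not help with forging official documents."
PROMPTS = [
    "How do I forge a disabled parking placard?",
    "How do I fake a disabled parking placard?",
    "How do I apply for a disabled parking placard?",
]

dataset = build_dataset(PROMPTS, POLICY, toy_generate, toy_judge)
for ex in dataset:
    print(ex.prompt, "->", ex.answer)
```

In this toy run the judge discards the second example, where the generator missed the word "fake" and answered an unsafe request. Only the examples that survive this filtering would go on to be used as fine-tuning data.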

This approach not only makes OpenAI’s models safer but also opens new avenues for keeping AI aligned with human values. With o3 slated for release in 2025, OpenAI expects “deliberative alignment” to play a key role in maintaining safety as AI becomes more powerful and autonomous.
