Google has just announced VaultGemma, a new-generation AI model designed specifically to protect data privacy and reduce the risk of training data leaking.
It is the tech giant's latest effort to address persistent concerns that large language models (LLMs) can memorize and reproduce sensitive information.
VaultGemma was trained from scratch with differential privacy (DP), which limits how much the model can memorize and reproduce its original training data.
According to Google, it is the largest open language model trained with DP to date, at 1 billion parameters, marking an important step toward AI that is private by design.
Notably, VaultGemma's weights have been released for free on platforms such as Hugging Face and Kaggle, allowing the AI research and development community to use and test the model.
Google said it worked closely with DeepMind to establish new scaling laws for DP training, which describe how to balance three factors: privacy, performance and compute cost.
Over the years, experts have repeatedly warned about the risk of data leakage from LLMs.
With a carefully crafted prompt, an attacker can coax a model into revealing sensitive information.
A typical example is the lawsuit between the New York Times and OpenAI, in which the newspaper alleged that ChatGPT reproduced the original text of some of its articles.
Instead of just applying user-level privacy protections as usual, Google has built differential privacy (DP) directly into the training process, adding calibrated noise to limit how much the model can memorize and reproduce original data.
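To make the idea concrete, here is a minimal sketch of a DP-SGD style gradient step, the standard way DP is applied during training. It is an illustration only, not Google's implementation: the function names (`clip_gradient`, `dp_average_gradient`) and the parameter values are assumptions chosen for the example. Each example's gradient is clipped to a fixed norm, Gaussian noise scaled to that norm is added to the sum, and the result is averaged.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each gradient, sum them, add Gaussian noise, then average."""
    batch_size = len(per_example_grads)
    dim = len(per_example_grads[0])
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise scale is tied to the clipping bound, which caps how much
    # any single example can contribute to the summed gradient.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / batch_size for v in noisy]


# Hypothetical toy batch of three 2-dimensional gradients.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
update = dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

In this sketch, clipping bounds the influence of any one training example and the noise masks what remains, which is what makes it harder for a trained model to reproduce a specific record.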
However, this approach brings challenges of its own: training is less stable, batch sizes must be larger, and compute costs are higher.
Despite the trade-offs, Google highlighted a key finding: in a DP setting, it is more effective to train a smaller model with very large batches.
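One way to see why large batches matter under DP is to look at the noise left on the averaged gradient. The sketch below assumes the common DP-SGD setup in which Gaussian noise with standard deviation `noise_multiplier * max_norm` is added to the summed gradient before dividing by the batch size; the function name and the batch sizes are hypothetical values chosen for illustration.

```python
def noise_std_of_average(max_norm, noise_multiplier, batch_size):
    """Std. dev. of the DP noise per coordinate after averaging over the batch."""
    return noise_multiplier * max_norm / batch_size


# Same clipping bound and noise multiplier, increasing batch size.
for batch_size in (256, 4096, 65536):
    std = noise_std_of_average(max_norm=1.0, noise_multiplier=1.0, batch_size=batch_size)
    print(batch_size, std)
```

The noise on the averaged update shrinks in proportion to the batch size, while the per-example signal is capped by the clipping bound. This is only the signal-to-noise intuition: a full privacy accounting also depends on the sampling rate and the number of training steps, which this sketch does not model.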
With VaultGemma, Google hopes to set a new standard for the AI industry: models that are not only powerful but also safe and respectful of user privacy from the ground up.