The world's largest music data warehouse becomes a gold mine for artificial intelligence

Cát Tiên

Spotify's huge music data warehouse is becoming a gold mine for artificial intelligence, raising concerns about copyright and data mining.

A group of hackers has shocked the technology and music industries by announcing that it has collected and stored about 300 terabytes of data from Spotify, the world's largest music streaming platform.

The trove includes tens of millions of audio files, album cover images and a large volume of metadata, and is being published through Anna's Archive, an open-source search engine for shadow libraries.

According to the published information, Anna's Archive currently holds 86 million audio files and more than 256 million rows of track metadata, with a total size of about 300TB.

The music metadata includes the artist's name, composer, producer, genre, duration, release date, and the ISRC, the international identification code for each recording.

With 186 million ISRCs, the platform claims to hold the world's largest public music metadata database.

The group behind Anna's Archive said its goal is to build a comprehensive "preservation archive" of music that anyone with enough storage space can copy.

According to the plan, in addition to the metadata already released, the 86 million music files, which account for about 99.6% of all listens on Spotify, will be released via torrent files, ordered by popularity.

This move is especially noteworthy amid the rapid development of artificial intelligence. AI companies now depend heavily on large-scale data, from text and images to audio, to train their models.

Such a huge music data warehouse could become an attractive resource for training AI models for music generation, audio analysis or multimodal tasks, heightening the existing tension between the AI industry and copyright owners.

Spotify confirmed that it has detected and disabled accounts involved in illegal data copying, and deployed additional protections.

The company said a preliminary investigation showed that a third party had collected public metadata and used illegal means to bypass its digital rights management (DRM) system, thereby gaining access to some audio files.

Anna's Archive works as a search engine, helping users access content stored at other sources on the internet, and asserts that the platform itself does not directly store copyright-infringing content.

Previously, the platform's collection consisted mainly of books, research papers and academic documents. The expansion into music and its metadata marks a new step, and Anna's Archive has become a regular target of takedown requests from copyright owners.

The group operating Anna's Archive believes that existing music libraries focus too heavily on famous artists and high-quality recordings, making it difficult to preserve the entire history of humanity's recorded music.

By relying on the breadth of Spotify's catalog and on Spotify's popularity index, they say they want to create a representative collection of all recordings released to date.

Although justified in the name of cultural preservation, this 300TB data warehouse still raises big questions about the line between archiving, copyright infringement and data mining in the AI era, where the value of data is becoming increasingly sensitive and controversial.
