The confrontation between content platforms and AI companies continues to heat up: Reddit recently announced that it will block the Wayback Machine, the Internet Archive's well-known web archiving tool, which crawls and stores most of the posts, comments and public information on the platform.
Reddit says it has evidence that some AI companies are using the Wayback Machine to collect its data, avoiding licensing fees and exploiting user information.
This decision means that the Wayback Machine will no longer be able to archive posts, comments or profile information from Reddit, except for content appearing on the Reddit.com homepage.
The move comes as the social media platform tightens control over its data and signals that it is willing to work with AI companies, but only if they pay for access.
Previously, Reddit had affirmed that it would not restrict "good faith actors" such as the Internet Archive. That stance changed after it found that some AI companies were exploiting its data through the Wayback Machine without authorization.
Internet Archive and Wayback Machine
Founded in 1996 in the US by computer engineer Brewster Kahle, the Internet Archive is a non-profit organization that aims to build a comprehensive, public archive of the Internet. Its best-known service is the Wayback Machine, a tool that lets users view archived versions of websites as they appeared in the past.
The Internet Archive provides free access to many types of digital content, including websites, software, music, movies and print publications.
Most of the data is collected automatically by web crawlers in order to preserve public information and combat the "evaporation" of digital data.
According to Brewster Kahle, the Internet Archive is not only a digital library but also a way of protecting culture against technological change. Kahle has compared the project to an effort to recreate a modern-day "Library of Alexandria", ensuring that human knowledge is preserved for future generations.
Reddit's blocking of the Wayback Machine highlights new tensions in the AI era, as the line between protecting data ownership and maintaining an open internet becomes increasingly difficult to draw.