PROMISE project

A federal strategy for the preservation of the Belgian web

The project Preserving Online Multiple Information: towards a Belgian strategy (PROMISE) started on 1 June 2017 and aimed to develop a federal strategy for the preservation of the Belgian web.

Scope of this project

The web has become a central means of communication in our everyday lives. Today it is considered a publication channel in its own right. With the dawn of the digital era and the rise of online publications, the notion of a ‘publication’ has widened to include material that exists only in digital form and on the web. Just as the preservation of print publications is guaranteed by legal deposit, a long-term preservation policy needs to be developed for the web.

A research project was initiated to formulate an answer to the urgent question of preserving the Belgian web for future generations, as an important part of Belgian publishing and Belgian history.

As digital publications are naturally linked to the web because of the technologies that are used to create and disseminate them, KBR is entitled to collect and create an inventory of the websites that are within the scope of its mission. The Royal Decree that determines the mission of the library was adapted to this end on 25 December 2016. This new mandate is linked to a legislative proposal, currently being prepared, to extend legal deposit to digital publications. This will enable KBR to respond to the pressing need to preserve Belgian digital publications.

Phases of the project

  1. Identify best practices in the field of web-archiving
  2. Set up a pilot project for the archiving of the Belgian web
  3. Identify use cases for the scientific study of the Belgian web
  4. Make recommendations for the implementation of a sustainable web-archiving service

During the two-year project, a research team drafted a selection policy for the websites to be archived, proposed a legal framework to determine the roles and responsibilities of the State Archives and KBR, and developed a prototype web archive that was tested and evaluated by a panel of users.

The research results of this project are currently being used to develop and implement a web archive within KBR. The BESOCIAL research project (2020-2022) complements the PROMISE project by addressing the archiving of social media.

Project researcher: Friedel Geeraert

Partners

The PROMISE project is financed by the Belgian Science Policy Office (Belspo) as part of the BRAIN.be programme and is coordinated by KBR. The State Archives, the universities of Ghent (Research Group for Media, Innovation and Communication Technologies; Ghent Centre for Digital Humanities) and Namur (Research Centre in Information, Law and Society) and the university college Bruxelles-Brabant (Unité de Recherche et de Formation en Sciences de l’Information et de la Documentation) are partners in the project.

Promisebot

Promisebot is the web crawler (also called a spider) used in this project.

Promisebot’s crawl process begins with a list of web page URLs. As Promisebot visits each of these pages, it detects the links (SRC and HREF attributes) on the page and adds them to its list of pages to crawl.
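The link-discovery step described above can be sketched in Python. This is only an illustration and not Promisebot's actual code: the names `LinkCollector` and `discover_links`, the example.be URLs and the sample page are hypothetical, and the sketch relies only on the standard-library `html.parser` and `urllib.parse` modules.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the values of HREF and SRC attributes found on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so HREF and href both match.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def discover_links(page_url, html, frontier, seen):
    """Add links that were not seen before to the list of pages to crawl."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    for link in collector.links:
        if link not in seen:
            seen.add(link)
            frontier.append(link)


# Hypothetical seed list and page content, for illustration only.
frontier = deque(["https://example.be/"])
seen = set(frontier)
page = '<a href="/about">About</a><img src="logo.png"><a href="/about">Again</a>'
discover_links("https://example.be/", page, frontier, seen)
print(list(frontier))
```

The `seen` set keeps a link that appears several times from being queued more than once. A real crawler would also fetch each queued page and apply scope rules, which this sketch leaves out.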

Promisebot identifies itself as “Mozilla/5.0 (compatible; promisebot/1.0 +https://www.kbr.be/en/promise-project)” in the User-Agent HTTP request header. Promisebot uses these IP addresses and host names:

  • 91.121.67.124 – ns301053.ip-91-121-67.eu
  • 172.18.16.11 – ea06c202.private.ugent.be
  • 193.191.148.229 – promise.ilabt.imec.be
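Site operators who want to recognise Promisebot in their access logs could combine the User-Agent string with the addresses listed above. The following Python sketch shows one possible check; the function name `looks_like_promisebot` and the regular expression are assumptions made for illustration and are not part of the project.

```python
import re

# IP addresses taken from the list above.
PROMISEBOT_IPS = {"91.121.67.124", "172.18.16.11", "193.191.148.229"}

# Product token as it appears in the User-Agent string (assumed pattern).
PROMISEBOT_TOKEN = re.compile(r"\bpromisebot/\d+(\.\d+)*", re.IGNORECASE)


def looks_like_promisebot(user_agent, remote_ip):
    """Return True if both the User-Agent and the source IP match."""
    has_token = bool(PROMISEBOT_TOKEN.search(user_agent))
    return has_token and remote_ip in PROMISEBOT_IPS


ua = "Mozilla/5.0 (compatible; promisebot/1.0 +https://www.kbr.be/en/promise-project)"
print(looks_like_promisebot(ua, "193.191.148.229"))  # True
print(looks_like_promisebot(ua, "203.0.113.7"))      # False
```

Checking the IP address as well as the User-Agent matters because any client can send an arbitrary User-Agent header.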

If you detect any unexpected behaviour, please contact us, indicating the full User-Agent and, if possible, the IP address.

To prevent overloading web servers, Promisebot should not access your site more than once every few seconds on average. However, due to network delays, the rate may appear slightly higher over short periods.

Promisebot honours robots.txt, which you can use to allow or deny access to (parts of) your site, or to change the request rate.
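As an illustration, the Python sketch below parses a hypothetical robots.txt with the standard-library `urllib.robotparser` module and shows how its rules would be read for a crawler named `promisebot`. The paths, the delay value and the example.be URLs are invented for the example. Using `promisebot` as the User-agent token is an assumption based on the User-Agent string above, and `Crawl-delay` is a widely used but non-standard directive, so whether Promisebot reads that particular directive is also an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path and the delay are examples.
ROBOTS_TXT = """\
User-agent: promisebot
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# /private/ is closed to promisebot; the rest of the site stays open.
print(parser.can_fetch("promisebot", "https://example.be/private/report.html"))
print(parser.can_fetch("promisebot", "https://example.be/news/"))

# Requested minimum number of seconds between two requests.
print(parser.crawl_delay("promisebot"))
```

An empty `Disallow:` line, as in the second group, allows all other crawlers to access the whole site.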