The web has become a central means of communication in everyday life, and it is now considered a publication channel in its own right. With the dawn of the digital era and the rise of online publications, the notion of a ‘publication’ has widened and found its place in the digital world and on the web. Just as the preservation of print publications is guaranteed by legal deposit, a long-term preservation policy needs to be developed for the web.
A research project has been initiated to formulate an answer to the urgent question of preserving the Belgian web for future generations, as an important part of Belgian publishing and Belgian history.
As digital publications are naturally linked to the web because of the technologies that are used to create and disseminate them, the Royal Library is entitled to collect and create an inventory of the websites that are within the scope of its mission. The Royal Decree that determines the mission of the library was adapted to this end on 25 December 2016. This new mandate is linked to a legislative proposal, currently being prepared, to extend the legal deposit to digital publications. This will enable the Royal Library to respond to the pressing need to preserve Belgian digital publications.
The project Preserving Online Multiple Information: towards a Belgian strategy (PROMISE) started on 1 June 2017 and aims to develop a federal strategy for the preservation of the Belgian web.
The different phases of the project are:
- Identify best practices in the field of web-archiving
- Set up a pilot project for the archiving of the Belgian web
- Identify use cases for the scientific study of the Belgian web
- Make recommendations for the implementation of a sustainable web-archiving service
During the two-year project, a research team will draft a selection policy for the websites that need to be archived, propose a legal framework to determine the roles and responsibilities of the State Archives and the Royal Library of Belgium, and develop a prototype web archive that will be tested and evaluated by a panel of users.
Project researcher: Friedel Geeraert
Download the presentation (pdf) in which the project is explained.
The PROMISE project is financed by the Belgian Science Policy Office (Belspo) as part of the BRAIN.be programme and is coordinated by the Royal Library. The State Archives, the universities of Ghent (Research Group for Media, Innovation and Communication Technologies; Ghent Centre for Digital Humanities) and Namur (Research Centre in Information, Law and Society) and the university college Bruxelles-Brabant (Unité de Recherche et de Formation en Sciences de l’Information et de la Documentation) are partners in the project.
Promisebot is a web crawler or spider used in this project.
Promisebot’s crawl process begins with a list of web page URLs. As Promisebot visits each of these pages, it detects the links (SRC and HREF attributes) on the page and adds them to its list of pages to crawl.
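The link-discovery step described above can be illustrated with a minimal Python sketch. It is not Promisebot's actual code: the names `LinkExtractor` and `extend_frontier`, the example.org URLs and the choice to drop URL fragments are all assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs found in href and src attributes (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases attribute names, so HREF and SRC match too.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links and drop the #fragment (an assumption).
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extend_frontier(frontier, seen, base_url, html):
    """Add every not-yet-seen link on the page to the list of pages to crawl."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    for url in parser.links:
        if url not in seen:
            seen.add(url)
            frontier.append(url)


# Usage: start from a seed list and process one already-downloaded page.
frontier = deque(["https://example.org/"])
seen = set(frontier)
page = '<a href="/about#team">About</a><img src="logo.png"><a href="/">Home</a>'
extend_frontier(frontier, seen, "https://example.org/", page)
print(list(frontier))
```

The `seen` set keeps a page from being queued twice, which is why the link back to the home page is not added again.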
Promisebot identifies itself in the User-Agent HTTP request header as: “Mozilla/5.0 (compatible; promisebot/1.0 +https://www.kbr.be/en/promise-project)”
Promisebot uses these IP addresses and host names:
- 184.108.40.206 - ns301053.ip-91-121-67.eu
- 172.18.16.11 - ea06c202.private.ugent.be
- 220.127.116.11 - promise.ilabt.imec.be
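Site operators who want to spot Promisebot in their access logs could match the User-Agent string quoted above. The following Python sketch is only one possible approach; the function name `is_promisebot` and the regular expression are assumptions, not something published by the project.

```python
import re

# The User-Agent string quoted in the text above.
PROMISEBOT_UA = (
    "Mozilla/5.0 (compatible; promisebot/1.0 "
    "+https://www.kbr.be/en/promise-project)"
)

# Hypothetical pattern: the product token "promisebot/" followed by a version.
PROMISEBOT_TOKEN = re.compile(r"\bpromisebot/\d+(\.\d+)*", re.IGNORECASE)


def is_promisebot(user_agent):
    """Return True if the User-Agent header value names Promisebot."""
    return bool(user_agent) and PROMISEBOT_TOKEN.search(user_agent) is not None


print(is_promisebot(PROMISEBOT_UA))
print(is_promisebot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Because any client can send an arbitrary User-Agent header, a match is best cross-checked against the IP addresses and host names listed above.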
If you detect any unexpected behaviour, please contact us, indicating the full User-Agent and, if possible, the IP address.
To prevent overloading web servers, Promisebot shouldn't access your site more than once every few seconds on average. However, due to network delays, it's possible that the rate will appear to be slightly higher over short periods.
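A per-host delay of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not Promisebot's implementation: the class name `PolitenessTimer` and the 3-second interval are hypothetical, since the text only says "every few seconds".

```python
import time
from urllib.parse import urlsplit


class PolitenessTimer:
    """Space out requests to the same host by a minimum interval (hypothetical)."""

    def __init__(self, min_interval=3.0, clock=time.monotonic):
        self.min_interval = min_interval  # assumed value, in seconds
        self.clock = clock                # injectable so the logic can be tested
        self.next_allowed = {}            # host -> earliest time of next request

    def seconds_to_wait(self, url):
        """Reserve the next request slot for the URL's host and return the delay."""
        host = urlsplit(url).netloc.lower()
        now = self.clock()
        start = max(now, self.next_allowed.get(host, now))
        self.next_allowed[host] = start + self.min_interval
        return start - now


# Usage: sleep for the returned delay before fetching each URL.
timer = PolitenessTimer(min_interval=3.0)
for url in ["https://example.org/a", "https://example.org/b"]:
    delay = timer.seconds_to_wait(url)
    print(f"wait {delay:.1f}s before fetching {url}")
```

The timer only schedules when requests are sent; the server sees them after variable network delays, which is why the observed rate can look slightly higher over short periods.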
Promisebot honours robots.txt, which you can use to allow or deny access to (parts of) your site, or to change the request rate.
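As an illustration, the Python sketch below checks a sample robots.txt with the standard-library `urllib.robotparser` module to see how its rules would apply to a crawler named `promisebot`. The sample rules and the example.org URLs are assumptions, and so is the use of `Crawl-delay` to change the request rate: the text does not say which directive Promisebot reads for that purpose.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Promisebot out of /private/ and ask for
# at most one request every 10 seconds (Crawl-delay support is assumed).
SAMPLE_ROBOTS_TXT = """\
User-agent: promisebot
Disallow: /private/
Crawl-delay: 10
"""

rules = RobotFileParser()
rules.parse(SAMPLE_ROBOTS_TXT.splitlines())

# Check how the rules apply to individual URLs for this user agent.
print(rules.can_fetch("promisebot", "https://example.org/index.html"))
print(rules.can_fetch("promisebot", "https://example.org/private/report.html"))
print(rules.crawl_delay("promisebot"))
```

Testing a draft robots.txt this way before publishing it helps confirm that the rules block only the intended parts of a site.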