Web archive
The National Library preserves Luxembourg’s web as part of the country’s digital heritage. Websites constantly evolve, and without intervention, past versions disappear. Our web archive captures and stores websites at different points in time, documenting changes in Luxembourg’s society, culture and knowledge. It offers a valuable service to website owners by preserving the content they create and represents a unique resource for researchers and historians, ensuring that future generations can explore and study the digital past.
General information
The BnL preserves a broad range of websites to document Luxembourg’s digital heritage. This includes all sites with a “.lu” top-level domain, as well as websites published in Luxembourg. The archive also captures websites that were created by Luxembourgers abroad or that have a strong connection to Luxembourg. In addition to large-scale web crawls, the BnL curates special collections on specific themes or events that require closer attention and more frequent captures. By combining different methods (broad domain crawls, targeted thematic collections and time-sensitive event-based captures), the BnL ensures that significant online content is preserved for future generations.
If you manage a website or contribute content, let us know! If you come across a Luxembourgish site that should be preserved for the future, we’d love to hear about it.
Limitations and completeness
The BnL strives to capture websites as comprehensively as possible. However, technical and practical constraints mean that not every website can be archived in full and that not every change to a website can be captured. Websites are generally selected based on domain lists, as well as lists compiled from submissions by website owners and subject experts.
Certain limitations affect the completeness of the archive. Some websites require advanced resources to be properly captured, and in some cases, only partial snapshots can be preserved. Social media platforms present additional challenges due to their rapidly changing content. Their restrictions on automated tools such as web crawlers make it difficult to collect data from most platforms systematically. Legal and ethical considerations also play a role. While content is never deleted from the archive, access to certain materials may be restricted if required by law. Privacy concerns are taken into account but do not automatically override the public’s right to access historical information.
Despite these challenges, the BnL continues to expand its web archive.
Archiving methods
The BnL’s web archive goes beyond simple screenshots or code – it aims to capture websites as they are, preserving their structure, content and functionality in as much detail as possible.
The process begins with a web crawler, an automated program that navigates and scans websites much like a search engine. This crawler systematically downloads all publicly accessible elements, including text, images, documents and layout. The result is a complete archival copy of the site, which can be browsed just like the original.
Over time, as multiple versions of a website are archived, the collection forms a timeline of its evolution. These past versions can be explored through our web archive, allowing users to track changes and developments across the years.
Information for webmasters
Website archiving is conducted in accordance with the Law of 25 June 2004 on the reorganisation of state cultural institutes and the modified Grand-Ducal Regulation of 6 November 2009 on legal deposit. These regulations mandate the archiving of publicly accessible digital content.
The web archive’s spider respects the robots.txt file with a few exceptions. Any file necessary for the complete display of a webpage (e.g., CSS, images) is downloaded even if it is listed in the robots.txt exclusion list. Additionally, all landing pages for all sites are collected regardless of the robots.txt settings. The BnL reserves the right to modify this policy as needed, in accordance with the law.
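The robots.txt policy described above can be sketched in a few lines of Python. This is only an illustration, not the BnL's actual crawler logic: the function name `would_collect`, the list of file extensions treated as essential for display, and the example URLs are assumptions made for this sketch. The user agent token is the one mentioned elsewhere on this page.

```python
from typing import Iterable
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Assumption: file types treated as "necessary for the complete display" of a page.
ESSENTIAL_SUFFIXES = (".css", ".png", ".jpg", ".jpeg", ".gif", ".svg")


def would_collect(url: str, robots_lines: Iterable[str],
                  user_agent: str = "NLUX_IAHarvester") -> bool:
    """Hypothetical helper: apply robots.txt with the two exceptions described above."""
    path = urlparse(url).path
    # Exception 1: landing pages are collected regardless of robots.txt.
    if path in ("", "/"):
        return True
    # Exception 2: files needed to display a page are collected even if excluded.
    if path.lower().endswith(ESSENTIAL_SUFFIXES):
        return True
    # Otherwise, follow the site's robots.txt rules.
    parser = RobotFileParser()
    parser.parse(list(robots_lines))
    return parser.can_fetch(user_agent, url)


# Illustrative robots.txt content and URLs (example.lu is a placeholder).
ROBOTS = ["User-agent: *", "Disallow: /private/"]

for candidate in (
    "https://example.lu/",
    "https://example.lu/news/article.html",
    "https://example.lu/private/report.html",
    "https://example.lu/private/style.css",
):
    print(candidate, would_collect(candidate, ROBOTS))
```

In this sketch, only the HTML page under `/private/` is skipped. The stylesheet in the same directory is still collected because it is needed for display.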
To ensure your website is crawled correctly, please refer to our compliance guidelines. They will help you preserve your website for future generations.
Frequently asked questions
Where can I view the web archive and why is it not available online?
The web archive can only be accessed on-site at the National Library of Luxembourg, using the computers provided in the Reading Room. This is due to copyright restrictions – making archived websites publicly accessible online would infringe upon the rights of website owners.
How are websites selected for archiving?
The BnL employs three methods for web harvesting:
- Large-scale crawls: Conducted four times a year, these cover all “.lu” websites and other domains suggested by website owners and third-party contributors. These crawls establish a snapshot of the Luxembourg web every three months, but may not capture rapidly changing or short-lived content.
- Event collections: Focused on events of national importance, a selection of websites is crawled at a high frequency over a limited period of time. Typical event crawls cover topics such as election campaigns, natural disasters or the Covid-19 pandemic.
- Thematic collections: Based on evolving lists, these collections rely on the support and contributions of subject experts from all fields of knowledge. More websites are added over time, and websites that have since disappeared from the live web remain available in the web archive.
This combination helps ensure a more complete picture of Luxembourg’s digital heritage, though technical and logistical limitations mean that some content will inevitably be missed.
How do I know if my website has been archived?
If your website has a “.lu” domain, it is automatically included in the regular large-scale crawls. Websites with other domain extensions (e.g., “.com”, “.de”, “.eu”) are not archived unless they have been manually added to our crawling lists, either through our own research or through suggestions from website owners. To make sure that your website is included in the web archive, please contact us or use the suggestion form. All suggestions are checked for relevance before being added to our seed lists.
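As a rough illustration of the inclusion rule described above, the following Python sketch checks whether a URL would fall within the regular large-scale crawls. The function name `in_regular_crawl` and the seed list contents are hypothetical; the BnL's actual seed lists are not published on this page.

```python
from typing import Iterable
from urllib.parse import urlparse


def in_regular_crawl(url: str, seed_hosts: Iterable[str] = ()) -> bool:
    """Hypothetical check: ".lu" hosts are always included, others only via the seed list."""
    host = (urlparse(url).hostname or "").lower()
    if host.endswith(".lu"):
        return True
    # Non-".lu" hosts must have been added manually to the seed list.
    return host in {seed.lower() for seed in seed_hosts}


# Placeholder seed list for illustration only.
seeds = ["example.com"]

print(in_regular_crawl("https://www.example.lu/page"))       # ".lu" host
print(in_regular_crawl("https://example.com/", seeds))        # manually added host
print(in_regular_crawl("https://example.org/", seeds))        # not on the list
```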
Website owners can check their server logs for requests from our web crawler, identified by the user agent “NLUX_IAHarvester”, which includes a link to further information. If you are unsure whether your site has been archived, or if you want to let us know about your website, you are welcome to contact us.
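For website owners who would like to check their logs, a minimal Python sketch is shown here. It assumes a plain-text access log in which the user agent appears somewhere on each request line, as in the common "combined" log format. The sample lines and the user agent strings inside them are invented for illustration and are not the crawler's full, real user agent string. Only the token “NLUX_IAHarvester” comes from this page.

```python
from typing import Iterable, List

CRAWLER_TOKEN = "NLUX_IAHarvester"  # user agent token named on this page


def find_crawler_hits(lines: Iterable[str], token: str = CRAWLER_TOKEN) -> List[str]:
    """Return the log lines that mention the crawler's user agent token."""
    needle = token.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


# Invented sample log lines (assumed "combined" format, placeholder addresses).
SAMPLE_LOG = [
    '192.0.2.10 - - [12/Mar/2024:08:15:02 +0100] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; NLUX_IAHarvester)"',
    '198.51.100.7 - - [12/Mar/2024:08:15:09 +0100] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '192.0.2.10 - - [12/Mar/2024:08:15:11 +0100] "GET /style.css HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; NLUX_IAHarvester)"',
]

hits = find_crawler_hits(SAMPLE_LOG)
print(f"{len(hits)} request(s) from the web archive crawler")
for hit in hits:
    print(hit)
```

To scan a real log, the same function can be given an open file object, for example `find_crawler_hits(open("access.log", encoding="utf-8", errors="replace"))`, where the file name is a placeholder for your server's log path.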
Can I request that my website not be archived?
The BnL follows legal deposit regulations, meaning that all publicly accessible Luxembourg-related websites are subject to archiving. We do not hack, bypass privacy settings, or collect personal information beyond what is already visible to the public.
Our web crawler only collects publicly available content and generally respects robots.txt exclusions, except for files essential to a website’s display (e.g., CSS, images). Websites that have been taken offline due to legal issues may still be archived for research purposes but will not be viewable in the public archive.
If you would like archived versions of your website, or other web content, to be excluded from our web archive, send us a takedown request via email and include:
- the URL or URLs of the websites and web contents;
- the time period that you think should be excluded;
- the reason why you think these URLs should be excluded;
- and any other information that you think would be helpful for us to better understand your request.
We will then start a review process. Please note that we cannot guarantee the outcome of your request in advance.
Do you archive social media?
No, social media is not systematically archived, but we may capture select content on a case-by-case basis when relevant to Luxembourg’s digital heritage.
Social media platforms also place restrictions on automated tools such as web crawlers, which make it difficult to collect data systematically.