It took nearly five years into the internet’s life before anyone made a concerted effort to archive it. Much of our earliest online activity has disappeared.

This post was originally written by Stephen Dowling for BBC Future.

In 2005, student Alex Tew had a million-dollar brainwave.

The 20-year-old was playing around with ideas to pay for a looming three-year business degree; Tew was already worrying that the overdraft he had would mushroom. So he scribbled on a pad: “How to become a millionaire.”

Twenty minutes later he had what he thought was the answer.

Tew set up a website called the Million Dollar Homepage. The site’s model was almost obscenely simple: on it were a million pixels of ad space, available to buy in blocks of 100 at $1 a pixel. Once you bought them they were yours forever. When the millionth pixel was sold, Tew would be a millionaire. At least, that was the plan.

The Million Dollar Homepage launched on 26 August 2005, after Tew had spent the grand sum of 50 euros on registering the domain and setting up the hosting. Advertisers bought pixels and provided a link, tiny image and a short amount of text for when the cursor hovered over their image.

After little more than a month, thanks to word-of-mouth and ever-increasing media attention, Tew’s homepage had raised more than $250,000 (£140,000). In January 2006, the last 1,000 pixels were sold at auction for $38,100 (£21,500); Tew had indeed made his million.

The Million Dollar Homepage is still online, nearly a decade and a half after it was created. Many of the customers – which included the likes of the UK’s The Times newspaper, travel service Cheapflights.com, online portal Yahoo! and rock duo Tenacious D – have had 15 years of advertising off that one-off payment. The site still has several thousand viewers every day; it has probably been a very good investment.


The Million Dollar Homepage is now full of links to sites which no longer exist (Credit: Million Dollar Homepage)

Tew, who now runs the meditation and mindfulness app Calm, indeed became a millionaire. But the homepage he created has also become something else: a living museum to an earlier internet era. Fifteen years may not seem a long time, but in terms of the internet it is like a geological age. Some 40% of the links on the Million Dollar Homepage now link to dead sites. Many of the others now point to entirely new domains, their original URL sold to new owners.

The Million Dollar Homepage shows that the decay of this early period of the internet is almost invisible. In the offline world, the closing of, say, a local newspaper is often widely reported. But online sites die, often without fanfare, and the first inkling you may have that they are no longer there is when you click on a link to be met with a blank page.

***

Around a decade ago, I spent two years working on a rock music blog and on the music section of AOL, the sprawling internet pioneer now owned by US phone company Verizon. I edited or wrote hundreds of live reviews, music news stories, artist interviews and listicles. Facebook and Twitter were already massive audience drivers, and smartphones were connecting us to the Web between work and home; surfing the Web had become a round-the-clock activity.


You could, quite reasonably, assume that if I ever needed to show proof of my time there it would only be a Google search away. But you’d be wrong. In April 2013, AOL abruptly closed down all its music sites, taking with them the collective work of dozens of editors and hundreds of contributors over many years. Little of it remains, aside from a handful of articles saved by the Internet Archive, a San Francisco-based non-profit foundation set up in the late 1990s by computer engineer Brewster Kahle.

It is the most prominent of a clutch of organisations around the world trying to rescue some of the last vestiges of the first decade of humanity’s internet presence before it disappears completely.

Dame Wendy Hall, the executive director of the Web Science Institute at the University of Southampton, is unequivocal about the archive’s work: “If it wasn’t for them we wouldn’t have any” of the early material, she says. “If Brewster Kahle hadn’t set up the Internet Archive and started saving things – without waiting for anyone’s permission – we’d have lost everything.”


AOL shut its music sites in 2013, deleting years of music coverage from around the world (Credit: Getty Images)

Dame Wendy says archives and national libraries had experience saving books, newspapers and periodicals because print had been around so long. But the arrival of the internet – and how quickly it became a mass form of communication and expression – may have taken them by surprise. The attempts to archive the internet have, in many areas, been playing catch-up ever since. “The British Library had to have a copy of every local newspaper published,” she says. As the newspapers have gone from print to the Web, the archiving takes a different form. Are these websites as vital a resource as the papers which preceded them?

Newspaper archives are vulnerable, too, to being lost when the publications are closed down or merged with other titles. “Most newspapers, I imagine, will have some sort of archive,” she says. “But that can be lost unless it is archived properly.”


One major problem with trying to archive the internet is that it never sits still. Every minute – every second – more photos, blog posts, videos, news stories and comments are added to the pile. While digital storage has fallen drastically in price, archiving all this material still costs money. “Who’s going to pay for it?” asks Dame Wendy. “We produce so much more material than we used to.”

In the UK, the role of digital conservation has partly fallen to the British Library. The library runs the UK Web Archive, which has been collecting websites by permission since 2004. The archive’s engagement manager Jason Webber says the problem is much bigger than most people realise.


Very little of the content from the earliest days of the Web – the era of messageboards and internet cafes – now remains (Credit: Getty Images)

“It’s not only the early material. Most of the internet is not being stored,” he says.

“The Internet Archive first started archiving pages in 1996. That’s five years after the first webpages were set up. There’s nothing from that era that was ever copied from the live web.” Even the first web page, set up in 1991, no longer exists; the page you can view on the World Wide Web Consortium’s website is a copy made a year later.

For the first five years of the Web, much of the material published in Britain sat on sites ending in .ac.uk – articles written by academics. It was only in 1996 that more general sites began to appear, as commercial websites started outnumbering academic ones.


The British Library does one “domain crawl” every year – saving anything that is published in the UK. “We try and get everything, but we do only do it once a year. But the cap for a lot of these sites is set at 500MB; that covers a lot of smaller sites, but you only have to have a few videos in there and that limit gets reached pretty quickly.” News websites like BBC News, however, do get crawled more often. The library, Webber says, has tried to build as complete a picture as possible of events such as Brexit, the London 2012 Olympics and the 100th anniversary of World War One.

“I think there’s been very low level of awareness that anything is missing,” Webber says. “The digital world is very ephemeral, we look at our phones, the stuff on it changes and we don’t really think about it. But now people are becoming more aware of how much we might be losing.”

But, Webber says, the only material organisations have the right to gather is what is publicly viewable; an even bigger amount of culturally or historically important data is sitting in people’s private archives, such as their hard drives. And few of us are keeping those for posterity.

“The British Library is full of letters between people. There are exchanges between politicians, or love letters, and these things are really important to some people.”


Archives knew the importance of saving newspapers but were slower to react to the rise in online material (Credit: Getty Images)

We consider the material we post onto social networks as something that will always be there, just a click of a keyboard away. But the recent loss of some 12 years of music and photos on the pioneering social site MySpace – once the most popular website in the US – shows that even material stored on the biggest of sites may not be safe.

And even Google’s services are not immune. Google+, the search giant’s attempt at a Facebook-rivalling social network, closed on 2 April 2019. Did all its users back up the photos and memories they shared on it?

“Putting your photos on Facebook is not archiving them, because one day Facebook won’t exist,” says Webber. If you have any doubt about the temporary nature of the Web, take a few minutes to trawl through the Million Dollar Homepage. It is a testament to how quickly our online past is fading away.

There is another side to data loss. Dame Wendy points out that not archiving stories from news websites could lead to a selective view of history – new governments choosing not to save stories or archives which have cast them in a poor light, for instance.


“As soon as there’s a change of government, or restructuring of quangos, sites are closed down,” says Jane Winters, a professor of digital humanities at the University of London. “Or look at election campaigning sites, which by their nature are set up to be temporary.”

Sometimes the sites that are lost echo even more seismic changes; the deaths and births of nations themselves. “It happened with Yugoslavia; .yu was the top-level domain for Yugoslavia, and that ended when it collapsed. There’s a researcher who is trying to rebuild what was there before the break-up,” she says.

“The political is so often tied into the technical.”

There is, perhaps, a slight silver lining. “I come from a history background, and we’ve always had to deal with gaps in the historical records, some of which we know about, and some we just have no idea about.”

Dame Wendy Hall also sees parallels with the physical. When she was 15, in the late 1960s, she appeared as part of the audience in a taping of the BBC’s music show Top of the Pops.

The show was broadcast on Christmas Day. “The TV was on, and my mother said ‘There you are!’ But I missed it. And I’ve since gone to the BBC and tried to get a copy of it – they taped over it. I never got to see it.”