If we're talking about just another bot like GoogleBot going out there to look at what it can find, then couldn't they be blocked via something like the .htaccess file?
Publishing ANYTHING on .uk? From now, Big Library gets copies
On the same day that thousands of public sector bods will go on strike in a row over pay, pensions and working conditions, new regulations will come into force at midnight tonight allowing the British Library to begin scraping content from UK websites. Under the rules - known as legal deposit - the country's biggest collector …
-
-
Friday 5th April 2013 11:15 GMT Vimes
Also: I hope they don't plan on using cloud-based services, such as those offered by the likes of Amazon, to do the scraping, and will instead rely on their own servers.
I've already got Amazon on the naughty step thanks to hacking attempts coming from them, and the only other thing that would distinguish the legitimate scraper from the rest in the access log would be the user agent string - and this can be easily forged.
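Since the user agent string can be forged, one common way to tell a genuine crawler from an impostor is a forward-confirmed reverse DNS check: look up the hostname for the connecting IP, confirm it sits under a domain the crawler operator controls, then resolve that hostname again and confirm it maps back to the same IP. The sketch below is illustrative only. The function name and the idea of a list of allowed domain suffixes are assumptions, and nothing in this thread says which hostnames the British Library's crawler would resolve to. Note that `socket.gethostbyname_ex` only covers IPv4.

```python
import socket


def is_verified_crawler(ip, allowed_suffixes,
                        reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a client IP address.

    allowed_suffixes is a hypothetical list of domains the crawler
    operator is assumed to control, e.g. ["example.org"].
    """
    try:
        # Reverse lookup: IP -> hostname (PTR record).
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False

    hostname = hostname.rstrip(".").lower()
    # The hostname must be the allowed domain itself or a subdomain of it.
    if not any(hostname == s or hostname.endswith("." + s)
               for s in allowed_suffixes):
        return False

    try:
        # Forward lookup: hostname -> list of IPv4 addresses.
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # Only trust the PTR record if the forward lookup confirms it.
    return ip in addresses
```

The two lookup functions are parameters so the check can be exercised without touching the network; in real use the defaults would be left alone.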
-
-
-
Friday 5th April 2013 13:26 GMT Vimes
Re: Errrr presumably these are excluded
Somebody has been paying too much attention to certain ideas put out as April fools jokes...
http://gizmodo.com/5777429/the-entire-internet-on-a-floppy-disk
Also found this when looking for a link like the one above:
http://www.w3schools.com/downloadwww.htm
:)
-
-
Friday 5th April 2013 11:27 GMT Anonymous Coward
What about our copyrights?
"Under the rules - known as legal deposit - the country's biggest collector of publications produced in the UK and Ireland will start harvesting what it described as "ephemeral materials like websites" to ensure that the content is "preserved forever"".
Yet what if I published something (put online) which I don't want to be preserved (yet) ?
And let's ignore the obvious "I own copyright on my work" issue but what about situations where I pre-publish stuff to appeal to the visitors while I'm still working on it? I'm doing that a lot with several tutorials I write (I'm passionate about sound synthesis & design and maintain my own hobby website) and as long as a version hasn't reached v1.0 status I wouldn't want to see it getting included with some big collection of stuff. Simply because some things could easily change, sometimes quite drastically.
Another issue: although it's very easy to point at Google, many people forget that, contrary to popular belief, something which gets slurped by Google can be removed again. And it's quite easy too, the keywords being webmaster tools. As others above have already pointed out, you can even prevent Google from indexing your site (or parts of it).
So what do these guys provide? Or are we now down to "We're the government, we decide, the end justifies the means, it's all for the common good, stop whining." ?
And some people still wonder why so many are losing faith rapidly when it comes to governments in combination with IT.
-
Friday 5th April 2013 11:37 GMT Pen-y-gors
Re: What about our copyrights?
Copyright isn't an issue, that's the whole point of copyright deposit - publishers of books are required to give a copy to each of the six copyright deposit libraries (if they want one) - it doesn't affect the authors' copyright. Ditto with online material - the copyright is unchanged, but they are now allowed to make a copy for archival purposes whether you want them to or not. If you want to keep it secret don't publish it on the web for everyone to see and download.
This greatly simplifies things - the National Library of Wales started to archive a number of 'important' Welsh websites several years ago, but had to contact the site owners of each one and get their permission in advance, and, if I remember rightly, the copies are not accessible outside the Library network - it's an archive for long term preservation, not a mirror site.
-
Friday 5th April 2013 12:01 GMT Anonymous Coward
Re: What about our copyrights?
Copyright isn't an issue
but are they just slurping text and ebook files? or are they taking music, movies, images and everything?
if you have a website where you've purchased the rights to display photographs (e.g. a celebrity fansite), the licence only applies to your own site. Is the British Library purchasing a blanket licence from the likes of Getty Images?
what if you've put up a mp3 of your favourite music on your site? it's not worth the music industry targeting you for your single infringement, but after this exercise, the British Library could be liable for millions of copyright infringements for non-book material.
-
Friday 5th April 2013 12:06 GMT David Dawson
Re: What about our copyrights?
Copyright is a legally granted monopoly given to the creator of a work.
It's not something that naturally exists; it's a collection of laws passed by HM Government.
So, if the Government of the day chooses to alter how copyright is assigned to allow the British Library to scrape the UK portion of the internet, it is perfectly legal for it to do that, as it created the entire concept of copyright in UK law in the first place.
-
Friday 5th April 2013 13:01 GMT Vimes
Re: What about our copyrights?
But where laws are concerned: whose laws take priority when more often than not sites cross national boundaries?
Take for example:
http://amazon.co.uk.ipaddress.com/
Hostname: amazon.co.uk, IP Address: 176.32.108.186, Organization: PROD DUB, ISP: Amazon Data Services Ireland Ltd, City: -, Country: Netherlands
As for the registrant's address for Amazon.co.uk:
65 boulevard G-D. Charlotte, Luxembourg City, Luxembourg, LU-1311, Luxembourg
-
Friday 5th April 2013 18:07 GMT Ken Hagan
Re: What about our copyrights?
I expect the UK government's attitude would be that (since the .uk namespace belongs to them, even if they do delegate its management to Nominet) if you publish under a .uk address, you are putting the material (and perhaps yourself) under UK law. If you are re-publishing stuff which you don't have the right to put under UK law, that would be a matter between you and the owner of the stuff you are re-publishing.
-
-
Saturday 6th April 2013 04:11 GMT Anonymous Coward
Re: What about our copyrights?
off-topic @David Dawson: what kind of answer is that? It's ok for the government to take things away since they created it?
If one day the UK were to be hit by a meteorite, and the UK government decided to suspend all telecommunications, air and cross-channel traffic to prevent panic and to only allow the "privileged" to safely escape the country, according to your reasoning, it's ok to do that since they created much of what modern society is made up of.
I didn't realise we're still a bunch of serfs under the feudal system.
-
Saturday 6th April 2013 11:10 GMT Anonymous Coward
Re: What about our copyrights?
"I didn't realise we're still a bunch of serfs under the feudal system".
Most people don't realise that. Congratulations on waking up and noticing reality. (Matrix, anyone?)
Consider. As Walter Bagehot said, Parliament can do anything except change a man into a woman. (And with modern technology I'm not sure that restriction applies any more). Parliament is an assembly of our elected representatives, which therefore expresses our collective will - right? Wrong. Parliament is an assembly of self-seeking jobsworths a majority of whom do exactly what the Prime Minister tells them to - if they want to go on enjoying their cushy lifestyle.
So far, we have established that David Cameron can do anything he wants, with the possible exception of sex changes. There is no essential difference between his power and that of a medieval king such as, perhaps, William the Conqueror or John. So yes, actually, we are serfs - except that serfs had more concrete and enforceable rights than we do. And didn't have to pay as much tax. (See, for example, https://sites.google.com/site/stevenburgauer/essay03).
-
-
Sunday 7th April 2013 09:17 GMT Anonymous Coward
Re: What about our copyrights?
"I'll believe in this relationship when Cameron dies from a surfeit of peaches and cider and Miliband is found drowned in the Butt of Ramsey. Or wherever he has been brownnosing this week".
Somewhere, Messrs Sellers and Yeatman are laughing heartily. They must have come up with some jokes the publishers wouldn't print.
But there is something in it. After all, wasn't King Gordon faced down by the banking barons?
-
-
-
-
-
Saturday 6th April 2013 21:32 GMT David Dawson
Re: What about our copyrights?
"off-topic @David Dawson: what kind of answer is that? It's ok for the government to take things away since they created it?
If one day the UK were to be hit by a meteorite, and the UK government decided to suspend all telecommunications, air and cross-channel traffic to prevent panic and to only allow the "privileged" to safely escape the country, according to your reasoning, it's ok to do that since they created much of what modern society is made up of.
I didn't realise we're still a bunch of serfs under the feudal system."
-----------
In this country, Parliament is sovereign, so yes, if the government chose to do that, then that would be legal, which is a different thing to 'ok'. Legal and moral/ ethical are separate concepts I'm afraid.
Sorry you had to find out this way. I wish they would teach this kind of thing in school.
"Er, and other governments. The UK government can pass laws overriding the copyright it grants, but not that granted by the USA, France, Germany, China..."
--------------
Only so far as the law in this country respects those other countries' laws. Which is what sovereign means. This is an important distinction! The UK has signed up to copyright treaties, so I imagine they would be respected...
-
Saturday 6th April 2013 22:06 GMT Anonymous Coward
Re: What about our copyrights?
<quote> The UK government can pass laws overriding the copyright it grants, but not that granted by the USA, France, Germany, China...</quote>
Yes it can, for the same reason that other countries like China get to ignore OUR copyright laws. We may end up violating some treaty, but at the end of the day, what then? Trade sanctions from the USA? No one cares about their entertainment being banned anyway.
-
-
-
-
-
Friday 5th April 2013 12:34 GMT Gav
Re: What about our copyrights?
It's not rocket science people. If you do not want people to take a copy of your website content then do not put it on a publicly accessible website. It's how browsers work. They have caches, they take copies.
Copyright has nothing to do with it, as nowhere is it said that the British Library will be re-publishing your website. It has a copy. It will let others see that copy, just like it already does for millions of books.
-
Saturday 6th April 2013 20:42 GMT Anonymous Coward
Re: What about our copyrights?
Agree entirely. What cheek!
I have no objection to services like Archive.org, which keep a record of valuable sites, and which any of us can access. But I object to this one. Why? Because the "archive" is purely for the benefit of staff at the British Library (and, yes, the relative handful of people who can physically walk there, if the staff decide to let them use it too). I don't put stuff on the web for the benefit of British Library staff; I do so for everyone.
As ever with the British Library, "one for us, and none for you".
-
-
-
Friday 5th April 2013 11:39 GMT Pen-y-gors
Re: Wayback machine
Yep, an existing private service that can be switched off tomorrow at the whim of the owner. That's not what I call secure long-term archiving and preservation. It's important for people in 200 years to have access to the day-to-day publications of the 21st century - will the Wayback Machine still be online in 5 years, 10 years, 20 years, 50 years?
-
Sunday 7th April 2013 09:20 GMT Anonymous Coward
Re: Wayback machine
"It's important for people in 200 years to have access to the day-to-day publications of the 21st century..."
If you believe for a single moment that Web sites scraped by copyright libraries and stored with today's technology will be legible in 200 years, I have $1 trillion worth of hybrid HD/SSDs to sell you.
-
-
-
Friday 5th April 2013 11:52 GMT Tom7
How far does this go?
I'm curious, though not curious enough to go look it up. Does this allow them to scrape your content, or does it force you to allow them to scrape it? What I'm getting at is, what if I detect the British Library robot and send it off to some obscure error page to prevent them archiving my site? Has that just become illegal? Or does it just indemnify the libraries from copyright claims if they happen to get to my content?
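Leaving the legal question aside, the mechanics of "detect the robot and send it somewhere else" are simple enough. A minimal sketch, assuming the site is served by a Python WSGI application: a small wrapper inspects the User-Agent header and answers with an error page when it contains a given substring. The names `block_user_agent` and `hello_app` and the bot name `example-archive-bot` are made up for illustration, and a crawler that lies about its user agent would sail straight through.

```python
def block_user_agent(app, blocked_substring):
    """Wrap a WSGI app so matching User-Agent headers get a 403 response."""
    needle = blocked_substring.lower()

    def middleware(environ, start_response):
        # WSGI exposes the User-Agent request header under this key.
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if needle in user_agent:
            body = b"Crawling of this site is not permitted.\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else is passed through to the real application.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    body = b"Hello, reader.\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Hypothetical bot name, used here only as an example.
application = block_user_agent(hello_app, "example-archive-bot")
```

Whether doing this is permitted under the new regulations is exactly the open question above; the sketch only shows that it is technically trivial.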
-
Friday 5th April 2013 12:00 GMT Kubla Cant
Beano
Although these libraries are entitled to receive and keep a copy of all copyright publications, I don't think they necessarily do so. I seem to recall that the Bodleian failed to produce back numbers of the Beano to help while away the long hours that should have been spent writing essays.
In the case of web content, the fact that a large and increasing proportion is produced dynamically must make things difficult. For many sites there's no such thing as a definitive copy, so there's nothing to keep.
And can anybody explain why an alien university such as Trinity College Dublin should benefit from the free handout of books?
-
Friday 5th April 2013 12:53 GMT Anonymous Coward
Re: Beano
Although these libraries are entitled to receive and keep a copy of all copyright publications, I don't think they necessarily do so.
However, many years ago I seem to recall reading an amusing report about a school somewhere in England that had discovered a dusty old tome in its library and after a bit of research came to the conclusion they had the only copy of this book in existence. They made a big deal of it with press releases etc ... and were then surprised when they got a letter from one of the copyright libraries saying that as they were entitled to a copy of the book and as this seemed to be the only copy available then they'd be sending someone to collect it!
-
Saturday 6th April 2013 11:15 GMT Anonymous Coward
Re: Beano
"...they got a letter from one of the copyright libraries saying that as they were entitled to a copy of the book and as this seemed to be the only copy available then they'd be sending someone to collect it!"
Think of all the fun they could have had by informing the other copyright libraries of the situation, and then watching them fight it out.
-
-
-
Friday 5th April 2013 12:05 GMT heyrick
Thoughts
My stuff is released under a sort of licence. Essentially it is reminding you of copyright, but it also expressly forbids the content being served by a third party system while my website is still "live" (when I'm gone, it's no longer my problem). Secondly it prohibits in any case the modification of content for any purpose other than translation (especially the practice of detecting keywords and linking them to adverts). Those are the terms of distribution. Accept them or piss off, basically.
Secondly, given that recently a person was guilty of libel for retweeting a lie; I presume if somebody libels on their site and this turns up in the copy, the British Library will also be equally liable.
Thirdly, any terms and conditions imposed by the library will be groundless; they want to come get our content and copy it, so good luck making a disclaimer stick...
Fourthly, I assume it will obey robots.txt; if not it'll get blocked by IP on principle (or maybe I'll redirect them to their own website?).
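For the "blocked by IP" option, the check itself is a one-liner with Python's standard `ipaddress` module. This is only a sketch: the function name is invented, and the address ranges shown are reserved documentation ranges, not anything the British Library is known to use.

```python
import ipaddress


def is_blocked(client_ip, blocked_networks):
    """Return True if client_ip falls inside any of the listed CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network and vice versa,
    # so mixed lists are safe to pass.
    return any(address in ipaddress.ip_network(network)
               for network in blocked_networks)


# Placeholder ranges reserved for documentation (RFC 5737 / RFC 3849).
BLOCKED = ["203.0.113.0/24", "2001:db8::/32"]
```

A server would call `is_blocked(remote_addr, BLOCKED)` before serving a page; the list has to be maintained by hand, which is the usual weakness of this approach.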
Did they think this through?
-
Friday 5th April 2013 12:43 GMT Anonymous Coward
Re: Thoughts
" Secondly it prohibits in any case the modification of content for any purpose other than translation (especially the practice of detecting keywords and linking them to adverts). Those are the terms of distribution. Accept them or piss off, basically."
You don't understand copyright. You have been *given* the right over copies on the condition that the BL is allowed to store the material regardless of what you want/like. If you don't like it, your choices are to take it off line or release it as public domain.
This is not new; it's just a clarification of existing law.
-
Friday 5th April 2013 17:04 GMT dssf
Re: Thoughts THEREin lies the problem
Well, for some people. OK, so I'm trying to "deconstruct" this to understand it...
Does a government think that it "created the concept of copyright"? Why cannot this be just a mere recognition that inventiveness deserves some protection?
If an author writes a story, publishes it, and people pay for it (assuming it is that good), then, if government wants a copy, does it then get in line and PAY for a copy just like anyone else paying for a legit copy? (Doesn't the author have to pay just to obtain recognition of the copyright?) Otherwise, TAKING a copy could be seen as tantamount to theft. (If I just said "arrestable words", then, kindly remind me not to dare set foot in the UK or on soil from where I can be extradited to the UK...) I could see huge problems in the future if the UK library law were allowed to persist on distant human colonies. Colonists might stage an insurrection unless it is the PEOPLE who collectively say that it is an okay thing. I am not saying *hide* or deny the preservation of published materials of worth or note, but that each published work should be preserved by a library only with the permission of the author or rightful rights holder. Government doesn't publish fiction, cooking guides, comics, porn, or love stories. So, it doesn't deserve to "own" the copyright in those works. Fortunately, it seems, things are different in the States. Well, to an extent. Here, it's getting to the point where the public may end up paying to access court documents and "public records".
Would it be unreasonable to hear someone say "The real fact behind such a proclamation is that it allows powerful men to shut down and jail/imprison those it deems a threat"? If government "grants" rather than "recognizes" copyright of another's works, then it means government can shut down a voice it doesn't want heard. Copyright may be a "human construct", but it should not be a right for any damned government to think it can just take and shut down works.
Yes, I get it, the story is not about copyright and government profiting financially.
BTW, do I understand that in the UK, if a person in the UK publishes a work, and makes only ONE physical copy, and the government library system wants a copy, it can *demand* a copy? What if the author says, "You must pay me for time, materials, and labor", and marks it up above street price? Would that be legal? Can the English/UK library system demand the author provide a free copy? I am assuming that an author or publisher or copyright owner must pay at the government toll gate to initially get that piece of paper stating "you're the proud owner of this government-issued/authorized/revocable copyright"...
-
Friday 5th April 2013 18:40 GMT Steve Knox
Re: Thoughts THEREin lies the problem
Does a government think that it "created the concept of copyright"?
-
Saturday 6th April 2013 11:23 GMT Anonymous Coward
Re: Thoughts THEREin lies the problem
"If an author writes a story, publishes it, and people pay for it (assuming it is that good), then, if government wants a copy, does it then get in line and PAY for a copy just like anyone else paying for a legit copy?"
For the same reason that, if a government wants £100 billion to give to corrupt banksters or to render some remote country uninhabitable, it doesn't tell its ministers to roll up their sleeves and do some honest work to earn the money. It just passes a law compelling the rest of us to give it the money.
I'm always amused by people who maintain that "violence never solves anything" or that we live in an essentially peaceful society. Everything government does is founded solidly on an indispensable foundation of almost unlimited violence. If I don't wish to pay taxes, they will take them out of my bank account. If I withdraw the money and hide it under my bed, they will send policemen to take it from there. If I resist, the policemen will arrest me. If I won't let them, they will threaten me with weapons. If I defend myself with a weapon (using an appropriate level of force) they will, at some point, kill me - or wound me severely enough to stop me resisting. Then (if I survive) they will send me to prison and keep me there by more violence.
That is how government works. But don't take my word for it. Would you believe the first president of the freest, most democratic, most wonderful nation the world has ever seen?
"Government is not reason. It is not eloquence. It is a force, like fire: a dangerous servant and a terrible master".
- George Washington
-
Tuesday 9th April 2013 12:20 GMT Anonymous Coward
Re: Thoughts THEREin lies the problem
"If an author writes a story, publishes it, and people pay for it (assuming it is that good), then, if government wants a copy, does it then get in line and PAY for a copy just like anyone else paying for a legit copy? (Doesn't the author have to pay just to obtain recognition of the copyright?)"
The government does pay, in the form of the legal protection. And, no, the author doesn't have to pay for copyright in the UK.
-
-
Friday 5th April 2013 22:40 GMT heyrick
Re: Thoughts
"You don't understand copyright. You have been *given* the right over copies on the condition that the BL is allowed to store the material regardless of what you want/like."
In other words, "let's tweak a law so we can record a copy of everything without landing in trouble".
Consider this:
The acts restricted by copyright in a work.
(1) The owner of the copyright in a work has, in accordance with the following provisions of this Chapter, the exclusive right to do the following acts in the United Kingdom—
(a) to copy the work (see section 17);
(b) to issue copies of the work to the public (see section 18);
(ba) to rent or lend the work to the public (see section 18A);
(c) to perform, show or play the work in public (see section 19);
(d) to communicate the work to the public (see section 20);
(e) to make an adaptation of the work or do any of the above in relation to an adaptation (see section 21);
and those acts are referred to in this Part as the “acts restricted by the copyright”.
(2) Copyright in a work is infringed by a person who without the licence of the copyright owner does, or authorises another to do, any of the acts restricted by the copyright.
In other words, I as the author of something have the right to provide it, or not. On the terms of my choosing. This is part of national and international agreements that just can't be arbitrarily modified. So, no, I do not believe that I have been given the "right" on the condition that the BL copies everything regardless. [further complication: my material originates in France and is uploaded to a .co.uk domain hosted in the United States... <grin>]
Actually, I don't care if they copy, it is the (re)serving that I don't appreciate.
The above, by the way, is from the Copyright, Designs and Patents Act 1988 (Part I, Chapter II, section 16).
.
Now, you may believe that this is a big storm in a very small teacup (yes, it is); however, it introduces an interesting precedent. Either a government institution can copy and then publicly reproduce copyrighted content without giving a damn about said copyrights, or the government is quite willing to modify copyright laws to allow the above to happen. Funny: when a citizen copies something they shouldn't (as a copyright infringement), it's a whole different story...
-
-
-
Friday 5th April 2013 12:21 GMT Anonymous Coward
Frequency
How often are they going to trawl a site? Presumably they will want each update as a separate archive. Presumably Google merely update to the latest copy of a page.
It would be interesting to know what intelligent limits they are going to place on this. Keeping "everything" is not an option.
-
-
Friday 5th April 2013 17:15 GMT dssf
Re: UK-created websites in the non-.uk domain-- or blogspot.uk?
If your audience is in a certain country, and you use Blogger/Blogspot, and people in that country start accessing your content, even if it starts out as .com (say, in the USA), then that country's domain will be used for the Blogger/Blogspot page served to your audience. Google says this is to speed up access to the content. I don't completely buy it. If the site has a very small amount of dynamic content, and if most of it is text and small images, then probably javascript crap and browser settings conflicts will slow the page loading down more than its being 6,780 miles from the reader.
However, IIRC, you can ask Google/Blogger/Blogspot to disallow per-country domain appending or whatever it is they call it.
Also, you can set the page to be readable only by invited or white-listed people or email addresses. So, even if you ARE in the UK, if you publish content under T&Cs that state the readers are special, private, invited, non-public guests, then that might legally be enough to disallow a grab-bagging/copyright-vacuuming library system from scarfing up content its author intends to keep in limited, private circulation. Of course, it would get nasty if a T&Cs-violating subscriber/invitee just screen scrapes and then republishes the content on a legit .uk page that is harveste-- umm, archived before a takedown notice could be issued.
Also, if an author laces his or her content with ungainly wording, making it offensive to the public and not satisfactory to introduce to schools, then would the government redact/black out such words if the remaining content is somehow worthy of archiving and representation to the public? How nonsensical would an author need to become to virtually guarantee the UK copyright czars back off? (IIRC, UK parody, libel, and freedom of speech concepts are different than in the USA and some other countries...)
-
-
Friday 5th April 2013 12:33 GMT Sandpit
@heyrick
"Did they think this through?"
Did you think this through? This is enabled by an act of parliament. It is protected by law. Look up legal deposit legislation: it's been around for a long time. This new provision extends that to non-print, something that was granted in 2003 but has only just gone through today, 10 years later.
Yes, this has been thought through, a lot!
And why is this being done at all? It's for YOU, to make everything that is published (in whatever form) available to the public and forever.
-
-
Friday 5th April 2013 13:44 GMT Steve Knox
In what sense is it being done for YOU when YOU didn't ask for it and don't want it or want money to be spent on it?
Government agencies don't exist to do what you want. They exist to do what you need.
There is valid space for discussion on what actually is necessary, but personal whim is irrelevant to this discussion.
-
Friday 5th April 2013 16:13 GMT Anonymous Coward
"Government agencies don't exist to do what you want. They exist to do what you need."
I need them to not spend a single penny on breaching website T&Cs. If the website T&Cs say "thou shalt not copy my content", then they can just move on and go collect the next website.
Until the law is clear - and it's as clear as some very mixed-up mud at the moment, the way lawyers like it - about -where- a work is "published", then if this only applies to .uk domains, then surely the Leveson-inspired idiocy of websites being covered by press regulation only applies to .uk domains.
Oh, no, wait, parliament decided that their "press" regulation applies to anything, anywhere on the Net, if "aimed" at UK users (without defining "aimed"), then set some restrictions on the over-arching authority they decided to grant themselves.
-
Friday 5th April 2013 18:15 GMT Ken Hagan
"If the website T&Cs say "thou shalt not copy my content", then they can just move on and go collect the next website."
Does your browser do that? Mine doesn't. Mine ignores any and all such non-executable requests and instead makes a copy of the content for my perusal.
If you want such wishes to be executed, make them executable by not serving up the pages to everyone who passes by. *Many* websites do exactly that, and only dish content to paying customers. I imagine that *those* will not be appearing on the archive.
As another commenter explained, this is a natural extension of existing (and very long-standing) copyright law to a new medium.
-
Friday 5th April 2013 18:43 GMT Steve Knox
I need them to not spend a single penny on breaching website T&Cs.
No, you want them not to spend a single penny on breaching website T&Cs. This is demonstrable by the fact that they already have on at least one occasion, and yet you survived to post ex post facto. Had this point actually been a requirement, you would have expired upon the breach.
-
-
Friday 5th April 2013 23:26 GMT ForMe
"Government agencies don't exist to do what you want. They exist to do what you need."
On whose whim does a government agency decide what 'you' 'need'? Or which 'yous' to take into account? Or what constitutes a 'need'?
A government agency only exists as a servant of the 'yous'? Unless it exists by divine providence.
-
Saturday 6th April 2013 04:36 GMT Steve Knox
On whose whim does a government agency decide what 'you' 'need'? Or which 'yous' to take into account? Or what constitutes a 'need'?
Too long to go into here. Since this is the UK we're talking about, try looking up "constitutional monarchy" in wikipedia.
A government agency only exists as a servant of the 'yous'? Unless it exists by divine providence.
I do believe you've just profoundly illustrated the dichotomy of a constitutional monarchy...
-
Saturday 6th April 2013 14:51 GMT ForMe
"On whose whim does a government agency decide what 'you' 'need'? Or which 'yous' to take into account? Or what constitutes a 'need'?"
"Too long to go into here. Since this is the UK we're talking about, try looking up "constitutional monarchy" in wikipedia."
Let me help: on the government agency's whim, with or without reference to the 'yous', whose servant it is.
(Not restricted to the UK, constitutional monarchies, or any other particular bunch of government agents. Try looking up Animal Farm in Wikipedia, or a reliable reference source of your own choosing, and extend the "some are more equal than others" part to any bunch of self-important determiners of rules who lord it over others.)
See? Not at all too long, really.
-
-
-
-
-
-
-
Friday 5th April 2013 14:32 GMT Dr Paul Taylor
practicalities
This is a Good Thing.
However, this is not really the same as the BL getting a copy of every book that is published, because before a book is published it is put in a form that the author and publisher regard as "finished".
Websites are never "finished". (Once upon a time it was de rigueur to have an "under construction" logo on one's site.)
So it would be nice to know more about the practicalities of this, so that the owners of websites who would like to regard their content as permanent can contribute in the most effective way to the national archive.
I couldn't find anything about this in my quick perusal of the relevant webpage.
http://pressandpolicy.bl.uk/Press-Releases/Click-to-save-the-nation-s-digital-memory-61b.aspx
-
Friday 5th April 2013 16:26 GMT billse10
Re: practicalities
This is not always a Good Thing, for the reasons you have given. They should not be copying websites as of today, and claiming that's the final version of the website, and website owners should be allowed to require them to update content on demand and remove all older versions of works that have been scraped in an older form. Oh, and if they are copying content for which you charge for access (they can demand password access it appears), and allowing other people to read it, unless there is a requirement for them to give you the full name and other details of each person so you can collect the money you are owed, that's just stupid.
There's more info at
http://www.bl.uk/aboutus/legaldeposit/introduction/index.html
Follow the links from there and you find this little gem:
"A publisher must also deliver a copy of any computer programs, tools, manuals and information—such as metadata, login details, and a means of removing individual DRM technical protection measures—that are necessary for using and preserving the publication."
So, if you build your website on top of an Oracle database, you have to give them a copy of Oracle? If you did graphics with Adobe Creative Suite you have to give them that?
Only a complete idiot could have approved these rules, Mr Vaizey.
-
Friday 5th April 2013 16:37 GMT billse10
Re: practicalities
Sorry for replying to myself, but digging deeper into their sh - sorry, site, this may be of interest to some:
http://www.bl.uk/aboutus/legaldeposit/websites/websites/faqswebmaster/index.html
Couple of key bits:
----
We use Heritrix and the crawler’s User Agent should identify itself as ‘bl.uk_lddc_bot’.
--
(b) it is made available to the public by a person and any of that person’s activities relating to the creation or the publication of the work take place within the United Kingdom.”
---
So, you work in the UK and build content for someone in the US, they can claim access to password-protected content hosted in the US, just because you do the work in the UK.
What bunch of idiots wrote these regulations? Names should be attached to this sort of garbage so they can be sued for stupidity.
-
Friday 5th April 2013 22:56 GMT heyrick
Re: practicalities
"(b) it is made available to the public by a person and any of that person’s activities relating to the creation or the publication of the work take place within the United Kingdom.”"
Interesting. It doesn't explicitly state that it will archive .uk domains, rather domains in the UK (which they refer to as "UK published material"). This should in theory exclude .uk domains hosted elsewhere, but may bring in .com and .org and .net and anything else "within the UK".
My password protection is for development/test software, screenplays I'm letting selected friends test-drive, and other stuff that I want accessible to specific people, but not public. They are not getting copies. Just because it exists on a computer doesn't mean it has been "published"; some of the things have been read by three people on the entire planet (and I'm one of those people). If they are going to take that approach, then everything that exists with words in it (even private stuff, doctors' reports and such) can be considered a "publication" and will need to be collected and archived. Do you still think this is sane?
Continue to downvote if you want, but this whole concept IS ill-conceived and retarded.
-
Saturday 6th April 2013 00:20 GMT John Brown (no body)
Re: practicalities
"Continue to downvote if you want, but this whole concept IS ill-conceived and retarded."
Some points.
1. The Patriot Act. The USA thinks that gives it the right to access any and all data held anywhere inside the USA or anywhere outside the USA if they can contrive a link to the USA or any USA based corporate entity.
2. The archives will be "stored" in a library in the same way as all dead tree publications are (supposed to be), eg The British Library or, in the case of the USA, the Library of Congress (I think).
3. To access those archives, you need to physically visit the building (or maybe they'll do this via their website) to request access to specific "sites" in the archive. This is not some WayBackMachine that world+dog can just trawl for any old bits of info.
4. Anything behind a password access system is not "published". That's the equivalent of printing leaflets/pamphlets for a private members club.
-
Saturday 6th April 2013 00:50 GMT WaveSynthBeep
Re: practicalities
In law, anything made available *is* published. In the old days you'd see an advert in the back of the local paper "Secrets of Reincarnation. Send 29p to PO Box blah, London N1 blah". It doesn't matter that you got back a handwritten badly-photocopied sheet, that's a publication. Same goes for something on a random website. Doesn't matter that three people have asked for it, it's 'made available to the public'.
If it's password protected, that's not a publication. It's not made available to the public, it's made available to your Aunty Joan only. Same goes for an internal document. It may be a memo from Bill Gates to a hundred thousand minions, but it's not made available to the public and thus is not a publication.
A grey area is hidden links. I can put a private document on my website and tell only you the URL. That's not a public document. But if your email is hacked and the URL is leaked so that crawlers pick it up, arguably that becomes a publication.
-
Sunday 7th April 2013 09:30 GMT Anonymous Coward
Re: practicalities
"They are not getting copies. Just because it exists on a computer doesn't mean it has been "published"; some of the things have been read by three people on the entire planet (and I'm one of those people). If they are going to take that approach, then everything that exists with words in it (even private stuff, doctors' reports and such) can be considered a "publication" and will need to be collected and archived".
The people who are proposing this ridiculous, half-baked nonsense almost certainly agree that Bradley Manning and Julian Assange committed dreadful crimes and must be severely punished.
Irony? They haven't even heard of it - and if they did they would think it was an industrial process.
-
-
Sunday 7th April 2013 09:28 GMT Anonymous Coward
Re: practicalities
"What bunch of idiots wrote these regulations? Names should be attached to this sort of garbage so they can be sued for stupidity".
No need - start with the Cabinet and move on down to the civil servants who attempt to put their ridiculous ideas into practice.
It doesn't occur to a politician that there might be any more to his superficial knee-jerk vote-grabbing reactions than he thinks of. That makes it easier to blame the people who have the thankless/impossible job of "delivery".
-
-
Friday 5th April 2013 17:27 GMT dssf
Re: practicalities... Smackticalities
Such a demand could spark a mild insurrection. If my .com blogs that already -- thanks to Google -- have other countries' .country domain extensions end up with .uk domains, and the GL/GL demands access to my screenplays, databases, scripts, manuscripts, drawings, doodlings, and more, it can KISS my ASS. This is the same rights-violating bullshit foisted on desperate authors by shysters/crooks who claimed to be verifying the true lineage of works presented for publication in the early 2000s. Such companies demanded this info so they could "protect themselves and be indemnified" from lawsuits in the event the material was in fact stolen or plagiarized.
But, for a government to demand the same, that deserves a smackdown. Thieves are EVERYWHERE, and once a government employee misappropriates content that is then re-misappropriated to his or her business buddies or creditors, you can bet your ass that the employee's government, YOUR government, would whip out a clause or codicil saying it is indemnified from the damages you suffer/suffered.
No, the government should only demand that the author signs or attests that he/she is the true, correct creator, owner, or authorized distributor.
I suppose I'd have a HELLUVA frackin' hard time living in the UK if I started encountering laws and precepts and proclamations that laid non-worked-for claim over *my* inventions or ideas. How long before people say, "Theft of my work by ANYONE or ANY AGENCY will result in unpredictable behavior on my part"?
OK, maybe I'm overreacting. But, it sure is nice to be under the current copyright system where I am -- wait -- that is changing, too... Ohhh nohsssss...
-
-
-
Saturday 6th April 2013 09:25 GMT mark l 2
I fail to see how storing thousands of internet shopping websites on UK domains in the library is going to be beneficial to the public in 50 or 100 years' time. Take eBay, for example: just indexing the UK site once would fill up gigs of data, which a week later would all be out of date as new products would be added.
-
Saturday 6th April 2013 10:12 GMT Anonymous Coward
.htaccess
deny from 194.66.224.0/20
I caught this IP range ignoring robots.txt ages ago, so I just blocked them by .htaccess at first, then firewalled the CIDR
inetnum: 194.66.224.0 - 194.66.239.255
netname: BLIB-2
descr: British Library
country: GB
admin-c: BT544-RIPE
tech-c: BT544-RIPE
remarks: rev-srv: dns1.bl.uk
remarks: rev-srv: dns2.bl.uk
remarks: rev-srv: ns2.ja.net
remarks: rev-srv: ns0.ulcc.ac.uk
status: ASSIGNED PA
mnt-by: JANET-HOSTMASTER
source: RIPE # Filtered
remarks: rev-srv attribute deprecated by RIPE NCC on 02/09/2009
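For anyone wanting to confirm that the /20 in the deny line really covers the inetnum range in the whois record above, here is a minimal Python sketch using only the standard ipaddress module. The helper name in_blocked_range and the two sample addresses are made up for illustration; they are not from the whois output.

```python
import ipaddress

# Range copied from the whois "inetnum" line above.
first = ipaddress.ip_address("194.66.224.0")
last = ipaddress.ip_address("194.66.239.255")

# Collapse the first-to-last range into the smallest list of CIDR blocks.
blocks = list(ipaddress.summarize_address_range(first, last))


def in_blocked_range(addr: str) -> bool:
    """Hypothetical helper: True if addr falls inside any computed block."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in blocks)


print(blocks)
print(in_blocked_range("194.66.230.10"))  # sample address inside the range
print(in_blocked_range("194.66.240.1"))   # sample address just past the end
```

The range collapses to a single block, 194.66.224.0/20, which is the same CIDR used in the deny rule. The same check could be pointed at access-log entries to see which requests came from that range, since the user agent string alone can be forged.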
-
Sunday 7th April 2013 10:35 GMT william_7
The UK Web Archive is here:
www.webarchive.org.uk/
Previously, to get your site added, you had to fill in a form to give permission.
I would think they would honour a removal request.
archive.org does not seem to be crawling regularly or very much.
Robots directives (robots.txt, HTTP header, HTML meta tag) will stop well-behaved web crawlers.
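As a rough illustration of the robots.txt route, the sketch below uses Python's standard urllib.robotparser to show how a compliant crawler would read a rule aimed at the 'bl.uk_lddc_bot' user agent quoted earlier in the thread. The /drafts/ path, the example.org URLs and the helper name allowed are assumptions for the example, and nothing here forces a crawler to obey: one poster above reports the BL range ignoring robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep the BL crawler out of /drafts/ only,
# and leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: bl.uk_lddc_bot
Disallow: /drafts/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str) -> bool:
    """Hypothetical helper: what a compliant crawler would conclude."""
    return parser.can_fetch(agent, url)


print(allowed("bl.uk_lddc_bot", "https://example.org/drafts/tutorial.html"))
print(allowed("bl.uk_lddc_bot", "https://example.org/index.html"))
print(allowed("SomeOtherBot", "https://example.org/drafts/tutorial.html"))
```

This only models the polite case. For unfinished work that must not be collected, a block at the server or firewall level, as in the .htaccess post above, does not depend on the crawler's cooperation.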
It would be a nice feature of the UK Web Archive if website operators could download the WARC (or other standard archive format) of their website snapshot.