Crack team realise pipe dream, huh?
British Library sprinkles digital dust on dusty newsprint
The British Library and its commercial partner brightsolid opened up a pay-per-view online archive of newspapers today, after a crack team scanned 4 million searchable pages that mainly date from out-of-copyright papers published in the 19th century. It was confirmed in May 2010 that the British Library and brightsolid – which …
Tuesday 29th November 2011 13:08 GMT Tel Starr
How much?
The price is more than three times that of the American equivalent, www.newsinhistory.com, which does not have 'credit' restrictions (I've used it a few times in the past).
Plus, from the looks of it, the OCR has done its usual job of garbling things.
I wish someone would do a 'Gutenberg' with all the old newspapers, which is what the BL should be doing anyway.
Tuesday 29th November 2011 13:40 GMT Justicesays
Names changed to protect the innocent?
So, what about the "right to be forgotten"? Is a paywall a sufficient divide between publicly available and research-only archive material?
When is a credit/employee/CRB check going to include, as standard, a check for any references to you (or to people with the same name who might be you) in past newspaper articles?
Tuesday 29th November 2011 15:00 GMT Daggersedge
Why is there a subscription fee?
Why do taxpayers have to pay a subscription fee? After all, taxpayers have already been supporting the British Library and its newspaper archive for years.
I wonder whether this is just a scheme to enrich Brightsolid (or, rather, its owner, DC Thomson). The subscription price is high, but many people seem to think it reasonable. In a few years, though, it might rocket. What happens if DC Thomson is taken over by a foreign company? What if it sells Brightsolid to some foreign company?
I also wonder what will happen to the physical archive. Will some bean counter decide that all the parts that have been digitised no longer need maintaining? Out of sight in Boston Spa, and out of mind.
Will all the articles be digitised, even the politically incorrect ones? Perhaps some future regime, company, etc. will decide to 'disappear' the politically incorrect articles. It might even be happening now.
Tuesday 29th November 2011 15:02 GMT Brian Cockburn
OCR not particularly good IMHO
Here's an example: "“ ne t- Cookbnrn, 'Eq., and niece of thVry ev Si'lln Ooebr Ba In- Camp, AdrianiopleRodn a ;ofcoeont 1h ta u himo rP James Lindsay, of thea Cbtrn ieta e 1511B~ ~ ~ idsy b4,,,e r, ' - Deatet ut Mr Alexander Lidsy.brwe, ba atDalkt At Gallipoli, on the 14t ... ?". So if one wanted to look up one's ancestors to see what they were up to, one would have to include every possible OCR cock-up as a search term. In the example above I happened to find that one OCR version of "Cockburn" is "Cookbnrn"; searching on that turned up nine pages of twelve references each (108 in total). This may well limit its usefulness at the moment.
Monday 5th December 2011 09:08 GMT Smithy
OCR That Works
Initially I thought this was a good idea, but not to the point of making a profit that puts this information out of reach of younger or poorer families. Information should be free at the point of use and run on a not-for-profit basis: charges should be set only to cover costs, and the material should become free to view once viewers have covered the cost of scanning and of the newspaper's copyright.
Asking viewers to correct a mass of errors without offering a discount or payment is a cheeky way to save money. At least Wikipedia is free, funded through fund-raising, which makes adding content worthwhile in return for that access. Also, if their OCR is inaccurate, that may mean they used a single OCR engine rather than several engines for different newspapers or print qualities, which would hugely improve accuracy.