British Library sprinkles digital dust on dusty newsprint

The British Library and its commercial partner brightsolid opened up a pay-per-view online archive of newspapers today, after a crack team scanned 4 million searchable pages that mainly date from out-of-copyright papers published in the 19th century. It was confirmed in May 2010 that the British Library and brightsolid – which …

COMMENTS

This topic is closed for new posts.

Tuesday 29th November 2011 12:23 GMT RobD69

Crack team realise pipe dream huh?

0 1
Tuesday 29th November 2011 12:28 GMT jolly

Registration

Can't even view example pages without registering <sigh/> (let the data harvesting begin)

4 0
Tuesday 29th November 2011 12:33 GMT Jim 59

Great

...but nothing from the last 60 years?

0 0
1. Tuesday 29th November 2011 16:11 GMT Hardcastle the ancient
  
  The adverts and some photos will still be copyright
  
  0 0
Tuesday 29th November 2011 13:08 GMT Tel Starr

How much?

The price is over three times as much as the American equivalent - www.newsinhistory.com - which does not have 'credit' restrictions. (I've used it a few times in the past.)

Plus the OCR has done its usual job of garbling things from the looks of it.

Wish someone would do a 'Gutenberg' with all the old newspapers. Which is what the BL should be doing anyway.

1 0
Tuesday 29th November 2011 13:35 GMT Chris 3

Damn

I wish there was some educational discount. This would be cracking for kids in school researching history projects.

1 0
Tuesday 29th November 2011 13:40 GMT Justicesays

Names changed to protect the innocent?

So, what about the "right to be forgotten"? Is a paywall a sufficient divide between publicly available and research-only archive material?

When is a credit/employee/CBR check going to include as standard a check for any references to you (or people with the same name that might be you) in past newspaper articles?

0 1
Tuesday 29th November 2011 15:00 GMT Daggersedge

Why is there a subscription fee?

Why do taxpayers have to pay a subscription fee? After all, taxpayers have already been supporting the British Library and its newspaper archive for years.

I wonder whether this is just a scheme to enrich Brightsolid (or, rather, its owner, DC Thomson). The subscription price is high, but many people seem to think it reasonable. In a few years, though, it might rocket. What happens if DC Thomson is taken over by a foreign company? What if it sells Brightsolid to some foreign company?

I also wonder what will happen to the physical archive. Will some bean counter decide that all the parts that have been digitised no longer need maintaining? Out of sight in Boston Spa and out of mind.

Will all the articles be digitised - even the politically-incorrect ones? Perhaps some future regime, company, etc will decide to 'disappear' the politically-incorrect articles. It might even be happening now.

1 0
Tuesday 29th November 2011 15:02 GMT Brian Cockburn

OCR not particularly good IMHO

Here's an example "“ ne t- Cookbnrn, 'Eq., and niece of thVry ev Si'lln Ooebr Ba In- Camp, AdrianiopleRodn a ;ofcoeont 1h ta u himo rP James Lindsay, of thea Cbtrn ieta e 1511B~ ~ ~ idsy b4,,,e r, ' - Deatet ut Mr Alexander Lidsy.brwe, ba atDalkt At Gallipoli, on the 14t ... ?". So if one wanted to look up one's ancestors to see what they were up to, one would have to include as search terms every possible OCR cock-up. In the example above I happened to find that one OCR version of "Cockburn" is "Cookbnrn" and searched on that, which turned up nine pages each of twelve references (108). This may well limit its usefulness at the moment.
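As a rough illustration of the search problem described above, fuzzy matching can surface garbled variants without guessing each one by hand. This is only a sketch: the function name `find_ocr_variants` and the 0.7 cutoff are assumptions made for the example, not anything the archive is known to offer, and it uses Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def find_ocr_variants(target, tokens, cutoff=0.7):
    """Return tokens whose case-insensitive similarity to target is >= cutoff.

    Hypothetical helper: the cutoff is an arbitrary choice for illustration.
    """
    target = target.lower()
    return [
        tok for tok in tokens
        if SequenceMatcher(None, target, tok.lower()).ratio() >= cutoff
    ]


# Tokens picked from the garbled sample quoted above.
tokens = ["Cookbnrn", "Lindsay", "Lidsy", "Gallipoli", "Alexander"]
print(find_ocr_variants("Cockburn", tokens))  # prints ['Cookbnrn']
```

A lower cutoff catches more mangled spellings but also more false hits, so a real search would need tuning against the actual scans.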

0 0
1. Tuesday 29th November 2011 16:11 GMT Hardcastle the ancient
  
  I looked up my home town, and hardly any of the text in the stories was OCR'd correctly.
  
  I'd have thought they would twin OCR with spell checking, but they fairly obviously have not.
  
  0 0
Tuesday 29th November 2011 23:02 GMT Anonymous Coward

WTF

do I have to pay for access, when I already pay taxes which pay for the upkeep of the British Library?

0 0
Wednesday 30th November 2011 00:20 GMT Anonymous Coward

One thing Aus has got right

http://trove.nla.gov.au/ndp/del/home

And they allow users to correct OCR errors.

0 0
Monday 5th December 2011 09:08 GMT Smithy

OCR That Works

Initially I thought this was a good idea, but not if the profit margin makes this information unaffordable to younger or poorer families. That is, information should be free at the point of use and provided on a not-for-profit basis, with charges set to cover costs only, and it should become free to view once viewers have covered the cost of scanning/copyright of the newspaper.

Asking viewers to correct a mass of errors without a discount or payment is a cheeky way to save money. At least Wikipedia is free, paid for by fund-raising, which makes adding content worthwhile in return for that access. Also, if their OCR is inaccurate, that may mean they used a single OCR engine rather than multiple ones for different newspapers or quality types, which would hugely improve accuracy.

0 0

The Register Biting the hand that feeds IT

Copyright. All rights reserved © 1998–2024