British Library sprinkles digital dust on dusty newsprint

The British Library and its commercial partner brightsolid opened up a pay-per-view online archive of newspapers today, after a crack team scanned 4 million searchable pages that mainly date from out-of-copyright papers published in the 19th century. It was confirmed in May 2010 that the British Library and brightsolid – which …

COMMENTS

This topic is closed for new posts.
  1. RobD69

    Crack team realise pipe dream huh?

  2. jolly
    Thumb Down

    Registration

    Can't even view example pages without registering <sigh/> (let the data harvesting begin)

  3. Jim 59

    Great

    ...but nothing from the last 60 years?

    1. Hardcastle the ancient

      The adverts and some photos will still be in copyright

  4. Tel Starr

    How much?

    The price is over three times as much as the American equivalent - www.newsinhistory.com - which does not have 'credit' restrictions. (I've used it a few times in the past.)

    Plus the OCR has done its usual job of garbling things from the looks of it.

    Wish someone would do a 'Gutenberg' with all the old newspapers, which is what the BL should be doing anyway.

  5. Chris 3

    Damn

    I wish there was some educational discount. This would be cracking for kids in school researching history projects.

  6. Justicesays
    Devil

    Names changed to protect the innocent?

    So, what about the "right to be forgotten"? Is a paywall a sufficient divide between publicly available and research-only archive material?

    When is a credit/employee/CRB check going to include as standard a check for any references to you (or people with the same name that might be you) in past newspaper articles?

  7. Daggersedge
    FAIL

    Why is there a subscription fee?

    Why do taxpayers have to pay a subscription fee? After all, taxpayers have already been supporting the British Library and its newspaper archive for years.

    I wonder whether this is just a scheme to enrich Brightsolid (or, rather, its owner, DC Thomson). The subscription price is high, but many people seem to think it reasonable. In a few years, though, it might rocket. What happens if DC Thomson is taken over by a foreign company? What if it sells Brightsolid to some foreign company?

    I also wonder what will happen to the physical archive. Will some bean counter decide that all the parts that have been digitised no longer need maintaining? Out of sight in Boston Spa and out of mind.

    Will all the articles be digitised - even the politically-incorrect ones? Perhaps some future regime, company, etc will decide to 'disappear' the politically-incorrect articles. It might even be happening now.

  8. Brian Cockburn
    FAIL

    OCR not particularly good IMHO

    Here's an example: "“ ne t- Cookbnrn, 'Eq., and niece of thVry ev Si'lln Ooebr Ba In- Camp, AdrianiopleRodn a ;ofcoeont 1h ta u himo rP James Lindsay, of thea Cbtrn ieta e 1511B~ ~ ~ idsy b4,,,e r, ' - Deatet ut Mr Alexander Lidsy.brwe, ba atDalkt At Gallipoli, on the 14t ... ?". So if one wanted to look up one's ancestors to see what they were up to, one would have to include as search terms every possible OCR cock-up. In the example above I happened to find that one OCR version of "Cockburn" is "Cookbnrn" and searched on that, which turned up nine pages each of twelve references (108). This may well limit its usefulness at the moment.
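One way round guessing every OCR variant by hand is approximate matching. The following is only a minimal sketch, assuming you already have the OCR text as a plain string; the `fuzzy_find` helper and its 0.7 threshold are hypothetical choices for illustration and have nothing to do with how the archive's own search works.

```python
from difflib import SequenceMatcher

PUNCTUATION = ".,;:'\"?!()"

def fuzzy_find(name, text, threshold=0.7):
    """Return (token, score) pairs for words in text that resemble name.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    target = name.lower()
    hits = []
    for raw in text.split():
        token = raw.strip(PUNCTUATION)
        if not token:
            continue  # the word was nothing but punctuation
        score = SequenceMatcher(None, target, token.lower()).ratio()
        if score >= threshold:
            hits.append((token, round(score, 2)))
    return hits

# A shortened fragment of the garbled OCR output quoted above.
sample = "ne t- Cookbnrn, 'Eq., and niece of James Lindsay ... Mr Alexander Lidsy.brwe"

print(fuzzy_find("Cockburn", sample))
print(fuzzy_find("Lindsay", sample))
```

A lower threshold catches more mangled spellings at the cost of more false positives, so it would need tuning against real pages.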

    1. Hardcastle the ancient
      FAIL

      I looked up my home town, and hardly any of the text in the stories was OCR'd correctly.

      I'd have thought they would twin OCR with spell checking, but they fairly obviously have not.

  9. Anonymous Coward

    WTF

    do I have to pay for access, when I already pay taxes which pay for the upkeep of the British Library?

  10. Anonymous Coward
    Thumb Up

    One thing Aus has got right

    http://trove.nla.gov.au/ndp/del/home

    And they allow users to correct OCR errors.

  11. Smithy
    FAIL

    OCR That Works

    Initially I thought this was a good idea, but not if the profit margin makes the information unaffordable to younger or poorer families. Information should be free at the point of use and run on a not-for-profit basis, where charges are set to cover costs only, and pages should become free to view once viewers have covered the cost of scanning and copyright.

    Asking viewers to correct a mass of errors without a discount or payment is a cheeky way to save money. At least Wikipedia is free, paid for by fund-raising, which makes contributing content worthwhile in return for that access. Also, if their OCR is inaccurate, that may mean they used a single OCR engine rather than several engines matched to different newspapers or print qualities, an approach that would hugely improve accuracy.
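A related way to use several engines, which is not quite what the comment above describes, is to run them all on the same page and let them vote word by word. The sketch below assumes the engines' outputs are already aligned to the same number of words; the three readings are made-up strings, not real engine output, and `vote` is a hypothetical helper.

```python
from collections import Counter

def vote(readings):
    """Combine word-aligned OCR readings by per-position majority vote."""
    tokenised = [reading.split() for reading in readings]
    length = len(tokenised[0])
    if any(len(tokens) != length for tokens in tokenised):
        raise ValueError("readings must have the same number of words")
    result = []
    for candidates in zip(*tokenised):
        # most_common(1) gives the word seen most often at this position.
        word, _count = Counter(candidates).most_common(1)[0]
        result.append(word)
    return " ".join(result)

# Invented outputs from three imaginary engines reading the same line.
readings = [
    "Mr Alexander Lidsy of Dalkeith",
    "Mr Alexander Lindsay of Dalkcith",
    "Mr Alexandcr Lindsay of Dalkeith",
]

print(vote(readings))
```

Real engine outputs rarely line up this neatly, so a practical version would need an alignment step before voting.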
