Southampton Uni shows way to a truly open web

Southampton is pushing to be the go-to place for expertise on linked data in the UK, and researchers at its main university launched a site earlier this month containing no less than 21 "non-confidential" datasets that underline that semantic web desire. The University of Southampton (UoS) is one of the first academic …

COMMENTS

  1. Derek Jones
    FAIL

    Bus routes and vending machines

    Drilling down to the actual data I find information on bus routes, vending machines and a short list of other stuff already available on the Southampton web site, hardly worth a fawning two page article on El Reg. Come on guys where is your customary cynicism?

    1. Dr. Mouse

      Erm...

      "hardly worth a fawning two page article on El Reg"

      I count 3 pages.

    2. Jerome 0

      Potential

      And if you wanted to cross-reference these bus timetables against, say, course timetables, that would be easy already, right? Because the information is already on the web? Come on, it doesn't take a genius to work out how this stuff is going to change the web completely.
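The cross-referencing described above can be sketched in a few lines once both timetables are available as data. This is only an illustration: the routes, stops, courses and times below are invented, not taken from the Southampton datasets, and `last_bus_before` is a hypothetical helper.

```python
from datetime import time

# Hypothetical sample data; none of these values come from the real datasets.
bus_arrivals = [
    {"route": "X1", "stop": "Main Campus", "arrives": time(8, 40)},
    {"route": "X1", "stop": "Main Campus", "arrives": time(8, 55)},
    {"route": "X2", "stop": "Main Campus", "arrives": time(9, 10)},
]
lectures = [
    {"course": "COURSE-A", "near_stop": "Main Campus", "starts": time(9, 0)},
    {"course": "COURSE-B", "near_stop": "Other Campus", "starts": time(10, 0)},
]


def last_bus_before(lecture, arrivals):
    """Return the latest arrival at the lecture's stop that is not after its start."""
    candidates = [
        a for a in arrivals
        if a["stop"] == lecture["near_stop"] and a["arrives"] <= lecture["starts"]
    ]
    return max(candidates, key=lambda a: a["arrives"], default=None)


for lecture in lectures:
    bus = last_bus_before(lecture, bus_arrivals)
    label = f"{bus['route']} at {bus['arrives']}" if bus else "no suitable bus"
    print(lecture["course"], "->", label)
```

The join key here is the stop name; with linked data the two datasets would more plausibly share a URI for the stop, which is what makes the join reliable.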

  2. Anonymous Coward
    Grenade

    "What we need is an information shaman,"

    No, actually we don't, we really don't.

    1. amanfromMars 1 Silver badge
      Boffin

      Future Information Linking for Brave New World Projects

      ""What we need is an information shaman," .... No, actually we don't, we really don't." ... enigmatix Posted Tuesday 22nd March 2011 12:13 GMT

      Err..... Yes, actually you do, you really do. And ideally providing virgin core source lode for intelligent conversion into semantic product placements for sublime use in future programs and live linked future data projects.

      Are Southampton into Future Reality Sets via Linked Novel MetaDataBase Stations/Core Virgin Source Mines which could even be just Open Minds in Networks InterNetworking JOINT Applications ...... Joint Operations Investigating Network Transparency as a FailSafe Security Protocols rendering Secrecy redundant and unnecessary as a Future Control Lever.

  3. heyrick Silver badge

    "Our screens are all A4 landscape"

    Huh? His, maybe, but I'd have thought that most people's displays were 16:9.

    I believe PDF was designed to replace Postscript (which in the early days was a pain to use) in an HTML 2.0-like world where styling didn't exist and the mainstream software could display the same content radically differently (and, plus, remember frames? how do you print a framed site?).

    For its original purpose, PDF has done well. It has become icky and bloated, but to insult a technology because nobody has devised something both better and universal shows a certain degree of stupidity.

    But then we are talking academics getting excited over datasets. While I can understand the value of this, Joe Average just wants something to look at. Like, for example, Nuclear (poop) Boy instead of a string (<cough> array) of readings in what would appear to be an obscure format.

    1. This post has been deleted by its author

    2. Keith Langmead

      What about how people ACTUALLY read?

      Completely agree. He also seems to miss the fact that many people use PDFs as a way to send an electronic document in a fixed form (e.g. a quote, invoice or contract), so you can be reasonably sure that it hasn't been altered (yes, I know there are ways to do it, but most users wouldn't know them). In terms of portrait / landscape I can kind of see where he's coming from, but I think he's missing how people actually read. A column of text is far easier to read and scan through than a long, wide line of text; that is, after all, why many documents in A4 portrait have two columns.

      1. Michael Nidd

        Literal content is important

        Not just sure that it hasn't been altered, but also confident that all the people who read it have seen the same thing. If you are sending an invoice in Euros, you don't want to find out that one reader had his viewer configured to automatically convert that to Yen at the current exchange rate, because your quote is for Euros. I know that sort of configuration would count as a bug, but it's a lot less likely when you send a PDF. For some things, what you actually said is more important than what you meant, and fewer filters/conversions is better.

        1. Tom 7

          But if you send me a PDF

          I can change the price in it anyway. You can't prove that what I got is not what you sent unless you use methods that are above and beyond PDF, so using PDF is just a way of sending money to Adobe; it helps neither of us.

          As someone said, PDF is the new fax: only secure and reliable for the gullible. PDF turns a 21st-century computer into 19th-century paper. A bit like using your Ferrari to open your stable doors to go for a ride on your horse. What the man is suggesting is perhaps taking the combustion engine out of the Ferrari and using it to power a matter transporter for your data. All PDF is is a larger and larger filing cabinet to stick your data in before tying it to your horse's arse.

          1. heyrick Silver badge

            @ Tom 7

            All this hatred for PDFs based upon odd uses of it.

            Okay. On my mobile phone, I have a bunch of component/processor datasheets and other tech docs. They're all PDFs. They load up in the reader app, and they look exactly like they do on the PC.

            Would those slagging off PDF care to come up with a multi-system compatible, easy (hence single-file) way of representing this sort of information? Should we go back to pure text files with no styles or formatting and laughably bad ASCII art circuit diagrams?

            There are some things PDF is useful for...

      2. Allan George Dyer
        Paris Hilton

        PDF is the new Fax

        "there are ways to do it, but most users wouldn't know them"

        Hands up who has, when they've been faxed a document to sign, pasted in their pre-scanned signature and sent it back through the fax server?

        People may be "reasonably sure that it hasn't been altered", but that belief is based on ignorance.

    3. ChrisC Silver badge

      Quite

      Mixed 16:10 and 4:3 displays here, definitely no 99:70 ones to be seen...

      "PDF is a brilliant way to simulate A4 or portrait views."

      And if you change your page orientation to landscape before saving to PDF, it's a pretty good way to simulate those too... Blaming PDF for the dearth of landscape-formatted documentation is, even by the usual standards of academics infected by outoftouchwiththerealworlditis, really rather dumb.

      There's a place for both styles of formatting - viewing some types of data works really well when it's spread laterally across a widescreen display, other data is unmanageable unless constrained into a narrower band. e.g. I see no technical reason why novels couldn't be printed in landscape format, but from a useability point of view it'd be hideous. Just because we can reformat data to fill the available space doesn't mean we necessarily should...

    4. stewski

      Joe average wants?

      Joe Average probably wants his work calendar in an electronic format that can actually be processed by machines. Yeah, a print is handy and there's not much wrong on the PDF front there; want to actually use some of that data, however...

  4. unitron
    Headmaster

    Well said, sir, well said...

    " "PDF is an embarrassment to our species," Gutteridge says of Adobe Software's once proprietary but now open standard for document exchange." *

    For that I can almost forgive him for say "less than 1000" instead of "fewer".

    I wonder if he knows that data 'R' plural?

    I'm glad to see that TB-L does. Must have had a proper education.

    * I've cursed PDF so often and so vehemently you'd think it was a Microsoft product.

    1. The Other Steve
      Happy

      It's always the way, isn't it ?

      " For that I can almost forgive him for say "less than 1000" instead of "fewer". "

      There is almost certainly some kind of fundamental law governing the increased possibility of buggering up one's grammar while bashing someone else with it.

      1. Chris Miller

        There is: Muphry’s Law

        (also known as Hartman's Law of Prescriptivist Retaliation): "any article or statement about correct grammar, punctuation, or spelling is bound to contain at least one eror".

        1. Code Monkey

          "eror"

          I hop that was deliberatte!

  5. James Downes

    What about the BBC?

    This isn't just an academic exercise. There are some real world uses starting to appear.

    One example is the BBC who are using linked data in real live situations (e.g. 2010 World Cup site) see this blog post http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup_2010_dynamic_sem.html and are starting to move more and more to a dynamic publishing framework that is built on top of a "news" ontology.

    In this way "linked news" can be based on the ontology, rather than any predetermined relationships that the content authors may define.

  6. DrXym

    Freedom box

    Eben Moglen suggests people should be able to maintain their info in an encrypted container hosted by a "freedom box". That could mean a physical device or it could mean a piece of software running on your own PC or running on a trusted host of your choice. You could grant / deny access to the box on a granular level and your data would be distributed (encrypted of course) via P2P to make it easier to find.

    It's not a bad idea at all but not one without issues. Biggest issue is that this is Eben Moglen and the FSF proposing it. This means the concept is practically DOA because anything the FSF touches gets bogged down in polemics. It needs somebody, preferably a startup to embrace the idea, monetize it and the pragmatism to see it to implementation. Probably the closest thing at the moment to a freedom box would be the Diaspora (also Moglen inspired) but whether it can compete against Facebook is a massive question.

    1. Spearchucker Jones

      Simpler than that...

      Information can be tagged. Those same tags can be used to enforce security. All you're doing is extending the model to include protective markings and clearance levels. For example:

      Object = securable item.

      Subject = something that consumes (view/print/edit) objects.

      An object is tagged* with a protective marking, such as "No marking", "Private" or "Secret."

      A subject is tagged* with a clearance level, such as "No clearance", "Friend" or "Family."

      * Tagging of objects and subjects is done by the owner of the object.

      Lastly a policy is created that limits access to any object marked as "secret" to subjects cleared to "Family."

      The policy travels with the object in a single file. The object is encrypted, and the policy is readable. A central or federated server enforces policy. Much like this app does: http://wwww.wittenburg.co.uk/interact/
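A minimal sketch of the marking/clearance model described above. The class names, the `POLICY` table and `may_access` are hypothetical; only the rule that "Secret" objects are limited to "Family" subjects comes from the comment, and the other two rows are assumptions added to make the example complete. Encryption and the enforcing server are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurableObject:
    name: str
    marking: str    # "No marking", "Private" or "Secret"


@dataclass(frozen=True)
class Subject:
    name: str
    clearance: str  # "No clearance", "Friend" or "Family"


# Policy: which clearances may consume objects carrying a given marking.
POLICY = {
    "No marking": {"No clearance", "Friend", "Family"},  # assumption
    "Private": {"Friend", "Family"},                     # assumption
    "Secret": {"Family"},                                # rule stated in the comment
}


def may_access(subject, obj, policy=POLICY):
    """Return True if the subject's clearance is allowed for the object's marking."""
    # Unknown markings are denied by default.
    return subject.clearance in policy.get(obj.marking, set())


diary = SecurableObject("diary", "Secret")
print(may_access(Subject("sister", "Family"), diary))    # True
print(may_access(Subject("colleague", "Friend"), diary)) # False
```

Keeping the policy as plain, readable data is what lets it travel alongside the encrypted object, as the comment suggests.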

    2. Colin Tree

      use your router/modem

      We should own our own data.

      We should be able to control who accesses or links to our data.

      A new type of device is needed. A secure section of our home router or modem could contain a publicly accessible store.

      Never fill in a form with your various personal data again, allow a site to access whichever limited data from your data-set. Change your personal information once only, it is always current everywhere. Stop permission for whichever sites you want when you want.

      We might turn off our PCs, laptops, phones, pads from time to time, but our routers generally stay on, except for power outages or maintenance.

      I have been pushing this barrow for many years now.

      People or corporations can access our data when we want them to, or if there is a perceived value for how much we sell it to them.

      You have a log of who accesses what data when.

      It could end up being our storage in our part of our cloud. We own it, we control it. It might contain all of our personal data collected and created through our lifetime. Expand it as required, backup as required.

      Facebook just got made open... so did everything else

      ....

      Confluence

      1. KitD

        @Colin Tree

        Seeing as you can run your own OpenID provider behind your router, it shouldn't take much to move it onto the router itself.

  7. Sir Runcible Spoon

    Sir

    This sounds like the path to hell if you ask me.

  8. Fred Flintstone Gold badge

    I have seen real world apps

    I have seen real world semantic web apps - there is a company in Switzerland that has stuck man-years into developing the very idea. The key to handling this sort of data is deciding first what information you actually need - without a use case to define the shape of the specific needles you seek you'll be simply stuck looking at hay.

    BTW, re PDF - there is another reason why portrait persists: our own physical limitations. Our eyes have a limited width from which we read. If you had a text landscape you'd tire pretty soon when reading..

    1. James Hughes 1

      Never mind the width

      I agree - trying to read a Kindle landscape feels really wrong - put it back to portrait and all is well.

  9. DPWDC

    Reg, take note ;-)

    "But our screens are all A4 landscape yet there is this stupid insistence that the portrait way is still developed. It's a legacy thing and we haven't got around to getting rid of it yet"

    While his tape measure is slightly off - I do agree with the "stupid insistence that the portrait way is still being developed" - I say this looking at the white sliver down the centre of the screen that is the reg website!

  10. Rebecca 1
    Stop

    Of course the worst PDFs of all are...

    ...the ones with 2 columns of text!

    Who thinks that is a good idea? I'm reading it on a screen too small to display a full A4 page at a readable size, so halfway through each page I get to scroll back up again and start again. PDFs designed to be read onscreen (and which are created with linked chapters and text rather than images of text) are wonderful in comparison.

    1. Pigeon

      At least you can read it

      Why not read it in a CSV form, which this chap suggests is a replacement for PDF...

      1. Anonymous Coward

        RE: At least you can read it

        He's citing an example where CSV is used to exchange data between systems. The idea being, once you have the CSV you can format it any way you please when you display it. You can't do this with PDF as they're pre-formatted when the document is written.
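To illustrate the point about formatting after the exchange, here is a small sketch that reads one CSV payload and renders it two different ways. The column names and values are invented for illustration, and `as_text_table` and `as_html_list` are hypothetical helpers.

```python
import csv
import html
import io

# Hypothetical CSV export; the column names and values are invented.
RAW = "machine,building,accepts_cards\nSnacks A,32,yes\nDrinks B,59,no\n"


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def as_text_table(rows):
    """Render the rows as a fixed-width plain-text table."""
    if not rows:
        return ""
    headers = list(rows[0].keys())
    widths = [max(len(h), *(len(r[h]) for r in rows)) for h in headers]
    lines = ["  ".join(h.ljust(w) for h, w in zip(headers, widths))]
    for r in rows:
        lines.append("  ".join(r[h].ljust(w) for h, w in zip(headers, widths)))
    return "\n".join(lines)


def as_html_list(rows):
    """Render the same rows as an HTML list, escaping the cell values."""
    items = "".join(
        f"<li>{html.escape(r['machine'])} (building {html.escape(r['building'])})</li>"
        for r in rows
    )
    return f"<ul>{items}</ul>"


rows = load_rows(RAW)
print(as_text_table(rows))
print(as_html_list(rows))
```

The data is parsed once and the presentation is chosen at display time, which is the step a pre-formatted page description cannot offer.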

  11. Anonymous Coward

    Horrors

    PDF is horrible because it is, at its simplest, a vector-based picture of a printed page, from which it's almost impossible to extract useful information. So Adobe added a pile of extensions to PDF (and also, even more stupidly, to TTF fonts) to let you try to reverse engineer the original information out of the print representation. This is unnecessarily complex and still doesn't work reliably for Unicode text using complex scripts.

    RDF (specifically RDF/XML) is horrible (and consequently not widely used) because it is almost sadistically complicated. As the article says, the information model it encodes is simply triples of URIs, yet it provides myriad ways to serialise this, described by a not-quite-finished specification. (I've had the misfortune of writing an RDF/XML parser so I've been bitten by all the corner cases that were brushed under the carpet without resolution in the rush to get the spec published.)
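The underlying model is small enough to sketch without any RDF library. The example below stores triples as plain Python tuples and matches them against a pattern; the `http://example.org/` URIs and the `match` helper are hypothetical, and literal values (which RDF also allows in the object position) are ignored for simplicity.

```python
# Hypothetical namespace; none of these URIs identify real resources.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple of URIs.
triples = {
    (EX + "vending/1", EX + "locatedIn", EX + "building/32"),
    (EX + "vending/2", EX + "locatedIn", EX + "building/59"),
    (EX + "building/32", EX + "partOf", EX + "campus/main"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything with a known location:
for subject, _, obj in match(triples, p=EX + "locatedIn"):
    print(subject, "is located in", obj)
```

Everything beyond this, including RDF/XML, is a question of how to write such triples down in a file.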

  12. Adrian Walker

    Executable English over RDF

    The problem:

    "The skills of taking a data system and understanding how to map it into RDF so that it can be useful is bloody hard. It requires someone who can see the data, understand the structure, understand how it will be used and then map between two spaces in their head."

    A solution:

    Document what you want to do in Executable English, then _run_ it. Other people will be able to read what you did, and also get English explanations of the results of running it.

    Here's an example that you can view, run and change, using a browser:

    www.reengineeringllc.com/demo_agents/RDFQueryLangComparison1.agent

  13. Christopher Gutteridge
    Megaphone

    Why I hate PDF

    To be fair most things suck in some ways, but I've spent the last 10 years working to give people open access to research papers, and I hate the fact that PDF is the de facto standard. We should be able to read them easily on phones, iPads and Kindles and we can't. But there's top men working on it http://blogs.ch.cam.ac.uk/pmr/2011/03/11/scholarly-html-theme-and-presentations-today/

    Not everyone can code, and you need a coder, or other unusual skills, to get the value from open data, but once the app (or whatever) is produced, we can all benefit. An information shaman is one of those rare people who (a) can understand complex, diverse data sources and (b) is willing to build tools to help the other 99.9% of us benefit from them.

    Some of our students produced this tool from the open data we made available:

    http://opendatamap.ecs.soton.ac.uk/

    (requires a recent browser!)

    1. David Hicks
      Linux

      I can read pdf on my phone...

      with evince. It's nice having an N900.

      On the main content - so you finally bought into the semantic web idea then?

      I remember that was the 'next big thing' back when I was a lowly student and you were moaning about how half the new intake didn't know how to use ftp from the command line any more...

      Plus ca change etc.

      Genuinely surprised to see your face peering out at me from the front page this morning though, nearly spat-up coffee all over the keyboard!

      D.

  14. andy 45
    Thumb Down

    What a numpty

    "PDF is an embarrassment to our species"

    A completely blinkered view which really doesn't deserve arguing the case against....

    ...but briefly, speaking as a designer, there are loads of reasons why I love pdf as well as things that annoy me -- the same with design for other media (web/print etc).

    That quote must have been said with no thought whatsoever about the words which were coming out of his mouth

    1. Christopher Gutteridge

      Accusations of numptiness

      nah, just in the context of scholarly communication and of communicating data. Getting data as a table in PDF really really sucks.

      Using PDF in the current era of many sizes of viewport is just plain daft.

      1. Rob
        Alert

        Good point, wrong phrasing perhaps

        I think what most of the posters are getting at is that your comment was a tad too general and sweeping about PDFs. They have a use and, coming from a designer background myself, a bloody good use.

        But I can see why, when you receive a PDF, it is generally useless and a pain in the proverbial to you.

        Plus I have also made some good money out of organisations designing and building PDFs, so don't dis an element of my trade too quickly; the same goes for your semantic data. Don't be surprised if it's not received too warmly by people who can see their livelihoods slipping away from a new standard being employed. I would definitely stick to your comments about letting the politicians hammer it out, but I would be very good friends with a marketing bod if you want it to lift off.

        1. Pigeon

          a title

          Er, well *Page Description Language* kind of says it all. PDF is not a data extraction format. If it doesn't fit your page, then get another PDF, generated from the actual data, using the application. Maybe numpty was interviewed in the pub. I see scrollbars on my ElReg screen. Does that mean that HTML doesn't work?

  15. TRT

    Data

    I've heard a whisper that the Wellcome Foundation now requires its grant-funded researchers to publish their raw data on the internet. Not that it'd be of much use to the general public, but it's the same kind of thinking. I'm concerned over data integrity in the cut-throat world of multi-million pound research grants where reputation is 95% of everything.

  16. amanfromMars 1 Silver badge

    For Unhindered and RapidE Progress

    "Ultimately, we provide the tools. Let the politicians do the arguments.".... Christopher Gutteridge

    The wiser semantic web developer will completely avoid the politician, realising that they have no valid lead input to offer themselves into future linked data programming. They are useful tools though for leads which linked data sets provide, so they are not completely useless. In fact, there is probably good enough evidence to suggest that they are easily groomed to be quite convenient servants.

    1. Christopher Gutteridge

      Easy to Groom

      Sir, I have been accused of many things, being easy to groom is generally not one of them.

      1. The Other Steve
        Heart

        That's only because

        No one has seen the chat logs that I'm keeping for later.

        xxxx

        MILFChix0r69

      2. amanfromMars 1 Silver badge

        Sugar Daddies Rule..... ? :-)

        Quite so, it is indeed an unusual semantic knack to perform better than just well, and can be difficult to probably impossible without the competent exercise of a particular and peculiar knowledge and provision of sticky sweet bait.

  17. The Other Steve
    Thumb Up

    Ah, the unfettered idealism of youth

    Information, lacking agency - and contrary to the popular (and annoyingly resistant to logic) meme - doesn't want to be free.

    Some of the data in those silos is in silos because it has value to the people who collected it, and they certainly don't want it to be free.

    So I would hope that there is some parallel group working on implementing a complementary micro-transaction framework so that on the day when the big switch is thrown on the brave new semantic web those of us who believe in swapping money for things of value are able to play.

    Otherwise simply wishing for all the information to be free is like asking santa for a magic kitten that shits fairy dust.

    Still, good luck to them. I can't wait to have another standard to choose from.

    BTW, if anyone is actually in possession of a magic kitten that shits fairy dust and is willing to swap it for one that vomits what appear to be the remains of dead snails, do get in touch.

    1. Rob

      Offer

      Best I can do is one that vomits undigested cat biscuits and leaves the liver of eaten creatures on your kitchen floor. With enough prodding can also perform a 'double-tap' on your smallest child's head.

    2. stewski

      This was not a good comment

      Idealism yes but I'm not sure TimBL is the poster boy for "youth" these days :-)

      Why I say your comment is poor is this, what semantic web switch are you talking about?

      Your Facebook data is not about to be magically represented in RDF, made open and linked.

      The Semantic Web is an effort to standardise the publishing of interlinked data in the same vein that the original web standardised the publishing of interlinked documents. You didn't have to publish web pages back then and no one forces people to publish linked data now.

      Now the open access ideal for science publication/data is related but different, if your experiment/science isn't repeatable and observable it's not really following the scientific method anyway, so the idea that research data and articles should be locked away and unavailable (or behind some pay wall) is not really taking science forward. Allowing research to be open, linked and widely accessible seems likely to change the quality of research science.

      As for other data sets like government and organisational (the OS stuff that was opened) It is simply OUR data in the first place, why lock it up? The one time Labour/Tory/Liberals have agreed in any meaningful way about tech is when they listen to Sir Tim BL about data.

  18. Studley
    Stop

    *insert PDF containing glass house specifications*

    Search Google for:

    site:www.soton.ac.uk inurl:pdf

    ...4,350 results found

  19. J 3
    Headmaster

    Er... Oook?

    "a detail-obsessed librarian who's middle name is pedant"

    Whose name is pedant again? I must have made a mistake here too, for ironic value. Unavoidable, apparently.

  20. J. Cook Silver badge
    Go

    Ah, PDF hate...

    PDF is merely Postscript with a nice wrapper around it. The original intent was to provide a document format that let people create a print-ready document on one platform (say, a Mac with PageMaker or some other desktop publishing program installed) and open it on, say, a PC or Unix workstation for later viewing/printing (although the latter still requires a bit of mucking about with the workstation's print drivers to make it come out right).

    It's somewhat handy for technical manuals, print-ready artwork for proofing, and when Adobe strapped on the forms capabilities for things like tax forms. It's not all that useful for ebooks, especially graphics heavy ebooks, at least on reader devices.

    My own personal hate is the $&*#^! idiots who make a fill-able form, but disallow printing or saving a copy of the form data. WHAT IS THE POINT, PEOPLE.

    1. Anonymous Coward
      Coat

      PDF hate

      > PDF is merely Postscript with a nice wrapper around it.

      If only it were. Sadly it isn't quite: lack of a full programming capability being one of the bigger omissions.

      > My own personal hate is the $&*#^! idiots who make a fill-able form, but disallow printing or saving a copy of the form data. WHAT IS THE POINT, PEOPLE.

      This is usually because the person creating the form hasn't realised they bought the wrong version of Acrobat. So the form works for them and they never bother to test on a standard Reader.

      /Mine's the one with the blue cookbook in the pocket

      1. Allan George Dyer
        Paris Hilton

        Another Gotcha...

        is creating an English-only document on a Chinese* copy of Acrobat. Try opening it on a reader without Chinese fonts and you just get blank pages, even though there are no Chinese characters in the document. Again, the document works for the creator, but not for others.

        * Probably also true for other non-Latin languages, but I haven't tested.

  21. Peter Gathercole Silver badge

    Not convinced

    I understand making data accessible. And I also understand that having relationships between data items makes a lot of sense. But I really doubt URIs are the way to do it.

    My concern is that using URIs (at least the way I understand them to work) will effectively hardcode location and shape information into the datasets in the same way that a schema does in a relational database, but with a fixed location. Unless someone can indicate otherwise, I believe that this makes the data almost completely non-portable.

    OK, in a web-centric world this may make sense, but unless someone puts some clever caching technique in place, it means that you will only be able to use the data when you are connected.

    Sometimes you want to take a fixed snapshot, or make sure that the data in your thesis does not change between you writing it, and it being read by your moderator.

    I'm all for making data easily useable (god knows I've spent enough time massaging data over the years), but to tie it to the Internet should obviously be stupid to anyone unless they are from the Facebook generation.

    I'm also not certain that it is reasonable for the person who originally structures and creates the relations in the data to be able to anticipate how that data will need to be used in the future. Today's data mining systems are all about making associations between data sets that were never imagined when the data was recorded.

    Something like an encapsulated schema in the data set would be a great advantage, but you would have to have some way of normalising not only the data sets but also the schemas to allow automated queries.

    1. heyrick Silver badge

      @ Peter Gathercole

      I agree. I have numerous old docs and faqs that point you to blah.ac.uk/~something or to geocities... None of which exist now.

      Furthermore, linking to data might mean that data is "fresh", but this doesn't account for either expanding/altering the data in a way that might become incompatible with the other data, or deliberate modification for vested interests. It does happen.

      If I was working with a data set, I would prefer a snapshot of the entire thing, not a list of locations to pull bits from.

    2. stewski

      I'm not sure this makes sense.

      If I understand what you are saying, why doesn't that apply to the current web?

      Web pages are also "non portable" in that they link to other pages or locations which mean if you move them off line they no longer function identically. Whilst this is true, it is possible to spider to any given depth and cache that, the complaint also ignores the added value of those links for discovery, rating, relating and indexing etc.

      Linked data attempts to gain value in a way similar to the web for documents by showing relationships between data or concepts via a URI. If you can say that linked data is not as valuable as a traditional offline database then you could also say the web is not as valuable as Word documents. Much like the current web, linked datasets may often be based on traditional database data; you gain value by making the format an open, royalty-free standard that links to other datasets, and that does not detract from the original data set you have. Your web pages do not diminish the documents they are based on either.
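The "spider to any given depth and cache that" idea mentioned above can be sketched as a breadth-first walk with a depth limit. To keep the example self-contained it walks an in-memory link table instead of fetching anything over the network; `LINKS`, `fetch_links` and `snapshot` are hypothetical names, and a real version would replace `fetch_links` with an HTTP fetch plus link extraction.

```python
from collections import deque

# Hypothetical link structure standing in for live resources and their outgoing links.
LINKS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": ["e"],
    "e": [],
}


def fetch_links(uri):
    """Stand-in for dereferencing a URI and extracting the URIs it links to."""
    return LINKS.get(uri, [])


def snapshot(start, max_depth, fetch=fetch_links):
    """Breadth-first crawl from start, caching every resource within max_depth hops."""
    cache = {}
    queue = deque([(start, 0)])
    while queue:
        uri, depth = queue.popleft()
        if uri in cache:
            continue  # already captured via a shorter or equal path
        links = list(fetch(uri))
        cache[uri] = links
        if depth < max_depth:
            for target in links:
                if target not in cache:
                    queue.append((target, depth + 1))
    return cache


print(sorted(snapshot("a", 1)))  # resources within one hop of "a"
```

The returned dictionary is the offline snapshot: links that point beyond the depth limit are still recorded, but their targets are not captured.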

  22. Jimmifett
    Grenade

    Do not want

    I do not want my datasets being accessible by any old person on the street. It may be fine for some fringe cases, but I and many others take the time and resources to collect and organize data. Often, that costs money if not time. I may or may not charge for my compiled datasets, but I may require usage agreements and define permissions to access it. I'll also want to distribute the data in whatever format I choose. I'm not going to offer up millions (billions?) of rows of data at my own hosting/bandwidth expense for some fool that wants to see how many green Volkswagens got parking tickets near the bus station they catch on a whim. I will present it in whatever format I choose, and if you want to access it, you can use my format with my rules using whatever API I chose, or get your own dataset.

    1. The Other Steve
      Thumb Up

      Well, yes quite

      "but I may require usage agreements and define permissions to access it"

      I avoided the [D]RM issue as I didn't have my asbestos pants handy, but yes, absolutely.

      Let's explore that a bit. Say I sell someone some rows from my dataset. Do they now own them ? Can they show them to others ? Use them in a profit generating capacity ? If they profit from it, do I want some of it ? I can hear the freetards cracking their knuckles ready to type flamage, but these are important questions.

      How do I price access, essentially. For many large datasets the value is not in the individual data but in the aggregation, which allows you to perform ad-hoc queries and derive some result. Arguably if I have a gert big dataset and you want to run queries over it, you should be paying for the whole set, or at least each row you touch, rather than for the four rows of results you get at the end.

      And what have you just paid for ? To own the results ? To have a licence to them under certain terms ?

      One of the reasons that "information wants to be free" is so wrong-headed is that while it is exceptionally easy to put a lower bound (0) on the value of some piece of information, it is very hard to find an upper bound. The number of people who drive VW Beetles, wear wellies and like orange juice may look like a piece of trivia to most people, but to someone it could be the key piece of information for a multi-million-pound business venture.

      Hence much information actually tends towards expensive - at least in volumes large enough to be useful - rather than free.

      So while "we demand free access to data" is a nice rallying call - and there are many, many datasets that we ought to be able to get at, especially ones we already paid for, and ones that benefit the data providers by their existence (bus timetables, transport geo info, etc) - it will take a bit more than just having a suitable technical framework in place to get the data out of Berners-Lee's "silos".

      They aren't inaccessible by accident, but by design.

  23. Paul Williams

    Isn't the issue more to do with data whose privacy level may change depending on use?

    I may not care if someone knows any one of my bank account number, mother's maiden name, or the name of my first pet - they're all harmless, but I would certainly be concerned if someone were to know all three of them, for example.

    So how do you reconcile the openness of a genealogy database, the openness of the postcode system, and a Facebook field that lists what pets you have?
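
    The aggregation worry can be sketched in a few lines. This is only an illustration under invented assumptions: the three dictionaries, the person:1 style keys and the function fully_exposed are hypothetical, and real datasets would rarely share such a tidy common key.

```python
# Three hypothetical datasets, each fairly harmless on its own.
account_numbers = {"person:1": "12345678", "person:2": "87654321"}
maiden_names = {"person:1": "Smith"}
first_pets = {"person:1": "Rex", "person:2": "Tiddles"}


def fully_exposed(*datasets):
    """Return the people present in every dataset, with their combined details."""
    common = set(datasets[0])
    for data in datasets[1:]:
        common &= set(data)  # keep only keys present in all datasets so far
    return {key: [data[key] for data in datasets] for key in sorted(common)}


exposed = fully_exposed(account_numbers, maiden_names, first_pets)
print(exposed)
```

    Only the person who appears in all three datasets ends up with a complete profile, so the sensitivity comes from the join rather than from any single field.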

  24. Alan Firminger

    Tesco get rich linking data

    I want to as well.

  25. Anonymous Coward

    OM...

    So what's needed on the privacy side is an Online Me product which:

    a. provides secure personal id, probably involving physical authentication;

    b. has a rich user interface that encourages you to define multiple levels of disclosure in a simple-to-use format;

    c. can auto-update login profiles on sites such as Facebook to ensure the security settings match what you want rather than what the site owners may have set today's default to be;

    d. supports multiple public ids so you can avoid being tracked across different sites;

    e. externally is one-way only, i.e. no path from public ids back to you.

    Anything else?
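
    Points d and e in the list above could be approached with per-site pseudonyms derived from a private secret. The sketch below is one possible way, not a description of any existing product: the function public_id, the secret value and the site names are hypothetical, and it uses only Python's standard hmac and hashlib modules. It says nothing about points a to c.

```python
import hashlib
import hmac


def public_id(master_secret: bytes, site: str) -> str:
    """Derive a stable, site-specific public id from a private master secret."""
    # HMAC-SHA256 keyed with the secret: without the secret, an observer
    # cannot recompute or link the ids issued to different sites.
    return hmac.new(master_secret, site.encode("utf-8"), hashlib.sha256).hexdigest()


secret = b"hypothetical-master-secret"  # assumption: kept private by the user

id_for_site_a = public_id(secret, "site-a.example")
id_for_site_b = public_id(secret, "site-b.example")

# Same site always yields the same id; different sites yield different ids.
print(id_for_site_a != id_for_site_b)
print(id_for_site_a == public_id(secret, "site-a.example"))
```

    The ids only stay unlinkable for as long as the master secret stays private, so the "secure personal id" in point a would still have to protect it.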

  26. penguin slapper
    Thumb Down

    rofl

    "Zuckerberg wants to make the world a more open place"!

    Do me a favour.

    Zuckerberg wants to sell as much of your information to corporations as he possibly can - if that's "openness" then we need to go back to the drawing board.

    1. Anonymous Coward

      Well in a way he does

      It's just that Zuckerberg's approach involves putting the whole world inside Facebook without any privacy controls.

  27. uuf6429
    Grenade

    Rubbish!

    I've used RDF in the past, as well as SPARQL. The concept is great, but the technology itself sucks.

    It's worse than the mess we have in HTML. To the fanboys: I make a living from HTML, well, a quarter of it or so. The point is, you add some script to a page with a script tag and you add CSS using a style tag. As you know, we add script files via a script tag as well, and the same with CSS... oh wait, actually it's a link tag... uhmmm... hellooo??? Then you have tag naming: a, img, b, i, u... and... object, style, body, html, video, among others.

    The only consistent thing in HTML is that it is inconsistent in each aspect.

    Back to RDF. Check out the tag list - half of it is inconsistent, and the rest asks for awful implementations (e.g. RDF/XML).

    But hey, it's written in the golden book of standards, just be normal like everyone else and learn it!!!

  28. John Savard

    Aliens Won't Find Out

    When we saw people reading books on the monitor screens in Star Trek, they were arranged in landscape format, so the issue raised will be corrected in time to save humanity's reputation.

    In any case, while data was originally the plural of datum, in current parlance, it is used as the name of a continuous substance like "water", "iron", or even "fish". Treating data as a plural would therefore be confusing pedantry, and thus it should not be considered a requirement for correct English usage.

  29. Anonymous Coward

    Isn't that Julian Assange??

    'Mark Zuckerberg has famously declared that he "is trying to make the world a more open place" '.

  30. Anonymous Coward

    anon

    I've seen this kind of initiative being worked on in systems biology. While there are human beings involved it will never succeed. Maybe if machines take over and begin designing themselves then data standards will arrive.

    Systems biology needs datasets that can be accessed by different pieces of software. The file type standardization has been sorted out with XML; however, the formatting of the data, and the standards being set up to ensure data about certain subjects is entered in a generic interchangeable format, are still not progressing. Each group suggests their own "standard" and then sticks to it while trying to bash the others so they can be the ones to implement the "standard". Politics and ego take over and the initiative ends.

    Give up. Start a system to collect the data yourself, or automate the conversion, and then use your own system and accept that everybody else's data will never be compatible with yours because they're doing the same.

    Humans can't operate on a group collective level. It's genetic.

  31. Tom 7

    "PDF is a brilliant way..."

    of doing nothing at all. PDF only creates lookalike documents when everyone has the same calibrated printers or monitors using the same calibrated inks and papers under calibrated humidity and temperature conditions.

    I do like the idea of Tesco et al. publishing their prices in an easily computer-digestible form. If the others do it I should be able to put in my shopping list and expected travel costs and minimise my shopping bills... Well, for a couple of years, until I realise competition wasn't what they had in mind really.

    1. Anonymous Coward
      FAIL

      That's why it's a fail though isn't it?

      Why would the likes of Tesco ever bother? The only upside is for the consumer, who would be able to find products cheaper elsewhere. This is always going to be to the disadvantage of the majority of businesses, so they won't bother.

      Berners-Lee lives in a fantasy world where he reads his own press and believes he's some kind of web guru. He's just a jobbing coder who got lucky.

  32. Youngdog
    Thumb Down

    Don't take the piss out of Baby Doll...

    ...

    Can you cross-reference tiger owners with all the moustache-wearing sous chefs within a 4-mile radius?

    Yes, yes you can - if you don't trust your tagging and linking to the herd and instead just hoover it all up and put your efforts into the Application level rather than the Data/Transport level

    Iriots

  33. Anonymous Coward
    Thumb Down

    Load of arse

    The semantic web, that is.

    Berners-Lee has been banging on about it for 10 years. It hasn't happened, and it never will happen.

  34. Timjl

    PDF is great

    for things that need to be printed out, and the processes you need to follow to produce them.

    If you are using them for anything else you should be killed.

  35. Peter Mc Aulay
    FAIL

    Colour me unimpressed.

    So the semantic web is basically the same idea as database views, only instead of links to tables you get URIs which may or may not be under your control, may change (format, location or content) or simply vanish at any moment. What could possibly go wrong with that?

    And these people criticise PDF files, which are designed for printing, for not displaying text in A4 landscape mode on their screen (an argument founded in such amazingly complete and utter bollocks I don't even know where to begin).

    Perhaps the people designing the next generation of the interwebs should start with the web as it is today and not as it was in 1995.

  36. Flossie
    FAIL

    PDF has nothing to do with A4

    Since when was PDF tied to A4 portrait? I was under the impression that PDF was designed to provide a consistent fixed document display across multiple platforms. It could easily be formatted to match a typical widescreen monitor.

    Portrait is not a "legacy format". He also ignores the fact that there are good reasons why portrait is the best format for text (we've been using it for hundreds, probably thousands, of years for a good reason): there is a limit to the length of line which the human eye can cope with, so the portrait format is ideal. Widescreen monitors are now the norm because of the economics of the screen manufacturing industry and because many people use their laptops as video players, so consumers demand widescreen. It has very little to do with what is actually best when it comes to reading written documents.
