Structured data is boring and useless

We all know that structured data is boring and useless, while unstructured data is sexy and chock-full of value. Well, only up to a point, Lord Copper. Genuinely unstructured data can be a real nuisance - imagine extracting the return address from an unstructured letter, without letterhead and any of the formatting usually …

COMMENTS

This topic is closed for new posts.
  1. Dan

    It's Friday, save it for a Tuesday.

    The World Cup is on, it's Friday, and we're about to head out early to the bar. And you unload this on us. Very interesting stuff, really. I'm gonna bookmark it and re-read it next week.

    I promise.

  2. Leon Guzenda

    Does It Matter?

    Objectivity/DB, like all ODBMSs, stores both structured and unstructured data as objects with a unique Object Identifier [OID]. In the absence of indices or other query aids the only ways to find an object are by its OID, by scanning or by navigating to it from another object. However, you can also build many kinds of index (B-Tree and multidimensional), hash table, naming, versioning and ordered or unordered collection structures that reference groups of objects. So, the data may be unstructured upon arrival, but the supplementary structures can enhance the efficiency of queries.
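
    The three access paths described above (by OID, by scanning, by navigation) plus one supplementary index can be sketched roughly as follows. This is a toy in-memory illustration only: `ToyObjectStore`, `StoredObject` and every method name are hypothetical and are not the Objectivity/DB API.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from typing import Any, Callable, Iterator


@dataclass
class StoredObject:
    oid: int                                        # unique Object Identifier
    payload: Any                                    # structured or unstructured content
    links: list[int] = field(default_factory=list)  # OIDs of related objects


class ToyObjectStore:
    """Hypothetical in-memory store; NOT the Objectivity/DB API."""

    def __init__(self) -> None:
        self._objects: dict[int, StoredObject] = {}
        self._next_oid = itertools.count(1)
        self._word_index: dict[str, set[int]] = {}

    def put(self, payload: Any, links: tuple[int, ...] | list[int] = ()) -> int:
        """Store a payload and return its newly assigned OID."""
        oid = next(self._next_oid)
        self._objects[oid] = StoredObject(oid, payload, list(links))
        return oid

    def get(self, oid: int) -> StoredObject:
        """Access path 1: direct lookup by OID."""
        return self._objects[oid]

    def scan(self, predicate: Callable[[StoredObject], bool]) -> Iterator[StoredObject]:
        """Access path 2: visit every object and test it."""
        return (obj for obj in self._objects.values() if predicate(obj))

    def navigate(self, oid: int) -> list[StoredObject]:
        """Access path 3: follow references held by another object."""
        return [self._objects[target] for target in self._objects[oid].links]

    def build_word_index(self) -> None:
        """Supplementary structure: map each word of text payloads to OIDs."""
        self._word_index.clear()
        for obj in self._objects.values():
            if isinstance(obj.payload, str):
                for word in obj.payload.lower().split():
                    self._word_index.setdefault(word, set()).add(obj.oid)

    def find_by_word(self, word: str) -> set[int]:
        """Indexed lookup: no scan needed once the index exists."""
        return self._word_index.get(word.lower(), set())
```

    The text payload stays as unstructured as it was on arrival; only the word index, built afterwards, makes `find_by_word` cheaper than a full `scan`.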

    Making sense of both unstructured and structured data requires some human input. The main difference is that the rules are more rigid in a structured database. Object databases can encapsulate the rules in methods, which are generally linked with the applications or query servers that access the data.

    Some of the most advanced query tools employ ontologies, which can generalize the rules into concepts familiar to the user. The question "Are there any relationships between Person A and Person B" can initiate a search for family, business, social, geospatial and other possible connections. The High Performance Knowledge Server, from Ontology Works, is a good example of a powerful search engine based on ontologies. It does not distinguish between structured and unstructured data as it has Objectivity/DB as its engine.
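
    As a rough illustration of the "Are there any relationships between Person A and Person B" question, the sketch below runs a breadth-first search over typed relationship facts. It is a toy under stated assumptions: the `FACTS` list, the relationship kinds and the function names are invented for the example, and it does not describe how the High Performance Knowledge Server or any real ontology engine works.

```python
from collections import deque

# Hypothetical facts: (subject, relationship kind, object).
FACTS = [
    ("Person A", "family", "Person C"),
    ("Person C", "business", "Person B"),
    ("Person A", "social", "Person D"),
]


def build_graph(facts):
    """Index the facts by entity, treating every relationship as two-way."""
    graph = {}
    for subject, kind, obj in facts:
        graph.setdefault(subject, []).append((kind, obj))
        graph.setdefault(obj, []).append((kind, subject))
    return graph


def find_connection(facts, start, goal):
    """Return the shortest chain of (from, kind, to) steps, or None if unconnected."""
    graph = build_graph(facts)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for kind, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(node, kind, neighbour)]))
    return None
```

    The search does not care whether a fact came from a structured table or was extracted from free text; an ontology would add the step this sketch omits, namely deciding which kinds of relationship count as a "connection" in the first place.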

    Objectivity/DB has a distributed, parallel query engine that has user-replaceable components. These components can be used to go out and search any data source and return objects to the client that needs them. Objectivity/DB federated databases present a single logical view across distributed real or virtual (external) databases.

    All of the relational database vendors are adding extensions for searching unstructured data via their own conventional (SQL generally) interfaces. Unfortunately, without ontologies, or some similar mechanism, the queries are less powerful than expert users desire. OQL was a step in the right direction, but there's still considerable room for improvement in this area.

    Regards,

    Leon Guzenda

    CTO, Objectivity, Inc.

  3. Kingsley Idehen

    Unstructured vs Structured Data terminology does matter

    Your article addresses some very important issues.

    1. Data understanding and appreciation is dwindling at a time when the reverse should be happening. We are supposed to be in the throes of the "Information Age", but for some reason this appears to have no correlation with data and "data access" in the minds of many -- as reflected in the broad contradictory positions taken re. unstructured data vs structured data.

    2. The difference between "Structured Containers" and "Structured Data" is clearly misunderstood by most.

    "Structured Containers" (most DBMS products) have been limited to date by proprietary data access APIs and underlying data model specificity, when looking at the needs of the loosely coupled "Open-World" web of data called the World Wide Web. Naturally, in the "Open-World" model of the Web this is unacceptable. But things are changing fast, and the concept of multi-model DBMS products is beginning to crystallize.

    For instance, the Semantic Web (a vision that most don't understand due to the lack of coherent anecdotal material for the less technical) will ultimately manifest itself as a collection of loosely coupled databases that possess object-relational DBMS functionality.

    ORDBMS engines can extend data model support capabilities via object-relational functionality, as exemplified by OpenLink Virtuoso, which enables SQL, XML, and RDF management from one place (Unified Storage) with support for SPARQL, GData, OpenSearch, and other emerging Query Protocols.

    Please note that I am not implying that ORDBMS knowledge is required to make the Semantic Web more coherent than it is to date. I am designating ORDBMS engines as the DBMS engine form best suited for building the applications layer that is ultimately exposed as an endpoint in the eventual "web of databases". Personally, I prefer to call these endpoints "Data Spaces" since this is what ultimately fuses the Web 2.0 and the Semantic Web paradigms (that are currently perceived as mutually exclusive).

    For information on Virtuoso you can take a look at the Open Source project at: http://virtuoso.openlinksw.com/wiki/main/

    Nice article!

    --

    Regards,

    Kingsley Idehen

    President & CEO

    OpenLink Software Web: http://www.openlinksw.com

    Personal Blog: http://www.openlinksw.com/blog/~kidehen

  4. David Norfolk

    OODBMS

    Well, I'm sorry about posting this piece on Friday (even sorrier to be posting responses at midnight on Friday <grin>) but the OODBMS comments are interesting - and quite valid, although perhaps not all that can be said on the subject.

    The trouble is that many people don't think there's an issue here, and that using an OODBMS (so I've been told by people who've tried them) needs quite as much discipline as using an RDBMS properly.

    More trivially, perhaps, things like Ontology make people's heads hurt. I was at an IDC Business Performance Management and BI conference this week where someone brought up ontologies. Everyone looked blank - analysts, speaker panel and delegates.

    I think that there are technologies that can cope with managing structured and unstructured data and everything in between. I don't think that is the issue - I think Kingsley's remark - we are supposed to be in the throes of the "Information Age", but for some reason this appears to have no correlation with data and "data access" in the minds of many - is about right. And it will lead to problems.

    A lot of "information processing" (or integration) efforts still fail because data quality issues are underestimated and the politics of data semantics and ownership are overlooked. I think that the confusion between "structured" and "unstructured" data that Duncan identifies is just one symptom of a general malaise involving a lack of understanding of data issues and their importance.

  5. Duncan Pauly

    One Man's Structure...

    I agree David. The methods exist for dealing with data structure, but the “freedom” associated with unstructured data has become seductive to many who are happy to perceive structure merely as XML semantic tagging; and meanwhile nomenclature continues to confuse with the term “structure” meaning different things to different interest groups.
