* Posts by Mark Whitehorn

22 publicly visible posts • joined 29 Dec 2006

Frustration and joy - Microsoft's CTP in action

Mark Whitehorn

Replies to several comments

@ bored AC

"Seriously, are you going to use the express version for your final deployment?"

"nonetheless you aren't half making a massive rod for your own back by beta testing for MS"

Thanks for the comments. No, we won’t be using the Express version – we were using that early on (in fact, in the 2005 version) to do some rough prototyping and it happened to illustrate rather elegantly that the database file size was getting quite large, even with relatively few blobs.

And no, I don’t think we are making rods for backs. The original decision to use the 2008 stack was made precisely so that we could gain experience with 2008. You will have gathered that the download/install of Microsoft CTPs does not fill me with awe and wonder (or, rather, it does, but only in a bad way….). However, all of the rest of the CTP program – actually using the software – has been, as reported in the rest of this series, remarkably pain free. We have gained huge experience and both the management and the developers are extremely happy that we decided to go down the CTP route.

Had we started with 2000/2005 as you suggest we wouldn’t have gained the experience.

@Kevin Bailey

Kevin asks an interesting question “Who pays for the install time?” The answer is the management. Are they happy to do so? They are delighted. Why? Because they understood at the time the decision was made that working with a CTP program has associated costs. They (not the developers) decided that the benefits were likely to outweigh the costs and, happily, in this case, they have been proved correct.

@AC (possibly the same one, possibly not, it’s difficult to tell…..)

“And why worry about DB size? Overall size counts - total data doesn't get smaller if you store it outside the db; there's just as much to back up, only it's in pieces... which makes reliable backup during runs problematic, if a file is added/deleted while backup is going on. Please explain!”

It’s always a trade-off. Keeping the BLOBs in the file system can be an advantage if the BLOBs are mainly static and can be compartmentalised. That way you need only back up the new BLOBs, and the backup of the database itself is much faster because it is smaller. However I don’t want to make it sound as if these were the only considerations. We could, for example, also have partitioned the database and achieved a similar (but not identical) effect. It is easy to see these architecture decisions as simple – in fact they are relatively complex and there was some discussion and testing before we made the decision. (Indeed, that is another reason why we wanted to use the CTP, so that we could do the testing!).

Mark Whitehorn

4GB Limit

OK, my bad.

The problem was my careless editing. This article was supposed to be 800 words long and my initial draft was over 1,200; so some serious surgery had to be performed. This section originally read as:

“As it happens, our application has a great number of BLOBs. In SQL Server 2005 we tried both approaches. Storage in the database rapidly produced database files that exceeded the 4GB database file limit that was imposed because, just to get started, we were using SQL Server 2005 Express running on a 32 bit operating system. We did some further testing and tried storing the BLOBs in the file system. This was better but managing the pointers to ensure that they pointed to the correct BLOB - in our case by bar codes - was time consuming and limiting. So, when the CTP arrived we moved to 64 bit (as described in an earlier article) and started using the file streaming, which has proved to be fabulous for our application.”

130 words – far too long. So I edited it down to this:

“As it happens, our application has a great number of BLOBs. In SQL Server 2005 we tried both approaches. Storage in the database rapidly produced database files that exceeded the 4GB limit on a 32-bit operating system. Storing the BLOBs in the file system was better but managing the pointers to the correct BLOB - in our case by bar codes - was time consuming and limiting.

File streaming is fabulous for our application.”

Which is a much trimmer 74 words. Sadly I failed to notice that I had ended up implying that there is a 4 GByte limit on 32 bit OSs.
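
For anyone who hasn’t met the 2008 file streaming mentioned above, here is a minimal sketch of what reading a FILESTREAM BLOB looks like from client code. The PathName() and GET_FILESTREAM_TRANSACTION_CONTEXT() functions and the SqlFileStream class are the genuine SQL Server 2008 mechanism; the table, column and connection details (dbo.Scans, ImageData, Barcode) are invented for illustration and are not taken from our project.

    // Minimal sketch: read one FILESTREAM BLOB, identified by a bar code.
    // Hypothetical schema: dbo.Scans(Barcode nvarchar, ImageData varbinary(max) FILESTREAM).
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using System.IO;

    class FileStreamReadSketch
    {
        static byte[] ReadBlob(string connectionString, string barcode)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                using (var tx = conn.BeginTransaction())   // FILESTREAM access requires a transaction
                {
                    var cmd = new SqlCommand(
                        "SELECT ImageData.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
                        "FROM dbo.Scans WHERE Barcode = @barcode", conn, tx);
                    cmd.Parameters.AddWithValue("@barcode", barcode);

                    string path;
                    byte[] txContext;
                    using (var reader = cmd.ExecuteReader())
                    {
                        if (!reader.Read()) { return null; }   // no such bar code
                        path = reader.GetString(0);            // NTFS path managed by SQL Server
                        txContext = (byte[])reader[1];         // context needed to open the file
                    }

                    byte[] blob;
                    // The BLOB is streamed from the file system, but stays under
                    // the database's transactional control.
                    using (var fs = new SqlFileStream(path, txContext, FileAccess.Read))
                    using (var ms = new MemoryStream())
                    {
                        fs.CopyTo(ms);
                        blob = ms.ToArray();
                    }

                    tx.Commit();
                    return blob;
                }
            }
        }
    }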

Windows hardware challenge draws on resources

Mark Whitehorn

Mother is watching

The BRSSITS (The Big Red Security Switch In The Sky)

It is excellent that Microsoft has provided all this security stuff. In a production system it should be very carefully used. All of this is true.

But think about what happens in practice during development if we don’t have a BRSSITS. Nothing will work out of the box. The development guys simply want to test the spatial data types, not the security. So people dig deep, find all of the switches and set them all, individually, to the “Completely Unsafe” setting in order to get the thing working. Time passes, development takes place and the system moves into production.

Now, what should happen at this stage is that an entirely new production system is created – complete with a fully tested, secure environment. But suppose the development system is somehow, sneakily, moved from development to production status? (Whilst it shouldn’t happen, it does in practice as pressure is applied from the business side.) At that point, someone has to go through the system, finding all of the switches and resetting them. And this is the dangerous part. If one is missed you have a potential security issue.

You can probably see where this is going. If you use the BRSSITS, then all you have to do is to switch it off. Of course, everything stops working, but that’s OK because it forces you, at this point, to set up the security properly – in the correct way, by switching on only the bits you need.

Is it possible to forget to switch the BRSSITS to the ‘Safe’ position before going into production? Of course it is. The next question is “Which is more likely? To forget one HUGE RED SWITCH or to overlook an obscure setting, buried deep in the bowels of the UI?”

And remember, you did have to have that note from your mother before you invoked it. She’ll be patiently watching from the sidelines, making sure that you don’t forget. After all, that’s what Mothers do best.

Back to basics for SQL Server 2008

Mark Whitehorn

Clayhithe

AC is very kind but these four

Streatley Hill, Berkshire

Clayhithe, Cambridgeshire

Yalta, Crimea

Causey Pike Gill, Cumberland

are examples of ones for which we DO know the location:

51.52305, -1.15609

52.25789, 0.19911

44.49673, 34.16313

54.57683, -3.19908

However, as I say, the thought is much appreciated.

Mark Whitehorn

Pleasby Wood

Excellent Andrew, thank you! That's another one accounted for.....

Inside the Windows 2008 stack experience

Mark Whitehorn

re:SQL Server 2008 delayed to Q3 2008

Happily, as I say in the next, exciting, episode (which should appear soon), having lived through the release of SQL Server 2003/2004/2005 I'd already allowed for substantial slippage in the 2008 version. This doesn’t mean I am happy about the slippage, but I was prepared. The project will go live in the summer, but only with a subset of the users and we can do that on the beta (sorry, CTP).

"Are you sure you don't want to consider an open source alternative for this project, Mark?"

We have already done so. When the project started we drew up a spec. and then matched that against the available products; some of which were open source. A subset of open source products was potentially capable of meeting the database requirements but not the BI (specifically the OLAP and data mining) requirements.

However, for other projects I am always open to suggestion and the world is constantly changing. So if any Reg. Dev. readers are aware of good open source OLAP engines and data mining tools (in particular those capable of scaling well) please do let me know.

Why Borland trashed its spreadsheet

Mark Whitehorn

Seriously now....

I wish to formally align myself with the views expressed by Bill Trent in his first posting.

In answer to his request for information, I think that the word “irony” is fine to cover “hyperbole intended to imply the opposite.” The Cambridge dictionary defines irony as:-

a means of expression which suggests a different, usually humorous or angry, meaning for the words used: Her voice heavy with irony, Simone said, "We're so pleased you were able to stay so long." (= Her voice made it obvious they were not pleased).

Finally, I wish to formally apologise for not making it clear in my article that I was using irony. Sorry. I will try to be clearer about this in future.

(In keeping with my new policy about being clearer, I must state that the previous paragraph contains an ironical statement.)

Mind your languages with Microsoft LINQ

Mark Whitehorn

Re: Some facts wrong, too much focus on database

I’m sorry that you feel there was too much focus on database but, even after re-reading, I feel it was made perfectly clear that LINQ can link to sources other than databases. I simply then had to focus on one source, so I chose what was for me the most obvious one.

> LINQ to SQL *do* have an abstraction (mapping) layer.

Yes, it does. In truth, I don’t believe I ever said it didn’t.

> Your claim that LINQ writes directly to the tables is flat out wrong.

What I actually said was: “This would also address another criticism levelled at LINQ, which is that whilst it isn't SQL, it still writes directly to the tables. As soon as the table structure is altered, the application breaks.”

Let me expand that, because I agree it is cryptic. Clearly LINQ has to use a mapping layer since LINQ works with objects and databases use tables. However, take a look at example three (011) from the 101 LINQ Samples. It includes the code:

var expensiveInStockProducts =
    from p in products
    where p.UnitsInStock > 0 && p.UnitPrice > 3.00M
    select p;

It does not require a huge exposure to sample databases to recognize our old friend NorthWind here. If we were in any doubt, the appearance of Sir Rodney's Marmalade in the answer set must allay any misgivings. (Let’s not even begin to speculate as to why the Microsoft guys have seen fit to use such an old sample set of data rather than the shiny, new Adventure Works…….)

Northwind, of course, includes a table called ‘Products’ and ‘Products’ includes two columns called ‘UnitsInStock’ and ‘UnitPrice’.

So here, no matter what mapping could be done, the LINQ code is actually addressing a table; the mapping is essentially one-to-one. We can argue about whether this is wise, but the bottom line is that this is being done in the sample code that Microsoft has produced to teach people to write LINQ.

Now I argue in the article that databases also have a mapping layer (views, stored procedures and functions) and I also argue that most people don’t use them. So, whilst I agree that LINQ has a mapping layer, it is almost certain that people won’t use it effectively; its effective use is not being demonstrated to them and experience suggests that people don’t make use of abstraction layers here anyway. If the mapping isn’t used as an abstraction layer, that is functionally the same as it not being there.

However, I am happy to amend my original statement to: “This would also address another criticism levelled at LINQ, which is that whilst it isn't SQL, it can still effectively write directly to the tables. As soon as the table structure is altered, the application breaks.”
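
For completeness, here is a minimal sketch of what using that mapping layer as a genuine abstraction might look like. The attribute types are the real System.Data.Linq.Mapping ones; the entity class, its property names and the scenario are invented for illustration:

    using System.Data.Linq;
    using System.Data.Linq.Mapping;
    using System.Linq;

    [Table(Name = "Products")]            // the physical table name lives only in the mapping
    public class Product
    {
        [Column(Name = "UnitPrice")]      // if the column is renamed, only this attribute changes
        public decimal Price { get; set; }

        [Column(Name = "UnitsInStock")]
        public short InStock { get; set; }
    }

    class MappingSketch
    {
        static void Query(DataContext db)
        {
            // The query is written against Price and InStock, not the physical
            // column names, so a schema change is absorbed by the mapping
            // instead of breaking every query that mentions the column.
            var expensiveInStock = from p in db.GetTable<Product>()
                                   where p.InStock > 0 && p.Price > 3.00M
                                   select p;
        }
    }

Written that way, renaming UnitPrice in the database means editing one attribute rather than hunting through the application; my point is simply that the samples don’t encourage this style, so most people won’t write it.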

> Contrary to your claims, LINQ to SQL will also work with stored procedures,

I have to say, having reread the article several times, I simply cannot find any evidence that I claimed this. I mentioned stored procedures in relation to well designed databases. I never claimed that LINQ will not work with stored procedures. I might (if pressed) argue that people are unlikely to use that ability, but that's different.

> I disagree that LINQ is a way to let programmers

> forget (or not learn) SQL.

Well, we will simply have to disagree about that. I think it is.

>My experience with LINQ (not just LINQ to SQL) is

>that it will drastically change (for the better)

>how I program any set/list manipulating code,

>not just database results.

I agree. I’m a database freak so I find SQL a perfectly natural language to use. But that doesn’t mean it is perfect; no language ever is. I want to try LINQ in some real projects and decide whether it has enough advantages in practice to make it worth adopting. Time alone will tell.
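
As an aside, for readers who have only seen LINQ pointed at a database, here is a tiny LINQ to Objects sketch (the data is invented) of the sort of set/list manipulation being described:

    using System.Linq;

    class ListSketch
    {
        static void Run()
        {
            int[] readings = { 3, 17, 42, 8, 99, 4 };

            // The same declarative filter/sort/project style, applied to an
            // ordinary in-memory array rather than to a database table.
            var topReadings = readings
                .Where(r => r > 10)
                .OrderByDescending(r => r)
                .Take(2)
                .ToList();               // yields 99, 42
        }
    }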

Running queries on the HMRC database fiasco

Mark Whitehorn

Re: Security should be discussed!

George Danezis makes an excellent point about good, well designed security systems. As he says “Making such documents public should not make the system more vulnerable, if it is engineered with security in mind.” I agree.

In general, the less you know about a security system, the more difficult it is to break. However if, as George suggests, a system is well engineered with security in mind, it is possible that some information about the architecture can be revealed without compromising the system.

But the converse is also true. Some badly designed systems rely on the fact that the architecture is hidden to provide some of the security. I’m not suggesting that this should be the case, merely that sometimes it is so.

For example, imagine a physical security system that includes a wire on top of a wall. If you know nothing about the wire or the signal it may carry, you risk detection if you cut it. On the other hand, if you know that it carries a very simple signal that can only detect a complete break you can happily use a jumper wire to avoid detection. (I don’t write from experience here, but I have watched innumerable spy movies).

Now it is clear (painfully, excruciatingly clear) that the system under discussion was not well-designed. Had it been, we would not be discussing it. And given that it was poorly designed, it may be that some measure of protection might still be afforded if the remaining details of the ‘architecture’ are not revealed.

I agree that ignorance of the database format or even the encrypted archive format will not appreciably slow down professionals. How much it would slow down or stop amateurs would depend upon their level of expertise – which is unknown.

Clearly there is a spectrum of risk here. Some information (the file names) is very low risk; other information (the password) carries a somewhat greater risk. Exactly where the line should be drawn is tricky but the government is wise to err on the side of caution. Doing otherwise has the potential to further compromise security to an unknown degree.

>The government is clearly trying to say as little as possible on the matter,

> with good *political*, not security, reasons.

I agree. Given that the government has, in the past, shown very little concern about protecting this data (hence the leak) there is every reason to believe that it is currently more concerned with the politics than security. But the motivation of the government and the morally correct course of action are not linked as cause and effect. In other words, just because the government has a hidden agenda for wishing not to discuss the details of the security does not mean that those details should be discussed.

>It is unclear why IT journalists should play along with this strategy

>instead of asking for the full requirements, specifications…

I don’t agree that we are playing along with a strategy. I think that journalists face the same choice as the government (but without the political pressure) and, for the reasons outlined above, should make the same decision.

I suspect that if we were ‘playing along’ with some government strategy, we wouldn’t be highlighting the absurdity of that same government using pseudo-technical arguments for political ends.

Hands on with MDX

Mark Whitehorn

Re. YTD

For those who don't recognise the name, 'Mosha' is Mosha Pasumansky; without doubt the world's leading authority on the language. The book mentioned in the article (Fast Track to MDX) is one that Mosha and I wrote in conjunction with Robert Zare. I have learned over the years that Mosha is always right about MDX (and lots of other stuff as well) and, if you get into MDX, his blog at:

http://sqljunkies.com/WebLog/mosha/

is well worth studying.

He is, as always, quite correct that we can reduce this MDX down to two words, which emphasises still further the power of this language. So we see an analytical statement that in SQL would take serious effort being achieved with almost Zen-like ease in MDX. This is not to say that MDX is therefore better than SQL; the two have strengths in different areas: SQL for transactional querying, MDX for analytical. But it does emphasise that MDX is worth learning.

Mark Whitehorn

Re. Interesting, if a little abrupt...

Hi David,

Well spotted! Sorry about that, there was a technical hitch and only about two thirds of the article appeared initially.

It is all there now; thanks for pointing out the error.

Mark

It was Microsoft wot done it

Mark Whitehorn

Re. MS Spot Changing

> can you point us to a modern version of the Bloor Report?

Yes, for example you can download the report on SQL Server 2005 free from Bloor's website.

http://www.bloor-research.com/research/product_evaluation/741/microsoft_sql_server_2005.html

A couple of quotes from it:

"SQL Server 2005 is highly scalable...."

and

"We have not previously considered SQL Server to be a serious contender for data warehousing, at least not for large sites. However, Microsoft has made significant inroads into the market over the last few years."

suggest that Bloor now sees the product as much improved. In addition, the existence of such a report tells us that Microsoft does give permission for testing and I'm sure that companies like Bloor now insist that they get said permission in writing.

> why believe that the company or their product is any way changed?

My beliefs about the change in the product are based in part on reports like this one but also on my own experience with SQL Server, Oracle and DB2.

10 years ago there was, in my opinion, a huge gulf between the products – now there is not. They still have differing sets of strengths and weaknesses; but at least they are all in the same ball park (cricket pitch for our UK readers).

My belief about the change in the company is based on my interaction with it over the years. Am I swayed by the PR? Of course not; just like everyone else I have ever met, I KNOW that I am personally unaffected by marketing and/or advertising…….

Steve Ballmer may have said something interesting; we couldn’t possibly comment...

Mark Whitehorn

re. Journalists ask questions...

An excellent point.

Most keynotes don't have a Q&A section but just suppose this one did (I don’t KNOW if it did because I wasn’t allowed in…..). In that case you could have put your finger on the problem; Ballmer didn't want those perfidious journalists asking the awkward ones. However my guess is that the audience there was quite capable of asking challenging questions on its own (which is probably why keynotes don’t usually allow questions.)

So, can any reader who was there enlighten us? Did Ballmer take questions from the audience?

MySQL is the company's SQL now...

Mark Whitehorn

MySQL - Ready for the enterprise?

This article is about the new version of MySQL and the opportunities/challenges that it poses for the company. It is not about the suitability (or otherwise) of the database engine for the enterprise market. Just because many people find the free version great for the purposes to which it has traditionally been put doesn’t mean that it can withstand the white heat of the enterprise environment. Billy Verreynne’s experience suggests to him that it can’t; what are your experiences?

We intend to follow up this article with another that does examine MySQL’s enterprise capabilities. If you have an experience that you are happy to share with the World, feel free to post it here or email it to me.

Mark Whitehorn

comment on comments

Thank you for taking the trouble to post comments.

Yes, I agree. PostgreSQL is also a fun database engine.

MySQL’s record so far in the Enterprise market.

It is perfectly true that MySQL has a long list of customers and I’m delighted. However not even MySQL would claim that all of these companies have replaced their mission-critical systems with MySQL. Many of them are using MySQL as a very cost-effective database engine for non-critical systems whilst still running more traditional engines for their core systems.

In addition, the world is a very big place with a great number of companies and the customer list of a product like DB2 is immense in comparison. In enterprise terms, MySQL is still way behind the big three.

For example, Gartner ranked the engines like this for market share in 2005:

Oracle 48.6%

IBM 22.0%

Microsoft 15.0%

Teradata 3.2%

Sybase 2.9%

Other vendors 8.2%

MySQL doesn’t even appear. Now, it is quite true that Gartner essentially uses revenue (of one kind or another) and that measure discriminates (until now) against products like MySQL. Nevertheless, it is clear that MySQL has not, so far, come anywhere close to the penetration of the other engines. Indeed, part of the reason for the change in sales model that the article outlines is to try and improve the enterprise sales of the product.

Transaction support.

Transactions are clearly a topic of some interest to Reg. Dev. readers:

http://www.regdeveloper.co.uk/2006/12/22/access_not_relational_myth/comments/

As discussed there, deciding exactly when an engine provides full support for transactions is not easy. However MySQL, the company, announced in a press release dated September 23, 2002 that:

“the MySQL database now provides full support for transactions to enable the development of e-commerce and other robust, business-critical applications. The standard version of the MySQL database now includes a high-performance transactional storage engine with crash recovery and other capabilities that will significantly extend MySQL's role in the enterprise database market.”

So, let’s take that date as correct. It is just over 4 years ago. I have no doubt that this seems like a long time ago if, for example, you have only been using MySQL for three years. Indeed, it must then seem as if the product has always supported transactions. However, in transactional terms, this is still recent history.

It is worth remembering that the concept of a transaction didn’t pop into being one day, fully formed. It took a great deal of effort to develop over many years and Jim Gray did much of the fundamental work. In 1975, along with Chamberlin and Traiger he published “Views, Authorization and Locking in a Relational Database System”. This was followed by a long series of papers including the classic “Transactions and Consistency in Distributed Database Systems” and “A Transaction Model”, both in 1980.

Relational engines were being developed during this time and they started to embrace the notion of transactions and implement Gray’s ideas. For example transactions, at least in terms of commit and rollback, first appeared in Oracle version 3 (1980). Subsequently, as our understanding of transactions further improved, Oracle improved the support in future versions.

Seen within this broader context of enterprise level database engines supporting transactions since 1980, MySQL’s introduction of support in 2003 has to be seen as relatively recent.

OLAP and the need for SPEED

Mark Whitehorn

Re: OLAP is not just MOLAP.

> but you knew that :-)

Yes I did and, without wishing to be contentious, I don’t think I was using OLAP to mean MOLAP. All three flavours of OLAP are inherently multi-dimensional, at least in terms of the way in which users can visualise their data. The proof of that is easy. If you use a front-end tool (such as ProClarity) then the user experience is absolutely identical (dimensions and measures) no matter which flavour of OLAP is used. Where MOLAP, ROLAP and HOLAP do differ is in terms of physical implementation; but that is typically completely transparent to the user.

Mark Whitehorn

Obvious to some, certainly.

If you fully understand OLAP then, by definition, you must understand that this myth is invalid. No question.

However not everyone does. My experience is that people who are currently learning OLAP often believe that it is just about speed; and it is easy to see why.

OLAP has two main advantages: speed and the ability to present data in a multi-dimensional way. If you have been brought up on relational databases then the first advantage makes immediate sense. You know that users always want increased performance, so this advantage sticks out like a sore thumb. The bit about multi-dimensional access to the data…… well, until you actually get your brain around what this means (and that often takes considerable time) then it is difficult to see this as an advantage. Indeed, if you are familiar with relational tables and are happy using SQL to query them, then a whole new data structure, with a whole new language (MDX), can seem like a positive disadvantage.

I guess it all depends on the people that you meet. I certainly meet people with this view in the training work I do but, even in my consultancy work, I still meet people who subscribe to this myth.

One acid test is to search the groups for a phrase like “OLAP is about”. You’ll certainly find people there who believe (and are telling others) that OLAP is about speed and aggregations.

Access isn’t a relational database

Mark Whitehorn

Re: ACID test and durable

My guess is that the authors of the two comments “The ACID test” and “durable” are using different definitions of the word ‘durable’. I agree wholeheartedly with the general definition of the word as discussed in the comment “durable”. Durable means, it endures. Most people who update a database in Access can reasonably expect that database to be available later that day, or even the next. So, in that sense, Access transactions are durable.

However we can then ask “How durable is durable? How long is it supposed to endure? What level of failure, both hardware and software, should a transaction be able to withstand in order to be considered durable in the context of the ACID definition?” For this we need to turn to the person who spent decades working on transaction theory and ACID properties, Jim Gray.

www.regdeveloper.co.uk/2006/05/30/jim_gray/

A useful book here is:

Jim Gray, Andreas Reuter: Transaction Processing: Concepts and Techniques. Morgan Kaufmann 1993, ISBN 1-55860-190-2

http://books.theregister.co.uk/catalog/browse.asp?isbn=1558601902

This is the classic work on transaction theory. In it Gray discusses what is meant by durable in an ACID context. His description isn’t a general one based around the fact that, on any given day, a single machine is unlikely to fail. He is talking about a much more rigorous definition; much closer to that given in the comment headed “The ACID test”.

Of course, it is still a matter for debate. If we ship the log files off a server every 30 seconds and the machine crashes and burns, we could lose up to 29.99999 seconds worth of transactions. Does that meet the criteria for durability? Is once every 5 seconds enough? What if we mirror the entire database to another machine in the same room? The transactions can now survive one machine failing, but what if the room burns down? If we simultaneously mirror to ten different off-site machines; are the transactions now durable?

Ultimately I think that Access transactions are perfectly durable in the normal use of the term, but Access doesn’t have the additional mechanisms necessary to provide durability as the term is typically used in the ACID sense.

Mark Whitehorn

Re:Irony

It is irony rather than recursion. To be more specific it's irony 3.2.

Mark Whitehorn

Re. transactions, version 4.0

David’s posting raises several issues which, while they aren’t directly germane to the original direction of the article, are fascinating in their own right (as long as you are interested in databases!) and therefore worth following up.

For information, before we start, the following list shows the appropriate Jet engine for each version of Access:

Access 1.0 Jet 1.0

Access 1.1 Jet 1.1

Access 2.0 Jet 2.0

Access 2.0 Jet 2.5 (Service Pack)

Access 7.0 Jet 3.0

Access 97 Jet 3.5

Access 2000 Jet 4.0

Access 2003 Jet 4.0

The first point for discussion is when Jet started to support transactions. Was it Jet 4.0 as I stated, or 1.1 as David says? Well, it depends on what you consider to be a transaction.

I have a set of original Access 1.0 manuals and on pages 117-121 of the ‘Introduction to Programming’ manual we find a description of how to perform transactions. In the ‘Language Reference’ manual we find the following statements supported:

BeginTrans P50

CommitTrans P76

Rollback P406

So we can easily argue that transactions have been supported since version 1.0.

But, of course, it isn’t as simple as that (you must have known it couldn’t be….). For a start this is non-standard syntax. Neither BeginTrans nor CommitTrans is part of the ANSI SQL standard, although Rollback is. In addition, the SQL standard assumes that transactions start automatically when you issue an INSERT, UPDATE, SELECT or DELETE statement. Access doesn’t do this, which is why it needs a BeginTrans statement. Having said that, the SQL standard does also support the START TRANSACTION command which is similar to BeginTrans. Then there is the little matter of SAVEPOINT which is also part of the standard, but isn’t supported in Jet 1.0 (or, indeed, in 4.0). And the standard also supports a considerable set of options for COMMIT and ROLLBACK that are not supported by Jet 1.0.

So, does Jet 1.0 support transactions? ‘Yes’ in the sense that you can wrap up several SQL statements into one operation that either succeeds completely or fails completely; ‘No’ in the sense that it does not conform to the definition of transaction as defined in the SQL standard. So ultimately it depends on the definition you apply.
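
To make the ‘succeeds completely or fails completely’ point concrete, here is a minimal sketch of a Jet transaction as you might drive it today from ADO.NET’s OleDb provider (rather than the original Access Basic BeginTrans syntax). The .mdb path and the Stock table are invented for illustration:

    using System.Data.OleDb;

    class JetTransactionSketch
    {
        static void MoveStock()
        {
            var connStr = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\stock.mdb";
            using (var conn = new OleDbConnection(connStr))
            {
                conn.Open();
                using (var tx = conn.BeginTransaction())
                {
                    try
                    {
                        new OleDbCommand(
                            "UPDATE Stock SET Units = Units - 5 WHERE Depot = 'North'",
                            conn, tx).ExecuteNonQuery();
                        new OleDbCommand(
                            "UPDATE Stock SET Units = Units + 5 WHERE Depot = 'South'",
                            conn, tx).ExecuteNonQuery();

                        tx.Commit();      // both updates become permanent together...
                    }
                    catch
                    {
                        tx.Rollback();    // ...or neither does
                        throw;
                    }
                }
            }
        }
    }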

As Jet progressed, support for transactions improved, but it never reached the stage where the entire standard was fully supported.

In the article I chose 4.0 as the first version that provided enough support for transactions to be worthy of the name. I had two main reasons for this choice. Firstly 4.0 was the first version where Microsoft elected to change to (more or less) the correct syntax for transactions. Secondly, and more importantly, it was at this point that Microsoft itself appeared to concede that support had been poor up until that point.

I quote from a Microsoft document http://support.microsoft.com/kb/275561 called “Description of the new features that are included in Microsoft Jet 4.0”.

“Microsoft Jet SQL now supports invocation and termination (committing or rolling back) of transactions.”

So, a ‘new feature’ of Jet 4.0 is the support for invocation and termination of transactions. The clear implication is that, prior to 4.0, it did not support them. I was absolutely certain that if I had placed the support of transactions any earlier than 4.0, a multitude of Reg Developer readers would have pulled me up and pointed out that even Microsoft didn’t consider that transactions were supported prior to 4.0……

Incidentally, in this document Microsoft still acknowledges that support is by no means complete. The very next line reads:

“Note that while ANSI SQL specifies that a new transaction is started automatically following a COMMIT or ROLLBACK, Microsoft Jet does not follow this model. Thus, an additional transaction verb is defined to explicitly start transactions, because Microsoft Jet does not automatically start transactions.”

So, does all of this mean that I am obliquely saying that David is wrong? Absolutely not. When writing the article I thought carefully about it and, in truth, I nearly opted for 1.0 instead of 4.0. If I hadn’t remembered the reference in the 4.0 spec., I probably would have. Exactly where any of us think that Jet starts to support transactions really is simply a matter of opinion.

Log files.

A log file contains a log of all the transactions (in the standard sense, not just in the Access sense) that take place against a database. Log files are typically not stored on the database server itself but on another machine, a fact which turns out to be very useful. Suppose that we create a backup of the database at midnight. The database runs until 3:00PM then it crashes and burns in a spectacular fashion. The server is destroyed. But we are not downhearted. We restore the backup to a new server and then essentially run the transactions recorded in the log file against the database. Since every transaction is recorded in the log file, we can roll the database forward to any point in time; yea even unto 3:00PM. Wonderful. This is how databases ensure that transactions are durable (the D part of ACID); they survive even if the operational database is lost.

Access logs enough of a transaction to be able to roll it back, but does not provide durability. So, David and I agree totally that Access doesn’t guarantee its transactions pass the ACID (Atomicity, Consistency, Isolation, Durability) test.

Building a data warehouse on parallel lines

Mark Whitehorn

Re. Indexes... blah...

I don't think that anyone implied that indexes were the only way to speed up databases. There are, indeed, manifold techniques we can apply – indexing, aggregation, partitioning, OLAP (which typically, but not exclusively, includes an element of aggregation), hardware (a very broad church embracing parallel processing, faster disks, RAID, more and faster CPUs, more memory), query optimisation, updating statistics, denormalisation (in all its flavours) to name but a few.

Any and all of these can be appropriate, or not, for any given database; it depends on the circumstances. Aggregation can be an excellent solution but it isn’t perfect, see the Register article:

www.regdeveloper.co.uk/2006/10/06/aggregates_the_dba_headache/

Kognitio supplies yet another way to speed up databases: its solution is based around parallel processing and in-memory querying. The solution will be highly appropriate in some circumstances but no one would suggest that it is appropriate in all. The good news is, the more solutions that are available to us, the more likely it is that we can find an appropriate one.

Ultimately there is no single button labeled ‘Make your database 100 times faster’. If there were, we’d all have pushed it a long time ago.

Mark Whitehorn

Size, is it really important?

Well, there are several answers to the questions raised about size.

Firstly Sun and Greenplum did indeed announce a data warehouse appliance in July 2006. However it actually starts an order of magnitude lower than 100TB. To quote from the Greenplum press release of the time:

“The Data Warehouse Appliance will be available later this quarter. Initial configurations will deliver usable database capacities of 10, 40 and 100TB.”

Secondly, as far as I am aware, Kognitio has never made any pronouncements about the volume of data that either constitutes a data warehouse or should be put on a node.

The quote in the article is there to illustrate scalability not absolute volumes. It says:

‘In addition the architecture that Kognitio has elected to use has a very desirable side-effect: scalability. The company claims, for example, that “the query performance of a 100-server WX2 system with 10TB of data will be the same as that of a 10-server system with 1TB of data.” ’

There is certainly no implication that nodes are limited to 100GB; the numbers are simply being used for illustrative purposes.

Thirdly, and most importantly, size is not important. The volume of data held in a data warehouse really isn’t that relevant, it is the quality of information that can be extracted which can make or break the project and, indeed, the enterprise. That is not to imply that large data warehouses cannot provide important information; of course they can. It’s just that the correlation between volume and importance is not absolute.

With reference to the other products, such as:

Kognitio

Greenplum

Netezza

DATallegro

and others

They are all doing very exciting work to push the boundaries of what can be achieved in data warehousing; we think it is important to give these products more exposure. In this article I focused on one of them but David Norfolk (my editor) and I were both keen to include references to some of the other products that are (as far as we are concerned) in the same space.