SysAdmin daily cycle
- Nebulous / Ill-conceived external pressure
- Red mist descends
- Whining & Screaming
- Apply ugly solution / band-aid
- Vent in the pub
Repeat
Friday - Read Dabbsy - Relax
A recent demonstration of Juniper's Software Defined Networking (SDN) showed a level of automation that makes me loathe the mundanity of my day job all the more. Software defined networking is a collection of technologies that could free me to do far more business-critical things with my time, like research, automating more …
So we have finally started to convince the world that large layer 2 domains are a very bad idea (think: blast radius), and that lots of individual devices working together (Distributed) is much better than centralising everything...
Until along came SDN, re-centralising everything once again (controller-wise). I can already see the mega-outages a bug at the controller level will cause, or the loss of per-node optimisation that centralising will bring.
It's not that I think SDN is a bad idea; I just think it's half-baked right now. I also think that simplicity in design, a good provisioning tool and excellent engineers will trump the cost of SDN, in terms of man hours, resources and impact.
Beer; because this is what SDN drives me to.
"I can already see the mega-outages a bug at the controller level will cause..."
Totally agree. I bet we're going to see some spectacular failures that take down huge chunks of infrastructure, not only from bugs or poor design, but from good ole human mistakes. If you think understanding the network is hard now, wait until you can't understand it - where only the SDN controller truly knows what's going to happen when you make that next change.
Still, I really like the idea of SDN and its uptake is inevitable. We're just going to need some really good modeling tools for SDN that allow you to test changes virtually before anything bad actually happens.
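To make "test changes virtually" a bit more concrete, here is a minimal sketch of a dry run, assuming the network can be modelled as a simple undirected graph. The topology, the node names and the `dry_run` helper are all made up for illustration; no real SDN controller API is involved. It applies a proposed link removal to a copy of the model and reports which required flows would break.

```python
from collections import deque
from copy import deepcopy


def reachable(links, src, dst):
    """Breadth-first search over an undirected adjacency map."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in links.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def dry_run(links, remove_link, must_reach):
    """Apply a link removal to a COPY of the model; return the flows it breaks."""
    trial = deepcopy(links)  # the live model is never touched
    a, b = remove_link
    trial[a].discard(b)
    trial[b].discard(a)
    return [(s, d) for s, d in must_reach if not reachable(trial, s, d)]


# Hypothetical four-node topology (names are illustrative only).
links = {
    "edge1": {"core1", "core2"},
    "edge2": {"core1"},
    "core1": {"edge1", "edge2", "core2"},
    "core2": {"edge1", "core1"},
}
must_reach = [("edge1", "edge2"), ("edge1", "core2")]

safe = dry_run(links, ("edge1", "core2"), must_reach)    # redundant link
broken = dry_run(links, ("edge2", "core1"), must_reach)  # edge2's only uplink
print(safe)
print(broken)
```

A real modelling tool would have to cover far more than reachability (policy, capacity, convergence), so treat this only as the shape of the idea: change the model first, the network second.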
Having a controller doesn't immediately mean creating one very large failure domain. And I don't think anyone actually involved in the SDN architecture discussions has been saying that. This feels like unnecessary FUD to me.
First, not all controllers are active in the data path. It is entirely possible that architectures can be resilient to controller failures. Second, most architectures are likely to include redundant controllers. This concept isn't really that new in networking. And third, the likely use of controllers is in a federated (read: distributed) fashion. You won't have one controller to rule an entire network, because of the failure and maintenance domains.
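A toy sketch of the first and third points, under some loud assumptions: the `Switch` and `Controller` classes, the domain names and the prefixes are all hypothetical, and a real controller protocol is far richer than this. It shows switches that keep forwarding from their local tables when their controller is down, and a controller failure that stays inside one domain.

```python
class Switch:
    """Data-plane element: forwards from its local table, controller or not."""

    def __init__(self, name):
        self.name = name
        self.flow_table = {}  # destination prefix -> output port

    def forward(self, dst):
        # None means "no rule installed for this destination".
        return self.flow_table.get(dst)


class Controller:
    """Control-plane element that only programs the switches in its own domain."""

    def __init__(self, name, switches):
        self.name = name
        self.switches = switches
        self.alive = True

    def push_rule(self, dst, port):
        """Install a rule on every switch in the domain; return how many were updated."""
        if not self.alive:
            return 0  # a dead controller cannot program anything
        for sw in self.switches:
            sw.flow_table[dst] = port
        return len(self.switches)


# Two hypothetical domains, each with its own controller (federated).
east = [Switch("east-1"), Switch("east-2")]
west = [Switch("west-1")]
ctl_east = Controller("ctl-east", east)
ctl_west = Controller("ctl-west", west)

for ctl in (ctl_east, ctl_west):
    ctl.push_rule("10.0.0.0/24", "port1")

ctl_east.alive = False  # simulate a controller failure in one domain

updated_east = ctl_east.push_rule("10.0.1.0/24", "port2")
updated_west = ctl_west.push_rule("10.0.1.0/24", "port2")

still_forwarding = east[0].forward("10.0.0.0/24")  # existing rule survives
new_rule_east = east[0].forward("10.0.1.0/24")     # new rule never arrived
new_rule_west = west[0].forward("10.0.1.0/24")     # other domain unaffected
```

The east domain loses the ability to change, not the ability to forward, and the west domain carries on as normal. Redundant controllers (the second point) would sit on top of this and are not modelled here.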
What controllers do that the current distributed protocols do not is provide a global view of the network as a resource. This makes it possible to look at traffic patterns, application requirements, resource availability, etc., and make intelligent decisions. The days of managing a large distributed system through pinpoint configuration control are coming to an end.
Mike Bushong (@mbushong)
Plexxi
But if all your distributed controllers have the same bug or are sent the same commands with unintended consequences (in part due to their desired ability to hide the underlying complexity), then what happens? That's the big kaboom I think some of us are concerned about.
Just like it's a whole lot easier right now to take down 500 servers at once in a VMware environment with one wrong distributed switch config change than it ever was with 500 physical servers, the same will hold true for SDN. Will the benefits of SDN outweigh the risk? Absolutely. The same way I'd never want to manage 500 physical servers ever again. I'm just not so convinced the road there will be as smooth as many (vendors) would like us to believe.
TaabuTheCat is indeed correct in this - one primary concern is that the blast radius (regardless of your specific architecture) is fundamentally larger.
Another point I neglected to make is that this brings us way back in terms of network stability as a whole. It'll be like the late 90's again in terms of OSPF/ISIS - running fresh builds, having outages because the implementations are just not mature. You can argue about architecture all you want, but the fact is that you just cannot afford outages. In some cases, it's better to solve the problem with existing protocols, rather than throw everything out and start again (However I would argue that _some_ protocols should be thrown out by default).
The fact is that SDN is only built for scale, and nobody running at that scale can afford _any_ downtime. I actually fully support most of the key arguments behind SDN, it's just that some of the principles seem to come directly from the VM world, and won't have a 1:1 translation to the networking world. I am a huge believer in automation, in partitioning services. However I'm also a believer in correctly architecting the network to your users' requirements, and automating deploy time.
Lastly - it's not so much FUD as paranoia and skepticism, due to watching what sales promise go up in smoke with a frequency that's left me bitter right through to my core.
Speaking as someone who has been a consultant rather than an employee for the past 15 years, I think I can give you a few reasons why that is.
1) the average admin isn't smart enough to do a good job automating (so if they try, it has so many problems that either they give up or their boss makes them give up). Sometimes I try to use their halfassed attempts at automation as a starting point, but I always end up scrapping it and going from scratch
2) the average admin is too lazy to want to do the extra up front work to make this happen, versus just fighting the fires as they come - if they're overloaded, they're not smart enough to realize that a few evenings/weekends spent automating a few things they spend way too much time on will give them the free time they wish they had
3) they're worried they may "automate themselves out of a job"
When I set up scripts to do stuff in minutes that people would otherwise do by hand in an hour or two (like configuring the SAN switches in a C3000 blade chassis, for instance) it is interesting to see the responses I get. Other consultants think it is really great and want to use it right away. Those regular admins that I had already identified as being the more clueful and more interested in learning how things work, rather than how to do what their job requires and nothing else, are also interested.
Others are actively hostile - they like doing these sorts of mindless tasks, because it takes a while, it is visible, and, assuming they've been given a cheat sheet, doesn't require any active thought. They might have to do actual work if all the "easy" stuff is automated away...
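The payoff argument in point 2 is simple arithmetic. A rough sketch follows; every number in it is an illustrative assumption rather than a measurement (12 hours of evenings to write the script, 1.5 hours per manual run, 5 minutes per scripted run), and `break_even_runs` is just a hypothetical helper name.

```python
import math


def break_even_runs(build_hours, manual_hours, scripted_hours):
    """Number of runs after which the time spent automating has paid for itself."""
    saved_per_run = manual_hours - scripted_hours
    if saved_per_run <= 0:
        raise ValueError("automation saves no time per run")
    return math.ceil(build_hours / saved_per_run)


# Illustrative assumptions only: a few evenings to build, 1.5 h by hand,
# 5 minutes scripted.
runs = break_even_runs(build_hours=12, manual_hours=1.5, scripted_hours=5 / 60)
print(runs)  # 9
```

With those assumed figures the script pays for itself on the ninth run; for a task done weekly that is about two months.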
Really too much to put in a comment box, so google "techopsguys sdn" for my analysis of SDN. I was able to ask the creator of SDN a couple key questions which I used to confirm my own beliefs of what SDN is, and I rip the SDN concept to shreds in a 2,000 word blog post complete with pictures and diagrams.
SDN is to networking as FCoE was to storage. It's all marketing.
Same goes for cloud but that is another rant.
Really the only people that can truly benefit from SDN are service providers that have massive scale (e.g. 50-100k systems and up), and have very very frequent changes. If you're operating at say a few thousand systems or less SDN is just stupid. It fails to address the core problems of networking complexity.
If SDN helps you at smaller scale then you've probably done something horribly wrong with your network design before deploying SDN. Or you picked the wrong gear. I cover that in the post as well.
Perhaps the Juniper stuff looks cool because otherwise their gear is just too complicated to manage (there have been solutions on the market to handle that aspect for 15 years).