I wonder if CloudFlail host the front-end for this one also?
43 posts • joined 11 Jul 2008
So, we should name the leaks instead of actually fixing the broken system which is currently in place? Yupp, it's the IETF at work again!
I'm looking at you, ROA, IRR sets and manual configuration changes. Until we are able to validate a route inside of the BGP process, we will continue to see route leaks.
Lastly: a niggly point, but Telia's outage was due to an IS-IS cost, not a BGP leak.
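To make the ROA complaint a bit more concrete, here is a minimal sketch of what origin validation against ROAs boils down to, loosely following the valid / invalid / not-found outcomes of RFC 6811. The function name, the tuple layout and the sample data (documentation prefixes and ASNs) are assumptions for illustration, not any router's actual implementation. Note that it only looks at the origin AS, never the path, so a leak that keeps the correct origin would still pass.

```python
import ipaddress


def validate_origin(prefix, origin_asn, roas):
    """Classify a route as 'valid', 'invalid' or 'not-found'.

    roas is a list of (roa_prefix, max_length, asn) tuples (hypothetical data).
    """
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # Skip ROAs of the other address family or that do not cover the route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A covering ROA matches only if the length and the origin both fit.
        if route.prefixlen <= max_length and asn == origin_asn:
            return "valid"
    return "invalid" if covered else "not-found"


# Sample data using documentation prefixes and ASNs (assumed values).
ROAS = [("192.0.2.0/24", 24, 64500)]

print(validate_origin("192.0.2.0/24", 64500, ROAS))     # origin and length match
print(validate_origin("192.0.2.0/24", 64501, ROAS))     # covered, wrong origin
print(validate_origin("192.0.2.0/25", 64500, ROAS))     # covered, too specific
print(validate_origin("198.51.100.0/24", 64500, ROAS))  # no covering ROA
```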
Well, storage is actually gonna be a bit of a pain on cheaper 10G switches... and NICs (side note: our 10G NICs are probably 3x to 4x the cost of the ports they connect to).
If you're using something older that's got poor buffers (or a poor buffering architecture!), media changes and store-and-forward, you're gonna have a bad time with drops in your network, specifically around SAN and storage (since they can pump out bits quite fast for the cost).
In general, when the network tends towards melt-down, the effects are more widespread than a single server going down, which is why I'm always careful in purchasing. An underperforming network is also SUPER costly to re-provision (as is the staff to make it happen).
These are the reasons that I'm quite hesitant to go down the route of junkboxing. Even whiteboxing does not make sense at small scale (since the engineering cost of setting up, maintaining and developing the boxes exceeds the cost of the network itself).
I read the article again, and I still see my points as valid.
Quick maths to show:
40x10G switch @ 1,500 GBP -> 37.5 GBP/port.
I'm currently looking at about 100 GBP per 10G port. Granted, that's roughly 2.7x the price, but it's a) new, b) under warranty, c) having features actively developed, d) not going to die of old age/degrading soldering/degrading components and e) has support costs included.
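For anyone who wants to redo the per-port sum, a quick sketch using only the figures quoted above (the function and variable names are mine, purely illustrative):

```python
def price_per_port(switch_price_gbp, ports):
    """Return the purchase price per port in GBP."""
    return switch_price_gbp / ports


cheap = price_per_port(1500, 40)  # the 40x10G switch at 1,500 GBP
mine = 100.0                      # roughly what I'm paying per 10G port
ratio = mine / cheap

print(f"cheap: {cheap:.1f} GBP/port, mine: {mine:.1f} GBP/port, ratio: {ratio:.2f}x")
```

This covers purchase price only. Power, support and engineer hours are not in the sum, since there are no figures for the older kit.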
Also, my power draw is about 5W/port. We cannot calculate what the actual power draw is on older kit, but I would wager it's quite a bit higher, which will end up costing more in the long run in a DC (where power IS at a premium).
Lastly, my docs are current, and the amount of engineer hours spent on each device is minimal (I've got many devices in my network I've never had to log in to - auto provision, monitoring in place, work flawlessly - the kind of features you don't find on older kit).
So I'd respectfully ask you to re-read my original comment and try to follow my logic of breaking down pricing into per-unit.
For sure I'll give you that server/computer related stuff - totally makes sense to go down that route. I'm just not convinced on the networking side.
Everything comes at a price, and the phrase "pay in peanuts, get monkeys" springs to mind.
Yes, you can get a cheap 10G switch - but what's its buffering like? Feature support? Supportability in production when things go down?
Honestly, it's not helpful to say "A switch for X currency" - we should be talking about price/port and watts/port. Both will give you a better measure of what you're actually looking at. Also, whether the port is routed or switched (a routed port will probably be 20x the cost of a switched port).
I guess it comes down to how often you expect the network to go down, or performance expectations (and whether you're running big data or not; if you are, good luck with cheap switches). These days, I'm finding that expectations for uptime on the network layer are 100%, with a corresponding 0% packet loss.
Beer, because that's what cheap switches drive me to.
To be fair, you don't leave your management interfaces exposed to the public at large when you're talking about networking devices.
*Even* if you're running 10-15 year old kit that does not support SSH, it's in its own private/management VLAN that's heavily firewalled, with no access to the outside world.
Re: "Knowing these photos were deleted a long time ago"
By "cloud" I assume you mean CDN, and if that's the case - there are two operations a CDN needs to do to be considered a CDN - load content onto the CDN, and invalidate assets at the edge.
Same goes for "cloud" - reclaiming freed up storage actually makes financial sense at scale.
Re: Xperias are great
Also a huge fan of the xperia lines. I've had an activ, ZR and Z1C in the last 2 years, extremely happy with them all (Apart from some annoying teething issues/bugs in 1st/2nd firmware release).
The quality of low light pics from the Z1C is astounding. Battery life, well, I get to make fun of the poor iPhone users who get maybe a day.
Form factor of the Z1C is basically perfect for me. It's just a shame that the glass back was replaced w/ some plastic stuff last minute, which tends to scratch all too easily.
Have had one for about a month now. The 4.2.2 update was pushed to me last week, so you might want to update the article.
I'm upgrading from the ZR, which I absolutely love. The compact is a really awesome phone. I get 3 days with heavy usage (WiFi,4G,GPS). Early OS revs were a bit fruity, but it seems that most issues are addressed now. The camera is really impressive, and the AR mode is hilariously awesome.
The only negative things I can say about the phone are that the lack of an option to disable vibrate on touch is *annoying* and the plastic back scratches all too easily.
I'm surprised at this for a number of reasons.
Any competent network engineer would just drop AS13414 at the border and be done with it (and AS35995/AS54888, since Twitter have multiple ASes). However, that's not the case... So I can only assume the technical advisor to the government either didn't want to have a functional ban in place, or was not competent enough to advise correctly.
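The logic of "drop the AS at the border" is about as simple as filtering gets. A rough Python sketch of the idea, not router config: reject any route whose AS path contains one of the listed ASNs. The route names and the other AS numbers (documentation ASNs) are made up for illustration; only the three blocked ASNs come from the text above.

```python
BLOCKED_ASNS = {13414, 35995, 54888}


def accept_route(as_path, blocked=BLOCKED_ASNS):
    """Return False if any ASN in the path is in the blocked set."""
    return not any(asn in blocked for asn in as_path)


# Hypothetical routes: name -> AS path, origin AS last.
routes = {
    "route-a": [64500, 64501, 13414],
    "route-b": [64500, 64502],
}

kept = {name for name, path in routes.items() if accept_route(path)}
print(sorted(kept))
```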
I'm also surprised that Twitter don't start the whack-a-mole game of spinning up EC2 instances as lightweight reverse proxies for their service, then using GeoDNS to target Turkish users.
Hang on, seems like you guys missed out the real fun in this story
Q: How and when did Cisco find out about this issue?
A: Cisco first became aware of this issue in December 2010.
Only in late 2012 did field failures and supplier review data point to a potential customer impact
So it took an engineering company 2 years to figure out it was bad memory, and the same time frame to admit to customers they were exploring an issue (They denied all knowledge until this PR stunt was ready to be pushed out the door).
I don't know about you, but that seems like a very long time, and a significant portion of the shelf life of most of these products.
Re: Right on.
TaabuTheCat is indeed correct in this - one primary concern is that the blast radius (regardless of your specific architecture) is fundamentally larger.
Another point I neglected to make is that this brings us way back in terms of network stability as a whole. It'll be like the late 90's again in terms of OSPF/ISIS - running fresh builds, having outages because the implementations are just not mature. You can argue about architecture all you want, but the fact is that you just cannot afford outages. In some cases, it's better to solve the problem with existing protocols, rather than throw everything out and start again (However I would argue that _some_ protocols should be thrown out by default).
The fact is that SDN is only built for scale, and nobody running at that scale can afford _any_ downtime. I actually fully support most of the key arguments behind SDN, it's just that some of the principles seem to come directly from the VM world, and won't have a 1:1 translation to the networking world. I am a huge believer in automation, and in partitioning services. However, I'm also a believer in correctly architecting the network to your users' requirements, and automating deploy time.
Lastly - it's not so much FUD as paranoia and skepticism, due to watching what sales promise go up in smoke with a frequency that's left me bitter right through to my core.
So we have finally started to convince the world that large layer 2 domains are a very bad idea (think: blast radius), and that lots of individual devices working together (Distributed) is much better than centralising everything...
Until along came SDN, re-centralising everything once again (controller wise). I can already see the mega-outages a bug at the controller level will cause, or the lack of individual node optimisation that comes with it.
It's not that I think SDN is a bad idea, I just think it's half baked right now. I also think that simplicity in design, and a good provisioning tool and excellent engineers will trump the cost of SDN, in terms of man hours, resources and impact.
Beer; because this is what SDN drives me to.
Can any of you tell malice from incompetence? Why assume malevolence when incompetence is the more likely answer?
This saga is non-trivial, and it's got many moving parts, namely:
1. Netflix's peering policy ( https://signup.netflix.com/openconnect/guidelines , min 2Gbit _each way_ at 95th percentile).
2. Netflix's Open Connect program - ISPs don't want to lose rack space + power to these boxes.
3. ISPs want to make the content providers pay for content traversing their network.
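Since point 1 hinges on a 95th percentile figure, here is a rough sketch of one common way that number is worked out: sort the samples, throw away the top 5%, and take the highest value left. The function name and the sample data are assumptions for illustration, and Netflix or any given ISP may calculate it differently.

```python
def percentile_95(samples_mbit):
    """Return the 95th percentile of a list of traffic samples."""
    if not samples_mbit:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbit)
    # Integer form of ceil(0.95 * n) - 1, avoiding float rounding.
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]


# One day of 5-minute samples (288 of them): steady 100 Mbit
# with ten short spikes to 1000 Mbit (made-up numbers).
day = [100] * 278 + [1000] * 10
print(percentile_95(day))
```

In this made-up day the ten spikes sit inside the discarded top 5%, so they do not move the result. The same property means the minimum in point 1 has to be met by sustained traffic, not bursts.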
What's more than likely happening is that Netflix traffic is taking the congested path (they are a victim of their own success) inside of Verizon's network. I don't think there is malevolence involved here; however, at the same time, there's nothing to be gained right now for Verizon. They're not losing customers, and _if_ they do lose customers over this Netflix saga, it's the customers who tend to cost them more in transit bills.
Been saying it for years - Cisco is going to lose the switching market, and probably the routing market after that.
I see them being relevant in corporate IT (Voice, user access) and servers (I'm guessing that their switching market share will collapse into the embedded switches they're building into UCS chassis).
I'm not 100% sure that home grown devices are mature enough right now for the regular DC. However from an OS perspective, Cumulus Networks should change this in the coming years, and hopefully a larger install base of Trident I / II based boxes will also level the playing field.
Interesting times ahead, and not even one mention of SDN (d'oh!)
What I'd really like to see is event correlation in an intelligent way...
By parsing flow data in almost-real time, looking for patterns in syslog and interface changes (ie: a flap, or an interface counter going +/- X% across samples), and snarfing up accounting data. Hell, even take an iBGP feed of updates from my eBGP peers, and a feed of OSPF LSAs, and correlate an event with a specific set of updates. There's so much room for correlation, but there's just nothing out there that I've found that works for me.
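The "+/- X% across samples" check is the easy part. A minimal sketch, where the function name, threshold and sample rates are all made up for illustration:

```python
def flag_changes(samples, threshold_pct):
    """Return (index, pct_change) for each sample that moved more than
    threshold_pct relative to the previous sample."""
    events = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev == 0:
            continue  # relative change is undefined from a zero baseline
        change = (cur - prev) / prev * 100
        if abs(change) > threshold_pct:
            events.append((i, round(change, 1)))
    return events


# Hypothetical per-interval rates in Mbit/s for one interface.
rates = [100, 104, 98, 160, 155, 60]
print(flag_changes(rates, 25))
```

The hard part is everything after this: lining those flagged indexes up against syslog lines, BGP updates and LSAs from the same window.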
I think overall, we have all the tools we need to do this, but the time needed to integrate them all, make them talk nicely, and set intelligent thresholds, relative thresholds and even a little historical prediction based on previous events is just not worth it. Lately, instead of spending time on this, I'm fighting to get nfsen/observium/smokeping/homegrown scripts to talk together and give me a coherent view of my traffic patterns.
I'm battling with the stupidity of SNMP traps, SNMP's format and the absurdity of 5 minute samples when I have 40Gbit interfaces.
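To put a number on the 5 minute sample problem, a quick sketch under stated assumptions: a 40Gbit link that sits idle except for a single line-rate burst. The burst length and all the names here are hypothetical.

```python
LINK_BPS = 40e9    # 40Gbit interface
INTERVAL_S = 300   # 5 minute polling interval


def average_utilisation(burst_seconds, burst_bps, idle_bps=0.0,
                        interval_s=INTERVAL_S, link_bps=LINK_BPS):
    """Return the link utilisation a single averaged sample would report."""
    total_bits = burst_bps * burst_seconds + idle_bps * (interval_s - burst_seconds)
    return total_bits / interval_s / link_bps


# A 3 second burst at full line rate, idle otherwise.
util = average_utilisation(burst_seconds=3, burst_bps=LINK_BPS)
print(f"{util:.1%}")
```

Three seconds of a completely saturated 40Gbit port averages out to about 1% on the graph, so any drops during the burst have no visible cause.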
Monitoring makes me stabby.
In this case, it's a problem with how WiFi is set up, rather than TCP. WiFi is a shared medium, so you're going to get collisions (CSMA/CA attempts to give fair access to the medium). TCP is affected as it sees a collision as a drop, so it scales back throughput wise. That's why they're dropping new sessions and giving priority to the existing data flows (it's kinda cheating, throughput wise).
TCP has quite a few throughput hacks in it (Window Scaling, SACK, binary backoff Vs exponential etc), and is quite predictable and mature. The real issue here is wireless ethernet being "non switched", thus having collisions and packet loss with many users.
RE: Unified communications. Pfaw
>> The reality is that no matter how many lines of communication you offer a given employee, you can't change the fundamental fact that it is their CHOICE to respond or not.
Agree, however you have presence awareness. At least now the boss knows the employee is unable to respond.
Nope, one machine can never live up to that level of service :)
But, say you have a farm of 10 machines behind a load balancer (in fault tolerant mode), with the correct type of probing configured on the load balancers (so that machines which still ping but, say, don't serve out HTTP are not used for a load balancing decision), 99.999% should be an attainable goal.
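A back-of-envelope sketch of why a farm gets you there when one box can't. It assumes machines fail independently, any single surviving machine can carry the load, and the load balancers themselves never fail. None of that is strictly true in real life, and the 99% per-machine figure is made up.

```python
def farm_availability(per_machine, machines):
    """Availability of a farm that is up while at least one machine is up,
    assuming independent failures."""
    return 1 - (1 - per_machine) ** machines


def downtime_minutes_per_year(availability):
    """Minutes of downtime per 365-day year at a given availability."""
    return (1 - availability) * 365 * 24 * 60


# Allowed downtime at five nines.
print(f"{downtime_minutes_per_year(0.99999):.2f} min/year")

# Hypothetical machines that are each only 99% available.
for n in (1, 2, 3, 10):
    print(n, farm_availability(0.99, n))
```

Under those assumptions even three mediocre machines clear five nines on paper, so in practice the limit is the shared bits: the load balancers, the network and the deploys.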
You hardly think that MS or Apple use just one machine for these types of services?
I'd say their setup is a little more complicated than that, I'd suspect that they have GSLB in place, to direct you to the nearest (geographically) serverfarm.
Anything less than 99.999% these days is just not acceptable from a corporate entity.
Paris, 'cause she's had more downtime than Ubuntu's site..