Excellent - please tell us...
What tool did you use to collate and map this inter-server network traffic, and then what tool did you use to simplify this map?
Legacy, or technical debt – call it what you will – has always been a major challenge to techies looking to move forward and never more so than now, as you're being asked to shift data centre software to the cloud. Possibly the biggest challenge in dealing with legacy is identifying who “owned” an application when it was …
> Maybe a quicker alternative would be to turn off the system you want to move to the cloud and see who (or what) complains!
I hope that was an attempt at humor. I spent the better part of a decade leading a production support team that had to manage thousands of legacy and new apps. Invariably someone would float that suggestion, with the presumption that any critical failures would be immediately visible. This is simply not true often enough to be reliable. In one case a SOX process was archiving data to assist in yearly compliance attestation. Turning it off could be quite a problem, but not for up to a year. Don't ever shut down a process to see who complains. Ever.
Cisco came out with Tetration, which can totally solve the "what the heck is my app talking to?!?" part of this issue. I saw it take a two-year data center mapping project (which really means it would never be current and accurate) and finish it off in two weeks. As long as the dependencies occur within the time window you're observing, you'll have it totally mapped. (You'd miss the connections for a monthly report run, for example, if the run didn't happen while you were observing.)
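To illustrate the observation-window caveat, here is a minimal Python sketch. The flow records, host names and the `observed_dependencies` helper are all made up for illustration; this is not Tetration output or its API.

```python
from datetime import datetime, timedelta

# Hypothetical flow records: (timestamp, source, destination, port).
# Invented for illustration; not output from any real tool.
FLOWS = [
    (datetime(2024, 1, 3, 9, 0), "app01", "db01", 1521),
    (datetime(2024, 1, 4, 9, 0), "app01", "db01", 1521),
    (datetime(2024, 1, 10, 2, 0), "app01", "ldap01", 636),
    (datetime(2024, 1, 31, 23, 0), "app01", "report01", 8443),  # monthly report run
]


def observed_dependencies(flows, start, length):
    """Return the set of (src, dst, port) seen in [start, start + length)."""
    end = start + length
    return {(src, dst, port) for ts, src, dst, port in flows if start <= ts < end}


two_weeks = observed_dependencies(FLOWS, datetime(2024, 1, 1), timedelta(days=14))
full_month = observed_dependencies(FLOWS, datetime(2024, 1, 1), timedelta(days=31))

# Dependencies that exist but never showed up in the two-week window.
missed = full_month - two_weeks
print(sorted(missed))
```

With this sample data the two-week window catches the database and LDAP connections but not the month-end report server, which is exactly the kind of dependency you would have to find some other way.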
It's a little (a LOT) spendy for small shops today, but there will be far more affordable, smaller-scale options in the very near future. The things it can do sound unbelievable, but it works. Works well without Cisco-based networks if you're the adventurous type who prefers buying tech from the "Other vendors" wedge on the market share pie chart. Of course it works even easier/faster/better with Cisco gear if you prefer to have widely available, supportable tech for which you can easily hire more engineers running your production network. (I'm a success bigot, not a Cisco bigot :) )
Whenever I work on one of these legacy projects, getting sniffers on or getting router output is like pushing an elephant up the stairs. The reluctance and hoarding of info (I'm looking at you, network, firewall and security teams) is like treacle.
The irony is that, once in the cloud, the right call to the right place would get me the appropriate credentials in AWS or whatever, so I could find it myself. Except as the author points out, it's all on 80 and 443 anyway.
> The reluctance and hoarding of info (I'm looking at you, network, firewall and security teams) is like treacle.
<hand wave/> These are not the teams you are looking for.
Firewall rules are often far too lax and firewall logs are often not what you hope for. Do you go back over a year to capture all the "annual report" traffic? Was that server used for the same applications a year ago as it is today? Has any functionality been migrated to a different host?
There is no substitute for documentation. Your processes need to include new pages in your application documentation and service catalogue updates or the (firewall etc) change should not be approved. You should probably have changes tied to the application service catalogue with sub changes for network, firewall etc. Make sure the template forces people to do the right thing. Decommissioning a host? Did the firewall team get a note saying if the IP address had moved or was now free?
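A template that "forces people to do the right thing" could be as simple as a check run before approval. A rough Python sketch follows; the field names (`catalogue_entry`, `docs_updated`, `ip_status`) and the `approval_blockers` function are hypothetical, not taken from any real change-management tool.

```python
def approval_blockers(change):
    """Return the reasons a change should not be approved (empty list = OK)."""
    blockers = []
    if not change.get("catalogue_entry"):
        blockers.append("no application service catalogue entry referenced")
    if not change.get("docs_updated"):
        blockers.append("application documentation not updated")
    # Decommissions must say what happened to the IP address.
    if change.get("kind") == "decommission" and change.get("ip_status") not in ("moved", "free"):
        blockers.append("firewall team not told whether the IP moved or is free")
    return blockers


# Hypothetical change records, for illustration only.
firewall_change = {"kind": "firewall", "catalogue_entry": "payroll", "docs_updated": True}
decommission = {"kind": "decommission", "catalogue_entry": "payroll", "docs_updated": False}

print(approval_blockers(firewall_change))
print(approval_blockers(decommission))
```

The first change passes with no blockers; the decommission is held back on two counts, the stale documentation and the missing note to the firewall team.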
For micro services, I'd put together a micro-service catalogue and put different services on different ports. There is no good reason to put everything on port 80/443 and it will leave you with a massive indecipherable headache to clean up. They may not be well-known ports but at least you can add a modicum of differentiation. For discovery, netstat is your friend. You can get the firewall people to give you an idea of the hosts & services but it is the application maintainers who need to be the authority.
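As a starting point for the netstat approach, here is a small Python sketch that summarises who a host talks to. It assumes Linux-style `netstat -tn` output (numeric, no listening sockets); the sample text, addresses and the `summarise` helper are invented for illustration, and you would have to supply the list of ports the host actually listens on.

```python
from collections import Counter

# Made-up sample in the shape of Linux `netstat -tn` output.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:45012          10.0.0.20:1521          ESTABLISHED
tcp        0      0 10.0.0.5:45013          10.0.0.20:1521          ESTABLISHED
tcp        0      0 10.0.0.5:443            10.0.0.99:51544         ESTABLISHED
tcp        0      0 10.0.0.5:45100          10.0.0.30:8081          TIME_WAIT
"""


def split_endpoint(endpoint):
    """Split 'host:port' on the last colon and return (host, port as int)."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


def summarise(netstat_text, listening_ports):
    """Count inbound clients and outbound dependencies from netstat text."""
    inbound, outbound = Counter(), Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and anything that is not a TCP row
        _, local_port = split_endpoint(fields[3])
        remote_host, remote_port = split_endpoint(fields[4])
        if local_port in listening_ports:
            inbound[(remote_host, local_port)] += 1    # someone calling us
        else:
            outbound[(remote_host, remote_port)] += 1  # something we call
    return inbound, outbound


inbound, outbound = summarise(SAMPLE, listening_ports={443})
print("clients:", dict(inbound))
print("dependencies:", dict(outbound))
```

This also shows why distinct ports help: the outbound side is readable because 1521 and 8081 mean something, whereas a list of nothing but 443 would tell you very little. A single snapshot only shows connections open at that moment, so it would need to be sampled repeatedly.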
...is your friend.
Microservices et al make this much, much easier. They generally do not (should not?) use port 80 (low ports, boo hiss, evil root etc), but that's ok.
Any scaled up microservices setup will also include load-balancing and service liveness/performance monitors, so all that mapping is built in.
InterNetworks can go back to being dumb. Sorry Cisco et al.
Check it out: https://github.com/Netflix/vizceral
Squared Up is the tool for you. It has a feature called Visual Application Discovery & Analysis (VADA). You just pick a starting point, e.g. your application server, and it uses netstat commands through SCOM agent tasks to visually map out your application, even if that application is hosted across multiple servers/platforms/load balancers etc. Worth a look.
Nearly 10 years back I was using Tideway Systems' ADM product (now part of BMC Atrium) to discover what was lurking in a client's extensive IT estate. It helped determine that a brand new system had an unforeseen dependency upon ageing NT servers that had been forgotten about, because the application installed on them simply worked and the only person left had nothing but some documentation taking up space in their bottom drawer. Okay, that was in a private datacentre with a mix of physical and virtual servers and not a multi-tenant public cloud, which presents an added layer of complexity.
Using the tool's full potential required some human input to identify and label the systems shown on the network maps, but once that was done and fed into the systems monitoring and management system, it enabled individual system failures to be flagged as business application and business service impacts and outages.
I guess capturing sufficient traffic might give enough context for someone to figure out enough dependencies to move systems on successfully. But that may not be entirely true. Legacy libraries/software stacks could be tough to track, because system assumptions are deeply embedded and won't typically be openly coded in comms traffic. Without full legacy documentation and/or knowledgeable people, it may be very expensive (or well nigh impossible) to shift existing code to a newer operational context without significant re-coding. Depending on scale, even using open source won't solve that.
Such is the rate of change that aging systems (i.e. more than three years old) will perhaps often have to be given the heave-ho and sent to the great bit bucket in the sky. Nothing lasts forever!
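The idea of turning a single system failure into a business service impact can be sketched as a walk over the labelled dependency map. This is only an illustration in Python: the map, the names and the `impacted_services` function are hypothetical and not how the Tideway/BMC product works internally.

```python
from collections import deque

# Hypothetical labelled map: each key depends on the items in its list.
DEPENDS_ON = {
    "Payroll (business service)": ["payroll-app"],
    "payroll-app": ["oracle-db01", "nt-server-07"],
    "Invoicing (business service)": ["invoice-app"],
    "invoice-app": ["oracle-db01"],
}
BUSINESS_SERVICES = {"Payroll (business service)", "Invoicing (business service)"}


def impacted_services(failed, depends_on, business_services):
    """Return the business services that directly or indirectly need `failed`."""
    # Invert the map so we can walk from a component up to whatever uses it.
    dependants = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependants.setdefault(dep, set()).add(item)

    # Breadth-first walk upwards from the failed component.
    seen, queue = {failed}, deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependants.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen & business_services)


print(impacted_services("nt-server-07", DEPENDS_ON, BUSINESS_SERVICES))
print(impacted_services("oracle-db01", DEPENDS_ON, BUSINESS_SERVICES))
```

In this made-up estate the forgotten NT server takes out Payroll only, while the shared database takes out both services. The human labelling step is what makes the output meaningful to the business.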
The worst type of mess to deal with is where the development team was fired to reduce costs. When this happens the documentation (if any) usually is lost at the same time, the status of the applications (development, testing or production) is unknown, institutional knowledge is lost (must run application A before application B etc) and the result makes Internet Explorer look like a perfect program.
If you are given the job of trying to sort out such a mess - do yourself a favor and look for another job (unless you enjoy pain or you can get a VERY high hourly rate).
In my experience, a sizable majority have no useful documentation. Sure, your CMDB (one of them, since unfortunately in too many cases there is more than one!) may have a brief description like "Oracle DB server for application XXX" but that hardly helps. If you're lucky you'll have an entry for the database itself, which will hopefully extend all the way up to the application, and tell you what that application depends on to function and what else out there depends on this application to function.
Generally where you find something like this, it was done when the application was originally set up - the architect or implementation team had a clear view of how all the pieces fit together, so it was known. The problem is over time, new functionality gets added here and there, or pieces are swapped out for something else, and nothing gets updated, or at least not completely updated. So the information is not only incomplete, it is dangerously incorrect if you rely on it.
It is 10x worse in a bespoke environment, because eventually the people who really know it move on, at which point all hope of ever fully understanding it is lost even if they maintained perfect documentation until then. Updating documentation is the sort of job that never gets properly handed off, and management never knows/cares to ensure it gets done when the guy doing it has moved on. It is more important to get tasks done that will impress upper management than to do the basic housekeeping that they don't know about.
One point I raise is that a bus might be in my future, either off or under, in which case my superior is going to find themselves in deep kitchen when they can't support the work. That was not a problem in the military; transfer or death was always a consideration.
It's simply too tempting to not ask:
Suppose EVERYTHING you've built and those before you built suddenly went dark and you had to re-start from scratch - what would the effect be and what would you do about it?
Simulate it. Your rebuild roadmap will emerge. Stretch that brain, one-two-three. One-two-three.