Another one reinvents nagios.
Alibaba Cloud reveals network telemetry tool that helped cut number of engineers needed by 86%
Alibaba Cloud has detailed the telemetry tool it uses to look out for glitches in customers' virtual networks, and revealed it’s reduced the number of personnel dedicated to troubleshooting by 86 percent since developing the system. Detailed in a paper poetically entitled "Proactive Telemetry in Large-Scale Multi-Tenant Cloud …
COMMENTS
-
-
Tuesday 16th April 2024 19:10 GMT Anonymous Coward
The paper outlines some key differences
The architecture is interesting, but it is mostly agent based and needs integration into the network stack at a bunch of points to work. So even if it were published or leaked, it would take a bunch of effort to integrate with a different cloud/virtual network stack. That said the approach has it's merits as it's light and scalable.
One of the interesting points is using Arp to probe hosts that aren't running the agent. As the paper says it's light, unlikely to break anything, and will work on most IP enabled hosts. You don't see arpings or syn probing much these days.
Nagios runs an agent on a dedicated port, and all that chatter adds up at scale.
-
Wednesday 17th April 2024 06:52 GMT Anonymous Coward
Voodoo
“The tool comprises the following elements:
A data plane that uses host agent and arp-ping to protect tenants' privacy and defines an elegant generalization of ping and traceroute, which can work on heterogeneous middleboxes;
A control plane that conducts update batch processing and substantial probing path pruning to lessen the overhead;
An analysis plane that reduces noise and aggregates alerts based on temporal and spatial correlation and conducts the hop-by-hop telemetry mode to locate failures.”
… with added Voodoo and MBA-speak.
-