At my place it's not necessarily the monitoring that's the problem... but the alerting.
We've currently got an ageing Zenoss setup that we're going to be migrating to Prometheus. The other day there was a memory issue on a JVM: the utilisation threshold was passed and Zenoss sent out a few emails. But no-one in the team those alerts go to took any notice (for whatever reason). As a result, the JVM wasn't very happy.
The leader of that team then said the problem was that the alert was an email, and they get too many of those, etc. So it was really important that we switch that sort of thing over to Prometheus so they could be alerted via a chatbot. Now, he'd obviously forgotten, but not so long ago they were receiving disk space alerts for an application via Prometheus/chatbot, and you can probably guess what happened. Yes, that's right: there was too much noise from some other servers, the really important disk space alert was missed, and the application ground to a halt.
Just moving to a newer/sexier monitoring/alerting platform won't always solve your problems. On the other hand, it will solve some of mine: I look after the current Zenoss setup, whereas the newer Prometheus/ELK stuff is run by another team. So the day I'm (mainly) no longer on the hook for any application monitoring will be a happy day indeed! I'll still have to worry about system monitoring, but that's far easier and less grief-laden.