Re: In bioscience, bigger is sometimes better ...
@philstubbington .... I agree, profiling has become something of a lost art, or at least a less prevalent one. We created tools to assist with profiling, but the barriers have always been the sheer volume of new and updated applications, the rapid churn of favoured apps, and the availability of time, money and researcher expertise to actually do the work.
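Just to illustrate the sort of lightweight thing I mean (this is a sketch, not our actual tooling, and the log path and wrapper name are made up), even a simple wrapper that records wall time and peak memory per run gets you a long way:

```python
#!/usr/bin/env python3
"""Illustrative per-tool profiling wrapper: run a command, record wall
time, peak RSS of the child process and exit code as one JSON line."""

import json
import resource
import subprocess
import sys
import time


def profile_run(cmd, log_path="profile_log.jsonl"):
    start = time.time()
    result = subprocess.run(cmd)
    elapsed = time.time() - start

    # ru_maxrss is reported in kilobytes on Linux.
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    record = {
        "cmd": cmd,
        "wall_seconds": round(elapsed, 2),
        "peak_rss_kb": usage.ru_maxrss,
        "exit_code": result.returncode,
    }
    with open(log_path, "a") as fh:
        fh.write(json.dumps(record) + "\n")
    return result.returncode


if __name__ == "__main__":
    # Usage: profile_run.py <tool> <args...>
    sys.exit(profile_run(sys.argv[1:]))
```

The hard part was never writing something like this, it was getting it applied consistently across hundreds of fast-moving applications.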
Any software tool deployed in an HPC/HTC environment will perform differently at different institutions, on different architectures and, more importantly, in different data centre ecosystems where scheduling, storage and networking can vary considerably. Software tools are rarely used in isolation; there is normally a workflow comprising several disparate tools plus the data movements between them. Ideally, then, tools and entire workflows need to be re-profiled for each environment, and this is not cost effective if a workflow is only going to be used for one or two projects.
We had over 300 software items in our central catalogue, including some local code though most were created elsewhere, plus an unknown number sitting in user homedirs, and there was no realistic way to keep on top of all of them. There are one or two companies out there who have specialised in profiling bioinformatics codes, and this works well for a lab that is creating a standardised workflow, e.g. for bulk production sequencing run over a significant period (many months or years). We had a different problem: a lot of the research was either cutting-edge or blue-skies discovery, so nearly all workflows were new and experimental, and then soon discarded as the science marched forwards.
Bioinformatics codes are generally IO-heavy, so one of the quickest wins is to look at where the input data sits relative to the compute, where the output lands, and how to minimise the number of steps required before it can be ingested by the next part of the workflow.
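A rough sketch of the idea (paths, the TMPDIR scratch variable and the tool invocation are all assumptions for illustration, not a description of any particular site): stage the input onto node-local scratch in one hop, let the tool read and write locally, then copy the result straight to wherever the next step expects it.

```python
#!/usr/bin/env python3
"""Illustrative staging pattern: one copy in from shared storage,
local IO for the tool itself, one copy out to the next step's input."""

import os
import shutil
import subprocess
from pathlib import Path


def run_staged(tool_cmd, input_file, output_name, next_step_dir):
    scratch = Path(os.environ.get("TMPDIR", "/tmp"))
    staged_input = scratch / Path(input_file).name
    staged_output = scratch / output_name

    # One read of the shared filesystem, then everything is node-local.
    shutil.copy2(input_file, staged_input)

    # Hypothetical tool convention: <cmd> <input> <output>.
    subprocess.run(tool_cmd + [str(staged_input), str(staged_output)], check=True)

    # One write out, placed where the next workflow step will look for it.
    final_path = Path(next_step_dir) / output_name
    shutil.copy2(staged_output, final_path)
    return final_path
```

It is not sophisticated, but cutting out intermediate hops between shared filesystems was often worth more than tuning the tool itself.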