* Posts by porlF

6 publicly visible posts • joined 6 Dec 2018

'Bigger is better' is back for hardware – without any obvious benefits

porlF

Re: In bioscience, bigger is sometimes better ...

@philstubbington .... Yes, I agree: profiling has become something of a lost art, or at least a less prevalent one. We created tools to assist with profiling, but the barriers have always been the sheer volume of new and updated apps, the rapid churn of favoured apps, and the limited time, money and researcher expertise available to actually do it.

Any software tool implemented in an HPC/HTC environment will perform differently at different institutions, on different architectures and, more importantly, in different data centre ecosystems where scheduling, storage and networking can vary considerably. Software tools are rarely used in isolation: there is normally a workflow comprising several disparate tools and data movements. Ideally, then, both individual tools and entire workflows need to be re-profiled at each site, and that is not cost-effective if a workflow is only going to be used for one or two projects.
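For what it's worth, even crude per-step measurement beats none. A minimal sketch of what site-level re-profiling of a workflow step might look like on a Linux node follows; the tool names (fastqc, samtools) and file paths are illustrative stand-ins, not a claim about any particular pipeline:

```python
# Minimal per-step workflow profiling sketch (Linux/Unix only).
import resource
import subprocess
import time

def profile_step(name, cmd):
    """Run one workflow step and report wall time, child CPU time and peak RSS."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    t0 = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - t0
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    # ru_maxrss is reported in KiB on Linux; it is the maximum over all child
    # processes so far, so treat it as approximate when several steps share a run.
    peak_mib = after.ru_maxrss / 1024
    print(f"{name}: wall={wall:.1f}s cpu={cpu:.1f}s peak_rss~{peak_mib:.0f}MiB")

# Illustrative two-step fragment; substitute the real commands for each site.
profile_step("qc",   ["fastqc", "reads.fq"])
profile_step("sort", ["samtools", "sort", "-o", "sorted.bam", "aln.bam"])
```

Even numbers this rough are enough to spot which step dominates wall time or blows the memory budget at a given site.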

We had over 300 software items in our central catalogue, including local code but mostly packages created elsewhere, plus an unknown number sitting in user home directories, and there was no realistic way to keep on top of all of them. There are one or two companies out there who have specialised in profiling bioinformatics codes, and this works well for a lab that is creating a standardised workflow, e.g. for bulk production sequencing to be run over many months or years. We had a different problem: a lot of the research was either cutting-edge or blue-skies discovery, so nearly all workflows were new and experimental, and were soon discarded as the science marched forwards.

Bioinformatics codes are generally heavy on IO, so one of the quickest wins is to look at where the input data sits relative to the compute, where the output lands, and how to minimise the number of copy/move steps before it can be ingested by the next part of the workflow.
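As a concrete illustration of that proximity point, here is a minimal staging sketch: copy the input onto node-local scratch in one bulk transfer, run against local disk, then drop the result in one step where the next part of the workflow expects it. All paths and the aligner name are hypothetical:

```python
# Minimal staging sketch: bulk copy in, compute on local disk, bulk copy out.
import os
import shutil
import subprocess

shared_in  = "/shared/project/sample1/reads.fq"   # parallel/NFS filesystem (assumed path)
shared_out = "/shared/project/sample1/results"
scratch = os.environ.get("TMPDIR", "/tmp")        # node-local scratch on many schedulers

# One bulk copy in: streaming a large file once beats many small remote reads.
local_in = os.path.join(scratch, os.path.basename(shared_in))
shutil.copy(shared_in, local_in)

# Run the tool with both input and working output on local disk.
# "some_aligner" is a placeholder, not a real binary.
local_out = os.path.join(scratch, "out.bam")
subprocess.run(["some_aligner", "--in", local_in, "--out", local_out], check=True)

# One bulk copy out, placed where the next workflow step will ingest it.
os.makedirs(shared_out, exist_ok=True)
shutil.copy(local_out, shared_out)
```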

porlF

In bioscience, bigger is sometimes better ...

THE GOOD

I supported a biotech research computing environment for >3 decades, and in my experience there were occasions when the availability of bigger systems enabled new research. For example, the emergence of TiB-scale RAM systems (SGI UV) enabled the de novo sequencing of very large (plant) genomes which, up to then, had been impractical. The available software was written with much smaller genomes and smaller hardware platforms in mind, and it was inefficient and wobbly, but it worked. Phew!

Also, some researchers might not attempt ambitious computational tasks if they (mistakenly) believe they're not possible, when those of us in support groups can say "of course we can try, we just need to make the box bigger".

THE NOT YET PERFECT

Inefficient code delivered with cutting-edge lab equipment required unnecessarily large resources to run. Some years ago I spec'd and installed a multi-cabinet HPC/HTC cluster solution for the initial processing of data from a cutting-edge high-throughput sequencing instrument .... only for the equipment vendor to later spend a bit of time optimising the code, after which it would run on a single quad-core desktop! That is the nature of cutting-edge laboratory equipment coupled with cutting-edge software written to handle it. The cluster capacity was then swallowed up by the unexpected volume of work to analyse the avalanche of (complex) processed data.

THE BAD

A lot of open-source research-grade software is written by people who are trying to prove a research objective, and who will choose whatever language, frameworks and platform will just get the job done. Once that is achieved, there is precious little time or resource available to make the workflow or the code run efficiently, or to rewrite it at all. This means a lot of HPC/HTC cluster time is lost to inefficient software and workflows ... and researchers use each other's code a lot, so useful (if inefficient or wobbly) software rapidly spreads through a global research community.

CONCLUSION

If we asked science to progress more slowly and to spend more time and money on software engineering, we could make much better use of existing hardware, but I end my comment with a defence of science: in many cases it would be wrong to do this. Sometimes the pace of change in science is so fast that there is no time or funding available to optimise the many hundreds of apps used in research, much less to keep pace with advances in code libraries, hardware, storage, networking etc. I feel the benefits of rapid advances in biological science often (not always) far outweigh the apparent background wastage of CPU cycles and IOs. Bigger is better if we want science to move ahead ... especially when we need it to move fast (remember covid?).

Avira also mines imaginary internet money on customers' PCs

porlF

Parasitic

Allowing an (AV) corporation to parasitically do other work on your PC/laptop ... why would you?

1. Desktops/laptops are not optimised for coin mining, and will only do it very inefficiently. The AV companies are saving themselves the cost of running their own (efficient) mining operation in a bit-barn, and passing N x the cost on to AV end users (rough sums after this list). Gee, thanks!

2. With energy costs on the rise, and predicted to increase by another 50% this year, do we really want to be funding this?

3. Erm ... aren't we supposed to be concerned about conserving energy ..... you know, climate change and all that?

4. Can we trust their job engine to _only_ run the code they claim it runs? How would we know otherwise? (Ha! Install another AV product alongside!)
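To put some (purely illustrative, assumed) numbers on point 1, a quick back-of-envelope comparison of energy per unit of mining work on a desktop versus purpose-built kit:

```python
# Back-of-envelope sums with illustrative figures, not measured numbers;
# actual rates depend heavily on the coin's hashing algorithm and the hardware.
desktop_hashrate = 5e6        # hashes/s from a typical desktop while mining (assumed)
desktop_power    = 150.0      # watts drawn by that desktop (assumed)
asic_hashrate    = 100e12     # hashes/s for a purpose-built mining ASIC (assumed)
asic_power       = 3250.0     # watts drawn by the ASIC (assumed)

desktop_j_per_hash = desktop_power / desktop_hashrate
asic_j_per_hash    = asic_power / asic_hashrate
print(f"desktop: {desktop_j_per_hash:.2e} J/hash")
print(f"ASIC:    {asic_j_per_hash:.2e} J/hash")
print(f"desktop burns ~{desktop_j_per_hash / asic_j_per_hash:,.0f}x more energy per hash")
```

Whatever the exact figures, the gap is orders of magnitude, and it's the end user paying for the inefficient end of it.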

Best to swerve past this stuff, IMHO.

Big challenge with hardware subscriptions? Getting what we need, not what someone else wants us to have

porlF

The organisation I previously worked for was blessed with its own land and data centre facilities, and predominantly funded with large capital equipment grants, but with fixed expenditure deadlines and very little in the way of opex. Financial accounting rules therefore made the capital ownership of IT assets the only realistic option for large scale compute and storage.

From this fortunate but less flexible basis, we built and operated a service that was, on a per-consumption-unit basis, very cost-efficient compared with cloud and lease options.

Right to contest automated AI decision under review as part of UK government data protection consultation

porlF

G.I.G.O.

Aside from the political arguments for and against this approach, there is a long-standing principle in computing: Garbage In results in Garbage Out.

Decision models based on any kind of acquired data, accurate or otherwise, are vulnerable to flawed interpretations and consequent flawed decisions.

Is the assumption that models will be 100% accurate, both in source data and in algorithm, and that therefore no dispute mechanism is necessary? That does appear to be the implication!

Common sense in the IT and mathematical worlds surely tells us that "AI" will not be 100% accurate, and that we need a separate, unbiased, non-silicon method of arbitration?

Thoughts?

Huawei CFO poutine cuffs by Canadian cops after allegedly busting sanctions on Iran

porlF

Oversight

The UK govt (GCHQ and the NCSC) at least maintains a degree of oversight of Huawei. Whilst this is not a cast-iron guarantee against problems, it is encouraging to know that cooperation at this level has been taking place for a number of years.

https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/727415/20180717_HCSEC_Oversight_Board_Report_2018_-_FINAL.pdf