Academics tell Brit MPs to check the software used when considering reproducibility in science and tech research

Brit MPs are being encouraged to pay attention to the role software plays as they prepare a report on reproducibility in the science and technology industry, which adds around £36bn to the economy. According to the Software Sustainability Institute, a joint academic group, about 69 per cent of research is produced with …

  1. Anonymous Coward
    Anonymous Coward

    Career development disguised as science

The rewards are given for publication counts and quality tokens (publication in top-rated journals). Long-term work on thorough methods development is not well rewarded. And it should not be forgotten that the art of good bias design - difficult to detect and easy to defend - is highly profitable when judiciously applied.

    1. Derek Jones

      Re: Career development disguised as science

      The other issue is that many of those writing the software are new to programming.

This post generated a lot of discussion on the research software mailing list:

      https://shape-of-code.com/2021/02/21/research-software-code-is-likely-to-remain-a-tangled-mess/

      1. Doctor Syntax Silver badge

        Re: Career development disguised as science

        Software can be one of the tools of the scientist's trade. If it is then there's no reason why the skills to write it shouldn't be learned to a suitably proficient standard. If you use microscopy you need to know how to set up the microscope. If you need some piece of S/W that doesn't exist you need to know how to write something that isn't a tangled mess.

Having said that, it doesn't follow that everyone has the appropriate talent; over 50 years ago it was decided that everyone in our lab would go on a FORTRAN course. (Oddly enough SWMBO didn't, nor did our head of lab who sent the rest of us.) One of the research students who went on the course alongside me got back to the lab and proceeded to demonstrate that you can write a BASIC program in any language.

        It also doesn't follow that a complex piece of S/W will be within the capabilities of a beginner. That's where experience comes into play but that applies whether it's someone writing S/W for a living or as a tool of some other trade. It may also be necessary to consult someone who is a professional developer just as it may be necessary to consult someone who's a professional statistician to get the analysis right.

    2. Ken Hagan Gold badge

      Re: Career development disguised as science

      Rewards lose their value if the next person to come along gets a different answer. It does your career no favours to be remembered as the one who led the way down some notorious blind alley. On the other hand, being the person who published the corrective paper can be very rewarding.

      The sociologists might want science to be just another branch of politics, but Nature always has the final word, so it isn't. Also, do you really think that the rest of the community doesn't see through someone who bacon slices a load of questionable studies? Of course they do. That kind of questioning is literally their day job.

      1. Anonymous Coward
        Anonymous Coward

        Re: Career development disguised as science

Reminds me of the 'code' Neil Ferguson based his (now discredited) lockdown hypothesis on, which was torn to shreds by actual software engineers:

        https://github.com/mrc-ide/covid-sim/issues/165

      2. W.S.Gosset

        Re: Career development disguised as science

        Ken:

        > do you really think that the rest of the community doesn't see through someone who bacon slices a load of questionable studies? Of course they do. That kind of questioning is literally their day job.

        But saying what they know can lose them that day job.

        And their entire career.

See my comment (far) below for one field that exemplifies this syndrome. E.g., a major scientific journal's entire editorial group for the topic, plus their boss, six people in all, was sacked for allowing publication of a single paper that questioned just such "bacon slicing". Another major journal's editor-in-chief was sacked for allowing a different topic's (same field) dissenter to be published.

Subsequent papers, which completely demolished the bacon slicing rather than merely questioning it, have never been successfully published in a journal. (The authors eventually resorted to creating a blog.) The journals had learnt their lesson.

        I.e., I love the theory, I grew up believing it. But it is extraordinarily hijackable via externalities.

        1. TheBadja

          Re: Career development disguised as science

Someone said that scientific paradigms don't die because new studies dispute the findings; they die because the scientists supporting the paradigm die.

    3. The Man Who Fell To Earth Silver badge
      Boffin

      Re: Career development disguised as science

When I was a Physics grad student in the 1980s, I remember Gene Wells, then the editor of Physical Review Letters, which was & still is the "top" journal for cutting-edge Physics, telling me that the more cutting edge the journal, the more stuff it will end up publishing that does not stand the test of time. It's the very nature of being on the bleeding edge.

That also translates into article citations. If an article has a lot of citations right after publication, but then basically none afterwards, it's probably because it's considered flawed. But that spike in citations still contributes to the author's h-index, which goes a long way when it comes to tenure.
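For readers counting along: the h-index is the largest h such that h of your papers have at least h citations each, so an early citation spike is banked permanently even if the paper is later considered flawed. A minimal Python sketch:

def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    cited = sorted(citations, reverse=True)
    h = 0
    while h < len(cited) and cited[h] >= h + 1:
        h += 1
    return h

# A paper whose citations spiked early still lifts the index for good.
print(h_index([120, 40, 9, 6, 3, 1]))  # prints 4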

      1. EBG

        this is true

and it shows the need for a professional interface layer between academic research science and policy. Sadly this function was vapourised in the UK in the 1990s. There is a bigger need than ever for a beefed-up version of the traditional scientific civil service.

        1. M. T. Ness

          Re: this is true

The importance of mediators is underrated. Fred Brooks (in The Mythical Man-Month) highlights the essential role of leaders with broad knowledge and experience. Only with their help will there be fruitful communication between systems and people that do not communicate well unmediated.

          Mediators will be very powerful, and my experience is that the tops of the hierarchy only rarely tolerate them.

          A related question: How is it possible to recruit the right people, and how can their competences be maintained? I have no answer.

  2. Pascal Monett Silver badge
    Coat

    "an acute issue that can be swiftly resolved"

    So a crisis is something that can be quickly resolved.

    Then climate change is endemic to our civilization.

    Yup, sounds about right.

    1. Mike 137 Silver badge

      Re: "an acute issue that can be swiftly resolved"

      Strictly, a crisis is what we now call a 'tipping point' due to the misapplication of 'crisis' to mean anything serious that's occurring.

      'Crisis' is derived from the same root as 'critic' and 'critical', all relating to the ancient Greek for 'to decide'.

The good old OED [vol 2, 1933 ed.] defines it as "A vitally important decisive stage in the development of anything; a turning point". Hence, when a nuclear reactor goes critical, it's the point where the reaction becomes self-sustaining; but a crisis can by definition not persist, so an 'ongoing crisis' is a nonsense.

    2. TheMeerkat

      Re: "an acute issue that can be swiftly resolved"

Climate change predictions are based on computer modelling, i.e. software.

      The software written in science is influencing politics, but the quality of the software is questionable…

      1. W.S.Gosset

        Re: "an acute issue that can be swiftly resolved"

        ^questionable^risible

  3. DJV Silver badge

    What's the betting that...

    ...in most cases the software used will be Excel (probably an old unsupported version) that's running lots of macros that were programmed by someone who no longer works for the institution, and no one there has any clue how it works, so they will leave it to do its "magic" until the day it dies*.

    * which will be 2 weeks before it needs to be used for something super-critical whose deadline can't be moved.

    1. Arthur the cat Silver badge

      Re: What's the betting that...

      No idea why you're getting downvotes. I do know for certain that the government usually insists that computational deliverables from their research contracts are done as Excel spreadsheets rather than languages like Python, simply because Excel is regarded as universally available.

      1. Anonymous Coward
        Anonymous Coward

        Re: What's the betting that...

        >I do know for certain that the government usually insists that computational deliverables from their research contracts are done as Excel spreadsheets

        Your certainty is very much misplaced.

        1. Arthur the cat Silver badge

          Re: What's the betting that...

          My certainty is based on the last dozen or so government contracts (from BEIS and DEFRA) that my wife has worked on. Only once was she allowed to deliver something other than Excel (code in R), but they wanted Excel spreadsheets for all the ancillary stuff around it.

          1. Anonymous Coward
            Anonymous Coward

            Re: What's the betting that...

            Your second hand experience of your missus's contracts with a couple of pretty minor departments is not really a strong basis for telling the rest of us how "government" usually works now is it?

            Central government - like every other organisation in the land - covers the full gamut of analytical and research technologies. From underfunded csvs-and-excel teams scraping out a paper a year through dyed-in-the-wool corporate SAS/SPSS/etc types to all-singing-all-dancing kubernetes-and-notebooks on-demand labs environments run by the hipsters at ONS: it's all there somewhere.

            1. Arthur the cat Silver badge

              Re: What's the betting that...

              Sure, there's a wide spectrum used, I would never deny it(*). But whereas Excel may not be the mean in terms of complexity, I'd bet good money it's the median and the mode. As my friend in the Cabinet Office(**) says "ministers do like their spreadsheets".

              (*) And I greatly admire the ONS's work, but would note a lot of their output is made available as Excel spreadsheets however they produced it.

              (**) His other comments on ministers are usually very interesting, but best not published for a variety of reasons.

    2. navidier

      Re: What's the betting that...

> ...in most cases the software used will be Excel (probably an old unsupported version) that's running lots of macros that were programmed by someone who no longer works for the institution, and no one there has any clue how it works, so they will leave it to do its "magic" until the day it dies*.

> * which will be 2 weeks before it needs to be used for something super-critical whose deadline can't be moved.

      Hmm, looking at the latest version of our software:

$ ls -R $BASE_DIR > foo
$ find $BASE_DIR -type f | wc
87781 87781 11047154
$ grep xls foo
(no matches)
$ grep '[.]cc$' foo | wc
15382 15382 356232
$ grep '[.]h$' foo | wc
13890 13890 284324
$ grep '[.]cpp$' foo | wc
785 785 16326
$ grep '[.]hpp$' foo | wc
5 5 104
$ grep '[.]py$' foo | wc
17493 17493 471904
$ grep '[.]sh$' foo | wc
831 831 15764
$ grep '[.]xml$' foo | wc
5599 5599 95946

      I think I can confidently say there's not much Excel in our analyses. ROOT, of course, is another matter.
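A minimal Python equivalent of the census above (assuming the same BASE_DIR environment variable), for those who'd rather skip the intermediate "foo" file:

import os
from collections import Counter
from pathlib import Path

# Walk the tree once and tally regular files by extension.
base = Path(os.environ["BASE_DIR"])
counts = Counter(p.suffix for p in base.rglob("*") if p.is_file())

for ext, n in counts.most_common():
    print(f"{ext or '(none)':8} {n}")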

      1. Arthur the cat Silver badge

        Re: What's the betting that...

        I think you need to know about the -l flag to wc

        1. navidier

          Re: What's the betting that...

          > I think you need to know about the -l flag to wc

          It's three extra characters to type...

          1. Cederic Silver badge

            Re: What's the betting that...

            And thus we see the difference between the scientist and the software engineer.

            One of those communities is inherently lazy, and will add three extra characters because they know it's going to make life much much easier for everybody involved.

    3. TRT Silver badge

      Re: What's the betting that...

      Unfortunately Excel does tend to be heavily used. It shouldn't be. It's totally unsuited to most tasks that end up in publications. I've spent many hours correcting errors introduced by the kinds of slips and trips that are far too easy to make in Excel.

      1. Ken Hagan Gold badge

        Re: What's the betting that...

        This depends on your field, surely? The KISS principle applies (with Einstein's caveat) and a spreadsheet is a perfectly reasonable way to start playing with some data.

        1. TRT Silver badge

          Re: What's the betting that...

Start playing with small amounts of data, yes. You will note I was talking about publication data. Copying and pasting thousands of data points, calculating statistical error values etc., importing CSV files of genetic data where genes called OCT get interpreted as dates in October: all fraught with problems.

There are far better ways of playing with data, e.g. OpenRefine or SPSS or a dozen other programs designed for the task.
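Excel's date mangling happens on import, so one common defence when scripting instead (a sketch of an approach, not anything TRT prescribes) is to pin identifier columns to plain strings, e.g. with pandas:

import io
import pandas as pd

# Gene symbols such as OCT4, SEPT2 and MARCH1 are notorious for being
# reinterpreted as dates by spreadsheets; forcing the column to str
# keeps them as literal text in a scripted pipeline.
csv = io.StringIO("gene,expression\nOCT4,12.1\nSEPT2,3.4\nMARCH1,7.9\n")
df = pd.read_csv(csv, dtype={"gene": str})
print(df)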

        2. that one in the corner Silver badge

          Re: What's the betting that...

*Start* playing with data, maybe.

When you have an idea that there is something to be found in the data, move on to something more robust.

          1. Arthur the cat Silver badge
            Unhappy

            Re: What's the betting that...

> When you have an idea that there is something to be found in the data, move on to something more robust.

            Nice idea, but all too often that will be done "when there's enough time". And there's never enough time.

            1. TRT Silver badge

              Re: What's the betting that...

              TBBH that's a false economy of time. I recognise what you're saying but one learns that having to repeat everything with a more robust analysis anyway means one is better off doing it properly the first time around.

  4. Ian Johnston Silver badge

The whole point about reproducibility is that it eliminates the possibility that the software (or the instruments, or the power supply, or the orientation of the lab bench or ...) is responsible for the results. If they can be reproduced with a different set-up, great. If they can't, it might be of interest to investigate why, at which point all these details can be considered.

    None of this applies to psychology, which is almost all irreproducible because it's a load of bollocks. Technical scientific term there.

    1. tiggity Silver badge

And let us not forget that many published papers include no detailed raw data, either in the paper or linked online; maybe you will get a few graphs and a brief description of methodology and results (the cynic in me notes that in many cases you look for details on the statistical significance of the results and it's not present). If you want the actual raw data, it's normally a matter of contacting the paper's authors and hoping they can oblige.

Disclosure: before IT I worked in biosciences research and still "follow" various research areas that interest me. Subjectively it does seem that there's more pressure on researchers to churn out papers quickly and often, and a fair amount of paper-number inflation now (i.e. work "fragments" dribbled out over several papers that not so long ago would have been published together, turning one chunky paper into four smaller ones dealing with different facets of the same work).

      1. Mike 137 Silver badge

        Not a new phenomenon

        "work "fragments" dribbled out over several papers that not so long ago would have just been published in a single paper,"

        Even in the late '80s to early '90s when I was working in science (both physics and ecology) this was recognised and referred to as 'ravioli publishing' (lots of little bite size pieces).

        1. Ken Hagan Gold badge

          Re: Not a new phenomenon

          The term I heard was bacon slicing, but this was from the same era so I would advise younger readers to be a little suspicious of claims that things have only recently started to go downhill. In truth, as the Ancient Greeks knew, the Golden Age (of anything) was always before your time, my son, and everything is shit now.

      2. Ian Johnston Silver badge

        The lack of raw data is understandable because journals simply could not print it. That in turn reinforces what most of us think; that the whole idea of printed journals is ludicrously outdated. In many areas of science they are completely irrelevant, because by the time a paper is printed everybody who cares has read it from a preprint server.

The only purpose journals serve is one for which they were never intended: as proxy measures of the worth of a particular piece of research. The whole concept of a "high-impact journal" is a piece of self-perpetuating nonsense.

        The whole system of dissemination needs urgent and radical overhaul. Open access journals aren't the answer, because they "publish" any old crap. On the other hand a revamped version of the reviewer system isn't the answer either, because it's inherently unreliable, rewards personal connection over actual worth and demonstrably discriminates against non-white and women researchers.

        A preprint server with a star rating system? Weighted by the star weight of the reviewers, maybe?
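That proposal is easy to prototype (a toy sketch; the Review fields and the reputation weights are invented for illustration):

from dataclasses import dataclass

@dataclass
class Review:
    stars: float            # 0-5 rating given to the preprint
    reviewer_weight: float  # hypothetical "star weight" of the reviewer

def weighted_rating(reviews: list[Review]) -> float:
    # Reviewer-weighted mean: heavyweight reviewers move the score more.
    total = sum(r.reviewer_weight for r in reviews)
    return sum(r.stars * r.reviewer_weight for r in reviews) / total

print(weighted_rating([Review(5, 0.9), Review(2, 0.3), Review(4, 0.6)]))

The hard part, of course, is where the reviewer weights come from without recreating the same old clique problem.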

        And yes, there are far, far, far too many papers, most of which are wrong and almost all of which are irrelevant. It's an artefact of the stupidly gladiatorial funding system. I have a friend who is a very senior tenured humanities professor in the US. He writes a book every five years.

        1. Doctor Syntax Silver badge

          "the reviewer system isn't the answer either"

          Eventually the community decides on reputation. What deserves to be cited multiple times gets cited multiple times although not necessarily for the reasons the author might have wanted.

          1. Ian Johnston Silver badge

            Yes, I agree. Trying to establish the merit of a paper in advance is pointless. Citations are better, but we know that papers get cited less when they are known to be written by women, so there are serious issues there too.

            1. LybsterRoy Silver badge

and it's very easy to establish a virtue chain of citation where no-one citing has read the paper they're citing

              1. Ian Johnston Silver badge

                Of course. I've done it myself. It's a PhD thesis thing, in particular. There are certain papers you have to cite, because everyone does, whether or not you have actually read them.

                1. TRT Silver badge

                  One of my jobs was to trace back to the root citation anything that was included in one of our papers. I uncovered quite a few mis-citations. Not every laboratory was as thorough as ours. And that was down to the ethos of just the head of our group. He understood both rigour and reputation.

        2. LybsterRoy Silver badge

One big problem is how do you prove the reviewers have actually reviewed anything rather than just saying it's a great piece of work?

      3. Derek Jones

        and then ignore email requests for the data

        And when you email asking for the data, the researchers don't reply, or at least 30% of them don't:

        https://shape-of-code.com/2017/02/02/i-have-been-reading-your-interesting-paper/

        However, the situation with regard to data being made available on sites such as Zenodo is getting better.

        1. Ian Johnston Silver badge

          Re: and then ignore email requests for the data

> And when you email asking for the data, the researchers don't reply, or at least 30% of them don't:

And understandably so. In fact, I am surprised the author got as much data as they did. To many researchers data is the most valuable thing they have, derived with great effort and at great expense (to someone), and it's hard to see why someone else should get it just for asking. This is particularly the case when the analysis hasn't finished.

          There will also be inevitable worries about another Schlafly / acetate situation.

      4. W.S.Gosset

        Insider tips: peer-reviewed journals

        > And let us not forget that in many papers published no detailed raw data in the paper

        Supplementary Materials are NOT peer-reviewed. They are not even released to the reviewers -- they have to take it on trust.

        "Scientists" can and do get away with murder that way.

        Source: Nature's formal written advice to an author protesting an apparently fraudulent "reply" to his work, which he released. Nature is a, if not the, Gold Standard journal globally.

    2. Arthur the cat Silver badge

> The whole point about reproducibility is that it eliminates the possibility that the software is responsible for the results.

      I don't know whether it's still the case, but back in the 70s/80s there was a semi-joke amongst particle physicists that half the particles being found by the various accelerators round the world(*) were actually bugs in the Fortran code that was used for analysis everywhere.

(*) Those were the days when accelerators were small and cheap enough that rich countries could afford their own rather than having to share a single worldwide one.

      1. Yet Another Anonymous coward Silver badge

The LIGO (gravitational wave detector) team actually have a separate group whose job is to inject false results into the system to check that they are detected and properly discarded.

Apparently when they turned it on and had that spectacular result almost immediately, it took a lot of convincing that it wasn't an event injection.

    3. TRT Silver badge

Although one can even get discrepancies when data are analysed by different versions of the same software.

      But will the Central IT allow people to freeze the software versions on their machines until the study is completed? Not a chance.

      1. Anonymous Coward
        Anonymous Coward

        They will in research-heavy organisations, yes. It's a key requirement when working with the likes of pharma or life sciences to be able to set in stone and verify a combination of software and data, exactly to enable this kind of working pattern.
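A minimal sketch of that "set in stone and verify" pattern (the file layout and field names are my assumption, not a description of any pharma system): hash the input data and record the package versions next to the result:

import hashlib
import json
import sys
from importlib.metadata import distributions

def manifest(data_path: str) -> dict:
    # Fingerprint the data file...
    sha = hashlib.sha256()
    with open(data_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    # ...and the software environment it was analysed with.
    return {
        "data_sha256": sha.hexdigest(),
        "python": sys.version,
        "packages": sorted(f"{d.metadata['Name']}=={d.version}"
                           for d in distributions()),
    }

if __name__ == "__main__":
    print(json.dumps(manifest(sys.argv[1]), indent=2))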

        1. TRT Silver badge

          Someone should let them know then. I've* been banging my head on a brick wall about this for over 5 years.

          *I work in life-sciences in a research-heavy organisation supported by a central IT service that's totally focussed on "The Student Experience". And when they say "student" they mean "undergraduate student".

          1. Paul Johnston
            Happy

            Sounds familiar

            Is it raining where you are?

            1. Ian Johnston Silver badge

              Re: Sounds familiar

              I'm in Scotland. It's always bloody raining, or about to bloody rain.

            2. TRT Silver badge

              Re: Sounds familiar

              *checks out of window*

              Erm... yes.

              Though it may just be being pissed on from a great height. I'm not looking up to check just in case.

      2. W.S.Gosset

        > Although one can get discrepancies when data are analysed by different versions of the same software even.

        Compilers also.

When GISTEMP (NASA's temperature data creator/3rd-level adjustor for their official public GISS dataset) got leaked, reviewers discovered you got very different results depending on which compiler version you used.

    4. LybsterRoy Silver badge

      You missed a few:

      sociology

      economics

      climate

      just the first three that came to me

      1. EBG

        your downvotes are due to listing climate. That gets a free pass.

  5. Paul Crawford Silver badge

    Documented software and a scripted demonstration that it builds/runs on a clean OS system should be published with any peer-reviewed paper?

    Would send shock waves through the industry but might just force some sound practice and the ability to analyse the process.
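What such a scripted demonstration might look like, as a sketch (the script and file names, and the published digest, are placeholders): rerun the documented pipeline on a clean system and compare the output against the digest published with the paper:

import hashlib
import subprocess
import sys

EXPECTED_SHA256 = "..."  # digest published alongside the paper (placeholder)

# Re-run the analysis exactly as documented in the paper...
subprocess.run([sys.executable, "analysis.py",
                "--input", "data.csv", "--output", "results.csv"],
               check=True)

# ...then verify the output matches what was published.
with open("results.csv", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

if digest == EXPECTED_SHA256:
    print("Reproduced: output matches the published digest.")
else:
    sys.exit(f"MISMATCH: got {digest}")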

    1. Paul Crawford Silver badge

      The above would rule out GNU radio being used...

    2. Ian Johnston Silver badge

      How far do you go with that? Should any paper with a plot produced by Gnuplot have to include the full Gnuplot source (at the version used) in case apparently interesting features are actually artefacts? Can I analyse data using Matlab or do the hard maths with Maple, both of which are closed source?

      1. Anonymous Coward
        Anonymous Coward

        +1 etc to this. This is a genuinely hard problem, especially when you get into some of the more large-scale stuff happening with ML models. Sure, I could give you my code, and I can even tell you exactly which commit to use in the underlying library and what toolchain to build it with.

        But unless you've got a couple of hundred thousand dollars kicking around to download, store and process the training data and re-run the validations you've not got a snowball's hope of reproducing my work.

        We need to be a lot more clever than "just publish the source" or "do more testing"

        1. W.S.Gosset

          > We need to be a lot more clever than "just publish the source" or "do more testing"

          Edge cases should not be used to block profound benefit for the other 99.9999% of cases.

          If edge cases need additional and special treatment, they can each be considered and handled specially, separately, and severably.

          You are reading this on the web: a 50% demo intended solely to bump a similar "every single possible possibility must be entirely structurally captured at fundamental protocol level before we can even start" discussion which had locked on edge cases for years. I would suggest that we have had substantially more value from that approach than from waiting for the perfection discussion to resolve.

      2. Jellied Eel Silver badge

Papers should include the data and methodology used, either in the paper or as supplementary information. This doesn't always happen. One of my favourite examples came from climate science, where a novel technique called 'Rahmstorf smoothing' was used; it eventually turned out to be a standard triangle filter. Other times, data aren't included at all.

Not sure what the solution should be, i.e. including documented source for a climate model might be excessive, but it should be open to peer review. And I guess it's a question of how much software engineering we should expect.

      3. Paul Crawford Silver badge

        For generally available software then yes, the version numbers should be included just in case.

Must it be open source? No, but it needs to be reproducible, so your data run on my machine using XYZ's software gives the same result. And the method(s) used should also be possible with something that is open source in the broadest sense (i.e. can be inspected; need not be GPL or any other specific licence).

        1. breakfast Silver badge

          It seems to me that alongside this, if you're using custom code along with your standard tools and that code is required for reproducibility, that absolutely should be open sourced and published alongside the paper.

  6. Anonymous Coward
    Anonymous Coward

    >"is infrequently subjected to the same scientific rigour as is applied to more conventional experimental apparatus."

    Part of me very much wants to argue that it probably shouldn't be subject to the same rigours, because fixing a broken/invalid piece of software is almost always a damned sight cheaper, faster and easier than replacing a possibly-very-expensive bit of kit, as it is generally cheaper/easier to validate the correctness of a piece of software. There is value to agility in software, and serious risk in saying we need to put the same levels of effort and cost into assuring the intangible as we do the tangible.

    We should absolutely be talking about software assurance and how to properly review software assets, but starting from "well we do this with the physical kit" is probably the wrong place to start.

Ultimately it's a question of costs and funds. Academics are incentivised to publish their results. They are not necessarily incentivised to spend an extra few weeks tidying up their code and data and getting them published on an equal footing.

We also need to be careful about conflating the "reproducibility crisis" as published by Ioannidis with the issues of software quality, version control and provenance discussed in the rest of the article. Ioannidis identified serious, structural and fundamental issues in research practices producing studies across medicine, and none of them had anything to do with technical issues like kit or code. Higher-quality or peer-reviewed research software wouldn't have made the issues identified by Ioannidis any better.

    1. Doctor Syntax Silver badge

      "Where's your apparatus?"

      "Back in the Quickfit drawer."

  7. elsergiovolador Silver badge

    "scientists"

    The scientific community does nothing to root out hacks from its ranks.

    The litmus test I am using as a benchmark for rubbishness of the community is cannabis research coming from its bowels.

    Inadequate control groups or lack thereof, massaging the data, fitting results to match the politically correct outcome and so on.

    Not heard of any "scientist" being fired, department closed or defunded. Not a word of critique from the community.

    Then that "research" gets widely published and is being used to propel the governments' propaganda.

    It used to be that "scientists" could only receive grants if their research demonstrated bad effects of cannabis use.

There is hope though: "Covid-19: Researcher blows the whistle on data integrity issues in Pfizer's vaccine trial"

    1. Anonymous Coward
      Anonymous Coward

      Re: "scientists"

      Political considerations affect all sorts of research. I have a pal in the Geography Dept of one of the UK's best known universities who tells me that even to propose research looking at the likely extent of global warming (not denying it, you understand, just looking for some numbers) is seriously career limiting. Likewise it is politically impossible to research whether parental behaviour leads to ASD/ADHD and so on in children, or whether children who go to nurseries do better than those raised at home.

      And so we end up with the nonsense of "neurodiversity" when in the overwhelming majority of cases there isn't a shred of evidence that there is - in computing terms - a hardware error rather than a software one.

  8. Anonymous Coward
    Anonymous Coward

    Great principle

It's undeniably an important principle but it all comes down to implementation. I thought the CovidSim code check was a nice example of what it would look like in practice. I remember there was a spurt of complaints about the code quality (I thought the complaint that using single-letter variable names, like "r", was bad practice aged particularly well) and then an independent review found it worked fine.

I didn't really get the sense we got a lot of scientific value out of the exercise.

    Nature article on it: https://www.nature.com/articles/d41586-020-01685-y

  9. Binraider Silver badge

Tell that to the consultants that insist on using ANSYS, SolidWorks etc.

I have lost count of how many recent PhDs involve running 50-year-old finite-element analysis methods on some modern problem, inside the sandbox of one of the commercial modelling tools.

Not saying it's not valuable work; quite the opposite. But, call me weird, they aren't quite the same as the PhDs of old where you had to actually write the model! (Perhaps just as well, given the effort involved in creating and validating such a thing.)

    1. Ian Johnston Silver badge

      Nowadays we use software tools to do research. Thirty years ago we wrote the software tools to do research. Thirty years before that we wrote the operating system as well. Jolly good. Life moves on.

  10. Eclectic Man Silver badge

    Not just software

A while ago (1994: https://www.computerworld.com/article/2515483/epic-failures-11-infamous-software-bugs.html?page=3 ) Intel produced the Pentium chip, which contained flaws in the hardware arithmetic. Mostly they did not matter for general computing, but one researcher was concerned about the accuracy of his work, so did all his computations twice, and differently, and found the bug. Made quite a few headlines too.
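The widely circulated FDIV test case makes the "compute it twice, differently" idea concrete (a sketch; on an affected Pentium this residual famously came out around 256, while any correct FPU gives zero):

# Divide, multiply back, and compare against the original value.
x, y = 4195835.0, 3145727.0
residual = x - (x / y) * y
print(residual)  # 0.0 on a correct FPU; ~256 on a flawed Pentium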

I don't doubt that most experimental results are difficult if not impossible to duplicate, but the pharmaceutical industry, at least in theory, uses 'double blind' tests to ensure that the results of drug trials are valid.

One of the problems is that a niche, one-off tool used by a single person as a shortcut through many tedious calculations will either be jumped on by management and turned into a product to be sold, or will rely on one person to design, code and validate it. And then there was the UK's CRAMM (CCTA Risk Analysis and Management Methodology), a curious collection of questions about the IT system under consideration which were 'processed' by a 'scheme'. Frankly, although I reviewed it once and recommended several serious improvements, nothing was changed; many of the questions, although mandatory, had minimal to no effect on the recommendations, while several had major implications through 'weighting' that was never explained or justified. It was, however, very repeatable.

    1. Ian Johnston Silver badge

      Re: Not just software

> I don't doubt that most experimental results are difficult if not impossible to duplicate, but the pharmaceutical industry, at least in theory, uses 'double blind' tests to ensure that the results of drug trials are valid.

Well, in theory. In practice there is an astonishing level of skulduggery in pharmaceutical trials, starting with the almost universal failure to report negative results. That's how we ended up with a wide range of anti-depressants, almost all of which are worthless.

      1. Dr Scrum Master

        Re: Not just software

        How depressing.

      2. W.S.Gosset

        Re: Not just software

> Well, in theory. In practice there is an astonishing level of skulduggery in pharmaceutical trials

        Not just pharmaceutical science/work.

        In my personal experience with my own eyes: it is a serious risk in any area where money is on the table. Albeit not as egregious as people like to think, particularly the conspiracy theorists or left wing.

        But it is guaranteed in any area where there is a virtue meme on the table. Research "proving!" a virtue meme? Fraudulent. *cough* scientific misrepresentation

        2004 was my wake-up call. I'd been following the research in an area of grave concern to me for 15 years. But I'd only been reading the Abstracts. I trusted, you see. In 2004 for the first time I read the body of a major paper (amusingly, to prove someone wrong) and walked straight into scientific and systemic fraud. *cough* scientific misrepresentation. One serious deep-dive later and I completely reversed my position. Vehemently. And that field has only got jawdroppingly worse since then. (Replacing a global dataset with a model in 2006 was... stunning.)

        I've subsequently applied the same approach to a wide range of fields, with the results as above.

  11. gedw99

    Fraud is why we need data and code

    https://principia-scientific.com/8-top-leaders-in-field-of-medicine-say-dont-trust-science/

If scientists question the validity of how science is being conducted, what can you expect from the layman? Your contributions, which would allow direct access to the data and code, would certainly help restore some confidence among scientists.

  12. Tascam Holiday

    Reproducibility? What's that?

    Grizzled old university sysadmin here, ex researcher. I spend a fair amount of time explaining to our researchers (and not all young 'uns) that basing their critical work on code grabbed via 'git clone <last night's commit of useful looking repo>' is not a good start. Most are honestly baffled at the impact this will have on reproducibility, and I seriously wonder if basic scientific methodology is not formally taught any more.

    On the other hand freezing anything other than a very simple codebase and expecting it to continue to work for years afterwards is a very difficult problem to solve. The immense pressure to publish also relegates these issues to way down the list of priorities.
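One cheap defensive habit (my sketch, not the poster's prescription; the repo path is invented) is to record the exact commit of every cloned tool alongside the results, so "last night's commit" is at least traceable later:

import subprocess
from pathlib import Path

def pinned_commit(repo: Path) -> str:
    # Ask git which commit the cloned tool is actually sitting on.
    return subprocess.check_output(
        ["git", "-C", str(repo), "rev-parse", "HEAD"], text=True
    ).strip()

# Log these alongside the analysis output.
for tool in [Path("tools/useful-looking-repo")]:
    print(tool, pinned_commit(tool))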

Bioinformatics is the main culprit in my experience. It's a new field, and there's been an explosion of ad-hoc tools and pipelines cooked up in labs, released to the world and embraced with little concern for long-term maintenance and preservation. We're at the stage in the bioinformatics software cycle where the initial burst of activity 10-15 years ago is leaving a lot of abandoned applications in its wake. Whole research programmes have come to rely on some of these, and it's always painful explaining that they must find an alternative because their cherished application or tool no longer works after our necessary OS upgrade.

  13. Stuart Castle Silver badge

I think the problem is, in computer science at least, that a lot of the researchers are the sort of people who will happily write their own script, spreadsheet macro or whatever to do what they need. They also tend not to follow best practice (designing everything properly, and documenting it), which is why I've frequently found that software built for designing and maintaining other systems is often the worst-designed software of all.

  14. sreynolds

    Theranos

    Pretty sure that can be reproduced.

    1. Ian Johnston Silver badge

      Re: Theranos

      Elizabeth Holmes has reproduced. It's amazing what you can do with a tiny prick, isn't it?

  15. Norman Nescio Silver badge

    Nick Brown

    For some of the fun and games around reproducibility (or lack thereof), I can recommend a perusal of Nick Brown's blog:

    Nick Brown's blog: The adventures of a self-appointed data police cadet

Nick's speciality is forensic numerical data analysis: finding errors in published numbers. But part of that is trying to get hold of the original data (often surprisingly difficult), or finding out what analysis method was used, whether it was appropriate, and whether it was applied correctly. It is a bit dry, because he sticks to the facts, but he exposes quite surprising problems.

    NN

  16. jason_derp

    Encouraging

    Most of these comments boil down to "I believe wholeheartedly that science is important, now here's why all science is done wrong, here are the specific sciences I don't trust." Very encouraging.

    1. W.S.Gosset

      Re: Encouraging

      Subsequent to your post:

      Yes #1.

      Yes #2.

      Yes #3.

      Yes #4.

  17. W.S.Gosset
    Alert

    Careful!!

This would mean the end of AGW global warming! If the CRU & co. have to write code that's not littered with -- in their own leaked code's comments -- "completely arbitrary fudge factors", the whole human-impact signal goes away, as, indeed, does most of the warming. Raw temps suffer at least three levels of software adjustment: by level 2, for example, Darwin's 20th-century DEcrease of -0.7°C becomes an INcrease of +1.2°C; or observe the progressive anticlockwise rotation of the core global dataset (past colder, present hotter) by cycling through the graphs here (same input temps).

    Hundreds of thousands of people globally will lose their jobs. And their virtue.

Koonin, Obama's Science Advisor for Energy and a renewables specialist, belatedly discovered "fraud" in the primary American climate assessment report (the USA's IPCC equivalent), and lots of it. When he proposed adding a formal Disclosure and 3rd-Party Replication step to climate work to address the demonstrated collapse of peer review (a "red team", à la infosec), he was pilloried in the press. All ad hominem. If he hadn't effectively retired, he'd have been sacked.

    The MPs will experience a psychotic attack-dog pile-on like no other should they attempt to actually go through with this.

  18. W.S.Gosset

    Example failed control for Scientific Integrity: Peer Review

Cliques can easily form and act as gatekeepers rather than quality-improvers. Journals can be brought to accept that they have to consult key gatekeepers on sensitive areas in order to get approved lists of peer reviewers.

    Example: leaked email: CRU Climatic Research Unit Director Phil Jones, response re request (with suggestions) for list of reviewers (emphases added):

    "... We have Ben Santer in common ! Dave Thompson is a good suggestion.

    I'd go for one of Tom Peterson or Dave Easterling. To get a spread, I'd go with 3 US, One Australian and one in Europe. So Neville Nicholls and David Parker.

    All of them know the sorts of things to say -- about our comment and the awful original, without any prompting."

    And Keith Briffa (dominated tree ring research globally) coordinating Peer-Review to kill a paper in support of another:

    From: Keith Briffa

    To: Edward Cook

    Subject: Re: Review- confidential REALLY URGENT

    Date: Wed Jun 4 13:42:54 2003

    I am really sorry but I have to nag about that review – Confidentially I now need a hard and if required extensive case for rejecting – to support Dave Stahle’s and really as soon as you can. Please

    Keith

    And does it work? Well, the reply to the above email led to another key paper being blocked and we can measure the impact directly:

    "Now something to ask from you. Actually somewhat important too. I got a paper to review (submitted to the Journal of Agricultural, Biological, and Environmental Sciences), written by a Korean guy and someone from Berkeley, that claims that the method of reconstruction that we use in dendrochronology (reverse regression) is wrong, biased, lousy, horrible, etc. They use your Tornetrask recon as the main whipping boy.

    ...

    If published as is, this paper could really do some damage. It is also an ugly paper to review because it is rather mathematical, with a lot of Box-Jenkins stuff in it. It won’t be easy to dismiss out of hand as the math appears to be correct theoretically [he then explicitly states they need to use it because it is the only method that gives them the right numbers ("p-hacking")]

It worked! The researchers took over 10 years (2003->2015) to finally get past the clique:

    "Specification and estimation of the transfer function in dendroclimatological reconstructions" , Maximilian Auffhammer [Berkeley], Li, Wright, Seung-Jick Yoo [Korea]

We identify two issues with the reverse regression approach as implemented in several classic reconstructions of past climate fluctuations from dendroclimatological data series. ... the reverse regression method results in biased coefficients, reconstructions with artificially low variance and overly smooth reconstructions
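The statistical point is easy to demonstrate with a toy simulation (mine, not the paper's method; it assumes "reverse regression" means regressing climate directly on the proxy): the reconstruction's variance shrinks by roughly the calibration r-squared:

import numpy as np

rng = np.random.default_rng(0)
n = 5000
temp = rng.normal(0.0, 1.0, n)                 # "true" temperatures
proxy = 0.8 * temp + rng.normal(0.0, 0.6, n)   # noisy tree-ring proxy

# Reverse regression: fit temperature on the proxy, then treat the
# fitted values as the "reconstruction".
slope, intercept = np.polyfit(proxy, temp, 1)
recon = intercept + slope * proxy

r2 = np.corrcoef(temp, proxy)[0, 1] ** 2
print(f"var(true temp) = {temp.var():.3f}")   # ~1.0
print(f"var(recon)     = {recon.var():.3f}")  # ~r^2: artificially low
print(f"r^2            = {r2:.3f}")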

  19. W.S.Gosset

    How to train your Journal

Backdoor trick taught to the CRU by the old master himself -- the chap who hijacked then took over the CRU, and who PhD-supervised then hired a "correct thinking" team for it. Leaked email advice to the team from the wonderfully named Tom Wigley:

    "One approach is to go direct to the publishers and point out the fact that their journal is perceived as being a medium for disseminating misinformation under the guise of refereed work.

    I use the word 'perceived' here, since whether it is true or not is not what the publishers care about -- it is how the journal is seen by the community that counts."

Works a treat! The publisher panicked and sacked six editors at Climate Research, including the editor-in-chief.

    Other examples:

    * Jones to Ben Santer Mar 19th, 2009: "I'm having a dispute with the new editor of Weather. I've complained about him to the RMS Chief Exec. If I don't get him to back down, I won't be sending any more papers to any RMS journals and I'll be resigning from the RMS."

    * Michael Mann: "We can't afford to lose GRL." (Geophysical Research Letters)

    * Tom Wigley in reply: "If you think that Saiers is in the greenhouse skeptics camp, then, if we can find documentary evidence of this, we could go through official AGU channels to get him ousted." Saiers got sacked.

* Remote Sensing sacked editor-in-chief Wagner for publishing research showing the IPCC's cloud albedos and radiation figures were wrong. The BBC applauded. Various NGOs, uni groups, etc applauded. The authors only published there because every other journal was too scared to touch it. Those journals were right; Wagner was wrong.

  20. W.S.Gosset

    Example failed control for Scientific Integrity: Review Bodies

    Easily hijacked. See if you can spot the following very very subtle spin introduced to a major body of science, very successfully. Document = the global benchmark: the IPCC's SPM Report (Summary for Policy Makers), aka THE Report since it's the only one anyone ever reads.

    DRAFT: Researchers agreed and wrote the following group/joint statements which appeared in the Draft:

    "None of the studies cited above has shown clear evidence that we can attribute the observed [climate] changes to the specific cause of increases in greenhouse gases.

    "While some of the pattern-base discussed here have claimed detection of a significant climate change, no study to date has positively attributed all or part of climate change observed to man-made causes.

EDITED: Ben Santer -- PhD under Tom Wigley's supervision, freshly graduated but immediately appointed an IPCC Senior Editor by the personal intervention of Wigley's mate, IPCC Chairman John Houghton -- introduced some subtle spin. This is how the above statements appeared in the final SPM Report:

    "1. There is evidence of an emerging pattern of climate response to forcing by greenhouse gases and sulfate aerosols...from the geographical, seasonal and vertical patterns of temperature change. ...

    These results point toward a human influence on global climate.

    "2. The body of statistical evidence in chapter 8, when examined in the context of our physical understanding of the climate system, now points to a discernible human influence on the global climate..."

    And this is how you corrupt science.

  21. Henry Wertz 1 Gold badge

    IRIX software

Yup, I helped out at a local lab once. They had some old data and a published paper, and they wanted to run a followup (process some new data using the same techniques for a followup paper). Running the current software with the old data did not obtain the same results as in the paper (not a big enough change to alter the conclusions, but still...). They had a copy of the binaries, but they were for IRIX, and the SGI they ran on had been retired.

I did end up getting it to run. Linux for MIPS intentionally picked the same system call numbers that IRIX did, and it was a simple console-based app (text prompts asking to set various parameters and what file to read the data from), so it actually ran on x86 Linux using qemu-user-mips. The overhead of emulating MIPS was more than offset by the x86 system being years newer with 10 or 20x the clock speed; it ran whatever it was doing in a few seconds (apparently it took about a minute on the original system).

It ran the old data with identical results to the paper (i.e. qemu was emulating properly), so they ran their new data with it and all was good.

  22. JulieM Silver badge

    Product Testing

When I was working in the electronics industry, many years ago, listings of any computer programs we wrote for testing purposes -- e.g. to operate inputs and monitor outputs via experiment boards, so a module could be under continuous test while being cooked, frozen or shaken to bits -- were considered as integral a part of the eventual report as any other detail of the testing procedure. The stated aim of the report was, after all, to allow anyone to replicate the experiments in future.

We did not think at the time that it might not be fully reproducible if the experiment board (or, as actually happened, the 8-bit expansion slot into which it fitted) became obsolete ...

  23. TheBadja

One significant problem is that MPs are now professional politicians, and rarely have any STEM skills at all. They don't understand software beyond using Facebook and Twitter, and even then they have assistants to do it for them. One prominent Australian MP was caught out because he didn't own a smartphone on which to present his vaccination status at a pub for entry.
