Is it OK to use stolen data? What if it's scientific research in the public interest?

There's a fine line between getting hold of data that may be in the public interest and downright stealing data just because you can. And simply because the data is out there – having been stolen by online intruders and then leaked – does not mean it is right to use it. A paper published in Nature Machine Intelligence this …

  1. Binraider Silver badge

    This is not a new argument, and answers will always be legal grey areas.

    Medical data acquired through absolutely horrific means in WW2, for example, remains in (limited) circulation. Some of it is incredibly specialised and has practical, life-saving or life-improving applications. (Equally, some of it was utter garbage and an excuse to indulge in torture for the sake of it.) The question of whether it is right to use it is ethically impossible to mandate. Yes, it was forcibly acquired. But it is also able to save or improve lives. Does using it mean you are disrespecting the victims and legitimising the crime? Does it mean you are honouring the victims' loss? Absolutely impossible to mandate, as you can make any number of arguments for or against.

    The choice of whether it's right or wrong is at best, difficult to legislate for. Give people the informed choice as to what it means, but perhaps do not deny. This goes for other laws that have been making the news recently too, denying choice.

    1. Dr. Ellen
      Big Brother

      Fifty years ago, I was arguing about valid Nazi medical results obtained by horrible people doing horrible things to innocent victims. There is still no good answer to the question, but at least medical science is advancing so eventually those results can be ignored. (The arguments were carried out in fanzines, which were on paper in those days. No Internet, but there are similarities. Things change, but sometimes they only change venues.)

    2. Muscleguy Silver badge
      Boffin

      Indeed, what we know about how the human body reacts to immersion in cold water is from Nazi experiments on non-consenting subjects. But there was no other way to understand this and we need to understand it.

      The Ashley Madison data is essentially a found experiment, like a supernova of a new sort. Done carefully, with anonymisation and no discussion of individuals, only aggregated data, yes you can.

      Getting folks to divulge honestly what they will on dating sites for a research project is difficult. So when one drops in your lap you’re a bit mad not to.

      1. Robert Carnegie Silver badge

        I thought Nazi scientists mostly were like Soviet scientists - they were told in advance what the results of experiments were to be. In which case, not required reading. The rocket scientists though - worth collecting apparently. So...?

      2. doublelayer Silver badge

        Your view is interesting as I saw the Ashley Madison data as a perfect example of an inappropriate use.

        In the case of cold water immersion, it's data which can never be obtained legitimately--people who accidentally fell into water are not common or detailed enough to use only data collected during their treatment, and there is no safe way of exposing subjects to that in controlled circumstances. It's also data which can help others by improving the treatments for those who end up in cold water nowadays. It's difficult for me to accept the use of results obtained by torture, but the preceding points make it a little easier.

        Neither point is true of the data researchers might use from Ashley Madison. It's extremely personal data which the subjects did not want released, but if researchers wanted to collect it, they have the option of getting subjects to agree to provide that kind of information. If nobody agrees to do so, that only accentuates the fact that people are not comfortable having their private lives scrutinized to that level. In addition, nobody is having their life saved by research done on sexual habits. Its benefit to people other than the researchers is minimal, an interesting article to read at best. One more point: if the researchers wanted to use the data, they could contact the very real people involved in the breach to ask for permission, as they would have to do for any other experimental subject. They didn't do that, likely because they know most or all would refuse, but that doesn't remove their ethical responsibility. They are using stolen data for personal gain with no consent whatsoever, and I see no convincing argument that doing this benefits anybody else.

        1. Ian Johnston Silver badge

          They are using stolen data for personal gain with no consent whatsoever, and I see no convincing argument that doing this benefits anybody else.

          On the other hand, if the data is anonymised, using it causes no harm whatsoever to anybody, regardless of consent.

          1. Citizen of Nowhere

            > if the data is anonymised, using it causes no harm whatsoever to anybody, regardless of consent.

            Unfortunately, nobody can guarantee that anonymised data cannot be de-anonymised. In fact, there was an article on El Reg a short while ago showing that in many cases doing so was a trivial exercise.

            https://www.theregister.com/2021/09/16/anonymising_data_feature/

  2. Potemkine! Silver badge

    Is it OK to use stolen data?

    No.

    What if it's scientific research in the public interest?

    No, not then either. You cannot make a right out of a wrong, mainly because it would justify the wrong in some way.

    1. Blazde

      I can't see how stolen data used for law enforcement action, or for research which assists subsequent law enforcement action and prevents future thefts of data in any way justifies the original wrong. It's a straightforward reaction to it.

      1. Pascal Monett Silver badge

        Fine. Let's go torture 20,000 people in various documented ways and create a gigantic store of medical data that will help save lives.

        Saving lives will certainly right the wrong. No problem, right ?

        1. HandleAlreadyTaken

          >Saving lives will certainly right the wrong. No problem, right ?

          That's certainly a position long supported by the ones in power. A certain illustrious organization, dedicated to working for God's greater glory used to say: Cum finis est licitus, etiam media sunt licita - that is, when the end is lawful, then the means are lawful.

    2. Roland6 Silver badge

      >Is it OK to use stolen data?

      No

      Well someone should tell Troy Hunt et al so that they can close down haveibeenpwned.com ...

      1. doublelayer Silver badge

        That's storing the data and allowing you to see if you're in it, not using it for other goals that you never agreed to. In addition, there should be restrictions on services like that to prevent them from storing certain types of data. Don't keep the name-to-email stuff, for example. And definitely don't share it. It's unclear, but that is designed to protect users' privacy and security rather than achieve someone else's goal with tenuous if any benefit to the person whose data was taken.

        1. Roland6 Silver badge

          But even adhering to your strictures, that is still using stolen data...

    3. hoola Silver badge

      What is interesting is the assumption that the stolen data is valid data. If the source has dodgy provenance exactly how do you know it is valid?

      The answer is you don't unless you have the original to compare it with, at that point, well, you may as well use the genuine source.

      Just like stolen art and other valuable pieces, if they are "found" exactly how do you know you have the original and not a very good copy?

      There will be records and many details to prove authenticity but it is still down to academics or computers with so-called "AI" making the decision.

  3. a_yank_lurker Silver badge

    Ethics and Use Case

    Using stolen data will always be ethically challenging, as Binraider noted. The better questions are why one is using the data, what the purpose of the use is, and how the results will be used. They are interrelated. Also, one needs to go deep when answering these questions. Often one is tempted to give a glib answer to the questions without careful examination of them. Just because you can do something does not mean you should do it. But we each have a bad habit of justifying our stupid behavior. And no, I do not have a very good answer to the issue.

  4. cjcox

    I only do car analogies...

    Wanted to do some research on cars, so stole my neighbor's Lamborghini. He got mad. Told him it was purely for research in the public interest.

    1. Roland6 Silver badge

      Re: I only do car analogies...

      But your stealing your neighbour's Lamborghini isn't really the issue here. The issue being discussed here is whether it is legal and ethical for someone interviewing you about your driving experience and then publishing the findings.

      1. doublelayer Silver badge

        Re: I only do car analogies...

        Neither analogy is at all connected.

        "whether it is legal and ethical for someone interviewing you about your driving experience and then publishing the findings" is clear, it is legal. Because they tell you at the beginning that they want to interview you and they will publish the findings and they tell you how much of your personal data is going out. Not telling people that either leads to no data (don't tell people you want to interview them) or is illegal (disclose information you haven't obtained consent to disclose). Research review boards have the responsibility to maintain such regulations.

        I can't think of many car-related analogies, but the closest is someone creepy puts a tracking device on your car to track you, someone who isn't the attacker gets the data, and they intend to use it for whatever they wanted without your consent and despite the fact that, if asked, you would probably refuse.

  5. Dr Paul Taylor

    when is collected data stolen?

    Phyllis Pearsall compiled the original London A-Z maps (in part) by trudging the streets and copying down their names. Was this breach of the copyright of whoever made the street name signs? Probably not. If someone else reproduces her index, is that breach of copyright? Yes. The difference is the "added value" of her legwork.

    Cecil Sharp collected folk songs from the west of England and published them. Was that breach of copyright? I think there was a case against him. Did he do a cultural service? Yes, because otherwise those songs would have been lost.

    1. Doctor Syntax Silver badge

      Re: when is collected data stolen?

      Would the folk songs collected by Cecil Sharp have been in copyright?

    2. doublelayer Silver badge

      Re: when is collected data stolen?

      Street signs are not copyrightable because they were created by the government for public use and in any case are probably too short to qualify. Folk songs were probably not copyrighted because they have no identifiable creators and are old enough that any copyright would have expired. If they were commonly known but the writer was still living, then compiling would have been illegal. In each case, neither piece of information was stolen.

      1. Michael Habel

        Re: when is collected data stolen?

        Surely they would be trademarked, and not copyrighted, as road signs are generally not found in print, for which we have copyright to protect the publisher and eventual author of the work. Hence why Mickey (The Rat) Mouse and the Coca-Cola logo are both examples of trademarks, which, in so far as their continued use and protection thereof goes, ensure virtual eternal protection, unlike copyright (which, if memory serves, was the life of the author +75 years before falling into the public domain).

        1. doublelayer Silver badge

          Re: when is collected data stolen?

          Trademarks do work for shorter things, but, for two reasons, neither street signs nor a compilation of street signs can be trademarked.

          Trademarks are for things which are associated with products, companies, or brands. They must be actively used for a stated purpose. For example, Apple (company) owns a trademark for the word apple when used to name a computer product, streaming video services, watch, or other products they use the name with. They do not own the word apple when used to talk about a fruit, or for that matter a product they don't make. I could probably start a company making something unrelated to their products and use the word apple to name it. Street signs are not limited to a business purpose, so they don't come under trademark protection.

          A compilation of street signs can't itself be trademarked or copyrighted. It is too long to trademark and in any case its contents would be someone else's work (the government who decided on the street names). The list of street names wouldn't be copyrightable because they weren't the creative effort of the compiler. The compiler could easily attach original work to their list, which would be copyrightable, but if it was just a list, they are out of luck.

    3. MachDiamond Silver badge

      Re: when is collected data stolen?

      " Probably not. If someone else reproduces her index, is that breach of copyright? Yes. The difference is the "added value" of her legwork."

      Not a very good example. Often, compilations of factual information don't qualify for Copyright protection. The legwork involved wouldn't come into it. It might be a big stretch to call the work "creative expression". There was a case of a phone book company suing another phone book company for copying. They lost. The next time the company had made up a bunch of entries and managed to argue that those made the list a creative work and won the case.

      All of that said, there could be something creative in the layout and surrounding design which would make it creative enough. Let's say the streets were color coded to show that parking was allowed on one side or the other or not at all. There might be a color code to denote that the pavements are wide enough for somebody in a wheelchair/mobility scooter. The more unique and creative, the better.

      If I take a photo of a red brick wall, chances are slim that I could get a court to recognize a Copyright claim. There wouldn't be enough "protectable elements" in that photo. The same thing can apply to other textures. There is a certain minimum bar to call a work "creative". If your style of art is to glue a piece of pasta to a large white canvas, you aren't trying hard enough.

    4. Muscleguy Silver badge

      Re: when is collected data stolen?

      Robbie Burns did the same thing in Scotland in the 18thC. He preserved and published songs which would have been lost had he not. The debt Scottish culture owes him for that alone is huge.

      1. Michael Habel

        Re: when is collected data stolen?

        Yes, but those "Songs" were probably largely considered to have been in the Public Domain, even back then.

    5. Eclectic Man Silver badge

      Re: when is collected data stolen? - Folk Songs

      The song 'Scarborough Fair' (aka 'Parsley, Sage, Rosemary and Thyme'), recorded by Simon and Garfunkel, is actually a song of the Anderson family in NE England. Simon and Garfunkel attribute it to P Simon and Art Garfunkel.

    6. General Purpose Silver badge

      Re: when is collected data stolen?

      Data doesn't qualify for copyright and legwork isn't creative. Thinking up fake streetnames to sneak into the index is creative, and so reproducing the A-Z index in full does breach copyright.

      1. Michael Habel
        Pint

        Re: when is collected data stolen?

        I think the Oracle just told Google to hold its Beer.

        Pretty damned sure that Data IS Copyrightable, and Google were found guilty of infringing it.

        1. doublelayer Silver badge

          Re: when is collected data stolen?

          Data is such a broad term that you're talking about different things. Data in the sense of existing facts without additions isn't copyrightable. Data in the sense of a digital representation of something which would qualify is copyrightable.

          Also, you may want to check the outcome of the Google V. Oracle thing. Oracle could copyright their API, but Google was allowed free use of the part they wanted without having to ask for permission or provide payment.

  6. Mike 137 Silver badge

    "an effort to help guide data scientists and researchers through the ethical dilemmas"

    It's actually all rather obvious if you just stop and think. Thirty-odd years back when I was doing research, pretty much everyone seemed to be able to make these judgements for themselves and they mostly got them right. Now it seems they need to be guided. I can only assume that the generation of researchers who grew up with an open web have had their personal ethics blunted by notional "freedom of information".

    1. Binraider Silver badge

      Re: "an effort to help guide data scientists and researchers through the ethical dilemmas"

      I'd say the reason that guidance is needed on what is acceptable or not, is to provide some protection to those that thought they were doing the right thing - and then turned out not to be.

      30 years ago before widespread internet, your work probably wouldn't be subject to scrutiny in the same way.

      1. Pascal Monett Silver badge

        And you'd have had a lot more trouble selling the results to advertisers.

  7. Doctor Syntax Silver badge

    I'd have thought that data quality issues might loom at least as large as ethical ones. Provenance? Sample selection? Accuracy? Repeatability with independent data?

    However, psychology, social sciences, AI - probably no problem.

    1. Muscleguy Silver badge

      Sometimes you have to take natural experiments and do the best you can with a problematic data set because you ain’t getting another one any time soon.

      Ask the astronomers about that one. Would they like everything safely in the neighbourhood, unobscured by dust clouds, not occluded by other objects, not subject to major, difficult-to-analyse redshift, and with everything which lies between us and the source (hydrogen clouds etc.) known absolutely? But they don’t get that stuff most of the time, so they make do with data that is difficult to discern and measure, with large error bars. Because otherwise you don’t do anything.

      I did my PhD on a natural mouse mutant, turned up when a line was inbred. I had both the mutants and the line they came from as controls. You don’t get better than that in Biology. But when I explained it to a Physicist he had collywobbles over the degrees of freedom and couldn’t grok how I could work or do science with it.

  8. Anonymous Coward

    Misinformation here....in the guise of a "balanced discussion"!!!!!

    Quote: "....a fine line between getting hold of data that may be in the public interest and downright stealing data ...."

    No mention here about the behaviour of the NSA (see Edward Snowden revelations). Why do the NSA, GCHQ (both aka STASI) get a free pass in this discussion? And we don't even know what data these organisations are "getting hold of".

    .......and please let's not have any comments about that well known joke.....aka GDPR!

    .......and please let's not get any comments along the lines of "good guys versus bad guys".

    THERE ARE NO "GOOD GUYS"!!

    1. doublelayer Silver badge

      Re: Misinformation here....in the guise of a "balanced discussion"!!!!!

      "Why do the NSA, GCHQ (both aka STASI) get a free pass in this discussion?"

      Because they're not in the discussion. The discussion is about the use of data in research, and whatever those institutions are doing with the data they've stolen, they're not writing research papers. So we tend to reserve our complaints about them, of which believe me I have hundreds, for topics in which their actions are relevant.

  9. Sub 20 Pilot

    For those who say 'NO' unreservedly, what about the stolen / captured machines and codes which eventually led to the end of the second world war. Is this still stolen data that should not have been used ?

    In almost 60 years of experience I can say that there is no black or white mostly, it is all a grey area.

    Governments ( as noted above ) will do what they like and fuck everyone else.

    To me, if stolen data can be used for good and can possibly prevent a recurrence of a problem then at least something good comes out of it.

    What I do not accept though is any commercial profit made from the same.

    So to summarise, if someone can improve humanity's lot from stolen data and is doing so on that basis then yes.

    But for Google, Amazon and other similar parasites to be using stolen data to increase profits then no fucking way.

    1. doublelayer Silver badge

      Crimes committed against an enemy in wartime are very different than crimes committed against the public. In the second world war, soldiers were killing each other, which is how war works. Theft is not really relevant here. The theft of cryptographic equipment is therefore not at all a useful example of what the article is talking about.

    2. Falmari Silver badge
      Devil

      @Sub 20 Pilot “For those who say 'NO' unreservedly, what about the stolen / captured machines and codes which eventually led to the end of the second world war. Is this still stolen data that should not have been used ?”

      Now without trying to justify war your argument is not relevant. In war the rules change, that’s how both sides justify killing each other. If they can justify that, they can justify using their enemy’s data against them.

      That aside, on the issue of, “the stolen / captured machines and codes which eventually led to the end of the second world war” or did they prolong the war. Maybe the war would have been shorter just with different victors. Just playing the icon.

  10. MachDiamond Silver badge

    Why = money

    Many years ago it didn't cost the price of a small car to have access to scientific papers. A subscription to Nature might not have been as cheap as Reader's Digest, but it was not as expensive as it is today. Researchers have to publish or perish and they also have to have access to other works to keep up with the state of knowledge and to properly reference others' work. It can also be embarrassing to submit something substantially similar to something that's already been presented, or to waste time doing work that's already been done, thinking you have something original.

    Being associated with a large institution that can pay the subscriptions is great for a researcher, but a high price means that other researchers who don't have a large organization behind them won't be able to compete or at least do good work. This can mean that the political leaning of only the largest institutions will hold sway. Researchers that don't conform will get shown the door.

    I don't think there is a bright line test. Circumstances will determine whether the data is truly stolen or if there is a usurious paywall blocking its access.

    Commercial applications would likely fall into foul territory. I have no doubts that big data companies are buying data sets that have been hacked from companies' servers. Once that data is melded into what the data aggregator already has, who can tell that some of that information wasn't obtained below board?

    1. doublelayer Silver badge

      Re: Why = money

      "Circumstances will determine whether the data is truly stolen or if there is a usurious paywall blocking its access."

      You are not talking about the same thing we are. Stolen data does not mean pirated data, but rather the result of clearly illegal theft of that data, whether that is a breach, GDPR violation, or anything else that is clear but can't be reversed. This isn't about data that's in a paid-for journal. Incidentally, if someone did get a copy of that data without payment, one couldn't figure out whether they had or not, so the stolen label couldn't be reliably assigned.

  11. Robert Carnegie Silver badge

    Ethics is one thing but in the UK and EU isn't it simply illegal to hold or process people's personal data without their consent? Hmm... if we're talking about "Machine Intelligence", which apparently we are, then perhaps it isn't a human being doing this, therefore not breaking the law? Or, if you're the secret police or Britain's NHS, then you have legal permission to hold any data that you want to about people, even about body organs that they don't know they have. Or... this is from Switzerland and again, laws may be different there.

    1. Anonymous Coward

      "Ethics is one thing but in the UK and EU isn't it simply illegal to hold or process people's personal data without their consent?"

      Nope, 'Consent' is only *one* of the six defined lawful bases that can be used for processing personal data (and additionally there are 10 lawful conditions, including 'Consent', to choose from when processing Special Category personal data).

      So consent is not required if one of the other lawful bases, such as "Performance of a Contract", is used. Your bank does not need your consent to process your personal data, as they need (some of) your personal data in order to provide you with banking services - they will typically use "Performance of a Contract" and "Legal Obligation" (i.e. to determine your identity as part of anti money laundering laws), and possibly "Legitimate Interests" as their lawful bases.

  12. Eclectic Man Silver badge
    Meh

    Past crimes

    In 1828, Burke and Hare murdered people and sold the corpses to Robert Knox, who taught human anatomy to medical students in Edinburgh.* Undoubtedly some of those students educated on the corpses of murder victims went on to save lives, and benefit 'society' as a whole. They were caught when one of the corpses was recognised as a young lady who had recently been reported missing. So pretty much anyone alive today in the West who has benefitted from medicine in any way has probably benefitted from the murders committed by Burke and Hare, either directly by having surgery from someone educated in a direct line of medics to those taught using the corpses, or indirectly by benefitting from someone who benefitted either directly or indirectly.

    The problems with benefiting from historical crimes are that you don't get the choice and in some ways it can be taken as validating the crimes, which can mean that people today justify present day crimes by Machiavelli's dictum "the end justifies the means". In the 20th century there is a whole history of illegal medical experiments conducted on unknowing victims, often on black people**.

    Using illegally obtained data for beneficial research is probably not as contentious as conducting often fatal medical experiments on innocent victims, but the ethics of doing so are, in my mind, related.

    I don't have an answer that I find comfortable. On the one hand, crimes are crimes because they should not be committed, but scientific advances benefit people and the environment, possibly including me.

    * https://en.wikipedia.org/wiki/Burke_and_Hare_murders

    ** https://en.wikipedia.org/wiki/Unethical_human_experimentation_in_the_United_States

  13. Anonymous Coward

    No.

    Is it OK to use stolen data?

    No.

    What if it's scientific research in the public interest?

    Doesn't matter. No.

    There you go, no need for an article.

    1. Dr Paul Taylor

      Re: No.

      Several good examples have been posted on this page to illustrate that this question is far from being morally clear.

      1. Anonymous Coward
        Thumb Down

        Re: No.

        You used the word "good". I do not think it means what you think it means.

  14. Anonymous Coward

    We're talking specifically about a breach of personally sensitive data here. I note that:

    1. If the company concerned had done their job properly, the data would not have been made public in the first place - in which case the researchers wouldn't have got their opportunity.

    2. It was the reasonable expectation of the data subjects that their data would remain private - and if they'd known it would be published, they wouldn't have provided it in the first place.

    ISTM that the only way to "right this wrong" is to prevent the data being used for any additional purposes whatsoever, thus minimising the impact of the breach, and leaving things as close to how they *should* have been.

    From the point of view of the data subjects: the more eyes this data is exposed to, no matter how well intentioned or how well supervised, the greater the impact of the breach.

    But that's about using this for "research". Where it becomes a trickier issue is when an accidental breach of personal communication reveals serious crime, or (say) politicians accepting bribes. It seems reasonable for the police to be able to act on this information, even if it's not directly admissible as evidence in its own right. But then again, should they be *trawling* through all data breaches, looking for signs of criminal activity?

  15. Ian 55

    Asking for a friend who had an account there

    What research has been done on the Ashley Madison data?
