Smoke and Mirrors
If this law does not make it illegal for said vendors to obfuscate or encrypt the code they make available for review, then this new law is effectively useless.
Democratic lawmakers once again have proposed legislation to ensure that the software source code used for criminal investigations can be examined and is subject to standardized testing by the government. On Thursday House Representatives Mark Takano (D-CA) and Dwight Evans (D-PA) reintroduced the Justice in Forensic …
Eh, it doesn't really get that deep in the weeds, and the bill isn't just about source-code review of the tools.
What (the 2021 version of) the bill actually proposes is:
1. To direct NIST (the National Institute of Standards and Technology) to draft a set of testing guidelines "to be known as the Computational Forensic Algorithm Testing Standards", by which tools used in generating criminal trial evidence are evaluated for biases and error rates, and to set "requirements for publicly available documentation by developers of computational forensic software of the purpose and function of the software, the development process, including source and description of data used to develop the tool, and internal testing methodology and results, including source and description of testing data".
2. To remove the trade-secret protection that developers typically hide behind when refusing to provide that information.
3. To mandate that defendants against whom forensic-software evidence is used are to be furnished with "Any results or reports resulting from analysis by computational forensic software", "and the defendant shall be accorded access to both an executable copy of and the source code for the version of the computational forensic software—as well as earlier versions of the software, necessary instructions for use and interpretation of the results, and relevant files and data—used for analysis in the case and suitable for testing purposes".
It doesn't have to be illegal to try to duck those restrictions; it's sufficient that any software so concealed fails to meet the NIST standards, because an additional key point is:
4. To make results from any software that doesn't meet the new NIST standards inadmissible as evidence.
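The bill doesn't say which metrics NIST would pick, so this is only a sketch of the sort of "biases and error rates" check item 1 has in mind. It assumes the tool outputs a yes/no "high risk" flag and that actual outcomes are later known. The function name, the per-group false-positive/false-negative breakdown and the toy records are illustrative assumptions, not anything taken from the bill or from NIST.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Each record is (group, predicted_positive, actually_positive),
    with the two flags as booleans.
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, predicted, actual in records:
        c = counts[group]
        if actual:
            c["tp" if predicted else "fn"] += 1
        else:
            c["fp" if predicted else "tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        # nan marks "no data for this group", rather than a misleading 0.
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        rates[group] = (fpr, fnr)
    return rates


# Made-up records: (group, tool flagged as high risk?, actually reoffended?)
sample = [
    ("A", True, False), ("A", False, False), ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", True, False), ("B", False, False), ("B", True, True),
]

for group, (fpr, fnr) in sorted(error_rates_by_group(sample).items()):
    print(f"group {group}: FPR={fpr:.2f} FNR={fnr:.2f}")
```

With these invented records the tool misses nobody in either group, but it wrongly flags twice the share of group B's non-reoffenders (0.67 vs 0.33). That is the kind of disparity a per-group breakdown is meant to surface.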
...Of course, the bill will never pass anyway; heck, it'll never make it out of committee. But it's nice to dream.
In 2019 and 2021 the bill didn't even make it into a committee, let alone out of one. That's as firm a rejection as you can get and there's nothing to suggest it'll do better this time round. I assume the lobbyists have their arguments and other forms of persuasion ready.
It does sound like a good idea FWIW.
Does the defendant really need to see the source code? Run the same assessment done by company A's black box on competitor B's black box. Same inputs with a different outcome is grounds for questioning the validity of one of the black boxes without having to see inside it. I'm assuming rigorous testing has already taken place, so alleged bias has been measured for all of the software solutions on the market. And you'd be mad to put your source code in the hands of a defendant who'd then have to hire your competition to tell them how it works.
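A rough sketch of the differential test being proposed, assuming both vendors' tools can be driven as plain functions that return a numeric score. `vendor_a`, `vendor_b` and their scoring rules are invented stand-ins; real tools would need a harness around whatever interface they actually expose.

```python
def differential_test(system_a, system_b, inputs, tolerance=0.0):
    """Feed identical inputs to two opaque scoring functions and return
    every input on which their outputs differ by more than `tolerance`."""
    disagreements = []
    for case in inputs:
        a, b = system_a(case), system_b(case)
        if abs(a - b) > tolerance:
            disagreements.append((case, a, b))
    return disagreements


# Hypothetical stand-ins for two vendors' black boxes (scores capped at 10).
def vendor_a(case):
    return min(10, case["priors"] * 2)


def vendor_b(case):
    # Same as vendor_a, except it quietly adds a point for under-25s.
    return min(10, case["priors"] * 2 + (1 if case["age"] < 25 else 0))


cases = [
    {"priors": 1, "age": 30},
    {"priors": 1, "age": 22},
    {"priors": 6, "age": 22},
]
print(differential_test(vendor_a, vendor_b, cases))
# prints [({'priors': 1, 'age': 22}, 2, 3)]
```

Note what it can and can't tell you: a disagreement shows that at least one box is wrong for that input, not which one. The third case also shows that a hidden difference can be masked (here by the cap at 10), and the whole approach presupposes that a comparable System B exists at all.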
Yeah, "access to the source code" does sound like a sticking point. I'm guessing the bill's authors would be willing to take that clause out, if it would result in their bill getting further through the process.
But let's face it, even without that, the level of accountability proposed here would make most software vendors - in just about every business - shit their pants.
The problems with giving the same inputs to System A and System B, and then comparing their outputs are:
* System A might exist, but a System B might not exist.
* System A and System B might not provide the exact same functions.
* System A and System B might not use the same algorithms (quite likely, that).
* Even System A compared with itself might give differing results, depending on which compiler / interpreter is used, and which compile-time / run-time options are selected.
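The last bullet isn't hypothetical. Floating-point addition isn't associative, so anything that changes the order of a sum (a compiler's fast-math style reordering, a parallel reduction, a different library) can change the answer. A minimal Python illustration, with values chosen to make the effect obvious:

```python
import math

# 1e16 + 1.0 rounds back to 1e16 in double precision, so the order in
# which these four numbers are added changes the result.
values = [1e16, 1.0, -1e16, 1.0]

left_to_right = 0.0
for v in values:
    left_to_right += v

reordered = 0.0
for v in sorted(values):  # same numbers, different order
    reordered += v

print(left_to_right, reordered, math.fsum(values))
# prints 1.0 0.0 2.0
```

`math.fsum` gives the exactly rounded answer (2.0), which is one way to make a result independent of summation order. It's also a reason the exact build and runtime options belong in the evidence record.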
We all know that seeing source code isn't a viable option -- you'd have to reverse engineer it to figure out what it does and how it does it and "that might take some time". What is important, though, is to standardize how the code works and especially to identify systematic biases. The code and possibly the platform it's run on have to be clearly identified -- it's not enough to just produce a printout (it's the court system -- it might even need to use a fax somewhere) and state blithely that the information is correct; it has to be provably correct and consistent. (Bit of a stretch for a lot of modern software....)
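One cheap piece of "clearly identified": record a cryptographic hash of the exact program file used, plus the interpreter and OS it ran on, alongside each result. This is a sketch using only Python's standard library (`hashlib`, `platform`); the `fingerprint` helper and the fields it records are assumptions, and a real forensic tool would also need to cover its libraries, configuration and input data.

```python
import hashlib
import os
import platform
import tempfile


def fingerprint(path):
    """Identify exactly which file produced a result and where it ran.

    Hashes the file in chunks so large binaries don't need to fit in memory.
    """
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha.update(chunk)
    return {
        "file": path,
        "sha256": sha.hexdigest(),
        "python": platform.python_version(),
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    # Demo on a throwaway file standing in for the tool's executable.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    print(fingerprint(tmp.name))
    os.remove(tmp.name)
```

Any change to the file, however small, changes the hash, so two parties can confirm they are examining the same version without trusting each other's description of it.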
We all know that seeing source code isn't a viable option
No, we most certainly do not "know" that. I'd be perfectly happy with a law requiring the source code for any software product used to make any decision regarding the treatment of criminal defendants be published.
If that drives firms like Northpointe out of business, well, that's a consequence I'll accept.
Full disclosure: I'm an open-source developer and advocate. I firmly believe that it's possible to both create world-class software and to make money doing so, without having to hide your code in order to secure your revenue stream. There are sufficient examples of companies demonstrating that model's workability: Red Hat, Qt, Mozilla, sorta-Google, etc. I believe that security through obscurity is no security at all, so the concept of "trade secret" protections and software patents leaves me cold. Those are the biases I bring to this conversation.
I'm assuming rigorous testing has taken place already so alleged bias has been measured for all of the software solutions on the market.
You shouldn't assume that, because the only requirement that these products be subjected to any independent testing AT ALL is in this bill that's never going to pass. Right now, for software being used to produce evidence in trials today, no testing whatsoever is required to have been performed, so how much do you really think has been done? The free market ain't gonna incentivize the elimination of biases here, heck the market probably favors biased systems. (Doesn't it always?)
Does the defendant really need to see the source code?
IMHO, yes. It's the surest way to evaluate the algorithm being employed to make decisions that are, quite literally in some cases, matters of life and death. The stakes here are not exactly low.
To be clear, not every defendant will be expected to evaluate the software being used in their prosecution. But the path towards making it possible to do so, for those with the means and motivation, is to provide everyone with that option.
Programmer: "Boss, why do we have to lie on all these documents we're generating for the government?"
Boss: "Because our program generates a 'slightly'* false-positive rate, which in turn makes a conviction more likely. More convictions make prosecutors happy, they make police happy (See how well we're doing!), they make the prison industry happy (We're building more prisons! And, we're getting more ultra-cheap prisoner labor!), and in the end, allow us to sell more copies of our software, which makes our Board of Directors happy."
*A run-time parameter which can be adjusted by the software seller's field engineers from "less-aggressive" to "more-aggressive".
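Satire aside, the footnote describes something real: most risk tools output a score, and the cut-off at which a score becomes "high risk" is a tunable choice that moves the false-positive rate. A sketch with made-up scores and outcomes (the two threshold values and the "aggressive" labels are just the joke's, not any product's settings):

```python
def false_positive_rate(scores, labels, threshold):
    """Fraction of actual negatives (label False) scored at or above threshold."""
    negatives = [s for s, actual in zip(scores, labels) if not actual]
    if not negatives:
        return float("nan")
    return sum(s >= threshold for s in negatives) / len(negatives)


# Made-up risk scores (0-10) and whether each person actually reoffended.
scores = [2, 3, 4, 5, 6, 7, 8, 9]
labels = [False, False, False, True, False, True, True, True]

for name, threshold in [("less-aggressive", 7), ("more-aggressive", 4)]:
    print(name, false_positive_rate(scores, labels, threshold))
# prints:
# less-aggressive 0.0
# more-aggressive 0.5
```

Nothing about the underlying scores changes between the two runs; only the threshold does, and the share of non-reoffenders flagged goes from none to half. That is why such a setting would need to be disclosed along with the result.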
This is my concern as well. A judge, issuing a sentence, elucidates his/her reasoning in a publicly available document. I don't know if this is actually legally required (I am not a lawyer). The use of software to aid in "legal reasoning" should require the same sort of transparency, at least in the US. I am not sure how this would interact with the UK assumption in legal cases that the "code is correct" unless proven otherwise, as brought out in the Post Office debacle. Would an individual, brought in as an expert to aid in sentencing, be allowed to hide his/her calculations, and just issue a plain "the defendant should be sentenced to x years" statement that the court then enacts directly? Is this "trade secret" nonsense supported by the (frustrating to me) belief by some members of the public that if a computer says it, it must be so?
> A judge, issuing a sentence, elucidates his/her reasoning in a publicly available document.
Curiously enough, there is a way to make software that, after it reaches a conclusion, also generates a nice report that explicitly details how it arrived at that conclusion.
But that sort of thing is far too much work for our Modern World[1].
Why bother to write, say, an Expert System (which would be a great fit for this sort of Decision Making Advisor) when you can just code COMPAS as a couple of Excel sheets (probably with at least a dozen cells showing bad-reference errors: "oh, don't worry about those, we'll just pop that sheet out of sight").
Daft thing is, if the systems actually just bothered printing out all the reasoning in the first place, along with suitable references, then these demands for "show us the source code" would not even need to happen! They could keep their precious trade secrets about how they generated that reasoning: just let the *content* of the print-out be challenged and overturned, not the mechanisms by which it was generated.
[1] cue rant about how LLMs and yer bog standard Neural Nets are incapable of any explanatory exposition compared to the "too much hard work needed to make them useful" designs, such as XPS, which are explanatory[2] by their very nature.
[2] opaquely, if you don't bother to make it format the explanations well, but still present
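As a sketch of the explanatory style being described: a tiny forward-chaining rule engine that records, for every conclusion, which rule fired, which facts triggered it, and where the rule came from. The rule names, facts and "hypothetical guideline" references are all invented for illustration. Real expert-system shells such as CLIPS are far more capable, but the principle that the trace falls out of the inference for free is the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: frozenset  # facts that must all be known for the rule to fire
    conclusion: str        # fact added when the rule fires
    reference: str         # source of the rule (hypothetical citation)


def infer(facts, rules):
    """Forward-chain until no rule adds a new fact; return facts and a trace."""
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conditions <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                # Every conclusion is logged with its rule, source and reasons.
                trace.append(
                    f"{rule.conclusion}: by {rule.name} ({rule.reference}), "
                    f"because {', '.join(sorted(rule.conditions))}"
                )
                changed = True
    return known, trace


rules = [
    Rule("R1", frozenset({"prior_conviction", "offence_within_2_years"}),
         "recent_reoffence", "hypothetical guideline s.1"),
    Rule("R2", frozenset({"recent_reoffence"}),
         "elevated_risk", "hypothetical guideline s.2"),
]

facts, report = infer({"prior_conviction", "offence_within_2_years"}, rules)
print("\n".join(report))
```

The printed trace is exactly the sort of report whose content could be challenged line by line, without anyone needing to see how the engine itself is written.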
What happened to a jury of your peers?
In the US, juries do not sentence criminal defendants. They convict them (or acquit, or fail to do either, resulting in a hung jury and mistrial). Judges determine the sentences.
And that's assuming the defendant requested a jury trial. It's a right; it's not compulsory.[1] And statistically criminal defendants do better without a jury, though obviously this depends greatly on the specifics for any individual case.
[1] And sometimes even famous defendants who have a history of getting away with, say, fraud, and have reason to believe they might do better in front of a jury, hire incompetent lawyers who forget to request one.
As someone who served as a juror on a criminal trial through to completion, I can assure you that they do.
It may vary by situation, just as there can be a bench trial or a jury trial. But where I am, at the county level the jury deliberates on guilt, then on sentencing within the guidelines. The judge can overrule, but if the sentence is not grossly outside the provided range it stands.
American legislators, with few exceptions, are in it for the money and the power and care VERY little about things that do not affect either of those (when it comes to us PEASANTS).
I was also considering that a simple NDA, whether explicit or implied, is all they should need to protect intellectual property. Just legislate THAT and all should go well.
(But it takes a gummint to inflate something simple into a money-laundering scheme for your donors)
Most legislators are also scared to death of the "soft on crime" bugbear, which has been used extensively by members of both parties for decades. The carceral fetish in the US is entirely decoupled from actual crime rates or any sort of rational critique. Most people enjoy being scared, and they enjoy taking revenge on their imagined enemies (rather than their actual enemies). See also the immigration "crisis".
If it cannot be disclosed and analysed as a regular part of the due process of sharing evidence, then it is inadmissible as evidence.
The output of the specific software package mentioned in the article, COMPAS, is not admitted as evidence. It's used in determining a sentence (by making a highly dubious[1] estimate of the probability of recidivism), not in conviction. That is a process under the control of the judge, modulo statutory and judicial requirements such as truth-in-sentencing laws and sentencing guidelines established by the legislature and courts.
Certainly the rule you suggest helps with software results used during trial, and I agree it's a good rule, but it doesn't solve the larger problem.
[1] I'm aware of the recent study showing that judges using COMPAS and following its recommendation had a somewhat higher accuracy (in the sense of assigning sentences which subsequently correlated to actual recidivism) than judges who consulted COMPAS and overrode its evaluation. I don't think that's a particularly strong conclusion, but more importantly it has no bearing on the issues at hand. We know human judges aren't very good at assigning fair, just, and proportionate sentences.[2] Using secret algorithms is a problem in itself, regardless of outcome. Bias in the results of those algorithms is a problem, regardless of overall results.
[2] And then there's the whole problem with America's incarceration fetish, grossly excessive sentencing, the prison-industrial complex, and so on.
Quote: "...barring defense attorneys from reviewing source code relevant to criminal cases...."
So:
(1) Software requirements written in English
(2) Software design written in English
(3) Various architecture diagrams (remember UML anyone?)
(4) Database design (you know, ERDs and so on)
(5) Source code (oh dear....various.....C, shell scripts, SQL........)
(6) Assembler
(7) Actual machine code
......so why are we hearing here that there's a SINGLE LEVEL place to "review the source code" to determine "what the system is doing"?
......it's a fantasy................................a fantasy believed by PEOPLE WHO KNOW NOTHING AT ALL ABOUT COMPUTERS IN THE REAL WORLD!!!!!!!
......and that's even before we get to the nightmare of "The Agile Manifesto", scrum, "user stories"..................and so on...................
Why am I not surprised?
Well, ignoring your inclusion of assembler and binary (which even you admit are beyond the point where we'd call it "source code"), are you telling us that your source code doesn't include within it comments that contain all the relevant parts of the requirements, design, ERDs etc etc? That you *don't* consider all of that as being part of your project's sources?
And you don't believe that it ever could be (or that the NIST requirements cannot require these products to be) that professionally presented?
If you have never had the chance to work on a project that properly and professionally comments its sources, and seen how that makes said sources comprehensible to those already working on the project, to those being onboarded onto the project (saving time and money) and to those providing external auditing of the project, you may want (need) to expand your horizons.
You can get a (cheap) start by, say, grabbing a copy of Doxygen[1] and running a small demo project with it: write out your req specs as Doxygen input (and hand out the generated PDFs to the various parties for review and amendments - well, as this is a demo, read the PDFs yourself). Ditto the functional specs (and marvel at how you can now also directly reference back from the FS to the RS without fudging anything - or forgetting to do so - and can even do the inverse!). Drop all your design notes into the Doxygen files (and cross-ref back to the FS), and add in all your ERDs and UML or whatever other diagrams you want[2]. As you get to it, add your SQL and/or Python and/or Java and/or batch files, marking them up to reference all the prior materials. At the end of it, you have one "project source tree" containing absolutely everything about the project (all in the version control system of your choice - and, at least if you've stuck to the tools suggested here for your demo project, all in nice plain-text formats that really make sense when dropped into a VCS, e.g. are diff'able). And your top-level build should be spitting out not just a set of binaries, neatly packaged for the relevant target system installers, but also a wodge of documentation that contains everything it is possible to know about those binaries, from the source code (neatly presented with hyperlinks between functions) to all the relevant diagrams, designs and original requirements.
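Doxygen does its cross-referencing with its own commands, so this isn't how Doxygen does it. It's a standalone sketch of the same two-way traceability idea, assuming a hypothetical convention where every spec section and source file names the requirement IDs it satisfies as `REQ-<number>`. Given that, building both the forward map (artefact to requirements) and the inverse (requirement to artefacts) takes a few lines:

```python
import re
from collections import defaultdict

# Hypothetical convention: artefacts cite requirements as "REQ-<number>".
TAG = re.compile(r"\bREQ-\d+\b")


def trace_matrix(documents):
    """documents maps a name (spec section, source file) to its text.

    Returns (forward, inverse): name -> requirement IDs, and ID -> names.
    """
    forward = {}
    inverse = defaultdict(set)
    for name, text in documents.items():
        ids = set(TAG.findall(text))
        forward[name] = ids
        for req in ids:
            inverse[req].add(name)
    return forward, dict(inverse)


docs = {
    "FS-3.1 scoring": "Computes the score. Satisfies REQ-1 and REQ-4.",
    "score.py": "# Implements FS-3.1 (REQ-1, REQ-4)\ndef score(x): ...",
    "report.py": "# Implements REQ-2: print the reasoning behind each score",
}
forward, inverse = trace_matrix(docs)

requirements = {"REQ-1", "REQ-2", "REQ-3", "REQ-4"}
print("untraced:", sorted(requirements - set(inverse)))
# prints untraced: ['REQ-3']
```

The inverse map is the useful one for an auditor: any requirement with no artefact pointing at it (here `REQ-3`) has either not been implemented or not been traced.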
Now, I realise this idea is alien to you - and, sadly, to many, many people and companies[4] - but it is possible to organise yourself, and for NIST to demand that level of decent presentation as a requirement IN THE CASE WHERE THE SOFTWARE IS BEING USED TO DIRECTLY AND IMMEDIATELY AFFECT AN INDIVIDUAL'S LIFE.[5]
PS
Yes, I have done this: the very best time, the sole final deliverable was a DVD with an autostart that opened up the index.html generated by Doxygen, which just had hyperlinks to entry points for the various docs (User Manuals, Installation Manuals - which indicated the directories on the DVD for binaries and PDFs, Specs, C/C++ code with hyperlinks etc) and a note that, to recreate *everything* (binaries, all levels of docs as HTML and PDF etc), just copy the entire DVD to your PC (purely for speed!), then type one cd and a make (copies of all relevant compilers and tools included on the DVD). The man in the neatly pressed uniform was happy with the result.
[1] other tools are available - I did say "cheap" as in free!
[2] preferably as graphviz or Mermaid sources, not as JPEGs dumped out of your drawing package[3]
[3] reminder, we are demoing how to get all of this material gathered together and are using free tools, so you can learn how it *can* work; if you have the dosh you'll be able to find some products that will work with your GUI diagramming tools, if that is all you are capable of using.
[4] far too many are fixated on writing important documents as Word files, no matter how much time they waste on recreating frontispieces, copy and pasting glossaries (let alone re-inventing glossaries from proposal to proposal) and looking down on VCS and anything not immediately WYSIWYG as "that stuff the grunts in the coding pool worry about".
[5] a pipedream, this whole thing will be fought by the companies who don't want to admit that their COMPAS is just an Excel 1998 spreadsheet with over a dozen cells that show error messages, but a lively dream nonetheless.
Amen to the Excel spreadsheet being the core of the product - we have a couple of light- to medium-duty CNC lathes, and they have a visual programming system built into the control so you can quickly set up basic jobs.
We wanted a modification added as at the moment it spits out code that doesn't retract the turret on tool change, often leading to crashes if it's not manually edited into the right spots, and someone eventually forgets. The manufacturer was really cagey and our internal investigations turned up that the whole system is running from VBA in an Excel spreadsheet, and that all the people who knew how it worked have left the company so they don't dare change it.
There are also reports of a gunshot detection system which is horrendously inaccurate and useless. It sells well because police can call up the support call center and ask if the system has detected anything near an address, and the call center always says it has. It's probable-cause-as-a-service!
> Should the same criteria that applies to software models that can determine the fate of one man be likewise applied to software models that can determine the fate of multitudes?
Yes.
But, sadly, we need a big, noisy, politically driven, easy for Joe Public to comprehend, hardened point driven into the software industry to start a crack in their stonewall that can then be levered open, finally allowing somewhere for Good Practice to seep in.
quote: COMPAS to be biased against African Americans.
One man's bias is another's ML.
If computers are trained on what happened yesterday to guesstimate tomorrow, they will reinforce the past. That may be legitimate (more men than women rape women) or it may replicate the consequences of prejudice.
You either train systems on the past, or you program them directly. Take your pick.
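A deliberately crude illustration of "trained on the past": the simplest possible model just learns each group's historical rate of a recorded outcome. The data are made up, and assume identical underlying behaviour in two districts, with district X simply patrolled more heavily so more incidents get recorded there.

```python
from collections import Counter


def fit_base_rates(history):
    """'Train' the simplest possible model: the historical rate of a
    recorded outcome (1 = recorded, 0 = not) for each group."""
    totals, hits = Counter(), Counter()
    for group, recorded in history:
        totals[group] += 1
        hits[group] += recorded
    return {g: hits[g] / totals[g] for g in totals}


# Made-up history: same underlying behaviour in both districts, but X was
# patrolled twice as heavily, so twice as many incidents were *recorded*.
history = [("X", 1)] * 20 + [("X", 0)] * 80 + [("Y", 1)] * 10 + [("Y", 0)] * 90

print(fit_base_rates(history))
# prints {'X': 0.2, 'Y': 0.1}
```

The model faithfully reproduces the 2:1 ratio, but here that ratio reflects where incidents were recorded, not how people behaved, and from the records alone you couldn't tell which. Whether a real disparity is legitimate experience or inherited prejudice has to be argued from outside the training data.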
There is no objective result possible from human-derived data, because humans are subjective.
Over time, does a detective develop legitimate experience or forbidden prejudice? They may amount to the same thing. And they may both be accurate.
Computers don't interface well with humanity. It might be better not to assume that they do, or that 'artificial intelligence' is actually any form of intelligence, as in SF movies. Because it isn't.