That's not fair.
They can't blame incorrect data for the Windrush fiasco. They'd deliberately thrown the data away. If it wasn't there, it couldn't have been incorrect.
QED (Home Office style).
Government plans to throw money at automation and AI to develop public services risk "magnifying" problems around data quality residing in its own legacy systems, the National Audit Office has warned. In its report today, Challenges of Using Data Across Government (PDF), the spending watchdog noted the government uses data to …
Which still qualifies as progress.
If you can make your mistakes faster, you can correct them faster. What's needed is to combine the crappy system with a robust and high bandwidth method for people to appeal against them, and get a resolution from a different system within a matter of hours, not weeks.
The first step in correcting data is to create a channel for entering corrections.
No, the article about the Government that wants to create its own AI database.
What is an AI database? It is a large set of data organized especially for rapid search and retrieval.
My patented AI database is organized for this. For example, there is a paragraph:
-- Alice and Bob train with joy, she trains a lot. She wears everything blue. --
My AI database can establish that the word "joy" is not the name "Joy" but a common noun, by analyzing its context and subtext. But all other systems do not see and analyze words; they all work with patterns, and they see and analyze the pattern "Alice and Bob train with joy" as one word!
However, the meaning of the whole paragraph depends on what the word "joy" means (and on its part of speech)! For example, if the word "joy" is the name "Joy", it is not clear who wears blue and trains a lot, because "blue" and "trains a lot" can be attributed to both Alice and Joy. Does the Government want to get a similar ambiguity talking with the US?
So the British government should prefer my patented, well-organized AI database and ignore anything that doesn't pretend to be a database.
Can OpenAI, for example, guarantee that it can create a database that contains unambiguous information? No, it can't.
I can.
Your example is awful.
Not only is "joy" clearly a name in this context - no English speaker would use the noun like that - but also you can't dismiss the possibility that it's Bob who trains a lot.
What, you think the name "Bob" can't take a feminine pronoun? Much to learn, you have.
I am not talking about "Bob", for the sake of a clear example. I want to demonstrate that you cannot parse using only patterns; you have to take the words themselves into account first. I also have to use extremely unsightly examples, because one step to the side, one more word, and the number of patterns grows like yeast.
My patented AI database is organized for this.
Bullshit.
What you are claiming is that you have cracked natural language processing. Since that's a problem that has been seemingly insoluble for the last 50 years or so, I'm calling you out on it.
Unless, of course, 'you' are a large nation state with extremely large amounts of funds which you have been spaffing on the problem for the last ten years or so, in which case... no, you still haven't cracked it.
Jog on.
First, it is more like 70-75 years, I think. Second, yes, I solved the problem by replacing n-gram parsing with AI-parsing.
Here is a sentence:
-- Marpha and Ryslan exercise with joy.
n-gram parsing gets only one pattern: Marpha and Ryslan exercise with joy.
AI-parsing gets two weighted patterns:
-- Marpha exercises with joy - 0.5
-- Ryslan exercises with joy - 0.5
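What the commenter describes can be sketched in a few lines. This is my own hypothetical illustration, not the patented system: a conjoined subject is expanded into equally weighted single-subject patterns.

```python
# Hypothetical sketch of the "AI-parsing" described above: expand a
# conjoined subject into equally weighted single-subject patterns.
def weighted_patterns(sentence, verb_plural="exercise", verb_singular="exercises"):
    subject, _, rest = sentence.partition(f" {verb_plural} ")
    names = [name.strip() for name in subject.split(" and ")]
    weight = 1.0 / len(names)
    return [(f"{name} {verb_singular} {rest}", weight) for name in names]

print(weighted_patterns("Marpha and Ryslan exercise with joy."))
# [('Marpha exercises with joy.', 0.5), ('Ryslan exercises with joy.', 0.5)]
```

Real parsers have to handle far more than this toy split, of course, which is rather the sceptics' point.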
About funding... There was no money. It was all done by my intellect.
So... you can parse simple, well-formed, non-idiomatic sentences? That's a bit like saying you've written a C parser.
It's easy if you can tightly define and limit the language used. It's also not natural language processing if it can't handle ambiguity, partial sentences, idioms, context, or change of usage over time (e.g. 'wicked' in 1955 vs 'wicked' in 1995), because it's not natural language that you are parsing.
You see, Natural Language boils down, forgive the tautology, to language. That is, to the consideration of your "ambiguities, partial sentences, idioms, context, or change of usage over time (e.g. 'wicked' in 1955 vs 'wicked' in 1995)" - to the consideration of the external form, the shell. This is the approach of the External theory (of Analytic Philosophy: Moore, Russell, Wittgenstein).
I see language as becoming (in the sense of John, St.Paul, Maimonides and Hegel) - as a differential function, with its limit. There are two sentences:
- Alice.
- Alice is getting better.
where the first contains a non-predicative definition, which is a limit for the predicative definition of the second; this is my Differential Linguistics and "my" Internal theory.
AI learns, strives toward its limit, has a differential nature (as we are) - this is called Machine Learning, which makes AI different from a computer.
"Government has lacked clear and sustained strategic leadership on data, and individual departments have not made enough effort to manage and improve the data they hold."
A customer of mine had a DVLA scam email this morning.
What a shame that the DVLA's SPF record can't tell you this forgery is a forgery (11 DNS lookups => PERMERROR; you're only allowed ten).
The Government can't even fix a 180-character DNS TXT record, and I've been telling them what's wrong since 2016, through about a dozen different channels, including their websites, my MP (Dennis Skinner - who reportedly says he's never sent an email in his life), the NCSC and hackerone.com.
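For readers unfamiliar with the rule: RFC 7208 says the include, a, mx, ptr and exists mechanisms (and the redirect modifier) each cost a DNS lookup, and more than ten in one evaluation means PERMERROR. A rough sketch of counting them in a record string - my own illustration, with a made-up example record:

```python
# Count the DNS-lookup terms in an SPF record; RFC 7208 caps them at 10,
# and going over means receivers must return PERMERROR.
LOOKUP_TERMS = {"include", "a", "mx", "ptr", "exists", "redirect"}

def spf_lookup_count(record):
    count = 0
    for term in record.split()[1:]:   # skip the "v=spf1" version tag
        term = term.lstrip("+-~?")    # strip the qualifier, if any
        name = term.split(":")[0].split("/")[0].split("=")[0]
        if name in LOOKUP_TERMS:
            count += 1
    return count

record = ("v=spf1 "
          + " ".join(f"include:mail{i}.example.invalid" for i in range(11))
          + " -all")
print(spf_lookup_count(record))  # 11 - one over the limit, hence PERMERROR
```

A proper checker would also follow the includes recursively, since nested lookups count against the same limit.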
A National Cyber Security Centre spokesperson said:
“Our priority is to limit harm to the UK and the public. ..." [https://www.ncsc.gov.uk/news/statement-eurofins-scientific-ransomware-incident]
They sure spend a lot of my taxes, why can't they get even the simplest things right?
They're just a very bad joke.
Me: This government IT system is riddled with problems, including some gaping security holes that the pen testers haven't noticed.
Project Sponsor: I want to replace the input clerks with software robots.
Me: Fix the basic system first. At the moment the clerks are trapping data errors that would cause problems if they were added to the database.
PS: Like what?
Me: Multiple spellings of the same name meaning money goes to the wrong person, other non-unique identifiers and attempts by the public to "game" the system.
PS: Those flaws have always been there, we can work around them. I want the robot up and running ASAP.
I left. I suspect that the robot was added to the system and is now happily working in GIGO mode.
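The kind of trap the clerks were providing can be approximated in code. A hedged sketch with made-up names, using simple normalisation plus stdlib fuzzy matching rather than whatever the real system needed:

```python
import difflib

def normalise(name):
    # Collapse case, punctuation and repeated spaces, so trivially
    # different spellings of the same name compare equal.
    kept = "".join(c for c in name.casefold() if c.isalnum() or c.isspace())
    return " ".join(kept.split())

def near_duplicates(name, existing, threshold=0.85):
    # Flag entries that are nearly, but not exactly, the new name,
    # so a human can check before money goes to the wrong person.
    n = normalise(name)
    return [e for e in existing
            if normalise(e) != n
            and difflib.SequenceMatcher(None, n, normalise(e)).ratio() >= threshold]

existing = ["John Smith", "Mary Jones"]
print(near_duplicates("John  Smith.", existing))  # [] - same name once normalised
print(near_duplicates("Jhon Smith", existing))    # ['John Smith'] - likely misspelling
```

A software robot with no such check in front of it will happily file "Jhon Smith" as a brand-new payee.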
It does not matter how much AI or any other tech you use to process the data, if you feed in poor quality information you just get a different version of the poor quality out of the other end. This obsession that AI can automagically turn shit into steak just sums up all the hype surrounding it.
On another note, destroying records appears to be the norm now everywhere particularly if they are paper.
I recently needed an MRI scan and had to find out whether a previous surgery had used anything that was not MRI safe. This was 17 years ago, and when I rang up to sort this out I was told by the hospital that records that old are destroyed if there has been no access for 10 years. There then followed a ridiculous number of telephone calls round in circles to sort out what had been done. What annoyed me the most was that the "MRI safety questionnaire" only arrived 4 days before the scan, after weeks of waiting. The fact that 2 days of that was a weekend only made it worse.
Yes, they cannot keep everything, but it is not beyond the wit of man to create a running summary sheet on the front of the file that is kept while everything else is destroyed. Whether that sheet is paper or electronic, or an old paper copy scanned at the point the electronic records are implemented, it is not difficult.