* Posts by Martin Maisey

14 publicly visible posts • joined 5 Sep 2008

Why should you care about Google's AI winning a board game?

Martin Maisey

Re: Shall we play a game?

The point I think you've missed is that no-one "provide[d] it the logic about how to trim the decision tree". It figured out the board evaluation and move generation functions using completely general (i.e. non-game-specific) neural network learning algorithms, trained on historical data and reinforced by playing itself. And it did so well enough to beat the world champion on the first try.

With a branching factor the size of Go's, that tree-trimming logic is the part that really requires "intelligence" - or at least the part of intelligence akin to intuition, which is something people have historically thought computers lack, and why Go has been a target of the AI community for so long. Humans have been poor at explaining their intuitions about Go as heuristic rules (beyond a few basic good/bad patterns), so the "traditional" approach of hard-coding rules has resulted in computers that suck, because they search the wrong part of the vast search space.

With chess playing computers, there was a much easier argument that there was no real intelligence at play, because the hard-coded heuristic rules and the search algorithm were constructed entirely by humans, with the computer just supplying brute force execution. With AlphaGo, there's definitely 'learned insight' present in the form of the trained policy and evaluation networks in place of heuristic rules.
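
(For the curious: the way the networks and the search fit together boils down to something like the Python sketch below - purely illustrative on my part, not DeepMind's code - in which the learned prior from the policy network, rather than a hand-coded heuristic, decides which branches are worth exploring.)

```
import math

C_PUCT = 1.5  # exploration constant - value chosen for illustration

class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a): the policy network's prior for this move
        self.visits = 0         # N(s, a): how often search has tried it
        self.value_sum = 0.0    # cumulative evaluation (from the value network)
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_move(node):
    # Pick the child maximising Q + U. U scales the learned prior by how
    # unexplored the move is, so the network - not a hand-coded heuristic -
    # is what prunes Go's enormous branching factor.
    total = sum(c.visits for c in node.children.values())
    def puct(child):
        return child.q() + C_PUCT * child.prior * math.sqrt(total + 1) / (1 + child.visits)
    return max(node.children.items(), key=lambda item: puct(item[1]))
```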

No, of course it's not a general AI - no-one's claiming that it is. But more and more of the domains that were thought to be difficult for even specialised AIs are falling - alongside this, witness the recent advances in natural language processing, computer vision, etc. And if you look at our brains, they look an awful lot like a lot of these specialised AIs networked together. Read "Thinking, Fast and Slow" for an interesting psychology-orientated take on how much of our thinking seems to be dominated by the type of intuitive functions that AlphaGo's just shown itself to be world-beatingly good at.

By the way, apparently DeepMind are currently working on a specialised AI that will learn to play any card game (rules, strategies and all) just by watching videos of humans playing... :-)

Behold, the fantasy of infinite cloud compute elasticity

Martin Maisey

Re: I think the author missed something

Also I think there's an element of setting up a straw man - the article says "We’re told it’s elastic, infinitely elastic even, with thousands of virtual servers spun up in minutes". But I don't know who's really saying that.

To take Amazon as an example, their EC2 "Purchasing Options" page (https://aws.amazon.com/ec2/purchasing-options/) is very clear that on-demand instances are not guaranteed to launch at times of high demand. If you try to launch thousands of servers, you've likely just brought on that high-demand situation yourself, so it will fail.

In practice, I suspect Amazon will terminate spot-priced instances running below the on-demand rate to make room for your on-demand requests, and only turn those requests down once that's exhausted. They will then retain a certain limited amount of spot capacity which you're free to bid for. But clearly you can't bid for more than is there, and attempting to bid for too much will drive the price through the roof.

If you have a workload which occasionally requires you to fire up lots of instances reliably, you need the third commercial model, which is (the hint's in the name) reserved instances. You can reserve up to 20 instances a month at no cost, or pay up front to get reduced fees. More than that and you have to put in a manual request to them.
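
To make the "not guaranteed to launch" behaviour concrete, here's roughly what the failure mode looks like against the EC2 API - a sketch using boto3 with a placeholder AMI ID; the error codes are the real ones EC2 returns when capacity or account limits bite:

```
import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client("ec2", region_name="us-east-1")

try:
    # Placeholder AMI ID and instance type - illustrative only.
    ec2.run_instances(
        ImageId="ami-xxxxxxxx",
        InstanceType="m1.large",
        MinCount=1000,
        MaxCount=1000,
    )
except ClientError as e:
    code = e.response["Error"]["Code"]
    if code in ("InsufficientInstanceCapacity", "InstanceLimitExceeded"):
        # The failure modes discussed above: a big on-demand burst either
        # hits a genuine capacity shortfall or your account's instance limit.
        print(f"Launch refused: {code}")
    else:
        raise
```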

My suspicion is that if you say "can I reserve 1000 instances, please", they'll come back and say "please can you pay a bit up front, so we know you're serious". Then they'll try to flog any unused capacity as spot priced instances.

All pretty transparent, if you ask me...

Want security? Next-gen startups show how old practices don't cut it

Martin Maisey

Re: Not fully convinced

Thanks for the answer, and for the Twitter conversations. I'm definitely going to take a look at the tech in more detail. The interesting things for me are the rules that determine isolation, how robust these are in terms of false positives, and the interplay between confidentiality/integrity concerns and availability - the latter being, unfortunately, probably all your manager's manager cares about in the average enterprise (a separate issue, that, and unfortunately one that technology won't solve). I can't help but think that after a couple of false alarms, the response will be "the CEO's said turn the bloody thing off, or it will kill our business". Particularly if there's any way those events might be triggered as a DoS by - for example - disgruntled insiders.

By the way, I have for some time thought that honeypots are a much-undervalued component of most security solutions, as they reverse the attacker's advantage and generate really, really high-quality red flags for action. John Strand and Paul Asadoorian's book ‘Offensive Countermeasures: The Art of Active Defense’ is a nice read on this topic. Any security system that integrates them gets a gold star in my book.
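
To illustrate why the signal quality is so good, a honeypot can be as dumb as this Python sketch (the port number and logging are arbitrary choices on my part) - nothing legitimate ever touches it, so every connection is actionable:

```
import socket
from datetime import datetime

# A fake "service" on a port nothing legitimate should ever touch. Because
# there are no legitimate users, every single connection is a high-quality
# alert - essentially no false positives.
def honeypot(port=2222):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", port))
    srv.listen(5)
    while True:
        conn, (addr, src_port) = srv.accept()
        # In a real deployment this would fire into the SIEM; here we just log.
        print(f"{datetime.now().isoformat()} ALERT: {addr}:{src_port} touched honeypot port {port}")
        conn.close()

if __name__ == "__main__":
    honeypot()
```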

But to continue playing devil's advocate for a moment, are most data-stealing attacks really so fast that they require fully automated isolation? Even in an eggshell datacentre, most require a human to do external recon, break an endpoint, work out what security is actually in place, establish C2, break internal systems to research the DC/app environment, run some scans, move laterally to obtain credentials, actually break the systems with the sensitive data, then potentially stage the data to systems from which it can actually be exfiltrated. At each stage, they have to be really careful, as getting discovered may mean complete failure. I don't have hard numbers to hand on this, but I suspect most attacks evolve over days or weeks. Most internal security teams are horribly underfunded/undertooled, but could do a decent job if that were fixed.

On the other hand, DoS type attacks can happen quite quickly, as often the attacker doesn't care about leaving lots of tracks etc., particularly if they're out of reach of the long arm of the law/not thinking straight; they just want to damage the organisation.

Martin Maisey

Not fully convinced

Completely agree on the prevalence of eggshell security; I'm just not sure automated response is viable a lot of the time. Rapidly and automatically fusing lots of correlated sources (including honeypots etc.) with a recommended action for a human to eyeball - absolutely.

The reason is that if attackers can rely on an automated action being taken, they can manipulate this for their benefit. A standard example with traditional burglar alarms that silently notify the police: turn up on three consecutive days and do something that triggers the alarm without breaking in. The police will turn up once, max twice, then decide they're dealing with a faulty alarm. On the fourth day, turn up and break in.

Vision? Execution? Sadly, omission and confusion rule Gartner's virty quadrant

Martin Maisey

Re: In the market for two years

Agree with Trevor here.

Even cheap commodity servers have huge amounts of CPU and RAM available now, and for most workloads IO is the limit. It's abundantly clear to anyone with actual systems engineering knowledge that a technical approach that puts the processor on the same PCIe bus as the storage for read IOs, while offering similar benefits to a traditional SAN for write workloads, is going to offer dramatically better performance than one which requires going out over FC or iSCSI. A single SATA flash drive can flood most FC HBAs, or a fair number of 1GbE NICs, for sequential IO. Given that a typical modern 2U server is capable of housing a fair number of those, plus some really fast PCIe server flash, it will whip a virtualisation host + SAN based approach for all but the most write-intensive workloads. Given the recent revelations from Intel regarding their new persistent RAM technology, I can only see this gap widening.
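
Back-of-envelope numbers behind the "flood" claim, using typical spec-sheet figures (approximate, and mine rather than any vendor's):

```
# Rough, spec-sheet-level figures (MB/s) - my approximations, not benchmarks.
sata3_ssd  = 550    # one SATA-III flash drive, sequential read
fc_8g_hba  = 800    # usable throughput of an 8Gb FC HBA
gbe_nic    = 125    # theoretical ceiling of a 1GbE NIC
pcie_flash = 1400   # a typical PCIe server flash card of the era

print(f"One SATA SSD fills {sata3_ssd / fc_8g_hba:.0%} of an 8Gb FC HBA")
print(f"...and saturates {sata3_ssd / gbe_nic:.1f} 1GbE NICs on its own")
print(f"A PCIe flash card needs ~{pcie_flash / gbe_nic:.0f} 1GbE NICs to drain")
```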

The reason Trevor's been shouting about Nutanix is that they have a very clever technical approach and have managed to get real customers. This is a rare combo. VMware have responded aggressively with VSAN/EVO:RAIL etc. and it will be an interesting fight. Microsoft will no doubt lumber onto the converged infrastructure train at some point and, if history is any guide, probably ultimately do quite a good job, if a fair bit later than the leaders.

For what it's worth, I have absolutely no connection with any of the companies mentioned. I've not played with Nutanix or EVO, although I have read a fair number of white papers in detail.

Speed freak: Kingston HyperX Predator 480GB PCIe SSD

Martin Maisey

Using with FreeNAS / MicroServer G8

I got the 240GB version of this recently to put in a new HP MicroServer G1610T / FreeNAS build. I use it for L2ARC read cache, for which it is absolutely outrageously fast for both large sequential read and small random IOPS type workloads. It's paired with a single Crucial 80GB SSD (supercapacitor-backed) for ZIL write log, connected to the optical SATA port - SATA-II, but not much bandwidth is required for ZIL logging as by definition the writes are small. Volume storage is 4x2TB WD RED SATA. 16GB of RAM completes the picture, of which 8GB is used for running FreeNAS and for ARC read cache. FreeNAS boots off MicroSD.
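
For anyone wanting to reproduce the setup: FreeNAS does all this through the GUI, but underneath it just boils down to adding ZFS cache and log vdevs - sketched below with hypothetical pool and device names:

```
import subprocess

# Hypothetical pool/device names - FreeNAS normally does this via the GUI.
POOL = "tank"
CACHE_DEV = "ada5"   # the HyperX PCIe SSD (L2ARC read cache)
LOG_DEV = "ada4"     # the supercap-backed Crucial SSD (ZIL/SLOG)

# Add the PCIe SSD as an L2ARC cache vdev...
subprocess.run(["zpool", "add", POOL, "cache", CACHE_DEV], check=True)
# ...and the SATA SSD as a separate log (SLOG) vdev for the ZIL.
subprocess.run(["zpool", "add", POOL, "log", LOG_DEV], check=True)
```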

This provides a bit of a beast of a SAN for my network (client access via CIFS/Plex, Proxmox VM/container storage over NFS/iSCSI - all via 2 aggregated 1GbE NICs) in a *very* small physical package.

The expansion potential of the MicroServer is pretty limited (4 x SATA-III via SAS HBA, 1 x SATA-II, 1 x PCIe) and this feels like a reasonably balanced configuration for converged end-user storage and VM NAS pool use cases. This pretty much describes my home network - file, media and print serving have to just work otherwise my wife gets upset, but I also want to be able to play with stuff ;-)

As an added bonus, I can run a few VMs directly in the remaining 8GB on the box using the FreeNAS VirtualBox plugin. These VMs then get very fast access to storage over the local PCIe and SATA buses, rather than having to go over the network. There are a few temporary 'lab' style workloads (for things like Cassandra etc.) that might benefit from that, though they would have to not require much CPU given the relatively weedy Celeron processor provided as standard with the MicroServer. There are options for upgrading the CPU - e.g. to the Xeon E3-1265Lv2 4-core / 8-thread Ivy Bridge - but I'm not sure I can really be bothered with that...

The only limitation I've found so far is that there's no real option to put 10GbE NICs in to see how fast the NAS is capable of servicing large block reads from the HyperX-cached working set - I strongly suspect the limit here will be the 1GbE NICs built into the MicroServer.

Reg probe bombshell: How we HACKED mobile voicemail without a PIN

Martin Maisey

Hmm

'We approached Three about this, and a spokesman said: "The advice we've always given customers about security is to mandate their PIN. This is particularly so for people who worry that if a phone is stolen, it might be used to access their voicemail. This advice is given under the voicemail security pages of the Three website."'

Unfortunately, that's describing a completely different threat model from being hacked by any random person who knows your mobile number. Also, their voicemail security page says "You'll always be asked to enter your phone number and PIN if you access your voicemail from another mobile or landline phone." - which is manifestly wrong. Not impressed; good thing I'm not using them for voice at the moment.

Cloud doctors, DevOps and unconferences: Pass the Vicodin

Martin Maisey
Thumb Up

Ops is never going away...

Completely agree - recent post with more detail at http://mjmaisey.tumblr.com/post/41018554265/devops-and-noops

When your squash partner 'endorses' your coding skills on LinkedIn...

Martin Maisey

Actually potentially not so stupid...

Agree that the number of endorsements you have for a particular skill will still be a very poor indication of how good you are at it. This is because an endorsement as currently displayed in LinkedIn doesn’t provide any indication of the expertise of the endorser. Therefore gaming this metric becomes trivial - I just need to find people to ‘swap’ skills with.

But I suspect that this is only the first data collection stage in a sophisticated plan. I spent 4 years in a role specialising in social network analysis (have moved to a different, unrelated job now), and combining analysis of networks with good quality real world data is powerful, even in the face of large amounts of "noise" of the type described in this article.

Building an expertise score for each person/skill combination is much more involved than just counting the endorsements. It's analogous to Google’s original PageRank algorithm, and in particular its cousin TrustRank. Implementing this in large networks is technically hard, but the basic idea is that you ‘flow out’ reputation scores for each skill through the network of endorsements, starting at known experts. Endorsements for a particular skill from someone would be weighted by their own score. As a result, an endorsement wouldn’t count for much or anything unless the person providing it had been directly or indirectly endorsed by someone who is definitely an expert.
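
A toy sketch of the idea in Python - the damping factor, seed set and graph below are all illustrative, and a real implementation would need to handle scale and dangling nodes properly:

```
# Toy sketch of TrustRank-style reputation flow over an endorsement graph.
DAMPING = 0.85

def skill_scores(endorsements, seed_experts, iterations=50):
    # endorsements: {endorser: [people they endorsed, for one skill]}
    # seed_experts: the known experts, who receive the initial reputation
    people = set(endorsements)
    for targets in endorsements.values():
        people.update(targets)

    seed = {p: (1.0 if p in seed_experts else 0.0) for p in people}
    score = dict(seed)

    for _ in range(iterations):
        nxt = {p: (1 - DAMPING) * seed[p] for p in people}
        for endorser, targets in endorsements.items():
            if targets:
                # An endorsement is worth a share of the endorser's own score,
                # so endorsements from nobodies are worth (almost) nothing.
                share = DAMPING * score[endorser] / len(targets)
                for t in targets:
                    nxt[t] += share
        score = nxt
    return score

# 'alice' is a known expert; 'dave' is endorsed only by 'eve', who has no
# reputation of her own, so his score stays at zero.
graph = {"alice": ["bob"], "bob": ["carol"], "eve": ["dave"]}
print(skill_scores(graph, {"alice"}))
```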

As they’re clearly collecting lots of endorsements very quickly, LinkedIn’s main challenge will be to identify good experts. It could probably be done manually for common skills in highly LinkedIn connected industries such as IT, although this wouldn’t scale for less connected industries and less common skills.

However, LinkedIn will have lots of ways of doing this fairly well on an automated basis given the incredibly rich datasets they have access to. In the IT space at least, they are rapidly approaching saturation. This means they know about many of the approaches recruiters make to potential candidates. They also know about virtually all job moves and whether they resulted in a person staying with a company for a while and forming a stable professional network. This is valuable data to mine.

LinkedIn is, if they play their cards right, about to establish a high-quality set of skill scores ranking a significant proportion of the world’s population. They may well not make this directly visible to users, but use it for - for example - refining recruiter search results. Once established as a key recruitment tool, being a participant in this system will not be optional, and it becomes a self-reinforcing monopoly - similar to the one Google have managed to manufacture for themselves in advertising, but I’d argue more valuable.

One final thought. A really interesting (meta-)score you can generate for each person is how good they are at endorsing people who are already highly scored by others. You can then use this to further weight the reputation flow in the network and filter out people who are genuinely skilled, but endorse unreliably. I’d argue this is a very interesting metric in its own right. Particularly for senior people, the ability to accurately assess another’s skill level is possibly the most important skill of all.
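
Continuing the toy sketch above, a crude version of that meta-score is just the mean score of everyone a person has endorsed:

```
def endorser_accuracy(endorsements, score):
    # Mean skill score of everyone a person endorsed, using the scores from
    # skill_scores() above - a crude proxy for "endorses reliably". This
    # could then be folded back in as an extra weight on the reputation flow.
    return {
        endorser: sum(score[t] for t in targets) / len(targets)
        for endorser, targets in endorsements.items() if targets
    }
```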

I’d recommend people are highly selective about who they endorse and for what.

(The above is mostly rehashed from a blog post I've done at mjmaisey.tumblr.com, which also talks about the major problems with Klout if people are interested...)

Search engines we have known ... before Google crushed them

Martin Maisey

Excite

Probably worth pointing out another important alumnus of Excite - Doug Cutting, who went on to write the initial version of Lucene (probably the leading open source search engine), moved to Yahoo, and while there co-wrote the initial version of Hadoop.

User data stolen in Sony PlayStation Network hack attack

Martin Maisey
Thumb Down

Incompetence

This simply should not have been possible, and even if it was, it is completely unacceptable that they still don't know what's gone and what hasn't.

What was Sony doing storing sensitive information like date of birth, address, security questions and answers, and possibly c/card details on online servers where the data could be exfiltrated?

For some of this (e.g. DOB) there is simply no reason at all to keep it accessible on Internet-connected servers - just categorise the user into age bands and punt the underlying data somewhere offline, then refresh the band through a one-way batch update when they age into a new one.
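
By way of illustration (the band boundaries are arbitrary choices of mine):

```
from datetime import date

# Illustrative only - band boundaries are my own invention.
BANDS = [(0, 12), (13, 17), (18, 24), (25, 34), (35, 49), (50, 64), (65, 150)]

def age_band(dob, today):
    # Standard "has the birthday happened yet this year?" age calculation.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    for lo, hi in BANDS:
        if lo <= age <= hi:
            return f"{lo}-{hi}"
    return "unknown"

# The online profile stores only the band (e.g. "18-24"); a periodic offline
# batch job re-derives it as users age into the next one.
print(age_band(date(1990, 6, 1), date.today()))
```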

Other data (e.g. c/card details, security answers etc.) should have been stored behind a double layer of firewalls from different vendors, with carefully controlled access to the minimal required logic (e.g. execute this transaction, check this security answer), with application-level firewalls that alert on any unusual activity, and with logging of every single packet on the network to something like a LogLogic monitoring appliance.

This should have been required to get through their PCI DSS compliance. Some hard questions need to be asked of Sony executive management, their security team and their PCI auditor.

And I cannot believe that with this level of incompetence, they are not even offering free credit monitoring to those that want it.

Researcher warns of data-snooping bug in Apple's Safari

Martin Maisey
Alert

The workaround in the article is no longer sufficient

There's a revised workaround at the blog page (more complex, unfortunately).

RIM Vodafone BlackBerry Storm

Martin Maisey

Had one, returned it.

Completely agree with this review. I tried out an early demo unit in store and quite liked it, and had loved the BlackBerry Pearl I owned previously, so put in a pre-order.

However, within a few hours of getting home it had just annoyed the hell out of me. The killer problem for me was that I was frequently missing menu items and getting the one above/below. Either there was something wrong with the touch detection or the menus are just too small for touch operation - I've not had this problem with any other touch devices. Combine this with the lag inherent in correcting a missed menu item and it's incredibly frustrating. And for extended typing it is awful. Anyone used to the usually snappy BlackBerry response is going to be very disappointed.

And yes, it has (for example) a Facebook app, but it's exactly the same as the one on the Pearl. Only they haven't even replaced the low res icons, which are incredibly ugly and blocky on the bigger screen. This is symptomatic of the attention to detail shown throughout the phone - everything just feels very, very clunky and rushed.

One of the main reasons I chose the Storm instead of an iPhone is that some of the pre-launch reviews (and the fair heft of the unit) suggested it should have much better battery life, and battery life is something I care about. In fact, it's worse. So I returned it, got an iPhone 3G instead, and am really happy. Everything's just so much slicker and faster to use, and even the battery life is not too bad - as long as you turn off 3G when you're not actively browsing...

I think BlackBerry need to go back to the drawing board and design a UI from scratch for touchscreen. If they can do that, and sort the battery life, they've got a winner, as the hardware seems pretty good. As it is, avoid unless you are very patient and really can't stand the idea of buying from Steve Jobs.

Group Test: Wireless music streamers

Martin Maisey

@James Bassett

SqueezeCenter runs directly on a number of NASs via a piece of free software called SSODS or SSOTS (depending on model), which also makes installation very easy. I run it on a QNAP 209 Pro II and it works beautifully in a two-room setup with a new Squeezebox Duet and an old, original Squeezebox. I bought mine from qnapstore.co.uk, who preloaded the SSOTS software.

The only slight thing to watch is that while NASs are much quieter and less power-hungry than PCs, it is at the expense of some processing power. So if you're looking to have it stream to _lots_ of rooms simultaneously, it might run out of juice: a typical FLAC stream seems to take about 20% of CPU, so around five simultaneous streams would max it out. Things like the web browser interface and full collection rescans are also a fair bit slower than they were when I was running SqueezeCenter off a full PC - not that I use them much.