Apple says sorry for Siri slurping voice commands of unsuspecting users

eldakka

Re: Random identifier

How would they re-associate it?

Because each recording has a unique identifier assigned to it as it traverses - and sits inside - their systems.

It could, for example, be a hash of the device/account ID plus a sequence number (or some other salt). But for that hash to stay linkable to the account for the first 6 months, either it has to be effectively reversible (e.g. a fast hash such as MD5 over a small, guessable input like account ID plus sequence number, which can be brute-forced even though MD5 itself is one-way) or, more likely, there is an index (database) somewhere that maps the hash back to the account ID and sequence number. Either way, if you have the hash you have a reference that leads to the account ID, and from the account ID to all of that account's personal data.

To anonymize a recording after 6 months, in the former case they'd have to delete the hash from the recording metadata entirely, since it is reversible; in the latter case they'd have to delete the mapping stored in the index.
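To make the two cases concrete, here is a minimal Python sketch. Everything in it is hypothetical (we don't know Apple's actual scheme): `weak_id` is an unsalted MD5 of "account:seq" that can be undone by enumerating candidates, while `SaltedIndex` uses a random salt that is thrown away, so only its index links an id back to an account and deleting the index entry is what breaks the link.

```python
import hashlib
import secrets


def weak_id(account_id: str, seq: int) -> str:
    """Case 1 (hypothetical): fast, unsalted MD5 over a small, guessable input."""
    return hashlib.md5(f"{account_id}:{seq}".encode()).hexdigest()


def reverse_weak_id(rid: str, known_accounts: list[str], max_seq: int):
    """Recover (account_id, seq) by trying every plausible input.

    MD5 is not mathematically invertible, but with an input space this
    small, enumerating candidates is enough to undo the 'anonymization'.
    """
    for account_id in known_accounts:
        for seq in range(max_seq):
            if weak_id(account_id, seq) == rid:
                return account_id, seq
    return None


class SaltedIndex:
    """Case 2 (hypothetical): random salt that is never stored, plus an index.

    Without the salt the id cannot be recomputed, so the index is the
    only link from a recording id back to the account.
    """

    def __init__(self) -> None:
        self.index: dict[str, tuple[str, int]] = {}

    def new_id(self, account_id: str, seq: int) -> str:
        salt = secrets.token_hex(16)  # discarded after this call
        rid = hashlib.sha256(f"{account_id}:{seq}:{salt}".encode()).hexdigest()
        self.index[rid] = (account_id, seq)
        return rid

    def lookup(self, rid: str):
        return self.index.get(rid)

    def anonymize(self, rid: str) -> None:
        # Deleting the mapping is what severs the link in this case.
        self.index.pop(rid, None)


rid = weak_id("acct-42", 7)
print(reverse_weak_id(rid, ["acct-1", "acct-42"], 100))  # ('acct-42', 7)

idx = SaltedIndex()
rid2 = idx.new_id("acct-42", 7)
idx.anonymize(rid2)
print(idx.lookup(rid2))  # None
```

Note that in case 1 there is no mapping to delete, which is why the hash itself has to go.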

The problem is: what about the backups? Where are the recordings, and the index, backed up to? If they delete the hash from a recording, could someone go into the backup systems and restore a 5-year-old backup of that recording that still contains the reversible hash? The same goes for the index: could it be restored from a backup taken before the mapping was deleted, again say a 5-year-old one?
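A toy Python illustration of the backup problem, assuming a plain dict stands in for the index and a dated deep copy stands in for a backup snapshot (all names and ids are made up). Deleting the mapping from the live index alone is undone by restoring an older snapshot; `anonymize_everywhere` shows one possible mitigation (purging every retained snapshot too), which is an assumption on my part, not something Apple has said it does.

```python
import copy


def anonymize(index: dict, rid: str) -> None:
    """Delete the id -> account mapping from one copy of the index."""
    index.pop(rid, None)


def anonymize_everywhere(index: dict, backups: dict, rid: str) -> None:
    """Delete the mapping from the live index and from every retained snapshot."""
    anonymize(index, rid)
    for snapshot in backups.values():
        anonymize(snapshot, rid)


def restore_scenario():
    # Hypothetical live index: recording id -> account id.
    live_index = {"rid-001": "acct-42", "rid-002": "acct-7"}
    # Snapshot taken before the six-month anonymization run.
    backups = {"2019-01-01": copy.deepcopy(live_index)}

    anonymize(live_index, "rid-001")                 # live copy only
    restored = copy.deepcopy(backups["2019-01-01"])  # restore the old backup
    return live_index.get("rid-001"), restored.get("rid-001")


print(restore_scenario())  # (None, 'acct-42')
```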

And what about the resulting transcript itself? Do they attach a hash to the transcript as well, with the same issues? Perhaps the transcript carries the original hash from the recording; if so, even once the recording itself is 'anonymized', has that hash also been removed from the transcript?
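A small Python sketch of that question, assuming (hypothetically) that the transcript stores a copy of the recording's id in a `source_rid` field. The `resolver` dict stands in for whatever can still turn an id back into an account, whether the index or a brute-force search over a weak hash. Scrubbing only the recording leaves the transcript's copy resolvable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recording:
    audio_ref: str
    rid: Optional[str]


@dataclass
class Transcript:
    text: str
    source_rid: Optional[str]  # hypothetical: copy of the recording's id


def still_linkable(resolver: dict, rec: Recording, tr: Transcript) -> bool:
    """True if any surviving copy of the id can still be resolved to an account."""
    return any(r is not None and r in resolver for r in (rec.rid, tr.source_rid))


def anonymize_recording_only(rec: Recording) -> None:
    rec.rid = None


def anonymize_both(rec: Recording, tr: Transcript) -> None:
    rec.rid = None
    tr.source_rid = None


resolver = {"rid-001": "acct-42"}
rec = Recording("audio/0001", "rid-001")
tr = Transcript("set a timer for ten minutes", "rid-001")

anonymize_recording_only(rec)
print(still_linkable(resolver, rec, tr))  # True: the transcript still carries rid-001

anonymize_both(rec, tr)
print(still_linkable(resolver, rec, tr))  # False
```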
