Dirty data, flogged cores: YES, Microsoft SQL Server R Services has its positives

The R language has enjoyed a great reputation in statistical computing and graphics for decades. However, it is also known as something for statisticians. Born around the time of Java, PHP and Python, R lags behind all three by a long chalk on the TIOBE rankings. Yet Microsoft spotted an opportunity in this era of analytics …

  1. GreggS

    Or did they just include R because they're Pirates?

    1. Gene Cash Silver badge

      Yes, it'd use a lot less memory if you didn't have to say "matey" at the end of every line of code...

  2. Joe 35

    Dirty data

    If the data in your transactional database is "dirty, inconsistent and full of errors" you have a fundamental problem to deal with first, and no doubt some very aggrieved customers.

    1. Adam 52 Silver badge

      Re: Dirty data

      You've not looked at many real-world transactional databases, have you?

      I'll give you some examples:

      1. Has your business remained unchanged for decades? You didn't start asking for email addresses at some point around the late 90s? Or maybe storing full years around 1999?

      2. How many online orders do you have from Little Bobby Tables?

      3. Have you ever sold a product at the wrong price and had to do a refund?

      4. You remember when Yugoslavia split? How has that affected your regional roll-ups?

      1. Mark 65 Silver badge

        Re: Dirty data

        Thing is, and this is what the article misses, you need to do some kind of analysis first to see how dirty or useless your data is.

  3. Doctor Syntax Silver badge

    "Here's the real rub: running R Services in SQL Server 2016 is running analysis on your transactional databases. That's your live database, your R code is running inside your production database, eating the CPU cycles and disk access, slowing down your expensive SQL server."

    "You could use a second server to run R, but then you've got the potential network bottleneck of moving the data back and forth between the machines."

    "That's only one of the problems. The data inside a transactional database is not designed for analysis; it's likely to be dirty, inconsistent and full of errors."

    Let me second Joe's comment about dirty data.

    Apart from that, you can always restore your transactional DB backup to your analytical server. That way you get real data and test the restore procedures at the same time. There may, of course, be other issues with this - such as data protection - but the objection as quoted really doesn't stand up.

    1. Bronek Kozicki

      ... or just use replication?

  4. Anonymous Coward

    The fix

    The fix is not to let baldies collect your data set.

  5. W. Anderson

    Not a new R language/Relational Database capability

    Scientists, economists and others have reported using the R programming language in conjunction with the PostgreSQL object-relational database as well, so I do not see this opportunity as specific or original to Microsoft SQL Server, other than the company promoting this new-found functionality as its own.

    1. Adam 52 Silver badge

      Re: Not a new R language/Relational Database capability

      Might I suggest you take some time to research what Microsoft's product is? Reading the article you commented on will give a few clues.

  6. FozzyBear

    Anyone

    That attaches R or other analytical services directly to their production databases should be shot, drawn and quartered, stabbed, poisoned, drowned, impaled, tortured and then finally their tattered remains hung above the entry to the IT department as a warning to others

    And I hope everyone recognises the restraint I have shown in the punishments that should be inflicted on the individual

    1. s2bu

      Re: Anyone

      I never had any problems using PL/R in PostgreSQL, even in production.

      The whole argument in this article about using Python instead isn't even a real comparison. Server-side R and client-side Python are two completely different things. If the job involves retrieving huge amounts of data, then even if R is slower, it's going to be faster on the server side, as it's not going to involve transferring huge amounts of data.
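A rough illustration of the server-side versus client-side point, as a minimal Python sketch. It uses an in-memory SQLite database purely as a stand-in (an assumption; this is not SQL Server R Services or PL/R), and the table name `orders` and column `amount` are hypothetical. The first function ships every row to the client before averaging; the second lets the database aggregate and ships back a single value.

```python
import sqlite3


def make_db(n_rows=10_000):
    """Build a throwaway in-memory table of hypothetical order amounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (amount) VALUES (?)",
        ((float(i % 100),) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def client_side_mean(conn):
    """Pull every row out of the database, then average on the client."""
    rows = conn.execute("SELECT amount FROM orders").fetchall()
    total = sum(amount for (amount,) in rows)
    return total / len(rows), len(rows)


def server_side_mean(conn):
    """Let the database compute the average and return one row."""
    rows = conn.execute("SELECT AVG(amount) FROM orders").fetchall()
    return rows[0][0], len(rows)


conn = make_db()
client_mean, client_rows_moved = client_side_mean(conn)
server_mean, server_rows_moved = server_side_mean(conn)

# Same answer either way; the difference is how many rows crossed the boundary.
print("client side:", client_mean, "rows moved:", client_rows_moved)
print("server side:", server_mean, "rows moved:", server_rows_moved)
```

With a local in-memory database the transfer cost is negligible, so this only shows the difference in rows moved. Over a real network link that row count is what turns into the bottleneck described above.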

    2. Mark 65 Silver badge

      Re: Anyone

      I believe Oracle has an R option also, not that the above treatment should not be meted out to Mr Ellison.

    3. Ken Moorhouse Silver badge

      Re: Anyone

      You forgot about slicing and dicing them.

  7. Mark 65 Silver badge

    "Another issue for newbies will be the need to get your head around the practicalities of R being a domain-specific language. You will almost certainly need an understanding of statistics to get meaningful answers out of your code."

    I'm curious as to what analysis someone would be doing of the data without an understanding of statistics - mean, max, min?

    1. o p

      The median for example
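To make the median suggestion concrete, here is a small Python sketch using only the standard library. The order values are invented for illustration (an assumption, not data from the article), with one deliberately bad entry standing in for a dirty row. It shows why the choice of summary statistic matters when the data has errors in it.

```python
from statistics import mean, median

# Hypothetical order values; made up for this example.
order_values = [19.99, 24.50, 21.00, 22.75, 20.00]

# The same values plus one mis-keyed entry, standing in for "dirty" data.
dirty_values = order_values + [999999.0]

clean_mean = mean(order_values)
clean_median = median(order_values)
dirty_mean = mean(dirty_values)
dirty_median = median(dirty_values)

print("clean: mean", clean_mean, "median", clean_median)
print("dirty: mean", dirty_mean, "median", dirty_median)
```

One bad row drags the mean far away from anything a customer actually paid, while the median barely moves.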

  8. P.B. Lecavalier

    Hard is a Good Thing

    "For all the criticism of the R language – it's hard to understand, slow and a memory hog"

    If you find R hard, then you need to spend some time with SAS, which consists of 4 or 5 languages patched together because individually, each of them is utterly inadequate to get you anywhere. Then whatever you do in R with 1 or 2 lines of code will take at least five times that in SAS, and forget about the concept of "package". R is made for smart people by smart people, and it better stay that way to keep Excel and VBA kids at bay.

    The article mentions Python as a great language (I agree, it is), but R's internal documentation is so much better than Python's. Why? Because it always features examples for simple things! Whenever I look into the Python standard library reference for a module that could be of some use, I have to look somewhere else to figure out how to do anything with it (i.e., ok, so this is some instance of a class, then what do we do with that?). Without Google, most people using Python would have a really hard time (myself included). Without Google, for R, it would be an inconvenience, but I could still find my way around.

    That it might be a "memory hog" on a server (I've never had any issue with it, but I've never run it on a production server either) is not surprising, because its development did not focus on making it a daemon.

Biting the hand that feeds IT © 1998–2021