Facebook boffins bake robo-code converter to take the pain out of shifting between C++, Java, Python • The Register Forums

Friday 12th June 2020 20:28 GMT martinusher

Conversion costs?

The way this is written implies that the $750 million spent converting a code base from COBOL to Java went merely on line-by-line translation of the source. I don't have any knowledge of this project, but based on my experience the translation would be taken as an opportunity to upgrade the system and its functionality. That is, the 'conversion' was really a system rewrite, with all the complexity in design, coding, integration and testing that would entail.

15 1 Reply

Saturday 13th June 2020 00:39 GMT coconuthead

Re: Conversion costs?

Indeed, my observation as a customer of the bank is that the systems have gone from processing a lot of stuff at night for availability the next day to real-time. Their marketing now emphasises some of these features.

There also used to be too many Byzantine procedures which required interaction with a human at the bank, e.g. by phone. These days they only seem to want to talk to you to avoid fraud, overextension (which they are legally required to do) or to otherwise satisfy regulation. Given the size of the bank, it may have been rational to get the conversion done sooner rather than later and cheaper, on the basis of staff costs alone.

8 0 Reply
Sunday 14th June 2020 12:29 GMT J27

Re: Conversion costs?

The author is clearly not familiar with the process of porting code between languages/frameworks.

2 0 Reply

Friday 12th June 2020 21:10 GMT Mike Shepherd

"faster and...more maintainable"

TransCoder...could help port a project from Python to C++...It may make the code faster and also more maintainable since code written in strongly-typed languages can be easier to understand.

Get real. Anyone who's maintained human-written code will have experienced wanting to shake its author by the throat, because typical source code is of abysmal quality. We're a long way from any machine that will improve on that. So don't ask me to work on TransCoder's output, because it will have to do a lot more than change x=2 to int x=2 (even if it gets that right) to compensate for the general mess with which it will likely start and to which it can only add.

As for speed, there has been very little code worth speeding up for the last 40 years or more. The overwhelming problem, then and still now, is how to write clear and reliable source that reflects the requirements accurately. Making code faster is appropriate in niche cases, which most of us rarely need to address.

10 4 Reply

Friday 12th June 2020 22:04 GMT Anonymous Coward

Re: "faster and...more maintainable"

I don't think I can name a single application I've ever used (or a lot I've worked on for that matter) that wouldn't have been better if they'd been faster.

Making code faster is only becoming a niche as higher level languages take more control away from the programmer, and as fewer and fewer programmers really understand the performance consequences of their choices and know how to optimise their code.

6 3 Reply
1. Friday 12th June 2020 22:51 GMT RM Myers
  
  Re: "faster and...more maintainable"
  
  Wow, two comments that I can both agree and disagree with at the same time. Having designed and helped write a series of programs almost 40 years ago which saved over $300K per year in processing time, for probably less than $30K in programming time, I definitely disagree with the "no program in the last 40 years" comment. Plus, having been involved with systems that had thousands of users at my former employer, I can tell you performance can be critical. We had several large projects that were never implemented because the systems were too slow.
  
  At the same time, there were many times where performance was much less important than maintainability, both from a quality perspective and overall cost. When a limited number of management people are making long term billion dollar decisions based on reports and analysis from your system, being fast is much less important than being accurate. And senior management doesn't tend to like it when small changes take months to implement, so lack of maintainability can be career threatening.
  
  YMMV
  
  21 0 Reply
  1. Saturday 13th June 2020 06:53 GMT Anonymous Coward
    
    Re: "faster and...more maintainable"
    
    I never said anything about sacrificing maintainability for performance.
    
    1 0 Reply
  2. Monday 15th June 2020 08:41 GMT big_D
    
    Re: "faster and...more maintainable"
    
    Except optimization and readability / maintainability are not mutually exclusive. You can optimize maintainable code and still have it be maintainable.
    
    2 0 Reply
2. Saturday 13th June 2020 09:18 GMT Rafael #872397
  
  Re: "wouldn't have been better if they'd been faster"
  
  Back in the Old Days, I worked on developing dBase/Clipper applications in DOS (ask your grampa) for a local retail shop. We also had a Turbo Pascal program that printed some random numbers, a simulated countdown and a message "Don't Stop Me - Reindexing and Optimizing the Database".
  
  Of course it was just what we used for a "coffee"-break. And we worked hard on making it slower.
  
  6 0 Reply
  1. Sunday 14th June 2020 23:02 GMT Anonymous Coward
    
    Re: "wouldn't have been better if they'd been faster"
    
    You want to admit to that?
    
    I bet you also played on a CP/M machine too.
    
    0 0 Reply
Monday 15th June 2020 08:39 GMT big_D

Re: "faster and...more maintainable"

Niches, such as desktop computing or web applications... Yes, very little need for optimization there.

As someone who spent a lot of their programming career firefighting poorly optimized code, I can tell you that code optimization is very important, and that optimizing code and writing human-readable code are not mutually exclusive; it's just that human-readable code isn't automatically fast to execute.
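A minimal, hypothetical Python illustration of that point (not taken from any of the projects mentioned here): the two functions below read almost identically, but the second builds a set from the lookup list once, so each membership test is an average O(1) hash lookup instead of a scan of the whole list. The function names and data are invented for the example, and the set version assumes the items are hashable.

```python
def find_known_accounts_slow(transactions, known_accounts):
    # known_accounts is a list: every `in` test scans the list, O(n) each time
    return [t for t in transactions if t in known_accounts]


def find_known_accounts_fast(transactions, known_accounts):
    # Same logic and same readability, but the list is turned into a set once,
    # so every `in` test is an average O(1) hash lookup
    known = set(known_accounts)
    return [t for t in transactions if t in known]


if __name__ == "__main__":
    import timeit

    accounts = list(range(2_000))
    txns = list(range(0, 4_000, 3))
    # Optimizing must not change the result
    assert find_known_accounts_slow(txns, accounts) == find_known_accounts_fast(txns, accounts)
    print("slow:", timeit.timeit(lambda: find_known_accounts_slow(txns, accounts), number=3))
    print("fast:", timeit.timeit(lambda: find_known_accounts_fast(txns, accounts), number=3))
```

The optimized version is no harder to maintain; the change is one line plus a comment.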

I've worked on desktop projects that, for example, reduced the runtime of a financial data collection system from 22 hours to 2 hours on a PC. Multiply that by hundreds of accountants around the world, add the fact that it locked up the whole PC for those 22 hours so they couldn't do anything else, and the time invested in optimizing it paid for itself within a month.

Likewise, I've worked on web projects where the load-balanced servers and back-end database would collapse under a load of 250 simultaneous transactions. After a few hours of optimization, the same server configuration didn't break a sweat with over 1,000 simultaneous transactions. Those few hours of work were a lot cheaper than throwing four times the hardware at the problem.

I can give dozens of other examples.

2 0 Reply

Friday 12th June 2020 22:58 GMT Anonymous Coward

Based on language translation

It's one thing if you translate "I seem to be having this tremendous difficulty with my lifestyle" into the language of the Vl'Hurgs and it comes out as the most dreadful insult imaginable.

But if you're dealing with code, less than 100% accurate translation will cause errors which may be catastrophic and will be difficult to track down. Having a neural network do this instead of programmers just makes it faster to generate errors.

16 0 Reply

Saturday 13th June 2020 05:32 GMT Anonymous Coward

Re: Based on language translation

"The generated functions and production code have to be tested; they are not guaranteed to be correct."

Hey, you did have tests for the old code, that you can run against the new code, right? Cuz otherwise who can tell if it's code... or garbage?

But maybe the new code won't be testable if "... accuracy, with 74.8 per cent and 68.7 per cent in the C++ → Java and Java → Python directions ...". That sounds more like "Here's what the de-obfuscator produced - now make it golden. Only a couple minutes per function, right? All done today?"

Yet elsewhere in the article there's a wonderful bit saying "... that accurately translates functions ..." How do you get that from the claimed 74.8% "not trash"? Or as I read it, that's one quarter crap.
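Running the old tests against the new code can be sketched as a differential test: feed the trusted original and the translated version the same inputs and collect every disagreement. This is a minimal Python sketch assuming both versions can be called from one harness; `legacy_round_half_up` and `translated_round_half_up` are hypothetical stand-ins, and the second contains a deliberately planted translation bug (Python's `round()` rounds exact halves to the nearest even number).

```python
import random
from decimal import Decimal, ROUND_HALF_UP


def legacy_round_half_up(amount: str) -> int:
    # Hypothetical trusted original: rounds exact halves away from zero
    return int(Decimal(amount).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def translated_round_half_up(amount: str) -> int:
    # Hypothetical machine translation: round() uses banker's rounding,
    # so "2.5" becomes 2 instead of 3
    return round(float(amount))


def differential_test(reference, candidate, inputs):
    """Return (input, expected, actual) for every input where the two disagree."""
    failures = []
    for x in inputs:
        expected, actual = reference(x), candidate(x)
        if expected != actual:
            failures.append((x, expected, actual))
    return failures


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so any failure can be reproduced
    inputs = [f"{rng.randint(0, 999)}.{rng.randint(0, 9)}" for _ in range(1000)]
    inputs += ["0.5", "2.5", "3.5"]  # hand-picked edge cases: exact halves
    bad = differential_test(legacy_round_half_up, translated_round_half_up, inputs)
    print(f"{len(bad)} of {len(inputs)} inputs disagree, e.g. {bad[:3]}")
```

Random inputs alone can miss this kind of bug, which is why the exact halves are listed explicitly; for a real COBOL-to-Java port the harness would have to drive both systems through files or an interface rather than direct function calls.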

13 0 Reply
1. Saturday 13th June 2020 07:43 GMT Warm Braw
  
  Re: Based on language translation
  
  The generated functions and production code have to be tested; they are not guaranteed to be correct.
  
  That's not generally a characteristic you would welcome in a compiler. It's not as if there isn't open source code available* for parsing and lexically analysing COBOL and Python, so you just need to glue a code generator on the back end for the target language. You might not get a result that you can visually associate with the original (though you could put that in comments), but at least it would be functionally correct^.
  
  I'm not quite sure how an enormous amount of effort to produce a flawed AI solution + unknown effort to correct the result actually saves time and money. Particularly when elderly code has a habit of working in mysterious and undocumented ways - usually the main reason it is preserved.
  
  Edit: And given the world is supposedly moving towards container-based microservices, why convert stuff anyway if it's working?
  
  *With the exception of proprietary language dialects.
  
  ^Assuming conforming data types for source and target
  
  7 1 Reply
  1. Sunday 14th June 2020 06:01 GMT thames
    
    Re: Based on language translation
    
    As I understand it the main interest in converting from COBOL to a more recent language is due to the limited and shrinking pool of COBOL programmers available.
    
    The old code can be kept running, but as a business changes and evolves, the software must evolve along with it. It's hard to find programmers who want to learn COBOL, because specialising in that field limits them to maintaining old COBOL programs, and that sort of work can be feast or famine.
    
    The interest in converting from Java to some other language (e.g. Python) is driven by a determination by some companies to get as far away from Oracle as possible. It's all down to Oracle being Oracle, rather than anything about the language itself.
    
    In either case if the original program had a really good set of unit tests, then you could run the original source code through a translator, run the unit tests, fix what fails, and you would have a working system. You could then gradually re-write the machine translated code to something more understandable by humans to make it more maintainable.
    
    Unfortunately, those sorts of programs rarely if ever have any unit tests at all, let alone have comprehensive ones. That means the first thing you would need to do is analyse the original code and write unit tests manually. And if you're going to do that, you may as well just re-write the source code by hand while you're at it.
    
    2 0 Reply
    1. Sunday 14th June 2020 14:28 GMT Anonymous Coward
      
      Re: Based on language translation
      
      "... run the unit tests, fix what fails, and you would have a working system."
      
      It's a nice idea, but unit tests (specifically, as opposed to other levels of testing) tend to be written in the same language as the actual code. Which means you also need to convert the tests - if they remain in the original language then they probably can't exercise the new code as older systems tend to be less flexible in terms of what they can call.
      
      So how do you test that the tests have been converted okay? (Recursion! Who watches the watchers?)
      
      0 1 Reply
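Warm Braw's alternative in this thread, a conventional parser with a code generator glued on the back end, can be shown on a toy scale. This is a hypothetical sketch, not a real COBOL or Python translator: it uses Python's standard `ast` module as the front end and emits C-style statements for a tiny subset (assignments built from +, - and * over names and integer constants). Anything outside that subset raises NotImplementedError instead of producing a guess. It also ignores types entirely: Python integers never overflow while C ints can, which is the "conforming data types" caveat in miniature.

```python
import ast

# Hypothetical toy subset: integer +, -, * over names and constants only
_C_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def expr_to_c(node: ast.expr) -> str:
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return str(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _C_OPS:
        # Fully parenthesise so C's precedence rules cannot change the meaning
        left, right = expr_to_c(node.left), expr_to_c(node.right)
        return f"({left} {_C_OPS[type(node.op)]} {right})"
    # Refuse anything we cannot translate exactly, rather than guessing
    raise NotImplementedError(f"unsupported construct: {ast.dump(node)}")


def python_to_c(source: str) -> str:
    lines = []
    for stmt in ast.parse(source).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise NotImplementedError(f"unsupported statement: {ast.dump(stmt)}")
        lines.append(f"{stmt.targets[0].id} = {expr_to_c(stmt.value)};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(python_to_c("x = a * (b + 2)\ny = x - 1"))
    try:
        python_to_c("z = a / b")  # true division has no direct C int equivalent
    except NotImplementedError as err:
        print("refused:", err)
```

The output is ugly but mechanical, and every statement it does emit follows from a rule that can be reviewed once.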

Friday 12th June 2020 23:35 GMT Anonymous Coward

Can’t wait

Looking forward to the logical endpoint of this work: a ML system which, shown images of a language manual and an ISA manual, generates a compiler.

7 0 Reply

Sunday 14th June 2020 14:30 GMT Anonymous Coward

Re: Can’t wait

If the manual contains an image of the language syntax in BNF or similar, we're nearly there (see Bison, ANTLR, etc.) ...
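As a hypothetical Python sketch of why a BNF grammar gets you most of the way: below is a hand-written recursive-descent parser and evaluator for a three-rule arithmetic grammar, with one method per BNF rule. Generators such as Bison and ANTLR automate the step from grammar to parser (Bison, for instance, builds LALR tables rather than recursive functions), so this only illustrates the idea, not how those tools work internally.

```python
import re

# Hypothetical example grammar:
#   expr   ::= term (("+" | "-") term)*
#   term   ::= factor (("*" | "/") factor)*
#   factor ::= NUMBER | "(" expr ")"

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.i += 1
        return tok

    def expr(self):  # expr ::= term (("+" | "-") term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):  # term ::= factor (("*" | "/") factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs  # floor division
        return value

    def factor(self):  # factor ::= NUMBER | "(" expr ")"
        if self.peek() == "(":
            self.take("(")
            value = self.expr()
            self.take(")")
            return value
        tok = self.take()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)


def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return value


if __name__ == "__main__":
    print(evaluate("2 * (3 + 4) - 5"))
```

The "nearly" is the rest of the compiler: type checking, semantics and code generation are not in the BNF.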

0 0 Reply

Saturday 13th June 2020 09:17 GMT Lunatic Looking For Asylum

GIGO

Struggling to see why you would need something like this. If the code is good and performant, why would you want to convert it? If it's crap, then what comes out will be crap as well, and you may as well revisit the design and do it better (note I didn't say properly) in your language of choice.

5 1 Reply

Saturday 13th June 2020 10:24 GMT thames

Cython, Numba, etc.

If the objective is performance, there are already multiple solutions for converting Python source code to C or C++. Some are domain specific and intended to just convert a few functions, but some like Cython will do the entire program (to C in this case). Cython has been around for years and is widely used.

The thing is that there is little or no real demand for converting Python to C or C++ and then throwing away the Python and using the C or C++ as the new code base. Things like Cython are used by people who want to write code in Python and compile to machine code. The C or C++ code is just used as a form of intermediate code in the compilation step, and the user doesn't normally look at it.

In certain particular applications, the resulting binary is faster than the Python version. However, in other applications, the C or C++ actually runs slower than the Python version. This is why the technique has specific applications (such as numerical algorithms) instead of everyone using it on everything.

The subject of converting Python to C or C++ has been very thoroughly researched by multiple people, and the conclusion that everyone seems to reach is that while it can work in some cases, in order for a static language such as C or C++ to be able to do everything that a dynamic language such as Python can do, it would have to incorporate the equivalent of a Python interpreter inside it.

And indeed this is what Cython actually does. It does what it can in C, but it also calls back into the Python interpreter to do many things. The overhead involved in this means that the translated C program can be slower than the interpreted Python program.
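A minimal Python illustration of the dynamic semantics being described (the `Money` class is hypothetical): one `add` function whose `+` means integer addition, string concatenation, list concatenation or a user-defined method depending on the runtime types of its arguments, plus an operator that is redefined while the program runs. A static translation has to either restrict the types it accepts or carry code that performs this dispatch at run time, which amounts to re-implementing part of the Python runtime.

```python
class Money:
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # User-defined meaning of "+"
        return Money(self.cents + other.cents)


def add(a, b):
    # One source line, but the operation actually performed is chosen
    # at run time from the types of a and b
    return a + b


results = [
    add(2, 3),                          # integer addition
    add("con", "cat"),                  # string concatenation
    add([1], [2]),                      # list concatenation
    add(Money(150), Money(250)).cents,  # dispatches to Money.__add__
]
print(results)  # [5, 'concat', [1, 2], 400]

# Behaviour can also change while the program is running:
Money.__add__ = lambda self, other: Money(self.cents - other.cents)
print(add(Money(150), Money(250)).cents)  # -100
```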

The way these tend to get used in practice is that people benchmark their Python programs, identify bottlenecks, and if they are good candidates for Cython, write just those parts in Cython and call them as functions. It's basically used by people who want the equivalent of a C extension (many Python libraries are actually written in C) but don't want to write it in C. You can gradually add code hints for the Cython compiler to help it generate better C code.
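The benchmark-first step described above can be sketched with the standard library's `cProfile` and `pstats` modules. `slow_part`, `fast_part` and `main` are hypothetical stand-ins for real program code; only a function that dominates the profile, like `slow_part` here, would be worth considering for Cython or Numba.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Hypothetical numeric hot spot: a pure-Python loop
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_part(items):
    # Hypothetical glue code, cheap by comparison
    return sorted(items)[:3]


def main():
    fast_part([5, 3, 9, 1])
    return slow_part(200_000)


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    main()
    profiler.disable()

    out = io.StringIO()
    # Sort by cumulative time and print the five most expensive entries
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    print(out.getvalue())
```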

So, while this may be an interesting research project, it's not introducing anything revolutionary.

5 0 Reply

Saturday 13th June 2020 11:36 GMT Anonymous Coward

Re: Cython, Numba, etc.

"In order for... C or C++ to be able to do everything... that Python can do"

Umm, do you want to take a guess at what static language Python is written in? Everything Python can do, C by definition can also do. And FWIW modern C++ can do it just as easily and with greater functionality.

5 1 Reply
1. Sunday 14th June 2020 06:01 GMT thames
  
  Re: Cython, Numba, etc.
  
  Yes, Python is written in C. And to do everything that Python can do, a C or C++ program would have to include the Python interpreter in it. C and C++ just don't have the language semantics to do everything that Python can do without duplicating the Python run time, which effectively embeds Python within the C program (embedding Python inside C or C++ programs is something that is done by the way).
  
  There have been numerous attempts to create static compilers for Python by translating the Python source code to C or C++ but without relying on the Python run-time to be present (Cython relies on it). In every case the authors get about 90 to 95% done before running into a brick wall and giving up. If it was possible it would have been done by now.
  
  1 7 Reply
  1. Sunday 14th June 2020 20:25 GMT Anonymous Coward
    
    Re: Cython, Numba, etc.
    
    So C can't do what Python can do even though Python is written in C?
    
    *boggle*
    
    Are you really this dumb or are you just having a bad day? Either way you clearly don't have a clue what you're talking about, and I suspect you're just parroting stuff you've read without understanding it. I suggest you educate yourself about how interpreters and compilers work.
    
    What gem will you come up with next - that assembler can't do everything either and would need Python embedded in it in order to generate the assembler required to carry out Python functionality??
    
    5 2 Reply
  2. Sunday 14th June 2020 22:15 GMT Tim Parker
    
    Re: Cython, Numba, etc.
    
    I think you're getting confused about the difference between the functionality of a program written in a given language and an interpreter for that language, and also about the role of the embedded interpreter in a compiled Python program. A program translated from Python to C/C++ (or another language) needs neither an embedded Python interpreter nor the Python runtime, unless you are deliberately using some specific Python runtime feature, e.g. Idle.
    
    2 0 Reply

Saturday 13th June 2020 10:27 GMT Anonymous Coward

Does it error if it can't do it?

Or does it, as neural nets have a habit of doing, just produce output regardless of whether it's correct? For example, if you had some C++ code with its own complex non-STL data structures involving raw pointers into memory-mapped files, shared memory or similar, you simply cannot do a line-for-line translation into Java or Python; it won't work. You have to understand the code and, for want of a better word, paraphrase it, or even do a ground-up rewrite if the language you're translating to doesn't support the functionality.
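As an example of the kind of paraphrase meant, this hedged Python sketch reads fixed-size records from a memory-mapped file. In C or C++ this might be a pointer cast along the lines of `(Record *)(base + i * sizeof(Record))`, where `Record` is a hypothetical struct; Python has no raw pointers, so the same access is rewritten with the standard `mmap` and `struct` modules and explicit byte offsets. The record layout (a little-endian 32-bit id followed by a 64-bit float, no padding) is an assumption for the example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: little-endian uint32 id + float64 balance, no padding
RECORD = struct.Struct("<Id")


def write_records(path, records):
    with open(path, "wb") as f:
        for rec_id, balance in records:
            f.write(RECORD.pack(rec_id, balance))


def read_record(mm, index):
    # Stands in for indexing an array of structs through a pointer:
    # compute the byte offset explicitly and decode the bytes found there
    return RECORD.unpack_from(mm, index * RECORD.size)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, [(1, 10.5), (2, -3.25), (3, 99.0)])
        with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            print(read_record(mm, 1))  # (2, -3.25)
    finally:
        os.remove(path)
```

Any pointers stored inside such records would have to become offsets or indices, which is a redesign rather than a translation.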

2 1 Reply

Saturday 13th June 2020 18:41 GMT RichardEM

from Intel to ARM

I wonder if this type of translation could help in Apple's move from Intel-based Macs to ARM-based Macs?

0 0 Reply

Sunday 14th June 2020 23:45 GMT Richard 12

Re: from Intel to ARM

Only if the set of goals does not include "actually works"

This is an interesting experiment, and I'm sure much will be learned from it that can be applied to other problems.

However, it will not be directly useful in and of itself within the next 20 years, if ever.

1 1 Reply

Saturday 13th June 2020 19:44 GMT Robert Grant

Time to test it

on the Facebook codebase, then? PHP to Rust, perhaps?

1 0 Reply

Sunday 14th June 2020 06:19 GMT YetAnotherJoeBlow

Even better

Now for something really useful - Java to C. Ditch all the frameworks.

2 1 Reply

Sunday 14th June 2020 21:41 GMT Anonymous Coward

I worked on a system that had been largely written in Pascal and converted to C with pascal2c. While the converted code compiled and ran, it was impossible to maintain in its C form. That left us tacking on new functionality around the edges, with little or no interaction with the core code. I'd like to know how this FB system, with 75% confidence that the conversion even works, is any advancement.

2 0 Reply

Monday 15th June 2020 06:48 GMT John Smith 19

"they are not guaranteed to be correct."

75%

Is that better than some s**t-flinging code monkey cutting and pasting off Stack Exchange?

So let me see if I got this straight.

It takes a function (which is presumed to be 100% working) and maybe converts it to one that has a 75% chance of working, but could be f**ked up in some way?
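To put the per-function figure in context: if one assumes, as a simplification, that each function is translated correctly with probability 0.748 independently of the others, then the chance that every function in a k-function module comes out right is 0.748^k, which is roughly 5.5 per cent for just ten functions. A short Python check of that arithmetic:

```python
def all_correct_probability(per_function_accuracy: float, n_functions: int) -> float:
    # Assumes each function's translation succeeds independently (a simplification)
    return per_function_accuracy ** n_functions


for k in (1, 5, 10, 50):
    print(k, round(all_correct_probability(0.748, k), 4))
```

Real failures are unlikely to be independent, so this is an illustration of how per-function error rates compound, not an estimate for any actual code base.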

One of the usual suspects (EDS perhaps?) was doing something like this, where chunks of COBOL were emailed to their mainframe and chunks of C came back. These are big applications. I'm very unconvinced a bunch of ANNs is going to cut it for this. Computer languages <> human languages: 99% of human-language ambiguity is designed out of them from day one to ensure they can be compiled.

1 1 Reply

Monday 15th June 2020 11:17 GMT Spoonsinger

I can just imagine the joys...

of maintaining a set of source which has been generated from a totally different language. Glad it's not going to be me.

3 0 Reply

Monday 15th June 2020 20:18 GMT John Smith 19

Back in the day the sign of a good CASE tool was...

If you could make all the manipulations you needed without digging into the code the system generated.

I worked with 2 such systems.

One never touched it. No problems.

Other. Some nappy had bought the code generated by the CASE model rather than the model itself.

It was f**king horrible.

0 1 Reply
