> If you’ve ever talked to me in person, you’d know that I’m a disbeliever of AI replacing decompilers any time soon
Decompilation, seen as a translation problem, is by all means a job that suits AI methods. Give researchers time to gather enough mappings between source code and machine code and to get used to training large predictive models, and you shall see top-notch decompilers that beat all engineered methods.
> Give researchers time to gather enough mappings between source code and machine code and to get used to training large predictive models, and you shall see top-notch decompilers that beat all engineered methods.
Not anytime soon. There is more to a decompiler than assembly being converted to some language. File parsers, disassemblers, type reconstruction, etc. are all pieces of functionality that have to run before machine code can be converted into even the most basic decompiler output.
> Decompilation, seen as a translation problem, is by all means a job that suits AI methods.
Compilation is also a translation problem, but I think many people would be leery of an LLM-based rustc or clang -- perhaps simply because they're more familiar with the complexities involved in compilation than with those involved in decompilation.
(Not to say it won't eventually happen in some form.)
Typical obfuscation, sure. VMs, obfuscation, and everything in between are just noise to AI.
It's pattern matching, plain and simple: an area where AI excels. AI-driven decompilation is absolutely on its way.
It's also perfect for RL because the model can compile its output and check it against the input. It's a translation exercise where there's already a perfect machine translator in one direction.
It probably just hasn't happened because decompilation is not a particularly useful thing for the vast majority of people.
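The round-trip reward idea above can be sketched in a few lines. Everything here is invented for illustration: a toy expression "language", a toy stack "ISA", and a trivial deterministic compiler standing in for the real one. The point is only the shape of the reward signal: recompile the candidate decompilation and score it by how well the result matches the original machine code.

```python
def compile_expr(src: str) -> list[str]:
    """The known-good direction: 'compile' 'a + b + c' into toy stack-machine ops."""
    terms = src.split("+")
    ops = [f"PUSH {t.strip()}" for t in terms]
    ops += ["ADD"] * (len(terms) - 1)
    return ops

def reward(target_code: list[str], candidate_src: str) -> float:
    """RL reward for a candidate decompilation: recompile it and measure
    agreement with the target machine code (1.0 = perfect round trip)."""
    recompiled = compile_expr(candidate_src)
    matches = sum(a == b for a, b in zip(target_code, recompiled))
    return matches / max(len(target_code), len(recompiled))

target = compile_expr("x + y + z")
print(reward(target, "x + y + z"))  # exact round trip -> 1.0
print(reward(target, "x + y"))      # partial match -> 0.4
```

A real setup would compile with an actual toolchain and compare at the instruction or semantic level (byte-exact matches are rare across compiler versions and flags), but the training signal has the same structure.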
Maybe in conjunction with a deterministic decompiler.
Precision wrt translation, especially when the translation is not 1-to-1, is not excellent with LLMs.
In fact, their lack of precision is what makes them so good at translating natural languages!
“Resurgence” not “resurgance”. I wanted to leave a comment in the article itself but it wants me to sign in with GitHub, which: yuk, so I’m commenting here instead.