> If you’ve ever talked to me in person, you’d know that I’m a disbeliever of AI replacing decompilers any time soon
Decompilation, seen as a translation problem, is by all means a job that suits AI methods. Give researchers time to gather enough mappings between source code and machine code and to get used to training large predictive models, and you shall see top-notch decompilers that beat all engineered methods.
> Give researchers time to gather enough mappings between source code and machine code and to get used to training large predictive models, and you shall see top-notch decompilers that beat all engineered methods.
Not anytime soon. There is more to a decompiler than assembly being converted to some language. File parsers, disassemblers, type reconstruction, etc. are all pieces of functionality that have to run before machine code can be converted into even the most basic decompiler output.
> Decompilation, seen as a translation problem, is by all means a job that suits AI methods.
Compilation is also a translation problem, but I think many people would be leery of an LLM-based rustc or clang -- perhaps simply because they're more familiar with the complexities involved in compilation than with those involved in decompilation.
(Not to say it won't eventually happen in some form.)
Typical obfuscation, sure. VMs, obfuscation, and everything in between are just noise to AI.
It's pattern matching, plain and simple: an area where AI excels. AI-driven decompilation is absolutely on its way.
It's also perfect for RL because the model can compile its output and check it against the input. It's a translation exercise where there's already a perfect machine translator in one direction.
It probably just hasn't happened because decompilation is not a particularly useful thing for the vast majority of people.
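The round-trip reward idea above can be sketched in a few lines. Everything here is invented for illustration: a toy expression "language", a toy stack "ISA", and a trivial deterministic compiler standing in for the real one. The point is only the shape of the reward signal: recompile the candidate decompilation and score it by how well the result matches the original machine code.

```python
def compile_expr(src: str) -> list[str]:
    """The known-good direction: 'compile' 'a + b + c' into toy stack-machine ops."""
    terms = src.split("+")
    ops = [f"PUSH {t.strip()}" for t in terms]
    ops += ["ADD"] * (len(terms) - 1)
    return ops

def reward(target_code: list[str], candidate_src: str) -> float:
    """RL reward for a candidate decompilation: recompile it and measure
    agreement with the target machine code (1.0 = perfect round trip)."""
    recompiled = compile_expr(candidate_src)
    matches = sum(a == b for a, b in zip(target_code, recompiled))
    return matches / max(len(target_code), len(recompiled))

target = compile_expr("x + y + z")
print(reward(target, "x + y + z"))  # exact round trip -> 1.0
print(reward(target, "x + y"))      # partial match -> 0.4
```

A real setup would compile with an actual toolchain and compare at the instruction or semantic level (byte-exact matches are rare across compiler versions and flags), but the training signal has the same structure.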
Maybe in conjunction with a deterministic decompiler.
Precision wrt translation, especially when the translation is not 1-to-1, is not excellent with LLMs.
In fact, their lack of precision is what makes them so good at translating natural languages!
“Resurgence” not “resurgance”. I wanted to leave a comment in the article itself but it wants me to sign in with GitHub, which: yuk, so I’m commenting here instead.