A Ghidra loader for the Linear eXecutable format

Oct. 11, 2019, 5:27 p.m.

I was recently looking at some old games. Really old games: games that ran on DOS. When I picked one at random, it turned out that it was built using the DOS4GW extender.

When I was younger (almost 10 years ago), I helped a little with porting swars to modern operating systems. Looking around, I couldn’t find any free loader for the LE/LX format, so I decided to write one.

TLDR
Here is the loader. All information about installing and running the loader can be found in the README file.

Why Ghidra?
Basically, why not? It’s open source and works on FreeBSD! :)

Options
I looked at ReWolf's blog, which mentions that IDA Free 4.1 can be used to disassemble the LX format. The only problem is that this version of IDA is super old: so old that it’s still a console application. Fortunately, an IDA 4.1 database can be loaded into IDA 5.0, which is a little more convenient to use. Unfortunately, it can’t be loaded into any more modern IDA version. As far as I know, you can buy an LX/LE loader from Hex-Rays, but it’s not cheap.

The Boomerang decompiler has LX/LE parsing, so I used it as my starting point. Unfortunately, its loader does not support all sections and takes a big shortcut when parsing the page table:

// at this point we're supposed to read in the page table and fuss around with it
// but I'm just going to assume the file is flat.

I decided to implement full support for page tables.
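To make the difference from the "assume the file is flat" shortcut concrete, here is a minimal Java sketch of building one object's memory image by following the page table. It assumes the object table entry gives a 1-based index into the page table plus a page count, that page table entries have already been decoded to 1-based page numbers, that data pages are stored back to back from `dataPagesOffset`, and that anything not backed by file data is zero-filled. The class, method, and parameter names are hypothetical, not the loader's actual code.

```java
/**
 * Hypothetical helper: assembles one object's in-memory image by following
 * the object page table instead of assuming the pages are laid out flat.
 */
final class ObjectImageBuilder {

    static byte[] build(byte[] file, int dataPagesOffset, int pageSize,
                        int[] pageNumbers, int firstPageIndex, int pageCount,
                        int virtualSize) {
        // New Java arrays are zero-filled, so bytes not covered by pages stay 0.
        byte[] image = new byte[virtualSize];
        for (int i = 0; i < pageCount; i++) {
            // firstPageIndex is 1-based, as in the object table.
            int pageNumber = pageNumbers[firstPageIndex - 1 + i];
            if (pageNumber == 0) {
                continue; // assumption: page number 0 means "no data in the file"
            }
            int src = dataPagesOffset + (pageNumber - 1) * pageSize;
            int dst = i * pageSize;
            // Clamp to the end of the file (short last page) and of the object.
            int len = Math.min(pageSize, Math.min(file.length - src, virtualSize - dst));
            if (len > 0) {
                System.arraycopy(file, src, image, dst, len);
            }
        }
        return image;
    }
}
```

With a page table of `{2, 1}`, the object's first page comes from the second data page in the file, which is exactly the case a flat read gets wrong.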

As far as I know, neither Binary Ninja nor Ghidra supports the LE/LX format.

Writing a Ghidra loader
I had never implemented a Ghidra loader before, so I did some research. There is a nice tutorial about writing loaders here. To be honest, I much prefer to experiment, so I mostly based my code on open-source loaders like GhidraPS4Loader and mclf-ghidra-loader.
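One of the first things any loader has to do is decide whether it can handle a file at all (in Ghidra this check typically lives in the loader's `findSupportedLoadSpecs` method). Here is a standalone sketch of such a probe for LE/LX, assuming the usual bound-executable layout: an MZ stub whose 32-bit field at offset 0x3C points to the "LE" or "LX" signature. Some extender stubs may not follow this convention. `LinearExeProbe` and `detect` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical check a loader could run before claiming a file. */
final class LinearExeProbe {

    /** Returns "LE", "LX", or null if the file does not look like one. */
    static String detect(byte[] file) {
        if (file.length < 0x40 || file[0] != 'M' || file[1] != 'Z') {
            return null; // no MZ stub
        }
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        // Assumption: offset of the LE/LX header is stored at 0x3C (e_lfanew).
        long headerOffset = Integer.toUnsignedLong(buf.getInt(0x3C));
        if (headerOffset + 2 > file.length) {
            return null; // header offset points past the end of the file
        }
        char b0 = (char) file[(int) headerOffset];
        char b1 = (char) file[(int) headerOffset + 1];
        if (b0 == 'L' && (b1 == 'E' || b1 == 'X')) {
            return "L" + b1;
        }
        return null;
    }
}
```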

Documentation problems
Fortunately, the documentation for the format is published. However, it is not very detailed for LX/LE, so I was very happy to be able to cross-check against the output of IDA and the Boomerang decompiler. I also noticed some mistakes in the documentation.

The first mistake I noticed was that each ObjectPageTable entry was documented as a 64-bit structure:
      63               32 31       16 15      0
       +-----+-----+-----+-----+-----+-----+-----+-----+
   00h |    PAGE DATA OFFSET   | DATA SIZE |   FLAGS   |
       +-----+-----+-----+-----+-----+-----+-----+-----+

While parsing the file, I noticed that the FLAGS field took on a lot of different, undocumented values. When I started parsing each entry as 32 bits instead, everything looked right.
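The 64-bit layout in the diagram matches what LX files use. For comparison, here is a sketch that decodes both an 8-byte LX entry and a 4-byte LE entry. The LE split is an assumption to verify against real binaries: a 24-bit page number stored high byte first, followed by an 8-bit flags byte. This is not the loader's actual code, and the class and field names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical decoder for object page table entries. */
final class PageTableParser {

    static final class Entry {
        final long pageDataOffset; // LX: offset before shifting; LE: 1-based page number
        final int dataSize;        // LX only; -1 for LE
        final int flags;

        Entry(long pageDataOffset, int dataSize, int flags) {
            this.pageDataOffset = pageDataOffset;
            this.dataSize = dataSize;
            this.flags = flags;
        }
    }

    /** LX: 8-byte entry = u32 offset, u16 data size, u16 flags (little-endian). */
    static Entry parseLx(byte[] table, int index) {
        ByteBuffer b = ByteBuffer.wrap(table).order(ByteOrder.LITTLE_ENDIAN);
        int base = index * 8;
        return new Entry(Integer.toUnsignedLong(b.getInt(base)),
                         Short.toUnsignedInt(b.getShort(base + 4)),
                         Short.toUnsignedInt(b.getShort(base + 6)));
    }

    /** LE (assumed layout): 4-byte entry = 24-bit page number, high byte first, then u8 flags. */
    static Entry parseLe(byte[] table, int index) {
        int base = index * 4;
        int pageNumber = (Byte.toUnsignedInt(table[base]) << 16)
                | (Byte.toUnsignedInt(table[base + 1]) << 8)
                | Byte.toUnsignedInt(table[base + 2]);
        return new Entry(pageNumber, -1, Byte.toUnsignedInt(table[base + 3]));
    }
}
```

If an LE table is read with the 8-byte layout, each "entry" swallows two real entries, so the FLAGS halfword ends up holding the next entry's low page-number byte and flags byte. That would plausibly explain the stream of undocumented flag values.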

Another odd thing is the PAGE OFFSET SHIFT field. According to the documentation, it is defined as:
Every  page begins on a PAGE OFFSET SHIFT  boundary  aligned offset
from the demand loaded pages base specified in the linear EXE header.

This doesn't make much sense to me, and I think this field is simply the size of the last page, as documented here (the website is kind of weird and has a lot of ads, sorry for that!).
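Under that reading, the two formats locate page data differently. Here is a sketch of both, assuming that in LE pages are stored back to back with every page `pageSize` bytes except the last (whose size comes from the field the LX documentation calls PAGE OFFSET SHIFT), while in LX the entry's offset is shifted left by PAGE OFFSET SHIFT and added to the data pages base. `PageLocator` and its methods are hypothetical.

```java
/** Hypothetical helpers for locating page data in the file. */
final class PageLocator {

    /** LE (assumed): 1-based page N starts (N - 1) full pages after the data pages base. */
    static long leOffset(long dataPagesOffset, int pageSize, int pageNumber) {
        return dataPagesOffset + (long) (pageNumber - 1) * pageSize;
    }

    /** LE (assumed): only the last page uses the "last page size" header field. */
    static int leSize(int pageSize, int lastPageSize, int pageNumber, int totalPages) {
        return pageNumber == totalPages ? lastPageSize : pageSize;
    }

    /** LX: the page table offset is scaled by PAGE OFFSET SHIFT before adding the base. */
    static long lxOffset(long dataPagesOffset, long pageDataOffset, int pageOffsetShift) {
        return dataPagesOffset + (pageDataOffset << pageOffsetShift);
    }
}
```

For example, with a shift of 9, an LX page table offset of 5 lands 5 × 512 = 2560 bytes past the data pages base.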

Upstreaming
While writing this post, I noticed that there is a stub for LX in Ghidra. If there is a lot of interest from the community, I may upstream the loader to the Ghidra project.

Future work
I was able to decompile a few games using this loader. I compared the output of my loader with IDA 4.1 Freeware and they look pretty much the same, but there is still some work to do. For the up-to-date to-do list, please visit the GitHub page:
- Implement the rest of the source types.
- Implement entry points (in the binaries I tested, that section was all zeros, so if you have another binary, please contact me).
- Implement the debug section.
- Code cleanup (I was rewriting everything from the spec, and some methods/names could be better named).
- Don't use hardcoded values.
- I'm not a Java developer (get rid of the emitU functions).
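Assuming "source types" here refers to the SRC byte of LE/LX fixup records, one way to cover the remaining ones is a small decoder that maps the low nibble to a fixup kind and exposes the flag bits. The values follow my reading of the published LX specification (low nibble = kind; 0x10 = fixup to alias; 0x20 = source list). The class, enum, and method names are hypothetical.

```java
/** Hypothetical decoder for the SRC byte of an LE/LX fixup record. */
final class FixupSource {
    static final int SOURCE_MASK = 0x0F;
    static final int ALIAS_FLAG = 0x10;       // fixup to a 16:16 alias
    static final int SOURCE_LIST_FLAG = 0x20; // record carries a list of source offsets

    enum Kind {
        BYTE(1), SELECTOR_16(2), POINTER_16_16(4), OFFSET_16(2),
        POINTER_16_32(6), OFFSET_32(4), SELF_RELATIVE_32(4);

        final int patchSize; // bytes written at the fixup site

        Kind(int patchSize) {
            this.patchSize = patchSize;
        }
    }

    /** Maps the low nibble of SRC to a fixup kind; throws on undefined values. */
    static Kind decode(int src) {
        switch (src & SOURCE_MASK) {
            case 0x00: return Kind.BYTE;
            case 0x02: return Kind.SELECTOR_16;
            case 0x03: return Kind.POINTER_16_16;
            case 0x05: return Kind.OFFSET_16;
            case 0x06: return Kind.POINTER_16_32;
            case 0x07: return Kind.OFFSET_32;
            case 0x08: return Kind.SELF_RELATIVE_32;
            default:
                throw new IllegalArgumentException("undefined fixup source type 0x"
                        + Integer.toHexString(src & SOURCE_MASK));
        }
    }

    static boolean isAlias(int src) {
        return (src & ALIAS_FLAG) != 0;
    }

    static boolean hasSourceList(int src) {
        return (src & SOURCE_LIST_FLAG) != 0;
    }
}
```

Masking the flags off before the switch means a source-list 32-bit offset fixup (0x27) decodes to the same kind as a plain one (0x07), and the caller checks `hasSourceList` separately.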

Have fun hacking! :)