I'm making a compiler!

May 10, 2017

It’s been almost 3 months since I joined Intel as a BIOS developer. I like it here, and mostly I’m relieved that later this year I will celebrate my 30th birthday without living in my parents’ house. Not that there’s anything wrong with that, but my sense of self-worth was starting to take a hit because I was not being a productive member of the economy.

I have also been working on Milton. I still love my little paint program, and if the gods permit, I may get off my proverbial ass and do a new release.

But today I’m not here to write about Intel or Milton. I am here to write about my shiny new project! It is a compiler for a superset of the C programming language.

Here’s the plan: take the standard C99 that we all know and (mostly) love, and add some new language features. Nothing crazy: mostly things I often wish I had whenever I write C, but not so many that the result stops feeling like C.

One thing that would be nice in C is letting the compiler pick the order of struct members to optimize memory layout. We nearly always organize struct members conceptually and don’t care much about where they end up in memory. I haven’t given much thought to syntax, but perhaps something like this would work:

auto struct MyStruct {
    char bar;
    int foo;

    char my_pretty_variable;
};

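In plain C, keeping the members in this order would put 3 bytes of padding before foo (assuming a 4-byte int with 4-byte alignment, as on x86) and 3 more at the end, for 12 bytes in total. Instead, this compiler would emit code equivalent to the following, which fits in 8 bytes:
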
typedef struct MyStruct {
    int foo;
    char my_pretty_variable; // No padding!
    char bar;
    char pad[2]; // Use #pragma pack if you really
                 // want to get rid of the padding
                 // at the end
} MyStruct;

Notice also that auto struct doubles as syntactic sugar for a typedef.
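One simple way a compiler could choose the order is to sort members by decreasing alignment. When alignments are powers of two and every size is a multiple of its alignment (true for C’s scalar types), this leaves no interior padding, only possibly some at the end. The sketch below is a hypothetical illustration of that heuristic, not scc’s code: the Member table, layout_members, and auto_struct_layout are made-up names, and the example sizes assume a 4-byte, 4-byte-aligned int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-member info a compiler might collect while parsing a struct. */
typedef struct {
    const char *name;
    size_t size;
    size_t align;      /* must be a power of two */
    size_t decl_index; /* position in the source; used to break ties */
    size_t offset;     /* filled in by layout_members() */
} Member;

static size_t align_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Assigns offsets in the array's current order and returns the struct size,
   including trailing padding so arrays of the struct stay aligned. */
static size_t layout_members(Member *m, size_t count) {
    size_t offset = 0, max_align = 1;
    for (size_t i = 0; i < count; i++) {
        offset = align_up(offset, m[i].align);
        m[i].offset = offset;
        offset += m[i].size;
        if (m[i].align > max_align) max_align = m[i].align;
    }
    return align_up(offset, max_align);
}

/* qsort comparator: larger alignment first, then declaration order. */
static int by_align_desc(const void *a, const void *b) {
    const Member *x = a, *y = b;
    if (x->align != y->align) return x->align < y->align ? 1 : -1;
    return (x->decl_index > y->decl_index) - (x->decl_index < y->decl_index);
}

/* What an auto struct might do: reorder members, then lay them out. */
static size_t auto_struct_layout(Member *m, size_t count) {
    for (size_t i = 0; i < count; i++) m[i].decl_index = i;
    qsort(m, count, sizeof *m, by_align_desc);
    return layout_members(m, count);
}
```

With the three members of MyStruct, declaration order gives 12 bytes, while the sorted order (foo, bar, my_pretty_variable) gives 8. Breaking ties on declaration order is just a choice that keeps the layout deterministic; the order shown above, with my_pretty_variable before bar, is equally compact.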

Speaking of auto, C++’s auto for type inference is a good thing. Let’s add that!
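Inferring the type for auto x = expr; mostly means the type checker reuses the type it already computes for the initializer. A minimal sketch, assuming a made-up three-type AST (Type, Expr, and infer_type are hypothetical names) and a heavily simplified version of C’s usual arithmetic conversions that ignores integer promotions and unsigned types:

```c
#include <assert.h>
#include <stddef.h>

/* A tiny slice of C's arithmetic types, listed so a "wider" type compares greater. */
typedef enum { TY_INT, TY_LONG, TY_DOUBLE } Type;

typedef enum { EXPR_LITERAL, EXPR_VAR, EXPR_BINARY } ExprKind;

typedef struct Expr {
    ExprKind kind;
    Type type;                    /* used by literals and variables */
    const struct Expr *lhs, *rhs; /* used by binary operators */
} Expr;

/* Simplified usual arithmetic conversions: a binary operator's result
   has the wider of its two operand types. */
static Type infer_type(const Expr *e) {
    if (e->kind != EXPR_BINARY) return e->type;
    Type l = infer_type(e->lhs);
    Type r = infer_type(e->rhs);
    return l > r ? l : r;
}
```

For auto y = n * 2.0; with n declared long, infer_type yields TY_DOUBLE, so y would be declared double.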

I want scc (by the way, I’m calling it scc) to prioritize compilation speed and runtime safety over squeezing out maximum runtime performance. Say you write

int arr[10];
/* do stuff */
x = arr[i];

scc should generate code that is similar to this:

int arr[10] = {0};
/* do stuff */
if ( i >= 0 && i < 10 ) {
    x = arr[i];
} else {
    // scc will call a handler for out-of-bounds access, then exit gracefully.
}

All variables should be zeroed out by default, and array accesses should be bounds-checked whenever the compiler knows the array’s size. It should also be possible to disable these checks with a #pragma.
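A checked access like the one above could be lowered to a call into a small runtime helper. Here is a minimal sketch in plain C; scc_check_index, scc_bounds_failure, the SCC_CHECKED macro, and the SCC_NO_BOUNDS_CHECKS switch (standing in for the proposed #pragma) are all hypothetical names. Taking the index as a signed long long means negative indices are caught as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical runtime handler: report the bad access, then exit. */
static void scc_bounds_failure(const char *file, int line,
                               long long index, size_t len) {
    fprintf(stderr, "%s:%d: index %lld is out of bounds for length %zu\n",
            file, line, index, len);
    exit(EXIT_FAILURE);
}

/* Returns the index unchanged if 0 <= index < len; otherwise never returns. */
static size_t scc_check_index(long long index, size_t len,
                              const char *file, int line) {
    if (index < 0 || (unsigned long long)index >= len)
        scc_bounds_failure(file, line, index, len);
    return (size_t)index;
}

#ifdef SCC_NO_BOUNDS_CHECKS /* stand-in for the "disable checks" #pragma */
#define SCC_CHECKED(arr, i) ((arr)[(i)])
#else
/* Only meaningful when arr is a real array, i.e. when its size is known.
   sizeof does not evaluate arr, so arr is still evaluated only once. */
#define SCC_CHECKED(arr, i) \
    ((arr)[scc_check_index((i), sizeof(arr) / sizeof((arr)[0]), \
                           __FILE__, __LINE__)])
#endif
```

For an int arr[10], x = SCC_CHECKED(arr, i); reads arr[i] when i is in range and otherwise prints a diagnostic and exits. Because the length comes from sizeof, passing a pointer instead of an array would compute a meaningless length, which mirrors the rule that checks only apply when the compiler knows the size.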

For pointers in general, it would be nice to implement something like Clang’s AddressSanitizer.
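AddressSanitizer’s core idea is shadow memory: every 8-byte granule of application memory has one shadow byte (found at (addr >> 3) + offset in the real tool) recording how many of its bytes are addressable, and the compiler inserts a check of that byte before each load and store. The toy sketch below models this over a fixed 256-byte arena; arena, shadow, unpoison, and checked_load are illustrative names, and the real tool reports and aborts rather than using assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 256
#define GRANULE 8 /* bytes covered by one shadow byte, as in ASan */

static unsigned char arena[ARENA_SIZE];
/* Shadow encoding: 0 = all 8 bytes addressable, 1..7 = only the first k
   bytes addressable, negative = none addressable (poisoned). */
static int8_t shadow[ARENA_SIZE / GRANULE];

static void poison_all(void) {
    for (size_t g = 0; g < ARENA_SIZE / GRANULE; g++) shadow[g] = -1;
}

/* Marks [offset, offset + size) addressable, e.g. after an allocation.
   Like ASan's allocator, this assumes offset is granule-aligned. */
static void unpoison(size_t offset, size_t size) {
    assert(offset % GRANULE == 0 && offset + size <= ARENA_SIZE);
    size_t g = offset / GRANULE;
    for (; size >= GRANULE; size -= GRANULE) shadow[g++] = 0;
    if (size > 0) shadow[g] = (int8_t)size;
}

/* The check a sanitizing compiler would insert before a 1-byte access. */
static bool is_addressable(size_t offset) {
    if (offset >= ARENA_SIZE) return false;
    int8_t s = shadow[offset / GRANULE];
    return s == 0 || (s > 0 && (int8_t)(offset % GRANULE) < s);
}

static unsigned char checked_load(size_t offset) {
    assert(is_addressable(offset) && "invalid memory access");
    return arena[offset];
}
```

After poison_all() and unpoison(16, 10), granule 2 gets shadow value 0 and granule 3 gets 2, so offset 25 is valid but offset 26, one past the end of the 10-byte region, is flagged.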

I would also like to support a couple of more far-fetched features. How about something like Intel’s ISPC for writing SIMD kernels? How about GPU kernels à la CUDA/OpenCL?

Current status

At the moment the compiler is about 1000 lines of C, but it’s already a working compiler. It parses simple integer expressions, respecting operator precedence, and emits x86 code, which gets linked into an .exe on Windows and an ELF binary on Linux. That is all there is right now. Tackling something as big as a C compiler can be overwhelming, so I’m starting with something short and simple and iteratively building it into something more complex.
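To give a feel for that stage, here is a minimal sketch of one common approach, precedence climbing, that turns an integer expression into stack-based x86-64 assembly (AT&T syntax, as accepted by GNU as). It is an illustration rather than scc’s actual code: Gen, emit, and compile_expr are hypothetical names, error handling is reduced to assert, and integer literals are assumed to fit in a signed 32-bit immediate.

```c
#include <assert.h>
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *src; /* current position in the expression text */
    char out[4096];  /* generated assembly */
    size_t len;
} Gen;

static void emit(Gen *g, const char *fmt, ...) {
    size_t room = sizeof g->out - g->len; /* always >= 1 */
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(g->out + g->len, room, fmt, ap);
    va_end(ap);
    if (n > 0) g->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

static void skip_ws(Gen *g) {
    while (isspace((unsigned char)*g->src)) g->src++;
}

static int precedence(char op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default: return 0; /* not a binary operator */
    }
}

static void parse_expr(Gen *g, int min_prec);

/* primary := integer | '(' expr ')'; leaves its value pushed on the stack */
static void parse_primary(Gen *g) {
    skip_ws(g);
    if (*g->src == '(') {
        g->src++;
        parse_expr(g, 1);
        skip_ws(g);
        assert(*g->src == ')' && "expected ')'");
        g->src++;
        return;
    }
    char *end;
    long value = strtol(g->src, &end, 10);
    assert(end != g->src && "expected a number");
    g->src = end;
    emit(g, "    pushq $%ld\n", value);
}

/* Precedence climbing: keep folding in operators that bind at least as
   tightly as min_prec. Parsing the right operand with prec + 1 makes
   operators of equal precedence left-associative. */
static void parse_expr(Gen *g, int min_prec) {
    parse_primary(g);
    for (;;) {
        skip_ws(g);
        char op = *g->src;
        int prec = precedence(op);
        if (prec == 0 || prec < min_prec) break;
        g->src++;
        parse_expr(g, prec + 1);
        emit(g, "    popq %%rcx\n    popq %%rax\n"); /* rcx = right, rax = left */
        switch (op) {
        case '+': emit(g, "    addq %%rcx, %%rax\n"); break;
        case '-': emit(g, "    subq %%rcx, %%rax\n"); break;
        case '*': emit(g, "    imulq %%rcx, %%rax\n"); break;
        case '/': emit(g, "    cqto\n    idivq %%rcx\n"); break;
        }
        emit(g, "    pushq %%rax\n");
    }
}

/* Emits a function body that returns the expression's value in %rax. */
static const char *compile_expr(Gen *g, const char *src) {
    g->src = src;
    g->len = 0;
    g->out[0] = '\0';
    parse_expr(g, 1);
    emit(g, "    popq %%rax\n    ret\n");
    return g->out;
}
```

For 1 + 2 * 3 the imulq is emitted before the addq, because the recursive call for the right operand of + (at precedence 2) consumes 2 * 3 first; with (1 + 2) * 3 the order flips. On x86-64 Linux, prefixing the output with .globl main and a main: label and assembling it with gcc should give a program whose exit status is the expression’s value (modulo 256).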

Closing thoughts


I took automata theory in college at least 5 times. (It’s really hard to get me to do homework, so I tended to fail a lot of classes.) I took a compiler class that didn’t get much further than simple parsers. Also, I bought a compiler book about a week ago. Other than that, I have no idea what I’m doing, and I wouldn’t be too surprised if I said something really dumb! I’m doing this to learn (but I secretly wish to conquer the world).

Tune in for the next blog post! You can also follow me on Twitter if you want. You don’t have to. Would be nice, though <3