Today is a good day to code

Trials and Tribulations of a PHP Interpreter / Compiler Writer

Posted: June 7th, 2009 | Filed under: android, java, Programming

For the past couple of months, I have been struggling to write a PHP interpreter for an upcoming Android version of Mides. The tokenizer was fairly easy and building the Abstract Syntax Tree was relatively uncomplicated, but now I have come to actually writing the executor, and that is giving me pause.
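For readers who haven't written one, here is roughly what the "fairly easy" tokenizer stage looks like. This is a minimal sketch in Java, the language Android apps are written in. It assumes a tiny hypothetical PHP subset: variables, integers, single-quoted strings, identifiers and a handful of operators. The `MiniTokenizer`, `Token` and `TokenType` names are made up for illustration and are not Mides code.

```java
import java.util.ArrayList;
import java.util.List;

// Token categories for a tiny, hypothetical PHP subset (not full PHP).
enum TokenType { VARIABLE, NUMBER, STRING, IDENTIFIER, OPERATOR, PUNCTUATION }

final class Token {
    final TokenType type;
    final String text;

    Token(TokenType type, String text) {
        this.type = type;
        this.text = text;
    }

    @Override
    public String toString() {
        return type + ":" + text;
    }
}

final class MiniTokenizer {
    // Scans the source left to right, emitting one Token per lexeme.
    static List<Token> tokenize(String src) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not itself a token
            } else if (c == '$' || c == '_' || Character.isLetter(c)) {
                i++; // consume the '$' sigil or the first letter
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) {
                    i++;
                }
                TokenType t = (c == '$') ? TokenType.VARIABLE : TokenType.IDENTIFIER;
                out.add(new Token(t, src.substring(start, i)));
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                out.add(new Token(TokenType.NUMBER, src.substring(start, i)));
            } else if (c == '\'') {
                // Single-quoted string; escape sequences are ignored in this sketch.
                int end = src.indexOf('\'', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated string at " + i);
                }
                out.add(new Token(TokenType.STRING, src.substring(i + 1, end)));
                i = end + 1;
            } else if ("+-*/=.<>!".indexOf(c) >= 0) {
                i++;
                // Fold a trailing '=' into two-character operators like ==, !=, <=, .=
                if (i < src.length() && src.charAt(i) == '=') {
                    i++;
                }
                out.add(new Token(TokenType.OPERATOR, src.substring(start, i)));
            } else if ("(){};,".indexOf(c) >= 0) {
                i++;
                out.add(new Token(TokenType.PUNCTUATION, String.valueOf(c)));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        return out;
    }
}
```

A real PHP lexer also has to handle `<?php` tags, heredocs, comments and three-character operators like `===`. This sketch would split `===` into `==` and `=`.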

I totally understand why Joel Spolsky says that there are two classes of programmers: one class that sort of hacks its way through life, and another that writes compilers for fun. I guess I am trying to blunder my way into the latter class. At first it didn't seem so difficult, but it has gradually gotten so hard that sometimes I feel like dropping the whole thing entirely. However, when I feel like bashing my head into the desk and/or throwing my MacBook Pro out a window, I know I am on the right track. That is how I felt when I was trying to learn Java and Objective-C a few years ago. I felt the same way when I was attempting to understand domain modeling as it pertains to Object Oriented Programming before I actually understood Object Oriented Programming.

I guess these are just things that we all must get through. I also see why most CS majors never actually finish their compilers. I, however, have a dream that I can develop web applications from my handheld without being encumbered by any sort of connection.

The first thing that slowed me down was my approach to the executor. I was trying to compile the program to opcodes and then run them against the system, sort of like an inline asm block in C. That won't work with Dalvik, at least not in an elegant way. Then I asked a question on StackOverflow, and the answer made me think. I don't think I quite got it immediately, but now I understand. It basically comes down to a large hashmap of keys (variable names) and their values (the statements to evaluate into them). OK, I get that, but the new problem I am facing with the top-down recursive descent parser is how to handle classes and functions efficiently.
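The idea from that answer can be sketched as a tree-walking interpreter. Instead of compiling to opcodes, each AST node evaluates itself directly against a `HashMap` keyed by variable name. This is a minimal Java sketch under my own assumptions and handles only integer literals, variables, addition and assignment. The node classes (`Num`, `Var`, `Add`, `Assign`) and `TreeWalker` are hypothetical. In this sketch the map holds the values the statements evaluate to, not the statements themselves.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical AST for a tiny PHP subset, evaluated directly (tree-walking)
// instead of being compiled to opcodes first.
interface Node {
    Object eval(Map<String, Object> env);
}

final class Num implements Node {
    private final long value;

    Num(long value) { this.value = value; }

    public Object eval(Map<String, Object> env) { return value; }
}

final class Var implements Node {
    private final String name;

    Var(String name) { this.name = name; }

    // Undefined variables simply come back as null in this sketch.
    public Object eval(Map<String, Object> env) { return env.get(name); }
}

final class Add implements Node {
    private final Node left;
    private final Node right;

    Add(Node left, Node right) {
        this.left = left;
        this.right = right;
    }

    // Assumes both operands evaluate to integers (Long).
    public Object eval(Map<String, Object> env) {
        return (Long) left.eval(env) + (Long) right.eval(env);
    }
}

final class Assign implements Node {
    private final String name;
    private final Node value;

    Assign(String name, Node value) {
        this.name = name;
        this.value = value;
    }

    public Object eval(Map<String, Object> env) {
        Object v = value.eval(env); // evaluate the right-hand side first
        env.put(name, v);           // then bind the result to the variable name
        return v;
    }
}

final class TreeWalker {
    // Runs each statement in order against one global variable table.
    static Map<String, Object> run(List<Node> program) {
        Map<String, Object> globals = new HashMap<>();
        for (Node stmt : program) {
            stmt.eval(globals);
        }
        return globals;
    }
}
```

For `$a = 2; $b = $a + 40;`, running the two `Assign` nodes leaves `$a` bound to 2 and `$b` bound to 42 in the global map.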

I realize that a class is generally just a new hashmap of values, scoped to a variable handle in the global hashmap. A function is similar in that it has its own scope that is not shared with other objects in the main hashmap. The recursion here can make your head hurt. The problem I am facing now is this: if I have a statement (a series of tokens) mapped to a variable, how do I handle a) parameters, and b) avoiding re-interpreting the tokens each time the class is instantiated? Another issue is how to keep the scope of one instance separate from another's, except for static variables, which the instances share through the hashmap. I am looking at V8's source to try to understand the concepts behind its VM. I think that is how they boosted speed so dramatically for object-oriented JavaScript applications: they must parse those tokens only once. Still, I can see all kinds of bugs I might introduce into my fledgling interpreter by doing that.
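One common way to approach parameters, per-instance scope and statics is sketched below in Java. The sketch makes three assumptions:

- Each function call gets a fresh local scope, with parameters bound by position.
- Each instance copies the class's default properties into its own map.
- Static variables live in a single map on the class definition that all instances share.

The class definition (`ClassDef`) is built once, and only `instantiate()` runs per `new`, so nothing is re-tokenized or re-parsed on instantiation. All names here are hypothetical. Method bodies are Java lambdas standing in for already-parsed AST nodes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical runtime structures for PHP-style functions and classes.
// Method bodies are Java lambdas standing in for AST nodes that a real
// interpreter would build once at parse time and reuse on every call.
final class Scope {
    private final Map<String, Object> vars = new HashMap<>();

    Object get(String name) { return vars.get(name); }

    void set(String name, Object value) { vars.put(name, value); }
}

final class FunctionDef {
    private final List<String> params;
    private final Function<Scope, Object> body;

    FunctionDef(List<String> params, Function<Scope, Object> body) {
        this.params = params;
        this.body = body;
    }

    // Every call gets a fresh local scope, so locals never leak between calls.
    // Parameters are bound by position; missing arguments become null.
    Object call(Object self, List<Object> args) {
        Scope local = new Scope();
        if (self != null) {
            local.set("$this", self);
        }
        for (int i = 0; i < params.size(); i++) {
            local.set(params.get(i), i < args.size() ? args.get(i) : null);
        }
        return body.apply(local);
    }
}

final class ClassDef {
    final Map<String, Object> defaultProps;                   // copied into each instance
    final Map<String, Object> statics = new HashMap<>();      // one map shared by all instances
    final Map<String, FunctionDef> methods = new HashMap<>(); // defined once per class

    ClassDef(Map<String, Object> defaultProps) {
        this.defaultProps = defaultProps;
    }

    // Instantiation only copies property defaults; nothing is re-parsed.
    Instance instantiate() {
        return new Instance(this);
    }
}

final class Instance {
    final ClassDef cls;
    final Map<String, Object> props; // per-instance state

    Instance(ClassDef cls) {
        this.cls = cls;
        this.props = new HashMap<>(cls.defaultProps);
    }

    Object callMethod(String name, Object... args) {
        return cls.methods.get(name).call(this, Arrays.asList(args));
    }
}

final class CounterExample {
    // Roughly: class Counter { public $count = 0; static $calls = 0;
    //          function increment($by) { self::$calls++; return $this->count += $by; } }
    static ClassDef build() {
        ClassDef counter = new ClassDef(Map.of("count", 0L));
        counter.statics.put("calls", 0L);
        counter.methods.put("increment", new FunctionDef(List.of("$by"), scope -> {
            Instance self = (Instance) scope.get("$this");
            self.cls.statics.put("calls", (Long) self.cls.statics.get("calls") + 1);
            long next = (Long) self.props.get("count") + (Long) scope.get("$by");
            self.props.put("count", next);
            return next;
        }));
        return counter;
    }

    public static void main(String[] args) {
        ClassDef counter = build();
        Instance a = counter.instantiate();
        Instance b = counter.instantiate();
        a.callMethod("increment", 5L);
        a.callMethod("increment", 1L);
        b.callMethod("increment", 2L);
        System.out.println(a.props.get("count") + " " + b.props.get("count") + " "
                + counter.statics.get("calls"));
    }
}
```

Running `CounterExample` prints `6 2 3`. The two instances keep separate `count` values, while the static `calls` counter is shared between them.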

I guess I am starting to understand why PHP has taken years to get where it is today, why Ruby still has quite a few oddities in the language, and why Python basically needed a complete rewrite for version 3. This is not easy stuff, but nothing worthwhile is. I had never considered the possibility that this might not fully work in any practical sense. When I do release the Android version of Mides, the PHP parser functionality is likely to be in beta. I did look at the code for Quercus once, but without some sort of object diagram, I can't really understand what I am looking at. Much of the code is mixed up with the server code, so it is difficult to figure out what is doing what. I have read a few pages of the Mak book, and I'm thinking about buying it just to see how he implemented the parser, but I am not sure it will do me any good.

I may create my own language for use with Mides, and then write compilers that generate PHP and Ruby source code from it, while I work on versions for Apache, Nginx, etc. In lots of ways I think it would be much easier, and I could offer type checking and other features from compiled languages that help build more robust code. I would maintain the PHP documentation that is present in Mides, and naturally you could FTP the code up to your server and test it there. But I think that would be copping out, and I don't like letting something like this beat me. I do, however, have to realize that PHP wasn't built in a day, and that this process will not happen overnight.
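The alternative route, a small typed language that compiles to PHP or Ruby source, is also easy to sketch. This is a hypothetical Java illustration, not a design for Mides. The "language" has only typed assignments of integer and string literals. The checker rejects a value whose type doesn't match its declaration, and two emitters print equivalent PHP and Ruby. It assumes variable names are lowercase identifiers, which are valid in both targets.

```java
import java.util.List;

// A hypothetical mini-language with only typed assignments of literals.
enum LangType { INT, STRING }

final class Literal {
    final LangType type;
    final String text;

    Literal(LangType type, String text) {
        this.type = type;
        this.text = text;
    }

    // Single-quoted strings behave the same in PHP and Ruby: only \\ and \' are escapes.
    String emit() {
        if (type == LangType.INT) {
            return text;
        }
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }
}

final class Let {
    final String name; // assumed to be a lowercase identifier, valid in both targets
    final LangType declared;
    final Literal value;

    Let(String name, LangType declared, Literal value) {
        this.name = name;
        this.declared = declared;
        this.value = value;
    }
}

final class Transpiler {
    // Rejects a literal whose type doesn't match its declared type,
    // before any target code is emitted.
    static void typeCheck(List<Let> program) {
        for (Let let : program) {
            if (let.declared != let.value.type) {
                throw new IllegalStateException("Type error: " + let.name + " is declared "
                        + let.declared + " but assigned " + let.value.type);
            }
        }
    }

    static String toPhp(List<Let> program) {
        typeCheck(program);
        StringBuilder sb = new StringBuilder("<?php\n");
        for (Let let : program) {
            sb.append('$').append(let.name).append(" = ").append(let.value.emit()).append(";\n");
        }
        return sb.toString();
    }

    static String toRuby(List<Let> program) {
        typeCheck(program);
        StringBuilder sb = new StringBuilder();
        for (Let let : program) {
            sb.append(let.name).append(" = ").append(let.value.emit()).append('\n');
        }
        return sb.toString();
    }
}
```

Both emitters use single-quoted strings. In PHP and Ruby alike, only `\'` and `\\` are escapes inside single quotes, so one escaping routine serves both targets.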

I keep asking myself what Steve Jobs would do, and the answer I keep coming up with is that he would go his own way. Well, I'll keep stewing on it. Hopefully the end result is that I will become the sort of programmer Joel would want to hire, although somehow I doubt I could ever achieve that level of software engineering excellence ;-).