Corey Roth and Friends Blogs

Group site for developer blogs dealing with (usually) Ionic, .NET, SharePoint, Office 365, Mobile Development, and other Microsoft products, as well as some discussion of general programming related concepts.

Not Necessarily Dot Net

  • Which Lisp?


    Let's cut to the chase. This post is for people who, for whatever reason, have decided they want to learn lisp. Pretty much the first question that comes after that decision is "Which one?"

    Several people have tried to answer that question over the years. The answers seem to all boil down to either "Don't bother" or "It depends."

    Personally, I think that any decent programmer should be proficient in a wide variety and style of programming languages. And I don't mean C, C++, C#, and Java. Those are all practically the same language. I think that learning something in the lisp family is worthwhile for its own sake...though it is more than a little dangerous. It may make you hate whatever language you use in your day-job, because it's so much less expressive.

    Please Note: I hate religious flame wars about stupid things like which text editor or which programming language is better. This is an attempt at a balanced look at the current state of the major different versions of Lisp, as I understand it. I'm not going to pretend I'm a lisp expert--in this arena, I'm still a very humble beginner. It's just that I spent quite a bit of time researching this question, and the experts seem very reluctant to offer any opinions about this matter. I'm sure they have their reasons, and this post is probably a case of a fool rushing in where angels fear to tread.

    The Family

    Since you (somehow) made it here, I'm going to assume you've already figured out that lisp really is a family of related languages (much like the C-style languages I mentioned above, which descended from Algol). Pretty much everything I know about how it grew and evolved came from the Net, written by much better writers than I. So I'll skip the history lesson and get straight to the comparison.


    Scheme

    A flawless little gem. Carefully designed, shaped, and polished. This is perfect for academia and people who want to build their own...well, pretty much everything. If you want to learn how to write computer programs, or just love elegant simplicity, this is the language for you.

    If you actually want to build something useful, you either have to re-invent the wheel (over and over) or use vendor-specific extensions, which lock you into a specific implementation fairly early on.

    My recommendation: Racket. It has a very thorough set of extensions, and the IDE that comes with it seems quite capable.

    Common Lisp

    An ugly ball of mud. With warts.

    Common Lisp was designed for getting stuff done. It's all about practicality. I've read that it and Perl share that attitude. Which is more than a little ironic. From what I've been able to gather, Larry Wall hated lisp's parentheses so much that it drove him to create Perl. I figured I'd love anything he hated, and it turns out I was correct.

    This is a huge language. You probably won't begin to see the benefits of using it unless you're involved in, say, a gigantic project with shifting and under-specified requirements. Though it's also great for weekend hobbyists. Needing to track down and learn libraries for "normal" modern computer-usage (GUI and sockets are the two complaints I've seen most often...pretty much every distribution comes with a socket implementation, but you have to install a 3rd-party library to get anything portable) can overshadow the productivity gains for projects with in-between scope. But, once you have those installed, you don't have to worry about the language changing to a "new, improved, and incompatible" version (the frozen standard is a feature, much like the parentheses).

    My Recommendation

    Well, umm...It Depends. :-)

    I've been advised to start with the LispWorks Personal Edition. It's really a free trial of their Professional Edition, with a couple of limitations: there's a limit to the heap size, and you can't run the image for more than 5 hours at a time. And you can't create redistributable images with it. It also denies CLIM access (though I'm told CAPI is much nicer all around anyway). If you fork over the cash to upgrade to the Professional Edition, you still can't redistribute the compiler.

    The next-best recommendation I've run across is "Use Clozure [note the 'z': I'll get to Clojure shortly] until you know why you need a different implementation." If nothing else, its IDE (Mac-only) sounds pretty sweet. Well, compared to the alternatives.

    OTOH, SBCL is probably the most popular CL implementation. It's definitely the most popular open source one. You're a little more likely to get help on IRC if you're using it. And pretty much any library you run across is almost guaranteed to work with SBCL. The only real downside is that the developers have been struggling for years to get it working on Windows. There is an official Windows release, but, last I saw, it had issues with multi-threading.

    The only real choice for a development environment (except CCL's IDE) is Emacs with SLIME. There's something similar available for Vim, but the most positive thing I've ever seen written about it ran along the lines of "It didn't make me want to slit my wrists" (note that that's completely anecdotal: I'm happy enough with SLIME that I've never bothered trying this plugin). This morning, I ran across CUSP, which is a plug-in for Eclipse that uses SLIME's back-end. So there may be another viable option now.


    Arc

    Ahh...the hundred-year programming language. Odds are, pretty much everyone who winds up here will be familiar with Paul Graham's writings. If, by some crazy chance, you aren't, I highly recommend them.

    For anyone who hasn't looked at it yet: Arc takes a sort of middle road between scheme and common lisp, with a bit of syntactic sugar added where Mr. Graham believes it makes sense.

    I've been told that anyone who wants to learn lisp for its own sake might as well start with Arc. You can cram in all the hard parts in one afternoon, then switch to a "real" lisp until Arc's ready to be released into the wild. It's kind of like learning Latin to make the Romance languages easy.

    OTOH, if you just want to pick a specific lisp dialect, Arc probably isn't it, for now. You probably wouldn't take the time to learn Latin if you really only cared about learning French.

    I don't have any idea what sort of development environment options are available here. My guess is you're pretty much on your own.


    Clojure

    Clojure seems to be very similar to Arc. Except that it runs on the JVM, and exists in the "real world."

    I've run across some fairly snarky debates about whether Clojure still qualifies as a lisp, because of all the syntax (the parentheses are a feature...honest). Or about how superior it is to Common Lisp, because it's gotten rid of so many warts and has such newer, fresher batteries included. It feels almost like sibling rivalry to me. Clojure's sparking a lot of interest (much more than CL) in the real world, but CL's been around a lot longer, with a lot more collected wisdom under its wings.

    If you're going to use it, I think Emacs and SLIME are, again, your best bets.

    My recommendation: I think Clojure's extremely interesting, if still immature. The community members I run across tend to act like puffed-up bantam roosters with a grudge and the need to prove themselves. I plan on checking back in about 5 years, after some of the rough edges have had a chance to wear away.

    Edit: One of the huge advantages touted for Clojure is that it runs on the JVM. Which is going to be much more heavily tested and battle-scarred than the VM under any non-JVM-based lisp. And it grants access to all the Java libraries. It seems worth keeping in mind. OTOH...ABCL has these exact same advantages.

    The Rest

    There are all sorts of other family members who deserve mentioning. Hedgehog Lisp, PicoLisp, and newLISP were the first 3 that sprang to mind. Personally, I learned just enough about them to decide they don't fit my current requirements. It wouldn't be fair for me to have any opinion about them beyond that, much less to express such in public.

    And then there's javascript. I've seen raging debates about whether or not it's part of the Family. It has the same flexible outlook on life, but the syntax is all wrong. It's kind of like a love-child between Algol and Lisp.

  • Why FOSS is Better

    You really don't want to read the train of thought that led up to this post. Even if I really remembered it clearly. Let's just say that it's been bubbling around in the back of my head for the past few weeks, and leave it at that.

    The main advantage of OSS has always been clear and obvious. It's what prompted RMS to start the movement in the first place. IIRC, he had a printer that quit working. The manufacturer had decided to quit supporting it. He couldn't get the source code for the driver to fix it himself.

    This was about the time that commercial software was starting to peek out from under its hidden rock and ooze into the Real World.

    It was an alarming change in the software ecosystem, and I can understand and sympathize with his reaction.

    I think he probably went overboard with that, but I can't really complain about the results.

    Don't get me wrong. I can't honestly complain about the alternative, either. Closed source proprietary software has paid the bills for most of my programming career.

    And there's a lot to be said for companies that can afford to hire great programmers, then spend millions letting them do nothing but research that might or might not net them any benefit.

    At the same time, those companies don't really produce great software.

    Sure, a few do. Apple and Google spring to mind. Every few years, Microsoft takes enough lumps in some area that they throw a ton of money at some particular problem and conquer the market long enough for them to rest on their laurels (the mixed reviews I'm reading about IE 9 suggest it just might be one of those cases).

    And, let's be honest. Usability counts for a lot. But not all.

    Emacs is [arguably] a much better and more powerful text editor than Visual Studio. But Visual Studio takes the entire developer experience up another few notches and allows average programmers to compete with the really good ones without being forced to take the time to learn elisp. And the vast majority of emacs users aren't all that interested in commercial software, so it isn't even anything that resembles meaningful competition.

    Most GNU tools probably fit into that same sort of category. Hard to use, much less understand, and created by programmers. I'm OK with The GIMP, and I can manage my [very limited] image manipulation needs with it. Actually, I can do some really cool things with it that lots of experts can't manage in Photoshop. But those "really cool things" are mostly stuff that I, as a software developer, consider "really cool things." As opposed to, say, the "really cool things" that an artist would actually want to do.

    But that's just the surface.

    The "real code" lies under the covers.

    Sure, groups like Canonical have [reportedly] hired User Experience experts to make life simple and easy for end-users. In an effort to make the Linux Desktop mainstream, or some such. It seems to me they've done a decent job (For Some Definition Of "decent").

    But the real gold there is the shared code.

    I have a hard time thinking of *any* better way to actually learn an API than studying the implementation source code (and stepping through it with a debugger). I can't think of anyone, off-hand, who actually wants to do that. But abstractions leak, and any non-trivial program has bugs. So we don't have any meaningful choice other than learning the way the undocumented features actually work.

    I'm going to take a random stab in the dark and guess that this has been the case since programmers realized that they'd be spending a lot of time debugging.

    I'm in the process of learning a "new" programming language [naming no names...this is all theoretical]. Meaning I'm spending a lot of time lurking on Usenet and IRC, and a lot more time studying ancient texts from back when this language was mainstream. Call it a masochistic exercise in self-improvement.

    The experts regularly call the newbie questioners onto the carpet, then dissect their code and show them how it could be better.

    I doubt anyone would argue that code reviews are a bad thing, even if they're painful.

    Sure, this sort of internship is probably available at a few of the major development companies. But most shops aren't going to have time for this sort of support. They want to plug the new guy into his cube and have him cranking out useful code as fast as possible.

    I've run across that attitude more than a few times.

    It doesn't matter to management if your code is elegant or maintainable. As long as they can kick the next version out the door ASAP, it's "Good Enough."

    And *that* is the real reason FOSS is better.

    There's no real time pressure. Sure, if there are regularly scheduled agile-style releases, there's a little. But it isn't going to impact the company's bottom line if you decide that Feature A isn't ready to ship. And you're free to go to whatever 'Net source you trust to get advice about ways to improve your implementation of that feature.

    Oh, and, as an added bonus, there's a goldmine of existing example code to show you how this feature *should* be implemented.

    You even get to look at the way it was implemented 10 years ago and see how the implementation has evolved and (hopefully) improved since then. [Again, naming no names].

    Several of my friends today spend a lot of their time inside the Reflector for .NET. How much easier would their lives be if they could just look at the source code instead?

    I'm tempted to finish up with some random old-school comment about debugging in dasm and getting off my lawn, but they've been working in the field at least as long as I have.

    Maybe that makes me one of the "old programmers" who just can't hang out with the cool kids doing sharepoint anymore. Or maybe it's a sign that I've quit being so arrogant and have finally realized those hippies back in the '60s had something to teach me.

  • Simple NoSQL (resolved...I think)

    For all you avid readers who have been waiting with bated breath (I'm sure there are at least 2 of you on the planet): I found a resolution to my recent post about NoSQL options under Common Lisp.

    The project I was looking for seems to be Rucksack. I completely dismissed it at first, because the documentation implies that it's so immature it isn't worth messing with.

    That documentation was written 4 years ago.

    According to one member on the mailing list (who very well may be the only person on the planet actually using it...welcome to the common lisp "community"), he's been using it for over a year with no down-time or data loss. He's had to patch it to deal with read-locks, but he's more than willing to share that code. You should be able to find the thread on the mailing list archives from the past couple of days. (Yeah, I'm feeling a little rushed, or I'd post that link for you too).

    Anyway. I'm going to give it a try and see how it works. So far, it's looking like exactly what I was looking for.

  • Simple NoSQL

    I'm nibbling around the edges of kicking off a new project. Still doing the research and due diligence parts, but it's starting to solidify enough that I'm more or less down to picking out specific tools to at least start actually planning (how much planning winds up happening up front depends on a lot of different factors...at this point, I'm still not sure whether this will wind up being commercial, open source, or a beautiful combination of the two).

    Whichever way I wind up going, two of the most important considerations are "cheap" and "simple."

    One of my first tentative steps involved choosing a persistence engine. I started out dithering between postgresql and firebird. MySQL dropped off my radar when Oracle started messing with it and Java. As far as open source RDBMSs go, I've honestly always preferred firebird, if only because its feature list is jaw-dropping (in case you haven't realized this yet, I'm far from having any meaningful qualifications as a DBA. I can sling me some SQL, but that's about it). But postgresql is so ubiquitous that I wound up installing it first to kick its tires yet again (I haven't actually messed with any RDBMS in ages).

    Then I logged into the console to start defining schema, and realized it was completely and totally the wrong tool for the job.

    This project is really a total experiment in exploratory programming. The requirements are nothing but a bunch of vague ideas swirling around inside my mind. Up-front planning and defining database schema just do not fit at all, yet.

    Maybe I've been corrupted by all the time I spent mucking around with Google App Engine and its interface over BigTable. All you do there is define classes that inherit from Model, create instances of them, then save them. Query, delete, and update as you like. Change Model properties (AKA column definitions) on the fly. The only real restrictions are keeping queries simple and pretty much forgetting about normalization.
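    The programming model amounts to "define a class, set whatever properties you like, save." Here's a toy, in-memory Python sketch of that style (the names are invented for illustration; this is not the real App Engine API):

```python
# Toy, in-memory sketch of a schema-less model store.
# Invented names throughout -- NOT the real App Engine API.
_DATASTORE = {}

class Model:
    _next_id = 0

    def __init__(self, **props):
        self.__dict__.update(props)   # any properties you like

    def save(self):
        if not hasattr(self, "id"):
            Model._next_id += 1
            self.id = Model._next_id
        _DATASTORE[self.id] = self
        return self.id

    @classmethod
    def all(cls):
        return [e for e in _DATASTORE.values() if isinstance(e, cls)]

class Task(Model):
    pass

t = Task(title="pick a database", done=False)
t.save()
t.priority = "high"   # "schema" change on the fly, per instance
t.save()
```

    The point isn't the implementation; it's that nothing resembling a schema migration ever happens.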

    Yeah, it's completely and totally a different mind-set from the RDBMS approach.

    I may very well wind up hosting the web server portion on GAE. Like everything else, it depends on a lot of different factors. But I'll really need some sort of client-side persistence layer. If nothing else, different aspects need to work when the client isn't connected to the internet.

    I vaguely remembered a project I ran across a few years ago, called CouchDB. It was all about simplicity, and seemed to fit well with my requirements. So I looked it up. And that led to a couple of weeks worth of [spare time] research into the different NoSQL (what a horribly misleading name...the "No" is reportedly an acronym for "Not Only") offerings.

    Project Voldemort is extremely tempting. If only it weren't written in Java (that's one of the few decisions that I've actually made so far: no Java! Well, unless something drastic pops up and convinces me that Clojure is mature/stable enough for my needs).

    Cassandra probably deserved more attention than she got from me. But it seemed like no one was giving her any respect/attention. Hmm...googling directly turned up a ton of recent results. Shows just how flawed/spotty my research has been. It doesn't really matter, though. Like most of the NoSQL offerings, Cassandra seems focused on "Big Data." I'm looking for a simple programming model that makes it easy for a project to evolve.

    So I wound up trying to pick and choose between CouchDB and MongoDB. Tons of comparisons have been written between the two. They basically seem to boil down to "Querying CouchDB is weird, because you pretty much have to come to grips with Map/Reduce. MongoDB can trash your database if, say, the power fails." I'm fine with Map/Reduce, and I take an extremely dim view of losing data.
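    For anyone who hasn't met it: a CouchDB view is just a map function that emits (key, value) pairs per document, plus an optional reduce over the grouped values (in CouchDB proper you write these in JavaScript). A rough Python sketch of the idea, with made-up sample documents:

```python
# Rough sketch of the CouchDB-style view idea: a "view" is a map
# function emitting (key, value) pairs per document, plus an optional
# reduce over the grouped values. (Real CouchDB views are JavaScript.)
docs = [
    {"type": "expense", "category": "food", "amount": 12},
    {"type": "expense", "category": "rent", "amount": 800},
    {"type": "expense", "category": "food", "amount": 7},
]

def map_fn(doc):
    if doc["type"] == "expense":
        yield doc["category"], doc["amount"]

def reduce_fn(values):
    return sum(values)

def query(docs, map_fn, reduce_fn):
    grouped = {}
    for doc in docs:
        for key, value in map_fn(doc):
            grouped.setdefault(key, []).append(value)
    return {key: reduce_fn(vals) for key, vals in grouped.items()}

totals = query(docs, map_fn, reduce_fn)   # {'food': 19, 'rent': 800}
```

    Once that clicks, "querying CouchDB is weird" mostly stops being true.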

    So I got set to install CouchDB. Downloaded it and read the Linux installation instructions. No freakin' way. This thing has more dependencies than that hooker who offered me a $7 blowjob. Out of curiosity, I read the Windows installation instructions. Cygwin, curl (but *not* the version that comes with cygwin), VS 2008...for this project, I have to care about Windows. Which totally destroyed CouchDB as an option.

    So I took a step back and tried to get a look at the bigger picture. All the buzz around NoSQL, exabyte-scale databases, high performance, sharding, replication...none of that really mattered all that much. Well, replication is pretty important (another of the huge points that really keeps MongoDB from being feasible here). But none of the rest really matter at this point. If and when they do, it'll be a good problem to have.

    Right now, I really just want some stupid-simple data persistence layer that lets me modify my data models arbitrarily, without having to update every other model of the same "kind" or in the same "table" in the database. Preferably that doesn't involve starting some sort of server. I hated the years I spent using Access as the database backend, but that was pretty much exactly the sort of programming model I want now. Not Access's front-end GUI pieces. Just a single database stashed in a file that I can manipulate programmatically using some sort of simple library.
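    For what it's worth, Python's standard library ships almost exactly that programming model in the shelve module: a dict-like object persisted to a single file, no server, no schema. A minimal sketch (the file path is just a temp-dir placeholder):

```python
import os
import shelve
import tempfile

# shelve: a persistent dict stashed in a file -- no server to start,
# no schema to declare, records with different "columns" coexist.
path = os.path.join(tempfile.mkdtemp(), "scratch.db")

with shelve.open(path) as db:
    db["task-1"] = {"title": "pick a database", "done": False}
    db["task-2"] = {"title": "write some code"}   # different shape: fine

with shelve.open(path) as db:   # the data survives reopening
    task = db["task-1"]
```

    That's the shape of thing I'm after here; the open question is what plays that role from common lisp.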

    I'd pretty much decided to just roll my own and started thinking about implementation details. How hard can it be? I'm not worried about big data sets at this point. Just have a file that I can keep in memory and save to disk periodically. If the data size gets noticeable, and memory even looks like it might start being an issue, I can always break it into a B-tree. And then optionally add indexes for something like CouchDB's Map/Reduce to create views. Then there's replication to consider. And avoiding data corruption in case there's a crash. What about cases where the data is more suitable for a hash table than a B-tree? Or a crazy self-balancing self-ordered tree that I ran across a couple of gurus discussing on Usenet a few days ago (sorry about not having a link. If enough people care, I'll try to look it up. It was on comp.lang.lisp).

    Those were just the first issues that popped into my head immediately. This is, obviously, a hard problem that tons of really smart people have spent decades trying to solve. Something like flat files would work fine for this particular use case (at this point it's just a TODO list...I really am just experimenting and selecting technologies right now). But the point of investing time up-front is to pick technologies now that will make life simpler in the future.

    Common Lisp's Elephant looks very promising. Except that Quicklisp can't install it. Quicklisp has changed the way I approach and think about programming. If I wind up using common lisp on this project (right now, it's got a strong lead over the other options), Quicklisp compatibility is pretty much a requirement.

    Hmm. One of Elephant's back-end possibilities is Berkeley DB. I almost started an Access (the GUI parts) clone over Berkeley DB several years back (feel honored...I rarely admit something that embarrassing in public). It's one of the earliest "NoSQL" databases, from back before the phrase was cool. It's been around for decades. Its capabilities read like a shopping list for my requirements.

    It's owned by freaking Oracle. GAAHHH! They've pretty much always been more evil than Microsoft, and they're looking ever more like the Borg of the Open Source world.

    So I'm still pretty much exactly where I started. Simple open source data persistence options pretty much suck. The closed source/proprietary world really doesn't look that much better. Well, except for the eye candy. I suppose the options do "look" better. But that's just on the surface.

    EDIT: I think I found the solution to evolutionary data persistence with common lisp. It isn't perfect...I'll probably swap it out for Elephant with a PostgreSQL back-end when my requirements firm up, but it seems to be a nice fit for my current requirements.

  • C++ Virtual Inheritance

    I don't know what the odds are that anyone actually having this problem will run across this on google. But maybe someone will read it and remember before-hand.

    I have an inheritance hierarchy something like

    Interface -> ABC -> C -> D.

    Each constructor explicitly calls its parent class' constructor, with the appropriate parameter.

    Originally, ABC had a default parameter to its constructor. Things just weren't working right, and I noticed that the constructors were getting called out of order. ABC was getting called first, with the default parameter, then Interface, C, and D. When I got rid of the default parameter, D (and all its siblings) quit compiling because ABC didn't have a default constructor.

    After a great deal of frustration and trying to isolate the problem, I realized that C was inheriting virtually from ABC. I think I was planning on it being a mixin with other classes that also inherit from ABC. (This project has no architecture or up-front planning. It's cowboy-coding at its finest. And C++ is the wrong language for that).

    I don't know whether this is a C++ bug or just part of the standard that I missed, and I've already wasted *way* too much time on it. Hopefully this will help someone else avoid making the same mistake.

  • Startup Weekend

    Last weekend, I heard about an event called a startup weekend.  Tonight, I'm in the middle of one.

    The idea is that a bunch of people who are interested in startups get together on a Friday evening after work.  Those of us who had them pitched ideas (no one warned me about that part) to the crowd.  Then the crowd spent half an hour networking, schmoozing, and deciding which ideas were best.

    Then we spent another half hour or so picking out teams.  Then we had a little time to start fleshing out ideas before they kicked us out for the night.

    Our team headed over to a coffee shop to get our plan solid enough  to be worth sleeping on.  We finished up around midnight, then were back to work around 9 am.

    We've had a few speakers--people like IP lawyers, investment brokers, and government agencies whose sole purpose is to help small businesses.  But mostly, it's been churning out code and cranking out graphics.

    Sunday evening, we pitch to investors.

    I dropped by here to look some things up, and I figured I'd at least take enough of a break to mention it.

    If you get a chance to go to one, and you're even vaguely interested in starting your own business, I highly recommend finding one.  So far, it's a great experience, even though I'm exhausted, sore from spending so much time at my keyboard (it's hard to justify taking breaks when no one else is), and constipated.  I've met a lot of interesting people, made some great networking connections, and I've had a chance to work with a lot of very intelligent, creative, dedicated people.  Quite a few of them even have personalities.

    If nothing else, it's an eye-opener about just how vanilla corporate IT is.

    Enough of this.  Back to work.

  • Developer Reliability Metrics

    Why Reliability Metrics? 

    There's a new law being considered in the EU that would require software companies to pay for damages caused by bugs.

    A comment about halfway down the page recommends requiring specific certifications for coders working on specific kinds of projects.  Just like engineers, doctors, and lawyers.

    My initial reaction was "That's dumb." But that's because I was thinking of the way our current certification system works.  You cram some obscure material about details that you'll probably never actually need to know (and, when you do, you're just going to use google to find them anyway), take some computerized test, and pass or fail.

    How could that ever possibly show that you can create reliable software?

    But the idea dovetailed with a conversation I had last night with a former boss.  He was comparing me with a guy I replaced, who took around 20 tries to get a login screen to work.  We were both happy to agree that it doesn't matter how fast and pretty it is, or how many extra bells and whistles it has; if the core functionality doesn't work, you wasted your time creating it.

    I'm tempted to veer off and talk about how we've gotten used to exactly that, but I'll leave it for some other day.

    My question is: how can a developer prove that he's capable of writing reliable software?  For most of this article, that's all I'm going to discuss. I'm ignoring very important things like maintainability, performance, scalability, etc.  I'm focused on "Does the software do what the end-user wants?"

    The Existing Situation (as I see it)

    An organization can point to existing software to demonstrate its abilities.  A developer could show a portfolio of existing work, but that really doesn't prove anything.  There's nothing to say he/she actually wrote any of it, or even did the googling to find existing code him/herself.

    Languages and methodologies don't really matter either.  I once worked with a very old-school C programmer named Mike.  We were using C#, but he tried to avoid anything more modern than the concepts in K&R C.  He copy and pasted all over the place, because, he claimed, he didn't want to take the "extra time" it would cost him to create a base class and refactor into that.  Personally, I think he just didn't understand inheritance.

    He writes extremely reliable code (even though it was a nightmare for anyone else to try to change).

    It's mathematically possible to prove that a piece of code is correct.  It might even approach something that could be done by mere mortals, if functional languages ever become mainstream.  For now, it's just mental masturbation for academics.

    NASA has their reliability procedures neatly codified.  I've read that they spend more time working on the process of creating code than they actually do creating it.  Maybe we could get them to make those processes public and certify developers in them.  That is sort of the thing the slashdot suggestion seemed to be focused around.  Crashing planes, power plants blowing up, etc.  NASA's methodology would kill most software companies, because it takes too long, but we aren't talking about most software companies, are we? The EU legislation seems (to me) to be aimed squarely at Microsoft, hoping to bleed them for losses caused by things like bugs in Excel. 

    Maybe the government could force that sort of thing on publicly traded companies, like an extension of SOX.  But SOX requirements are already ridiculously expensive and painful.  Besides, I haven't noticed them being all that effective.

    Certification, re-thought 

    So, what about that original idea?  Certify developers the same way we do doctors and lawyers.  Someone who's already certified sponsors a developer, they go through essay exams and oral boards, and actually design and write some code.

    Of course, any code examples would be too small to prove the developer could actually write a reliable distributed transactional enterprise-level system, say.  And, just because a developer knows he/she should write unit tests and actually test changes themselves doesn't really mean they'll do so...or be allowed the time to do so by their managers.

    And it would skew the results toward people who think the same way.  I suspect the certified people would become dominated by people who favor their own pet methodology (like Mike, or maybe the people who are always chasing the latest automated testing methodology du jour...you know who I mean).

    There doesn't seem to be much reason for this sort of thing in the open source communities, but there are possibilities there, too.  If all (or even most) of the Open Office developers got this sort of certification, that would be even more reason for governments to ditch MS Office.

    Then there's the question of programming language.  Most developers aren't polyglots.  When I started programming, I latched onto the advice to "learn one new language every year."  Like Java, the .Net environment is exploding so quickly that it's now tougher to keep up with the changes in each new release than it was to learn those new languages.  Besides, many have a prejudice against time spent working in other languages, as if it doesn't count toward experience in whichever language their shop uses.  (I'm not saying a guy who started with C++ then switched to other languages for several years can jump right back into doing C++, but it isn't that hard to pick back up...that's exactly what I did for my current job.  It took about a month to knock the rust off, but those years spent working in other languages have been extremely valuable).

For most languages, even if they don't grasp all the subtleties, most developers could read most code.  Many won't even make that much effort (I remember trying to show another developer some python...he almost gave up in disgust because he didn't know what to open the sample with--how do programmers stay employed when they don't understand that almost all code is written in a text editor--then he completely dismissed the language because I hadn't used Hungarian, and he couldn't tell what type everything was).  Anyway, if you're familiar with one language derived from Algol, you can probably pick up the basic gist, even if the coder did use variable names that offend your personal sensibilities.  But, if you start getting into more obscure languages, all bets are off, even though a lot of those languages were designed for reliability.

    From that perspective, maybe the certifiers shouldn't look at the code at all.  Just say "make a web service that fulfills these requirements."  When it's delivered, change the requirements drastically.  Then a third time.  You know, just like what happens in real life.

    Maybe that's too drastic, but they were talking about something like becoming a doctor or lawyer.

    If we really are going that far, maybe potential Certified Developers should also be required to serve an internship.  But what about Mike?

    Then there's education. I still run across rants in blogs about whippersnappers who call themselves software engineers, even though they have no degree.  Hey, that was my title at a couple of jobs.  Don't blame me.

Some of the best programmers I know (any way you choose to measure it) have no college degree.  Relatively few have a CS degree...and that's usually because they've learned to write software outside the ivory towers.  A degree shows that you can put up with bureaucratic nonsense, cram for tests, and are willing to delay gratification (or just didn't have a clue what you wanted to "be" when you grew up), and you might have had some exposure to writing code.  All useful skills in the industry, but there isn't really any indication that you have the mentality to write reliable software.

    Other Considerations

Like I said at the beginning, all this has been focused strictly on reliability.  But it's been my experience that that goes hand-in-hand with maintainability.  And that goes hand-in-hand with the way someone structures their code. Blocks that aren't evenly indented don't necessarily mean the coder's careless, but if he didn't even take that much pride in his work, it's become a big red flag for me.

    And what about security?  Mike's code did what it was supposed to, but it was incredibly insecure.  I showed him (and our boss) how easy it was to hack, but they dismissed it because our end-users weren't smart enough to figure that out.  Sigh.

    I read another recent slashdot article (which I'm too lazy to find the link for) about our air traffic control systems.  Since they connected to the Internet (whose brilliant idea was that?) they're getting hacked left and right.  If you're worried about a plane's computers locking up in mid-flight, it seems like something that could tell them to fly into each other would also be on the agenda.  I seem to recall talking with one of the developers who was involved in that fiasco, and they used C#...but that's just vaguely recalled hearsay, so don't hold me to it.  If what I recall is correct, though, he'd be right up front as one of the people getting certified as a Reliable Software Engineer.

    Besides, what about actually understanding performance?  I've had jobs where the boss' reaction to any suggestion about optimization was "We'll just add another server."  But, for the most part, performance matters.  If your website crumples when it starts getting 10 requests per second, it turns out that it really wasn't all that reliable to start with.

    Then there's that whole maintainability thing I've been ignoring.  A given piece of software might be "Reliable" in its current incarnation.  Requirements always change, eventually.

    As I understand it, the functional requirements for the air traffic control computers didn't actually change.  But the software was written in lisp, for ancient lisp machines that were wearing out.  The requirement change was "It has to run on modern hardware." I guess whoever made the decisions didn't trust any modern lisp implementations (I'd love to hear from anyone who has an inside-scoop on this), so they did a complete re-write in a more industry-standard language.

    That sort of thinking was what killed Netscape.

    So we're back to some sort of board of certified Engineers who certify Engineer wanna-be's through some sort of rigorous process.

    The Problem

    There are 2 major problems with that (that I see just now).  The first is that software development is still more art than science.  Much "Enterprise" software is developed by people with questionable skills following some sort of magical Process that lets the business types plug code monkeys into slots like factory drones on an assembly line.  This is equivalent to McDonald's making its employees do everything by The Manual.

    Many of those people may very well be entitled to the title "Software Engineer," but that seems (to me) to be missing the entire point.  Come to think of it, real engineers, lawyers, and doctors are also as much artists as scientists.

    The second major point is that these things tend to become cliques designed to exclude people with opposing views and methodologies.  Many states (well, Texas, at least) started requiring lawyers to get licenses because they didn't like certain lawyers' political leanings.  The AMA makes it really hard to become an MD, at least partially because it keeps competition low and their rates high.  They may have originally had some justification to filter out the quacks, but it turns out that MD's really don't know as much as many of us would like to believe; even the practice of bleeding, which so many of us laugh at now, was good medicine in certain cases (I highly recommend Survival of the Sickest by Dr. Sharon Moalem, for an interesting, readable, and amusing take on why medical "science" really isn't all that advanced).

    It seems a lot more cut and dried for real engineers.  Either you understand the math and physics behind what holds up a suspension bridge, or you don't.  Then again, I know a carpenter who designed his own shop because he didn't want to pay an architect.  When he finally got his plans approved through whichever committee it was, they reluctantly approved.  But they required him to drive each nail by hand, instead of using a nail gun.  Some sort of revenge because he just had the experience and eye to do things the Architect's Union (or whoever) spent years in school learning how to approximate?

    Craft vs Art and Science

    Anyway.  Software "engineering" isn't an old enough discipline to be anywhere close to a "real" engineering discipline.  As I understand it, real engineers know that, if they build things in a certain way, they will have specific effects.  Tall buildings are designed to sway in the wind, with massive counterweights to reduce the sway so the building can remain upright.  Dams are designed to get stronger, the more water that builds up behind them. The oil in your car engine somehow gets thicker and more slippery as it warms up.

This sort of thing is starting to emerge as more and more of us start to turn to "The Cloud."  But we're a long way away from being able to dissect a section of code and analyze its O(whatever) factor mathematically, much less calculate how any given change impacts the actual user requirements (assuming the requirements gatherers were even remotely competent). Automated testing helps with that last part, but it still feels more like just piling up more sticks in front of the dam, instead of just designing it right in the first place.

    Ironically enough, software development was, arguably, approaching the status due an "engineering discipline" somewhere back in the stone ages when people were still crafting hand-optimized assembler routines for certain pieces of code where hardware was still slow enough that that level of optimization was still justified.  Even then, the architects/engineers pretty much never factored in major systems failures.  Like "What happens if someone unplugs the power cord in the middle of this transaction?" or "What if the user pulls the disk out of the drive while I'm writing to it?"

    These days, we don't have to worry about that sort of thing.  Whichever black box we're using throws an exception, and we let the user know that something went wrong.  Or the database server rolls the transaction back and forgets it ever happened, like it was some girl who puked all over your car and passed out while you were taking her back to your place from the bar.

    These days, most of us leave the "engineering" mainly up to Microsoft (or FaceBook, Amazon, or wherever).  Let them deal with that stuff while we get on with actually doing our jobs.  Like quilters (they even use that term for Azure), we pick the pieces we want to use and stitch them together.  We aren't even close to the engineering level.

    Back Full Circle

    So what about those people who are working at that level?  Should they get some sort of certification?

    They probably all have tons of certifications already, along with Masters' Degrees or PhD's from prestigious universities.  What more does anyone want?

    On the other hand, software development is sort of the epitome of the scientific method.  "If I do this, I think that will happen.  Let's see..."  And experience develops it into something that resembles engineering "When I do this, I know that will happen," even if it's very far from being mathematically rigorous (don't get me started on math).  But there's also a very strong touch of art "Well, ***.  This didn't work.  What happens if I do that?"

    I dunno.  Usually theoretical posts like this help me iron out my thoughts, but this one has left me with more questions than answers.  I'd love opinions from anyone who actually bothered to read this full thing.

  • Broken has_key on GAE, using Cheetah Templates

    The GAE developers have done something that seems incredibly stupid to me, but it probably won't seem to matter that much to anyone who hasn't been using Python for a while.

    They've defined a has_key() method on their ORM's Model class.  What it really does is check to see if the Model instance has been saved and (thus) has a completed primary key.

    No big deal, right?  Except the Law of Unintended Consequences shows up to bite end-users in the ass.

    has_key() has an extremely long history in the Python community.  It's used to see if a dictionary instance has a specific key.  You pass in the key name, it returns a bool.  That functionality has been in place pretty much since the Beginning of Time.  The problem is, it's been deprecated.  The test should now be "key_name in dictionary".
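The shift is easy to see in a snippet (written against a modern interpreter, where the old spelling is gone entirely rather than just deprecated):

```python
d = {'name': 'value'}

# The modern, preferred membership test (works in Python 2 and 3):
print('name' in d)       # True
print('missing' in d)    # False

# d.has_key('name') was the old spelling; deprecated in Python 2
# and removed outright in Python 3:
print(hasattr(d, 'has_key'))  # False on Python 3
```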

    The problem really is existing library code that uses the old functionality.

    The Cheetah templating engine is where I got bit.

They (very helpfully, to the graphics-type people who will actually be creating the templates) have put a lot of effort into letting people use objects seamlessly.  If the engine comes across something like $a.foo, it first checks to see if a is a dictionary-style object, and then it tries to pull a value from the "foo" key.  If that fails, it tries to access the foo property of the a object.

    The problem with that approach is python's duck-typing (which is usually one of the best aspects of the language).  The only real way (until Python 3.0) to see if an object implements dictionary-style semantics is to check whether it implements a has_key() method.  If it does, call it, passing in the specified key as a parameter.

    Since GAE's models implement has_key() to take only one parameter (self), this winds up causing the following TypeError exception:

    has_key() takes exactly 1 argument (2 given)

The second argument is the name of the key you're checking out.  For those of you who don't know, it's one of python's warts that you have to specify "self" as the first parameter to any instance method (like specifying this as the first parameter to every C# function).  That was a kludge that got added way-back-when when Guido decided this whole OOP thing might have some merit. Considering all the existing code that would have to be re-written to make it go away, this wart is probably with the language forever.  After a while, as with the warts in any other language, you get used to it and just get back to writing code productively.
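You can reproduce the collision without GAE at all.  Here's a minimal sketch with a hypothetical stand-in for Google's Model class (FakeModel is mine, not theirs):

```python
class FakeModel:
    # Hypothetical stand-in for GAE's Model: its has_key() takes no
    # argument besides self, and just reports whether the instance
    # has a completed primary key.
    def has_key(self):
        return False

m = FakeModel()

err = None
try:
    # What a duck-typing library like Cheetah does: treat m as a
    # dictionary and probe it for a key.
    m.has_key('name')
except TypeError as e:
    err = e

print(type(err).__name__)  # TypeError
```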

    There are tons of work-arounds, but they're all ugly hacks (at least, I haven't run across one yet that didn't seem that way).  Bug #898 is dedicated to this issue.  If you've been bitten by this, (or even if you just feel my pain), please go star that (hey, all it costs you is about 30 seconds and an email when someone adds a comment). The problem is, which side is responsible for fixing the problem?

    If it weren't deprecated, I'd place the burden of change squarely in google's court (it's a quick 3-line renaming refactor, except that it breaks any existing user code that calls this method...probably not much (the method should probably have been at least protected in the first place), the service is in Beta testing, and, honestly, this is just stupid on their part).

    Since the method has been deprecated, it seems like the onus lies upon library writers who are using the old functionality.  Except there's really no other way to tell (until python 3.0, which GAE probably won't support anytime soon) whether an object implements dictionary-style semantics or not.  If you try to call "foo in bar" (the new preferred way) against something that isn't iterable, you get another TypeError exception.  You can check to see if an object's iterable *and* has a has_key() method, which would fix this special case, until Google makes the Model class iterable.

    To make matters worse, if the object's an iterable like something derived from list, and it implements has_key() there's a good chance the "foo in bar" test might succeed, but then trying to access the value by the given key will blow up...another one of python's warts. And has_key() doesn't seem to have been deprecated in python 2.5, which is the version GAE uses.
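One defensive shape for the check (my own sketch, not Cheetah's actual code): skip has_key() entirely and probe for the pair of attributes that dictionary-style access actually needs.

```python
def looks_like_mapping(obj):
    """Heuristic test for dictionary-style semantics that never calls
    has_key(), so a GAE-style one-argument version can't blow up."""
    return hasattr(obj, 'keys') and hasattr(obj, '__getitem__')

class FakeModel:
    # Hypothetical GAE-style object: it has a has_key(), but no
    # actual dictionary semantics.
    def has_key(self):
        return False

print(looks_like_mapping({'a': 1}))    # True
print(looks_like_mapping([1, 2, 3]))   # False: indexable, but no keys()
print(looks_like_mapping(FakeModel())) # False
```

It's still a heuristic, but it dodges both the TypeError above and the "iterable that isn't really a mapping" trap.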

    This just feels like the tail wagging the dog, and they're both arguing which is which.

    This is also a perfect example of why it's good to work on Open Source projects.  If this were Microsoft and Telerik, my reluctant molehill hack would be an insurmountable mountain that I couldn't even see.  Come to think of it, this is why Stallman started the FOSS movement in the first place: he couldn't get access to a printer driver's source code so he could fix a bug.

  • Pyjamas on Google App Engine

    Pyjamas is one of the most interesting frameworks I've run across in a long time. 

    You write a desktop-style app (it looks a lot like something you'd do if you were lame enough to actually write WPF code instead of using the GUI to build XAML) in python, and run it through a "compiler" that converts it to browser-independent javascript. There's even a pyjamas-desktop project that lets the serious hacker neophiles run it as a desktop app without the "compilation" stage (which restores the ability to develop like you're using an interpreted language).

    Google's been doing this for a while now with Java and GWT.  But who wants to write code in Java?  You might as well be doing C# ;-)

    I won't pretend there are scads of articles throwing themselves at us over the 'Net, but there are plenty to get started (which is totally where I am...along with several members of the GAE user's group who can't figure out how to serve static files).  But there *are* tons of materials devoted to GWT.  Same engine and libraries, just translated to a different (higher level, more intuitive, easier to use) language.

    I've seen a few articles devoted to running Pyjamas-generated apps on GAE, but I haven't seen anything that really gives the basic HOWTO, nuts & bolts details (they all assume that the reader is more familiar than I).  So, without further ado, let's get the basic "Hello World" app going.

    I'm going to assume that you already have pyjamas and the GAE SDK downloaded and extracted.  Along with python, of course.

From the GAE SDK's new_project_template directory, copy the files into the directory for your application.  You have to start somewhere.

    ===Your application's source===

    This lives in some directory outside your website.  Or maybe a directory that's excluded by app.yaml.  Wherever.  Someplace that you will not be uploading to GAE.  (Just to clarify an FAQ: put this directory under some sort of source control, too).

Run the `` script in...whichever directory's appropriate (for this, admittedly rather stupid example, pyjamas' SDK's helloworld directory).  You might have to suck it up and translate this into a .BAT file (actually, you *really* should.  Or whatever the powershell equivalent is.  Sooner or later, you're going to want to do a little more than you are now).  Running your source code through the pyjamas compiler gives you an output/ folder with the results.

    Our goal, really, is to get the output of that to display as static content (there's some interesting possibilities revolving around hacking into that and making it dynamic...but those are *way* out of scope).  Therefore, copy the results (from the output/ directory) into your web app's /static directory (yes, you do have to create that...yet again, write a .BAT file for this so you can be lazy like the rest of us.  Better yet, make this a step in your first one.  Seriously, they aren't that scary).
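If you'd rather not fight .BAT syntax, the copy step is small enough to script in python instead.  This is a sketch with made-up paths (the output/ folder and your app's static/ directory); adjust it to your own layout:

```python
import os
import shutil

def deploy_pyjamas_output(output_dir, static_dir):
    """Copy the pyjamas compiler's output folder into the web app's
    static directory, replacing whatever the last build left there."""
    if os.path.isdir(static_dir):
        shutil.rmtree(static_dir)  # start from a clean slate
    shutil.copytree(output_dir, static_dir)

# Hypothetical layout -- adjust the paths to your own tree:
# deploy_pyjamas_output('helloworld/output', 'my_gae_app/static')
```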


    Update app.yaml.  (If you've played much with GAE, you know by now that this is where all configuration changes wind up living).

    My initial inclination was to add a section that tells it where to find static requests:

    Under [handlers], add

    - url: /static
       static_dir: static

    The server refuses to start, with this error:

    Fatal error when loading application configuration:
    mapping values are not allowed here
    in "/blah/app.yaml", line 10, column 14

     Which is, of course, as useless as those old "Error -16837" dialog boxes from Way Back When.  WTF does that actually mean?

    Just for the sake of anyone else who runs across that ridiculous error, it means that yaml's a lot pickier about white space than even make.  I have 3 spaces at the front of the static_dir line. Reduce it to 2 and the problem goes away.
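For the record, here's a shape the dev server accepts.  This is just my working app.yaml fragment, not gospel; the key point is that static_dir has to line up consistently under its list entry:

```yaml
handlers:
- url: /static
  static_dir: static
```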

    Now I can browse to (for example) http://localhost:8080/static/Hello.html and see my work of art.

    From here, it wouldn't be any big deal to add a script that handles all your other requests and redirects them (for example) to /static/index.html.  But that's beyond the scope of today's babbling.

  • cheetah templates on Google App Engine

    I've run across a few blog entries that indicate other people have managed to get this working, but I don't see any hints around (or even about) this particular problem. Maybe it's something that is just really well known for Genshi and Mako, and I just didn't make the connection.

Anyway, Cheetah installs fine, with the pure python name mapper.

    When I try to actually start using variables in my template, I get this exception:

    'module' object has no attribute 'get_suffixes'

Traceback (most recent call last):
  File "blah/", line 132, in Twiddle
    result = self.fill_view(variables)
  File "blah/", line 109, in fill_view
    return str(t)
  File "blah/Cheetah/", line 982, in __str__
    def __str__(self): return getattr(self, mainMethName)()
  File "blah/applications/init/views/whatever/", line 96, in respond
    _v = VFFSL(SL,"name",True) # '$name' on line 10, col 11

(That's my compiled template)

  File "blah/Cheetah/", line 255, in valueFromFrameOrSearchList
    frame = inspect.stack()[1][0]
  File "/tmp/python.6884/usr/lib/python2.5/", line 885, in stack
    return getouterframes(sys._getframe(1), context)
  File "/tmp/python.6884/usr/lib/python2.5/", line 866, in getouterframes
    framelist.append((frame,) + getframeinfo(frame, context))
  File "/tmp/python.6884/usr/lib/python2.5/", line 837, in getframeinfo
    filename = getsourcefile(frame) or getfile(frame)
  File "/tmp/python.6884/usr/lib/python2.5/", line 386, in getsourcefile
    for suffix, mode, kind in imp.get_suffixes():
AttributeError: 'module' object has no attribute 'get_suffixes'

get_suffixes is a method that Google doesn't allow into their sandbox.  The solution is to add a monkey patch in the Cheetah source file that makes that call.

    Underneath the opening document comments, there's a credits section, followed by some imports:

    import types
    from types import StringType, InstanceType, ClassType, TypeType
    from pprint import pformat
    import inspect

    You need to update the inspect module's reference to the imp module (do this just below the import):

    def get_suffixes():
        return [('.py', 'U', 1)]
    inspect.imp.get_suffixes = get_suffixes

    Actually, that's probably simple enough that it's worth putting into a lambda:

    inspect.imp.get_suffixes = lambda : [('.py', 'U', 1)]

  • Real World Dojo part Six: File Compression

    The Point

    In the last installment, I covered how to create your own custom components. This time I'm going to tackle something that should have been much less involved.

    For every page request that uses that file uploader, I wind up downloading approximately a bazillion .js files.  Reasonable caching should cut that down so it only happens once (-ish) per session.  Still, bandwidth is money.

    More importantly, my hosting environment puts rather extreme limitations on the number of files I can use.  It's a long story.

    The ideal solution would be to use a CDN.  As I discussed in Part 2 (FIXME: make that a link), that doesn't work because of Flash's cross-site scripting security.  Besides, custom components just don't work with the CDN (at least, they choke with an error to that effect every time I've tried).

    So, it's time to make my files smaller and work with fewer of them.

    Option A

    There's an Adobe AIR Toolbox for Dojo.  It includes an API reference and a wrapper over the compression tool.  The basic idea is that you specify a profile file (more about that below), the location of your Dojo source tree, and an output directory.   Whatever I'm doing is wrong.  It always freezes about 2/3 of the way through.

    If it works for you, great!  If you're like me, it's time to take it to the command line.  (Don't be such a wuss.  This is a blog for programmers).

    Merging Files

    There are all sorts of options available for this.  But the one that makes the most sense is Dojo's.  It already understands the various files involved and can handle tracking down all (well, most of) the dependencies.

    Of course, it can't work magic.  You have to specify which files you're using.  This is where that profile file (mentioned above) comes into play.

    Start out by downloading the latest source distribution (you want either the .zip or .tar.gz file, depending on your druthers, with "src" in its name) and extracting it somewhere.

    In there, under the utils\buildscripts\profiles folder, you need a profile file (mentioned above) that describes your build.

    Profile File

    You need this even if you're using the Dojo Toolbox.

I was tempted to use an example taken straight from the Dojo Book.  But you can look that up just as easily as I can.  So let's go with something slightly more interesting.

// Copy this into your dojoroot/utils/buildscripts/profiles directory (yes, you
// need to get the source distro) and use it to build the compressed version of
// dojo
dependencies = {

    layers: [
        {
            name: "../dijit/dijit.js",
            dependencies: [
                // the basic dijits
            ]
        },
        {
            name: "YourNameBase.js",
            dependencies: [
                // the widgets you use all over the place
            ]
        },
        {
            name: "YourNameUploader.js",
            dependencies: [
                // the file uploader from last time
            ],
            layerDependencies: [
                "YourNameBase.js"
            ]
        }
    ],

    // Note that these are relative to the output file
    prefixes: [
        [ "dijit", "../dijit" ],
        [ "dojox", "../dojox" ],
        [ "internal", "../../internal" ]
    ]
};
    Each "layer" in that file will be built into a source file, along with all its dependencies.

    The "prefixes" array tells the "compiler" where to find files associated with various namespaces.  In this case, dijit and dojox are in the directory next to the one where the dojo code lies.  Things in the "internal" prefix are in a directory parallel to the dojo root.  These dependencies are the ones you specify directly (e.g. in dojo.require() calls).  The "magic" part of this is that you don't have to work out any of their dependencies.


    The builder will create a core dojo.js file by default.  This profile specifies three more files to be generated: dijit.js (which will include the basic dijits), YourNameBase.js which includes some default widgets that are used all over the place, and YourNameUploader.js, which references the file uploader we built last time.

    N.B. The list of dependencies you include in your layers is vitally important.  If you reference dependencies you don't actually use, you can wind up with ballooning files that totally defeat the purpose of this.  On the other hand, if you forget to specify a dependency, it should be pretty obvious when you test a page that requires it.  You could keep an intern very busy tweaking this thing.


    Once you have that set up to your satisfaction, open up a command line in the src\util\buildscripts folder (you do have the Command Line Here powertoy installed, don't you?) and run build.bat with the appropriate options.

    What are those options?  Like so much else in life, it depends.  In general, you want to specify which profile and how much compression to use. The link to the page in the Dojo Book that I included at the top has lots of discussion about what the various options mean in the comments.  For now, let's just stick with the basics.  Assuming your profile file was named MyProfile.profile.js, you might run

    build.bat profile=MyProfile action=release cssOptimize=comments.keepLines releaseName=SomethingMeaningful mini=true optimize=comments version=1.0.0.a

Running it without specifying an action will give you a list of the various options available. There are various issues involved with many of these options that you should be aware of.  Particularly the xd options for building pieces designed to work cross-domain (see all the pain I've had trying to use the CDN). The comments about cssOptimize, optimize, layerOptimize, and mini are the ones you probably need to pay most attention to.  (The nutshell version is that specifying comments.keepLines leaves newlines in to try to keep things readable/debuggable.  packer tells the builder to run those files through Dean Edwards' packer. The mini option tells the builder to delete several files that you don't want on your production site, such as unit tests).

The builder should run for a bit, spitting out lots of log text, then save the results in src\release\initial.  In this example, the interesting files are:

1. src\release\initial\dojo\dojo.js (and dojo.uncompressed.js)
2. src\release\initial\internal\YourNameBase.js (and YourNameBase.uncompressed.js)
3. src\release\initial\internal\composites\YourNameUploader.js (and YourNameUploader.uncompressed.js)
4. src\release\initial\dijit\dijit.js (and dijit.uncompressed.js), plus dijit-all.js
5. whichever theme you might be using (e.g. src\release\initial\dijit\themes\tundra\tundra.css)

    As you might guess, the files with "uncompressed" in their names are the versions that were mashed together, but not actually compressed.  Use them for debugging. The dijit-all.js file contains a reference to all the dijits.  (There are options to mash the entire dijit part of the tree into this file instead of just using references).

    The CSS files...some you'll need, others you won't. You can pull the basics from CDN, but several controls have their individual CSS files that you'll need.

    Using Them

    Copy the files you'll be using into the appropriate folder on your web server.  You might have others (with their dependencies) that you didn't add to any layer, because you use them so rarely. The directory structure should mimic the output from Dojo's builder.  The various documents and blog posts (with their comments) that I've found claim that you need to include the entire generated source tree.  That isn't strictly true (and would be a deal-killer in my situation).

    Change your script references to include the most-specific script you'll need, along with all its dependencies:

            <script type="text/javascript" src="/static/dojo/dojo/dojo.js"
               djConfig="parseOnLoad: true, isDebug:false"></script>
        <script type="text/javascript" src="/static/dojo/dijit/dijit.js"></script>
            <script type="text/javascript" src="/static/dojo/internal/YourNameBase.js"></script>
            <!-- In order to load this, must load all its dependencies -->
            <script type="text/javascript" src="/static/dojo/internal/composites/YourNameUploader.js"></script>
Gotchas (yet again)
    1. Because I'm not including the dojo._base parts of the tree, the isDebug option to djConfig must now be false (you only really need to deploy dojo._base._firebug to get around this).
    2. You'll get annoying 404 errors trying to retrieve translation files in the NLS directories. You'll have to copy these files into the appropriate directories as you need them (and you will need them).
    3. The error isn't showing up client-side, but the server is complaining about a missing dojoRoot/dojo/resources/blank.gif.  It's a 43 byte file.  Might as well add it.
    4. The profile file was missing a line that specified that YourNameUploader.js depends on dojox.embed.flash.  It probably shouldn't need that, since it's going up the dependency tree.  Still, it's dojox.
    5. Yet another cryptic error message: '[Exception... "'Error: Bundle not found: validate in dijit.form , locale=en-us' when calling method: [nsIDOMEventListener::handleEvent]" nsresult: "0x8057001c (NS_ERROR_XPC_JS_THREW_JS_OBJECT)" location: "<unknown>" data: no' (Hey, it's a lot better than you get from, say, SharePoint or SSIS). 

    This originally looked like a bug the dojo developers seem to have blamed on FireFox 3: Dojo Bug #7280 (there probably is some sort of distant relation).  But I got basically the same result in IE 7. This is actually a side-effect of that 404 error mentioned above. I had to copy in the English versions of the .js files for YourNameBase.js and YourNameUploader.js. When the site goes international, the rest will probably need to be added as well.

    Fixing those errors left me with a 404 on the uploader.swf.  Client-side, the error message was "'this.flashMovie' is null or not an object" on IE and a pair of errors in FireFox: "this.flashMovie is null" and "this.flashMovie.setFileMask is not a function."

    Total Failure (sort of)

    I added the SWF and dojoRoot/dojox/embed/IE/flash.js and ran headlong into another example of the dangers of dojox.  IE (running flash 10) now switched to an "Unknown Error" when I clicked the file browse button. It works as expected in FireFox (running flash 9). Following a reboot, I ran the upgrade to flash 10.  The install said the change would not take effect until I rebooted again.  But, after that, clicking "Browse..." in firefox caused this error: '[Exception... "'Error calling method on NPObject! [plugin exception: Error in Actionscript. Use a try/catch block to find error.].' when calling method: [nsIDOMEventListener::handleEvent]" nsresult: "0x8057001e (NS_ERROR_XPC_JS_THREW_STRING)" location: "<unknown>" data: no].'

    This is a known issue with Flash 10.  svn trunk has pretty much a complete re-write of the file uploader (since 1.2.3 was released on 12/4) that deals with this, along with a change to let you use CDN, and quite a few other bugs that have been plaguing this thing.  The joys of open source and an active community!

    I'm not about to go into that, though.  Messing with dojox in the first place is silly enough (and not something anyone should ever really consider except on projects where they just want to be on bleeding edges).  Writing about changes that aren't part of any official release would just be a waste of my time and yours.

    Besides, the point of this article was improving your application's throughput.

    Final Note

    This should probably become a regular part of the build process on any sizable project using dojo, along with a script that generates the profile file and copies the built files, based upon a dependency list, as part of your continuous integration.  Don't forget to run the output through something like jsLint.

    On the other hand, like a lot of "best practices," that might be overkill at earlier stages in the project. In the beginning, you might be better off specifying a single compressed .js file that contains everything you need.  Later, you can tweak it into multiple files with layered dependencies like in the example I gave. Until you have numbers to balance requests vs bandwidth, it's just premature optimization.

    Quick Hint and Update:

    Redoing the build every time you make a change is a complete pain.  Especially if you're working with other scripting languages, and you've gotten spoiled by not having to wait on the compiler.

    The trick here is to take the file(s) you're working on out of the build profile, and copy them manually into the destination directory.  The devilish detail here is remembering to move the changes back out into your source tree after you're finished.  Caveat emptor.

  • Real World Dojo part 5: Custom Components


    It turns out that the file upload piece from last time (the User Feedback article) is going to be used over and over.  And that I need to attach a drop-down to let the uploader specify what kind of license is associated with the file.  In the dot net world, I'd be tempted to slap the code into a user control and keep moving.  That won't work for me, and, anyway, it's probably the wrong way to do things where dojo's concerned.

    Dojo has its own support/library system for creating reusable components.  For background on this, check out the official Dojo book's chapters on creating your own widgets (or just skip that and read this if you're in a hurry for the nutshell version...that article's a bit out of date, and it leaves out a ton of important details). There's a lot going on here, which is why it looks as convoluted as it does.  Things get more complex when the developers make them more flexible.


    Start by adding something similar to the following in a script block in the head of your html:

    dojo.registerModulePath("internal", "../internal");

    where internal (or whatever you choose to call it) is a subdirectory of your Dojo installation parallel to the dojo, dijit, and dojox directories.

    If you try to load the page now, it tries to load the file internal/composites/pictureUploader.js and errors out. So the next step is to create that.

    The skeleton pieces look like this:

        dojo.provide("internal.composites.PictureUploader");

        dojo.require("dojox.form.FileUploader");

        dojo.require("dijit.form._FormWidget");
        dojo.require("dijit._Templated");

        dojo.declare("internal.composites.PictureUploader",
            [dijit.form._FormWidget, dijit._Templated], {
            //  summary: A composite picture uploader and license selector
            // description: A control that lets you choose an image file, associate a
            // license, and then upload them. You'll have to deal with the license
            // yourself
        });

    If you've looked at dojo modules at all, this is pretty straightforward.  In case you haven't, I'll break that down:

    What is this file "providing"?

        dojo.provide("internal.composites.PictureUploader");

    Which other pieces/controls do we need?  This will also allow you to remove the reference from your page.

        dojo.require("dojox.form.FileUploader");

    More required bits.  These are specific to creating your own widget:

        dojo.require("dijit.form._FormWidget");
        dojo.require("dijit._Templated");

    Then define your widget's class.  The first parameter is the class name (with name spaces).  The second is the parent class and whichever mixins it gets.  The third is a hash table of class members.

        dojo.declare("internal.composites.PictureUploader",
            [dijit.form._FormWidget, dijit._Templated], {
            //  summary: A composite picture uploader and license selector
            // description: A control that lets you choose an image file, associate a
            // license, and then upload them. You'll have to deal with the license
            // yourself

    You don't actually have to include the summary and description comments, but it's probably a good idea to stick with dojo's coding standards.

    Some Member Variables

    Inside that hash table, add some variables that let me customize a few things (I'm still hard-coding far too much, but I'm really not going for anything general-purpose here).

            // The title of the "Browse for file" button
            browse_label: "Browse...",

            // ID of the Button to replace
            upload_button_id: "chooseFile",

            // Where to post to when uploading
            upload_url: "you must specify this",

            // ID of whichever element to use to indicate the name of the file
            file_name: "fileToUpload",

            // ID of the element that tracks upload progress
            upload_progress: "uploadProgress",

            // ID of the element that shows the results
            upload_results: "uploadResult",

     And specify where the "template" file is located:

            templatePath: dojo.moduleUrl("internal.composites",

    Note that those lines are all separated by commas.  You're declaring a dictionary, not writing javascript statements.
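    If the comma-vs-semicolon distinction seems picky, remember that the whole block is just one object literal:

```javascript
// The member declarations above are keys in a single object literal,
// so commas separate them; a statement-style semicolon between entries
// would be a syntax error.  (Values copied from the snippet above.)
var members = {
    browse_label: "Browse...",
    upload_button_id: "chooseFile",
    upload_url: "you must specify this"
};

console.log(members.browse_label);     // "Browse..."
console.log(members.upload_button_id); // "chooseFile"
```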

    The Template File

    Now we need to create that template file. It's really just a matter of cutting and pasting the relevant portions from our existing form.

    <div class="InternalPictureUploader">
        <div id="${upload_button_id}" class="browse"
            dojoType="dijit.form.Button">Select image...</div><span
            id="fileName"></span><br />
        <div id="${file_name}"></div>
    </div>
    <div class="InternalPictureUploaderProgress">
        <div id="${upload_progress}"></div>
        <div id="${upload_results}"></div>
    </div>

    The ${} things are replaced by member variables of the PictureUploader widget.

    Add some more dojo magic stuff, to make the control more flexible (and stylable) when it's declared:

        <div class="InternalPictureUploaderFileName" dojoAttachPoint="fileName"></div>
        <div class="InternalPictureUploaderProgress" dojoAttachPoint="progress"></div>
        <div class="InternalPictureUploaderResults" dojoAttachPoint="result"></div>

    (I'm not going into the why's/wherefore's of dojoAttachPoint here.  It's a huge topic, the documentation about it is pretty scanty, and I don't feel qualified to comment on it at this point).

    At this point, I had to comment out all the pieces in the head of my HTML that referred to any of these pieces.  Basically, all the code that has anything at all to do with uploading.

    Declaring Your Widget

    In your HTML, replace the pieces we've refactored into the control with a control declaration:

            <div dojoType="internal.composites.PictureUploader"></div>

    Debugging and Gotchas

    At this point, the page errors out trying to load ValidationTextBox.  The error message in the console is "too much recursion," but it happens right after it loads the FileUploader, which seems suspicious.  Besides, the error goes away when I comment out that control.

    Looking at the stack trace, the problem starts when dijit.form._FormWidget tries to call dijit._Templated.create().

    A quick google revealed that the problem came from using that old tutorial I recommended at the beginning as my basis.  dijit.form._FormWidget now mixes in dijit._Templated.  When I tried to mix it in as well, it caused Bad Things®.

    Fixing that left me with a "node is undefined" error.  The error originally seemed to be coming from my template file.  When I switched to a hard-coded template string in the class members dictionary, the HTML did get rendered, but the error did not go away.  Adding some logging to the relevant dojo source code revealed that the error happens when I (or dojo's internal magic, rather) try to set the id attribute of the widget.

    More specifically, it was trying to set attributes named id and tabIndex to the values specified in my template file (or something it magically generates).  Those attributes are actually trying to get attached to a DOM node associated with the 'command' focusNode.

    (Note that focusNode is not actually a command.  It's the value of the dojoAttachPoint attribute that needs to be assigned to some focusable DOM node).

    Adding that value to the file upload button in my template made the errors go away:

        <div id="${upload_button_id}" class="browse" dojoAttachPoint="focusNode"
            dojoType="dijit.form.Button">Select image...</div><span
            id="fileName"></span><br />

    Making it Do Something

    That seems like a ridiculous amount of trouble to get a single visible div that does absolutely nothing.  It's time to restore the code that does the uploading.

    Again, that's mostly a matter of cut and paste.  Cut the pieces that were commented out of the HTML and paste them into a function associated with a key named startup in the control's "class body."

            startup: function(){
                // ...all that code
            },

    Then replace all those ID strings that we'd been hard-coding with the names of the new member variables.  e.g.

        dojo.byId(this.fileName).innerHTML += "File to upload: " +
            d.name + " " + Math.ceil(d.size*.001) + "kb \n";

    becomes

        dojo.byId(this.file_name).innerHTML += "File to upload: " +
            d.name + " " + Math.ceil(d.size*.001) + "kb \n";

    Add a wrapper around the file upload control:

                upload: function(){
                    // hand off to the dojox uploader created in startup
                    this.uploader.upload();
                },

    And update your HTML's doUpload() method to call that:

                doUpload = function(){
                    // Actually upload the file
                    var uploader = dijit.byId("pictureUploader");
                    uploader.upload();

                    // And submit the metadata
                    metaDataSubmit();
                }
    And run headfirst into a brick wall.  No matter what I tried, the button widget was returning as null when I tried to access it in my startup method.

    So I whittled away everything extraneous and took it to #dojo on IRC.  neonstalwart and slightlyoff (thanks again!) were kind enough to look at my mess and straighten me out.

    In the example that I gave them, I had my widget declared with these "parents":

    [dijit._Templated, dijit._Widget],

    which was completely backwards.  dijit._Widget is designed as a base class. dijit._Templated is a mixin that adds functionality. Trying to use it as the "actual" base class causes all sorts of nastiness. (Yes, I switched from deriving from FormWidget. This just seemed cleaner).

    Since I want widgets in my template to be expanded, I also needed to set dijit._Templated.widgetsInTemplate to true.  This isn't done by default, for performance reasons.

    Finally, using a widget's ID the way I was is considered a horrible practice.  The correct way to do this is to set a string as a dojoAttachPoint (I mentioned that thing's important, didn't I?), declare that as a member variable in my widget (defaulting to null), and just reference by name as needed:

        dojo.declare("internal.composites.PictureUploader",
            [dijit._Widget, dijit._Templated], {
            uploadButton: null,
            uploader: null,
            widgetsInTemplate: true,

            templateString: "<div id=${id} ><div " +
                "dojoAttachPoint='focusNode, uploadButton' " +
                "class='browse' " +
                "dojoType='dijit.form.Button'>" +
                // ...the rest of the template

            //startup: function(){
            postCreate: function(){
                var btn = this.uploadButton;
                console.debug('Upload Button: ' + btn);

    Getting Back to Making it Do Something

    Now, all of my events are wired up incorrectly.  The methods are all getting bound to dojo's global context rather than my widget.  Changing dojo.connect to this.connect fixes that problem.

    Also, it might not be quite as efficient (in terms of bandwidth), but it feels cleaner to me to make the event handlers into member variables rather than anonymous inline functions.

    For example:

    _onChange: function(dataArray){
        // ...all the event-handling code
    },

    And, in postCreate(), call:

        this.connect(this.uploader, "onChange", "_onChange");

    That is actually a shortcut for dojo.connect(this.uploader, "onChange", dojo.hitch(this, "_onChange"));. dojo.hitch() is an incredibly important function that connects methods to a specific context (or object). I've run across several occasions where things you'd expect to "just work" need dojo.hitch() because something else changed the meaning of this. (The first two that bit me were forEach() and these event wireups).  I don't know yet whether this is a dojo wart or just a javascript limitation.
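    If you've never been bitten by this, here's the problem in plain JavaScript, using Function.prototype.bind as the standard-library cousin of dojo.hitch():

```javascript
// A method detached from its object loses its "this".
var widget = {
    name: "pictureUploader",
    report: function () { return "widget: " + this.name; }
};

var detached = widget.report;
// Calling detached() here would look up this.name on the global object
// (or throw in strict mode) -- the same failure mode as an unhitched
// event handler.

// Binding pins the context back down, just like dojo.hitch(widget, "report"):
var bound = detached.bind(widget);
console.log(bound()); // "widget: pictureUploader"
```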

    The different pictures that my end-users might upload can be associated with various Creative Commons licenses.  I added a combo box to let the user decide which license is appropriate for a given picture.  It feels odd to have something like that list hard-coded (an excuse for anyone who looks at my source), but it's not as if the different possible choices will be changing very often.

    I ran across one final gotcha when I was working on some final cleanup.  I tried to separate the "requires" in the template file by including a script block and specifying which widgets I was using there, as opposed to the ones that I referenced in the .js file.  This led to my widget silently failing to load.

    For anyone who wants to scan over the final version, I'm attaching the source to my widget (and its template) and a test file that uses it.

  • Real World Dojo part Four: User Feedback

    So now we have a simple form that uses AJAX to upload a file and submits some metadata for the server to associate with that file.

    It doesn't really give any useful feedback, though.  No real end-user's going to read the console, I'm not actually doing anything with the file upload progress, and using an alert to show that the metadata uploaded is incredibly lame.

    Easy part first.  The response from the XHR. Add a div for feedback:

         <div id="metadataFeedback"></div>

    And let's make the response handler do something vaguely interesting:

                            if(data instanceof Error){
                                console.log(data);
                            }else{
                                dojo.fadeOut({node: box, duration:0}).play();
                                box.innerHTML = data;
                                dojo.fadeIn({node: box, duration:5000, delay:500}).play();
                            }

    (I did warn you that it was only vaguely interesting).

    The point to doing the fadeOut first is to avoid animation flicker while the DOM is being updated.

    Feedback about the file upload is a tad bit more involved.

    Add some divs for tracking upload progress:

        <div id="uploadProgress"></div>
        <div id="uploadResult"></div>

    So far, I've limited these articles to using events assigned in the markup.  Now we have to scrape the surface of Dojo's rich event system.  The nutshell version is that, in an addOnLoad method (or whenever else seems appropriate) you connect various named events to whatever function you want to fire when that event happens.

    For starters, let's inform the user that we realize they've selected a file:

                dojo.connect(uploader, "onChange", function(data){
                    dojo.forEach(data, function(d){
                        dojo.byId("fileToUpload").innerHTML += "File to upload: " +
                            d.name + " " + Math.ceil(d.size*.001) + "kb \n";
                    });
                });

    Letting the user know that the upload is complete should be this simple:

                dojo.connect(uploader, "onComplete", function(data){
                    console.log("Upload complete");
                    dojo.forEach(data, function(d){
                        dojo.byId("uploadProgress").innerHTML = "";

                        // FIXME: Actually, want to display the uploaded picture
                        dojo.byId("uploadResult").innerHTML = "Finished: " + d;
                    });
                });

    For whatever reason, that event isn't firing.  That forces me to shoehorn things into the progress event:

                dojo.connect(uploader, "onProgress", function(data){
                    dojo.byId("uploadProgress").innerHTML = "";
                    // Think the forEach is for handling multiple files
                    dojo.forEach(data, function(d){
                        var progress = "(" + d.percent + "%) " + d.name;
                        dojo.byId("uploadProgress").innerHTML += progress;

                        var movie = uploader.flashMovie;

                        // Kludge because onComplete isn't getting called:
                        if(d.percent == 100){
                            // Do another AJAX postback to get the URL to the image
                            dojo.xhrGet({
                                url: "/beta/test/get_url?id=some_guid",
                                handleAs: "text",
                                handle: function(data, args){
                                    var box = dojo.byId("metadataFeedback");

                                    if(data instanceof Error){
                                        // Unfortunately, w/ web2py, we never actually get here
                                    }else{
                                        console.log("URL: " + data);
                                        var result = dojo.byId("uploadResult");
                                        result.innerHTML = '<img src="' + data + '" />';
                                    }
                                }
                            });
                        }
                    });
                });

    Obviously, your server should be handling the file upload and be able to return a URL to the newly added picture.

    Have I mentioned before that Dojo's file-handling abilities seem to leave a bit to be desired?


  • Real World Dojo part Three: AJAX

    When we finished up last time, we had an AJAX-ified form that uploads an image file.

    The problem now is that the "metadata" (the name and URL) are being completely ignored.  It's ugly, but you could try adding them as GET variables to the upload path.

    It seems like I should just be able to update the uploadUrl right before calling doUpload():

                    var name = dojo.byId("Name").value;
                    var url = dojo.byId("Url").value;
                    uploader.uploadUrl += "?name=" + escape(name);
                    uploader.uploadUrl += "&url=" + escape(url);

    but that doesn't work.  The SWF is given its upload URL when it's created.  The File Uploader object doesn't really have all that much interaction with it after that.

    Oh, well.  It's not like that would be a valid long-term fix anyway (the real page that this is a proof-of-concept for has too many variables to put into the GET part of the URL).
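    One side note on the snippet above (my addition, not from the original): escape() predates modern URL encoding and mangles non-ASCII text, so encodeURIComponent() is the safer choice for query-string values:

```javascript
var value = "café & co";

// escape() emits Latin-1 style %XX sequences for non-ASCII characters:
console.log(escape(value));             // "caf%E9%20%26%20co"

// encodeURIComponent() emits proper UTF-8 percent-encoding:
console.log(encodeURIComponent(value)); // "caf%C3%A9%20%26%20co"
```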

    So it's time to do the "AJAX thing."  After all, Dojo started out life as an AJAX library, right?  (Actually, I'm not at all sure of that.  They very well might have been planning a full-featured javascript library from Day One.  After all, the AJAX stuff is really just a tiny portion of what Dojo does).

    It's not like there's much to this:

                var metaDataSubmit = function(){
                    dojo.xhrPost({
                        url: "/beta/test/assign_metadata",
                        form: "TheTest",
                        handleAs: "text",
                        handle: function(data, args){
                            if(data instanceof Error){
                                // ...do something smarter than logging
                                console.log(data);
                            }
                        }
                    });
                };
    and add a call to that around the time you call uploader.upload();

    url is where the postback goes to.  form is the ID of the form that holds the values of interest.  handleAs is where things get interesting.  Change it to "json" and you can actually return javascript objects.  handle is the function that gets called after the postback gets a response.
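    The handleAs: "json" behavior is worth a quick illustration.  In plain-JavaScript terms, it just means the response text gets parsed before your handler sees it (the response body below is made up for the example):

```javascript
// What the server might send back as text:
var rawResponse = '{"status": "ok", "file_id": 42}';

// With handleAs: "text", your handler receives the raw string:
console.log(typeof rawResponse); // "string"

// With handleAs: "json", it receives the parsed object instead:
var data = JSON.parse(rawResponse);
console.log(data.status);  // "ok"
console.log(data.file_id); // 42
```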

    Of course, this implementation's complete nonsense.  In the real world, you need to assign some sort of ID (and some sort of security validation) to the requests so you can coordinate them.  Otherwise, how would you know which file went with which metadata?

    Since that's really server-side stuff, I'll ignore actually generating that for now.

    I feel odd writing such a short post, but that's really all there is to this.

  • Real World Dojo part Two: File Upload

    In my last post, I wrote about my research into doing client-side validation with Dojo (disclaimer, in case you haven't seen this a billion times before: this can never be trusted server-side...this is only a convenience for the client, not a security thing).

    There's a long story in that post, but the short version is that we came up with this form:

        <title>Validation Test</title>

        <link id="themeStyles" rel="stylesheet"

        <script type="text/javascript"
        djConfig="parseOnLoad: true, isDebug: true"></script>
        <script type="text/javascript">
    <body class="tundra">
        <form action="/beta/test/client_validated" method="POST"
        id="TheTest" encType="multipart/form-data" dojoType="dijit.form.Form"
        validate();" onReset="return false;">
            Name: <input dojoType="dijit.form.ValidationTextBox" type="textbox"
            name="Name" id="Name" required="true" trim="true"
            intermediateChanges="true" /><br />

            URL: <input dojoType="dijit.form.ValidationTextBox" type="textbox"
            required="true" name="Url" id="Url" /><br />

            File: <input type="file" name="File" /><br />

            <button id="Submit" dojoType="dijit.form.Button">OK
                <script type="dojo/method" event="onClick">
                </script>
                <script type="dojo/method" event="startup">
                    var form = dijit.byId("TheTest");
                    // set initial state
                    this.attr("disabled", !form.isValid());
                    this.connect(form, "onValidStateChange", function(state){
                        this.attr("disabled", !state);
                    });
                </script>
            </button>
            <button dojoType="dijit.form.Button" type="reset">Reset
                <script type="dojo/method" event="onClick">
                    dijit.byId("Submit").attr("disabled", true);
                </script>
            </button>

    (Yes, you'd think the VS "Copy as HTML Plugin" would let me do all the syntax highlighting and such.  It doesn't).

    Anyway.  Aside from the fact that the Submit button does absolutely nothing, I've been totally ignoring the file input box. I could repeat countless lame-ass tutorials that show you how to post back the two fields I'm really using, but then I'd have to start over to show you how to really use the file input.  Besides, I did title this thing "Real World," and I don't want to waste anyone's time.

    For reference, I'm stealing pretty much all of this from the original announcement about the Dojo Multiple FileUpload Dijit.  I'm just trying to put that into some sort of perspective for real-world use.  i.e. Something I can come back to next year, scoop up, and slap into place.

     Uploading files with Dojo is still pretty raw.  It seems like this is something that should be pretty well tamed by now, but...everything in life's a trade-off.

     Start by declaring the file uploader:

        dojo.require("dojox.form.FileUploader");

    N.B. In case you haven't dug into Dojo at all, the various dojox pieces are experimental things they're considering adding to the main set of dojo widgets (dijits), in some future version.  The documentation is horrible, and you get frequent warnings that the API is subject to change without notice.  Use at your own risk.

    Add an event for after the page loads to wire up the actual upload pieces:

                // Only allow uploading certain file types
                var fileMask = [
                    ["Jpeg File",     "*.jpg;*.jpeg"],
                    ["GIF File",     "*.gif"],
                    ["PNG File",     "*.png"],
                    ["All Images",     "*.jpg;*.jpeg;*.gif;*.png"]
                ];
                var selectMultipleFiles = false;

                // assign a file uploader to the appropriate button
                var uploader = new dojox.form.FileUploader({
                    button: dijit.byId("chooseFile"),
                    degrabable : true,
                    uploadUrl: "/beta/test/upload",
                    uploadOnChange: false,
                    selectMultipleFiles: selectMultipleFiles,
                    fileMask: fileMask
                });

                // Actually upload the data
                doUpload = function(){
                    // FIXME: This really isn't enough. Have to deal w/ metadata
                    uploader.upload();
                }

    Then change the file input to this:

            <div id="chooseFile" class="browse"
            dojoType="dijit.form.Button">Select image...</div><span
            id="fileName"></span><br />

    And hit the snag: using the CDN, there's a cross-site scripting issue with the flash object.  When you try to run that, you get a "this.flashMovie.setFileMask is not a function" error.

    No big deal.  Forget the CDN for now and extract the full dojo distribution to somewhere on your site where you can access the files statically.  This changes the reference line to something like:

    <script type="text/javascript" src="/static/dojo/dojo/dojo.js" djConfig="parseOnLoad: true, isDebug: true"></script>

    (The actual path depends on how you have things set up on your server, of course)

    Update the onClick declaration to:

                <script type="dojo/method" event="onClick">
                    doUpload();
                </script>

    and the doUpload method gets called, but I don't see any evidence that it's contacting the server.  Time to dig deeper into the source code.

    The problem's in the FileInputFlash constructor.  For the sake of the uploader SWF, it's converting the relative path (I'm using /beta/test/upload) to the bogus absolute path http://localhost:8080/beta/test//beta/test/upload.

    It's annoying that the console isn't reporting that bogus request, but there you are.

    I've submitted a bug report and a [horrible] patch to the dojo developers, but I wouldn't count on it making it into a release anytime soon.  The workaround is to keep the upload URL in the same subdirectory and just specify the last entry in the path (i.e. uploadUrl="upload").  If you really need to specify a different subdirectory, I'll be happy to share my "fix" (it's basically just hard-coding the path, but it should work for every case that springs to my mind, and it certainly works for me).

    For anyone who's interested, I've attached the full HTML file.