C++ growing pains

Still mucking about with C++ at work. Here’s some of the stuff that bothers me lately.

Header file hell. Much like DLL hell on Windows, or RPM hell (before yum), header file hell is, I suspect, quite common in C++, and is one of the main reasons it’s not my favorite language.

Typically, with C++, as you try to solve whatever problem you’re trying to solve (meaning: get the computer to do some useful work), you are kind of forced to do a bunch of “extra” work figuring out how to frame the problem in terms of objects and classes. Not really that big a deal. But C++, being (or trying to be) a strongly typed language, encourages the use of custom enumerated types (over #defines, say). That means that if you want classes to communicate with each other without converting from, say, ints to these enums, both classes have to include the header file that defines these enumerated types. Which means there’s a good chance the definition of the enumeration cannot reside in the header file for either class, at least not without putting some thought into it and moving things around a bit.

Why? Because, let’s say both (of a mere two) classes define some custom enumeration types, and each class needs to communicate with the other in some way using those types. In that case, each class includes the header file of the other, and each header file contains guard #ifdefs to prevent it being included twice. Well, now there is a circular dependency, and things get a bit sticky and non-obvious: whichever header gets processed first pulls in the other one, whose own include of the first is skipped by the guard, so the second header is compiled before the types it needs from the first have been defined. You’ve got to sort it out in one of a number of ways. The easiest way is probably to take these types and separate them out into their own headers. Fine. But look at the time it takes to sort all this out. In C, the params would have been ints, everybody automatically knows what an int is, and #defines would do the job pretty effortlessly, albeit with less compiler meddling to make sure you’re doin’ it right.
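
To make the “separate them out into their own headers” fix concrete, here is a minimal sketch, not the code from work. All the names (fw_status, link_speed, Port, Controller, round_trip) are made up for illustration, and the “files” are squashed into one listing, with comments marking where each one would begin. The enums go in a shared header that includes nothing else, and each class header only forward-declares the other class.

```cpp
// Sketch only: every name here (fw_status, link_speed, Port, Controller,
// round_trip) is hypothetical. Each marked section stands in for a separate file.
#include <cassert>

// ---- shared_types.h: included by both class headers, includes neither ----
#ifndef SHARED_TYPES_H
#define SHARED_TYPES_H
enum fw_status { FW_OK, FW_BUSY, FW_FAILED };
enum link_speed { LINK_DOWN, LINK_SLOW, LINK_FAST };
#endif

// ---- port.h: includes shared_types.h; Controller is only forward-declared ----
class Controller;
class Port {
public:
    Port() : speed_(LINK_DOWN) {}
    void on_status(fw_status s);          // parameter is a shared enum, not an int
    void report_to(Controller &c) const;  // a reference needs no full definition
    link_speed speed() const { return speed_; }
private:
    link_speed speed_;
};

// ---- controller.h: includes shared_types.h; Port is only forward-declared ----
class Port;
class Controller {
public:
    Controller() : last_seen_(LINK_DOWN) {}
    void on_speed(link_speed s) { last_seen_ = s; }
    void notify(Port &p, fw_status s);
    link_speed last_seen() const { return last_seen_; }
private:
    link_speed last_seen_;
};

// ---- port.cpp / controller.cpp: only these include both class headers ----
void Port::on_status(fw_status s) { speed_ = (s == FW_OK) ? LINK_FAST : LINK_DOWN; }
void Port::report_to(Controller &c) const { c.on_speed(speed_); }
void Controller::notify(Port &p, fw_status s) { p.on_status(s); p.report_to(*this); }

// Round trip: the controller passes a status to the port, and the port
// reports its resulting speed back to the controller.
link_speed round_trip(fw_status s)
{
    Controller c;
    Port p;
    c.notify(p, s);
    assert(c.last_seen() == p.speed());
    return c.last_seen();
}
```

Calling round_trip(FW_OK) gives back LINK_FAST, and any other status gives LINK_DOWN. The point is that neither class header ever has to include the other; only the .cpp files need both.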

So, anyway, when I left work today, I was in that particular corner of header file hell, from which I will have to extricate myself tomorrow.

There are other corners of header file hell. Suppose you want to try to reuse some C++ code. If you are very lucky, the code will have been designed to be reused, and designed to have been reused in just the manner you’re wanting to reuse it. More likely, the code isn’t quite so well designed. Instead, it is within a class which is designed such that it fits in well with whatever application it’s embedded in. This embedding entails references to all sorts of other classes in the application, types, etc. As you try to extricate the code you’re interested in for reuse, you find yourself dragging along more and more header files and classes in an ever expanding network, until, in an epiphany, you realize that this approach will lead you to suck in the entire application eventually. At that point, you start cutting and pasting bits, and reworking the code to fit into your problem space.

Well, that’s the bigger of the nuisances I’ve been facing. There are some smaller ones as well.

whoDecidedThatIdentifiersLikeThisAreEasyToRead? WasItSomeGermanGuy? iHearGermansTendToBuildCompoundWordsInThisWay. ByWhichIMeanCapitalizingWordsButOtherwiseJustCrammingThemTogether. VeryReadableDon’tYouThink?

Why do C++ people like identifiers like that? Are they brainwashed? I mean, English sentences, in general, consist mostly of uncapitalized words separated by spaces. That is the most readable way of writing text, and that’s what you will find in every instance of fiction, newspaper journalism, magazines, etc. (except possibly some very esoteric and pathological exceptions which are deviating only to make some weird point). The closest approximation to this for C or C++ identifiers is lowercase words separated by underscores (as spaces are not an option).

I_mean_like_this. This_is_far_more_readable_than_the_insane_cramming_method.

So, why, oh why, are C++ programmers so fond of this crazy way of naming variables and functions? I have no idea.

Another nit. I have become used to the Linux kernel way of formatting code in the last 6 or 8 years or so. This way of formatting code involves indenting code with tabs, not spaces. And tabs indicate an indent of 8 spaces, not 4, not 3, not 2. When I started on this project, I asked around, “Hey, are there any coding conventions, style guidelines, etc., that I should be following? By default, I’m apt to write code formatted like what you’d see in the Linux kernel, is that ok?” “Sure, we’re not religious about that,” I was told. Hmm. Well, today, I get a call. “Hey, I noticed you’re using tabs.” “Yeah…. is that ok?” “Well, we’d prefer not.” Oooookay. Could’ve told me that earlier. The problem with this is, my editor of choice is vi. Vi, by default, is very tab friendly, provided you agree that a tab means an 8 space indent. It has various commands that know that (e.g. “<<” will move a line left by one tab stop, “10>>” will move ten lines right by one tab stop, etc.). These things are habits that are hardwired into my fingers by now.

Anyway, I converted all my C++ files to use 3 spaces in place of every tab. And, I found (via codeblog) that I can add some stuff to the end of the file like this:

// Local Variables:
// c-basic-offset: 3
// indent-tabs-mode: nil
// End:
//
// vim: et sts=3 sw=3

The above teaches vim that, around these parts, there is no such thing as a tab: when you think “tab”, put 3 spaces. (“et” is expandtab, “sts” is softtabstop, and “sw” is shiftwidth; the “Local Variables” block does the same job for Emacs.) It even makes “<<” and “>>” work as well. So I’m glad to have found that. I don’t have to retrain my fingers. (I couldn’t retrain my fingers anyway, as there’s plenty of code I still have to work on that adheres to the “no spaces, only tabs” doctrine, so I’m doubly happy that I found a solution that automatically (once implemented) limits its application to only those files which need it, and leaves default behavior alone.)

I had another idea today, while thinking about the C++ identifiers and using capital letters vs. lowercase, and underscores (or not). It occurred to me that both the underscores and the capital letters require one to use the shift key, which is, although easily learned, an inefficient and overly strenuous finger move, given how frequently it must be executed. I started thinking about those keyboards with the split space bar they had for a while, the ones where half the space bar was made into a backspace key. Evidently that didn’t go over too well, as I don’t recall having seen those around lately, but they were around for about a year… maybe that was 2 years ago? I can’t remember. Anyway, it occurs to me that you could get one of those keyboards and, for whichever thumb you don’t normally use for space (I tend to use my right thumb exclusively for space, so for me, the left half of the space bar is unused), dedicate that half to the underscore. This way, C style identifiers could be typed in very nearly the normal fashion, just use the “wrong” thumb to hit the space bar… Brilliant idea, no?

~ by scaryreasoner on July 17, 2008.

8 Responses to “C++ growing pains”

  1. About the thumb using underscores idea:
    Yeah, that could work, if you somehow manage to force everyone else to buy such a keyboard too. I’ve never had any problems reading CamelCased identifiers, though I’ve never had to deal with ones that long. Anyway, after I was forced to switch to Visual Studio (after 6 years with various open source IDEs and text editors), I have no problem with long identifiers at all because of IntelliSense. Who would’ve thought Microsoft made something that actually works :/

  2. I don’t think I understand your complaint about enumerations (possibly because I am a C++ guy). Whether it’s an enum or a set of #defines, you need to have that content published in a shared header file somewhere. If you’re doing something that requires circular header dependencies in C++, you’d get the same result in C as well, no?

    In any case, a circular dependency is one of those “code smells” that usually indicates something needs to be refactored. Maybe the set of shared constants belongs in a separate header file. Maybe the circular dependency should be broken so that one strictly uses the other; or if they are truly codependent, the boundary between them needs to be rethought or removed.

  3. The point about the enums is that functions will take the defined type as arguments, etc, so you can’t live without the enum types. (Granted, enums are rather easier to solve, being at the bottom, really just ints.)

    With #defines, the function, rather than having enums, will take, most probably, ints. You can live without the #defines, or replicate them, or replicate just those you need to use.

    I suspect heavy C++ users tend to be just very accustomed to refactoring code, getting rid of these circular dependencies, and, like vi users who are used to switching between “command” and “insert” mode, don’t find this annoying, or the least bit strange.

    From the C perspective, C++ tries to solve a bunch of problems, but then introduces many new ones which have nothing to do with the problem being solved. C++ seems to have a way of encouraging you to carefully design interfaces, even when all you’re really wanting to do is hack together a simple program.

    One C++ programmer at work put it to me this way, “You cannot hack on the code.” And he was right, you couldn’t. It was like a diamond. The world’s ugliest diamond. If you wanted to add a new feature, you had to understand the entire codebase at a deep level in order to be able to understand the framework enough to be able to see how your new piece could be crafted to fit in nicely. Or else it wouldn’t fit _at all_.

    He liked this about it. I hated (and still hate) it. That particular hunk of code is, I would say, feared for its complexity. And it is needless complexity. Having said that, it is also clear that the guy who wrote it really knew what he was doing with the language. And, while he worked there, it was also clear that he really understood the code. The thing is, the complexity in there has pretty much NOTHING to do with the problem the code is trying to solve. It seems a direct result of writing very general, very heavily factored code. A direct result of doing exactly what all the C++ people say you’re supposed to do. It convinced me that C++ does not aid the maintainability of a codebase, but harms it.

    Your experience may be different, especially from a Microsoft Windows perspective, where coding in C is probably unthinkable, given the interface the Windows libraries present (MFC, etc.)

    On Linux, that problem doesn’t exist so much. Though the legions of Windows C++ coders jumping onto the Linux bandwagon these days make me dread what things will be like.

    Well, I am not trying to convince anyone to abandon C++ in favor of C. People like what they like — often more a matter of what they’re used to than anything else — and that’s fine.

    I, at the moment, like C, and like it better than C++, and sometimes I write down some thoughts I have about it. Many other people see things differently, and this is no surprise to me.

  4. I accept your arguments about programmer culture, but I don’t buy the “just #define it yourself, you don’t need to include the header” mentality.

    For a long while, glibc defined errno as “extern int errno”. Many applications (lazily) declared that in their own source files, rather than specifying the proper “#include”.

    The result? A major compatibility breakage when multithreaded applications came onto the scene, and glibc changed their definition to “extern __thread int errno”. All of those applications are now broken, and require some pretty nasty system-level gymnastics in order to continue supporting (see LD_ASSUME_KERNEL for more details). Heck, if you look carefully at your build process, you may notice that while your apps may link against /usr/lib/libc.so, they’re actually running against /lib/tls/libc.so.0 (or something to that effect). Freaky.

    Oh… I’m a vi user, too (^:

  5. Ok, fine, you don’t buy it. That you don’t buy it, and can provide a counter example doesn’t especially bother me. You’re right about errno though, extern int errno doesn’t work, of course.

    Perhaps my picking on enumerations was overreaching. In the particular case I was thinking of, it was values returned from firmware. Those values are burned into ROM. They aren’t changing, they can’t, they are hardware. Whether they’re #defines, or enums, the hardware is out there, and it’s going to return those bytes regardless of what contortions the programmer is going through to interpret them.

    Anyway, I see it more as a spectrum than a black and white, right or wrong issue. C++ programmers (at least the code from the ones I’ve encountered) seem to have a tendency to veer to one side of that spectrum: the side of over-generalization, overly complex, too generic code, or worse, code that is trying, but failing, to be overly generic, but not failing to be overly complex. I probably have a tendency to veer to the other side: make the code as general as is *realistically* necessary, but if shortcuts can be had by making some assumptions which might conceivably, but not realistically, be violated, go ahead and make those assumptions, and fix the assuming code after the day the assumptions get violated, which probably means never. This keeps you from wasting time writing overly general code which is never taken advantage of, and which is prone to making its own assumptions about how things might eventually pan out, which might well turn out to be wrong too.

    Could be it’s something to do with the type of programming I do, which tends to be (in the case of my job) rather close to the hardware, or (in the case of my hobby) with a need for speed, and a need for the act of programming itself to be fun.

    I used to be a proponent of OOP, back in the days of Turbo Pascal 5.5 — 1989 or so — but, well, I seem to have gotten over it, after running into too many situations in which the OO model wasn’t a good fit — or, fit only with the aid of contortions — otherwise known as “design patterns.” The fundamental problem with information hiding which OO advocates is that it hides information. That would be fine, if all the things hidden worked well, and correctly, and as you thought they were supposed to. In the real world, they never actually do work as you suppose they do. And blaming them for not working as they are supposed to doesn’t help your program to run any better.

    These “Design patterns” are another example of the complexity of OOP overriding the benefits. Books like “design patterns” shouldn’t be necessary.

    (Oh jeez… you’ve got me started. 🙂 )

    I’ve seen a lot of blog posts around which take this form:

    “The solution to C++’s woes are more educated programmers! If only the programmers knew the proper way to use C++….”

    Well, you know, maybe the problem is really that C++ introduces more complexity than it removes. That, in a language, is a failure, in my book, and blaming the programmers doesn’t absolve the language.

    In my experience, if there are two fairly large programs, one written in C, and one written in C++, and that is all the information I know about them, and I’m asked to bet which one has fewer bugs, I have to bet on the C program.

    Ok, I could go on and on about this, but I will stop now.

  6. Wow, I hit a vein! I don’t think there’s a solution to the complexity of C++. For someone who already has expertise in it, the language provides a huge range of flexibility – choose the speed of C (disable exceptions, RTTI, avoid dynamic polymorphism, add inline asm), or choose the abstraction of… well… a higher-level language, like Python. All in one beast-of-a-language. In that case, the complexity gives me the flexibility to decide between “developer efficiency” and “code efficiency” for any project or subcomponent of a project.

    For someone who doesn’t have that expertise already, the benefits of learning it are dubious. I do it anyways because I think it’s fun, but for many people without the expertise (such as yourself) it’s probably far better to get that flexibility by writing high-level parts in Python/Ruby/Perl/Lua/whatever and keeping performance-sensitive parts in C modules.

    Regarding design patterns – as time goes on, things that used to be code-level patterns end up as language-level constructs. They exist only to make a non-expressive language provide an expressive solution. Visitor is multiple dynamic dispatch in Lisp. Singleton is a Python module. Heck, even the vtable in C++ is considered a design pattern in C. The problem isn’t that design patterns exist (it’s always good for technical experts to get together and talk about their solutions). The problem is that design patterns are idolized by some developers, and complexity for complexity’s sake is never appropriate in real-world code.

    My personal belief is that the terminology from design patterns should *never* exist in the code – maybe in the comments, but never class/function names. Letting the pattern terminology creep into the code itself is a sign that the programmer is doing exactly what you are complaining about – adding complexity without really thinking about how it is solving the problem.

  7. just stumbled onto this thread, quite interesting.
    I like C++ over C but it’s not perfect. I’ve encountered situations (e.g. low level, but also compile-time issues i.e. project management) where adhering to standard C++ teaching simply gets things WRONG.
    I always prefer to think in terms of C++ being C with some extra macro abilities.

    It still bugs me that you can actually make an ABSTRACT INTERFACE for a STATICALLY LINKED ‘class’ in C, by sticking some prototypes in a header along with a forward declaration, then you can change the implementation WITHOUT DOING BUILD-ALLS… and THIS CAPABILITY DOES NOT EXIST IN C++ WHEN USING CLASSES!!! So when making additions to large projects I often find myself falling back to C style interfaces for the sake of compile times.

    Returning from the fixed world of ‘visual studio’ to open source/linux in my spare time, I’m on the lookout for tools that automate organizing/maintaining header files.

    No programmer should have to manually maintain/sync such things or think too hard about where to put a function.
    It’s a waste of time & energy that should be eliminatable with better tools 🙂

  8. Should we tell him about Hungarian notation?
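
The C-style interface trick from comment 7 looks roughly like the sketch below. The counter_* names and bump_twice are invented for illustration, and the header, implementation, and caller are shown in one listing with comments marking the file boundaries. Callers only ever see a forward-declared struct and a few function prototypes, so changing the struct’s fields should only require recompiling the implementation file.

```cpp
// Sketch only: the counter_* names and bump_twice are hypothetical.
// Each marked section stands in for a separate file.
#include <cassert>

// ---- counter.h: everything a caller ever sees ----
struct counter;                       // forward declaration, layout hidden
counter *counter_create(int start);
void counter_bump(counter *c);
int counter_value(const counter *c);
void counter_destroy(counter *c);

// ---- counter.cpp: the fields can change without touching counter.h ----
struct counter {
    int value;
    int bumps;  // private bookkeeping, invisible to callers
};

counter *counter_create(int start)
{
    counter *c = new counter;
    c->value = start;
    c->bumps = 0;
    return c;
}

void counter_bump(counter *c) { ++c->value; ++c->bumps; }
int counter_value(const counter *c) { return c->value; }
void counter_destroy(counter *c) { delete c; }

// ---- caller.cpp: compiled against counter.h only ----
int bump_twice(int start)
{
    counter *c = counter_create(start);
    counter_bump(c);
    counter_bump(c);
    int v = counter_value(c);
    counter_destroy(c);
    assert(v == start + 2);
    return v;
}
```

In C++ proper, the closest common equivalent is usually the pimpl idiom, which hides a class’s data members behind a pointer in much the same way.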
