Sunday, January 24, 2010

Having a good software design document is important

I've come to the point in my software project where it's getting hard to manage tweaks to my code, and I'm starting to realize that it would have been nice to have a software design document to keep track of what is going on in the code. Making a tweak in one part of my code now requires tweaking other bits of code to make everything work together properly, and that is starting to consume a lot of time. This is one reason why rewriting things from the ground up can sometimes be faster than incrementally modifying a program.

Writing a program is like writing an essay. In the case of the essay, you are writing to figure out what you want to say. Sometimes, many paragraphs, theses, topic sentences and self-reflecting discussions later, the initial premise of the essay turns out to be completely or partially wrong, resulting in a rewrite of various sections. This is especially so for long or complicated essays, and having an outline of what you want to write can be a huge time saver: instead of writing 2000 words and then realizing that the logic was bad, you might have been able to write an outline in 200 words and figure that out earlier. An outline also helps when it comes to keeping track of what needs to be said.

In the case of a program, you are writing a set of procedures on what to do with data. So instead of having a thesis, topic sentences and discussions, you are describing sets of data and the kinds of operations that must be performed on the data. The difference between writing an essay and writing a program is the difference between figuring out what you want to say and figuring out what you want to do.

Programming is an organic process

Like writing an essay to figure something out, writing a program is about figuring out what you want to do and how you're going to do it. For my current project, which parses HTML data from financial webpages and then performs analysis, I didn't have debugging in mind and decided that it would be OK to download data from the web, parse the HTML in memory, save the values in a table and then save the table when I was done processing everything. I ran into significant problems when the program choked on bad HTML data and crashed. Since I wasn't saving the webpage in question, I wasn't able to easily recreate the error, and I would have to manually download the webpage and then perform an analysis on my own. After that, I would restart the process and hope that I wouldn't run into another problem (which I usually did!).

What I learned from that experience was that it is important to keep a hard copy of the raw data and to write a parsing module that can look at bad data, write a report and continue onwards. When I first designed my program, I expected all available data to be properly formatted, which unfortunately wasn't true. This ordeal led me to rewrite my download module: instead of processing data on the fly (and thus saving HD space), it now saves the data first, and a separate parsing module reads the HTML file from the hard drive.
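To make the "save first, parse later" idea concrete, here is a minimal sketch in Python. It is not the code from my project: the function names, the one-file-per-symbol layout and the `<span id="price">` page format are all made up for illustration. The point is only that raw pages land on disk before parsing, and that a bad page ends up in an error report instead of crashing the run.

```python
import re
import tempfile
from pathlib import Path


def save_raw(symbol, html, raw_dir):
    """Write the downloaded page to disk before any parsing happens."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / f"{symbol}.html"
    path.write_text(html, encoding="utf-8")
    return path


def parse_price(html):
    """Hypothetical parser for a made-up page layout (an assumption)."""
    match = re.search(r'<span id="price">([0-9.]+)</span>', html)
    if match is None:
        raise ValueError("price element not found")
    return float(match.group(1))


def parse_saved_pages(raw_dir):
    """Parse every saved page; record failures and keep going."""
    rows, errors = {}, []
    for path in sorted(Path(raw_dir).glob("*.html")):
        try:
            rows[path.stem] = parse_price(path.read_text(encoding="utf-8"))
        except ValueError as exc:
            # The raw file is still on disk, so the failure can be replayed.
            errors.append((path.name, str(exc)))
    return rows, errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        save_raw("GOOD", '<span id="price">12.50</span>', tmp)
        save_raw("BAD", "<html><body>garbled", tmp)
        rows, errors = parse_saved_pages(tmp)
        print("parsed:", rows)
        print("report:", errors)
```

Because the bad page is still sitting in the raw directory, the parser can be fixed and rerun against that exact file without downloading anything again.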

Other problems also started to creep up. I was incorporating data from both Yahoo and Google, and I assumed that I would be able to get complete data from both financial sites. Unfortunately, that wasn't the case, and there were many times when my analysis module would break down because I was missing data. I spent plenty of time debating the best way to deal with this problem. Do I have the analysis module detect the missing data? Or do I go back into the parsing module and add functionality to prevent data sets with missing data from getting put into the database? While writing code for the analysis module, it's tempting to just add some checking functionality there, since going back to the parsing module would break my train of thought. I used to build in functionality wherever I needed it without concern for organization. This would balloon into a rat's nest of functions scattered all over the code, which made it hard to figure out what piece of code was doing what.
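One possible answer to that question is to do the check once, at the boundary between parsing and storage, so the analysis module never sees an incomplete record. The sketch below shows that option only; it is not what my program does today, and the field names, the `merge_sources` rule (Yahoo values win unless they are missing) and the dictionary layout are all assumptions for the sake of the example.

```python
# Fields the analysis step needs; these names are hypothetical.
REQUIRED_FIELDS = ("price", "eps", "volume")


def merge_sources(yahoo, google):
    """Combine two partial records; Yahoo values win unless they are None."""
    merged = dict(google)
    merged.update({k: v for k, v in yahoo.items() if v is not None})
    return merged


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or None."""
    return [name for name in required if record.get(name) is None]


def split_complete(records):
    """Separate records that are safe to store from those to report."""
    complete, rejected = {}, []
    for symbol, record in records.items():
        missing = missing_fields(record)
        if missing:
            rejected.append((symbol, missing))
        else:
            complete[symbol] = record
    return complete, rejected


if __name__ == "__main__":
    records = {
        "AAA": merge_sources({"price": 10.0, "eps": None},
                             {"eps": 1.2, "volume": 5000}),
        "BBB": merge_sources({"price": 3.0}, {}),
    }
    complete, rejected = split_complete(records)
    print("store:", complete)
    print("report:", rejected)
```

Keeping the check in one place means the analysis code can assume complete data, and the list of rejected symbols doubles as a record of which site was missing what.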

Walking the line between a creative and structured process

Writing code is both a creative and structured process, a point that I didn't appreciate until now. The creative part is the design and organization, where one needs to decide the kinds of modules and the input-output functionality required for each module. The structured part is taking the outlined requirements and translating them into robust and tested code. It sounds easy when things are properly broken down, but it can turn into a hairy mess when one starts to realize that there were some missing things in the initial design. Without a design document, it becomes tempting to start making design decisions on the fly to get things done, or, if one starts thinking too much, it becomes a pain to try to visualize the whole program and figure out how a change might affect one part of the code or another.

I've been in both situations, where I would spend an hour trying to visualize the entire program and trying to make a decision on what to do. After finally making a decision, the implementation might take only 20 minutes, and I'd already be exhausted from all the thinking and still not be confident that I had made the right decision. Running into roadblocks like these can be a really frustrating experience.

All of this can be avoided by keeping a well-maintained design document that describes the functionality of the program in its current state, so that it can be used as a reference for further design revisions. Writing a design document doesn't particularly feel like real coding, and it is tempting to skip over it, but I realize now that keeping a decent outline of the code and its status is really helpful. I'll probably be spending my next few slots of free time creating a design document.

1 comment:

Sacha said...

It sounds like you've taught yourself three years of software engineering in three weeks. Good work!

I've always been a big fan of "do it and figure out structure later" and it seems that you're figuring out when the "later" is in terms of planning the design of whatever it is you're making.