Away from Home: June 2011

Thursday, June 30, 2011

What does it take to run an E-business

I've been mulling over the idea of having some parts fabricated and sold online, and recently got down to considering the logistics and putting together a small site to see what happens. It's more of an experiment than anything else, but having sold some Xboxes through eBay before, I just thought: "How hard would it be to create an online store and accept credit card payments for something?"

I've been looking around here and there, and a cursory search indicates that one can open a merchant account for about $25/month through PayPal and deal with them skimming a cut off each sale. Modeling online sales is also quite fascinating, and what I noticed is that profit is very sensitive to small changes in price or cost when margins are low. The most important thing is learning how to create a landing page and figuring out how to drive traffic to a site to gauge interest before starting.
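To make the margin sensitivity concrete, here is a rough sketch of the kind of model I mean. Only the $25/month account fee comes from my search; the 3% per-sale fee, the prices, the unit costs and the sales volume are made-up numbers for illustration, and the function names are my own.

```python
def monthly_profit(price, unit_cost, units, fee_rate=0.03, monthly_fee=25.0):
    """Profit for one month: revenue minus per-sale fees, unit costs and the account fee.

    fee_rate is an assumed per-sale cut; monthly_fee is the ~$25 merchant account.
    """
    revenue = price * units
    return revenue - revenue * fee_rate - unit_cost * units - monthly_fee


def cost_sensitivity(price, unit_cost, units, bump=0.05):
    """Relative change in profit when the unit cost rises by `bump` (5% by default).

    Assumes the baseline profit is not zero.
    """
    base = monthly_profit(price, unit_cost, units)
    bumped = monthly_profit(price, unit_cost * (1 + bump), units)
    return (bumped - base) / base


# Same price and volume, two hypothetical unit costs.
low_margin = cost_sensitivity(price=10.0, unit_cost=9.0, units=100)
high_margin = cost_sensitivity(price=10.0, unit_cost=5.0, units=100)
print(low_margin, high_margin)
```

With these made-up numbers, a 5% rise in unit cost wipes out essentially all of the profit in the low-margin case, but only a few percent of it in the high-margin case.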

Thursday, June 16, 2011

Premature Optimization is Bad

The title almost sounds dirty, but still, one shouldn't get too excited and get ahead of oneself when it comes to implementing programming solutions. One of the biggest challenges in programming is understanding the full scope of the problem one is dealing with. When I say "problem," I don't mean that something is broken; I mean the "specification of the system" that one intends to build.

I am currently putting together a more advanced parser that can read html text and extract values out of it. With the advent of content management systems (CMS), most of the data displayed on the internet follows a specific layout depending on the page. Writing parsers can be a tedious process as you need to do several things:

1.   Effectively isolate the block of text you want to analyze
2.   Write a parser to target and extract the information you want to get, and
3.   Create a data structure to save that data

This process is time consuming because I would have to write additional code to parse different values and change the data structure to save everything. For every additional dependency there is in code, the probability of error goes up exponentially (a gut feeling). The fewer dependencies there are between modules, the fewer errors you'll probably get in the code. The best kind of code is code that automatically adapts itself to whatever you're doing... but I am going off on a tangent. Right... dealing with premature code optimization.

I have come up with an idea using html templates instead of writing code to parse data out of an html block. Writing procedural code requires one to think of a strategy to get at the data wanted and then codify the process in a program. That means that for every block of html, one would need to write code to get at the data. I've done that before and it can be a time consuming process (which can be made somewhat easier with the use of parsing libraries however).

Using templates, it becomes pretty easy to specify the structure of the text and target certain sections of the data using keyword markers. Editing text and converting it into templates can be somewhat tedious, but it is still far easier than writing code to extract data. The work required is just turning parts of the template into wild cards and adding keywords at the spots where the data sits. I really like this solution, so much that I decided to write code that would semi-automatically take html text and convert it into a template. It took me over a day to try and put something together... and I realized that I still hadn't fully understood the use cases of the html templates or the possible forms they can take and, just as important, that the form of the templates can have small variations that could cause the code to not work.
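To give a rough idea of what a template with wild cards and keyword markers could look like, here is a minimal sketch. It is not my actual engine: the {{name}} and {{*}} marker syntax and the function names are made up for illustration, and it simply turns the template into a regular expression.

```python
import re

# Hypothetical marker syntax: {{name}} captures a field, {{*}} is a wild card.
MARKER = re.compile(r"\{\{(\*|[A-Za-z_]\w*)\}\}")


def _literal(text):
    """Escape literal template text, letting any run of whitespace match loosely."""
    chunks = re.split(r"(\s+)", text)
    return "".join(r"\s+" if c.isspace() else re.escape(c) for c in chunks)


def compile_template(template):
    """Convert a template string into a compiled regular expression."""
    parts, pos = [], 0
    for m in MARKER.finditer(template):
        parts.append(_literal(template[pos:m.start()]))
        name = m.group(1)
        # Wild cards skip text; keywords become named capture groups.
        parts.append(".*?" if name == "*" else "(?P<%s>.*?)" % name)
        pos = m.end()
    parts.append(_literal(template[pos:]))
    return re.compile("".join(parts), re.DOTALL)


def extract(template, html):
    """Return a dict of captured fields, or None if the template does not match."""
    match = compile_template(template).search(html)
    return match.groupdict() if match else None


template = '<div class="item"><h2>{{title}}</h2>{{*}}<span class="price">{{price}}</span>'
html = '<div class="item"><h2>Widget</h2> <p>blah</p> <span class="price">$9.99</span></div>'
print(extract(template, html))
```

Even this small version shows the fragility I ran into: a keyword needs literal text after it to know where to stop, and any variation in the html beyond whitespace makes the match fail.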

So here I was, spending a good chunk of a day trying to write code to optimize a process that I hadn't fully understood yet. And I have a feeling that some of the biggest failures of software projects come from a poor understanding of the use cases, which then leads to extensive revisions to deal with unforeseen problems.

Making it work comes first; making it work quickly comes second. I ought to stick to that before putting effort into speeding up certain processes. Once I get this templating engine up and running, it'll be interesting to see what uses I can come up with for this code.

Wednesday, June 15, 2011

Building a web crawling platform

I've been doing some web-crawling over the last few years. I started off with some really rudimentary pattern matching stuff, but over the last 2~3 years I've become much more comfortable with text processing, to the point where I am becoming able to programmatically edit chunks of text that I write. The power of really good text processing is amazing, especially when it comes to structured textual information.

I have spent countless hours trying to figure out the best way of writing programs to parse text, and I have been thinking about programmatic methods of getting at the information out there. The more time one spends with text, the more patterns and strategies arise that can be used over and over again to get at the information embedded in it. What I want to be able to do is create a simple framework that will allow me to quickly create parsers for whatever text document I want to get at and keep a library of them so I can stream data from a variety of websites. Eventually, I might be able to make it easy enough that even non-programmers can write parsers too, and that may have some interesting applications.

Tuesday, June 14, 2011

Running 4 km (almost) every weekday

As a result of the commute, I've been working on getting the most out of my time. Getting home at about 8~9 pm every day means that I don't have a lot of time to do much else. Considering that I am now making time to study for my CFA exam in December and trying to exercise every day, that doesn't leave me with much time to do anything else after getting home.

I've noticed that I've put on a little weight, about 3 kg compared to last year. I attribute that to getting a big bag of almond chocolates from Costco and having a kitchen where I have been cooking much meatier dinners recently. So I need some cardiovascular exercise to counteract the recent bad diet (which I will get back to making healthier).

I have a transfer stop on the way home where I need to change trains, and I only ride the second train for a single stop before getting home. I've decided to take this opportunity to skip the transfer and jog the rest of the distance home. I've been cycling on and off since last year, usually doing long treks on the weekend (about 30 km), and I was quite pleasantly surprised to find that the exercise on the bike transferred quite nicely to running 4 km. Progressing through the first week of running nearly every day, I have found that my stamina has increased and that I can push up my speed significantly. There are also daily variations in my ability to run, but I am feeling an improving trend. I should start timing my runs to see how I progress.

By the time I get home and take a quick hop into the shower, I've got time to pump some iron using the weight training machine I have in my room. I also have some free weights in my room that I use from time to time, and I've noticed that using free weights is great for full body workouts instead of working on isolated muscles. I am still trying to figure out the perfect routine to really push my muscles. It looks like I'll either have to hit a gym sometimes and talk to a trainer or consult some books.

Wednesday, June 08, 2011

Conquering Excel Macros

VBA is a terrible language. It's archaic and idiosyncratic. I got back into programming in VBA yesterday to help a friend at work process data faster. It would take him weeks to do what this program will be able to do in minutes. The only kicker is that it costs me time to write this code.

I transitioned over to Python as my main programming language about 4 years ago and I've learned a lot from it since, especially functional programming and the map/reduce strategy for computing data. Basically, if you can write a function to process one block of data, it's just a matter of looping through the rest of the data array to calculate everything.

What Python (and several other languages) does is make the loop obsolete by making it implicit. If you have a function and you have an array, then there is a way to apply the function to all the elements of the array with one line of code. None of that

let's create a counter,
create the loop structure,
create an output array,
pass the data to the function,
dump the output data into the array,
and increment the counter to get to the next piece of data

process. You have a spoon and there are many buckets of ice cream, what else do you need to know? Functions and list comprehensions work just like that. In a single line of code, that 6-step process is gone. Code that uses list comprehensions is amazingly short, and the hard part is limited to writing the function.
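As a small illustration of the difference, here is the same job done both ways in Python. The temperature conversion function and the sample readings are made up for the example.

```python
def to_celsius(fahrenheit):
    """Process one piece of data: convert a Fahrenheit reading to Celsius."""
    return (fahrenheit - 32) * 5 / 9


readings = [32, 50, 212]

# The explicit version: counter, loop, output array, call, store, increment.
results_loop = []
i = 0
while i < len(readings):
    results_loop.append(to_celsius(readings[i]))
    i += 1

# The same thing with the loop made implicit.
results_comprehension = [to_celsius(f) for f in readings]
results_map = list(map(to_celsius, readings))

print(results_loop == results_comprehension == results_map)
```

All three produce the same list; the only part that needed any thought was to_celsius itself.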

I work in Python and I think in Python when I program. Then I come back to VBA, and it doesn't have the syntax to do that. What usually takes me 1 short line of code takes me 4~7 lines of longer code to do the same thing. It's a waste of time and prone to error.

So basically, I've spent the last 2 days making VBA more Python-like by creating data structures that take over the lower-level management of arrays, to the point where I can throw arrays around and process them in 2~3 lines of short code. It's not a perfect solution, but it's much better and far more manageable. The old procedural code that I used to write is a nightmare to maintain by comparison. The great thing about list comprehension is that you can worry less about the state of the program because the code becomes stateless. None of that "what was the value of the counter?" and that pizzazz.

There are times when the state of the program is important however, like for instance a user application: What screen am I on, or what settings have I changed, or am I still connected to the internet or something like that. But still, the overhead of dealing with states associated with data greatly decreases with list comprehensions (there will be cases where the data will require state changes in a function, but the overhead is greatly reduced).

At the same time, instead of having multiple lines to describe a process, you can have just one, which is far easier to understand because the unimportant scaffolding is hidden; the only things showing are the important parameters and the name of the function. That is how good code is written. So farewell to the terrible looping structures: I've gotten rid of most of them and am now left with descriptive code that tells me what it does and what it operates on.

That's the beauty of higher level computing languages: you're able to do a lot while saying a little. There are even higher level ones where you can define your own keywords and syntax. Lisp, a language that isn't very widely used, allows one to do just that. I have no idea what is possible when one is able to define one's own language to suit whatever problem one is tackling, but I am quite sure it would be a very fascinating adventure. I've already seen my programming skills improve greatly by using Python. I can only imagine what else there is to learn from higher level languages.

Tuesday, June 07, 2011

The mind is a lot fresher after spending time away

I have found that I need to take 2 passes at a problem before getting it right, and usually the second pass needs to be done after spending a day away from the task. My mind feels significantly fresher after spending time away from work and then coming back to a task. Things that I didn't notice before just jump out at me, whereas otherwise I could look over something repeatedly and still overlook details.

I find this to be true of both writing and programming. The first post of anything I make tends to be filled with errors, and even after double checking, I still tend to miss a lot of small mistakes. It's likely related to a bad habit of wanting to get something done as fast as possible, and sometimes I find double-checking to be a tedious task that gets in the way. Once I hit the submit button, that feeling of needing to have something done as quickly as possible usually subsides and I am able to look at past work with a more critical eye.

There are times when I come back to code I have written and wonder to myself "what the hell was I thinking" when a much simpler solution exists. This is especially true when I make the mistake of designing a complex solution and thinking of the solution as clever.

For example, I managed to write a text parser with the feature that it could automatically detect if a text string was an integer, a floating point number, exponential or a date and automatically call the correct function to parse the data. The function was also extensible in that it could also be updated on the fly to auto-detect and parse other values.

I thought the smart thing to do was to create 2 functions: 1 to detect if a text string was of a certain type (returning true or false) and a second to perform the conversion. I had to make an elaborate system to keep the testing function and the parsing function paired together, which I thought was kind of unwieldy, but I managed to do it. A day later, I realized that I could just have 1 function that would return either a value or nothing depending on whether the parse passed or failed, and use that as the indicator of whether the right function had been called.
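A minimal sketch of the simpler design looks something like this. It is a reconstruction for illustration rather than my actual code: the function names, the parser order and the single date format are assumptions.

```python
from datetime import datetime


def parse_int(text):
    """Return an int, or None if the text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_float(text):
    """Return a float (plain or exponential notation), or None on failure."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_date(text, fmt="%Y-%m-%d"):
    """Return a date for the assumed format, or None on failure."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


# Order matters: the first parser that returns a value wins.
# New parsers can be appended to this list on the fly.
PARSERS = [parse_int, parse_float, parse_date]


def auto_parse(text):
    """Try each parser in turn; fall back to the stripped text itself."""
    cleaned = text.strip()
    for parser in PARSERS:
        value = parser(cleaned)
        if value is not None:
            return value
    return cleaned


print([auto_parse(s) for s in ["42", "3.5", "1e-3", "2011-06-07", "hello"]])
```

Each parser is both the test and the conversion, so there is nothing to keep paired. The one catch is that this only works as long as "nothing" is never a legitimate parsed value.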

I could have been bashing at a problem through an entire day with an inelegant solution... and I deplore inelegant solutions, because why work hard on a bad solution when you could be spending time on a better one instead? Thinking like this is both a blessing and a curse because the results of my work vary between "really good" and "none."

Anyways, hopefully I'll be getting back to posting on a more regular basis. I have more pockets of time where I am able to think compared to before and I hope for this to continue.

Monday, June 06, 2011

Less time online and improved concentration

I've been getting back into studying for the CFA exam and one thing I noticed is how terrible my concentration was. Reading books is quite a different experience from reading online articles: a large amount of information has been gathered into a single place instead of being scattered across a variety of webpages that one would usually have to search through. Having a good table of contents is also a boon for immediately pinpointing where in a document you need to go to get the information you need.

Having a large volume of high signal-to-noise information available significantly cuts down on the time required to search for information (which I think of as a distraction). The resulting effect is that having good books to do research from actually helps improve one's concentration, instead of being distracted by looking for information (which may or may not be relevant).

One other interesting thing I've noticed about myself is that I tend to focus way better when I am working with other smart people. I believe that the synergy of having a few smart people around to cover information gaps or thought gaps cuts down on the distraction of needing to look for information. I believe that the addictiveness of, say, video games in general can be attributed to having all of the information a player might need very accessible through a very intuitive interface, or to having an environment where one can immediately figure things out with simple tests.

Right now, I think that the internet isn't an ideal place to learn in-depth topics through websites: either the quality isn't there, finding good quality content is simply too much work, or the information that you're looking for isn't covered in enough depth. I think that books and other resources fill those gaps.

One project that I would like to work on is creating a repository/network of high signal-to-noise sources, and I will be looking into a variety of tools to help me do that.

Friday, June 03, 2011

Cutting down on wasted time

I've been working on cutting down wasted time recently. With the commute, I now leave home at 7:30 am and don't get home till past 8:30 pm, and I have come to the realization that I have little time to myself. It is paramount that I make the best of it, so I've been working on cutting down the number of distractions and time killers.

One of the best moves I've made is killing the habit of flipping on the computer immediately after getting home. I don't have a TV and have been living without one for the past 5 years; in its place, the computer has become my media center. The problem is that the way I am spending time on the computer is just as bad as the way I used to spend it on TV: aimlessly doing nothing.

Instead of flipping on the computer to aimlessly browse, I am becoming more stringent with my time allocation. The first thing I do after getting home is either cook or exercise. I've also started making a new habit of not riding the train all the way home and getting off 1 stop before my own. Actually, it's a transfer stop, and instead of spending time waiting for my transfer, I just hit the street and jog the remaining 3 km home. I've managed to do the jog in about 15~16 mins and it's turning into a pretty good exercise routine.

Running this distance is actually quite significant when it comes to reducing body fat, as I burn about 200 calories. An average meal for me is about 500~600 calories, so I am burning off the equivalent of 33%~40% of a single meal. Assuming that I do this 3~4 times a week (i.e. when it doesn't rain), the reduction in net calorie intake is significant, in addition to the health benefits of jogging.

I've also allocated about 1 hour almost every night to studying for the CFA exam (at a rate of about 20 pages/day) and have made this a prerequisite to turning on the computer. It's been about 3 years since the last course I took in university and it's rather refreshing to start studying again. One of the great things about reading study material is the much higher signal-to-noise ratio compared to reading articles on the net. The quality of information in books is far superior to what is published on the net. I've also noticed that as much as I like digitized information, I find that I like writing out my notes; there is simply a freedom in penning out notes that isn't available when typing out information (drawing diagrams and sketching arrows on a computer is still ridiculously slow). Now that I think about it, if there was a great way to pen notes digitally and organize them, that would be awesome, because I am one of those people that can easily generate volumes of notes.

I wonder if there is good note taking software where you could both type and use a stylus to sketch in other information, like arrows, lines and even equations.

Once I get into a groove of "getting things done in rapid succession," I've found that a momentum kicks in, in the sense that I am far more likely to want to move on to the next item on my mind that needs to get done, without hesitation. Looking back at past behavior, the biggest hindrance to action is thinking about all the things that I ought to be doing and not knowing which of them I should simply do, because I have a tendency to worry about "is the thing I am doing the right thing to be doing?" I've moved on to killing that by coming to the realization that time is scarce and doing something is better than doing nothing. I've become far more effective at using my time after getting home, far more so than when I was living closer to work, and I find that ironic.

A change in environment is a good thing.

Thursday, June 02, 2011

Working on my first patent

I've been charged with the design of some new optical systems based on geometry and varied materials to improve the brightness of light emitting devices. One of the great things about doing simulation work is that I can create 3D models and change materials and configurations faster than the people fabricating the devices can build and test them, because semiconductor fabrication is tough work.

Meaning that I can likely iterate through far more designs and variations than the processing people can go through in the same amount of time, probably by a factor of 5~10, easily.

As a result of the work I am currently doing, I've been able to write a library that allows me to quickly generate 3D models, simulate them and analyze the data faster than the guy that taught me to use the software (and has about 2 years up on me in experience using the simulation program). We tried to bring in another guy from a different department to do simulation work, but after 6 months of work, he eventually burned out... but I digress...

Now, after getting out of the clean-room and having reasonable amounts of time to spend thinking and analyzing information, I have already found ways of increasing the output of our devices by 20%~50% with our standard models, and I have identified a possible way of pushing that up to the 60%~90% range with some designs. While I am at it, I figured that it would be very cool to put out a patent to have something under my name.

Applying for patents is an expensive process, with fees for the application and additional fees to maintain the patent over its term. As an individual, it might be prohibitive to do this kind of stuff, but if you're working for a large corporation that can do this, why not?

Unfortunately, the patent review process takes approximately 3 years to go through, but I figured that the earlier I get started, the better. I just need to get it through the internal review process and find time to talk to the legal department here... this is going to be interesting.