Monday, January 18, 2010

Insufficient data vs a bad model

I nearly blew a gasket trying to explain to a co-worker the difference between a calculation model being incomplete and having insufficient data. I was trying to do something a little tricky that these guys had never done: calculating the fundamental parameters of a diode from measured current-voltage data.

The problem I was trying to solve was backwards from the problems most textbooks give you. The textbooks give you the diode parameters and maybe a resistor in the circuit, and you have to go and calculate the current flowing through the circuit at some voltage.

I managed to set up the basic diode-resistor equation and then started back-solving for the unknown coefficients. For those who are interested, the equation is:

V = I*R + n*Vth*ln(I/Is + 1)

with the unknown factors being n, Is and R, and you are given a set of I-V data with some noise.

This problem is impossible to solve algebraically, so one needs a numerical solver to find the optimal values of n, Is and R that fit the model to the measured curve. Obviously there are likely other effects that aren't in the model, such as temperature effects that may cause the resistance to change. I made the simplifying assumption that the temperature change was insignificant enough to ignore and proceeded to do my calculations with a program I wrote.
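This isn't the program I actually wrote, but a minimal sketch of how such a fit might go, with made-up "true" parameter values. One trick: for currents well above Is, ln(I/Is + 1) ≈ ln(I) - ln(Is), so the model becomes linear in three combined coefficients and plain least squares can recover n, Is and R:

```python
import numpy as np

# Synthetic "measured" I-V data for a diode + series resistor.
# Made-up true values for illustration: n = 1.8, Is = 1e-12 A, R = 200 ohm.
Vth = 0.02585                       # thermal voltage at ~300 K, in volts
n_true, Is_true, R_true = 1.8, 1e-12, 200.0
I = np.logspace(-6, -3, 50)         # 1 uA to 1 mA
V = I * R_true + n_true * Vth * np.log(I / Is_true + 1)
rng = np.random.default_rng(0)
V = V + rng.normal(0, 1e-4, V.size)  # a little measurement noise

# For I >> Is:  V ~ R*I + (n*Vth)*ln(I) + (-n*Vth*ln(Is)),
# which is linear in the coefficients (R, A, B).
X = np.column_stack([I, np.log(I), np.ones_like(I)])
(R, A, B), *_ = np.linalg.lstsq(X, V, rcond=None)

n = A / Vth                 # recover the ideality factor
Is = np.exp(-B / A)         # recover the saturation current
print(R, n, Is)             # should land close to 200, 1.8, 1e-12
```

With a wide enough current range, the recovered values come out close to the true ones; the rest of the post is about what happens when the range is too narrow.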

The results I got were interesting: I realized that we were taking insufficient data for my simulator to make an accurate fit. For those who know diodes, diodes are non-linear devices, where the current through them is roughly exponential in voltage, I = Is*e^(V/(n*Vth)). Resistors, on the other hand, are linear devices, and the current through them is proportional to the voltage, I = V/R.

The interesting thing about having a diode and resistor in series is that the diode starts off as the current-limiting device at really low voltages (because the exponential is really small at first), and then the resistor becomes the current-limiting component at higher voltages. The problem was that our measurements kept the current and voltage too low for me to calculate a reasonable value for the resistance; the resistance estimate only becomes accurate at higher currents, where the resistor is influencing the circuit the most. For example, I fitted a 1300 ohm resistance with the limited data set and then a 200 ohm resistance with an expanded data set.
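You can see this effect with synthetic data (again, made-up parameter values, not our real measurements): repeat the same least-squares fit over many noise realizations, once on a current range where the diode dominates everywhere and once on a range that reaches into resistor-limited territory, and compare how much the fitted R scatters.

```python
import numpy as np

Vth = 0.02585
n_true, Is_true, R_true = 1.8, 1e-12, 200.0

def fit_R(I, rng):
    """Fit R from one noisy V(I) sweep via the linearized model."""
    V = I * R_true + n_true * Vth * np.log(I / Is_true + 1)
    V = V + rng.normal(0, 1e-4, I.size)        # measurement noise
    X = np.column_stack([I, np.log(I), np.ones_like(I)])
    coef, *_ = np.linalg.lstsq(X, V, rcond=None)
    return coef[0]                              # fitted resistance

rng = np.random.default_rng(1)
I_narrow = np.logspace(-6, -5, 30)   # diode limits the current everywhere
I_wide = np.logspace(-6, -3, 30)     # resistor dominates at the top end

R_narrow = [fit_R(I_narrow, rng) for _ in range(50)]
R_wide = [fit_R(I_wide, rng) for _ in range(50)]
print(np.std(R_narrow), np.std(R_wide))
```

The scatter in R from the narrow sweeps is far larger than from the wide ones: same model, same noise level, but the narrow data simply doesn't constrain the resistance.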

Immediately after, everyone thought that the model was wrong, since a different resistance came out when using a data set from 0 to 600 uA versus 0 to 2 mA. The biggest argument was that the model was flawed because I was getting different resistance values using different ranges of data. My head wanted to explode, because I kept telling them, and had verified, that we were taking insufficient data.

The simplest way to illustrate this is with the following example. Suppose you had the function y = -a*x + b*x^2 + c*x^4, where a, b and c are unknown. You can measure (x, y) pairs, and your objective is to find (a, b, c). There is some measurement noise on y, and y > 0.

Suppose you took data with x between 0 and 1. The -a*x term is going to be the strongest, and you'll probably be able to rig some (a, b, c) that looks about right for x between 0 and 1. But what if you kept the same parameters and measured beyond [0, 1], up to [0, 100]? How good would your fit be then? If your c*x^4 term had a lot of error, the resulting graph is going to be out of whack. The (a, b, c) will have to be rejiggled so that the fit works well over the whole range, both [0, 1] and [0, 100]. The values might change drastically!
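A sketch of that toy example (made-up coefficients, and ignoring the y > 0 detail for simplicity). The model is linear in (a, b, c), so ordinary least squares does the fitting; the point is how differently c comes out on the two ranges:

```python
import numpy as np

# Made-up true coefficients for y = -a*x + b*x^2 + c*x^4
a, b, c = 1.0, 2.0, 0.05
rng = np.random.default_rng(0)

def fit_abc(x):
    """Least-squares fit of (a, b, c) from one noisy sweep over x."""
    y = -a * x + b * x**2 + c * x**4
    y = y + rng.normal(0, 0.05, x.size)        # measurement noise
    X = np.column_stack([-x, x**2, x**4])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                                 # (a_hat, b_hat, c_hat)

a1, b1, c1 = fit_abc(np.linspace(0.01, 1, 30))    # narrow range [0, 1]
a2, b2, c2 = fit_abc(np.linspace(0.01, 100, 30))  # wide range [0, 100]
print(c1, c2)
```

On [0, 1] the x^2 and x^4 columns are nearly indistinguishable, so the fitter can trade b against c almost freely and c1 can land far from 0.05, even though the fitted curve looks fine on [0, 1]. On [0, 100] the x^4 term dominates and c2 comes out essentially exact.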

And that is the problem when people try to fit limited data with higher-order polynomials or non-linear functions. Many different functions can describe a line segment very well, but everything falls apart when the line segment is extended, because the higher-order terms were never fitted well.

I tried to explain that with all my might in Japanese. But they just didn't get it, and they kept insisting that the model was probably missing a few things. I even back-calculated and showed reversed approximations giving a confidence interval down to 12% on the data, but alas, no dice.

This is quite a problem, since most people I work with are experimentalists: they just do stuff to see if things work or not. When it comes to simulation, programming or theoretical work, they know the basics, but the real fruits are hidden in the details. Since we make some of these devices, it is important that they are characterized properly, so we have a good idea of what factors might impact their performance. Just saying that the voltage is up or down isn't sufficient, because when improving things you want to go after the biggest impact factors first.

4 comments:

Sacha said...

Your Japanese friends are probably not too dissimilar to the people behind the Long Term Capital Management hedge fund when it blew up, with the exception that your friends weren't handling billions of dollars.

(Wikipedia "LTCM" if you don't know what I'm referring to)

Paladiamors said...

Not really sure about the direct connection with LTCM.

Mind elaborating a little?

Sacha said...

The guys that ran LTCM had a bad model, but they kept on using it anyway until they nearly blew up the world's financial system.

Paladiamors said...

The problem is that I think that the model is good and that we have insufficient data.