Friday, August 23, 2013

Advanced Topic: Fully Automated Linear Regression


In this Advanced Topic post, I discuss how you can both create data and run a statistical analysis all from within a single Myrtle script.  Teachers and course instructors may wish to do something like this when coming up with a class example for their lectures or even for generating problems and answer keys for their quizzes or exams.

Note: Requires Myrtle Version >= 1.8.13

Our goal will be to create a synthetic data set containing two variables that are linearly related according to the equation Y = 1.3 + 2*X, but also contaminated by random measurement error. We will want to not only generate the data, but also run a statistical analysis of the data.

Create a new blank procedure by clicking the new procedure button ("Create a new procedure.") on Myrtle's procedure toolbar -- it looks like a blank sheet of paper.


Next, right-click on the new procedure (Untitled) you just created and select Rename...  Give the procedure a more informative name than Untitled, such as "AutoRegression", as shown in the above image.

Next, edit the procedure by double clicking on it as shown below.


Let's begin writing our script.  You will need to copy and paste (e.g. Ctrl+c and Ctrl+v) or simply type directly into the script editor the lines shown in red below.  First, we need to let the compiler know about some of the packages we will be using with a few import statements.

import com.mockturtlesolutions.snifflib.datatypes.DblMatrix;
import com.mockturtlesolutions.snifflib.stats.NormalDistribution;



Then, we create some linear data in order to mimic real data.  We'll assume for now that our data set has N = 10 observations.  The underlying linear relationship is Y = 1.3 + 2*X.  To make these synthetic data look more like real measurements, we also perturb the Y-values with deviates from a normal distribution.  We use the DblMatrix methods times and plus to build Y from X.

normdist = new NormalDistribution();
X = DblMatrix.span(0,10,10);
Y = X.times(2).plus(1.3);
deviates = normdist.random(X.getN());
Y = Y.plus(deviates);
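
To see what this data-generation step does without the snifflib classes, here is a minimal plain-Java sketch.  It assumes that DblMatrix.span(0,10,10) yields 10 evenly spaced values from 0 to 10 inclusive and that NormalDistribution defaults to a standard normal (mean 0, standard deviation 1); neither is confirmed in this post.  The names SyntheticLinearData, linspace and noisyLine are hypothetical and are not part of Myrtle.

```java
import java.util.Random;

// Hypothetical stand-in for the Myrtle data-generation step (not Myrtle API).
public class SyntheticLinearData {

    /** Returns n evenly spaced values from start to end, both inclusive. */
    static double[] linspace(double start, double end, int n) {
        double[] x = new double[n];
        double step = (n > 1) ? (end - start) / (n - 1) : 0.0;
        for (int i = 0; i < n; i++) {
            x[i] = start + i * step;
        }
        return x;
    }

    /** Returns y[i] = intercept + slope * x[i] + sigma * e[i], with e[i] ~ N(0, 1). */
    static double[] noisyLine(double[] x, double intercept, double slope,
                              double sigma, Random rng) {
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = intercept + slope * x[i] + sigma * rng.nextGaussian();
        }
        return y;
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);            // fixed seed so a quiz key is reproducible
        double[] x = linspace(0.0, 10.0, 10);    // assumed equivalent of DblMatrix.span(0,10,10)
        double[] y = noisyLine(x, 1.3, 2.0, 1.0, rng);
        for (int i = 0; i < x.length; i++) {
            System.out.printf("%.4f\t%.4f%n", x[i], y[i]);
        }
    }
}
```

Fixing the seed is a design choice for instructors: the same "random" data set can be regenerated later when the answer key is needed.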

Next, we will paste these "real" data into the current spreadsheet.
ParentPanel.pasteDblMatrixAt(X,0,0);
ParentPanel.pasteDblMatrixAt(Y,0,1);
Note that when this script runs, the Myrtle function pasteDblMatrixAt() pastes the X data starting at the first row of the first column (Java indices start at 0) and the Y data starting at the first row of the second column.  Then, we assign some bookmarks to those spreadsheet data ranges.
ParentPanel.addBookmark("Xdata","Sheet1!A1:A10",true);
ParentPanel.addBookmark("Ydata","Sheet1!B1:B10",true);
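
The bookmark ranges above are hard-coded for N = 10 rows, so they would need to change if the sample size changes.  As a rough sketch, a small helper can build the range string from N.  The class BookmarkRange and the method columnRange are hypothetical and not part of Myrtle; the sketch only assumes the "Sheet!A1:A10" range syntax used in the addBookmark() calls above.

```java
// Hypothetical helper (not Myrtle API): builds a range string such as "Sheet1!A1:A10".
public class BookmarkRange {

    /** Returns the range covering rows 1..n of a single column on the given sheet. */
    static String columnRange(String sheet, char column, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        return sheet + "!" + column + "1:" + column + n;
    }

    public static void main(String[] args) {
        int n = 10;
        System.out.println(columnRange("Sheet1", 'A', n)); // prints Sheet1!A1:A10
        System.out.println(columnRange("Sheet1", 'B', n)); // prints Sheet1!B1:B10
    }
}
```

In the AutoRegression script, the resulting strings would simply take the place of the literal ranges passed to addBookmark().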
Lastly, we load and run Myrtle's standard linear regression script on these data.

String proc = "com.mockturtlesolutions.LinearRegression";
Script script = ParentPanel.loadArchivedProcedure(proc);
 
Binding bind = script.getBinding();
bind.setVariable("XDATADefault","#Xdata");
bind.setVariable("YDATADefault","#Ydata");
 
script.run();
Be sure to save your edits to your AutoRegression script (Save or Ctrl+s).  Your session should now look like the following:


Finally, click on the "Run & update selected procedures" button (it has a green arrow on it).  Running the script will now produce a detailed regression analysis.  Notice that the estimated slope and intercept are close, but not identical, to the "true" values in the underlying linear relationship.
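
For checking an answer key by hand, the slope and intercept of a simple linear regression can be computed with the textbook least-squares formulas.  The sketch below is plain Java and is not Myrtle's LinearRegression procedure, whose internals are not shown in this post; the class name SimpleOls and the five data points are made up for illustration (they are Y = 1.3 + 2*X with small fixed perturbations).

```java
// Hypothetical illustration (not Myrtle code): ordinary least squares for one predictor.
public class SimpleOls {

    /** Returns {intercept, slope} of the least-squares line through the points (x[i], y[i]). */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    public static void main(String[] args) {
        // Made-up data: 1.3 + 2*x plus perturbations {0.1, -0.2, 0.0, 0.2, 0.1}.
        double[] x = {0.0, 1.0, 2.0, 3.0, 4.0};
        double[] y = {1.4, 3.1, 5.3, 7.5, 9.4};
        double[] est = fit(x, y);
        // prints: intercept = 1.26, slope = 2.04
        System.out.printf("intercept = %.2f, slope = %.2f%n", est[0], est[1]);
    }
}
```

As in the Myrtle output, the estimates (1.26 and 2.04) land near the true values 1.3 and 2 without matching them exactly.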

Instructors may wish to experiment with different values for the sample size (N) and the magnitude of the random deviates to see how they affect the precision of the resulting parameter estimates.
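
One way to run that kind of experiment outside Myrtle is a small simulation: generate many synthetic data sets for a given N and noise level, fit each one, and look at how much the slope estimates vary.  The sketch below is plain Java under stated assumptions: X is evenly spaced on [0, 10], the noise is normal with standard deviation sigma, and the names SlopeSpread, fitSlope and simulate are hypothetical.  Under this model the least-squares slope is centered on the true value, so what changes with N and sigma is mainly the spread of the estimates.

```java
import java.util.Random;

// Hypothetical simulation (not Myrtle code): spread of the fitted slope for Y = 1.3 + 2*X + noise.
public class SlopeSpread {

    /** Least-squares slope of y on x. */
    static double fitSlope(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        return sxy / sxx;
    }

    /** Returns {mean, standard deviation} of the slope estimate over the simulated data sets. */
    static double[] simulate(int n, double sigma, int trials, Random rng) {
        if (n < 2 || trials < 1) {
            throw new IllegalArgumentException("need n >= 2 and trials >= 1");
        }
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = 10.0 * i / (n - 1);           // evenly spaced on [0, 10]
        }
        double[] y = new double[n];
        double sum = 0.0, sumSq = 0.0;
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < n; i++) {
                y[i] = 1.3 + 2.0 * x[i] + sigma * rng.nextGaussian();
            }
            double b = fitSlope(x, y);
            sum += b;
            sumSq += b * b;
        }
        double mean = sum / trials;
        double variance = Math.max(0.0, sumSq / trials - mean * mean);
        return new double[] {mean, Math.sqrt(variance)};
    }

    public static void main(String[] args) {
        Random rng = new Random(2013L);
        for (int n : new int[] {10, 100}) {
            for (double sigma : new double[] {0.5, 2.0}) {
                double[] r = simulate(n, sigma, 2000, rng);
                System.out.printf("N=%d sigma=%.1f  mean slope=%.4f  sd=%.4f%n",
                        n, sigma, r[0], r[1]);
            }
        }
    }
}
```

Larger N or smaller sigma should give a visibly smaller standard deviation, which is a useful point to make in class before students see the Myrtle regression output.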


That's it!  Before you leave, however, you should consider archiving your AutoRegression script.  Why?  If you think you might ever want to tweak or fine-tune this script, or reuse it in the future (e.g. for generating exam or quiz problems), you should archive it.  To do this, right-click on the script's icon and select the Archive... option.  Edit the fields as you see fit and then click the upload button (cloud icon) as shown below.





For your convenience, the complete AutoRegression script listing described above is reproduced below.


import com.mockturtlesolutions.snifflib.datatypes.DblMatrix;
import com.mockturtlesolutions.snifflib.stats.NormalDistribution;


////////////////////////////////////////////////////////////////
// First, we create some synthetic linear data...
////////////////////////////////////////////////////////////////


normdist = new NormalDistribution();
X = DblMatrix.span(0,10,10);
Y = X.times(2).plus(1.3);

deviates = normdist.random(X.getN());

Y = Y.plus(deviates);

////////////////////////////////////////////////////////////////
// Next, paste the data into the current spreadsheet.
////////////////////////////////////////////////////////////////

ParentPanel.pasteDblMatrixAt(X,0,0);
ParentPanel.pasteDblMatrixAt(Y,0,1);

////////////////////////////////////////////////////////////////
// Then, assign some bookmarks to the data ranges just created.
////////////////////////////////////////////////////////////////

ParentPanel.addBookmark("Xdata","Sheet1!A1:A10",true);
ParentPanel.addBookmark("Ydata","Sheet1!B1:B10",true);


////////////////////////////////////////////////////////////////
// Lastly, run Myrtle's standard linear regression script on
// these data.
////////////////////////////////////////////////////////////////

String proc = "com.mockturtlesolutions.LinearRegression";
Script script = ParentPanel.loadArchivedProcedure(proc);
Binding bind = script.getBinding();
bind.setVariable("XDATADefault","#Xdata");
bind.setVariable("YDATADefault","#Ydata");

script.run();
