In this Advanced Topic post, I discuss how you can both create data and run a statistical analysis from within a single Myrtle script. Teachers and course instructors may wish to do this when preparing a class example for their lectures, or when generating problems and answer keys for quizzes or exams.
Note: Requires Myrtle Version >= 1.8.13
Our goal will be to create a synthetic data set containing two variables that are linearly related according to the equation Y = 1.3 + 2*X, but also contaminated by random measurement error. We want not only to generate the data, but also to run a statistical analysis of it.
Create a new blank procedure by clicking the new procedure button ("Create a new procedure.") on Myrtle's procedure toolbar; it looks like a blank sheet of paper.
Next, right-click on the new procedure (Untitled) you just created and select Rename... Rename the procedure to something more informative than Untitled, such as "AutoRegression", as shown in the above image.
Next, edit the procedure by double clicking on it as shown below.
Let's begin writing our script. You will need to copy and paste (e.g. Ctrl+c and Ctrl+v), or simply type directly into the script editor, the code lines shown below. First, we need to let the compiler know about the packages we will be using with a few import statements.
import com.mockturtlesolutions.snifflib.datatypes.DblMatrix;
import com.mockturtlesolutions.snifflib.stats.NormalDistribution;
Then, we create some linear data in order to mimic real data. We'll assume for now that our data set has N = 10 observations. The underlying linear relationship is Y = 1.3 + 2*X. To add some realism to this "real" data, we also perturb the Y-values with deviates from a normal distribution. We use the DblMatrix class methods plus and times.
normdist = new NormalDistribution();
X = DblMatrix.span(0,10,10);
Y = X.times(2).plus(1.3);
deviates = normdist.random(X.getN());
Y = Y.plus(deviates);
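For readers who want to see the arithmetic behind this step without Myrtle, here is a minimal sketch in plain Java. It does not use snifflib. The class name SyntheticLine and its methods are hypothetical, the assumption that span(0,10,10) yields 10 evenly spaced values from 0 to 10 inclusive is mine, and so is the assumption that the deviates are standard normal (mean 0, standard deviation 1).

```java
import java.util.Random;

// Plain-Java sketch of the data-generation step; it does not use snifflib.
class SyntheticLine {

    // Returns n evenly spaced values from lo to hi inclusive.
    // (Assumed to mirror what DblMatrix.span(lo, hi, n) produces.)
    static double[] span(double lo, double hi, int n) {
        double[] x = new double[n];
        double step = (n > 1) ? (hi - lo) / (n - 1) : 0.0;
        for (int i = 0; i < n; i++) {
            x[i] = lo + i * step;
        }
        return x;
    }

    // Computes y[i] = intercept + slope * x[i] + sigma * z,
    // where z is a standard normal deviate drawn from rng.
    static double[] noisyLine(double[] x, double intercept, double slope,
                              double sigma, Random rng) {
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = intercept + slope * x[i] + sigma * rng.nextGaussian();
        }
        return y;
    }

    public static void main(String[] args) {
        Random rng = new Random(42L); // fixed seed so reruns give the same data
        double[] x = span(0.0, 10.0, 10);
        double[] y = noisyLine(x, 1.3, 2.0, 1.0, rng);
        for (int i = 0; i < x.length; i++) {
            System.out.printf("%.4f\t%.4f%n", x[i], y[i]);
        }
    }
}
```

Fixing the seed is a convenience for answer keys: the same "random" data set can be regenerated later.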
Next, we will paste these "real" data into the current spreadsheet.
ParentPanel.pasteDblMatrixAt(X,0,0);
ParentPanel.pasteDblMatrixAt(Y,0,1);
Note that when this script actually runs, the first call to the Myrtle function pasteDblMatrixAt() pastes the X data into the first column, starting at the first row (Java indices start at 0). The second call pastes the Y data into the second column, again starting at the first row. Then, we assign some bookmarks to those spreadsheet data ranges.
ParentPanel.addBookmark("Xdata","Sheet1!A1:A10",true);
ParentPanel.addBookmark("Ydata","Sheet1!B1:B10",true);
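The bookmark ranges above are typed by hand, so they must agree with the zero-based positions used when pasting. The following plain-Java sketch shows that mapping. The class RangeNames and its methods are hypothetical helpers and not part of Myrtle, and the sketch assumes that the data occupy a single column and that the paste position is given as (row, column).

```java
// Hypothetical helper: maps a zero-based paste position to an A1-style range.
class RangeNames {

    // Converts a zero-based column index to spreadsheet letters:
    // 0 -> A, 25 -> Z, 26 -> AA, and so on.
    static String columnLetters(int col) {
        StringBuilder sb = new StringBuilder();
        for (int c = col; c >= 0; c = c / 26 - 1) {
            sb.insert(0, (char) ('A' + c % 26));
        }
        return sb.toString();
    }

    // Builds the range string for nRows values in one column whose first
    // cell sits at the zero-based position (row, col).
    static String columnRange(String sheet, int row, int col, int nRows) {
        String letters = columnLetters(col);
        return sheet + "!" + letters + (row + 1) + ":" + letters + (row + nRows);
    }

    public static void main(String[] args) {
        // X was pasted at (0, 0) and Y at (0, 1), each with 10 values.
        System.out.println(columnRange("Sheet1", 0, 0, 10));
        System.out.println(columnRange("Sheet1", 0, 1, 10));
    }
}
```

If you later change the sample size N, remember that the hand-typed ranges in the addBookmark calls need to change with it.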
Lastly, we load and run Myrtle's standard linear regression script on these data.
String proc = "com.mockturtlesolutions.LinearRegression";
Script script = ParentPanel.loadArchivedProcedure(proc);
Binding bind = script.getBinding();
bind.setVariable("XDATADefault","#Xdata");
bind.setVariable("YDATADefault","#Ydata");
script.run();
Be sure to save your edits to your AutoRegression script (Save or Ctrl+s). Your session should now look like the following:
Finally, click on the "Run & update selected procedures" button (it has a green arrow on it). Running the script will now produce a detailed regression analysis. Notice that the estimated slope and intercept are close, but not identical, to the "true" values in the underlying linear relationship.
Instructors may wish to experiment with different values for the sample size (N) and the magnitude of the random deviates to see how these affect the distance between the resulting parameter estimates and the true values.
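To get a feel for that experiment outside Myrtle, here is a minimal plain-Java sketch. It is not Myrtle's LinearRegression procedure: the class LineFitDemo and its methods are hypothetical, and the fit uses the textbook ordinary least-squares formulas for a straight line (slope = Sxy / Sxx, intercept = mean(Y) - slope * mean(X)). The parameter sigma, which scales the standard normal deviates, is also my own addition.

```java
import java.util.Random;

// Sketch: how sample size and noise magnitude affect the fitted line.
class LineFitDemo {

    // Ordinary least-squares fit of y = a + b*x; returns {a, b}.
    // Requires at least two distinct x values.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxx += dx * dx;
            sxy += dx * (y[i] - meanY);
        }
        double slope = sxy / sxx;
        return new double[] { meanY - slope * meanX, slope };
    }

    // Generates n points on Y = 1.3 + 2*X over [0, 10] with normal noise
    // of standard deviation sigma, then fits a line. Requires n >= 2.
    static double[] simulateAndFit(int n, double sigma, long seed) {
        Random rng = new Random(seed);
        double[] x = new double[n];
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = 10.0 * i / (n - 1);
            y[i] = 1.3 + 2.0 * x[i] + sigma * rng.nextGaussian();
        }
        return fit(x, y);
    }

    public static void main(String[] args) {
        int[] sizes = { 10, 100, 1000 };
        double[] sigmas = { 0.5, 2.0 };
        for (int n : sizes) {
            for (double sigma : sigmas) {
                double[] ab = simulateAndFit(n, sigma, 7L);
                System.out.printf("N=%d sigma=%.1f intercept=%.3f slope=%.3f%n",
                        n, sigma, ab[0], ab[1]);
            }
        }
    }
}
```

In general, larger N and smaller sigma should pull the estimates closer to 1.3 and 2, although any single simulated data set can deviate from that trend.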
That's it! Before you leave, however, you should consider archiving your AutoRegression script. Why? If you think you might ever want to tweak this script or use it again in the future (e.g. for generating exam or quiz problems), you should archive it. To do this, right-click on the script's icon and select the Archive... option. Edit the fields as you see fit and then click the upload button (cloud icon) as shown below.
For your convenience, the complete AutoRegression script listing is reproduced below.
import com.mockturtlesolutions.snifflib.datatypes.DblMatrix;
import com.mockturtlesolutions.snifflib.stats.NormalDistribution;
////////////////////////////////////////////////////////////////
// First, we create some synthetic linear data...
////////////////////////////////////////////////////////////////
normdist = new NormalDistribution();
X = DblMatrix.span(0,10,10);
Y = X.times(2).plus(1.3);
deviates = normdist.random(X.getN());
Y = Y.plus(deviates);
////////////////////////////////////////////////////////////////
// Next, paste the data into the current spreadsheet.
////////////////////////////////////////////////////////////////
ParentPanel.pasteDblMatrixAt(X,0,0);
ParentPanel.pasteDblMatrixAt(Y,0,1);
////////////////////////////////////////////////////////////////
// Then, assign some bookmarks to the data ranges just created.
////////////////////////////////////////////////////////////////
ParentPanel.addBookmark("Xdata","Sheet1!A1:A10",true);
ParentPanel.addBookmark("Ydata","Sheet1!B1:B10",true);
////////////////////////////////////////////////////////////////
// Lastly, run Myrtle's standard linear regression script on
// these data.
////////////////////////////////////////////////////////////////
String proc = "com.mockturtlesolutions.LinearRegression";
Script script = ParentPanel.loadArchivedProcedure(proc);
Binding bind = script.getBinding();
bind.setVariable("XDATADefault","#Xdata");
bind.setVariable("YDATADefault","#Ydata");
script.run();