Overview of statistical estimation in Riemann



The call to a parameter estimation proc can, in meta-language form, be expressed as
   q = STATEST(dataset,dep,.....);
The inputs differ somewhat between these procs, but the output is always a packed vector. In addition, a dataset ds$resid is written to disk, containing the data used in the estimation, the fitted values and the residuals (other datasets are sometimes written as well, but the user should not need to know about them).

As far as possible, the inputs to the various STATEST procs have been standardized. Auxiliary control of the procs, over and above the input arguments, is exercised through options and global variables.

The output pair (q,ds$resid) of STATEST can be passed to other procs that process the result further. Note that any such follow-up call must be made before ds$resid has been overwritten!

STATEST procs

Go to the appropriate proc for a more detailed description of how it works and for examples of how other procs can be used to process the output.

  • Confidence intervals can be obtained by
    call ConfInterval(q,d,names);
    
    These can be based on a linear approximation, on the likelihood surface or on bootstrapping. The input d is in general a linear contrast of the regression parameters in the model, but we can do better: in many cases we can construct likelihood-based, or bootstrapped, confidence limits for an arbitrary function of the parameters.
  • Tests consisting of more than one linear condition can be carried out by
    call LinearTest(q,d,name);
    
    Again, this can be done in different ways.
  • We can compare observed and predicted values, and investigate the distribution of the residuals, with the call
    ResidualPlots(q);
    
  • We can identify which observations have the most influence on the estimates with the call
    q = Diagnostics(q,d);
    
    where d is most often 0, but can be a contrast matrix or, in some cases, a pointer to a function.
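
For the confidence intervals in the first item of this list, the linear-approximation case can be sketched as follows. This is a minimal illustration in Python rather than GAUSS, and it does not use Riemann: the function name contrast_interval, the argument cov (the covariance matrix of the estimates) and all the numbers are assumptions made for the example, and the normal quantile stands in for whatever reference distribution ConfInterval actually uses.

```python
from math import sqrt
from statistics import NormalDist


def contrast_interval(beta, cov, d, level=0.95):
    """Wald-type interval for the contrast sum(d[i] * beta[i]).

    beta : list of parameter estimates (hypothetical input)
    cov  : covariance matrix of beta as a list of rows (hypothetical input)
    d    : contrast coefficients, same length as beta
    """
    k = len(beta)
    if len(d) != k or len(cov) != k or any(len(row) != k for row in cov):
        raise ValueError("beta, cov and d must have matching dimensions")

    # Point estimate of the contrast: d' * beta
    estimate = sum(di * bi for di, bi in zip(d, beta))

    # Variance of the contrast under the linear approximation: d' * cov * d
    variance = sum(d[i] * cov[i][j] * d[j] for i in range(k) for j in range(k))
    std_err = sqrt(variance)

    # Two-sided standard normal quantile for the requested level
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_err, estimate, estimate + z * std_err


# Made-up numbers: difference between the second and third parameter
beta = [1.0, 2.5, 1.5]
cov = [[0.04, 0.00, 0.00],
       [0.00, 0.09, 0.02],
       [0.00, 0.02, 0.16]]
lower, estimate, upper = contrast_interval(beta, cov, d=[0.0, 1.0, -1.0])
print(lower, estimate, upper)
```

Here d picks out the difference between two parameters. Likelihood-based and bootstrapped limits need the model and the data themselves, so they are not covered by this sketch.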

The following notations have universal meaning:

dataset: Either a data matrix in memory or a string naming a GAUSS dataset on disk.
dep: Either column numbers or names of the variables in dataset that contain the dependent variables.
indep: Either column numbers or names of the variables in dataset that contain the independent variables. In regression models, categorical variables should in general get one parameter for each value (such variables are called class variables). For such models, indep refers to the variables that should not be transformed in this way.
class: Either column numbers or names of the variables in dataset that contain class variables, i.e. variables that should be transformed into a number of indicator variables.
cfg: String defining the configuration of a regression model. With it we can model interactions between variables without actually having to construct these variables in the dataset.
Note that class and cfg are specific to linear models, which are described in more detail on a separate page.
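
The transformation of a class variable into indicator variables can be pictured with the short sketch below. It is written in Python rather than GAUSS and is not Riemann's own code: the function name indicator_columns and the example variable treatment are assumptions made for illustration, and the coding shown (one 0/1 column per distinct value) is only one plausible reading of "one parameter for each value".

```python
def indicator_columns(values):
    """Expand one class variable into 0/1 indicator columns.

    Returns (levels, rows): levels is the sorted list of distinct values,
    and rows[i][j] is 1 if values[i] equals levels[j], else 0.
    """
    levels = sorted(set(values))
    position = {level: j for j, level in enumerate(levels)}
    rows = []
    for value in values:
        row = [0] * len(levels)
        row[position[value]] = 1  # mark the column of this observation's level
        rows.append(row)
    return levels, rows


# Made-up class variable with three distinct values
treatment = [2, 1, 3, 1, 2]
levels, rows = indicator_columns(treatment)
for value, row in zip(treatment, rows):
    print(value, row)
```

A variable listed in indep would instead enter the model as a single column with its values unchanged.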

Comments and suggestions, please mail: Anders Källén
Last modified: 98-09-12 9:00