|
NonParametric Hypothesis Testing and Estimation ![]()
NonParametricTestThe syntax of this proc isq = NonParametricTest(data,dep,indep,test);The input data here is a NxP matrix, dep are
the dependent variables whereas indep is, when appropriate,
a group variable. The final entry, test is a string defining
which test to do. The output q is a packed vector containing
a variety of information. Here are a few, related, examples:
AddOption("Normal Scores");
prior to the call. This test is usually called the normal score test, or
van der Waerden test. We can alternatively call it by either of
q = NonParametricTest(data,dep,indep,"Normal scores test"); q = NonParametricTest(data,dep,indep,"Van der Waerden's test");Another alternative is to use so-called Savage scores, which is done similarily by
AddOption("Savage scores");
or, since this test usually is called the logrank test, by
q = NonParametricTest(data,dep,indep,"logrank test");There are however even more bells and whistles for this test. Normal scores can be computed by three different methods: Bloms, Tukeys, van der Waerdens which are called by the respective keywords Blom, Tukey, vW.
Bloms method is the default. Tied ranks are by
default given the mean rank. We can override this by assigning the low/high rank
instead by giving the appropriate word among the options. We can also rank from
highest to lowest instead of the converse, by giving the option descend.
Other variants of the Wilcoxon test can be called by their names: Mood,
Klotz and Conover. A many-sample test that is not of
Wilcoxon type is the Jonckhere-Terpstas trend test which can be used for dose-response
studies.
All many-sampled tests can be analysed stratified on other variables. These are
then specified by the global For dose-response studies within subjects we have Page's trend test, which is called by a command like q = NonParametricTest(data,dep,0,"Page's test");where dep is a sequence of variables ordered by, say, increasing
or decreasing doses. A much weeker test, for unordered variables, is Friedmann's test.
A stronger(?) test for this situation is the aligned rank test.
Another trend test is Theils test, which is essentially a non-parametric linear
regression:
q = NonParametricTest(data,dep,indep,"Theil's test");where indep is the independent variable and dep the
dependent one. The output (both printed and q) contains not only
the test P-value, but also a non-parametric estimate of the slope.
One final note. By default P-values are computed by large-sample approximations. You can override this by
AddOption("Exact");
However, the underlying method of obtaining exact P-values is combinatorial, and
actually goes through all available combinations. So this should only be done
on small samples, unless your computer is really powerful.
HodgeLehmannEstimate and WilcoxonProbOften we might append a nonparametric test with some quantative measure of effect. In Riemann we have two such available as of today:
out = HodgeLehmannEstimate(data,dep,indep);with the same input as for NonParametricTest. The output contains
the estimate and its confidence limits. The syntax for the latter is
out = WilcoxonProb(data,dep,indep);with similar input/output. In both cases pairwise comparisons are made between all groups defined in indep. Hodge-Lehmann estimate can also be
obtained when this is 0, in which case an estimate of a symmetri point of the
distribution is made.
SurvivalTestThe syntax of this proc is
{z,cov,table} = SURVIVALTEST(dataset,dep,indep,rho);
Here dataset can be either a data matrix in memory or a dataset on disk.
Variables are defined either by column numbers or variable names. The independent
variable should define a group belonging (negative number for character data). The
dependent variables in dep should be either 1,2 or 3:
rho, defines what type of test is done: If this is
a numeric number this defines a member of the Fleming-Harringtons family of tests.
The case rho = 0 corresponds to the logrank test, whereas the case
rho = 1 gives an approximation of the Peto-Peto modification of
the Wilcoxon test. If we take rho = "Wilcoxon" the Gehan-Wilcoxon
test is obtained.
We obtain stratified tests by using the global variable The output contains information from which other tests can be obtained:
{z,cov,table} = SurvivalTest(leukemia,2|3,-4,0);
produces the output
group N Obs. Exp. (O-E)^2/E ------------------------------------------- M 11 7 10.69 1.273 NonM 12 11 7.31 1.862 Chisq = 3.40 on 1 degrees of freedom, P = 0.06534for a logrank test. If we sant to construct a test for a particular linear combination a
of the deviances, we can form the appropriate test statistics from z*a
and a'z*a.
SurvivalFunction and SurvivalPercentilesThe command
{out, means} = SurvivalFunction(data,dep,keys);
estimates the survival function from data defined by the columns dep
in a data matrix data. Like for SurvivalTest
we can use 1,2 or 3 rows for dep. If keys is nonzero,
individual survival functions are estimated for each key combination.
The output
There are a number of options for Here is a simple example:
{out,means} = SurvivalFunction(leukemia,2|3,-4);
in which case column 4 of leukemia is a character vector. Part
of the output is as follows:
group time at risk events S_KM se 95% conf limits M 0 11 0 1.000 . . . M 9 11 1 0.909 0.2944 0.5390 0.9867 M 13 10 1 0.818 0.3410 0.4729 0.9512 M 18 8 1 0.716 0.3737 0.3645 0.8990 M 23 7 1 0.614 0.3907 0.2854 0.8353 M 31 5 1 0.491 0.4052 0.1802 0.7534 M 34 4 1 0.368 0.4033 0.1132 0.6570 M 48 2 1 0.184 0.3918 0.0288 0.5250 NonM 0 12 0 1.000 . . . NonM 5 12 2 0.833 0.3280 0.5235 0.9555 NonM 8 10 2 0.667 0.3689 0.3753 0.8597 NonM 12 8 1 0.583 0.3773 0.2906 0.8009 NonM 23 6 1 0.486 0.3849 0.2024 0.7297 NonM 27 5 1 0.389 0.3834 0.1421 0.6498 NonM 30 4 1 0.292 0.3724 0.0901 0.5609 NonM 33 3 1 0.194 0.3491 0.0476 0.4614 NonM 43 2 1 0.097 0.3031 0.0166 0.3489 NonM 45 1 1 0.000 . . .except that the heading is not included in the output vector. Two natural things to do with a survival function is to plot it and to compute percentiles from it. The latter is done by the command perc = SurvivalPercentiles(out,keys,p);where out is the output from SurvivalFunction. The
input keys give the key variables in out (these come
first, negative numbers for character data. Finally, p is a vector
of numbers between 0 and 1 defining percentiles.
Continuing on the previous example with the code
perc = SurvivalPercentiles(out,-1,0);
means = merge(means,-1,q[.,1 3:5],-1,0);
fmt = ("-*.*s"~8~8)|reshape("*.*lf"~10~2,2,3)|
reshape("*.*lf"~8~0,3,3);
"Group mean sem median 95% conf.limits";
"------------------------------------------------------------";
call printfm(means,0~ones(1,5),fmt);
we get the table
Group mean sem median 95% conf.limits ------------------------------------------------------------ M 52.65 19.83 31 13 . NonM 22.71 4.18 23 8 33To plot is facilitated by the proc PlotCurves, as illustrade
in the following piece of code which takes the output from above:
xy(PlotCurves(out,-1,2|9));(column 2 is time in out and column 9 is the survival function estimate).
As already mentioned there are a number of options available for how to estimate the survival function and its confidence limits.
CompeteRiskPIf we callSurvivalFunction in a competing risks situation, i.e.
when there are more than one type of event that can happen, we get the survival
distribution for one type, when all the other types are considered censored.
To compute the risk of a specific event, in an enviroment where other event
types are working we call
out = CompeteRiskP(data,dep,keys);which has the same input data as SurvivalFunction.
Comments and suggestions, please mail: Anders Källén |