Suppose we want to select between two prediction models, M1 and M2. We have performed 10-fold cross-validation on each. The error rates obtained for M1 are 30.5, 32.2, 20.7, 20.6, 31.0, 41.0, 27.7, 26.0, 21.5, 26.0. The error rates for M2 are 22.4, 14.5, 22.4, 19.6, 20.7, 20.4, 22.1, 19.4, 16.2, 35.0.

Is one model significantly better than the other at a significance level of 1%?

1 Answer

Answer:

Since the calculated t value does not fall in the critical region, we fail to reject H0 and conclude that neither model is significantly better than the other at the 1% level of significance.

Step-by-step explanation:

When the observations from two samples are paired, either naturally or by design, we take the difference between the two observations in each pair. Here each pair consists of the error rates of M1 and M2 on the same cross-validation fold (assuming both models were evaluated on the same 10 folds). Treating the differences as a random sample from a normal population with mean μd = μ1 - μ2 and unknown standard deviation σd, we perform a one-sample t-test on them. This is called the paired difference t-test.
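The pairing step described above can be sketched in Python using only the standard library. The variable names (`m1`, `m2`, `diffs`) are illustrative, and the sketch assumes that the i-th value in each list comes from the same fold.

```python
from statistics import mean, stdev

# Error rates per fold, copied from the question (10-fold cross-validation).
m1 = [30.5, 32.2, 20.7, 20.6, 31.0, 41.0, 27.7, 26.0, 21.5, 26.0]
m2 = [22.4, 14.5, 22.4, 19.6, 20.7, 20.4, 22.1, 19.4, 16.2, 35.0]

# Pair the observations fold by fold and take d = M1 - M2.
# Assumption: entry i of each list comes from the same fold.
# Rounding to one decimal only removes floating-point noise.
diffs = [round(a - b, 1) for a, b in zip(m1, m2)]

d_bar = mean(diffs)  # sample mean of the differences
s_d = stdev(diffs)   # sample standard deviation (divides by n - 1)

print(diffs)
print(f"mean difference = {d_bar:.2f}, s_d = {s_d:.3f}")
```

`statistics.stdev` uses the n - 1 denominator, which is the sample standard deviation the paired test needs.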

M1       M2       d = M1 - M2    d²

30.5     22.4       8.1           65.61
32.2     14.5      17.7          313.29
20.7     22.4      -1.7            2.89
20.6     19.6       1.0            1.00
31.0     20.7      10.3          106.09
41.0     20.4      20.6          424.36
27.7     22.1       5.6           31.36
26.0     19.4       6.6           43.56
21.5     16.2       5.3           28.09
26.0     35.0      -9.0           81.00

∑ 277.2  212.7     64.5         1097.25

Now

d̄ = ∑di/n = 64.5/10 = 6.45

sd² = ∑(di - d̄)²/(n-1) = [∑di² - (∑di)²/n]/(n-1)

= [1097.25 - (64.5)²/10]/9

= (1097.25 - 416.025)/9

= 681.225/9

= 75.692

sd = 8.700

We state our null and alternative hypotheses as

H0: μd = 0 and Ha: μd ≠ 0

The significance level is set at α = 0.01.

The test statistic under H0 is

t = d̄ / (sd/√n)

which has a t distribution with n-1 = 9 degrees of freedom.

Because the alternative is two-sided, the critical region is |t| ≥ t(0.005, 9) = 3.250.

Calculating t:

t = 6.45 / (8.700/√10)

t = 6.45 / (8.700/3.1623)

t = 6.45 / 2.7512

t = 2.344

Since the calculated t value does not fall in the critical region, we fail to reject H0 and conclude that neither model is significantly better than the other at the 1% level of significance.
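The test can also be checked numerically. Below is a minimal Python sketch using only the standard library. `paired_t` is a hypothetical helper name, and the critical value 3.250 is the table value quoted in this answer rather than something the code computes.

```python
import math
from statistics import mean, stdev


def paired_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a paired difference t-test."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    # t = mean(d) / (s_d / sqrt(n)), with n - 1 degrees of freedom
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Error rates per fold, copied from the question.
m1 = [30.5, 32.2, 20.7, 20.6, 31.0, 41.0, 27.7, 26.0, 21.5, 26.0]
m2 = [22.4, 14.5, 22.4, 19.6, 20.7, 20.4, 22.1, 19.4, 16.2, 35.0]

# Two-sided test, alpha = 0.01, 9 degrees of freedom (table value from the answer).
T_CRITICAL = 3.250

t, df = paired_t(m1, m2)
reject_h0 = abs(t) >= T_CRITICAL
print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these data |t| stays below the critical value, so `reject_h0` is `False`, in line with the conclusion above. If SciPy is available, `scipy.stats.ttest_rel(m1, m2)` should give the same statistic together with a two-sided p-value.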

Answered by user Calamity (5.3k points)