Here are questions for the discussion session tomorrow:

Exercise 4.1.4 on page 270

I just need some direction. My problem is that I don't know what probability distribution my test statistic has, so I can't compute critical values. The exercise recommends using a sufficient statistic for x1,...,xn that follow a Rayleigh density. I found that the sufficient statistic is the sum of the squared xi. The exercise then recommends using the central limit theorem to compute the critical values.

So one question I have is: why can we use a sufficient statistic of the likelihood function as a test statistic for theta? Would it not make more sense to use the maximum likelihood estimator of theta as a test statistic and apply the Wald test or something similar? What theorem am I applying if I instead use a sufficient statistic?

My second question is: how do I figure out what distribution the sum of the squared xi has? To apply the central limit theorem I need the sufficient statistic to be a sum of the xi, not of the squared xi, right? Because then I could use the mean and variance from the Rayleigh density to compute an approximate normal distribution. But what I have is the sum of the squared xi, so I don't understand.

HINTS: All your questions are natural here, but I cannot cover all of the issues in my hints (we can discuss them on Thursday). Here are some specific tips: if you define y_i = x_i^2, then you can apply the CLT to sum_{i=1}^n y_i. All you need is E Y_i and Var Y_i. These you can find using the moments of the Rayleigh, because E Y_i = E X_i^2 and Var Y_i = E X_i^4 - (E X_i^2)^2. Moreover, the moments of the Rayleigh can easily be found because the square of a Rayleigh is chi-square, and thus a special case of the gamma. One can go even further with this, since the sum of the squares of Rayleigh variables will be gamma (with what parameters?), so maybe one does not even need to use the CLT?
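The two routes in the hint (CLT approximation vs. the exact gamma distribution of the sum of squares) can be compared numerically. A minimal sketch, assuming the Rayleigh density f(x) = (x/s^2) exp(-x^2/(2 s^2)), so that Y = X^2 is Gamma(shape 1, scale 2 s^2); the values of sigma0, n, and alpha are illustrative choices, not taken from the exercise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parametrization: f(x) = (x/s^2) exp(-x^2/(2 s^2)), so
# E X^2 = 2 s^2 and E X^4 = 8 s^4.  sigma0, n, alpha are illustrative.
sigma0, n, alpha = 1.0, 30, 0.05

EY = 2 * sigma0**2            # E Y_i = E X_i^2
VarY = 4 * sigma0**4          # Var Y_i = E X^4 - (E X^2)^2 = 8 s^4 - 4 s^4

# CLT route: T = sum of squared x_i is approximately N(n*EY, n*VarY) under H0
c_clt = n * EY + 1.645 * np.sqrt(n * VarY)    # z_{0.95} ~ 1.645

# Exact route, checked by Monte Carlo: T ~ Gamma(shape=n, scale=2*sigma0^2)
X = rng.rayleigh(scale=sigma0, size=(100_000, n))
T = (X**2).sum(axis=1)
c_mc = np.quantile(T, 1 - alpha)

print(round(c_clt, 2), round(c_mc, 2))   # the two critical values are close
```

The exact gamma critical value comes out slightly larger than the CLT one at this n, since the gamma is right-skewed; for larger n the two agree more and more closely.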
Using sufficient statistics can sometimes be simpler than using the MLE if the latter does not have an explicit form (which is probably the case here). Why we choose one test over another may depend on the alternative and on the power of the test.

Exercise 4.2.2 b)

I managed to define the LRT but then got stuck with the math; I could not end up with what was asked for.

HINTS: theta/(1-theta) is an increasing function of theta, and a^X for a > 1 is an increasing function of X. The Neyman-Pearson lemma guarantees that the test L(X, theta_0, theta_1) > K is MP, where K = K(alpha) is given by the equation

P_{theta_0}(L(X, theta_0, theta_1) > K) = alpha.

Since L(X, theta_0, theta_1) is an increasing function of T(X) = 2N1 + N2, the condition L(X, theta_0, theta_1) > K for some K > 0 is equivalent to T > c for some c > 0. The explicit relation between K and c is not that important, because it is easier to find c directly from

P_{theta_0}(2N1 + N2 > c) = alpha.

We can even use Theorem 4.3.1 and argue that the test is UMP for H: theta < theta_0 vs. K: theta > theta_0.

4.2.3 c) Same issue as above.

HINT: Express the Neyman-Pearson test statistic as an increasing function of a linear combination a1*N1 + ... + ak*Nk, then solve the normal approximation problem for

P_{theta_0}(a1*N1 + ... + ak*Nk > c) = alpha,

where c is the unknown value (and a function of alpha).

4.5.12 I need some hint on how to start.

HINT: Show that the question is equivalent to saying that, out of all intervals of a given width, the largest probability of a t-random variable belonging to such an interval is achieved by an interval symmetric around zero. How general is this result?

4.4.14 d) I don't know how to approach it.

HINT: If (L1, U1) and (L2, U2) are two INDEPENDENT confidence intervals for two parameters theta1 and theta2 at levels 1-alpha1 and 1-alpha2, then (L1+L2, U1+U2) is a confidence interval for theta1+theta2 at level at least (1-alpha1)(1-alpha2), and for any alpha you can find alpha1 and alpha2 such that the latter equals 1-alpha.
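The normal-approximation step in the 4.2.3 hint can be sketched numerically for multinomial counts (N1,...,Nk) ~ Multinomial(n, p(theta_0)). The number of cells, the weights a_i, the null cell probabilities, and the sample size below are illustrative choices of mine, not taken from the exercise:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup (not the exercise's numbers): k = 3 cells,
# weights a = (2, 1, 0) so the statistic is T = 2*N1 + N2,
# and some null cell probabilities p0 summing to 1.
n = 200
a = np.array([2.0, 1.0, 0.0])
p0 = np.array([0.25, 0.5, 0.25])
alpha = 0.05

# For (N1,...,Nk) ~ Multinomial(n, p0), T = sum a_i N_i has
#   E T = n * sum a_i p_i,   Var T = n * (sum a_i^2 p_i - (sum a_i p_i)^2).
mu = n * (a @ p0)
var = n * ((a**2) @ p0 - (a @ p0) ** 2)
c = mu + 1.645 * np.sqrt(var)      # z_{0.95} ~ 1.645

# Monte Carlo check: the rejection probability under theta_0 should be ~ alpha
N = rng.multinomial(n, p0, size=100_000)
reject = (N @ a > c).mean()
print(round(c, 2), round(reject, 3))
```

The simulated rejection rate lands near alpha, up to the discreteness of T and Monte Carlo noise, which is the sense in which the normal approximation solves P_{theta_0}(T > c) = alpha.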
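The 4.4.14 hint about summing independent intervals can be illustrated by simulation. The setup below (z-intervals for two normal means with known unit variance, the sample sizes, and the quantile value) is my own illustrative assumption, not the exercise's model:

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch of the hint: if (L1,U1) and (L2,U2) are INDEPENDENT intervals for
# theta1 and theta2 at levels 1-alpha1 and 1-alpha2, then (L1+L2, U1+U2)
# covers theta1+theta2 with probability at least (1-alpha1)(1-alpha2).
theta1, theta2 = 1.0, -0.5
n1, n2 = 25, 40
z = 2.236          # approx. normal quantile so each interval has level 0.95**0.5

reps = 200_000
m1 = theta1 + rng.normal(size=reps) / np.sqrt(n1)   # sample means, sigma = 1
m2 = theta2 + rng.normal(size=reps) / np.sqrt(n2)
h1, h2 = z / np.sqrt(n1), z / np.sqrt(n2)           # half-widths

L = (m1 - h1) + (m2 - h2)
U = (m1 + h1) + (m2 + h2)
cover = ((L <= theta1 + theta2) & (theta1 + theta2 <= U)).mean()
print(round(cover, 4))   # at least (0.95**0.5)**2 = 0.95
```

Note that the simulated coverage is noticeably above 0.95, which matches the hint's "at least": the product bound is conservative, since the summed interval can cover theta1+theta2 even when one of the component intervals misses its own parameter.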
4.7.2 We are getting different results: some get chi-square(k+2t) and some get chi-square(k+t).

HINT: By Bayes' theorem, lambda | T=t has density proportional to the product of the likelihood of T=t given lambda and the pdf of lambda. The likelihood is proportional to e^(-n*lambda) lambda^t, while if we assume that lambda has a gamma distribution with scale beta and shape alpha, then its density is proportional to lambda^(alpha-1) e^(-lambda/beta). Thus the density of lambda | T=t is proportional to

lambda^(t+alpha-1) e^(-lambda(1/beta + n)),

which is again gamma. Then note that chi-square is a special case of gamma (with what parameters?).
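The chi-square(k+2t) vs. chi-square(k+t) disagreement can be settled numerically. A sketch under the assumption that the prior on lambda is chi-square(k), i.e. Gamma(shape k/2, scale 2): matching the hint's posterior, lambda | T=t is then Gamma(shape t + k/2, rate n + 1/2), so (2n+1)*lambda | T=t is Gamma(shape t + k/2, scale 2), which is chi-square(2t + k). The values of n, k, t below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumption: chi-square(k) prior, i.e. Gamma(shape=k/2, scale=2), and
# likelihood proportional to e^(-n*lambda) * lambda^t as in the hint.
# Posterior: lambda | T=t ~ Gamma(shape = t + k/2, rate = n + 1/2), hence
# (2n+1)*lambda | T=t ~ Gamma(shape = t + k/2, scale = 2) = chi-square(2t + k).
n, k, t = 10, 4, 7                  # illustrative values

post = rng.gamma(shape=t + k / 2, scale=1.0 / (n + 0.5), size=500_000)
scaled = (2 * n + 1) * post

chisq = rng.chisquare(df=2 * t + k, size=500_000)   # candidate: chi-square(k + 2t)

# Both means should be ~ 2t + k and both variances ~ 2*(2t + k)
print(round(scaled.mean(), 2), round(chisq.mean(), 2))
```

Under this chi-square(k) prior, the rescaled posterior matches chi-square(k + 2t), not chi-square(k + t); the factor 2n+1 in the rescaling is where the discrepancy between the two camps most likely arises.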