STATS2107 Statistical Modelling and Inference II


Q1
This question may be typed or handwritten and scanned in as a PDF.
The purpose of this question is to show that the sample variance is an unbiased estimator
of the population variance when you have independent observations. The key point of this
question is understanding where the n − 1 comes from in the formula of the sample variance.
Let $X_1, X_2, \ldots, X_n$ denote a sample of i.i.d. random variables with finite mean $\mu$ and finite variance $\sigma^2$.
Consider the sample sum of squares given by:
$$S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2.$$
a. Show that
$$E[X_i X_j] = \begin{cases} \mu^2 & \text{if } i \neq j, \\ \mu^2 + \sigma^2 & \text{otherwise.} \end{cases}$$
[3 marks]
b. Hence show
$$E[X_i \bar{X}] = \mu^2 + \frac{\sigma^2}{n}.$$
[2 marks]
c. Show that
$$E[S_{XX}] = (n - 1)\sigma^2.$$
[4 marks]
d. Hence show that the sample variance $S^2$ is unbiased for $\sigma^2$.
[1 mark]
[Question total: 10]
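The identity in Part c can be sanity-checked numerically before attempting the proof. The sketch below is not a proof and not part of the assignment: it enumerates every outcome of $n$ i.i.d. draws from a small discrete distribution and computes $E[S_{XX}]$ exactly with rational arithmetic. The distribution, the value of `n`, and the helper names `expected_sxx` and `variance` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def expected_sxx(values, probs, n):
    """Exact E[S_XX] for n i.i.d. draws from a finite discrete distribution."""
    total = Fraction(0)
    # Enumerate every possible sample (as indices into `values`).
    for outcome in product(range(len(values)), repeat=n):
        prob = Fraction(1)
        for k in outcome:
            prob *= probs[k]          # independence: probabilities multiply
        xs = [values[k] for k in outcome]
        xbar = Fraction(sum(xs), n)   # sample mean of this outcome
        total += prob * sum((x - xbar) ** 2 for x in xs)
    return total


def variance(values, probs):
    """Population variance sigma^2 of the discrete distribution."""
    mu = sum(p * v for v, p in zip(values, probs))
    return sum(p * (v - mu) ** 2 for v, p in zip(values, probs))


# Illustrative (assumed) distribution: P(X=0)=1/2, P(X=1)=1/4, P(X=3)=1/4.
values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
n = 3

lhs = expected_sxx(values, probs, n)
rhs = (n - 1) * variance(values, probs)
print("E[S_XX] =", lhs, " (n - 1) * sigma^2 =", rhs)
```

Dividing `lhs` by `n - 1` rather than `n` is what makes the result match the population variance, which is the point of Part d.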
Q2
This question may be typed or handwritten and scanned in as a PDF.
The aim of this question is to give you experience with different estimators of parameters:
you are going to find an unbiased estimator for the parameter $p^2$. You will practise
finding the bias and MSE of different estimators, and use the moment generating function to
great effect.
Let
$$X \sim \mathrm{Bin}(n, p).$$
Consider the estimator
$$\hat{p}^2 = \left(\frac{X}{n}\right)^2.$$
(a) Find $E[\hat{p}^2]$, and hence state the bias $b_{\hat{p}^2}(p^2)$.
[3 marks]
(b) By considering the moment generating function for a binomial distribution,
$$M_X(t) = \left(1 - p + p e^{t}\right)^n,$$
show that:
• $E[X] = np$
• $E[X^2] = n(n-1)p^2 + np$
• $E[X^3] = n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np$
• $E[X^4] = n(n-1)(n-2)(n-3)p^4 + 6n(n-1)(n-2)p^3 + 7n(n-1)p^2 + np$
[8 marks]
(c) Hence, or otherwise, calculate $\mathrm{MSE}_{\hat{p}^2}(p^2)$.
[4 marks]
(d) Show that
$$E\left[\frac{\hat{p}(1 - \hat{p})}{n - 1}\right] = \frac{p(1 - p)}{n},$$
where $\hat{p} = X/n$.
[2 marks]
(e) Using Parts (a) and (d), find an unbiased estimator for $p^2$.
[1 mark]
(f) BONUS: Calculate the MSE for your estimator in Part (e).
[5 marks]
[Question total: 18]
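The moment formulas in Part (b) and the identity in Part (d) can be checked exactly for particular values of $n$ and $p$ by summing over the binomial pmf. The sketch below is a check only, not the MGF derivation the question asks for. The helper names `binom_moment` and `stated_moments` and the values $n = 7$, $p = 3/10$ are illustrative assumptions, and Part (d) is evaluated taking $\hat{p} = X/n$.

```python
from fractions import Fraction
from math import comb


def binom_moment(n, p, k):
    """Exact E[X^k] for X ~ Bin(n, p), summed directly over the pmf."""
    return sum(
        Fraction(x) ** k * comb(n, x) * p ** x * (1 - p) ** (n - x)
        for x in range(n + 1)
    )


def stated_moments(n, p):
    """Right-hand sides of the four moment formulas listed in Part (b)."""
    f2 = n * (n - 1)
    f3 = f2 * (n - 2)
    f4 = f3 * (n - 3)
    return [
        n * p,
        f2 * p**2 + n * p,
        f3 * p**3 + 3 * f2 * p**2 + n * p,
        f4 * p**4 + 6 * f3 * p**3 + 7 * f2 * p**2 + n * p,
    ]


n, p = 7, Fraction(3, 10)  # illustrative values only
exact = [binom_moment(n, p, k) for k in range(1, 5)]
print("Part (b) formulas agree:", exact == stated_moments(n, p))

# Part (d): E[p_hat (1 - p_hat) / (n - 1)] with p_hat = X / n,
# written in terms of the first two moments of X.
lhs_d = (exact[0] / n - exact[1] / n**2) / (n - 1)
print("Part (d) identity agrees:", lhs_d == p * (1 - p) / n)
```

Using `Fraction` keeps every comparison exact, so a mismatch would point to an algebra slip and not to floating-point rounding.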
Q3
THIS QUESTION IS FOR POSTGRADUATE STUDENTS ONLY. This question may be
typed or handwritten and scanned in as a PDF.
The purpose of this question is to familiarise you with different types of estimators
by investigating their properties.
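Let $Y_1, Y_2, Y_3$ be independent $\mathrm{Exp}\left(\frac{1}{\theta}\right)$ random variables with density
$$f(y) = \frac{1}{\theta} e^{-y/\theta} \quad \text{for } y > 0.$$
Consider the following estimators of $\theta$:
$$\hat{\theta}_1 = Y_1, \qquad \hat{\theta}_2 = \frac{Y_1 + 2Y_2 + 3Y_3}{6}, \qquad \hat{\theta}_3 = \bar{Y}, \qquad \hat{\theta}_4 = \min(Y_1, Y_2, Y_3).$$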
(a) Find the bias of each of these estimators.
[5 marks]
(b) Find the variance of each of these estimators.
[2 marks]
(c) Which estimator has the smallest MSE?
[3 marks]
[Question total: 10]
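Analytical answers to Parts (a) to (c) can be cross-checked by simulation. The following is a rough Monte Carlo sketch, assuming the density above (so each $Y_i$ has rate $1/\theta$). The function name `simulate`, the value of `theta`, the number of replications and the seed are all illustrative choices, and the estimates carry sampling error.

```python
import random


def simulate(theta, reps=100_000, seed=0):
    """Monte Carlo estimates of bias, variance and MSE for the four estimators."""
    rng = random.Random(seed)
    estimators = {
        "theta1": lambda y: y[0],
        "theta2": lambda y: (y[0] + 2 * y[1] + 3 * y[2]) / 6,
        "theta3": lambda y: sum(y) / 3,
        "theta4": lambda y: min(y),
    }
    draws = {name: [] for name in estimators}
    for _ in range(reps):
        # expovariate takes the rate, which is 1/theta for this density.
        y = [rng.expovariate(1 / theta) for _ in range(3)]
        for name, estimator in estimators.items():
            draws[name].append(estimator(y))

    summary = {}
    for name, vals in draws.items():
        mean = sum(vals) / reps
        var = sum((v - mean) ** 2 for v in vals) / (reps - 1)
        bias = mean - theta
        summary[name] = {"bias": bias, "variance": var, "mse": bias**2 + var}
    return summary


result = simulate(theta=2.0)  # theta = 2 is an arbitrary illustrative value
for name, stats in result.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Rerunning with a different `theta` or `seed` gives a sense of how much of any disagreement with the analytical answers is simulation noise.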
Q4
Please submit your answer to this question online using MyUni. You will be asked to upload
an R script file with the commands you used to complete the tasks in this question. Further
information is found on MyUni.
The purpose of this question is for you to practise the data cleaning skills that you learnt in
Practical 1. You are presented with a new dataset and asked to clean it in R using the methods
covered in Practical 1.
A survey was conducted in 2003 to study the TV viewing habits of Australians. The data are available on
MyUni in a CSV file called survey2003_dirty.csv . Descriptions of the variables recorded are
listed below:
• Participant ID: ID for participant survey
• favourite genre: Participant’s favourite TV show genre (Action, Comedy, or Thriller)
• sleep hour: Average hours of sleep per day
• TV hour: Average hours spent per day watching TV
• height: Height (in cm) of participant
• weight: Weight (in kg) of participant
• gender: Participant’s gender
For each of the following variables:
• favourite_genre
• sleep_hr
• TV_hr
• height
• weight
• gender
Perform the following:
1. Clean each of the variables using the methods described in Practical 1. This includes:
a. Ensure each variable is the right class (e.g. numeric).
b. Make sure NA values are correctly entered.
c. Identify any values that may be incorrectly entered.
d. Where possible, recode factors to the right values.
2. For each of the variables, produce an appropriate plot to look at the data. That is:
a. Look at histograms for numeric variables.
b. Look at bar charts for categorical variables.
3. For each quantitative variable, identify whether it is unimodal or bimodal, and whether it is symmetric,
left-skewed or right-skewed. For the categorical variables, identify the most common level. (Hint. Look
at the distributions without the incorrectly entered data.)
4. Generate five-number summaries for quantitative variables. For categorical variables produce a frequency
table. Identify any missing values.
5. Export the cleaned data into a CSV (comma separated values) file. (Hint. Type ?write_csv into the R
console.)
6. Answer the questions in Assignment 1 (practical) on MyUni.
7. Upload the R script that you used to complete Tasks 1 to 6 above.
For full marks you must include commented code explaining why and how you cleaned each variable.
Please note that the output CSV file must be named A1_aXXXXXXX.csv , where aXXXXXXX is your
Student ID number.
[20 marks]
