October 3


BTMA 636 (Fall 2022) HW #2: Building Decision Support Tools

Midterm Retake Policy (50 points)
1. Due to COVID-19, many instructors are experimenting with novel online examination procedures. A
professor has asked me for your help to design the online midterm retake policy for their class.
For the midterm, each student will have one initial attempt. The score of that attempt is that student’s
pre-retake score. After the midterm, students can retry the exam an unlimited number of times on D2L
(before a specified deadline), learning from their mistakes along the way.
The midterm retake policy takes the following form. Let S be your pre-retake score and R your highest
retake attempt score. If S is less than P, then your updated midterm grade will be P + B if R is at least
P + B; otherwise, it will be R. If S is at least P, then your updated midterm grade will be S + B if R is
at least S + B; otherwise, it will be R. In other words, there is a cap on how high a student’s
post-retake score can be, and that cap depends on the student’s initial score.
For example, suppose the midterm policy had P = 30 and B = 30. The policy would be: If your pre-retake
score is less than 30, then if your highest retake attempt score R is at least 60, then your updated midterm
grade will be 60. Otherwise, it will be R. If your pre-retake score S is at least 30, then if your highest midterm
retake attempt score R is at least S + 30, then your updated midterm grade will be S + 30. Otherwise, it will
be R. For example, if you got a pre-retake score of 45, then your highest possible post-retake midterm grade
would be 75. If you got a pre-retake score of 60, then your highest possible post-retake midterm grade would
be 90. If you do not get at least 90 on a retake attempt (for example, suppose your highest retake attempt
score was R = 85), then your post-retake midterm grade would be your highest retake attempt score (85).
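The rule above amounts to capping the highest retake score at max(S, P) + B. A minimal R sketch, assuming the policy is applied literally (the updated grade is R whenever R is below the cap); the function name `post_retake_score` is hypothetical:

```r
# Hypothetical helper: updated midterm grade under the retake policy.
# S = pre-retake score, R = highest retake score, P = threshold, B = boost.
post_retake_score <- function(S, R, P, B) {
  cap <- pmax(S, P) + B   # cap is P + B if S < P, otherwise S + B
  pmin(R, cap)            # grade is the cap if R reaches it, otherwise R
}

# Worked example from the text: P = 30, B = 30, S = 60, R = 85 gives 85
post_retake_score(S = 60, R = 85, P = 30, B = 30)
```

Because `pmax()` and `pmin()` are vectorized, the same function can be applied to a whole column of scores at once.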
To get credit for the questions below, make a function to answer these questions or show your
work as R comments in your homework file.
1) Suppose you were in that class and initially got a score of 21 on the midterm. Also suppose that the
midterm retake policy was set so that P = 30 and B = 30. Suppose that you attempted the midterm
over and over again until you scored R = 50. Then what would be your post-retake midterm score?
2) Suppose you initially got a score of 21 on the midterm. Further suppose that the midterm retake policy
was set so that P = 30 and B = 30. Suppose that you attempted the midterm over and over again
until you scored R = 60. Then what would be your post-retake midterm score?
3) Suppose you initially got a score of 21 on the midterm. Further suppose that the midterm retake policy
was set so that P = 30 and B = 30. Suppose that you attempted the midterm over and over again
until you scored R = 70. Then what would be your post-retake midterm score?
4) Suppose you initially got a score of 21 on the midterm. Further suppose that the midterm retake policy
was set so that P = 30 and B = 30. What would be your highest possible post-retake score?
5) Suppose you initially got a score of 54. Further suppose that the midterm retake policy was set so that
P = 22 and B = 30. Suppose that you attempted the midterm over and over again until you scored
R = 67. Then what would be your post-retake midterm score?
6) Suppose you initially got a score of 54. Further suppose that the midterm retake policy was set so that
P = 22 and B = 30. Suppose that you attempted the midterm over and over again until you scored
R = 78. Then what would be your post-retake midterm score?
7) Suppose you initially got a score of 54. Further suppose that the midterm retake policy was set so that
P = 22 and B = 30. Suppose that you attempted the midterm over and over again until you scored
R = 89. Then what would be your post-retake midterm score?
8) Suppose you initially got a score of 54. Further suppose that the midterm retake policy was set so that
P = 22 and B = 30. Then what would be your highest possible post-retake midterm score?
9) Suppose you initially got a score of 22 on the midterm. Further suppose that the midterm retake policy
was set so that P = 30 and B = 30. What would be your highest possible post-retake score?
10) Suppose you initially got a score of 86 on the midterm. Further suppose that the midterm retake policy
was set so that P = 30 and B = 30. What would be your highest possible post-retake score? Assume
that there are enough bonus problems on the midterm that your score can be over 100 (for example,
imagine that there are 50 points worth of bonus problems on the midterm).
Building the Midterm Retake Policy Function
Given a vector that consists of a column of pre-retake midterm scores where the class average score is less
than 75, build a function that tells the professor what pre-retake threshold P and boost amount B to use
so that the maximum post-retake average is between 70 and 75. What is meant by ‘maximum post-retake
average’ is that you should assume each student achieves their highest possible post-retake score.
Your function should satisfy the following:
1. P and B are positive multiples of 2.
2. If possible, choose P and B so the maximum post-retake average is between 70 and 75 (strict inequality)
and that P + B ≥ 60. If there are multiple P and B that satisfy this criterion, then choose the one whose
standard deviation of post-retake scores is closest to the standard deviation of the pre-retake scores.
3. (Optional) If no P and B with P + B ≥ 60 can make the maximum post-retake average lie between 70
and 75, then if possible, choose P and B so that the post-retake average is between 70 and 75 and that
P + B is as close to 60 as possible. If multiple P and B satisfy this, use the standard deviation condition
above.
4. (Optional) If there is no P and B that would make the maximum post-retake average lie between 70 and
75, then choose P and B to get a post-retake average above 75 but as close to 75 as possible.
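One possible way to approach requirement 2 is a brute-force search over even values of P and B. The sketch below is only an illustration under stated assumptions: the search range (even numbers up to a hypothetical `max_value` of 100), the function name `choose_policy`, and the choice to return NULL when requirement 2 cannot be met are not part of the assignment, and the optional fallbacks in requirements 3 and 4 are not implemented.

```r
# Hypothetical sketch: grid search for P and B under requirement 2 only.
choose_policy <- function(scores, max_value = 100) {
  grid <- expand.grid(P = seq(2, max_value, by = 2),
                      B = seq(2, max_value, by = 2))
  grid$avg <- NA_real_
  grid$sd_gap <- NA_real_
  for (k in seq_len(nrow(grid))) {
    # Highest possible post-retake score for each student
    best <- pmax(scores, grid$P[k]) + grid$B[k]
    grid$avg[k] <- mean(best)
    # Adding the constant B does not change the spread, so this depends on P only
    grid$sd_gap[k] <- abs(sd(best) - sd(scores))
  }
  ok <- grid[grid$avg > 70 & grid$avg < 75 & grid$P + grid$B >= 60, ]
  if (nrow(ok) == 0) return(NULL)   # fallbacks (requirements 3 and 4) not covered
  ok[which.min(ok$sd_gap), c("P", "B", "avg")]
}

# Simple test dataset suggested in the tips
choose_policy(c(10, 20, 30, 40, 50, 60, 70, 80))
```

If several pairs tie on the standard-deviation gap, `which.min()` returns the first one in grid order; a real solution may want a more deliberate tie-break.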
D2L Questions: What P and B does your function output when applied to the five RDA
files in the HW 2 D2L folder containing pre-retake midterm scores?
Tip #1: Think about the structure of the problem. Only once you understand the problem should
you begin to think about how you could design a step-by-step process to find P and B to satisfy the
problem.
Tip #2: After you have a potential solution approach, then begin to think about the code to implement
that approach. One way to see if your code works is to apply your functions on the datasets provided in the
HW folder for this assignment and use the D2L quiz as a way to check if you got it right/wrong. Another
way is to construct a simple dataset yourself, so that you can more easily track how pieces of your code are
working on that simple dataset. For example, you can construct a dataset with eight students with scores:
10, 20, 30, 40, 50, 60, 70, 80. If you take this second approach, then repeat this process again for other
datasets with simple numbers you construct to ‘test out your function(s).’ Once you are confident that your
code works on any dataset, then you can become more confident that your code should work for any number
of students and any set of pre-retake scores whose average is less than 70.
Note: Make sure you have built a user-defined function for this problem. I should be able to run
a user-defined function you built to answer the D2L questions for this problem. You will lose 25 points
for this problem if your answers to the D2L questions about what P and B to choose are not
outputs of a function you built.

Random Student Selector (10 points)
2. One of the challenges with moving courses online is that instructors are worried that students may tune
out more easily. An instructor at Haskayne wants to keep students on their toes by calling on students
throughout the class. However, the instructor does not want to always pick the same students (students that
come to office hours, students whose names are at the top of the class roster, etc.). Instead, the instructor
wants to be more fair about how the names are chosen by calling students at random.
In this problem, you will create the Random Student Selector function. In particular, given a number N and
a csv file containing the first and last names of students (as provided on D2L in the HW 2 folder), build
a function that prints or returns N randomly selected students’ full names. Your function should have a
number N and a class roster (as a csv file) as an input. The requirement of the function is to print or return
N randomly selected students’ full names. You can assume no two students in the class have the same first
and last name. Comment your code so that the TA and I can know how to use your function.
To test your function, you can apply your function on our class roster. If you have an error with the
read.csv() function, make sure that the csv file is in the current working directory. To check your current
working directory, use the getwd() function. You can change your working directory with the setwd() function
or using RStudio’s tools.
Tip #1: When reading the csv, using the read.csv() function, set the stringsAsFactors parameter to FALSE.
That way, you don’t have your columns be factors (levels of a categorical variable in regression) as default.
Tip #2: Make sure you have return() somewhere in the body of your function so that it’s clear to you (and
to the TA and me) what the output of your function was supposed to be. Using the return() function in your
user-defined function is not necessary (by default, R will assume the object that was last defined in your
function is your output), but it is helpful to be explicit about what the output of that function was supposed
to be. This tip applies for any user-defined function you make (especially when working on team projects).
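As an illustration of the shape such a function could take, here is a minimal sketch. It assumes the csv has the last name in the first column and the first name in the second (the D2L export order described later in this homework); the function name and the demo roster are made up.

```r
# Hypothetical sketch: return N randomly chosen full names from a roster csv.
# N           - number of students to pick
# roster_file - path to a csv with last names in column 1, first names in column 2
random_student_selector <- function(N, roster_file) {
  roster <- read.csv(roster_file, stringsAsFactors = FALSE)
  stopifnot(N >= 0, N <= nrow(roster))
  picked <- sample(nrow(roster), size = N)   # sampling without replacement
  full_names <- paste(roster[[2]][picked], roster[[1]][picked])
  return(full_names)
}

# Usage with a made-up roster written to a temporary file
demo_file <- tempfile(fileext = ".csv")
write.csv(data.frame(Last.Name = c("Lee", "Singh", "Okafor", "Chen"),
                     First.Name = c("Ana", "Raj", "Ife", "Mei")),
          demo_file, row.names = FALSE)
random_student_selector(2, demo_file)
```

Since `sample()` draws without replacement by default, the same row cannot be picked twice within a single call.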
D2L Questions: How many arguments does your Random Student Selector function have? Could an
instructor of BTMA 601 use your function to randomly select three BTMA 601 students? Could an instructor
of MGST 217 use your function to randomly select four MGST 217 students? If I asked you to randomly
select five students from this class, is there a chance that you would be selected? Assuming no two students
have the same name, could the output of your function ever print the same student twice when running your
function once?

Peer Review Assignment Function (20 points)
3. In a class at Haskayne, students will be evaluating other students’ projects. The professor of the class
is asking for your help to develop a tool to quickly assign students to mark other students’ projects. Each
student will evaluate three other students’ projects (students may not evaluate their own projects).
Given a data frame that contains two columns (assume that the first column is “Last Name” and the second
column is called “First Name,” as that is how the names are displayed when exporting the roster from D2L),
build a function that assigns each student to be a judge of three other students’ projects. Make sure your
function is such that each student has exactly three classmates evaluating his or her project (so that it’s not
the case that every student evaluates the same student and no other students get feedback). Your assignment
policy can be the outcome of a random selection process or it can be the outcome of a deterministic process
(meaning that every time you run the function, it gives the same output). You can design your output
however you want, but make sure to comment your code to make it clear how to use your function and
interpret the output. Make your function output either a list or a data frame. Figure out how to
structure that list or data frame so that the instructor can run your function and see at a glance what they want
to see. In other words, if your function just prints a few students’ names, then that’s not enough. Your
function should work on any class roster of any size with at least four students.
D2L Questions: Would your function work on any class roster with at least four students? How many
inputs does your function have? If I used your function, would it be possible that some student has two or
fewer peer evaluators? If I used your function, would it be possible that some student evaluates four or more
of their peers’ projects? If I used your function, would it be possible for a student to evaluate his or her own
project?
Tip #1: As always, think about the problem conceptually before jumping into the code (and perhaps even
draw a diagram, if you are a visual thinker). Write down your approach before jumping into writing the code
itself so that you have a high-level sense of what your function does. It is possible to write such a function
with only a few lines of code.
Tip #2: Be careful of calling objects within your function that were defined outside of your function. If you
are calling an object that was defined outside of your function (i.e., an object that is already in the Global
Environment), then this is very risky to do unless you are sure this is what you want to do. If someone tries
to run your function without that object already defined, then your function won’t work.
Tip #3: Students often lose points on this question because they did not guarantee that their function would
satisfy the requirements. For example, the output of their function might have that a student evaluates his
or her own work or that two of the reviewers might be the same person. As a Quality Assurance step, it
is recommended that you create a function to ensure quality (in addition to the function you are asked to
build). This function should guarantee that each student has three distinct peer evaluators and that
nobody is evaluating his or her own project. In essence, if you build this quality assurance function, then
you can “prove” that your function works and satisfies the specified requirements that it was supposed
to satisfy.
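One design that meets these requirements by construction is a circular shift: place the students in a random circle and let each student judge the next three students around it. The sketch below assumes unique full names and a roster with last names in column 1 and first names in column 2; the names `assign_reviews` and `check_reviews` and the output layout are hypothetical.

```r
# Hypothetical sketch: each student judges the next three students in a random circle.
assign_reviews <- function(roster) {
  n <- nrow(roster)
  stopifnot(n >= 4)
  full_name <- paste(roster[[2]], roster[[1]])
  ord <- sample(n)   # random circular order of the class
  # Name of the student k positions further around the circle
  shifted <- function(k) full_name[ord[((seq_len(n) - 1 + k) %% n) + 1]]
  data.frame(Judge = full_name[ord],
             Project1 = shifted(1), Project2 = shifted(2), Project3 = shifted(3),
             stringsAsFactors = FALSE)
}

# Quality assurance: TRUE only if nobody judges themselves, each judge has three
# distinct projects, and every student is judged exactly three times.
check_reviews <- function(assignment) {
  projects <- as.matrix(assignment[, c("Project1", "Project2", "Project3")])
  no_self <- all(projects != assignment$Judge)
  distinct <- all(apply(projects, 1, function(r) length(unique(r)) == 3))
  counts <- table(factor(as.vector(projects), levels = assignment$Judge))
  no_self && distinct && all(counts == 3)
}

demo_roster <- data.frame(Last = c("Lee", "Singh", "Okafor", "Chen", "Diaz"),
                          First = c("Ana", "Raj", "Ife", "Mei", "Luz"))
check_reviews(assign_reviews(demo_roster))
```

Each row of the output reads as "this judge evaluates these three projects", so the instructor can scan it row by row.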

Let’s Play a Game (20 points)
4. This game illustrates the basic idea of simulation models. One of the homework problems in HW 3
builds on top of the skills developed here to tackle a common problem in operations management. For each
sub-question below, comment your code so that the TA and I know where to look for work
for 4a, 4b, 4c, 4d, 4e, and 4f.
The game goes as follows. First, you choose a whole number between 0 and 1000 (inclusive). Then I use
R’s sample() function to randomly select a whole number between 0 and 1000 (inclusive). Then you pay
me the square of the difference between the numbers. For example, if you choose 2 and the random number
generator says 5, then you pay me $(3)^2 = $9. On the other hand, if you choose 80 and the random number
generator says 20, then you would pay me $(60)^2 = $3600. You always pay me unless you perfectly guess
my number.
4a) (2 points) To the nearest five thousand dollars (do not include the decimal or the cents, just give the
whole dollar value), how much would you expect to pay me if you chose 30?
4b) (2 points) To the nearest five thousand dollars (again, leave out the decimal and cents, only writing the
dollar value), how much would you expect to pay me if you chose 950?
4c) (2 points) To the nearest five thousand dollars, how much would you expect to pay me if you chose 450?
4d) (4 points) If you had to, what number would you choose to minimize your expected loss? Round to the
nearest multiple of 5.
Note: Guessing the answer is not sufficient. You need to show your work to justify that your choice
indeed incurs the lowest expected loss out of all your possible choices (for example, by using
which.min() or by creating a plot of how the expected loss changes in the decision variable).
4e) (2 points) What is your expected loss at this chosen number? Round to the nearest five thousand dollars,
with the same format as in the previous questions.
Hint/Note: If you want more context, you can think of this as a model of politics. You are about to make
a speech to the press and the general public. How do you position yourself, knowing that people on both
sides of the aisle may toss eggs at you or write scathing articles about you if you say the wrong thing?
Super Hint: Feel free to use the following code as a template.
N <- 10000 # Number of simulations. Take it to N <- 1000000 when ready.
random.draws <- sample(0:1000, size = N, replace = TRUE) # Random simulation draws
# If it helps, think of the entries of that vector
# as simulation draws from different universes.
# Across the multiverse, different numbers were drawn.
# You don't know which particular universe you happen to be in.
# You want to make a decision to minimize your average loss across all potential outcomes.
# For each possible choice, you want to figure out your expected loss for that choice.
choices <- 0:1000
expected.loss.vec <- numeric(0) # Defines empty numeric vector to store values in
for(i in seq_along(choices)){
  # Fill in the body of this for-loop to complete the problem.
}
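One way the template could be completed is sketched below. It is an illustration rather than the intended solution: the seed, the preallocated vector, and the exact cross-check against all 1001 equally likely draws are additions not found in the template.

```r
set.seed(636)   # arbitrary seed so the simulation is reproducible
N <- 10000
random.draws <- sample(0:1000, size = N, replace = TRUE)

choices <- 0:1000
expected.loss.vec <- numeric(length(choices))
for (i in seq_along(choices)) {
  # Average squared payment for this choice across all simulated draws
  expected.loss.vec[i] <- mean((choices[i] - random.draws)^2)
}
best.choice <- choices[which.min(expected.loss.vec)]

# Cross-check without simulation: average over every equally likely draw
exact.loss.vec <- sapply(choices, function(ch) mean((ch - 0:1000)^2))
exact.best <- choices[which.min(exact.loss.vec)]

plot(choices, expected.loss.vec, type = "l",
     xlab = "Choice", ylab = "Simulated expected loss")
```

The simulated minimizer moves slightly from run to run, which is one reason the questions ask for rounded answers; the exact curve gives a stable reference to compare against.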

==========================================================================

ECOM30003: Applied Microeconometric Modelling

1. [4 marks] Consider the following model relating a person’s disability income support payments to
their labour market earnings:
\[ y_{ist} = \alpha_0 + \alpha_1 \text{year}_{st} + \alpha_2 x_{ist} + \alpha_3 \text{earnings}_{it} + \alpha_4 \text{health}_i + \varepsilon_{ist} \qquad (1) \]
where $y_{ist}$ is the logarithm of real SSI or DI payments for person i in state s at time t, $\text{earnings}_{it}$ is
the logarithm of real earnings of person i at time t, $\text{health}_i$ is a measure of long term health of
person i, $x_{ist}$ is a vector of control variables, $\text{year}_{st}$ are state by year fixed effects, and $\varepsilon_{ist}$ is the
error term.
If you have panel data on all the variables in the model, what four assumptions must this model
meet in order for the OLS estimator of $\alpha_3$ to be an unbiased estimator?
2. [5 marks] Suppose that you have panel data available on all variables in this model except
individuals’ long term health. Given that health is unobserved in the data, write down the model
that can be estimated. Under what assumptions is the OLS estimator for $\alpha_3$ an unbiased estimator
of this parameter? Are all of these assumptions likely to be met? Why or why not?
3. [4 marks] Consider the following model
\[ \Delta y_{ist} = \beta_0 + \beta_1 \text{year}_{st} + \beta_2 x_{ist} + \beta_3 \Delta \text{earnings}_{it} + \Delta u_{ist} \qquad (2) \]
where $\Delta y_{ist}$ is the first difference in the logarithm of real SSI or DI payments for person i and
$\Delta \text{earnings}_{it}$ and $\Delta u_{ist}$ are similarly defined. On the basis of the information provided in questions 1
and 2, including on data availability, will the OLS estimator for $\beta_3$ be unbiased? Explain your
answer.
4. [5 marks] Now suppose that the earnings data we have is measured with error and as a
consequence the first difference in earnings is measured with error:
\[ \Delta e_{it} = \Delta \text{earnings}_{it} + v_{it}, \qquad \operatorname{cov}(\Delta \text{earnings}_{it}, v_{it}) = 0 \]
where $\Delta e_{it}$ is the mismeasured earnings variable we have in our data. Assume that both
$\Delta \text{earnings}_{it}$ and $\Delta e_{it}$ are uncorrelated with $\Delta u_{ist}$ and that $\Delta \text{earnings}_{it}$ is uncorrelated with $v_{it}$. Will
the OLS estimator for $\beta_3$ be unbiased? Explain your answer, including the expected direction of
any bias.
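To build intuition for this setup, a small simulation can show what classical measurement error does to a simple regression slope. Everything numeric below is hypothetical (the true coefficient of -0.5, unit variances, the sample size, the seed), and the sketch leaves out the controls and fixed effects in model (2).

```r
set.seed(30003)
n <- 100000
beta3 <- -0.5            # hypothetical true coefficient on the change in earnings
d_earn <- rnorm(n)       # true change in earnings
v <- rnorm(n)            # measurement error, drawn independently of d_earn
d_e <- d_earn + v        # mismeasured change in earnings, as in the question
d_u <- rnorm(n)          # error term, independent of everything else
d_y <- beta3 * d_earn + d_u

slope_true  <- coef(lm(d_y ~ d_earn))[["d_earn"]]   # regressor observed without error
slope_noisy <- coef(lm(d_y ~ d_e))[["d_e"]]         # regressor observed with error

# Textbook errors-in-variables factor for a single regressor
reliability <- var(d_earn) / (var(d_earn) + var(v))
c(true = slope_true, noisy = slope_noisy, predicted = beta3 * reliability)
```

Comparing `noisy` with `predicted` shows how closely the simulated slope follows the single-regressor formula in this stylised case.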
5. [5 marks] Why might the Instrumental Variables estimator be a consistent estimator for the
parameter of interest in this model? Be sure to lay out the full set of assumptions required for the
estimator to be consistent, including the assumptions about the measurement error.
It turns out that the data that the authors could access is at the level of county, not individual. So they have
a panel of county level data, where counties are geographic areas within states. As a consequence, let i
denote county (not individual).
\[ \Delta y_{ist} = \beta_0 + \beta_1 \text{year}_{st} + \beta_2 x_{ist} + \beta_3 \Delta \text{earnings}_{it} + \Delta u_{ist} \]
The data I am providing you is not exactly the same as the authors’ data and it is at the FIPS level, not county
level. (FIPS stands for Federal Information Processing Standard.) You will not be able to reproduce the
tables in the paper exactly. I have provided the sample statistics for the data set you will be
using at the end of the assignment sheet.
6. [15 marks] Using the data provided, replicate Table 2 (using the format shown in the paper) and
label your replication “Table 2”. What does Table 2 suggest about the relationship between growth
in SSI income and growth in earnings, and between growth in DI income and growth in earnings,
during coal boom times and during coal bust times?
Note that this question requires you to construct the variables reported in Table 2. The authors have
coded missing values resulting from differencing and lagging variables as zero. They have also coded
missing values for the fraction of the economy in manufacturing in 1969 as zero. This is not correct,
but we follow what the authors do in order to reproduce their results. Recall that because the data
you are using is slightly different from the authors’, you cannot exactly reproduce Table 2 from the
paper. The descriptive statistics table at the end of the assignment sheet provides the equivalent
information for sample averages for the data you are using.
7. [10 marks] Examine whether the IVs are relevant in the specification with the full set of controls and
in the specification that controls for state by year fixed effects only. Do this for both specifications
for which DDI is the outcome and for which SSI is the outcome. Present the key results in a Table,
call it Table 2A, to show your findings. The key results to report in this table are point estimates and
t-stats for the IVs (for each of the 4 specifications) along with the F stat examining relevance, and the
sample size. Comment on the signs and significance of individual IVs as well as their overall
relevance. What do you conclude?
8. [10 marks] Replicate Table 3 of the paper. Include it in your assignment appendix, labelled as Table
3. Note that there will be small differences in some estimates compared to those reported in the
paper because the data you are using is not exactly the same as the data used by the authors.
9. [10 marks] Interpret the coefficient estimates (including magnitude, sign and significance) from your
replication of Table 3. Comparing the OLS and IV estimates, do you find evidence of the expected
sign of bias?

10. [5 marks] What does this analysis tell you about why the number receiving income support due to
disability and cost of DI and SSI has increased in the US since the 1980s?

Descriptive Statistics (from the data provided)

                                              All        Large Coal   Moderate Coal   No Coal
                                              Counties   Counties     Counties        Counties

Coal Boom (1970-1977)
Log difference in SSI payments                0.06       0.06         0.07            0.06
Log difference in DI payments                 0.13       0.10         0.12            0.14
Log difference in county earnings             0.03       0.06         0.03            0.02
Log difference in population                  0.01       0.02         0.01            0.01
Difference in real price of coal              0.08       0.08         0.08            0.08
Log difference in real price of coal          0.09       0.09         0.09            0.09
Log difference in coal value instrument       0.25       0.72         0.55            0.05
Mean coal reserves                            458        2563         412             6.43
Fraction of economy in manufacturing (1969)   0.27       0.16         0.28            0.29
Fraction of counties with an MSA              0.26       0.19         0.30            0.26
Population                                    84.40      59.10        80.40           91.30
Number of FIPS                                2640       376          568             1696

Coal Bust (1983-1993)
Log difference in SSI payments                0.06       0.07         0.06            0.06
Log difference in DI payments                 0.03       0.03         0.03            0.04
Log difference in county earnings             0.02       -0.01        0.01            0.03
Log difference in population                  0.00       -0.01        0.00            0.01
Log difference in coal value instrument       -0.11      -0.32        -0.24           -0.02
Difference in real price of coal              -0.03      -0.03        -0.03           -0.03
Log difference in real price of coal          -0.04      -0.04        -0.04           -0.04
Population                                    85.70      58.50        78.30           94.20
Number of FIPS                                3630       517          781             2332

=======================================================================

Statistics for Business
Unit Number: 200032

A study was commissioned to investigate the characteristics of employees at a major business.
Data was collected on 60 employees and the following variables recorded.
Column 1 Gender What is your gender? (1 = Male, 0 = Female)
Column 2 Age How old are you?
Column 3 Salary What is your yearly salary (before Tax)?
Column 4 Overtime How much overtime do you work each week?
Column 5 Job Type What department do you work in? (1 = Purchasing,
2 = Marketing/Human Resource, 3 = Accounting)

QUESTION 1 (7 marks)
The company has been accused of gender inequality. Test, at the 5% level of significance, whether the
average salary of males is greater than the average salary of females.
[You may assume that the unknown population standard deviations for male and female salaries
are equal.]
QUESTION 2 (6 marks)
Test, at the 5% level of significance, whether the Job Type of an employee is related to the employee’s
gender.
QUESTION 3 (7 marks)
Can we conclude, at a 5% level of significance, that a linear relationship exists between the
employee’s Salary (y) and employee’s Age (x)?

==========================================================================

ISYS3435 Predictive and Prescriptive Analytics in Business

At Lumo Trucks’ monthly executive planning meeting, the company’s CEO expressed dissatisfaction
with the company’s financial performance during the past two financial quarters for one of the truck
models the company manufactures. The CEO said to you, the data analyst on the company’s analytics
team, “I know we are operating at capacity in some of our production
lines. But surely, we can do something to improve our financial position. Maybe we should change our
product mix? We don’t seem to be making a good profit on our Model X trucks. Why don’t we just
cease Model X altogether? Please consider different options and come up with a recommendation for
me of what you think based on your analysis.”
Lumo Trucks was established in the 1990s and is renowned for manufacturing high quality trucks. The
company manufactures two specialised models of trucks, Model X and Model Y, in a single plant in
Melbourne, Australia. Manufacturing operations are grouped into four departments, which are engine
assembly, metal stamping, Model X assembly and Model Y assembly. Due to recent changes in
Australia’s economy, a budget of $21 million for the total overhead costs of producing the trucks has
been set to ensure the company does not overspend. In addition, over the past year, Model X and Model
Y have been sold at $39,000 and $38,000 each, respectively.
As a data analyst, you need to determine how many Model X and Model Y trucks should be
produced for Lumo Trucks to maximise its profit, given its resource capacities. Please use the
information provided in Tables 1 and 2 below for your analysis. In addition, before you finalise your
production decisions, you would like to explore the following potential scenarios:
(a) The Sales Manager of Lumo Trucks believes it is possible to increase the sale price of Model Y by
20% due to its popularity and demand. If the sale price is increased, do you see any change
in the number of Model Y trucks manufactured given the current operating capacities? If there is a
change in the number of Model Y trucks manufactured, why do you think that is the case? If there is
no change in the number of Model Y trucks, what do you think are the main contributing reasons?
(b) The financial controller of the organisation predicts there will be growth in Australia’s
economy in the next year. Therefore, the financial controller has approved an increase in the
overhead budget from $21 million to $25 million. Find the number of trucks that can be produced,
the profit, the resources required, and whether there are any unused resources
based on this increased overhead budget.
As a forward thinker, and to address the CEO’s concerns, you have been given the responsibility
to investigate this problem. Produce a report that includes your recommendation(s) to the
CEO regarding this issue and your analysis of the potential scenarios.

Table 1. Machine-hours: Requirements and Availability

                    Machine-hours required per truck   Total machine-hours
Department          Model X        Model Y             available per month
Engine assembly     1.0            2.0                 4,000
Metal stamping      2.0            2.0                 5,000
Model X assembly    2.0            -                   4,000
Model Y assembly    -              3.0                 3,000

Table 2. Standard Product Costs per unit

                        Model X    Model Y
Direct materials        $22,000    $20,000
Direct labour
  - Engine assembly     $2,200     $2,400
  - Metal stamping      $800       $600
  - Final assembly      $3,000     $1,500
  Subtotal              $6,000     $4,500
Overhead
  - Engine assembly     $2,000     $1,500
  - Metal stamping      $3,000     $2,500
  - Final assembly      $5,000     $3,500
  Subtotal              $10,000    $7,500
Total                   $38,000    $32,000

Notes:
This case study is adapted based on the following resource:
Merton Truck Co. case study from Harvard Business Publishing Education
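One way the base case could be written as a linear program is sketched below, with $x$ and $y$ the numbers of Model X and Model Y trucks produced. Several modelling choices here are assumptions rather than facts from the case: unit profit is taken as selling price minus the total standard cost in Table 2 ($39,000 - 38,000$ for Model X and $38,000 - 32,000$ for Model Y), the per-truck overhead in Table 2 is counted against the \$21 million budget, and the budget is assumed to cover the same period as the machine-hours in Table 1.

```latex
\[
\begin{aligned}
\max_{x,\,y \ge 0}\quad & 1000\,x + 6000\,y \\
\text{s.t.}\quad
 & x + 2y \le 4000 && \text{(engine assembly hours)} \\
 & 2x + 2y \le 5000 && \text{(metal stamping hours)} \\
 & 2x \le 4000 && \text{(Model X assembly hours)} \\
 & 3y \le 3000 && \text{(Model Y assembly hours)} \\
 & 10000\,x + 7500\,y \le 21{,}000{,}000 && \text{(overhead budget)}
\end{aligned}
\]
```

Under the same assumptions, scenario (a) changes only the objective coefficient on $y$, and scenario (b) changes only the right-hand side of the overhead constraint.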

=========================================================================

MXB107 Assessment 2
30 October BY 11:59 PM.

Question 1
The geometric distribution describes a random variable $X$: the number of Bernoulli trials needed to obtain the first success, e.g., let $X$ be the number of coin tosses needed to obtain one head.

The probability mass function for the geometric distribution is \[ p(x) = (1-p)^{x-1}p, \quad x = 1, 2, \dots \] where \[ E(X)=\frac1p \]

Find the Method of Moments estimator of $p$. (2 points)
Type Your Answer Here:
Find the Maximum Likelihood estimator of $p$. (3 points)
Type Your Answer Here:
Question 2
$30$ students in a chemistry class each performed an experiment measuring the amount of copper (Cu) recovered from a solution of copper sulfate CuSO$_4$. The sample mean and standard deviation of the results from the $30$ experiments are $\bar{x}=0.145$ mol and $s=0.0051$ mol.

Find a $90\%$ confidence interval for the mean amount of copper (Cu) recovered from the experiment.
(3 points)

Type Your Answer Here:
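A t-based interval is one standard way to set this up when only the sample mean and standard deviation are given. The sketch below assumes the 30 measurements are roughly normal and independent; the variable names are illustrative.

```r
n <- 30          # number of experiments
xbar <- 0.145    # sample mean (mol)
s <- 0.0051      # sample standard deviation (mol)
conf_level <- 0.90

# Two-sided critical value from the t distribution with n - 1 degrees of freedom
t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)
ci <- xbar + c(-1, 1) * t_crit * s / sqrt(n)
ci
```

Changing `conf_level` gives the corresponding interval at other confidence levels.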
Question 3
A random sample of $130$ individuals recorded an average body temperature of $36.8^\circ$C with a standard deviation of $0.41^\circ$C. Traditional data indicate that “normal” human body temperature is $37^\circ$C.

Does this experimental data provide sufficient evidence to reject the null hypothesis that “normal” human body temperature is $37^\circ$C? Assume a Type I error rate of $\alpha=0.01$. (3 points)
Type Your Answer Here:
The $37^\circ$C standard was derived in 1868 by a German doctor who claimed it was based on a sample of 1 million temperatures recorded throughout their research. What conclusions can you draw about this research based on the results of your hypothesis test? (2 points)
Type Your Answer Here:
Question 4
Polling of the marginal state seats of Currumbin, Mansfield and Aspley by YouGov for the Australian Conservation Foundation shows a combined two-party preference, based on a survey of $600$ individuals, of $52\%$-$48\%$ for Labor versus the LNP, compared with an almost exact $50\%$-$50\%$ for these three seats in 2017.

Type Your Answer Here:
Is there evidence to reject the null hypothesis that the proportion of voters preferring Labor over the LNP is less than or equal to $50\%$? Assume a Type I error rate of $\alpha=0.05$. (4 points)

Type Your Answer Here:
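A one-sided z-test for a proportion is one common way to set up this question. The sketch assumes the 52% figure is the sample proportion among all 600 respondents and relies on the normal approximation; the variable names are illustrative.

```r
n <- 600        # survey size
p_hat <- 0.52   # sample proportion preferring Labor
p0 <- 0.50      # boundary value under the null hypothesis p <= 0.5
alpha <- 0.05

# Standard error is computed under the null value p0
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value <- pnorm(z, lower.tail = FALSE)   # upper-tail probability
reject <- p_value < alpha
c(z = z, p_value = p_value)
```

`reject` is TRUE only when the upper-tail p-value falls below `alpha`.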
Question 5
To compare the performance of two swimmers, their 100m freestyle times were recorded independently of one another at random 10 times during both practice and competition.

Compute a $95\%$ confidence interval for the difference in the average times for Swimmer 1 and Swimmer 2. (3 points)
Type Your Answer Here:
Perform a hypothesis test of the difference between the two swimmers’ average 100m freestyle times. Is there evidence that Swimmer 2 is, on average, faster than Swimmer 1? Assume a Type I error rate of $\alpha = 0.05$. (2 points)
Type Your Answer Here:
Question 6
To test the effects of alcohol on reaction times, seven individuals participated in an experiment where their reaction times were measured using the same means both before and after ingesting $90$ millilitres of $40\%$ alcohol.

Does mean reaction time increase after consuming alcohol? Use a Type I error rate of $\alpha=0.05$. (3 points)

Type Your Answer Here:
Question 7
The warpbreaks data set contains the breaks in yarn during weaving. We wish to examine factors that lead to breaks in the yarn.

Perform a single factor ANOVA considering the type of wool. Is there a significant difference in the number of breaks by type of wool? Use Tukey’s Honest Significant Differences to identify which pairs (if any) are different. (2 points)
Type Your Answer Here:
Repeat the ANOVA but now block on the tension of the loom (L,M,H). Does this change the results? (3 points)
Type Your Answer Here:
For both models, use Tukey’s HSD to identify which pairs of types of wool (if any) are statistically significantly different. (2 points)
Type Your Answer Here:
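The calls involved can be sketched with R's built-in `warpbreaks` data. This shows only the mechanics, assuming an additive block effect for tension; interpreting the output is left to the reader.

```r
data(warpbreaks)   # built-in dataset with columns breaks, wool, tension

# Single-factor ANOVA on wool type
fit_oneway <- aov(breaks ~ wool, data = warpbreaks)
summary(fit_oneway)
TukeyHSD(fit_oneway, "wool")

# Same comparison with loom tension entered as a block
fit_blocked <- aov(breaks ~ tension + wool, data = warpbreaks)
summary(fit_blocked)
TukeyHSD(fit_blocked, "wool")
```

Passing `"wool"` as the second argument restricts Tukey's HSD to the wool comparisons in both models.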
Question 8
The Loblolly dataset contains information on the growth of Loblolly pine trees.

Fit a one-way ANOVA model of the height as a function of Seed. Does the type of seed have a statistically significant effect on height? (2 points)
Type Your Answer Here:
Use Tukey’s HSD to identify which (if any) pairs of seeds differ significantly. (2 points)
Type Your Answer Here:
Now perform an ANCOVA for height as a function of Seed, but control for the tree’s age. Does this change the results from the one-way ANOVA? If so, how? Based on the residuals, is it reasonable to accept that the assumptions for linear regression are met? (5 points)
Type Your Answer Here:
Repeat the analysis using Tukey’s HSD in part b.). What are the results? (2 points)
Type Your Answer Here:
Question 9
Grades in an elementary statistics class were classified by the students’ majors. Is there any relationship between grade and major? (4 points)

===========================================================================

Biostatistics for Public Health (TM5516)
Sunday 23rd Oct 2022

1. Assessment descriptor
Suppose you conduct a study to examine the relationship between High-Density
Lipoprotein (HDL) and some biomarkers.
You collected the following measurements from 80 subjects (download the “BODY1.sav”
data): BMI (kg/m^2), AGE in years, GENDER (0 = female and 1 = male), PULSE is pulse rate
(beats per minute), SYSTOLIC is systolic blood pressure (mm Hg), DIASTOLIC is diastolic
blood pressure (mm Hg), High-Density Lipoprotein (HDL) is cholesterol (mg/dL), and Low-Density
Lipoprotein (LDL) is cholesterol (mg/dL).

2. Specific Tasks
This assignment will allow students to demonstrate the analytical skills they have
acquired throughout the subject.
The final report is due on Sunday, 23rd October 2022 (11.59 pm AEST).
Specifically, you should:
i. Use appropriate summary statistics to describe the following variables: GENDER,
BMI, DIASTOLIC, PULSE, LDL. Present your results in tabular format. 10 marks
ii. Calculate the correlation between all continuous variables. Interpret your results. 20
marks
iii. Group the age into three different AGE brackets “18-25”, “26-45” and “46 and above”.
Test the claim that subjects in those AGE brackets have the same mean LDL. 30
marks
iv. Test whether DIASTOLIC blood pressure and PULSE rate varied by GENDER. What
are the null and alternative hypotheses? 20 marks
v. Use GENDER, AGE, BMI, DIASTOLIC blood pressure, SYSTOLIC blood pressure,
and PULSE rate to predict LDL. Interpret the result and present the regression
equation. 20 marks
