AFE135 Business Data Analysis

Due date: 11pm on Fri 19th August

Q1. Malaria (10 Marks)
Malaria is a parasitic disease spread through the bite of an infected mosquito. The disease is a profound cause of human suffering and affects approximately 240 million individuals worldwide. It has been estimated that malaria has accounted for around 5% of all deaths in the 21st century. Tracking progress in controlling the disease is therefore an important part of monitoring human development.
The file malariabycountry.xls contains data on 200+ countries giving their malarial mortality rates per 100,000 people. You have data for two periods, 2011 and 2019. Your task is to analyse this dataset and write a brief report on the levels and trends of malarial mortality rates over the last decade or so. To do so, you should address the following points:
Produce some histograms for malarial death rates in 2011 and 2019, making sure that you use a consistent approach to binning across years. Comment on the location, shape and skewness of the distributions. Are there any outlier countries to the left or right? What are the implications of these outliers for public health?
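A minimal Python sketch of consistent binning, assuming a single fixed bin width shared by both years. The rates are made-up numbers, not values from malariabycountry.xls, and the helper names `common_edges` and `bin_counts` are hypothetical. Computing the edges once from the pooled data means each bar covers the same range of death rates in the 2011 and 2019 histograms.

```python
import math

def common_edges(*samples, width):
    """Bin edges of a fixed width that span every value in all samples."""
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    start = math.floor(lo / width) * width
    n_bins = math.floor((hi - start) / width) + 1
    return [start + i * width for i in range(n_bins + 1)]

def bin_counts(values, edges):
    """Count values in each half-open bin [edges[i], edges[i + 1])."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = int((v - edges[0]) // width)
        counts[min(i, len(counts) - 1)] += 1  # guard the top edge
    return counts

# Hypothetical death rates per 100,000 (not the real data).
rates_2011 = [0.0, 0.5, 2.0, 12.0, 35.0, 80.0, 140.0]
rates_2019 = [0.0, 0.2, 1.0, 8.0, 25.0, 60.0, 110.0]

edges = common_edges(rates_2011, rates_2019, width=20)  # same edges for both years
print(edges)
print("2011:", bin_counts(rates_2011, edges))
print("2019:", bin_counts(rates_2019, edges))
```

In Excel the same idea amounts to giving one shared bin range to the Histogram tool (or FREQUENCY) for both columns. Excel's bins count values up to and including each boundary, so a value sitting exactly on a boundary may land one bar lower than in this sketch.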
Calculate basic descriptive statistics (mean, variance, standard deviation, interquartile range) for the mortality variables for 2011 and 2019. What has happened to (i) the average value, and (ii) the inter-country spread, over time? Discuss. What do your results say about trends in global health? Discuss.
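The four statistics can be sketched with Python's `statistics` module. The data below are hypothetical, and `describe` is an illustrative helper name. It assumes sample (n − 1) variance, matching Excel's VAR.S and STDEV.S. It also assumes quartiles use the "inclusive" interpolation, which we believe corresponds to Excel's QUARTILE.INC. Other quartile conventions give slightly different IQRs.

```python
import statistics

def describe(values):
    """Mean, sample variance, sample standard deviation and interquartile range."""
    # "inclusive" linear interpolation between order statistics (assumed to
    # match Excel's QUARTILE.INC; other conventions shift the quartiles slightly).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # divides by n - 1
        "std_dev": statistics.stdev(values),
        "iqr": q3 - q1,
    }

# Hypothetical rates per 100,000; not taken from malariabycountry.xls.
rates_2011 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
rates_2019 = [0.5 * r for r in rates_2011]

for year, rates in (("2011", rates_2011), ("2019", rates_2019)):
    print(year, describe(rates))
```

Comparing the two dictionaries side by side shows the change in the average (mean) and in the inter-country spread (variance, standard deviation, IQR).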
Use the “sort” function in Excel to order countries from the highest to the lowest malarial death rate using the 2019 data. Drawing upon your general knowledge, identify some characteristics that the countries with the highest rates tend to have in common. Also, identify some characteristics that countries with the lowest rates tend to have in common. Write a short paragraph discussing the implications of this analysis for a public health official who is looking to develop policies to combat malaria.
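For readers working outside Excel, here is the equivalent of a "Largest to Smallest" sort as a small Python sketch. The country names and rates are placeholders, not the real 2019 data.

```python
# Hypothetical 2019 death rates per 100,000; country names are placeholders.
rates_2019 = {"Country A": 12.5, "Country B": 0.0, "Country C": 88.1, "Country D": 40.3}

# Descending order by rate, mirroring Excel's Sort "Largest to Smallest".
# sorted() is stable, so tied countries keep their original order.
ranked = sorted(rates_2019.items(), key=lambda item: item[1], reverse=True)

for country, rate in ranked:
    print(f"{country}: {rate}")
```

The top and bottom of `ranked` are the groups the question asks you to compare.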

Q2. Voter Behaviour (15 Marks)
Political scientists are often interested in using polling data to understand the preferences of electorates, and to forecast the results of elections in advance. For this question, you are to place yourself in the shoes of a pollster who is trying to:
(i) obtain a representative sample of the underlying electorate for an upcoming (fictional) election, and (ii) analyse the data such that you can gauge the level of support for a given candidate.
In order to gather data on voting intentions you will need a sampling plan. What method of sampling (simple random, stratified, cluster) will you use in this instance? Give a brief justification of your choice.
In analysing your data, you are likely to encounter both sampling error and non-sampling error. Which do you think will be a greater threat to your analysis? Give an example of how non-sampling error may bias your results.
The file Voter_Behaviour.xls has data on 100 randomly selected individuals. The column labelled “support” indicates whether or not that voter supports your candidate. A value of 1 indicates that the voter intends to support your candidate, while a value of 0 indicates they do not.
Calculate the fraction of the sample that intends to vote for your candidate, and express this value as a percentage. Suppose that a vote share of 50% or more is required to win the election. On the basis of this value, do you think your candidate will win? Why or why not?
Determine the standard error for the sample proportion reported in the point above, and use it to calculate a 95% confidence interval for the true population proportion (hint: you can use the Excel spreadsheet tab categorical_CI.xls for this). On the basis of this interval, do you believe that your candidate will win?
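A sketch of the sample proportion, its standard error √(p̂(1 − p̂)/n) and a normal-approximation confidence interval, using hypothetical 0/1 data (55 supporters out of 100). The function name `proportion_ci` is illustrative. The categorical_CI.xls template may round the critical value to 1.96, so its figures could differ in the later decimal places.

```python
import math
from statistics import NormalDist

def proportion_ci(support, confidence=0.95):
    """Sample proportion, its standard error and a normal-approximation CI."""
    n = len(support)
    p_hat = sum(support) / n                      # fraction coded 1
    se = math.sqrt(p_hat * (1 - p_hat) / n)       # standard error of p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

support = [1] * 55 + [0] * 45  # hypothetical, not Voter_Behaviour.xls
p_hat, se, (lo, hi) = proportion_ci(support)
print(f"{p_hat:.0%} support, SE = {se:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

With these made-up numbers the interval straddles 0.5, which is the situation in which a 55% sample share alone does not settle the election.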
As confidence intervals become wider, they become more likely to contain the true population parameter, and hence become more accurate. On the other hand, wider intervals are less useful, as they are less precise. Write a couple of sentences contrasting the benefits of a wider interval vs a narrower interval.
Using the tab “hypothesis test” in your Excel file, test the null hypothesis that the population proportion is equal to 0.5. Give the null and alternative hypotheses, the test statistic, the critical value and a conclusion. Use a significance level of 5% and interpret your result.
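A sketch of the one-sample z test for a proportion, assuming a two-sided alternative H1: p ≠ 0.5. It also assumes the standard error is computed under the null value p0, the usual textbook choice. The spreadsheet's "hypothesis test" tab may use p̂ instead, or a one-sided alternative H1: p > 0.5 (critical value about 1.645). Both would change the numbers. The inputs p̂ = 0.55 and n = 100 are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_z_test(p_hat, n, p0=0.5, alpha=0.05):
    """Two-sided z test of H0: p = p0 against H1: p != p0."""
    se0 = math.sqrt(p0 * (1 - p0) / n)            # standard error under H0
    z = (p_hat - p0) / se0                        # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, z_crit, p_value, abs(z) > z_crit

z, z_crit, p_value, reject = proportion_z_test(p_hat=0.55, n=100)
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, "
      f"p-value = {p_value:.3f}, reject H0: {reject}")
```

Here |z| falls below the critical value, so with these illustrative inputs H0 would not be rejected at the 5% level.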

Q3. Risk Management (25 Marks)
Asset managers are often interested in minimizing risk in their portfolios by investing in assets that react differently under varying market conditions. The idea is that by buying some securities that are positively correlated with broader market movements, and some others that are negatively associated with the market, the combined risk exposure will be reduced.
A key statistical idea for assessing this type of financial risk is the market beta. This is a parameter from a regression model, designed to measure the association between the return on an asset and the overall market return. Market betas can be estimated using the following regression equation:
$$y_s = \beta_0 + \beta_1 x_m + e$$
Here y_s is the return on the asset and x_m is the market return. A share with a high beta will move strongly with the market, while a share with a beta closer to zero will be less sensitive to market fluctuations. Shares with negative betas will tend to move in the opposite direction to the broader market.
The file retailsharereturns.xls has observations on weekly returns (in percent) from three prominent US firms: Walmart (WMT), Amazon (AMZN) and Walgreens (WBA). There is also data on the S&P500 (a commonly used stock market index) that can be used as a proxy for x_m. Your task is to provide some statistical analysis using these data to assist a fund manager with their risk strategy.
Calculate the standard deviations of the weekly returns for the three companies. Which one has the most variable returns? Which investment appears to be the best bet for minimizing risk? Discuss.
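Comparing volatility comes down to one sample standard deviation per series. The sketch below uses three placeholder firms with made-up weekly returns, not the columns of retailsharereturns.xls. `statistics.stdev` divides by n − 1, like Excel's STDEV.S.

```python
import statistics

# Hypothetical weekly returns in percent for placeholder firms.
returns = {
    "Firm A": [0.5, -0.3, 1.2, -0.8, 0.4],
    "Firm B": [2.0, -3.5, 4.1, -2.2, 1.0],
    "Firm C": [0.2, 0.1, -0.1, 0.3, 0.0],
}

# Sample standard deviation of each firm's returns.
std_devs = {firm: statistics.stdev(r) for firm, r in returns.items()}
most_variable = max(std_devs, key=std_devs.get)
least_variable = min(std_devs, key=std_devs.get)

for firm, sd in std_devs.items():
    print(f"{firm}: {sd:.2f}")
print("most variable:", most_variable, "| least variable:", least_variable)
```

The standard deviation measures each share's total volatility. It does not show how much of that volatility moves with the market, which is what the beta in the equation above captures.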
Produce some scatterplots depicting the associations between share returns and market returns. Do your three companies appear to be (i) positively associated with x_m, (ii) uncorrelated with x_m, or (iii) negatively correlated with x_m? Interpret your results.
Estimate the three market betas using the regression equation given above. Which company has the strongest statistical link with the market returns? Which has the weakest statistical link? Discuss.
Test the hypothesis that the return on Walmart shares is uncorrelated with the market using your regression output (i.e. test the null that β_1=0) at α=5%. Give the null and alternative hypotheses, the test statistic, the critical value, the p-value and a conclusion. Interpret your result.
What would happen to your test if you used a significance level of α=1% instead? Do you draw the same conclusion? Briefly discuss how changing the significance level affects the probabilities of Type I and Type II errors.
Sometimes share markets can exhibit mean-reverting properties, where a positive movement in the price one week is cancelled out by a negative movement the week after (or vice versa). In the second tab of the finance_data.xls file (look at the bottom left of your screen) there is a list of returns for the S&P500, matched with the return from the previous week. These variables are called “S&P Now” and “S&P Lag”.
Perform a correlational analysis (using scatterplots, correlation coefficients and a regression equation) to see if such a mean-reverting pattern exists in the S&P500 (hint: simply examine the link between “S&P Now” and “S&P Lag”). Do you find a negative relationship between the performance in one week and the performance in the next? Discuss.
If you do find such an association, what should a trader of the S&P500 consider doing if their holdings of this asset increased sharply in value during the current week?
