September 30


MATH1905: Statistical Thinking with Data (Advanced)
11:59pm Friday 7 October 2022

1. (20 marks) An American university studied the relationship between SAT scores (taking
values between 200 and 800) and first-year GPAs (taking values between 0 and 4.0) for
students at the end of their first year. The scatterplot of GPA against SAT formed an
ellipse-shaped cloud. The summary statistics were
average SAT score = 550, SD = 80
average first-year GPA = 2.6, SD = 0.6, r = 0.4.
(a) Approximately what proportion of students had SAT scores above 650?
(b) On average, what increase in GPA is associated with a 100 point increase in SAT
score?
(c) Estimate the GPA of a student with an SAT score of 650.
(d) Approximately what proportion of students with an SAT score around 650 had a
GPA above 3.45?

2. (40 marks) The bivariate data (x, y) in the file linear_analysis_data_n1e4.txt (available
through the Resources page on Canvas) was generated by first drawing the x variate
as a sample of size N from the standard normal distribution. We next drew a scaled
bell-shaped sample of noise e, which was slightly tweaked to ensure that Cor(e, x) = 0
and $\bar{e} = 0$. Finally, we defined the variate y through the equation

y = A + Bx + e, (1)
where A, B ∈ R.
(a) How many data points are in this set (i.e., what is N)?
(b) Is the scatterplot of y vs. x ellipse-shaped and are x or y bell-shaped?
(c) Explain how you can find A and B (QC: if you get them right, they are
integers). Try to be as rigorous as you can.
(d) Redo the scatterplot of y vs. x and add the line you just found.
(e) Try to verify in R that Cor(e, x) = 0 and $\bar{e} = 0$. What is going on?
(f) Use the normal approximation of y in the vertical strip x ∈ (2.0, 2.2) to estimate
the proportion of y values in that strip that exceed 3.7.
(g) Compare the last estimate with the actual proportion of y values in that strip that
exceed 3.7.

3. (10 marks) Given N bivariate observations $(x_i, y_i)_{i=1,\dots,N}$, we construct a new
list by defining $z_i = x_i + y_i$ for $i = 1, \dots, N$. Show that
$$\mathrm{Var}(z) = \mathrm{Var}(x) + \mathrm{Var}(y) + 2\,\mathrm{Cov}(x, y).$$
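
Before attempting the proof, the identity can be sanity-checked numerically. The following is a minimal sketch, not a proof. It assumes the population convention (divisor N) for both variance and covariance, and the helper names `mean`, `cov` and `var` are illustrative only.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    # Population covariance (divisor N); this convention is an assumption.
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def var(v):
    # Variance is the covariance of a list with itself.
    return cov(v, v)

random.seed(0)
x = [random.gauss(0, 1) for _ in range(1000)]
y = [0.5 * a + random.gauss(0, 1) for a in x]   # deliberately correlated with x
z = [a + b for a, b in zip(x, y)]

lhs = var(z)
rhs = var(x) + var(y) + 2 * cov(x, y)
print(abs(lhs - rhs) < 1e-9)   # agreement up to floating-point rounding
```

The same check passes with the divisor N - 1, provided it is used consistently for both Var and Cov.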

4. (20 marks)
(a) Given N bivariate observations $(x_i, y_i)_{i=1,\dots,N}$, show that
$$N(N-1)\,\bar{x}\,\bar{y} - N\,\mathrm{Cov}(x, y) = \sum_{i \neq j} x_i y_j.$$

(b) Show that
$$N(N-1)\,\bar{x}^2 - N\,\mathrm{Var}(x) = \sum_{i \neq j} x_i x_j.$$
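
Both identities can be checked by brute force on a small random data set before proving them. This is a sketch under the assumption that Cov and Var use the divisor N (the identities do not balance with the divisor N - 1). The helper names are illustrative.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    # Population covariance (divisor N); an assumption about the convention.
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

random.seed(1)
N = 8
x = [random.uniform(-5, 5) for _ in range(N)]
y = [random.uniform(-5, 5) for _ in range(N)]

# Right-hand sides: explicit double sums over all ordered pairs with i != j.
sum_xy = sum(x[i] * y[j] for i in range(N) for j in range(N) if i != j)
sum_xx = sum(x[i] * x[j] for i in range(N) for j in range(N) if i != j)

# Left-hand sides of (a) and (b).
lhs_a = N * (N - 1) * mean(x) * mean(y) - N * cov(x, y)
lhs_b = N * (N - 1) * mean(x) ** 2 - N * cov(x, x)

print(abs(lhs_a - sum_xy) < 1e-9, abs(lhs_b - sum_xx) < 1e-9)
```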

5. (10 marks) (FPP, 15.4.19) There are 52 cards in a deck, and 13 of them are hearts.
(a) Four cards are dealt, one at a time, off the top of a well-shuffled deck. What is the
chance that a heart turns up on the fourth card, but not before? Explain briefly.
(b) A deck of cards is shuffled. You have to deal one card at a time until a heart turns
up. You have dealt 3 cards, and still have not seen a heart. What is the chance of
getting a heart on the 4th card? Explain briefly.

===========================================================================================

A small group of well engineers are going to reside in a small building close to a recently
discovered oil reservoir until their project is finished. The head of the engineering team has
approached a consulting office to study the thermal conditions in this building and to suggest
ways of improving thermal comfort during occupancy, if required.
The exact climatological and related data are not available to the consulting company responsible
for this work; however, the following information was provided:
1) The room is 10m × 8m × 2.2m (D × W × H) and has a flat roof.
2) The room has two 2m × 1.2m single-glaze windows: one on the south wall and one
on the east wall.
3) The maximum and minimum outdoor temperatures are 25°C and -6°C, and they occur
at 15:00 and 03:00, respectively.
4) The equivalent sky temperature may be assumed to be 8°C below the ambient air
temperature.
5) The ground temperature at a depth of about 5 meters is constant throughout the year.
Its value is approximately 15°C. (The outside temperature over a day can be
modeled as a sinusoidal function.)
6) The air change rate of the room under normal conditions (when windows and doors are
closed) is 0.5 ACH.
7) The walls are made of 15 cm brick (outer side) and 3 cm cellular glass insulation
(inner side), the floor is made of 10 cm concrete, and the roof is made of 10cm brick.
8) The solar radiation intensities on each surface are given in Table 1.
9) There are two male engineers in the room from 9:00 to 14:00, and only one
male engineer works in the room from 14:00 to 18:00.
10) Three computers are on in the room from 9:00 to 18:00.
* Please make assumptions for any missing information and state them clearly in your
report.
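
Items 3 to 5 suggest a sinusoidal model of the outdoor temperature. One possible form is sketched here. The 24-hour period and the cosine shape are assumptions. The mean, amplitude and phase follow from the stated maximum (25°C at 15:00) and minimum (-6°C at 03:00), which are exactly 12 hours apart. The function names are illustrative.

```python
import math

T_MAX, T_MIN = 25.0, -6.0   # °C, from item 3
HOUR_OF_MAX = 15.0          # time of the daily maximum, from item 3
SKY_OFFSET = 8.0            # sky temperature is 8 °C below ambient, from item 4

def outdoor_temp(hour):
    """Assumed 24-hour cosine through the stated daily maximum and minimum."""
    mean = (T_MAX + T_MIN) / 2
    amplitude = (T_MAX - T_MIN) / 2
    return mean + amplitude * math.cos(2 * math.pi * (hour - HOUR_OF_MAX) / 24)

def sky_temp(hour):
    """Equivalent sky temperature under the 8 °C offset of item 4."""
    return outdoor_temp(hour) - SKY_OFFSET

for hour in (3, 9, 15, 21):
    print(hour, round(outdoor_temp(hour), 2), round(sky_temp(hour), 2))
```

A report using such a model should state the assumed period and phase explicitly, as the task asks for all assumptions to be listed.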

Table 1: Solar radiation intensity on each surface (W/m²), by hour of day

Time (h)  North  East  South  West  Roof
5         32     71    5      5     9
6         57     472   38     38    119
7         83     651   68     63    286
8         87     679   107    82    454
9         97     606   209    97    595
10        107    457   318    107   704
11        114    252   394    114   772
12        116    126   420    126   795
13        114    114   394    252   772
14        107    107   318    457   704
15        97     97    209    606   595
16        87     82    107    679   454
17        83     63    68     651   286
18        57     38    38     472   119
19        32     5     5      71    9

==========================================================================================

FIT2014
Regular Languages, Context-Free Languages, Lexical analysis, Parsing, Turing machines and Quaternions
DUE: 11:55pm, Friday 7 October 2022
First, read about “Lex, Yacc and the PLUS-TIMES-POWER language” on pp. 7–11.

Problem 1. [2 marks]
Construct prob1.l, as described on pp. 9–11, so that it can be used with plus-times-power.y
to build a parser for PLUS-TIMES-POWER.

Now refer to the document “Quaternions and the language QUAT”, pages 12–14.

Problem 2. [2 marks]
Write a regular expression, using the regular expression syntax used by lex, that matches any
finite decimal representation (of the type specified on p. 12) of a nonnegative real number. Save
it as a file prob2.txt.

Problem 3. [7 marks]
Write a Context-Free Grammar for the language QUAT over the fifteen-symbol alphabet
{i, j, k, +, -, *, /, ^, |, (, ), NUMBER, WHOLENUMBER, ROTATION, ,} (the last symbol is the
comma). It can be typed or hand-written, but must be in PDF format and saved as a file prob3.pdf.

Now we use regular expressions (in the lex file, prob4.l) and a grammar (in the yacc file,
prob5.y) to construct a lexical analyser (Problem 4) and a parser (Problem 5) for QUAT.

Problem 4. [6 marks]
Using the file provided for PLUS-TIMES-POWER as a starting point, construct a lex file,
prob4.l, and use it to build a lexical analyser for QUAT.

You’ll need to change the regular expressions associated with the NUMBER, WHOLENUMBER
and some other tokens, among other things.

Sample output:
$ ./a.out
Rotation(120.0,2.0i+2.0j+2.0k)^2 * i / Rotation(120.0,2.0i+2.0j+2.0k)^2

Token: ROTATION; Lexeme: Rotation
Token and Lexeme: (
Token: NUMBER; Lexeme: 120.0
Token and Lexeme: ,
Token: NUMBER; Lexeme: 2.0
Token and Lexeme: i
Token and Lexeme: +
Token: NUMBER; Lexeme: 2.0
Token and Lexeme: j
Token and Lexeme: +
Token: NUMBER; Lexeme: 2.0
Token and Lexeme: k
Token and Lexeme: )
Token and Lexeme: ^
Token: WHOLENUMBER; Lexeme: 2
Token and Lexeme: *
Token and Lexeme: i
Token and Lexeme: /
Token: ROTATION; Lexeme: Rotation
Token and Lexeme: (
Token: NUMBER; Lexeme: 120.0
Token and Lexeme: ,
Token: NUMBER; Lexeme: 2.0
Token and Lexeme: i
Token and Lexeme: +
Token: NUMBER; Lexeme: 2.0
Token and Lexeme: j
Token and Lexeme: +
Token: NUMBER; Lexeme: 2.0
Token and Lexeme: k
Token and Lexeme: )
Token and Lexeme: ^
Token: WHOLENUMBER; Lexeme: 2
Token and Lexeme: <newline>
Control-D

Problem 5. [11 marks]
Make a copy of prob4.l, call it prob5.l, then modify it so that it can be used with yacc.
Then construct a yacc file prob5.y from plus-times-power.y. Then use these lex and yacc
files to build a parser for QUAT.
Note that you do not have to program any of the quaternion functions yourself. They have
already been written: see the Programs section of the yacc file. The actions in your yacc file
will need to call these functions, and you can do that by using the function call for pow(. . . )
in plus-times-power.y as a template.
The core of your task is to write the grammar rules in the Rules section, in yacc format,
with associated actions, using the examples in plus-times-power.y as a guide. You also need
to do some modifications in the Declarations section; see page 10 and further details below.
When entering your grammar into the Rules section of prob5.y, it is best to leave the existing
rules for the nonterminal start unchanged, as this has some extra stuff designed to allow you to
enter a series of separate expressions on separate lines. So, take the Start symbol from your grammar
in Problem 3 and represent it by the nonterminal line instead of by start.
The specific modifications you need to do in the Declarations section should be:
• You need a new %token declaration for the ROTATION token. It has the same structure as the
line for the NUMBER token, except that “num” is replaced by “str” (since ROTATION represents
a string, being a name for a function, whereas NUMBER represents a number).
• For symbols that represent a binary (i.e., two-argument) arithmetic operation, it is worth
including them in an appropriate %left statement. Each of these statements makes the parser
treat these operations as left-associative, which helps it determine the order in which to do the
operations and removes some sources of possible ambiguity. When using %left, operations
having the same precedence are listed on the same line with spaces between them. So for +
and - you can use the following statement:
%left '-' '+'

A similar line can be used for multiplication and division. For operations whose %left
statements are on different lines, the operations with higher precedence are those with higher
line numbers (i.e., later in the file). Right-associative operations can be handled similarly with a
%right statement. Treat exponentiation as having higher precedence than multiplication and
division.
• For every nonterminal symbol, you need a %type line that declares its type, i.e., the type of
value that is returned when an expression generated from this nonterminal is evaluated. For
example,
%type <qtn> start here
Here, “qtn” is the type name we are using for quaternions. The various type names can be
seen in the %union statement a little earlier in the file. But you do not need to know how that
works in order to do this task.
• You should still use start as your Start symbol. If you use another name instead, you will
need to modify the %start line too.
Sample output:
$ ./a.out
Rotation(120,2i+2j+2k) * i / Rotation(120,2i+2j+2k)
0.000000 + 0.000000 i + 1.000000 j + 0.000000 k
Control-D
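
The sample evaluation above can be reproduced independently of yacc. The sketch below rests on assumptions, since the definitions on page 14 are not reproduced here: Rotation(θ, v) is taken to be the unit quaternion cos(θ/2) + sin(θ/2)·v/|v| with θ in degrees, and division is taken to be multiplication by the inverse. The function names `qmul`, `qinv` and `rotation` are illustrative and are not the functions supplied in the yacc file.

```python
import math

def qmul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z) tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
        a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
        a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
        a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0,
    )

def qinv(q):
    """Inverse: the conjugate divided by the squared norm."""
    w, x, y, z = q
    n2 = w * w + x * x + y * y + z * z
    return (w / n2, -x / n2, -y / n2, -z / n2)

def rotation(theta_deg, axis):
    """Assumed meaning of Rotation: unit quaternion for angle theta about axis."""
    _, x, y, z = axis                      # axis is a pure quaternion (0, x, y, z)
    norm = math.sqrt(x * x + y * y + z * z)
    half = math.radians(theta_deg) / 2
    s = math.sin(half) / norm
    return (math.cos(half), s * x, s * y, s * z)

q = rotation(120, (0, 2, 2, 2))
i = (0, 1, 0, 0)
result = qmul(qmul(q, i), qinv(q))         # Rotation(...) * i / Rotation(...)
print(tuple(round(c, 6) + 0.0 for c in result))
```

Under these assumptions the result agrees with the sample output to six decimal places: a rotation of 120 degrees about the direction (1, 1, 1) carries i to j.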

Now refer to the explanation of quaternions and 3D rotations, page 14.
Problem 6. [6 marks]
Convert your eight-digit student ID number into an angle and direction as follows. Let

d1d2d3d4d5d6d7d8

be the digits of your student ID number. Divide this into six single-digit numbers followed
by one two-digit number: d1, d2, d3, d4, d5, d6, d7d8. The point to be rotated
is (d1, d2, d3), which can be represented by the pure quaternion d1i + d2j + d3k. The axis of
rotation is the line whose direction is given by d4i + d5j + d6k. (If this is the zero vector, then
use (d1 + d4)i + d5j + d6k instead.) Then work out the sum of the first digit d1 and the two-digit
number d7d8, and use it for your angle of rotation, θ°.

For example, if your ID number is 12345678, then your point to be rotated is (1, 2, 3), your
axis of rotation is 4i + 5j + 6k, and your angle is 78 + 1 = 79°.
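
The conversion from ID number to rotation parameters can be written as a short function. This is a sketch of the rule as stated above. The function name `rotation_parameters` is illustrative. The case where the fallback axis is also zero (d1 = d4 = d5 = d6 = 0) is not covered by the text, so the sketch leaves it unhandled.

```python
def rotation_parameters(student_id):
    """Split an eight-digit ID into (point, axis, angle in degrees)."""
    if len(student_id) != 8 or not student_id.isdigit():
        raise ValueError("expected an eight-digit student ID")
    d1, d2, d3, d4, d5, d6, d7, d8 = (int(ch) for ch in student_id)
    point = (d1, d2, d3)              # represented by d1*i + d2*j + d3*k
    axis = (d4, d5, d6)               # direction d4*i + d5*j + d6*k
    if axis == (0, 0, 0):
        axis = (d1 + d4, d5, d6)      # fallback stated in the text
    angle = d1 + (10 * d7 + d8)       # first digit plus the two-digit number d7d8
    return point, axis, angle

print(rotation_parameters("12345678"))   # ((1, 2, 3), (4, 5, 6), 79)
```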

(a) Write down the quaternion expression in QUAT that represents the calculation required
to rotate the point (d1, d2, d3) by θ degrees clockwise around the axis whose direction is given
by d4i + d5j + d6k.
(Your expression must use the actual numbers derived from your student ID number as
specified, not the algebraic quantities used above.)
Append the string “Hamilton”, out of respect for the person who invented quaternions.
(b) Run your parser on your expression from (a), and report the result of evaluating it.
The answers to (a) and (b) should be copied into a single line each in the file prob6.txt.

Turing machines
Now refer to the description of walks on page 15. Let CW be the language of closed walks using
alphabet {N, S, E, W}.

Problem 7. [8 marks]
Build, in Tuatara, a decider for CW and save it as a file prob7.tm.
There is no restriction on the contents of the output tape at the end of the computation.

Context-Free Languages
Problem 8. [8 marks]
Prove or disprove: The language CW is context-free.
Please mention the Cocke-Younger-Kasami algorithm (but there is no need to demonstrate it).
Your submission can be typed or hand-written, but it must be in PDF format and saved as a
file prob8.pdf.

Lex, Yacc and the PLUS-TIMES-POWER language
In this part of the Assignment, you will use the lexical analyser generator lex, initially by itself,
and then with the parser generator yacc [1].
Some useful references on Lex and Yacc:
• T. Niemann, Lex & Yacc Tutorial, http://epaperpress.com/lexandyacc/
• Doug Brown, John Levine, and Tony Mason, lex and yacc (2nd edn.), O’Reilly, 2012.
• the lex and yacc manpages
We will illustrate the use of these programs with a language PLUS-TIMES-POWER based on
simple arithmetic expressions involving nonnegative integers, using just addition, multiplication and
powers. Then you will use lex and yacc on a language QUAT of expressions based on quaternions,
which we describe later.
PLUS-TIMES-POWER
The language PLUS-TIMES-POWER consists of expressions involving addition, multiplication and
powers of nonnegative integers, without any parentheses (except for those required by the function
Power). Example expressions include:
5 + 8, 8 + 5, 3 + 5 ∗ 2, 13 + 8 ∗ 4 + Power(2,Power(3, 2)), Power(1, 3) + Power(5, 3) + Power(3, 3),
Power(999, 0), 0 + 99 ∗ 0 + 1, 2014, 10 ∗ 14 + 74 + 10 ∗ 13 ∗ 73, 2 ∗ 3 ∗ 5 ∗ 7 ∗ 11 ∗ 13 ∗ 17 ∗ 19.
In these expressions, integers are written in unsigned decimal, with no leading zeros or decimal point
(so 2014, 86, 10, 7, and 0 are ok, but +2014, −2014, 86.0, A, 007, and 00 are not).
For lexical analysis, we treat every nonnegative integer as a lexeme for the token NUMBER.
Lex
An input file to lex is, by convention, given a name ending in .l. Such a file has three parts:
• definitions,
• rules,
• C code.
These are separated by double-percent, %%. Comments begin with /* and end with */. Any
comments are ignored when lex is run on the file.
You will find an input file, plus-times-power.l, among the files for this Assignment. Study
its structure now, identifying the three sections and noticing that various pieces of code have been
commented out. Those pieces of code are not needed yet, but some will be needed later.
We focus mainly on the Rules section, in the middle of the file. It consists of a series of statements
of the form

pattern { action }

where the pattern is a regular expression and the action consists of instructions, written in C,
specifying what to do with text that matches the pattern.
[1] Actually, Linux includes more modern implementations of these programs called flex and bison.
[2] This may seem reminiscent of awk, but note that: the pattern is not delimited by slashes, /. . . /, as in awk; the
action code is in C, whereas in awk the actions are specified in awk’s own language, which has similarities with C but
is not the same; and the action pertains only to the text that matches the pattern, whereas in awk the action pertains
to the entire line in which the matching text is found.

In our file, each pattern represents a set of possible lexemes which we wish to identify. These are:
• a decimal representation of a nonnegative integer, represented as described above;
– This is taken to be an instance of the token NUMBER (i.e., a lexeme for that token).
• the specific string Power, which is taken to be an instance of the token POWER.
• certain specific characters: +, *, (, ), and comma;
• the newline character;
• white space, being any sequence of spaces and tabs.
Note that all matching in lex is case-sensitive.
Our action is, in most cases, to print a message saying what token and lexeme have been found.
For white space, we take no action at all. A character that cannot be matched by any pattern yields
an error message.
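
The pattern/action idea described above can be imitated in a few lines of Python. This is only an illustration of the rule structure, not the contents of plus-times-power.l. The regular expressions, the wording of the error message and the name `analyse` are assumptions. One difference to keep in mind: lex picks the longest match among all patterns, while Python's alternation picks the first alternative that matches. For this small token set the two behave the same.

```python
import re

# One (token name, pattern) pair per lex-style rule, tried in this order.
TOKEN_SPEC = [
    ("NUMBER", r"0|[1-9][0-9]*"),   # unsigned decimal, no leading zeros
    ("POWER", r"Power"),            # the specific string Power (case-sensitive)
    ("CHAR", r"[+*(),]"),           # single-character tokens
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),            # white space: no action
    ("ERROR", r"."),                # anything else
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def analyse(text):
    """Return the messages a lexical analyser of this kind might print."""
    messages = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind in ("NUMBER", "POWER"):
            messages.append("Token: %s; Lexeme: %s" % (kind, lexeme))
        elif kind == "CHAR":
            messages.append("Token and Lexeme: %s" % lexeme)
        elif kind == "NEWLINE":
            messages.append("Token and Lexeme: <newline>")
        elif kind == "ERROR":
            messages.append("Unrecognised character: %s" % lexeme)  # wording assumed
    return messages

for line in analyse("13+8 * 4 + Power(2,3)\n"):
    print(line)
```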
If you run lex on the file plus-times-power.l, then lex generates the C program lex.yy.c.
This is the source code for the lexical analyser. You compile it using a C compiler such as cc.
For this assignment we use flex, a more modern variant of lex. We generate the lexical analyser
as follows.
$ flex plus-times-power.l
$ cc lex.yy.c
By default, cc puts the executable program in a file usually called a.out [4] but sometimes called
a.exe. This can be executed in the usual way, by just entering ./a.out at the command line. If
you prefer to give the executable program another name, such as plus-times-power-lex, then you
can tell this to the compiler using the -o option: cc lex.yy.c -o plus-times-power-lex.
When you run the program, it will initially wait for you to input a line of text to analyse. Do
so, pressing Return at the end of the line. Then the lexical analyser will print, to standard output,
messages showing how it has analysed your input. The printing of these messages is done by the
printf statements from the file plus-times-power.l. Note how it skips over white space, and only
reports on the lexemes and tokens.
$ ./a.out
13+8 * 4 + Power(2,Power (3,2 ))
Token: NUMBER; Lexeme: 13
Token and Lexeme: +
Token: NUMBER; Lexeme: 8
Token and Lexeme: *
Token: NUMBER; Lexeme: 4
Token and Lexeme: +
Token: POWER; Lexeme: Power
Token and Lexeme: (
Token: NUMBER; Lexeme: 2
Token and Lexeme: ,
Token: POWER; Lexeme: Power
Token and Lexeme: (
Token: NUMBER; Lexeme: 3
Token and Lexeme: ,
Token: NUMBER; Lexeme: 2
Token and Lexeme: )
Token and Lexeme: )
Token and Lexeme: <newline>

[3] The C program will have this same name, lex.yy.c, regardless of the name you gave to the lex input file.
[4] a.out is short for assembler output.

Try running this program with some input expressions of your own. You can keep entering new
expressions on new lines, and enter Control-D to stop when you are finished.
Yacc
We now turn to parsing, using yacc.
Consider the following grammar for PLUS-TIMES-POWER.

S −→ E
E −→ I
E −→ POWER(E, E)
E −→ E ∗ E
E −→ E + E
I −→ NUMBER

In this grammar, the non-terminals are S, E and I. Treat NUMBER and POWER as just single
tokens, and hence single terminal symbols in this grammar.
We now generate a parser for this grammar, which will also evaluate the expressions, with +, ∗
interpreted as the usual integer arithmetic operations and Power(. . . ,. . . ) interpreted as raising its
first argument to the power of its second argument.
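
To make the intended evaluation concrete, here is a hand-written recursive-descent evaluator for the same language. It is a stand-in for illustration only: yacc generates a table-driven parser from the grammar rather than code of this shape. The precedence of ∗ over + and their left-associativity are encoded here by the nesting of `expr`, `term` and `factor`, which is an assumption about how the ambiguity in E → E ∗ E and E → E + E is meant to be resolved. All function names are illustrative.

```python
import re

TOKEN_RE = re.compile(r"\s*(Power|0|[1-9][0-9]*|[+*(),])")

def tokenize(text):
    """Split one input line into lexemes, skipping white space."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError("unexpected character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError("expected %r, found %r" % (expected, tok))
        pos += 1
        return tok

    def expr():                    # sums: lowest precedence, left-associative
        value = term()
        while peek() == "+":
            take("+")
            value += term()
        return value

    def term():                    # products: bind tighter than +
        value = factor()
        while peek() == "*":
            take("*")
            value *= factor()
        return value

    def factor():                  # E -> I | POWER(E, E)
        tok = take()
        if tok == "Power":
            take("(")
            base = expr()
            take(",")
            exponent = expr()
            take(")")
            return base ** exponent
        if tok.isdigit():
            return int(tok)
        raise SyntaxError("unexpected token %r" % tok)

    value = expr()
    if peek() is not None:
        raise SyntaxError("unexpected token %r" % peek())
    return value

print(evaluate("13+8 * 4 + Power(2,Power (3,2 ))"))   # 557
```

Here 8 ∗ 4 is evaluated before the additions and Power(2, Power(3, 2)) is 2 to the power 9, giving 13 + 32 + 512 = 557.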
To generate this parser, you need two files, prob1.l (for lex) and plus-times-power.y (for
yacc):
• Change into your problem1 subdirectory and do the following steps in that directory.
• Copy plus-times-power.l to a new file prob1.l, and then modify prob1.l as follows:
– in the Declarations section, uncomment the statement #include "y.tab.h";
– in the Rules section, in each action:
∗ uncomment the statements of the form
· yylval.str = ...;
· yylval.num = ...;
· return TOKENNAME;
· return *yytext;
· yyerror ...
∗ Comment out the printf statements. These may still be handy if debugging is
needed, so don’t delete them altogether, but the lexical analyser’s main role now is
to report the tokens and lexemes to the parser, not to the user.
– in the C code section, comment out the function main(), which in this case occupies
four lines at the end of the file.
• plus-times-power.y, the input file for yacc, is provided for you. You don’t need to modify
this yet.
An input file for yacc is, by convention, given a name ending in .y, and has three parts, very loosely
analogous to the three parts of a lex file but very different in their details and functionality:
• Declarations,
• Rules,
• Programs.

These are separated by double-percent, %%. Comments begin with /* and end with */.
Peruse the provided file plus-times-power.y, identify its main components, and pay particular
attention to the following, since you will need to modify some of them later.
• in the Declarations section:
– lines like
int printQuaternion(Quaternion);
Quaternion newQuaternion(double, double, double, double);
...
Quaternion rotation(double, Quaternion);
which are declarations of functions (but they are defined later, in the Programs section); [5]
– declarations of the tokens to be used:
%token <num> NUMBER
%token <iValue> WHOLENUMBER
%token <str> POWER
– some specifications that certain operations are left-associative (which helps determine the
order in which operations are applied and can help resolve conflicts and ambiguities):
%left '+'
%left '*'
– declarations of the nonterminal symbols to be used (which don’t need to start with an
upper-case letter):
%type <iValue> start
%type <iValue> line
%type <iValue> expr
%type <iValue> int
– nomination of which nonterminal is the Start symbol:
%start start

• in the Rules section, a list of grammar rules in Backus-Naur Form (BNF), except that the
colon “:” is used instead of →, and there must be a semicolon at the end of each rule. Rules
with a common left-hand-side may be written in the usual compact form, by listing their
right-hand-sides separated by vertical bars, and one semicolon at the very end. The terminals
may be token names, in which case they must be declared in the Declarations section and also
used in the lex file, or single characters enclosed in forward-quote symbols. Each rule has
an action, enclosed in braces {. . . }. A rule for a Start symbol may print output, but most
other rules will have an action of the form $$ = . . . . The special variable $$ represents the
value to be returned for that rule, and in effect specifies how that rule is to be interpreted for
evaluating the expression. The variables $1, $2, . . . refer to the values of the first, second, . . .
symbols in the right-hand side of the rule.
• in the Programs section, various functions, written in C, that your parsers will be able to use.
You do not need to modify these functions, and indeed should not try to do so unless you are
an experienced C programmer and know exactly what you are doing! Most of these functions
are not used yet; some will only be used later, in Problem 5.
[5] These functions for computing with quaternions are not needed by plus-times-power.y, but you will need them
later, when you make a modified version of plus-times-power.y to parse expressions involving quaternions.

After constructing the new lex file prob1.l as above, the parser can be generated by:
$ yacc -d plus-times-power.y
$ flex prob1.l
$ cc lex.yy.c y.tab.c -lm
The executable program, which is now a parser for PLUS-TIMES-POWER, is again named a.out
by default, and will replace any other program of that name that happened to be sitting in the same
directory.
$ ./a.out
13+8 * 4 + Power(2,Power (3,2 ))
557
13+8*4+Power(2,Power(3,2))
557
Power(1,3)+Power(5,3)+Power(3,3)
153
1+2+3+4+5+6+7+8+9+10
55
10*9*8*7*6*5*4*3*2*1
3628800
Power(999,0)
1
Control-D
Run it with some input expressions of your own. You can keep entering new expressions on new
lines, as above, and enter Control-D to stop when you are finished.
