Question 1

A)

Step 1 : Establish Hypotheses

Let μ₁ be the mean electricity consumption of homes (population 1)

Let μ₂ be the mean electricity consumption of units (population 2)

H₀ : μ₁ = μ₂ → μ₁ – μ₂ = 0

H₁ : μ₁ > μ₂ → μ₁ – μ₂ > 0

Step 2 : Level of Significance and Sample Size

α = 0.05, n₁ = n₂ = 25

Since n₁ = n₂ = 25 < 30, the central limit theorem cannot be relied upon. Assume the samples are randomly collected and that both populations follow a normal distribution.

Step 3 : Determine the Appropriate Test Statistic

Since σ₁ and σ₂ are unknown, use S₁ and S₂ to estimate them.

Assuming σ₁ and σ₂ are not equal, as the sample standard deviations differ, use the separate-variance t-test. This is an upper-tail t-test.

Step 4 : Determine the Critical Values and the Rejection Region

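$$v = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$$

$$v = \frac{\left(\frac{1431.46}{25} + \frac{1864.8233}{25}\right)^2}{\frac{\left(1431.46/25\right)^2}{24} + \frac{\left(1864.8233/25\right)^2}{24}} = \frac{17384.7738}{368.4429}$$

v = 47.1844

Degrees of freedom = 47

t-critical = t(α = 0.05, d.f. = 47)

t-critical = 1.6779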

∴ Reject H₀ if t-stat > 1.6779
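The degrees-of-freedom calculation above can be checked with a short sketch. The function name `welch_df` is illustrative and not part of the original working; the inputs are the sample variances and sizes quoted in Step 4, and the result is rounded down to a whole number as in the working.

```python
import math

def welch_df(var1, n1, var2, n2):
    """Approximate degrees of freedom for the separate-variance t-test."""
    a = var1 / n1  # S1^2 / n1
    b = var2 / n2  # S2^2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Sample variances and sizes as quoted in Step 4.
v = welch_df(1431.46, 25, 1864.8233, 25)
df = math.floor(v)  # rounded down to a whole number, as in the working

print(f"v = {v:.4f}, degrees of freedom = {df}")
```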

Step 5 : Compute t-stat

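x̄₁ = 154, n₁ = 25, S₁ = 37.8346

x̄₂ = 126.36, n₂ = 25, S₂ = 43.1836

$$t\text{-stat} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$

$$t\text{-stat} = \frac{(154 - 126.36) - 0}{\sqrt{\frac{1431.46}{25} + \frac{1864.8233}{25}}}$$

t-stat = 2.4071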
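As a rough check of the arithmetic in Step 5, the separate-variance t-statistic can be computed from the summary figures. The helper `welch_t` is a hypothetical name used only for this sketch; the critical value 1.6779 is the one quoted in Step 4 rather than computed here.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2, hypothesised_diff=0.0):
    """Separate-variance t-statistic for the difference between two means."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2 - hypothesised_diff) / standard_error

# Summary statistics as quoted in Step 5 (homes first, then units).
t_stat = welch_t(154, 37.8346, 25, 126.36, 43.1836, 25)

t_critical = 1.6779              # upper-tail critical value quoted in Step 4
reject_h0 = t_stat > t_critical  # upper-tail decision rule

print(f"t-stat = {t_stat:.4f}, reject H0: {reject_h0}")
```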

Step 6 : Statistical Decision & Conclusion

∴ Reject H₀ at the 5% level of significance, as t-stat (2.4071) > t-critical (1.6779). There is sufficient evidence to conclude that homes (population 1) consume more electricity on average than units (population 2).

B)

Step 1: Establish Hypotheses

Let µ₁ be the mean electricity consumption rate (consumption per square metre) of homes, and µ₂ the mean electricity consumption rate of units.

H₀ : µ₁ = µ₂ → µ₁ − µ₂ = 0

H₁ : µ₁ > µ₂ → µ₁ − µ₂ > 0

Step 2: Statistical Test

As both sample sizes are less than 30 (n₁ = 25 for homes and n₂ = 25 for units) and the population standard deviations are unknown, the separate-variance t-test (σ₁² ≠ σ₂²) is the appropriate test statistic. The underlying assumption is that the data are normally distributed.

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

$$df = \frac{\left[\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right]^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

Step 3: Value of Alpha (type 1 error rate)

The significance level is α = 0.05.

Step 4: Decision Rule

As the test is one-tailed and α = 0.05, the rejection region is in the right tail of the distribution. The critical value t 0.05,32 = 1.6939 can be obtained from the t-table. If the computed t-statistic is greater than 1.6939, reject the null hypothesis; otherwise do not reject.

Step 5: Gather Data

HOMES: n₁ = 25, x̄₁ = 1.0680, s₁ = 0.0647, s₁² = 0.0042

UNITS: n₂ = 25, x̄₂ = 1.1061, s₂ = 0.16, s₂² = 0.0255

Step 6: Analyse Data

t-test: Two-Sample Assuming Unequal Variances

Standard deviation: Homes: SQRT(0.0042) = 0.0647, Units: SQRT(0.0255) = 0.16

The output indicates that

$$t\text{-stat} = \frac{(1.0680 - 1.1061) - 0}{\sqrt{\frac{0.0042}{25} + \frac{0.0255}{25}}} = -1.1058 \text{ (4 decimal places)}$$

Therefore, the t-statistic is −1.1058.
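The Q1b test can be sketched from the rounded summary figures in Step 5. The function `welch_test` is an illustrative name, not part of the original working. Because the inputs here are rounded to 4 decimal places, the result agrees with the reported t-statistic only to about two decimal places, presumably because the reported output was computed from the unrounded data.

```python
import math

def welch_test(mean1, var1, n1, mean2, var2, n2):
    """Return the separate-variance t-statistic and its approximate d.f."""
    a, b = var1 / n1, var2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_stat, df

# Rounded summary statistics from Step 5 (homes first, then units).
t_stat, df = welch_test(1.0680, 0.0042, 25, 1.1061, 0.0255, 25)

t_critical = 1.6939              # upper-tail critical value quoted in Step 4
reject_h0 = t_stat > t_critical  # compare the signed statistic, not |t|

print(f"t-stat = {t_stat:.4f}, d.f. = {df:.2f}, reject H0: {reject_h0}")
```

For an upper-tail test, the signed t-statistic is compared with the critical value, so a negative statistic can never fall in the rejection region.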

Step 7: Statistical Conclusion

As illustrated in the hypothesis graph, the t-statistic of −1.1058 is less than the t-critical value of 1.6939, so the null hypothesis is not rejected. Thus, at the 5% level of significance, there is insufficient evidence to suggest that homes consume more electricity per square metre than units.

C)

The hypothesis test in Q1a) suggests that, at the 5% level of significance, homes do indeed consume more electricity than units. However, this test does not take into account the living space in square metres. Consumption is standardised by size in Q1b), where the hypothesis that homes consume more electricity per square metre is tested. In this test, the null hypothesis that the electricity consumption of homes per square metre is less than or equal to that of units could not be rejected at the 5% level of significance. As such, since the sample consumption rate of homes (1.0680) is in fact lower than that of units (1.1061), the statement that homes consume more electricity than units is not supported once the size of the living space is taken into account.

Question 2

A) The estimated regression equation gives the predicted value of the dependent variable (electricity consumption) for a specific value of the independent variable (size in square metres). As such, the general form of the estimated regression equation is:

$$\hat{Y}_i = b_0 + b_1 X_i$$

Where:

– Ŷᵢ is the predicted value of the dependent variable for occurrence “i”.

– b₀ is the value of the dependent variable when the independent variable is equal to zero.

– b₁ is the slope coefficient of the relationship between the dependent and independent variables.

– Xᵢ is the chosen value of the independent variable.

From the output above, the values of b₀ and b₁ can be found:

– b₀ = 0.5350 (4 decimal places)

– b₁ = 1.0818 (4 decimal places)

As such, the estimated regression equation can be determined to be:

$$\hat{Y}_i = 0.5350 + (1.0818 \times X_i)$$
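The estimated equation can be applied directly to any chosen size. The sketch below uses the rounded coefficients quoted above; the function name and the example size of 100 square metres are illustrative assumptions, not values from the data set.

```python
# Coefficients as reported above, rounded to 4 decimal places.
B0 = 0.5350  # intercept
B1 = 1.0818  # slope: change in predicted consumption per extra square metre

def predict_consumption(size_m2):
    """Predicted electricity consumption for a building of the given size."""
    return B0 + B1 * size_m2

# Hypothetical example size, chosen only to illustrate the calculation.
example_size = 100.0
predicted = predict_consumption(example_size)

print(f"Predicted consumption at {example_size:.0f} square metres: {predicted:.4f}")
```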

However, the actual value of the dependent variable may differ from its predicted value for a given value of the independent variable, and this difference is known as the random error component. As such, the actual values of electricity consumption for a given size in square metres follow the equation:

$$Y_i = 0.5350 + (1.0818 \times X_i) + \varepsilon_i$$

Where:

– εᵢ is the random error term of the regression equation: the amount by which the actual value of the dependent variable differs from the predicted value for any given occurrence.

B)

The R² value is the coefficient of determination, a measure of how well the estimated regression line fits the observed data values. The value of R² is given by:

$$R^2 = \frac{SSR}{SST}$$

Where:

– SSR is the sum of squares due to regression

– SST is the total sum of squares
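The ratio of SSR to SST can be illustrated with a minimal least-squares sketch. The data below are hypothetical and are not the assignment's data set, so the resulting value is not the R² of the actual regression output; the function name `r_squared` is likewise an assumption made for this example.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    # Least-squares slope and intercept.
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)  # variation explained by the model
    sst = sum((yi - y_bar) ** 2 for yi in y)       # total variation in y
    return ssr / sst

# Hypothetical sizes (square metres) and consumption values, for illustration only.
sizes = [60, 90, 120, 150, 180]
usage = [70, 95, 135, 160, 200]

r2 = r_squared(sizes, usage)
print(f"R squared = {r2:.4f}")
```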

As such, the value is the proportion of the total variation in the dependent variable that is explained by the regression model. The R² value for the data set is 0.9066 (4 decimal places), or 90.66%. This indicates that the model explains 90.66% of the variability of the observed electricity consumption values around their mean. The R² therefore suggests that the linear relationship between these variables is very strong, as most of the variation in the data values is explained by the model, with only 9.34% of the variation left unexplained.

C)

[Figure: Residual Plot, residuals plotted against Size (Square Metres)]

Assumption 1 : Linearity

Use a t-test to determine whether there is a linear relationship between the dependent and independent variables.

H₀ : β₁ = 0 (no linear relationship)

H₁ : β₁ ≠ 0 (linear relationship exists)

Where:

– β₁ is the hypothesised slope.

– H₀ is the null hypothesis, assuming there is no linear relationship.

– H₁ is the alternative hypothesis, assuming a linear relationship does exist.

$$t\text{-stat} = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad d.f. = n - 2$$

Where:

– b₁ = regression slope coefficient

– β₁ = hypothesised slope

– S_b₁ = standard error of the slope

$$t\text{-stat} = \frac{1.08177 - 0}{0.050125} = 21.5815$$

Reject H₀ (β₁ ≠ 0), as there is sufficient evidence to say that the size of the building significantly affects electricity consumption.

Aside from the hypothesis test, the residual plot also shows that the residuals are scattered randomly with no systematic pattern. Thus, the relationship between the dependent and independent variables is linear.
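The slope test can be reproduced from the two figures quoted in the working. Two assumptions are made in this sketch and are not stated in the original: the sample size n = 50 is taken from the 1/50 term used later in Q2d), and the test is treated as two-tailed at the 5% level, for which the t-table value with 48 degrees of freedom is 2.0106.

```python
b1 = 1.08177       # estimated slope, as quoted in the working
s_b1 = 0.050125    # standard error of the slope, as quoted in the working
beta1_h0 = 0.0     # hypothesised slope under H0

n = 50             # assumption: sample size taken from the 1/50 term in Q2d)
df = n - 2

t_stat = (b1 - beta1_h0) / s_b1

# Assumption: two-tailed test at the 5% level; 2.0106 is the t-table value for 48 d.f.
t_critical = 2.0106
reject_h0 = abs(t_stat) > t_critical

print(f"t-stat = {t_stat:.4f}, d.f. = {df}, reject H0: {reject_h0}")
```

The statistic is so far beyond the critical value that the conclusion would be the same at any conventional significance level.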

Assumption 2 : Independence of Errors

The residual plot does not exhibit a certain pattern and the residuals are plotted

randomly. Therefore, the error values are statistically independent.

Assumption 3 : Normality of Error

[Figure: Normal Probability Plot, Electricity Consumption (KW / Week) plotted against Sample Percentile]

In the normal probability plot above, the points lie approximately along a straight line. This suggests that the error values are approximately normally distributed for any given value of X.

Assumption 4 : Equal Variance

Referring to the residual plot above, the residuals are scattered evenly across the range of sizes; therefore, the probability distribution of the errors can be taken to have constant variance.

D)

Given a particular value for Xᵢ, the prediction interval estimate for an individual value of Y is:

$$\hat{Y} \pm t_{\alpha/2}\, S_{YX} \sqrt{1 + h_i}$$

Where:

– Ŷ is the predicted value for Y.

– t α/2 is the t-value consistent with the chosen level of confidence (with d.f. = n − 2).

– S_YX is the standard error of the estimate, given by the formula:

$$S_{YX} = \sqrt{\frac{SSE}{n - 2}}$$

– hᵢ is equal to:

$$h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX}$$

As per the Excel output from Q2a), the standard error (S_YX) has been calculated as 13.1224 (4 decimal places).

Additionally, the predicted value of Y is given by the regression equation from Q2a):

Ŷᵢ = 0.5350 + (1.0818 × Xᵢ)

Hence, for a value Xᵢ = 180,

Ŷᵢ = 0.5350 + (1.0818 × 180)

Therefore Ŷᵢ = 195.2535 (4 decimal places, evaluated with the unrounded slope coefficient 1.08177 rather than the rounded 1.0818).

For the value of hᵢ, SSX is the sum of the squared deviations of the independent variable from its mean. Thus it is given by:

$$SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2$$

This can be calculated in Excel as SSX = 68535.9200 (4 decimal places), where X̄ has been calculated as 129. Thus the value of hᵢ (to 4 decimal places) is:

$$h_i = \frac{1}{50} + \frac{(180 - 129)^2}{68535.9200} = 0.0580$$

Since the chosen level of confidence is 90%, α = 0.10 and the value of t α/2 (to 4 decimal places) is given by:

t 0.05,48 = 1.6772

Thus, the prediction interval for the value of Yᵢ is as follows:

$$195.2535 \pm 1.6772 \times 13.1224 \times \sqrt{1.0580} = 195.2535 \pm 22.6382 = (172.6153,\ 217.8917)$$

Hence, it can be said with 90% confidence that for a building with 180 square metres of size, the electricity consumption in kilowatts per week lies between 172.6153 and 217.8917.
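The interval formula can be evaluated step by step as a check on the arithmetic. The sketch below uses the figures quoted in this part (predicted value, t-value, standard error, n, X̄ and SSX); the function name `prediction_interval` is illustrative. With these inputs the margin is about 22.64, giving an interval of roughly 172.6 to 217.9.

```python
import math

def prediction_interval(y_hat, t_value, s_yx, n, x_i, x_bar, ssx):
    """Prediction interval for an individual Y at X = x_i in simple regression."""
    h_i = 1 / n + (x_i - x_bar) ** 2 / ssx
    margin = t_value * s_yx * math.sqrt(1 + h_i)
    return y_hat - margin, y_hat + margin, h_i

# Figures as quoted in the working for X = 180.
lower, upper, h_i = prediction_interval(
    y_hat=195.2535,   # predicted consumption
    t_value=1.6772,   # t-value for 90% confidence with 48 d.f.
    s_yx=13.1224,     # standard error of the estimate
    n=50,
    x_i=180,
    x_bar=129,
    ssx=68535.92,
)

print(f"h_i = {h_i:.4f}")
print(f"90% prediction interval: ({lower:.2f}, {upper:.2f})")
```

The margin must include both the standard error S_YX and the square root of (1 + hᵢ); omitting either gives an interval that is far too narrow.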