• Subject Name : Statistics

Statistical Analysis Research Report

Table of Contents

Introduction

Variable Selection

Descriptive Statistics

Missing Data Handling

Graphical Data Visualization

Non-Parametric Testing for Nominal Data

Correlation Analysis

Introduction

The data set provided for this task is drawn from the publicly available PISA data (the OECD's Programme for International Student Assessment), in this case the 'school questionnaire', which is usually completed by the principal of each sampled school.

Part 1 Variables Under Consideration

Let us take the following three variables into account:

  1. Total No. of interactive whiteboards in the school altogether
  2. Total No. of data projectors in the school altogether
  3. Total No. of computers with internet connection available for teachers in the school.

All three variables are numeric and measured on a scale level. We are interested in their summary descriptive statistics and the shape of their distributions, and in whether any significant relationship exists between them.

Frequency count:

Statistics

Variable                                                                                 N Valid   N Missing
Total No. of interactive whiteboards in the school altogether                            677       108
Total No. of data projectors in the school altogether                                    675       110
Total No. of computers with internet connection available for teachers in the school.    678       107

Here the missing data comprise both blank or non-responses from the respondent and values that SPSS flags as system-missing. Together these make up the missing values.

Issues with Missing Data:

The concept of missing values is important to understand in order to manage data successfully. If missing values are not handled properly, the researcher may end up drawing an inaccurate inference about the data. Because of improper handling, the results obtained will differ from those that would have been obtained had the missing values been present.

Ways of Handling Missing data:

The researcher may leave the missing data out or use data imputation to replace them. If the number of cases with missing values is extremely small, an expert researcher may drop or omit those cases from the analysis. As a rule of thumb, if the number of such cases is less than 5% of the sample, the researcher can drop them. In multivariate analysis, if there is a larger number of missing values, it can be better to drop those cases than to impute them. In univariate analysis, on the other hand, imputation can reduce the amount of bias in the data, provided the values are missing at random.
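As a rough check of the 5% rule of thumb against the counts reported in the frequency table, the share of missing cases per variable can be computed directly. This is a minimal sketch: the short variable labels and the names `missing_share` and `THRESHOLD` are illustrative and not part of the original analysis.

```python
# Valid and missing counts as reported in the frequency table above.
counts = {
    "interactive whiteboards": {"valid": 677, "missing": 108},
    "data projectors": {"valid": 675, "missing": 110},
    "teacher computers with internet": {"valid": 678, "missing": 107},
}

THRESHOLD = 0.05  # the 5% rule of thumb mentioned in the text


def missing_share(valid, missing):
    """Return the fraction of all cases that have a missing value."""
    return missing / (valid + missing)


shares = {name: missing_share(**c) for name, c in counts.items()}

for name, share in shares.items():
    verdict = "below" if share < THRESHOLD else "above"
    print(f"{name}: {share:.1%} missing ({verdict} the 5% threshold)")
```

For all three variables roughly 14% of the 785 cases are missing, which is above the 5% threshold.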

Overall summary Descriptive Statistics

Statistic                Data projectors    Interactive whiteboards    Computers with internet for teachers
N                        675                677                        678
Range                    200                129                        3300
Minimum                  0                  0                          0
Maximum                  200                129                        3300
Mean                     38.30              18.71                      128.93
Std. Deviation           28.821             19.267                     223.660
Variance                 830.675            371.234                    50023.785
Skewness                 1.351              1.650                      8.477
Skewness Std. Error      .094               .094                       .094
Kurtosis                 3.794              3.805                      94.697
Kurtosis Std. Error      .188               .188                       .187

Valid N (listwise): 672

In our case SPSS includes only the valid observations in the analysis. We have chosen not to impute the missing values, so the descriptive statistics for each variable are derived from its valid cases only. The average number of data projectors per school is about 38, while the average number of interactive whiteboards is about 19. The average number of computers with internet connection available to teachers is approximately 129.

We observe that the skewness values of all three variables are greater than 1, so all three are highly positively skewed. The kurtosis values are all greater than 3; because SPSS reports excess kurtosis, for which a normal distribution scores 0, the variables are strongly leptokurtic. This means the tails are heavier and the central peak is higher and sharper than in a normal distribution.
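The standard errors reported for skewness and kurtosis depend only on the sample size, so they can be reproduced from the valid N of each variable. The sketch below assumes the usual formulas for the standard error of sample skewness and sample excess kurtosis; the function names are illustrative and are not SPSS functions.

```python
import math


def se_skewness(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


# (valid N, skewness, kurtosis) as reported in the descriptive statistics table.
table = {
    "data projectors": (675, 1.351, 3.794),
    "interactive whiteboards": (677, 1.650, 3.805),
    "teacher computers with internet": (678, 8.477, 94.697),
}

for name, (n, skew, kurt) in table.items():
    ses, sek = se_skewness(n), se_kurtosis(n)
    # A statistic many standard errors away from 0 points to a clear
    # departure from the normal distribution.
    print(f"{name}: SE skew = {ses:.3f}, SE kurt = {sek:.3f}, "
          f"skew/SE = {skew / ses:.1f}, kurt/SE = {kurt / sek:.1f}")
```

Under these formulas the skewness and kurtosis of every variable lie many standard errors away from zero, which is consistent with the conclusion that the variables are far from normally distributed.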

Relationships Between the Variables

Correlations

Variable                               Measure               Interactive whiteboards   Data projectors   Computers with internet for teachers
Interactive whiteboards                Pearson Correlation   1                         .222**            .055
                                       Sig. (2-tailed)                                 .000              .153
                                       N                     677                       673               675
Data projectors                        Pearson Correlation   .222**                    1                 .328**
                                       Sig. (2-tailed)       .000                                        .000
                                       N                     673                       675               674
Computers with internet for teachers   Pearson Correlation   .055                      .328**            1
                                       Sig. (2-tailed)       .153                      .000
                                       N                     675                       674               678

**. Correlation is significant at the 0.01 level (2-tailed).

The correlation table shows that the total number of interactive whiteboards and the total number of data projectors in the school are positively and linearly related. The Pearson correlation is r = .222, a weak relationship that corresponds to about 5% shared variance (r² ≈ .049), and it is statistically significant (p < .001).

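Similarly, there is a linear relationship between the number of data projectors and the number of internet-connected computers available to teachers. The correlation is r = .328, or about 11% shared variance, and the p-value indicates that this relationship is also significant. The correlation between interactive whiteboards and internet-connected computers (r = .055, p = .153) is not significant.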
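To make the link between a correlation coefficient, its shared variance and its significance explicit, the values in the correlation table can be pushed through the standard t-test for a Pearson correlation. This is a sketch under one stated simplification: the two-sided p-value uses a normal approximation to the t distribution, which is reasonable here because each pair has more than 670 cases. The function names are illustrative.

```python
import math


def shared_variance(r):
    """Proportion of variance two variables share, r squared."""
    return r * r


def t_statistic(r, n):
    """t statistic for testing that a Pearson correlation is zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def two_sided_p_normal(t):
    """Two-sided p-value from a normal approximation (assumes large n)."""
    return math.erfc(abs(t) / math.sqrt(2))


# (r, pairwise N) as reported in the correlation table.
pairs = {
    "whiteboards vs projectors": (0.222, 673),
    "projectors vs computers": (0.328, 674),
    "whiteboards vs computers": (0.055, 675),
}

results = {}
for name, (r, n) in pairs.items():
    t = t_statistic(r, n)
    results[name] = (shared_variance(r), t, two_sided_p_normal(t))
    r2, _, p = results[name]
    print(f"{name}: r = {r:.3f}, r^2 = {r2:.3f}, t = {t:.2f}, p ~ {p:.3g}")
```

The first two pairs give very small p-values, while the whiteboards and computers pair gives a p-value of roughly .15, in line with the table.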

Q-Q plots to check the distribution of the variables:

Case Processing Summary

                              Interactive whiteboards   Data projectors   Computers with internet for teachers
Series or Sequence Length     785                       785               785
Number of Missing Values in the Plot
  User-Missing                26                        28                25
  System-Missing              82                        82                82

The cases are unweighted.

 

Estimated Distribution Parameters

Normal Distribution           Interactive whiteboards   Data projectors   Computers with internet for teachers
  Location                    18.71                     38.30             128.93
  Scale                       19.267                    28.821            223.660

The cases are unweighted.
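A normal Q-Q plot compares the ordered sample values with the quantiles of a normal distribution that has the estimated location and scale. The sketch below builds those theoretical quantiles from the parameters above using Python's standard library. The plotting positions (i - 0.5) / n are an assumption, since statistical packages offer several conventions, and the helper name is illustrative.

```python
from statistics import NormalDist

# Normal distributions with the estimated location (mean) and scale (SD).
fitted = {
    "interactive whiteboards": NormalDist(mu=18.71, sigma=19.267),
    "data projectors": NormalDist(mu=38.30, sigma=28.821),
    "teacher computers with internet": NormalDist(mu=128.93, sigma=223.660),
}


def theoretical_quantiles(dist, n):
    """Quantiles of dist at the plotting positions (i - 0.5) / n."""
    return [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


# Share of the fitted normal that lies below zero. Counts cannot be
# negative, so a large share signals a poor fit in the lower tail.
below_zero = {name: dist.cdf(0) for name, dist in fitted.items()}

for name, share in below_zero.items():
    print(f"{name}: fitted normal puts {share:.1%} of schools below 0")

# Five theoretical quantiles for one variable, as a small illustration.
print(theoretical_quantiles(fitted["interactive whiteboards"], 5))
```

Because each fitted normal places a noticeable share of its mass below zero while the observed minimum is 0, the points in the lower tail of each Q-Q plot would be expected to depart from the reference line.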

Let us now consider a variable with a nominal data type. In this dataset we look at the variable that addresses the following question:

Which of the following definitions best describes the community in which your school is located? Variable name: (SC001C01TA_AU)

Frequency Distribution

Which of the following definitions best describes the community in which your school is located?

Category                                                                Frequency   Percent   Valid Percent   Cumulative Percent
A small rural community (with fewer than 1 000 people)                  15          1.9       2.2             2.2
A small country town (1 000 to about 3 000 people)                      35          4.5       5.1             7.2
A medium-sized country town (3 000 to about 15 000 people)              76          9.7       11.0            18.2
A larger town (15 000 to about 50 000 people)                           71          9.0       10.2            28.4
A very large town (50 000 to about 100 000 people)                      57          7.3       8.2             36.7
A city (100 000 to about 1 000 000 people)                              178         22.7      25.7            62.3
Close to the centre of a very large city (with over 1 000 000 people)   126         16.1      18.2            80.5
Elsewhere in a very large city (with over 1 000 000 people)             135         17.2      19.5            100.0
Total (valid)                                                           693         88.3      100.0
Missing: No Response                                                    18          2.3
Missing: System                                                         74          9.4
Total (missing)                                                         92          11.7
Total                                                                   785         100.0

Here we observe that the total number of missing responses is 92, which accounts for 11.7% of the entire data set. The frequencies indicate that most schools are located in cities: 439 of the 693 valid responses (63.3%) fall in the three categories with 100 000 or more people.
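The percentage columns of the frequency table follow directly from the counts: Percent uses all 785 cases as the base, while Valid Percent and Cumulative Percent use only the 693 valid responses. A minimal sketch of that arithmetic, with shortened category labels chosen for illustration:

```python
from itertools import accumulate

# Frequencies of the valid categories, in the order of the table.
freq = {
    "small rural community": 15,
    "small country town": 35,
    "medium-sized country town": 76,
    "larger town": 71,
    "very large town": 57,
    "city": 178,
    "centre of very large city": 126,
    "elsewhere in very large city": 135,
}
missing = 18 + 74  # "No Response" plus system-missing

valid_total = sum(freq.values())
total = valid_total + missing

rows = []
for (label, f), cum in zip(freq.items(), accumulate(freq.values())):
    rows.append((
        label,
        f,
        round(100 * f / total, 1),          # Percent (base: all cases)
        round(100 * f / valid_total, 1),    # Valid Percent (base: valid cases)
        round(100 * cum / valid_total, 1),  # Cumulative Percent
    ))

for row in rows:
    print(row)

# Share of valid responses in the three categories of 100 000+ people.
city_share = sum(list(freq.values())[-3:]) / valid_total
print(f"missing: {missing / total:.1%}, city categories: {city_share:.1%}")
```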

Graphical description of the Variable:

We would like to examine the relationship between the community in which the school is located and the geographic location of the school. To do so, we construct a cross-tabulation of the two variables and inspect the distribution of the data.

 

Which of the following definitions best describes the community in which your school is located? (rows) by Geographic location of school (major categories) (columns)

Community in which the school is located                                Metropolitan   Provincial   Remote   Total
A small rural community (with fewer than 1 000 people)                  0              6            6        12
A small country town (1 000 to about 3 000 people)                      2              22           8        32
A medium-sized country town (3 000 to about 15 000 people)              12             58           6        76
A larger town (15 000 to about 50 000 people)                           18             46           6        70
A very large town (50 000 to about 100 000 people)                      24             33           0        57
A city (100 000 to about 1 000 000 people)                              162            14           1        177
Close to the centre of a very large city (with over 1 000 000 people)   123            2            0        125
Elsewhere in a very large city (with over 1 000 000 people)             135            0            0        135
Total                                                                   476            181          27       684

We would like to see whether these two variables are significantly related. To do this, we conducted a chi-square test of independence.

Chi-Square Tests

Test                            Value       df    Asymp. Sig. (2-sided)
Pearson Chi-Square              499.390a    14    .000
Likelihood Ratio                503.241     14    .000
Linear-by-Linear Association    370.938     1     .000
N of Valid Cases                684

a. 7 cells (29.2%) have expected count less than 5. The minimum expected count is .47.

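The chi-square test is significant, as the p-value is below 0.05 (reported as .000, i.e. p < .001). This indicates that the two variables are significantly associated with each other. Note, however, that 7 cells (29.2%) have an expected count below 5, which exceeds the commonly used 20% guideline, so the asymptotic p-value should be interpreted with some caution.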
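The Pearson chi-square statistic can be recomputed by hand from the observed counts in the cross-tabulation: each expected count is the row total times the column total divided by N, and the statistic sums (observed - expected)² / expected over all cells. The sketch below does this in plain Python; the variable names are illustrative, and it does not compute the p-value.

```python
# Observed counts from the cross-tabulation; columns run from
# "small rural community" to "elsewhere in a very large city".
observed = [
    [0, 2, 12, 18, 24, 162, 123, 135],  # Metropolitan
    [6, 22, 58, 46, 33, 14, 2, 0],      # Provincial
    [6, 8, 6, 6, 0, 1, 0, 0],           # Remote
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

flat_expected = [e for row in expected for e in row]
small_cells = sum(e < 5 for e in flat_expected)
min_expected = min(flat_expected)

print(f"chi-square = {chi2:.3f}, df = {df}, N = {n}")
print(f"cells with expected count < 5: {small_cells} of {len(flat_expected)} "
      f"({small_cells / len(flat_expected):.1%}), minimum = {min_expected:.2f}")
```

This reproduces the Pearson chi-square value, the degrees of freedom and the footnote on small expected counts reported in the table.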

