# Data Analysis

Essay by   •  December 2, 2010  •  876 Words (4 Pages)  •  1,760 Views

Open the file in MINITAB (LondonCO12n.MTW) and make a histogram of the data. You may need to adjust the intervals (or bins) so that the graph starts at 0 since there can be no data values less than zero. You can do this in MINITAB by adjusting the x-axis as in the tutorial.

(a) Is the distribution of CO levels symmetric, skewed right, or skewed left?

The distribution of CO levels in West London between Jan 1 and Jun 30 in 2000 is skewed right.

(b) Which do you think will be greater, the mean or the median CO level? How do you know without calculating them?

The mean will be greater than the median. This can be seen without calculating either, because in a right-skewed distribution the few large values in the long right tail pull the mean upward, while the median depends only on the middle of the ordered data and is barely affected by them.
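The effect can be illustrated with a short Python sketch. The sample below is hypothetical and chosen only to be right-skewed; it is not the London CO data, and the variable names are assumptions made for illustration.

```python
import statistics

# Hypothetical right-skewed sample (NOT the London CO readings):
# most values are small, with a few large ones in the right tail.
sample = [0.1, 0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 1.3, 3.0]

mean_value = statistics.mean(sample)
median_value = statistics.median(sample)

# The two large values raise the mean but leave the median at the middle value.
print(f"mean = {mean_value:.2f}, median = {median_value:.2f}")
print("mean > median:", mean_value > median_value)
```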

(c) What is the modal class?

The modal class is the first bar on the left (0.0–0.2), as this interval contains the most observations.

(d) Give an approximate data range. Why is it only approximate?

The approximate data range is 0.0 to 3.0 ppm. It is only approximate because a histogram groups the values into intervals, so the exact minimum and maximum cannot be read from the bars, only the intervals that contain them.

(e) What proportion of CO levels are over 1.5 ppm?

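Stem-and-Leaf Display: 12N

Stem-and-leaf of 12N, N = 176, N* = 6, Leaf Unit = 0.10

| Depth | Stem | Leaves |
|---|---|---|
| (104) | 0 | 00000000000000111111111111111111111111111111111111111111111111111+ |
| 72 | 0 | 2222222222222222222222222222222222222222233333333333333 |
| 17 | 0 | 45555 |
| 12 | 0 | 666667 |
| 6 | 0 | 8 |
| 5 | 1 | |
| 5 | 1 | 23 |
| 3 | 1 | 4 |
| 2 | 1 | |
| 2 | 1 | |
| 2 | 2 | |
| 2 | 2 | 2 |
| 1 | 2 | |
| 1 | 2 | |
| 1 | 2 | |
| 1 | 3 | 0 |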

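Two readings are over 1.5 ppm (approximately 2.2 and 3.0 in the stem-and-leaf display). The file has 182 rows, but 6 of them are missing (N* = 6), which leaves 176 valid readings, so the proportion is 2/176 ≈ 0.011. About 1% of the CO levels are therefore over 1.5 ppm.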
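The arithmetic can be checked with a minimal Python sketch. It uses only the counts shown in the stem-and-leaf display (N = 176 valid readings, N* = 6 missing) and the two values read from the upper stems; because the display truncates to the leaf unit, 2.2 and 3.0 are approximations. The variable names are assumptions for illustration.

```python
# Counts read from the MINITAB stem-and-leaf display.
n_valid = 176    # N: non-missing readings
n_missing = 6    # N*: missing readings

# Readings above the threshold, as read from the display
# (stem 2 leaf 2 and stem 3 leaf 0; approximate to the leaf unit of 0.10).
upper_values = [2.2, 3.0]
threshold = 1.5

count_over = sum(v > threshold for v in upper_values)

# Proportion of valid readings, and of all rows including the missing ones.
proportion_valid = count_over / n_valid
proportion_all_rows = count_over / (n_valid + n_missing)

print(f"{count_over}/{n_valid} = {proportion_valid:.4f}")
print(f"{count_over}/{n_valid + n_missing} = {proportion_all_rows:.4f}")
```

Either denominator rounds to about 1%, so the conclusion does not depend on whether the missing rows are counted.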

(2) The numbers of runs scored by a batsman in a series of nine cricket tests were:

14 30 0 10 21 48 23 110 11

(a) Find the mean and median of the data. Which would you think is the most useful measure of centrality in this case? Why?

Descriptive Statistics: C1

| Variable | N | N* | Mean | SE Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum |
|---|---|---|---|---|---|---|---|---|---|---|
| C1 | 9 | 0 | 29.7 | 11.0 | 33.1 | 0.0 | 10.5 | 21.0 | 39.0 | 110.0 |

The mean is 29.7 and the median is 21.0. The median is the more useful measure of centrality in this case because the data are right-skewed. The score of 110 is an outlier that pulls the mean upward, whereas the median is resistant to extreme values and better reflects a typical innings.
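The MINITAB figures can be reproduced with Python's standard `statistics` module. This is a sketch, not the method used in the original work; dropping the score of 110 is done here only to show how sensitive each measure is to that one value.

```python
import statistics

# Runs scored in the nine tests, as listed in the question.
runs = [14, 30, 0, 10, 21, 48, 23, 110, 11]

mean_runs = statistics.mean(runs)
median_runs = statistics.median(runs)
stdev_runs = statistics.stdev(runs)  # sample standard deviation (n - 1)

# Illustration only: remove the outlier and recompute both measures.
runs_without_outlier = [r for r in runs if r != 110]
mean_without = statistics.mean(runs_without_outlier)
median_without = statistics.median(runs_without_outlier)

print(f"mean:   {mean_runs:.1f} -> {mean_without:.1f}")      # mean:   29.7 -> 19.6
print(f"median: {median_runs:.1f} -> {median_without:.1f}")  # median: 21.0 -> 17.5
print(f"stdev:  {stdev_runs:.1f}")                           # stdev:  33.1
```

Removing the single large score moves the mean by about ten runs but the median by only 3.5, which is why the median is the steadier summary here.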

...

...