Psychology340: Describing Distributions I (2024)

measures of center

What is a distribution?

Consider the final round scores in the 2002 NEC World Golf Championship:

65 69 68 68 68 67 70 71 71 72 69
69 71 72 70 70 71 74 67 67 68 69
72 71 72 72 72 70 70 71 71 71 72
73 65 71 74 72 74 70 75 72 72 75
71 70 73 72 70 70 78 74 74 71 73
71 68 74 73 70 69 68 77 72 70 70
74 73 70 69 78 74 73 69 84 75 73

These are all of the final round scores of the 77 golfers who participated. In other words, this is the distribution of final round scores.

It is difficult to get a sense of the overall distribution by just looking at the raw scores. Instead, we use several descriptive statistical methods to summarize, simplify, and describe the distribution.

Three characteristics of distributions

Three characteristics are used to completely describe a distribution: shape, central tendency, and variability. We'll be talking about central tendency (roughly, the center of the distribution) and variability (how spread out the distribution is) in future chapters.

Shape

Skewness

Kurtosis


In a symmetrical distribution, it is possible to draw a vertical line through the middle so that one side of the distribution is an exact mirror image of the other.


In a skewed distribution, the scores tend to pile up toward one end of the scale and taper off gradually at the other end.

The section where the scores taper off towards one end of a distribution is called the tail of the distribution.

[Figure: negatively skewed distribution, with the tail pointing to the left]

[Figure: positively skewed distribution, with the tail pointing to the right]

Kurtosis is a relative measure of the body and tail portions of the distribution.

Distributions that are "flat" are platykurtic.

Distributions that are "peaked" are leptokurtic.

Measures of Center

Central tendency is a statistical measure that identifies a single score as representative of an entire distribution. The goal of central tendency is to find the single score that is most typical or most representative of the entire group.

We will focus on three measures of central tendency: the mean, the median, and the mode. All are measures of central tendency, but for some distributions, some are more meaningful or appropriate than the others.

Measures of Variability

Variability provides a quantitative measure of the degree to which scores in a distribution are spread out or clustered together.

In other words, variability refers to the degree of "differentness" of the scores in the distribution. High variability means that the scores differ by a lot, while low variability means that the scores are all similar ("homogeneousness").

We'll concentrate on three measures of variability: the range, the interquartile range, and the standard deviation.

Graphic and Tabular organizational methods

1) A frequency distribution table is an organized tabulation of the number of individuals located in each category on the scale of measurement.

Notice that if you add up the frequency column, you get the total number of observations:
Σf = N

_____________________________
  X      f      %      c%
 84      1     1.3    100
 83      0     0      98.7
 82      0     0      98.7
 81      0     0      98.7
 80      0     0      98.7
 79      0     0      98.7
 78      2     2.6    98.7
 77      1     1.3    96.1
 76      0     0      94.8
 75      3     3.9    94.8
 74      8    10.4    90.9
 73      7     9.1    80.5
 72     12    15.6    71.4
 71     12    15.6    55.8
 70     13    16.9    40.3
 69      7     9.1    23.4
 68      6     7.8    14.3
 67      3     3.9     6.5
 66      0     0       2.6
 65      2     2.6     2.6
_____________________________
        77   100

If you wanted to know what the total of all of the X's was, how would you do it? The easiest way would be to multiply the (X) & (f) columns and then add (sum) the results.
Σ(Xf)

Percentages. What percent of the group got this value for X? How do you get this?
f / N * 100
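As a rough sketch, the table above can be rebuilt from the raw golf scores in a few lines of Python. The function name `frequency_table` and the output layout are illustrative choices rather than part of the course materials; the scores themselves are the 77 values listed at the top of these notes.

```python
from collections import Counter

# Final round scores from the listing at the top of these notes (77 golfers).
scores = [
    65, 69, 68, 68, 68, 67, 70, 71, 71, 72, 69,
    69, 71, 72, 70, 70, 71, 74, 67, 67, 68, 69,
    72, 71, 72, 72, 72, 70, 70, 71, 71, 71, 72,
    73, 65, 71, 74, 72, 74, 70, 75, 72, 72, 75,
    71, 70, 73, 72, 70, 70, 78, 74, 74, 71, 73,
    71, 68, 74, 73, 70, 69, 68, 77, 72, 70, 70,
    74, 73, 70, 69, 78, 74, 73, 69, 84, 75, 73,
]


def frequency_table(values):
    """Return rows of (X, f, percent, cumulative percent), highest X first."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running = 0
    # Walk from the lowest score up so the cumulative count builds upward.
    for x in range(min(values), max(values) + 1):
        f = counts.get(x, 0)
        running += f
        rows.append((x, f, 100 * f / n, 100 * running / n))
    rows.reverse()  # the table in the notes lists the highest score first
    return rows


table = frequency_table(scores)
N = sum(f for _, f, _, _ in table)            # sum of f
sum_xf = sum(x * f for x, f, _, _ in table)   # sum of X times f

for x, f, pct, cum in table:
    print(f"{x:>3} {f:>3} {pct:>5.1f} {cum:>6.1f}")
print("N =", N, "| sum of Xf =", sum_xf)
```

Summing the f column gives N = 77, and the sum of the X times f products is the same total you would get by adding up all 77 raw scores.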

Measuring the center of a distribution

There are a number of different measures of center. Which is appropriate largely depends on the kind of variable and the shape of the distribution. So consider these three distributions:

Where is the single value that is most representative of the entire distribution? For the first distribution, it is 5. For the second, is it 7 or 5? (This one is negatively skewed.) For the third, is it 5, even though nobody is at 5? This one is bimodal; that is, it may be most appropriate to talk about it as having two middles. More on this in a bit.

The most commonly known measure of central tendency is the arithmetic average, or the mean. We've already talked about how you would go about figuring this out from the data in a frequency distribution table.

The mean for a distribution is the sum of the scores divided by the number of scores.

The formula for the mean is:
mean = sum of all scores (X's) divided by the total number (N)
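A minimal sketch of this calculation, working from a frequency table rather than the raw scores. The dictionary below copies the X and f columns of the golf table (rows with f = 0 are left out because they add nothing to either sum); the function name `mean_from_frequencies` is made up for this illustration.

```python
# Golf frequency table: score X -> frequency f (rows with f = 0 omitted).
freq = {84: 1, 78: 2, 77: 1, 75: 3, 74: 8, 73: 7, 72: 12,
        71: 12, 70: 13, 69: 7, 68: 6, 67: 3, 65: 2}


def mean_from_frequencies(table):
    """Mean = (sum of X times f) / N, where N is the sum of f."""
    n = sum(table.values())
    total = sum(x * f for x, f in table.items())
    return total / n


golf_mean = mean_from_frequencies(freq)
print(round(golf_mean, 2))  # 71.34
```

Multiplying each score by its frequency before summing is just a shortcut for adding up every individual score.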

We can think of the mean in a couple of different ways.


Weighted means

The weighted mean of two (or more) groups is found by adding the groups' sums of scores and dividing by the sum of the sample sizes.

e.g.,

weighted mean = (ΣX₁ + ΣX₂) / (n₁ + n₂)

So suppose that I decide to make up my grading scale by collapsing over all of my sections of stats. If I know that one section (n = 20) had a mean of 5 and the other (n = 30) had a mean of 6, how would I figure out the weighted mean?

[(20)(5) + (30)(6)] / (20 + 30) = (100 + 180) / 50 = 280 / 50 = 5.6
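The same arithmetic can be sketched in Python. The function name `weighted_mean` and the (n, mean) pair layout are assumptions made for this example; the numbers are the two sections described above.

```python
def weighted_mean(groups):
    """Combine group means. `groups` is a list of (n, mean) pairs."""
    # n * mean recovers each group's sum of scores.
    total_sum = sum(n * m for n, m in groups)
    total_n = sum(n for n, _ in groups)
    return total_sum / total_n


combined = weighted_mean([(20, 5), (30, 6)])
print(combined)  # 5.6
```

Note that the result (5.6) sits closer to 6 than to 5, because the section with the mean of 6 has more students. Simply averaging the two means would give 5.5, which ignores the different section sizes.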

Effects of linear transformations on the mean

2) If you add (or subtract) a constant to each score, then the mean changes by that same constant. Suppose that you want to factor out the fact that each girl spent $2 buying supplies for the bake sale, so you subtract 2 from each amount. Now the total is $180, so the mean is 180/10 = $18. But notice that you could have just subtracted $2 from the previous mean of $20 and arrived at the same answer.

3) If you multiply (or divide) each score by a constant, then the mean is multiplied (or divided) by that constant. Suppose that the troop sponsor agreed to match the money made by each girl scout. That is, the sponsor agrees to give each girl scout an additional amount of money equal to however much she makes on the sale. So now the total is $400, and the mean for each girl is 400/10 = $40, which is twice the original mean of $20.
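These two rules can be checked directly. The sketch below uses the ten bake sale amounts that these notes list in the girl scout median example; the variable names are illustrative only.

```python
def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


# Bake sale amounts in dollars, one per girl scout (total = $200).
sales = [8, 10, 12, 15, 15, 18, 18, 19, 25, 60]

original = mean(sales)
minus_supplies = mean([x - 2 for x in sales])   # subtract $2 from every score
matched = mean([x * 2 for x in sales])          # sponsor doubles every score

print(original, minus_supplies, matched)  # 20.0 18.0 40.0
```

In both cases, transforming every score and then taking the mean gives the same result as applying the transformation to the original mean.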

The median is the score that divides a distribution exactly in half. Exactly 50% of the individuals in a distribution have scores at or below the median. The median is equivalent to the 50th percentile.

So how do we find the median? Let's start by assuming that we have discrete categories.

3, 4, 4, 5, 5, 5, 6, 6, 7


2) With an even number of scores, just list them in order from lowest to highest. Then find the middle two scores and determine the point exactly midway between them. To do this add them together and divide by two.
So what is the median for our girl scouts?

$8, 10, 12, 15, 15, 18, 18, 19, 25, 60

The middle two scores are 15 and 18, so 15 + 18 = 33 and 33/2 = 16.5.
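Here is a short sketch of that procedure for discrete scores, covering both an odd and an even number of scores. The function is written out by hand to show the steps (Python's standard `statistics.median` gives the same results); the two data sets are the ones listed in these notes.

```python
def median(scores):
    """Middle score of the ordered list, or the midpoint of the middle two."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: the single middle score is the median.
        return ordered[mid]
    # Even number of scores: average the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = [3, 4, 4, 5, 5, 5, 6, 6, 7]
girl_scouts = [8, 10, 12, 15, 15, 18, 18, 19, 25, 60]

print(median(odd_example))   # 5
print(median(girl_scouts))   # 16.5
```

Notice that the girl scout median ($16.50) is lower than the mean ($20), because the single $60 score pulls the mean upward but does not move the middle of the ordered list.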

The final measure of central tendency that we'll consider is the mode.

In a frequency distribution, the mode is the score or category that has the greatest frequency.

[Figure: frequency distribution graph]

So the mode is 5.

However, be aware that a frequency distribution may have more than one mode.

[Figure: frequency distribution graph with two peaks]

So the modes are 2 and 8.

If one peak were bigger than the other, it would be called the major mode and the other would be the minor mode.
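A small sketch of finding the mode or modes by counting frequencies. The first data set is the list of nine scores used in the median discussion above. The second is made up for this illustration so that it has peaks at 2 and 8; it is not the data behind the original figure.

```python
from collections import Counter


def modes(scores):
    """Return every score that shares the greatest frequency, in order."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(x for x, f in counts.items() if f == top)


one_peak = [3, 4, 4, 5, 5, 5, 6, 6, 7]
two_peaks = [1, 2, 2, 2, 3, 5, 7, 8, 8, 8, 9]   # hypothetical bimodal data

print(modes(one_peak))    # [5]
print(modes(two_peaks))   # [2, 8]
```

Returning a list rather than a single value keeps the function honest about distributions that have more than one mode.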

So how do you know which measure of central tendency to use?

- The answer depends on a number of factors.

- You cannot find a mean or median for a nominal scale; however, you can find a mode for a nominal scale.

- Use the median if:

2) there are undetermined values - if for some reason you don't know the value of one (or more) of your items (e.g., the person died before answering your question)

3) your distributions are 'open-ended' - by this we mean that there is no upper or lower limit on the possible values of your variable (e.g., the top answer on your questionnaire is '5 or more')

4) If your data are on an ordinal scale (rankings), then use the median.

How do the shapes of distributions relate to our measures of central tendency?

symmetric distribution: mean = median = mode
positively skewed distribution: mode < median < mean
negatively skewed distribution: mean < median < mode
bimodal distribution: mean = median, 2 modes
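As an illustration, the golf scores from the start of these notes can be checked against these patterns. This sketch uses Python's standard `statistics` module and the X and f columns of the golf frequency table. Keep in mind that the orderings listed above are typical patterns rather than guarantees for every data set.

```python
from statistics import mean, median, mode

# Golf frequency table: score X -> frequency f (rows with f = 0 omitted).
freq = {84: 1, 78: 2, 77: 1, 75: 3, 74: 8, 73: 7, 72: 12,
        71: 12, 70: 13, 69: 7, 68: 6, 67: 3, 65: 2}

# Expand the table back into the 77 individual scores.
scores = [x for x, f in freq.items() for _ in range(f)]

golf_mode = mode(scores)
golf_median = median(scores)
golf_mean = mean(scores)

print(golf_mode, golf_median, round(golf_mean, 2))  # 70 71 71.34
```

Here mode < median < mean, which matches the positively skewed pattern: a few high scores (such as the 84) stretch the tail to the right and pull the mean above the median.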

We will discuss the third characteristic, variability (or spread), next time.
If you have any questions, please feel free to contact me at jccutti@mail.ilstu.edu.