Psychology340: Describing Distributions I (2024)

measures of center

What is a distribution?

    Recall that a variable is a characteristic that can take multiple values. The distribution of a variable is the set of all of the observed values (tokens) of that variable.

    Consider the final round scores in the 2002 NEC World Golf Championship:

    65 69 68 68 68 67 70 71 71 72 69
    69 71 72 70 70 71 74 67 67 68 69
    72 71 72 72 72 70 70 71 71 71 72
    73 65 71 74 72 74 70 75 72 72 75
    71 70 73 72 70 70 78 74 74 71 73
    71 68 74 73 70 69 68 77 72 70 70
    74 73 70 69 78 74 73 69 84 75 73

    These are all of the final round scores of the 77 golfers who participated. In other words, this is the distribution of final round scores.

    It is difficult to get a sense of the overall distribution by just looking at the raw scores. Instead, we use several descriptive statistical methods to summarize, simplify, and describe the distribution.

Three characteristics of distributions

Three characteristics together describe a distribution: shape, central tendency, and variability. We'll be talking about central tendency (roughly, the center of the distribution) and variability (how broad the distribution is) in future chapters.

    Shape

      Skewness and kurtosis are not typically used in psychology except as general descriptions of distributions, so we won't discuss how to compute these statistics numerically.

      In a symmetrical distribution, it is possible to draw a vertical line through the middle so that one side of the distribution is an exact mirror image of the other.

        Note: in the figures below, the Normal distribution is presented in red for comparison.

      In a skewed distribution, the scores tend to pile up toward one end of the scale and taper off gradually at the other end.

      The section where the scores taper off towards one end of a distribution is called the tail of the distribution.

      [Figure: negatively skewed distribution; the tail points to the left]
      [Figure: positively skewed distribution; the tail points to the right]
      A skewed distribution with the tail on the right-hand side is said to be positively skewed (because the tail points towards positive numbers). If the tail points to the left, then the distribution is said to be negatively skewed.

      Kurtosis is a relative measure of the body and tail portions of the distribution.

      Distributions that are "flat" are platykurtic.

      Distributions that are "peaked" are leptokurtic.

      In addition to the shapes mentioned above, one should also look for whether a distribution is uni-modal or multi-modal.

      If there are two (or more) clear peaks, then the distribution is bi-modal (or multi-modal if more than two).

    Measures of Center

      Central tendency is a statistical measure that identifies a single score as representative of an entire distribution. The goal of central tendency is to find the single score that is most typical or most representative of the entire group.

      We will focus on three measures of central tendency: the mean, the median, and the mode. All are measures of central tendency, but for some distributions, some are more meaningful or appropriate than the others.

    Measures of Variability

      Variability provides a quantitative measure of the degree to which scores in a distribution are spread out or clustered together.

      In other words, variability refers to the degree of "differentness" of the scores in the distribution. High variability means that the scores differ by a lot, while low variability means that the scores are all similar (homogeneity).

      We'll concentrate on three measures of variability: the range, the interquartile range, and the standard deviation.

Graphic and Tabular organizational methods

    1) A frequency distribution table is an organized tabulation of the number of individuals located in each category on the scale of measurement.

      What is the range of responses (highest and lowest numbers)? Fill in the X column.
      How many of each did we get? Fill in the f column. This is the frequency of occurrence.

      Notice that if you add up the frequency column, you get the total number of observations:
      Σf = N

_____________________________
  X     f      %     c%
 84     1    1.3    100
 83     0      0   98.7
 82     0      0   98.7
 81     0      0   98.7
 80     0      0   98.7
 79     0      0   98.7
 78     2    2.6   98.7
 77     1    1.3   96.1
 76     0      0   94.8
 75     3    3.9   94.8
 74     8   10.4   90.9
 73     7    9.1   80.5
 72    12   15.6   71.4
 71    12   15.6   55.8
 70    13   16.9   40.3
 69     7    9.1   23.4
 68     6    7.8   14.3
 67     3    3.9    6.5
 66     0      0    2.6
 65     2    2.6    2.6
_____________________________
       77    100
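The X and f columns of a table like this one can be rebuilt from the raw scores. The following is a minimal sketch in Python, assuming the 77 scores are typed in as a list; the names `scores` and `frequency_table` are illustrative and not part of the course materials.

```python
from collections import Counter

# Final-round scores from the notes (77 golfers), typed in row by row.
scores = [
    65, 69, 68, 68, 68, 67, 70, 71, 71, 72, 69,
    69, 71, 72, 70, 70, 71, 74, 67, 67, 68, 69,
    72, 71, 72, 72, 72, 70, 70, 71, 71, 71, 72,
    73, 65, 71, 74, 72, 74, 70, 75, 72, 72, 75,
    71, 70, 73, 72, 70, 70, 78, 74, 74, 71, 73,
    71, 68, 74, 73, 70, 69, 68, 77, 72, 70, 70,
    74, 73, 70, 69, 78, 74, 73, 69, 84, 75, 73,
]


def frequency_table(values):
    """Return (X, f) pairs from the highest score down to the lowest.

    Scores inside the range that nobody obtained are listed with f = 0.
    """
    counts = Counter(values)
    return [(x, counts[x]) for x in range(max(values), min(values) - 1, -1)]


table = frequency_table(scores)
n = sum(f for _, f in table)  # adding up the f column gives N

for x, f in table:
    print(f"{x:>3} {f:>3}")
print("N =", n)
```

Listing every score in the range, including those with f = 0, mirrors the layout of the table in these notes.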


If you wanted to know what the total of all of the X's was, how would you do it? The easiest way would be to multiply the (X) & (f) columns and then add (sum) the results.
Σ(Xf)
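As a quick check of this shortcut, here is a small Python sketch that multiplies the X and f columns of the golf table and sums the results. The dictionary `freq` is just one convenient way to hold the table (rows with f = 0 are left out because they add nothing to the sum).

```python
# Frequency table from the notes: score X -> frequency f.
freq = {84: 1, 78: 2, 77: 1, 75: 3, 74: 8, 73: 7, 72: 12,
        71: 12, 70: 13, 69: 7, 68: 6, 67: 3, 65: 2}

n = sum(freq.values())                        # sum of f = N
total = sum(x * f for x, f in freq.items())   # sum of (X * f)

print(n)      # 77
print(total)  # 5493
```

Dividing this total by N (5493 / 77, roughly 71.34) gives the mean of the golf scores.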

    Some additional information.

    Percentages. What percent of the group got this value for X? How do you get this?
    f / N * 100
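The percentage column, and the cumulative percentage column (c%) that appears in the table, can be sketched the same way. This assumes c% is the percentage of scores at or below each value, which is consistent with the numbers in the table; the variable names are illustrative.

```python
# Frequency table from the notes: score X -> frequency f (f = 0 rows omitted).
freq = {84: 1, 78: 2, 77: 1, 75: 3, 74: 8, 73: 7, 72: 12,
        71: 12, 70: 13, 69: 7, 68: 6, 67: 3, 65: 2}

n = sum(freq.values())
running = 0   # number of scores at or below the current value
rows = []
for x in sorted(freq):              # lowest score first, so counts accumulate upward
    running += freq[x]
    pct = freq[x] / n * 100         # f / N * 100
    cum_pct = running / n * 100     # cumulative percentage
    rows.append((x, freq[x], round(pct, 1), round(cum_pct, 1)))

for x, f, pct, cum_pct in reversed(rows):   # print highest score first, like the table
    print(f"{x:>3} {f:>3} {pct:>5} {cum_pct:>6}")
```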

2) We can also summarize the data with graphs.

    For a histogram, vertical bars are drawn above each score so that 1) the height of the bar corresponds to the frequency, & 2) The width of the bar extends to the real limits of the score. A histogram is used when the data are measured on an interval or a ratio scale.
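A rough text version of a histogram can be produced without any plotting library. This is only a sketch: the bars run sideways rather than vertically, one `#` per golfer, and the function name `text_histogram` is made up for illustration.

```python
# Frequency table from the notes: score X -> frequency f (f = 0 rows omitted).
freq = {84: 1, 78: 2, 77: 1, 75: 3, 74: 8, 73: 7, 72: 12,
        71: 12, 70: 13, 69: 7, 68: 6, 67: 3, 65: 2}


def text_histogram(table):
    """Return one line per score, highest first, with a bar of '#' of length f."""
    lines = []
    for x in range(max(table), min(table) - 1, -1):
        f = table.get(x, 0)   # scores nobody obtained get an empty bar
        lines.append(f"{x:>3} | {'#' * f}")
    return lines


print("\n".join(text_histogram(freq)))
```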


    For a bar graph, a vertical bar is drawn above each score (or category) so that 1) The height of the bar corresponds to the frequency, & 2) there is a space separating each bar from the next. A bar graph is used when the data are measured on a nominal or an ordinal scale.



    Stem and leaf displays - These displays break each number down into a left part called the stem and a right part called the leaf. If numbers are two digits, then the left digit is the stem and the right digit is the leaf. The display gives you a picture of the distribution, and you can still recover all of the individual data points.

    8 |
    8 | 4
    7 | 555788
    7 | 0000000000000111111111111222222222222333333344444444
    6 | 557778888889999999
    6 |
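A display like this can be generated from the scores. The sketch below assumes two-digit scores and splits each stem into two rows (leaves 5-9 and leaves 0-4), as the display in these notes does; unlike that display, it leaves out rows that have no leaves. The function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Frequency table from the notes, expanded back into the 77 individual scores.
freq = {84: 1, 78: 2, 77: 1, 75: 3, 74: 8, 73: 7, 72: 12,
        71: 12, 70: 13, 69: 7, 68: 6, 67: 3, 65: 2}
scores = [x for x, f in freq.items() for _ in range(f)]


def stem_and_leaf(values):
    """Return display lines, highest row first, with each stem split in two."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)                   # e.g. 74 -> stem 7, leaf 4
        rows[(stem, leaf >= 5)].append(str(leaf))    # True marks the 5-9 row
    return [f"{stem} | {''.join(leaves)}"
            for (stem, _), leaves in sorted(rows.items(), reverse=True)]


print("\n".join(stem_and_leaf(scores)))
```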

Measuring the center of a distribution

There are a number of different measures of center. Which is appropriate largely depends on the kind of variable and the shape of the distribution. So consider these three distributions:

[Figures: three example distributions]

Where is the single value that is most representative of the entire distribution? For the first, it is 5. For the second, is it 7 or 5? (This one is negatively skewed.) For the third, is it 5? Nobody is at 5. This one is bi-modal, so it may be most appropriate to talk about having two middles. More on this in a bit.

The most commonly known measure of central tendency is the arithmetic average, or the mean. We've already talked about how you would go about figuring this out from the data in a frequency distribution table.

The mean for a distribution is the sum of the scores divided by the number of scores.

    The formula for the mean is:
    mean = sum of all scores (X's) divided by the total number (N)

    We can think of the mean in a couple of different ways.

      1) the mean is the value that you would give to each individual if everybody were to get equal amounts.
        - e.g., you have a bake sale for your Girl Scout troop to fund the yearly camping trip. Each girl sells a different amount of goods, but you pool the money together and then allocate equal portions to each Girl Scout to spend on supplies for the camping trip.
        - to make it more concrete: there are 10 girls in the troop, and each sells a bunch of baked goods. Their individual totals are:
        $12, 15, 18, 25, 10, 60, 8, 15, 19, 18
        the total is $200
        so each girl gets 200/10 = $20 to spend on her camp supplies
      2) the mean can also be thought of as the 'balance' point of the distribution. If you put the observations on an imaginary see-saw (teeter totter) with the mean at the center point, then the two sides of the see-saw should be balanced (that is both sides are off the ground and the see-saw is level)
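Both ways of thinking about the mean can be checked on the bake sale numbers. In this small Python sketch (variable names are illustrative), the equal share is the total divided by the number of girls, and the balance-point idea shows up as the deviations from the mean summing to zero.

```python
# Bake sale totals for the 10 girls, in dollars (from the example above).
sales = [12, 15, 18, 25, 10, 60, 8, 15, 19, 18]

# 1) Equal share: pool the money, then split it evenly.
mean = sum(sales) / len(sales)
print(mean)  # 20.0

# 2) Balance point: distances below the mean cancel distances above it.
deviations = [x - mean for x in sales]
print(sum(deviations))  # 0.0
```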
    Weighted means

    The weighted mean of two (or more) groups is found by adding the groups' sums of scores and dividing by the sum of the sample sizes.

    e.g.,

    weighted mean = (ΣX1 + ΣX2) / (n1 + n2)

    So suppose that I were to decide to make up my grading scale collapsing over all of my sections of stats. If I know that one section (n = 20) had a mean of 5 and the other (n = 30) had a mean of 6, how would I figure out the weighted mean? Since each section's sum of scores is its n times its mean:

    [(20)(5) + (30)(6)] / (20 + 30) = (100 + 180) / 50 = 280 / 50 = 5.6
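The same calculation as a short Python sketch. The helper `weighted_mean` is a made-up name; it recovers each group's sum as n times its mean, then divides the combined sum by the combined n.

```python
def weighted_mean(means, sizes):
    """Combine group means, weighting each by its sample size."""
    combined_sum = sum(m * n for m, n in zip(means, sizes))  # each group's sum is n * mean
    return combined_sum / sum(sizes)


# Two stats sections: mean 5 with n = 20, and mean 6 with n = 30.
print(weighted_mean([5, 6], [20, 30]))  # 5.6
```

Note that the result is closer to 6 than to 5, because the larger section counts for more.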

    Effects of linear transformations on the mean

      1) if you change a given score, add an observation, or delete an observation, then the mean will change. - suppose that one of the Girl Scouts discovered that she had really made $25 instead of $60. So now the total is 200 - 35 = 165, and the mean is 165/10 = $16.50.

      2) if you add (or subtract) a constant to each score, then the mean will change by that same constant. - suppose that you want to factor out the fact that each girl spent $2 buying supplies for the bake sale. So you want to subtract 2 from each amount. Now the total is $180, so the mean is 180/10 = $18. But notice you could have just subtracted $2 from the previous mean of $20 and arrived at the same answer.

      3) if you multiply (or divide) each score by a constant, then the mean will be multiplied (or divided) by that constant. - suppose that the troop sponsor agreed to match the money made by each Girl Scout. That is, the sponsor agrees to give each Girl Scout an additional amount of money equal to however much she made on the sale. So now the total is $400, and the mean for each girl is 400/10 = $40.
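The three cases above can be verified directly on the bake sale data. This is a minimal Python sketch with illustrative names, not part of the original notes.

```python
def mean(values):
    return sum(values) / len(values)


sales = [12, 15, 18, 25, 10, 60, 8, 15, 19, 18]
print(mean(sales))  # 20.0

# 1) Change one score: the $60 was really $25.
corrected = [25 if x == 60 else x for x in sales]
print(mean(corrected))  # 16.5

# 2) Subtract a constant ($2 of supplies) from every score.
net = [x - 2 for x in sales]
print(mean(net))  # 18.0, the same as mean(sales) - 2

# 3) Multiply every score by a constant (the sponsor's match doubles each amount).
matched = [x * 2 for x in sales]
print(mean(matched))  # 40.0, the same as mean(sales) * 2
```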

The median is the score that divides a distribution exactly in half. Exactly 50% of the individuals in a distribution have scores at or below the median. The median is equivalent to the 50th percentile.

    So how do we find the median? Let's start by assuming that we have discrete categories.

      1) With an odd number of scores, just list them in order from lowest to highest. The score that is in the middle is the median.
      3, 4, 4, 5, 5, 5, 6, 6, 7 (the middle score, the fifth of nine, is 5, so the median is 5)

      2) With an even number of scores, just list them in order from lowest to highest. Then find the middle two scores and determine the point exactly midway between them. To do this add them together and divide by two.
      - so what is the median for our Girl Scouts?

      $8, 10, 12, 15, 15, 18, 18, 19, 25, 60

      the middle two are 15 & 18, so 15 + 18 = 33 and 33/2 = 16.5
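The two rules can be written as one small function. This is a sketch with an illustrative name; Python's standard library also provides `statistics.median`, which follows the same odd/even logic.

```python
def median(values):
    """Middle score for an odd count; midpoint of the middle two for an even count."""
    ordered = sorted(values)      # list the scores from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 4, 4, 5, 5, 5, 6, 6, 7]))                # 5
print(median([12, 15, 18, 25, 10, 60, 8, 15, 19, 18]))    # 16.5
```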

The final measure of central tendency that we'll consider is the mode.

In a frequency distribution, the mode is the score or category that has the greatest frequency.

    So look at your frequency table or graph and pick the value that has the highest frequency.
    [Figure: frequency distribution with a single peak] so the mode is 5

    However, be aware that a frequency distribution may have more than one mode.

    [Figure: frequency distribution with two equal peaks] so the modes are 2 and 8

    If one peak were bigger than the other, it would be called the major mode and the other would be the minor mode.
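Finding the mode amounts to counting. In the sketch below, the function name `modes` is illustrative, and the second data set is made up purely so that it has two peaks, at 2 and 8; it is not the data behind the figure.

```python
from collections import Counter


def modes(values):
    """Return every value that ties for the greatest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(x for x, f in counts.items() if f == top)


print(modes([3, 4, 4, 5, 5, 5, 6, 6, 7]))            # [5]

# Hypothetical scores with two equally tall peaks.
print(modes([1, 2, 2, 2, 3, 5, 7, 8, 8, 8, 9]))      # [2, 8]
```

Returning a list rather than a single number keeps the function honest when a distribution has more than one mode.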

So how do you know which measure of central tendency to use?

- The answer depends on a number of factors.
    The mean is the most preferred measure: it takes every item in the distribution into account, and it is closely related to measures of variability (which we'll talk about next week). However, there are times when the mean isn't the appropriate measure.

    - You cannot find a mean or median for data on a nominal scale; however, you can find a mode for a nominal scale.

    - Use the median if:

      1) there are a few extreme scores in the distribution (skewed distributions with long tails)

      2) there are undetermined values - if for some reason you don't know the value of one (or more) of your items (e.g., the person died before answering your question)

      3) your distributions are 'open-ended' - by this we mean that there is no upper or lower limit on the possible values of your variable (e.g., the top answer on your questionnaire is '5 or more')

      4) If your data are on an ordinal scale (rankings), then use the median.
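The first reason, extreme scores, can be illustrated with the bake sale data. The sketch below compares the mean and median with and without the one girl who sold $60 (replacing it with $25, as in the earlier correction example). It uses `statistics.median` from Python's standard library; the variable names are illustrative.

```python
import statistics

sales = [12, 15, 18, 25, 10, 60, 8, 15, 19, 18]         # includes one extreme score
tamer = [25 if x == 60 else x for x in sales]           # same data with 60 changed to 25

for label, data in [("with 60", sales), ("with 25", tamer)]:
    mean = sum(data) / len(data)
    median = statistics.median(data)
    print(label, "mean:", mean, "median:", median)
```

The mean drops from 20 to 16.5 when the extreme score is changed, while the median stays at 16.5 in both cases, which is why the median is the safer summary when a few scores sit far out in a tail.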

How do the shapes of distributions relate to our measures of central tendency?
    symmetric distribution
    mean = median = mode
    positively skewed distribution
    mode < median < mean
    negatively skewed distribution
    mean < median < mode
    bimodal distribution
    mean = median, 2 modes
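As a rough check of these patterns, the three measures can be computed for the golf scores from the start of these notes. This sketch uses Python's `statistics` module and rebuilds the 77 scores from the frequency table; the variable names are illustrative.

```python
import statistics

# Frequency table from the notes: score X -> frequency f (f = 0 rows omitted).
freq = {84: 1, 78: 2, 77: 1, 75: 3, 74: 8, 73: 7, 72: 12,
        71: 12, 70: 13, 69: 7, 68: 6, 67: 3, 65: 2}
scores = [x for x, f in freq.items() for _ in range(f)]

mean = sum(scores) / len(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

print(round(mean, 2), median, mode)  # 71.34 71 70
```

Here mode < median < mean, which fits the positively skewed pattern: the scores pile up around 70 to 72, and the tail stretches toward the high score of 84.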
We will discuss the third characteristic, variability (or spread), next time.

If you have any questions, please feel free to contact me at jccutti@mail.ilstu.edu.

