The Nature of Statistics











What is statistics

Types of data

Scales of measurement

Copyright 1993-97 Thomas P. Sturm

What is Statistics



Statistics is:

          The SCIENCE of

              a. COLLECTING,

              b. Classifying / Presenting / Tabulating / Describing, and

              c. INTERPRETING,

          NUMERICAL Data



All three areas will be covered:

          Collecting - Chapter 3

          Describing - Chapters 1 and 2

          Interpreting - bulk of the course



Course Goal:

          To produce good "statistical consumers"

Collecting Data



Data must be collected with a purpose - to find information about a designated group of people/places/things/events


POPULATION - the collection of ALL objects that are of interest

          - must be carefully defined

          - it must be possible to determine, under all circumstances, whether something is in the population or not

                   e.g. employees - current? fired? retired? part-time?


Problem:  It's usually just too expensive (or impossible) to get the information for all objects in a population (a CENSUS)


SAMPLE - a subset of the population used to find information about the entire population

          - more economical

          - with care, can obtain an accurate picture of the population


So, to get information about the population, we take a sample and find information about the things in the sample
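
A minimal sketch of the census-versus-sample idea above. The population here is made up (1,000 hypothetical hourly wages), and a simple random sample is only one of several possible sampling methods; the names `population`, `sample`, and the seed are illustrative assumptions, not part of the course material.

```python
import random
from statistics import mean

# Hypothetical population: hourly wages for 1,000 employees (made-up values).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [round(rng.uniform(10.0, 40.0), 2) for _ in range(1000)]

# A CENSUS uses every object in the population.
census_mean = mean(population)

# A SAMPLE uses a subset; Random.sample draws without replacement.
sample = rng.sample(population, 50)
sample_mean = mean(sample)

print(f"census mean: {census_mean:.2f} (all {len(population)} objects)")
print(f"sample mean: {sample_mean:.2f} (only {len(sample)} objects)")
```

The sample mean will usually land close to the census mean while requiring only 5% of the measurements, which is the economy the notes refer to.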

Variables in Statistics




          An attribute that is relevant for all things in the population (and therefore the sample)

              e.g. height, weight, color, result of casting a die, beauty



          VARIABLE - any characteristic that can be measured for all things in the population

              e.g. height (in inches), weight (in pounds), color (a word), # of spots on a die



          A VALUE for a variable is assigned through a process of MEASUREMENT

                   e.g. use a ruler to MEASURE a VALUE of 6'4" as the OBSERVED height of a basketball player



          POSSIBLE values - values that COULD be obtained

              e.g. 0 to 100% on an exam



          OBSERVED values - values that are actually obtained in the current instance

              e.g. 97%, 92%, 84%, 63% in a class of 4 students
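
The exam example can be sketched as follows. Treating scores as whole percentages is an assumption made only to keep the set of possible values easy to list.

```python
# Values that COULD be obtained: whole-percent exam scores (assumed granularity).
possible = range(0, 101)

# Values actually obtained in a class of 4 students (from the example above).
observed = [97, 92, 84, 63]

# Every observed value must be one of the possible values.
assert all(score in possible for score in observed)

# Most possible values were never observed in this class.
unobserved = len(possible) - len(set(observed))
print(unobserved)  # 97
```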

Types of Data





              QUALITATIVE (attribute) data - useful only to place individuals into categories

                   (e.g. Earthlings, Martians)




              QUANTITATIVE data, DISCRETE - a finite (or countable) set of values

                   e.g. number of students


              QUANTITATIVE data, CONTINUOUS - an infinite set of values in a bounded range

                   e.g. height of students



But statistics deals only with NUMERICAL data (and MEASUREMENT assigns a numerical value to a VARIABLE), so for QUALITATIVE data, part of the measurement process is to assign a number to each attribute value

                   e.g. SEX - 1=male, 2=female, etc.
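
A small sketch of this coding step, using made-up observations. The names `codebook`, `observed`, and `coded` are illustrative; the particular numbers chosen are arbitrary, and any distinct codes would serve equally well.

```python
# Hypothetical observations of a qualitative variable (made-up data).
observed = ["male", "female", "female", "male", "female"]

# The codebook assigns a number to each attribute value; the choice is arbitrary.
codebook = {"male": 1, "female": 2}

coded = [codebook[value] for value in observed]
print(coded)  # [1, 2, 2, 1, 2]

# Decoding recovers the original categories, so the numbers act only as labels.
decode = {code: label for label, code in codebook.items()}
decoded = [decode[code] for code in coded]
```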


Thus, as part of the measurement process, everything gets a number.  But what can you DO with those numbers?

Scales of Measurement



Nominal Scale  (Qualitative data)

          e.g. 1=male, 2=female

          come from qualitative (attribute) data

          can only count how many of each value you have, which yields FREQUENCY data

          cannot sort, add, subtract, multiply, or divide the numbers


Ordinal Scale  (Ordinal data)

          e.g. 1=never, 2=occasionally, 3=frequently, 4=always

          come from a condensation of quantitative data where asking for specific numbers would not yield accurate answers

          can sort in addition to counting, e.g. 1 < 2

          cannot add, subtract, multiply, or divide the numbers


Interval Scale  (Metric data)

          e.g. temperature in Fahrenheit

          come from quantitative values that are measured against arbitrary starting points

          can subtract in addition to sorting and counting, e.g. 24 outside and 72 inside means 48 degrees warmer inside

          cannot add, multiply, or divide the numbers


Ratio Scale  (Metric data)

          e.g. number of courses taken, any FREQUENCY data, rates

          come from quantitative values that have "natural" zeroes

          0 is meaningful, e.g. Pat took 6 courses and Chris took 2 courses, so Pat took 3 times as many courses as Chris

          can perform all operations
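
The four scales can be sketched side by side, reusing the examples above. The sample values for the nominal and ordinal cases are made up, and the helper `f_to_c` is introduced here only to illustrate why ratios fail on an interval scale: changing the arbitrary zero point preserves which reading is warmer but not the ratio.

```python
from collections import Counter

# Nominal: codes are labels only, so counting is the one meaningful operation.
sex_codes = [1, 2, 2, 1, 2]            # 1=male, 2=female (hypothetical sample)
nominal_counts = Counter(sex_codes)     # code 1 appears twice, code 2 three times

# Ordinal: codes can also be sorted, because 1 < 2 < 3 < 4 reflects a real order.
answers = [3, 1, 4, 2, 3]              # 1=never ... 4=always (hypothetical sample)
ordinal_sorted = sorted(answers)        # [1, 2, 3, 3, 4]

# Interval: differences are meaningful.
outside_f, inside_f = 24, 72
interval_difference = inside_f - outside_f   # 48 degrees warmer inside


def f_to_c(f):
    """Convert Fahrenheit to Celsius (a different arbitrary zero point)."""
    return (f - 32) * 5 / 9


# The "ratio" changes with the unit, so it carries no real meaning.
ratio_in_f = inside_f / outside_f                   # 3.0
ratio_in_c = f_to_c(inside_f) / f_to_c(outside_f)   # about -5

# Ratio: zero is natural, so ratios are meaningful.
pat_courses, chris_courses = 6, 2
course_ratio = pat_courses / chris_courses          # 3.0 times as many courses
```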




In general, not all of the measurements yield the same value, either because the same thing is measured more than once or because different members of a sample are measured.  These differences are called VARIATION.


The values of the data have a DISTRIBUTION, which characterizes where in the range of POSSIBLE values the OBSERVED values most frequently fall.


Much of descriptive statistics deals with finding simple ways (perhaps as simple as a single number) of describing the distribution.


Nominal and ordinal data allow the least amount of mathematical manipulation, so the description of nominal and ordinal data is limited to counting the frequencies of the observations (and sorting the observations if on an ordinal scale) and then presenting the counts.
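
Counting and presenting frequencies for ordinal data might look like the sketch below. The responses are made up, and the labels reuse the ordinal example from the Scales of Measurement section; the table layout is just one reasonable choice.

```python
from collections import Counter

# Ordinal codes and their labels (from the ordinal scale example).
labels = {1: "never", 2: "occasionally", 3: "frequently", 4: "always"}

# Hypothetical survey responses (made-up data).
responses = [2, 4, 3, 2, 1, 3, 2, 4, 2, 3]

# Count the frequency of each observed value.
counts = Counter(responses)

# Present the counts in scale order (sorting is allowed for ordinal data).
for code in sorted(labels):
    n = counts[code]
    share = n / len(responses)
    print(f"{code} {labels[code]:<13}{n:>3}{share:>6.0%}")
```

For nominal data the same counting applies, but the order of the rows would carry no meaning.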