The Nature of Statistics

What is statistics

Types of data

Scales of measurement

Copyright 1993-97 Thomas P. Sturm


What is Statistics

 

 

Statistics is:

          The SCIENCE of

              a. COLLECTING,

              b. Classifying / Presenting / Tabulating / Describing, and

              c. INTERPRETING,

          NUMERICAL Data

 

 

All three areas will be covered:

          Collecting - Chapter 3

          Describing - Chapters 1 and 2

          Interpreting - bulk of the course

 

 

Course Goal:

          To produce good "statistical consumers"


Collecting Data

 

 

Data must be collected with a purpose - to find information about a designated group of people/places/things/events

 

POPULATION - the collection of ALL objects that are of interest

          - must be carefully defined

          - must be able to determine under all circumstances whether something is in the population or not

                   e.g. employees - current? fired? retired? part-time?

 

Problem:  It's usually just too expensive (or impossible) to get the information for all objects in a population (a CENSUS)

 

SAMPLE - a subset of the population used to find information about the entire population

          - more economical

          - with care, can obtain an accurate picture of the population

 

So, to get information about the population, we take a sample and find information about the things in the sample
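The population/sample idea can be sketched in a few lines of Python. This is only an illustration: the population of heights is simulated, the sample size of 50 is arbitrary, and simple random sampling (via the standard library's random.sample) is an assumed sampling method, not one prescribed by these notes.

```python
import random
from statistics import mean

# Hypothetical population: heights (in inches) of 1,000 people.
# The values are simulated purely for illustration.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [round(rng.gauss(68, 3), 1) for _ in range(1000)]

# A CENSUS measures every member of the population.
census_mean = mean(population)

# A SAMPLE measures only a subset, here chosen at random without replacement.
sample = rng.sample(population, k=50)
sample_mean = mean(sample)

print(f"census mean: {census_mean:.2f} (n = {len(population)})")
print(f"sample mean: {sample_mean:.2f} (n = {len(sample)})")
```

The sample needs only 50 measurements instead of 1,000, and its mean should land near the census mean, which is the sense in which a carefully drawn sample gives an accurate picture of the population.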


Variables in Statistics

 

 

PROPERTY

          An attribute that is relevant for all things in the population (and therefore the sample)

              e.g. height, weight, color, result of casting a die, beauty

 

VARIABLE

          Any characteristic that can be measured for all things in the population

              e.g. height (in inches), weight (in pounds), color (a word), # of spots on a die

 

OBSERVATION

          A VALUE assigned to a variable through a process of MEASUREMENT

                   e.g. use a ruler to MEASURE a VALUE of 6'4" as the OBSERVED height of a basketball player

 

POSSIBLE VALUES

          values that COULD be obtained

              e.g. 0 to 100% on an exam

 

OBSERVED VALUES

          values that are actually obtained in the current instance

              e.g. 97%, 92%, 84%, 63% in a class of 4 students
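The difference between possible and observed values can be shown with the exam example. This is a small sketch, and treating scores as whole percentages is an assumption made only to keep the set of possible values easy to list.

```python
# POSSIBLE VALUES: every whole-number exam score from 0 to 100 percent
# (whole percentages are an assumption made for this sketch).
possible_values = range(0, 101)

# OBSERVED VALUES: the four scores from the example above.
observed_values = [97, 92, 84, 63]

# Every observed value must be one of the possible values.
all_valid = all(score in possible_values for score in observed_values)
print("all observed values are possible values:", all_valid)

# Only a small part of the possible range was actually observed.
print(f"{len(set(observed_values))} of {len(possible_values)} possible values observed")
print(f"observed range: {min(observed_values)}% to {max(observed_values)}%")
```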


Types of Data

 

 

QUALITATIVE

          ATTRIBUTE or CATEGORICAL data

              useful only to place individuals into categories

                   (e.g. Earthlings, Martians)

 

QUANTITATIVE

          DISCRETE

              a finite or countable set of separate values

                   e.g. number of students

          CONTINUOUS

              an infinite set of values in a bounded range

                   e.g. height of students

 

 

But statistics deals only with NUMERICAL data, and MEASUREMENT assigns a numerical value to a VARIABLE.  So, for QUALITATIVE data, part of the measurement process is to assign a number to each attribute value

                   e.g. SEX - 1=male, 2=female, etc.

 

Thus, as part of the measurement process, everything gets a number.  But what can you DO with those numbers?
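Assigning numbers to attribute values is just a lookup. The sketch below uses a hypothetical eye-color variable and an arbitrary coding scheme; neither comes from these notes.

```python
# Hypothetical coding scheme for a qualitative variable (eye color).
# The particular numbers are arbitrary; any distinct codes would do.
codes = {"brown": 1, "blue": 2, "green": 3}

# Hypothetical observations recorded as words.
observations = ["blue", "brown", "brown", "green", "blue", "brown"]

# Measurement step: replace each attribute value with its numeric code.
coded = [codes[value] for value in observations]
print(coded)  # [2, 1, 1, 3, 2, 1]

# The codes are only labels: reversing the lookup recovers the categories.
labels = {code: value for value, code in codes.items()}
decoded = [labels[code] for code in coded]
print(decoded == observations)  # True
```

Because the codes are arbitrary, swapping them (say, brown=3 and green=1) would describe exactly the same data, which is why the choice of what to do with the numbers matters.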


Scales of Measurement

 

 

Nominal Scale  (Qualitative data)

          e.g. 1=male, 2=female

          come from qualitative (attribute) data

          can only count how many of each value you have to obtain FREQUENCY data

          cannot sort, add, subtract, multiply, or divide the numbers

 

Ordinal Scale  (Ordinal data)

          e.g. 1=never, 2=occasionally, 3=frequently, 4=always

          come from a condensation of quantitative data where asking for specific numbers would not be accurate

          can sort in addition to count, 1 < 2

          cannot add, subtract, multiply, or divide the numbers

 

Interval Scale  (Metric data)

          e.g. temperature in Fahrenheit

          come from quantitative values that are measured against arbitrary starting points

          can subtract in addition to sorting and counting, 24 outside, 72 inside, 48 degrees warmer inside

          cannot add, multiply, or divide the numbers

 

Ratio Scale  (Metric data)

          e.g. number of courses taken, any FREQUENCY data, rates

          come from quantitative values that have "natural" zeroes

          0 is meaningful, Pat took 6 courses, Chris took 2 courses, Pat took 3 times as many courses as Chris

          can perform all operations
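The four scales can be compared side by side in a short Python sketch. The observation lists are hypothetical, the median is included only as a by-product of sorting, and the Celsius conversion is added here to show why ratios on an interval scale depend on the arbitrary zero.

```python
from collections import Counter
from statistics import median

def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (deg_f - 32) * 5 / 9

# Nominal (1=male, 2=female): counting is the only meaningful operation.
sex_codes = [1, 2, 2, 1, 2]  # hypothetical observations
nominal_counts = Counter(sex_codes)
print("nominal counts:", dict(nominal_counts))

# Ordinal (1=never ... 4=always): counting and sorting are meaningful.
frequency_codes = [3, 1, 4, 2, 2]  # hypothetical observations
ordinal_sorted = sorted(frequency_codes)
ordinal_median = median(frequency_codes)  # middle of the sorted values
print("ordinal sorted:", ordinal_sorted, "median:", ordinal_median)

# Interval (degrees Fahrenheit): differences are meaningful, ratios are not.
outside_f, inside_f = 24, 72
difference_f = inside_f - outside_f
ratio_f = inside_f / outside_f
ratio_c = fahrenheit_to_celsius(inside_f) / fahrenheit_to_celsius(outside_f)
print("difference:", difference_f, "degrees F")
print("ratio in F:", ratio_f, "ratio in C:", round(ratio_c, 2))

# Ratio (courses taken): zero is natural, so ratios are meaningful.
pat_courses, chris_courses = 6, 2
course_ratio = pat_courses / chris_courses
print("Pat took", course_ratio, "times as many courses as Chris")
```

The same two temperatures give a ratio of 3 in Fahrenheit and about -5 in Celsius, so "three times as warm" says nothing about the temperatures themselves. The course ratio does not change with the unit, because zero courses means the same thing however the courses are counted.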


Distribution/Variation

 

 

In general, not all of the measurements yield the same value.  This can happen because the same thing is measured more than once or because different members of a sample are measured.  This is called VARIATION.

 

The values of the data have a DISTRIBUTION, which characterizes where in the range of POSSIBLE values the OBSERVED values most frequently fall.

 

Much of descriptive statistics deals with finding simple ways (perhaps as simple as a single number) of describing the distribution.

 

Nominal and ordinal data allow the least mathematical manipulation, so describing them is limited to counting the frequencies of the observations (and sorting the observations if on an ordinal scale) and then presenting the counts.
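Counting and presenting frequencies for ordinal data might look like the following sketch. The ten responses are hypothetical, the labels reuse the ordinal example from the Scales of Measurement slide, and the percentage column is an added convenience.

```python
from collections import Counter

# Ordinal coding from the earlier example.
labels = {1: "never", 2: "occasionally", 3: "frequently", 4: "always"}

# Hypothetical survey responses, already coded.
responses = [2, 4, 3, 2, 1, 2, 3, 4, 2, 3]

# Count how many times each code was observed.
counts = Counter(responses)

# Present the counts in scale order (sorting is allowed for ordinal data).
for code in sorted(labels):
    n = counts[code]
    print(f"{labels[code]:<13}{n:>3}  {n / len(responses):.0%}")
```

For nominal data the counting step is the same, but the order of the rows carries no meaning.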