The Nature of Statistics

What is Statistics

Types of data

Scales of measurement

Copyright 1993-97 Thomas P. Sturm

What is Statistics

Statistics is:

The SCIENCE of

a. COLLECTING,

b. Classifying / Presenting / Tabulating / Describing, and

c. INTERPRETING,

NUMERICAL Data

All three areas will be covered:

Collecting - Chapter 3

Describing - Chapters 1 and 2

Interpreting - bulk of the course

Course Goal:

To produce good "statistical consumers"

Collecting Data

Data must be collected with a purpose - to find information about a designated group of people/places/things/events

POPULATION - the collection of ALL objects that are of interest

- must be carefully defined

- must be able to determine under all circumstances whether something is in the population or not

e.g. employees - current? fired? retired? part-time?

Problem: It's usually just too expensive (or impossible) to get the information for all objects in a population (a CENSUS)

SAMPLE - a subset of the population used to find information about the entire population

- more economical

- with care, can obtain an accurate picture of the population

So, to get information about the population, we take a sample and find information about the things in the sample
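The population/sample idea can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 exam scores is synthetic, and the names population, sample, and SAMPLE_SIZE are hypothetical, not part of the course material.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 made-up exam scores between 0 and 100.
population = [random.randint(0, 100) for _ in range(1000)]

# A CENSUS uses every member of the population.
census_mean = statistics.mean(population)

# A SAMPLE uses only a subset, drawn at random without replacement.
SAMPLE_SIZE = 50
sample = random.sample(population, SAMPLE_SIZE)
sample_mean = statistics.mean(sample)

print(f"census mean: {census_mean:.1f}")
print(f"sample mean: {sample_mean:.1f} (from {SAMPLE_SIZE} of {len(population)} values)")
```

The sample mean is computed from 5% of the data, yet with a carefully drawn sample it usually lands close to the census mean.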

Variables in Statistics

PROPERTY

An attribute that is relevant for all things in the population (and therefore the sample)

e.g. height, weight, color, result of casting a die, beauty

VARIABLE

Any characteristic that can be measured for all things in the population

e.g. height (in inches), weight (in pounds), color (a word), # of spots on a die

OBSERVATION

A VALUE for a variable is assigned through a process of MEASUREMENT

e.g. use a ruler to MEASURE a VALUE of 6'4" as the OBSERVED height of a basketball player

POSSIBLE VALUES

values that COULD be obtained

e.g. 0 to 100% on an exam

OBSERVED VALUES

values that are actually obtained in the current instance

e.g. 97%, 92%, 84%, 63% in a class of 4 students
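The distinction between possible and observed values can be sketched as follows, using the four exam scores above. Restricting the possible values to whole-number percentages is an assumption made only for this sketch.

```python
# POSSIBLE VALUES: whole-number exam percentages from 0 to 100.
# (Whole numbers only is an assumption of this sketch.)
possible_values = range(0, 101)

# OBSERVED VALUES: the four scores from the example above.
observed_values = [97, 92, 84, 63]

# Every observed value must be one of the possible values.
all_possible = all(score in possible_values for score in observed_values)

# Most possible values are never observed in any one class.
unobserved_count = len(set(possible_values) - set(observed_values))

print(all_possible)      # True
print(unobserved_count)  # 97
```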

Types of Data

QUALITATIVE

ATTRIBUTE or CATEGORICAL data

useful only to place individuals into categories

(e.g. Earthlings, Martians)

QUANTITATIVE

DISCRETE

a finite or countable set of distinct values

e.g. number of students

CONTINUOUS

any value within a range, so infinitely many possible values

e.g. height of students

But statistics only deals with NUMERICAL data, and MEASUREMENT assigns a numerical value to a VARIABLE. So, for QUALITATIVE data, part of the measurement process is to assign a number to each attribute value

e.g. SEX - 1=male, 2=female, etc.
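A minimal sketch of this coding step, assuming the two codes from the example above. The observations in raw are made up for illustration, and the names SEX_CODES and LABELS are hypothetical.

```python
# Coding scheme following the example above.
SEX_CODES = {"male": 1, "female": 2}

# Raw qualitative observations (made up for illustration).
raw = ["female", "male", "female", "female"]

# Measurement step: assign a number to each attribute value.
coded = [SEX_CODES[value] for value in raw]
print(coded)  # [2, 1, 2, 2]

# The codes are only labels, so the mapping can be inverted
# to recover the original categories.
LABELS = {code: label for label, code in SEX_CODES.items()}
decoded = [LABELS[code] for code in coded]
print(decoded == raw)  # True
```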

Thus, as part of the measurement process, everything gets a number. But what can you DO with those numbers?

Scales of Measurement

Nominal Scale (Qualitative data)

e.g. 1=male, 2=female

come from qualitative (attribute) data

can only count how many of each value you have, which yields FREQUENCY data

cannot sort, add, subtract, multiply, or divide the numbers

Ordinal Scale (Ordinal data)

e.g. 1=never, 2=occasionally, 3=frequently, 4=always

come from a condensation of quantitative data where asking for specific numbers would not be accurate

can sort in addition to counting, e.g. 1 < 2

cannot add, subtract, multiply, or divide the numbers
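What sorting buys on an ordinal scale can be sketched as follows, using the coding from the example above. The responses are made up for illustration.

```python
# Coding from the example above.
LABELS = {1: "never", 2: "occasionally", 3: "frequently", 4: "always"}

# Made-up survey responses, already coded.
responses = [3, 1, 4, 2, 3, 3, 2]

# Sorting is meaningful because the codes carry an order.
ordered = sorted(responses)
print(ordered)  # [1, 2, 2, 3, 3, 3, 4]

# So are the lowest and highest responses.
extremes = [LABELS[min(responses)], LABELS[max(responses)]]
print(extremes)  # ['never', 'always']

# Differences are NOT meaningful: the gap between 4 and 3 need not
# equal the gap between 2 and 1, so this sketch never subtracts codes.
```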

Interval Scale (Metric data)

e.g. temperature in Fahrenheit

come from quantitative values that are measured against arbitrary starting points

can subtract in addition to sorting and counting, e.g. 24 degrees outside and 72 degrees inside means it is 48 degrees warmer inside

cannot add, multiply, or divide the numbers
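The sketch below illustrates why differences work on an interval scale but ratios do not, using the temperatures from the example above. Converting to Celsius moves the arbitrary zero point: the difference is merely rescaled, while the ratio changes completely. The function name fahrenheit_to_celsius is chosen for this sketch.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (deg_f - 32) * 5 / 9

outside_f, inside_f = 24, 72

# Differences are meaningful on an interval scale.
diff_f = inside_f - outside_f
print(diff_f)  # 48

# In Celsius the same difference is just rescaled by 5/9.
diff_c = fahrenheit_to_celsius(inside_f) - fahrenheit_to_celsius(outside_f)
print(round(diff_c, 2))  # 26.67

# Ratios are not meaningful: they depend on the arbitrary zero point.
ratio_f = inside_f / outside_f
ratio_c = fahrenheit_to_celsius(inside_f) / fahrenheit_to_celsius(outside_f)
print(ratio_f)            # 3.0
print(round(ratio_c, 2))  # -5.0
```

"Three times as warm" in Fahrenheit becomes "minus five times as warm" in Celsius, which is why dividing interval-scale values says nothing about the temperatures themselves.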

Ratio Scale (Metric data)

e.g. number of courses taken, any FREQUENCY data, rates

come from quantitative values that have "natural" zeroes

0 is meaningful, e.g. Pat took 6 courses and Chris took 2 courses, so Pat took 3 times as many courses as Chris

can perform all operations

Distribution/Variation

In general, not all of the measurements yield the same value. This could be because of different measurements of the same thing or measurement of different members of a sample. This is called VARIATION.

The values of the data have some sort of a DISTRIBUTION which characterizes where in the range of POSSIBLE values the OBSERVED values most frequently fall.

Much of descriptive statistics deals with finding simple ways (perhaps as simple as a single number) of describing the distribution.

Nominal and ordinal data allow the least amount of mathematical manipulation, so the description of nominal and ordinal data is limited to counting the frequencies of the observations (and sorting the observations if on an ordinal scale) and then presenting the counts.
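A minimal sketch of such a description: count the frequency of each coded value and present the counts in scale order. The observations are made up, and frequency_table is a hypothetical helper written for this sketch. For nominal data the same counting applies, but the row order carries no meaning.

```python
from collections import Counter

def frequency_table(observations, labels):
    """Return (label, count) rows, one per possible code, in code order."""
    counts = Counter(observations)
    return [(labels[code], counts.get(code, 0)) for code in sorted(labels)]

# Ordinal coding used earlier in these notes.
LABELS = {1: "never", 2: "occasionally", 3: "frequently", 4: "always"}

# Made-up coded observations.
observations = [3, 1, 4, 2, 3, 3, 2, 4]

table = frequency_table(observations, LABELS)
for label, count in table:
    print(f"{label:<14}{count}")
```

Listing every possible code, including any with a count of zero, shows where in the range of POSSIBLE values the OBSERVED values actually fall.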