Notes on Topic 1:
Basic Statistical Concepts

    Statistics, Science, and Observations

       
      Science
      Science is based on the empirical method: obtaining information systematically through observation. It consists of methods for making observations.

      Observations
      Observations are the basic empirical "stuff" of science.

      Statistics
      Statistics is a set of methods and rules for organizing, summarizing and interpreting information.

      The methods and rules enable scientific researchers to describe and analyze the observations they have made. Statistical methods are tools for science.

      Science consists of methods for making observations;
      Statistics consists of methods for describing and analyzing the observations.

      Here are some of the "observations" we gathered in the survey we did on the first day of class in 1997 and 1998.

      Populations & Samples

      Populations
      A population is the set of all individuals of interest in a particular study. We will also refer to populations of scores.

      Samples
      A sample is a set of individuals selected from a population, usually intended to represent the population in a study. We will also refer to samples of scores.

      The data we gathered in class are a "sample" of scores obtained with a sample of individuals. The population we sampled from is the population of UNC undergraduates.

      Parameters
      A Parameter is a value, usually a numerical value, that describes a Population. A Parameter may be obtained from a single measurement, or it may be derived from a set of measurements from the Population.

      Statistics
      A Statistic is a value, usually a numerical value, that describes a Sample. A Statistic may be obtained from a single measurement, or it may be derived from a set of measurements from the Sample.

      Here are some "statistics" computed from our sample of data:

      Data
      Data (plural) are measurements or observations. A data set is a collection of measurements or observations. A datum (singular) is a single measurement or observation and is commonly called a data-value, a score, or a raw score.

      Descriptive Statistics
      Descriptive Statistics are statistical procedures used to summarize, organize and simplify data. The term also names the branch of statistical activity that focuses on the use of such procedures. These procedures are the focus of chapters 1 through 5.
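
      As a minimal sketch of what "summarize, organize and simplify" can look like in practice, the Python snippet below reduces a handful of scores to a few summary values. The GPA scores and the helper name describe are invented for illustration; they are not the class survey data, and the choice of summary values is only one reasonable option.

```python
import statistics

# Hypothetical GPA scores for a small sample (made-up values, not the class data).
gpa_scores = [2.8, 3.1, 3.4, 3.4, 3.9]


def describe(scores):
    """Summarize a list of scores with a few common descriptive statistics."""
    return {
        "n": len(scores),                        # number of scores
        "mean": statistics.mean(scores),         # arithmetic average
        "median": statistics.median(scores),     # middle score
        "min": min(scores),                      # smallest score
        "max": max(scores),                      # largest score
    }


summary = describe(gpa_scores)
print(summary)
```

      Five raw scores are replaced by a short description that is easier to read than the scores themselves, which is the point of a descriptive procedure.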

      Statistical Visualization
      Statistical visualization refers to recently developed computational statistical procedures used to visually summarize, organize and simplify data. The statistical system we are using is named ViSta, for "Visual Statistics", because it includes statistical visualization.

      A statistical visualization of our data is shown below. It shows the relationship between GPA and Satisfaction with the UNC experience. Higher satisfaction is associated with higher GPA.

      Exploratory Statistics
      Exploratory statistics is the process of exploring data by using descriptive and visualization methods to "see what the data seem to say" (Tukey, 19??). The term also names the branch of statistics that focuses on this process.

      Inferential Statistics
      Inferential Statistics consist of techniques that allow us to study samples and then to make generalizations about the populations from which the samples were selected. The term also names the branch of statistical activity that focuses on the use of such procedures. These procedures are the focus of chapters 8 through the remainder of the text. The groundwork for statistical inference is laid in chapters 6 and 7.

      Sampling Error
      Sampling error is the discrepancy, or amount of error, that exists between a sample statistic and the corresponding population parameter.
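
      The relationship between a parameter, a statistic, and sampling error can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the population is simulated rather than real, the sample size of 25 is arbitrary, and the mean is used only as one example of a value that can describe both a population and a sample.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same numbers on every run

# Hypothetical population of 1,000 scores (simulated, not real data).
population = [random.gauss(3.0, 0.5) for _ in range(1000)]

# Parameter: a value that describes the whole population.
population_mean = statistics.mean(population)

# Statistic: the same kind of value, computed from a sample of 25 individuals.
sample = random.sample(population, 25)
sample_mean = statistics.mean(sample)

# Sampling error: the discrepancy between the statistic and the parameter.
sampling_error = sample_mean - population_mean

print(f"parameter (population mean): {population_mean:.3f}")
print(f"statistic (sample mean):     {sample_mean:.3f}")
print(f"sampling error:              {sampling_error:+.3f}")
```

      Changing the seed draws a different sample, so the statistic and the sampling error change while the parameter stays fixed.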

      The Scientific Method and the Design of Experiments

      Science attempts to discover orderliness in the universe - to discover regularity in changes. Something that can change is called a variable.

      Variables
      A variable is a characteristic or condition that changes or has different values for different individuals. In the data we gathered, the variables include "Gender", "Age", etc.

      A constant is a characteristic or condition that does not vary, and is the same for every individual.

      The Correlational Method
      The scientific method in which two (or more) variables are observed without manipulation (i.e., as they exist naturally) to see if there is any relationship between them.

      The correlational method cannot establish cause-and-effect: Correlation is not causation!

      The data we gathered are an example of the correlational method. We can say that "Higher satisfaction is associated with higher GPA", but we can't say that "Higher GPA causes higher satisfaction" (or the converse).
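
      One common way to put a number on "is associated with" is the Pearson correlation coefficient. These notes do not name a specific measure, so the choice of Pearson's r is an assumption here, as are the function name pearson_r and the GPA and satisfaction scores, which are invented and are not the class data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the two means.
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical observations: both variables are measured, neither is manipulated.
gpa = [2.5, 3.0, 3.2, 3.6, 3.9]
satisfaction = [3, 4, 4, 6, 7]

r = pearson_r(gpa, satisfaction)
print(f"r = {r:.2f}")
```

      A positive r describes the association only. Because neither variable was manipulated, the number says nothing about which variable, if either, causes the other.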

      The Experimental Method
      The scientific method which can establish a cause-and-effect relationship between two (or more) variables. Some important points:
      1. The researcher manipulates one variable and observes what happens on the other. More than one variable may be manipulated or observed.
      2. To correctly establish cause-and-effect, the researcher must exercise some control over the experimental situation to ensure that some other variable(s) do(es) not influence the relationship being watched.
      3. Random assignment of individuals to conditions can be used to keep other variables from systematically influencing the results.
      4. The experimental conditions must be identical, other than differing on values of the manipulated variable.
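
      Random assignment, mentioned in the points above, can be sketched as shuffling the participants and then dealing them out to the conditions in turn. The function name randomly_assign, the participant labels, and the two condition names are hypothetical; this is one simple scheme, assuming equal group sizes are wanted.

```python
import random


def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to the conditions in turn."""
    rng = random.Random(seed)      # private generator; seed makes runs repeatable
    shuffled = list(participants)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups


# Eight hypothetical participants, labeled P1 through P8.
participants = [f"P{i}" for i in range(1, 9)]
groups = randomly_assign(participants, ["control", "experimental"], seed=42)

for condition, members in groups.items():
    print(condition, members)
```

      Because chance alone decides who lands in which group, characteristics such as age or motivation have no systematic reason to differ between the groups.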

      Independent Variable (also called the predictor variable)
      The variable which is manipulated by the researcher.
      Dependent Variable (also called the response variable)
      The variable which is observed by the researcher for changes in order to assess the effect of the treatment. (The treatment is the manipulation of the predictor variable.)
      Confounding Variable
      An uncontrolled variable that is unintentionally allowed to vary systematically with the independent variable. Confounds the results (bad, bad, bad!).

      The control group
      This is a condition of the independent variable that does not receive the experimental treatment. Usually, the control group receives either no treatment or a placebo treatment.
      The experimental group
      This is a condition of the independent variable that does receive an experimental treatment. There may be several experimental groups.

      The Quasi-Experimental Method
      Examines differences between pre-existing groups of subjects (such as men vs. women) or differences between groups of scores obtained at different times (before and after treatment).

      Hypotheses
      A hypothesis is a prediction about the outcome of an experiment. In experimental research, a hypothesis makes a prediction about how the manipulation of the independent (predictor) variable will affect the dependent (response) variable.

      Measurement

      Data are measurements of observations which involve categorizing, ordering or using numbers to characterize amount. Several levels of measurement are involved. These in turn determine what statistics can be computed. Measurements may also be discrete or continuous.

         

      1. Scales (Levels) of Measurement
        Nominal
        The nominal level of measurement labels observations so that they fall into different categories. Football jersey numbers and home street addresses are common examples.

        In ViSta, nominal variables are called "Category" variables.

        Ordinal
        The ordinal level of measurement consists of categories that are ordered in a sequence. Order of finish in a race is a common example.

        In ViSta, ordinal variables are called "Ordinal" variables.

        Interval
        The interval level of measurement consists of ordered categories where all of the categories are intervals of exactly the same size. Temperature in degrees Fahrenheit or Celsius is a common example. Here, equal differences between numbers reflect equal differences in magnitude of the observed variable.

        Ratio
        The ratio level of measurement is an interval scale with an absolute zero point. Length and weight are common examples. Here, ratios of numbers reflect ratios of variable magnitude.

        In ViSta, interval and ratio variables are called "Numeric" variables.
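
        The difference between the interval and ratio levels can be sketched with temperature. The example assumes the Celsius scale as the interval scale and brings in the Kelvin scale, which has an absolute zero, as a ratio scale. Kelvin is not mentioned in these notes and is used here only for illustration, as are the two temperature values.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvins (0 K is absolute zero)."""
    return celsius + 273.15


cool_c, warm_c = 10.0, 20.0  # two hypothetical temperatures in degrees Celsius

# Differences are meaningful on an interval scale: the gap is the same on both scales.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are meaningful only on a ratio scale, where zero means "none of the variable".
ratio_c = warm_c / cool_c                                        # 2.0, but not "twice as hot"
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # roughly 1.035

print(f"difference: {difference_c:.2f} C, {difference_k:.2f} K")
print(f"ratio on the Celsius scale: {ratio_c:.3f}")
print(f"ratio on the Kelvin scale:  {ratio_k:.3f}")
```

        The Celsius ratio of 2.0 is an artifact of where that scale places its zero. Only the Kelvin ratio reflects a ratio of the underlying magnitude.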

      2. Discrete and Continuous Variables
        Discrete
        A discrete variable has separate, indivisible categories. No values can exist in between two neighboring categories.
        Continuous
        A continuous variable has an infinite number of possible values falling between any two observed values.

      Mathematical Notation

      In statistical calculations you will constantly be required to add a set of values to find a specific total. We use algebraic expressions to represent the values being added. For example,
      X means "scores on a variable".
      For instance, X = [1 2 3] refers to a variable with three observations, which are 1, 2, and 3.
      We will use the Greek letter sigma (Σ) to signify the summation process. Thus, we write ΣX for the sum of all the scores on X.

      Note that
      1. All calculations within parentheses are done first.
      2. Squaring, multiplying, and dividing are done second, and should be completed in order from left to right.
      3. Adding and subtracting (including summation) are third, and should be completed in order from left to right.
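
      The three rules above can be checked with a short Python sketch using the scores X = [1 2 3] from the earlier example. The particular expressions, Σ(X - 1), ΣX - 1 and Σ(X - 1)², are chosen here only for illustration and are not necessarily the ones shown in class.

```python
X = [1, 2, 3]  # the three scores used in the notes

# ΣX: add up every score.
sum_x = sum(X)                                       # 1 + 2 + 3 = 6

# Σ(X - 1): parentheses first, so subtract 1 from each score, then sum.
sum_of_shifted = sum(x - 1 for x in X)               # 0 + 1 + 2 = 3

# ΣX - 1: no parentheses, so work left to right: sum first, then subtract 1 once.
sum_then_shift = sum(X) - 1                          # 6 - 1 = 5

# Σ(X - 1)²: parentheses, then squaring, then summation.
sum_of_squared_shifted = sum((x - 1) ** 2 for x in X)  # 0 + 1 + 4 = 5

print(sum_x, sum_of_shifted, sum_then_shift, sum_of_squared_shifted)
```

      Moving or removing the parentheses changes which operation is applied to each score and which is applied once to the total.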

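      The following term, (ΣX)², which is called the "squared sum", works as shown: the scores are summed first, and the total is then squared.

      Because of the order of operations, the following term, ΣX², which is called "the sum of squares", works as shown: each score is squared first, and the squared values are then summed.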
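
      A minimal Python sketch, assuming the scores X = [1 2 3] from the earlier example, shows that the squared sum and the sum of squares are different quantities. The variable names are chosen for illustration.

```python
X = [1, 2, 3]  # the three scores used in the notes

# Squared sum, (ΣX)²: the parentheses force the summation first, then the squaring.
squared_sum = sum(X) ** 2                  # (1 + 2 + 3)² = 6² = 36

# Sum of squares, ΣX²: squaring comes before summation in the order of operations.
sum_of_squares = sum(x ** 2 for x in X)    # 1 + 4 + 9 = 14

print(squared_sum, sum_of_squares)
```

      The two results differ (36 versus 14), so the placement of the parentheses matters.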

      Consider how the following summation equation works:
      On the other hand, the next summation equation works differently:

      Finally, consider how this last summation equation works: