Distributions of Data, Random Variables, and Probability Distributions
In data analysis, variables whose values depend on chance play an important role in linking distributions
of data to probability distributions. Such variables are called random variables. In this lesson, we will learn about
distributions of data, random variables, and probability distributions.
Distributions of Data
Recall that relative frequency distributions given in a table or histogram are a common way to show how
numerical data are distributed. In a histogram, the areas of the bars indicate where the data are
concentrated.
Recall that the sum of the areas of the bars of a relative frequency histogram is 1.
Although the units on the horizontal axis of a histogram vary from one data set to another, the vertical
scale can be adjusted (stretched or shrunk) so that the sum of the areas of the bars is 1. With this vertical
scale adjustment, the area under the curve that models the distribution is also 1. This model curve is called
a distribution curve, but it has other names as well, including density curve and frequency curve.
The purpose of the distribution curve is to give a good illustration of a large distribution of numerical data
that doesn't depend on specific classes. To achieve this, a distribution curve is drawn so that
the area under the curve in any vertical slice, just like a histogram bar, represents the proportion of the
data that lies in the corresponding interval on the horizontal axis, which is at the base of the slice.
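The vertical scale adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard routine: the data values, the class boundaries, and the helper name `density_histogram` are all made up for the example. Each bar height is the relative frequency of the class divided by the class width, so the area of each bar equals the proportion of the data in that class.

```python
# Hypothetical sample data; the values are made up for illustration.
data = [2.1, 2.7, 3.3, 3.8, 4.0, 4.4, 4.9, 5.2, 5.6, 6.8]

def density_histogram(values, edges):
    """Return bar heights scaled so that the bar areas sum to 1."""
    n = len(values)
    heights = []
    for i, (left, right) in enumerate(zip(edges[:-1], edges[1:])):
        is_last = i == len(edges) - 2
        # Each class is [left, right); the last class also includes its right edge.
        count = sum(
            1 for v in values
            if left <= v < right or (is_last and v == right)
        )
        relative_frequency = count / n
        # Dividing by the class width makes area = relative frequency.
        heights.append(relative_frequency / (right - left))
    return heights

edges = [2, 3, 4, 5, 7]  # classes of unequal width, chosen arbitrarily
heights = density_histogram(data, edges)
areas = [h * (r - l) for h, l, r in zip(heights, edges[:-1], edges[1:])]
total_area = sum(areas)

print("bar heights:", heights)
print("bar areas:  ", areas)
print("total area: ", total_area)
```

Because the last class is twice as wide as the others, its bar is drawn half as tall as its relative frequency would suggest, yet its area still matches the proportion of data it contains.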
Density Curves and Introduction to the Normal Distribution
Properties of Density Curves
• The curve lies on or above the x-axis.
• The total area under the curve equals 1.
• The area under the curve between two values corresponds to the proportion of all observations that fall within that range.
Random Variables
When analyzing data, it is common to choose a value of the data at random and consider that choice as a
random experiment, as introduced in section 4.4. Then, the probabilities of events involving the randomly
chosen value may be determined. Given a distribution of data, a variable, say X, may be used to represent
a randomly chosen value from the distribution. Such a variable X is an example of a random variable,
which is a variable whose value is a numerical outcome of a random experiment.
In the histogram for a random variable, the area of each bar is proportional to the probability represented
by the bar. The sum of the areas is 1 and the sum of the probabilities is 1.
This is also true for a continuous probability distribution: The area of the region under
the curve is 1, and the areas of vertical slices of the region—similar to the bars of a histogram—are equal
to probabilities of a random variable associated with the distribution. Such a random variable is called a
continuous random variable, and it plays the same role as a random variable that represents a randomly
chosen value from a distribution of data. The main difference is that we seldom consider the event in
which a continuous random variable is equal to a single value like X = 4; rather, we consider events that
are described by intervals of values such as 1 < X < 4 and X > 15. Such events correspond to vertical
slices under a continuous probability distribution, and the areas of the vertical slices are the probabilities
of the corresponding events. (Consequently, the probability of an event such as X = 4 would correspond
to the area of a line segment, which is 0.)
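To make the link between slices and probabilities concrete, here is a small sketch for the simplest continuous distribution. It assumes, purely for illustration, that X is uniformly distributed on the interval from 0 to 20, so the curve is a flat line and every vertical slice is a rectangle. The constants and the helper `prob_between` are hypothetical names chosen for this example.

```python
# Assumed example: X is uniform on [0, 20], so the density curve is a flat
# line at height 1/20 and the total area under it is 20 * (1/20) = 1.
LOW, HIGH = 0.0, 20.0
HEIGHT = 1.0 / (HIGH - LOW)

def prob_between(a, b):
    """P(a < X < b): area of the vertical slice over (a, b)."""
    # Clip the interval to the range where the density is nonzero.
    a, b = max(a, LOW), min(b, HIGH)
    width = max(b - a, 0.0)
    return width * HEIGHT

p_1_to_4 = prob_between(1, 4)                # slice of width 3
p_above_15 = prob_between(15, float("inf"))  # slice of width 5
p_exactly_4 = prob_between(4, 4)             # slice of width 0

print("P(1 < X < 4) =", p_1_to_4)
print("P(X > 15)    =", p_above_15)
print("P(X = 4)     =", p_exactly_4)
```

The last call shows why single values are seldom considered: a slice of zero width has zero area, whatever the height of the curve.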
Basic idea and definitions of random variables.
Discrete and continuous random variables
Defining discrete and continuous random variables. Working through examples of both discrete and continuous random variables.
Discrete uniform distribution
Working through more examples of discrete probability distributions.
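As one possible example of a discrete uniform distribution, the sketch below models the roll of a fair six-sided die, where each outcome has the same probability. The die is an assumed example, not one taken from the lesson, and exact fractions are used so that the sums come out exactly.

```python
from fractions import Fraction

# Assumed example: X is the number shown on a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]

# In a discrete uniform distribution every outcome is equally likely.
pmf = {x: Fraction(1, len(outcomes)) for x in outcomes}

total = sum(pmf.values())                                # sum of all probabilities
p_exactly_4 = pmf[4]                                     # P(X = 4)
p_at_most_2 = sum(p for x, p in pmf.items() if x <= 2)   # P(X <= 2)

print("sum of probabilities:", total)
print("P(X = 4)  =", p_exactly_4)
print("P(X <= 2) =", p_at_most_2)
```

Unlike the continuous case, the event X = 4 has a nonzero probability here, because a discrete random variable takes only a countable set of separate values.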
The Normal Distribution
Many natural processes yield data that have a relative frequency distribution shaped somewhat like a bell,
as in the distribution below with mean m and standard deviation d.
Such data are said to be approximately normally distributed and have the following properties.
• The mean, median, and mode are all nearly equal.
• The data are grouped fairly symmetrically about the mean.
• About two-thirds of the data are within 1 standard deviation of the mean.
• Almost all of the data are within 2 standard deviations of the mean.
As stated above, you can always associate a random variable X with a distribution of data by letting X be
a randomly chosen value from the distribution. If X is such a random variable for the distribution above,
we say that X is approximately normally distributed.
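The properties listed above can be checked on simulated data. The sketch below is only an illustration: it generates 10,000 values with Python's `random.gauss`, using an arbitrarily chosen mean of 50 and standard deviation of 10, and then measures what fraction of the values fall within 1 and 2 standard deviations of the mean. The helper `proportion_within` is a hypothetical name.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
# Assumed parameters: mean 50 and standard deviation 10, chosen arbitrarily.
data = [rng.gauss(50, 10) for _ in range(10_000)]

m = statistics.mean(data)      # mean of the data
d = statistics.pstdev(data)    # standard deviation of the data
median = statistics.median(data)

def proportion_within(k):
    """Fraction of the data within k standard deviations of the mean."""
    return sum(1 for x in data if abs(x - m) <= k * d) / len(data)

within_1 = proportion_within(1)
within_2 = proportion_within(2)

print("mean:", round(m, 2), " median:", round(median, 2))
print("within 1 standard deviation: ", round(within_1, 3))
print("within 2 standard deviations:", round(within_2, 3))
```

With data like these, the mean and median should be close, roughly two-thirds of the values should land within 1 standard deviation, and nearly all of them within 2.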
As described above, relative frequency distributions are often approximated using a smooth
curve (a distribution curve or density curve) for the tops of the bars in the histogram. The region below
such a curve represents a distribution, called a continuous probability distribution. There are many
different continuous probability distributions, but the most important one is the normal distribution,
which has a bell-shaped curve.
Just as a data distribution has a mean and standard deviation, the normal probability distribution has a
mean and standard deviation. Also, the properties listed above for the approximately normal distribution
of data hold for the normal distribution, except that the mean, median, and mode are exactly the same and
the distribution is perfectly symmetric about the mean.
A normal distribution, though always shaped like a bell, can be centered around any mean and can be
spread out to a greater or lesser degree, depending on the standard deviation.
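A brief sketch using `statistics.NormalDist` from the Python standard library illustrates this. The two sets of parameters below are arbitrary choices for the example. For each distribution, the curve is highest at the mean and half of the area lies to the left of the mean; the larger standard deviation gives a lower, more spread-out curve.

```python
from statistics import NormalDist

# Two normal distributions with arbitrarily chosen parameters.
narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=5, sigma=3)

for dist in (narrow, wide):
    area_left_of_mean = dist.cdf(dist.mean)   # area to the left of the mean
    peak_height = dist.pdf(dist.mean)         # height of the curve at the mean
    print(
        "mean:", dist.mean,
        " sd:", dist.stdev,
        " area left of mean:", area_left_of_mean,
        " peak height:", round(peak_height, 4),
    )

# Symmetry: the curve has the same height 2 units on either side of the mean.
left_height = wide.pdf(wide.mean - 2)
right_height = wide.pdf(wide.mean + 2)
print("heights at mean - 2 and mean + 2:", left_height, right_height)
```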
Normal Distribution
Properties of the Normal Distribution
1. The curve is continuous.
2. The curve is bell-shaped.
3. The curve is symmetrical about the mean.
4. The mean, median and mode are equal to each other.
5. The curve never touches the x-axis.
6. The area under the curve is 1.
7. The distribution is described by the mean and standard deviation.
The Normal Distribution and the 68-95-99.7 Rule.
This video shows the normal distribution and what percentage of observed values fall within either 1, 2, or 3 standard deviations from the mean. One specific example is discussed.
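The percentages in the 68-95-99.7 rule can be reproduced with `statistics.NormalDist` from the Python standard library. In this sketch the mean of 100 and standard deviation of 15 are assumed values chosen only for illustration; the resulting percentages are the same for any normal distribution. The helper `prob_within` is a hypothetical name.

```python
from statistics import NormalDist

# Assumed parameters for illustration; any mean and standard deviation
# give the same three probabilities.
X = NormalDist(mu=100, sigma=15)

def prob_within(k):
    """P(mean - k*sd < X < mean + k*sd), found as a difference of areas."""
    upper = X.cdf(X.mean + k * X.stdev)   # area to the left of the upper bound
    lower = X.cdf(X.mean - k * X.stdev)   # area to the left of the lower bound
    return upper - lower

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The three printed values round to 68%, 95%, and 99.7%, which is where the rule gets its name.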