Recall the car data set you identified in Week 2. We will assume that this data set is normally distributed with the mean and SD you calculated. (Be sure you use the numbers without the supercar outlier.)
For the next 4 cars that are sampled, what is the probability that the sample mean price will be less than the mean minus $500? Make sure you interpret your results.
Please note: because we are given a new sample size, we will need to calculate a new SD (σ/√n). Then, to find the value that is $500 below the mean, subtract $500 from the mean. For example, if the mean is $15,000, then $500 below it is $14,500. Thus the probability you want to find is P(x̄ < 14,500).
For the next 4 cars that are sampled, what is the probability that the sample mean price will be more than $1,000 above the mean? Make sure you interpret your results. Use the same logic as above: if your mean is $15,000, then $1,000 above it is 15,000 + 1,000 = $16,000. Thus the probability you want to find is P(x̄ > 16,000).
For the next 4 cars that are sampled, what is the probability that the sample mean price will be equal to the mean? Make sure you interpret your results. Use the same logic as above.
For the next 4 cars that are sampled, what is the probability that the sample mean price will be within $1,500 of the mean? Make sure you interpret your results. Use the same logic as above.
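The four discussion questions can be cross-checked with a short script. This is a minimal sketch in Python (the course itself uses Excel); `next_four_cars` is a hypothetical helper, and it assumes, as the prompt does, that prices are normally distributed, so the mean of n = 4 prices is normal with SD σ/√4. The example call plugs in the Week 2 example numbers from the PDF below; substitute your own mean and SD.

```python
from statistics import NormalDist


def next_four_cars(mean: float, sd: float, n: int = 4) -> dict:
    """Hypothetical helper: the four discussion probabilities for the sample mean of n cars."""
    se = sd / n ** 0.5                    # new SD (standard error) for the new sample size
    xbar = NormalDist(mu=mean, sigma=se)  # sampling distribution of the sample mean
    return {
        "below": xbar.cdf(mean - 500),                            # P(x̄ < mean - 500)
        "above": 1 - xbar.cdf(mean + 1000),                       # P(x̄ > mean + 1000)
        "equal": xbar.cdf(mean) - xbar.cdf(mean),                 # a single point has zero area
        "within": xbar.cdf(mean + 1500) - xbar.cdf(mean - 1500),  # P(mean - 1500 < x̄ < mean + 1500)
    }


# Example call with the Week 2 example numbers (use your own mean and SD instead).
for label, p in next_four_cars(25650, 3488.47).items():
    print(f"{label}: {p:.2%}")
```

The "equal" entry is computed as an area with zero width, which is why it comes out as 0 for any continuous distribution.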
I encourage you to review the Week 4 normal probabilities PDF at the bottom of the discussion. It gives a step-by-step example to follow and shows you how to find probabilities using Excel. I also encourage you to review the Week 4 Empirical Rule PDF, which will give you a better understanding of how to use the empirical rule. You can also use this PDF in the Quizzes section.
There are additional PDFs that were created to help you with the Homework, Lessons, and Tests in the Quizzes section. While they won't be used to answer the questions in the discussion, they are just as useful. I encourage you to review them ASAP!

week2dataset.png

Week4EmpiricalRule.pdf

Week4uniformprobabilities.pdf

Week4Exponentialprobabilities.pdf

Week4normalprobabilities.pdf
This week we will discuss the Empirical Rule.
The empirical rule describes how closely the data cluster around the mean.
It only applies to bell-shaped (approximately normal) distributions.
• The interval within one standard deviation of the mean, (x̄ ± 1·SD), contains approximately 68% of the data.
• The interval within two standard deviations of the mean, (x̄ ± 2·SD), contains approximately 95% of the data.
• The interval within three standard deviations of the mean, (x̄ ± 3·SD), contains approximately 99.7% of the data.
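The 68%, 95%, and 99.7% figures are rounded areas under the normal curve. As a minimal Python sketch (not part of the course materials), the exact shares can be computed from the standard normal distribution in the standard library; `empirical_shares` is a hypothetical helper name.

```python
from statistics import NormalDist


def empirical_shares(max_k: int = 3) -> list:
    """Exact share of a normal distribution lying within k SDs of the mean, k = 1..max_k."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return [z.cdf(k) - z.cdf(-k) for k in range(1, max_k + 1)]


for k, share in enumerate(empirical_shares(), start=1):
    print(f"within {k} SD: {share:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```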
Let’s continue to look at the Data from Week 2.
Car Price:
Observation 1: $20,000
Observation 2: $25,000
Observation 3: $30,000
Observation 4: $31,000
Observation 5: $22,500
Observation 6: $25,000
Observation 7: $29,500
Observation 8: $24,000
Observation 9: $24,500
Observation 10: $25,000
Mean: $25,650    Median: $25,000
SD: $3,488.47    Sample Size: 10
Using the Empirical Rule, how many data points fall within 1, 2, and 3 SDs of the mean?
1) First, we will need to calculate each interval.
25,650 – 3,488.47 = $22,162 (rounded to the nearest dollar)
25,650 + 3,488.47 = $29,138 (rounded to the nearest dollar)
The interval for approximately 68% of the data is ($22,162, $29,138). But how many data points fall within this interval? Observations 2, 5, 6, 8, 9, and 10 do; observation 1 ($20,000) falls below the interval, and observations 3, 4, and 7 fall above it. So 6 of the 10 observations fall within this interval, which is 6/10 = 60% of the data within 1 SD. This is reasonably close to 68% for a sample of only 10 cars.
2) We will calculate the next interval.
25,650 – 2(3,488.47) = $18,673 (rounded to the nearest dollar)
25,650 + 2(3,488.47) = $32,627 (rounded to the nearest dollar)
The interval for approximately 95% of the data is ($18,673, $32,627). But how many data points fall within this interval? All 10 observations (1 through 10) fall within this interval, so 10/10 = 100% of the data falls within 2 SDs. This is close to 95%. Since this is a small data set, it is not uncommon to see all the data points fall within 2 SDs, so we would expect results like this.
3) It is still a good idea to calculate the last interval.
25,650 – 3(3,488.47) = $15,185 (rounded to the nearest dollar)
25,650 + 3(3,488.47) = $36,115 (rounded to the nearest dollar)
The interval for approximately 99.7% of the data is ($15,185, $36,115). As with the previous interval, all the data points fall within this interval, and because this is a small data set, the result is as expected.
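The three interval checks can be reproduced in a few lines. This is a minimal Python sketch, not the course's Excel method; it uses the sample SD (`statistics.stdev`), which matches the $3,488.47 reported above, and `count_within` is a hypothetical helper that counts prices inside each interval (endpoints included).

```python
from statistics import mean, stdev

prices = [20000, 25000, 30000, 31000, 22500, 25000, 29500, 24000, 24500, 25000]
xbar = mean(prices)  # 25650
sd = stdev(prices)   # sample SD, about 3488.47


def count_within(k):
    """Number of prices inside (xbar - k*sd, xbar + k*sd)."""
    low, high = xbar - k * sd, xbar + k * sd
    return sum(low <= p <= high for p in prices)


for k in (1, 2, 3):
    print(f"{k} SD: ({xbar - k * sd:,.0f}, {xbar + k * sd:,.0f}) holds {count_within(k)}/{len(prices)}")
```

Running it shows observation 1 ($20,000) sitting just below the lower bound of the 1 SD interval, which is easy to miss when counting by hand.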
No data points fall outside this range, so there don't appear to be any outliers in this data set. We also see that the mean and median are close together; there isn't a big difference between the two values. For these reasons, the data appear to be approximately normally distributed, and the data set does not seem to be skewed in either direction.
We can see how the SDs line up along the x-axis to create the bell-shaped curve.
Uniform Probabilities
The uniform distribution is a continuous probability distribution and is concerned
with events that are equally likely to occur. When working out problems that have
a uniform distribution, be careful to note if the data is inclusive or exclusive.
Formula Review
x is a real number between a and b.
In some instances, x can take on the values a and b themselves; in that case, a = smallest X and b = largest X.
X ~ U (a, b), where a ≤ x ≤ b
The mean is $\mu = \frac{a+b}{2}$
The standard deviation is $\sigma = \sqrt{\frac{(b-a)^2}{12}}$
Probability density function: $f(x) = \frac{1}{b-a}$, where $a \le x \le b$
Cumulative distribution function: $P(X \le x) = \frac{x-a}{b-a}$
Area to the left of x: $P(X < x) = (x-a)\left(\frac{1}{b-a}\right) = \frac{x-a}{b-a}$
Area to the right of x: $P(X > x) = (b-x)\left(\frac{1}{b-a}\right) = \frac{b-x}{b-a}$
Area between c and d: $P(c < X < d) = (\text{base})(\text{height}) = (d-c)\left(\frac{1}{b-a}\right) = \frac{d-c}{b-a}$
Note: $(d-c)$ is the base and $\frac{1}{b-a}$ is the height.
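The formula review translates directly into small functions. This is a minimal Python sketch under one added assumption: x is clipped to [a, b] so that values outside the range return 0 or 1 rather than a negative or oversized "probability". The function names are hypothetical.

```python
def uniform_mean(a, b):
    return (a + b) / 2


def uniform_sd(a, b):
    return ((b - a) ** 2 / 12) ** 0.5


def uniform_pdf(a, b):
    """Height of the rectangle: f(x) = 1/(b - a) for a <= x <= b."""
    return 1 / (b - a)


def uniform_left(x, a, b):
    """P(X < x) = (x - a)/(b - a); x is clipped to [a, b] so the result stays in [0, 1]."""
    x = min(max(x, a), b)
    return (x - a) / (b - a)


def uniform_right(x, a, b):
    """P(X > x) = (b - x)/(b - a), with the same clipping."""
    x = min(max(x, a), b)
    return (b - x) / (b - a)


def uniform_between(c, d, a, b):
    """P(c < X < d) = (d - c)/(b - a): base times height."""
    return uniform_left(d, a, b) - uniform_left(c, a, b)
```

Because the distribution is continuous, the same functions serve for both "<" and "≤" versions of each probability.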
Example:
Researchers have developed a safe method for rapidly detecting anthrax spores in powders and on surfaces. The method has been found to work well even when there are very few anthrax spores in a powdered specimen. Consider a powder specimen that has exactly 30 anthrax spores. Suppose that the number of anthrax spores in the sample detected by the new method follows a uniform distribution from 10 to 30. Find the following probabilities.
First, we see that a = 10 and b = 30.
1) Find f(x).
$f(x) = \frac{1}{b-a} = \frac{1}{30-10} = \frac{1}{20} = .05$
2) Find the mean and standard deviation.
$\mu = \frac{a+b}{2} = \frac{10+30}{2} = 20$
$\sigma = \sqrt{\frac{(b-a)^2}{12}} = \sqrt{\frac{(30-10)^2}{12}} = \sqrt{\frac{400}{12}} = 5.7735$
3) Find the probability that 22 or fewer anthrax spores are detected in the powdered specimen.
$P(X \le x) = \frac{x-a}{b-a}$
$P(X \le 22) = \frac{22-10}{30-10} = \frac{12}{20} = .6$
4) Find the probability that between 10 and 25 anthrax spores are detected in the powdered specimen.
$P(c < X < d) = (\text{base})(\text{height}) = (d-c)\left(\frac{1}{b-a}\right)$
$P(10 < X < 25) = (25-10)\left(\frac{1}{30-10}\right) = 15\left(\frac{1}{20}\right) = .75$
5) Find the probability that fewer than 13 anthrax spores are detected in the powdered specimen.
$P(X < x) = (x-a)\left(\frac{1}{b-a}\right)$
$P(X < 13) = (13-10)\left(\frac{1}{30-10}\right) = 3\left(\frac{1}{20}\right) = .15$
6) Find the probability that more than 26 anthrax spores are detected in the powdered specimen.
$P(X > x) = (b-x)\left(\frac{1}{b-a}\right)$
$P(X > 26) = (30-26)\left(\frac{1}{30-10}\right) = 4\left(\frac{1}{20}\right) = .20$
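Steps 1 through 6 can be checked exactly, without rounding, by keeping each probability as a fraction. This is a minimal Python sketch using the standard `fractions` module; `left` and `right` are hypothetical helper names for the area-to-the-left and area-to-the-right formulas, and both assume a ≤ x ≤ b.

```python
from fractions import Fraction

a, b = 10, 30  # detected spores ~ U(10, 30)


def left(x):
    """P(X < x) = (x - a)/(b - a), kept as an exact fraction (assumes a <= x <= b)."""
    return Fraction(x - a, b - a)


def right(x):
    """P(X > x) = (b - x)/(b - a), exact (assumes a <= x <= b)."""
    return Fraction(b - x, b - a)


answers = {
    "1) f(x)": Fraction(1, b - a),
    "3) P(X <= 22)": left(22),
    "4) P(10 < X < 25)": left(25) - left(10),
    "5) P(X < 13)": left(13),
    "6) P(X > 26)": right(26),
}
mean = Fraction(a + b, 2)        # 2) mean
sd = ((b - a) ** 2 / 12) ** 0.5  # 2) standard deviation

for label, p in answers.items():
    print(f"{label} = {p} = {float(p)}")
print(f"2) mean = {mean}, sd = {sd:.4f}")
```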
7) Find the probability that more than 19 anthrax spores are detected given that 12 anthrax spores have already been detected in the powdered specimen.
$P(X > 19 \mid X > 12) = \frac{P(X > 19 \text{ and } X > 12)}{P(X > 12)} = \frac{P(X > 19)}{P(X > 12)}$
The numerator simplifies because any count above 19 is automatically above 12. From here, we apply the right-tail formula twice and then divide the two answers.
$P(X > x) = (b-x)\left(\frac{1}{b-a}\right)$
$P(X > 19) = (30-19)\left(\frac{1}{30-10}\right) = 11\left(\frac{1}{20}\right) = .55$
$P(X > 12) = (30-12)\left(\frac{1}{30-10}\right) = 18\left(\frac{1}{20}\right) = .9$
$P(X > 19 \mid X > 12) = \frac{.55}{.9} = .6111$
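The conditional probability in step 7 is a ratio of two right-tail areas. A minimal Python sketch (hypothetical helper names) makes the division exact and shows a useful shortcut: once X > 12 is known, X is uniform on (12, 30), so the answer is simply (30 − 19)/(30 − 12).

```python
from fractions import Fraction

a, b = 10, 30


def right(x):
    """P(X > x) for X ~ U(a, b), assuming a <= x <= b."""
    return Fraction(b - x, b - a)


def conditional_right(x, given):
    """P(X > x | X > given) for x >= given: the event X > x already implies X > given."""
    return right(x) / right(given)


p = conditional_right(19, 12)
print(p, round(float(p), 4))            # 11/18 0.6111
print(p == Fraction(30 - 19, 30 - 12))  # True
```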
8) 78% of all anthrax spore counts detected in the powdered specimen fall below the 78th percentile, x. Find x.
$P(X < x) = .78$
Using the left-tail formula,
$P(X < x) = \frac{x-a}{b-a}$
$.78 = \frac{x-10}{30-10} = \frac{x-10}{20}$
$.78(20) = x - 10$
$15.6 = x - 10$
$x = 15.6 + 10 = 25.6$
The 78th percentile of the number of anthrax spores detected in the powdered specimen is 25.6.
9) 33% of all anthrax spore counts detected in the powdered specimen fall above x. Find x.
$P(X > x) = .33$
$P(X > x) = \frac{b-x}{b-a}$
$.33 = \frac{30-x}{30-10} = \frac{30-x}{20}$
$.33(20) = 30 - x$
$6.6 = 30 - x$
$x = 30 - 6.6 = 23.4$
The value that cuts off the upper 33% of anthrax spores detected in the powdered specimen is 23.4 (equivalently, this is the 67th percentile).
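Steps 8 and 9 solve the left-tail and right-tail formulas for x. A minimal Python sketch of that algebra (hypothetical helper names), including a check that "33% above" and "67% below" give the same cutoff:

```python
a, b = 10, 30


def value_below(p):
    """x with P(X < x) = p: solve p = (x - a)/(b - a) to get x = a + p(b - a)."""
    return a + p * (b - a)


def value_above(p):
    """x with P(X > x) = p: solve p = (b - x)/(b - a) to get x = b - p(b - a)."""
    return b - p * (b - a)


print(round(value_below(0.78), 4))  # 25.6
print(round(value_above(0.33), 4))  # 23.4
```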
Exponential Probabilities
The exponential distribution is often concerned with the amount of time until
some specific event occurs.
For this reason, the exponential distribution is sometimes called the waiting-time distribution.
We will use Excel to find exponential probabilities. Excel's function returns less than probabilities, so you need to rewrite each probability in the less than form before using it. This is very important.
Whether the form is less than or less than or equal to is not important, because exponential probabilities are continuous, not discrete.
• P(x = r): use FALSE in the Excel function (see Example 1)
• P(x ≤ r) is the same as P(x < r)
• P(x ≥ r) = 1 – P(x < r)
• P(x > r) = 1 – P(x < r)
• P(r < x < k) = P(x < k) – P(x < r)
• Expected value is µ or λ
• For exponential distributions, take 1/λ and use this value in the Excel function
• r and k are values of the random variable (for example, a waiting time or a length)
To find Exponential Probabilities we will use the =EXPON.DIST( ) function.
To find the value below which a certain percentage of the distribution falls, you will use the following equation:
$x = \frac{-\ln(1 - \text{percentage})}{1/\lambda}$
You will use the natural log function (ln), the percentage you are looking for, and then 1/λ in the denominator.
Note: this is the same 1/λ that you use in the Excel function.
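For readers who want to check Excel's output, here is a minimal Python sketch of what `EXPON.DIST(x, lambda, cumulative)` computes and of the percentage formula above. The names `expon_dist` and `expon_value_for` are hypothetical. Note the convention used in this document: λ = 30 is the mean, so the value passed to Excel (and to `rate` here) is 1/λ = 1/30. Excel reports #NUM! for negative x; this sketch raises `ValueError` instead.

```python
import math


def expon_dist(x, rate, cumulative):
    """Sketch of what Excel's EXPON.DIST(x, lambda, cumulative) returns.

    `rate` is Excel's lambda argument; in this document that is 1/λ = 1/30.
    """
    if x < 0:
        raise ValueError("x must be non-negative")  # Excel reports #NUM! here
    if cumulative:
        return 1 - math.exp(-rate * x)  # TRUE: P(X <= x)
    return rate * math.exp(-rate * x)   # FALSE: height of the density curve at x


def expon_value_for(percentage, lam):
    """x with P(X < x) = percentage, using -ln(1 - percentage) / (1/λ)."""
    return -math.log(1 - percentage) / (1 / lam)
```

The percentage formula is the cumulative formula solved for x, so feeding its result back into `expon_dist(..., True)` returns the original percentage.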
Example:
A specific species of tree, the western hemlock, was found to have a breast-height diameter distribution that resembled an exponential distribution with λ = 30.
1) Find the probability that a western hemlock tree growing in the forest has a diameter of exactly 23 centimeters.
Because of the word “exactly,” we want to find P(x = 23). We will use the EXPON.DIST() function to find this probability.
In Excel you can compute =1/30 = .03333. We will use this value in the Excel function.
P(x = 23) = EXPON.DIST(23, .03333, FALSE)
In Excel, make sure you type the “=” sign first, then start typing EXPON.DIST. Include the left parenthesis, then type in the x value, then 1/lambda, then either TRUE or FALSE. Then close the parenthesis and hit Enter.
Type TRUE when you have a less than or equal to probability, and type FALSE when you have an equals probability. This example has an “=” sign, so we will use FALSE.
The value returned is .0155, or 1.55%, for a western hemlock tree growing in the forest having a diameter of exactly 23 centimeters.
Note: When you hit “Enter,” the answer will return as a decimal, .0155. You will then need to convert it to a percent. Strictly speaking, FALSE returns the height of the density curve at 23; because the exponential distribution is continuous, the probability of any single exact value is 0, and .0155 is best read as approximately the probability of a diameter within half a centimeter of 23.
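The difference between a density value and a probability can be seen numerically. This minimal Python sketch (rate 1/30 as in the hemlock example; `cdf` is a hypothetical helper) compares what `EXPON.DIST(23, 1/30, FALSE)` returns with the area of a single point and with the area of a 1 cm band around 23.

```python
import math

rate = 1 / 30  # Excel's lambda for the hemlock example (1/λ with λ = 30)


def cdf(x):
    """P(X < x) for the exponential distribution with this rate."""
    return 1 - math.exp(-rate * x)


density = rate * math.exp(-rate * 23)  # EXPON.DIST(23, 1/30, FALSE)
exact_point = cdf(23) - cdf(23)        # area of a single point: always 0
band = cdf(23.5) - cdf(22.5)           # P(22.5 < X < 23.5), a 1 cm band around 23

print(round(density, 4), exact_point, round(band, 4))  # 0.0155 0.0 0.0155
```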
2) Find the probability that a western hemlock tree growing in the forest has a diameter of less than 27 centimeters.
Because of the words “less than,” we will use the less than sign.
This is the probability we want to find: P(x < 27), which is the same as P(x ≤ 27).
P(x < 27) = EXPON.DIST(27, .03333, TRUE)
In Excel, make sure you type the “=” sign first, then start typing EXPON.DIST. Include the left parenthesis, then type in the x value, then 1/lambda, then either TRUE or FALSE. Then close the parenthesis and hit Enter.
Type TRUE when you have a less than probability and FALSE when you have an equals probability. This example has a “<” sign, so we will use TRUE.
There is a 59.34% probability that a western hemlock tree growing in the forest has a diameter of less than 27 centimeters.
Note: When you hit “Enter,” the answer will return as a decimal, .5934. You will then need to convert it to a percent.
3) Find the probability that a western hemlock tree growing in the forest has a diameter that exceeds 25 centimeters.
Because of the word “exceeds,” we will use the greater than sign.
This is the probability we want to find: P(x > 25).
This probability is in the greater than form, NOT the less than form, so we need to rewrite it in the less than or equal to form.
Remember: P(x > r) = 1 – P(x ≤ r)
P(x > 25) = 1 – P(x ≤ 25). Now that the probability is in the less than form, we can use Excel.
1 – P(x ≤ 25) = 1 - EXPON.DIST(25, .03333, TRUE)
In Excel, make sure you type the “=” sign first, then 1 and the minus sign (-), and then EXPON.DIST. Include the left parenthesis, then type in the x value, then 1/lambda, then either TRUE or FALSE. Then close the parenthesis and hit Enter.
Type TRUE when you have a less than probability and FALSE when you have an equals probability. After rewriting, this example has a “≤” sign, so we will use TRUE.
There is a 43.46% probability that a western hemlock tree growing in the forest has a diameter that exceeds 25 centimeters.
Note: When you hit “Enter,” the answer will return as a decimal, .4346. You will then need to convert it to a percent.
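Examples 2 and 3 use the same cumulative function, once directly and once through the complement rule. A minimal Python sketch (hypothetical `expon_cdf`; rate 1/30 entered exactly rather than as a rounded decimal):

```python
import math

rate = 1 / 30  # the document's 1/λ with λ = 30


def expon_cdf(x):
    """P(X < x), the same value as EXPON.DIST(x, 1/30, TRUE)."""
    return 1 - math.exp(-rate * x)


p_less_27 = expon_cdf(27)      # example 2: P(x < 27)
p_more_25 = 1 - expon_cdf(25)  # example 3: P(x > 25) = 1 - P(x <= 25)
print(f"{p_less_27:.2%}  {p_more_25:.2%}")  # 59.34%  43.46%
```

Entering 1/30 (or at least .03333) matters here: a coarser value such as .0333 shifts the second decimal of these percentages.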
4) Find the probability that a western hemlock tree growing in the forest has a diameter between 28 and 34 centimeters.
Because of the word “between,” we will use P(r < x < k) = P(x < k) – P(x < r). This is the probability we want to find: P(28 < x < 34).
P(28 < x < 34) = P(x < 34) – P(x < 28)
Now that the probability is in the less than form, we can use Excel.
P(x < 34) – P(x < 28) = EXPON.DIST(34, .03333, TRUE) - EXPON.DIST(28, .03333, TRUE)
In Excel, make sure you type the “=” sign first, then EXPON.DIST. Include the left parenthesis, then type in the x value, then 1/lambda, then TRUE, then close the parenthesis. Type the minus sign (-), repeat the steps for the second EXPON.DIST, and hit Enter.
Type TRUE when you have a less than probability and FALSE when you have an equals probability. This example has “<” signs, so we will use TRUE.
There is a 7.13% probability that a western hemlock tree growing in the forest has a diameter between 28 and 34 centimeters.
Note: When you hit “Enter,” the answer will return as a decimal, .0713. You will then need to convert it to a percent.
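The "between" rule is a difference of two cumulative values, which for the exponential distribution simplifies to a difference of two exponentials. A minimal Python sketch (hypothetical `expon_cdf`):

```python
import math

rate = 1 / 30  # the document's 1/λ with λ = 30


def expon_cdf(x):
    """P(X < x), the same value as EXPON.DIST(x, 1/30, TRUE)."""
    return 1 - math.exp(-rate * x)


p_between = expon_cdf(34) - expon_cdf(28)  # P(28 < X < 34)
print(f"{p_between:.2%}")  # 7.13%
```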
5) 63% of all trees will have a diameter of less than how many centimeters?
We will plug this equation into Excel:
$x = \frac{-\ln(1 - \text{percentage})}{1/\lambda}$
There is an LN function in Excel, and the percentage we will use is .63. In Excel this is =-LN(1-0.63)/(1/30).
63% of all trees will have a diameter of less than 29.8276 centimeters.
You can also plug this in and calculate it by hand, but I find Excel a lot easier:
$x = \frac{-\ln(1 - .63)}{1/30} = 29.8276$
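The same calculation in a minimal Python sketch, with a check that plugging the answer back into the cumulative formula returns the 63% we started from:

```python
import math

mean = 30          # the document's λ = 30
percentage = 0.63

x = -math.log(1 - percentage) / (1 / mean)  # Excel: =-LN(1-0.63)/(1/30)
print(round(x, 4))                          # 29.8276
print(round(1 - math.exp(-x / mean), 2))    # 0.63
```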
Normal distributions are the most common distributions in statistics. If a random variable X is normally distributed with mean μ and standard deviation σ, we write X ~ N(μ, σ); the standard normal variable is Z ~ N(0, 1). Normal distributions are known as “bell-shaped curves.” To find the probabilities of normal distributions using a Normal Distribution Table, we would start by converting the x values to the standard normal z-curve. The equation for the z-score is:
$z = \frac{x - \mu}{\sigma}$
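Standardizing and then using the standard normal curve gives the same answer as working with the original mean and SD directly. This minimal Python sketch (standard-library `statistics.NormalDist`, not an Excel function) shows both routes for x = $24,000 using the Week 2 car mean and SD.

```python
from statistics import NormalDist

mu, sigma = 25650, 3488.47  # Week 2 car data: mean and SD
x = 24000

z = (x - mu) / sigma                     # standardize: z = (x - μ)/σ
p_table = NormalDist().cdf(z)            # what a z-table lookup gives
p_direct = NormalDist(mu, sigma).cdf(x)  # what NORM.DIST(x, μ, σ, TRUE) gives
print(round(z, 4), round(p_table, 4), round(p_direct, 4))
```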
Nowadays, we do not need to do this conversion to the standard normal distribution, since Excel does it automatically for us. Excel's NORM.DIST function returns less than probabilities, so it is important to rewrite your problem using only the less than inequality (<). Whether the inequality is less than or less than or equal to is not important, because normal probabilities are continuous, not discrete. Here are some common normal probabilities and how they are rewritten in the less than form to use Excel:
• P(X ≤ j) is the same as P(X < j)
• P(X ≥ j) = 1 – P(X < j)
• P(j < X < k) = P(X < k) – P(X < j)
• Expected value = µ (mean)
• Standard deviation = σ (SD)
To find normal probabilities we will use the =NORM.DIST( ) function. The Central Limit Theorem states that, given any distribution with mean μ and standard deviation σ, the distribution of the sample mean approaches a normal distribution as the sample size, n, increases. The mean of the sample mean equals the original mean (new μ = old μ), and the new standard deviation of the sample mean (also called the standard error) is:
$\text{new SD (standard error)} = \frac{\sigma}{\sqrt{n}}$
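The standard error formula can be checked two ways: directly, and by simulating many 5-car samples. This is a minimal Python sketch; the simulation assumes prices follow N(25650, 3488.47), and the seed and the 20,000 repetitions are arbitrary choices.

```python
import math
import random
import statistics


def standard_error(sd, n):
    """New SD of the sample mean: σ / √n."""
    return sd / math.sqrt(n)


print(round(standard_error(3488.47, 5), 2))  # 1560.09

# Simulation check: SD of many 5-car sample means (assumes prices ~ N(25650, 3488.47)).
random.seed(1)
means = [statistics.mean(random.gauss(25650, 3488.47) for _ in range(5)) for _ in range(20000)]
sim_sd = statistics.stdev(means)
print(round(sim_sd))  # close to 1560; the exact value depends on the random draws
```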
Let’s use our Car Price Data from Week 2 and calculate 4 different probabilities.
Car Price:
Observation 1: $20,000
Observation 2: $25,000
Observation 3: $30,000
Observation 4: $31,000
Observation 5: $22,500
Observation 6: $25,000
Observation 7: $29,500
Observation 8: $24,000
Observation 9: $24,500
Observation 10: $25,000
1. Using our data, we believe that the cost of this type of car is normally distributed with a mean of $25,650 and an SD of $3,488.47. Assume that 5 additional cars are randomly sampled and their prices are recorded. What is the probability that the sample mean price of the 5 new cars will be less than $24,000?
The probability is already in the less than form, P(x̄ < 24,000), so we do not need to do any additional work in Excel to find the probability. We also notice that the new sample size is n = 5. The mean will stay the same, but we will need to calculate a new SD. We will apply the Central Limit Theorem to do this; because the car prices are assumed to be normally distributed, the sample mean is normally distributed even for a sample as small as n = 5. In Excel, type the “=” sign, click on the cell that contains the old SD, type the “/” sign, and then use the SQRT( ) function with 5 inside the parentheses, because the new sample size is 5.
$\text{new SD} = \frac{\sigma}{\sqrt{n}} = \frac{3488.47}{\sqrt{5}} = 1560.09$
Next, we want to find P(x̄ < 24,000), and we will use the NORM.DIST() function in Excel to do this. P(x̄ < 24,000) = NORM.DIST(24000, 25650, 1560.09, TRUE)
In Excel, make sure you type the “=” sign first, then start typing NORM.DIST. Include the left parenthesis, then type in the x value, the mean, the standard deviation, and then TRUE. Then close the parenthesis and hit Enter. ALWAYS type TRUE for continuous probabilities (the normal distribution is continuous). This example has a “<” sign, so we will use TRUE.
The probability that the sample mean for the new sample of 5 cars is below $24,000 is 14.51%. Remember: once you hit “Enter,” the answer returns as a decimal. You need to convert it to a percentage if you want to read it as a percentage.
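A minimal Python cross-check of problem 1, using the standard-library `NormalDist` in place of Excel's NORM.DIST (it computes the same normal cumulative probability). The standard error is computed from σ/√5 rather than typed in as 1560.09, which changes nothing at two decimal places.

```python
from statistics import NormalDist

mean, sd, n = 25650, 3488.47, 5
se = sd / n ** 0.5                    # new SD of the sample mean
xbar = NormalDist(mu=mean, sigma=se)  # normal because the population is assumed normal

p1 = xbar.cdf(24000)                  # same as NORM.DIST(24000, 25650, se, TRUE)
print(f"P(x̄ < 24,000) = {p1:.2%}")    # P(x̄ < 24,000) = 14.51%
```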
2. Assume that 5 additional cars are randomly sampled and their prices are recorded. What is the probability that the sample mean price of the 5 new cars will be higher than $25,000? Because of the words “higher than,” we want to find P(x̄ > 25,000). Since we are using the same data and the same sample size, the mean and the new SD stay the same. Remember that the Excel function returns less than probabilities, so we need an extra step in Excel to get the probability we want.
P(x̄ > 25,000) = 1 - NORM.DIST(25000, 25650, 1560.09, TRUE). In Excel, make sure you type the “=” sign first, then 1 and the minus sign (-), and then start typing NORM.DIST. Include the left parenthesis, then type in the x value, the mean, the standard deviation, and then TRUE. Then close the parenthesis and hit Enter.
The probability that the sample mean for the new sample of 5 cars is above $25,000 is 66.15%. Remember: once you hit “Enter,” the answer returns as a decimal. You need to convert it to a percentage if you want to read it as a percentage.
3. Assume that 5 additional cars are randomly sampled and their prices are recorded. What is the probability that the sample mean price of the 5 new cars will be between $24,000 and $25,000? Because of the word “between,” we want to find P(24,000 < x̄ < 25,000). Since we are using the same data and the same sample size, the mean and the new SD stay the same. Remember that the Excel function returns less than probabilities, so we need an extra step in Excel to get the probability we want.
P(24,000 < x̄ < 25,000) = P(x̄ < 25,000) – P(x̄ < 24,000) = NORM.DIST(25000, 25650, 1560.09, TRUE) - NORM.DIST(24000, 25650, 1560.09, TRUE)
In Excel, make sure you type the “=” sign first, then start typing NORM.DIST. Include the left parenthesis, then type in the x value, the mean, the standard deviation, and then TRUE. Then close the parenthesis, type the minus sign (-), repeat the steps for the second NORM.DIST, and then hit Enter.
The probability that the sample mean for the new sample of 5 cars is between $24,000 and $25,000 is 19.34%. Remember: once you hit “Enter,” the answer returns as a decimal. You need to convert it to a percentage if you want to read it as a percentage.
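Problems 2 and 3 apply the complement rule and the "between" rule to the same sampling distribution. A minimal Python cross-check with the standard-library `NormalDist` (standing in for NORM.DIST), which should land on the 66.15% and 19.34% reported above:

```python
from statistics import NormalDist

xbar = NormalDist(mu=25650, sigma=3488.47 / 5 ** 0.5)  # sample mean of 5 cars

p2 = 1 - xbar.cdf(25000)                # P(x̄ > 25,000)
p3 = xbar.cdf(25000) - xbar.cdf(24000)  # P(24,000 < x̄ < 25,000)
print(f"{p2:.2%}  {p3:.2%}")
```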