What to submit: Please submit a single Word file containing your numerical results, comments, and graphics (if any) for all questions. Also submit any worksheets you used to produce the report, for a total of two separate files. (Teaching assistants will review the worksheets if there are errors in the report.)

We have been tossing coins (or letting a computer toss them for us) to see what happens. One thing we learned is that we can get a slightly different answer each time we do it. In some cases, however, it is possible to get an exact answer, and this assignment explores such cases. You can read the Exhaustion Method to see how to generate a list of all possible outcomes for a coin-tossing experiment. We suggest you use it to list the outcomes for seven tosses; we hope this will show you that such a list is not too difficult to make if you approach it in a logical and organized way. However, you do not *need* to follow that process to do this assignment. Please use the list of all 128 outcomes to answer Questions 8 to 10 and to check the list you made.
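If you would rather let a computer do the bookkeeping, the Python sketch below lists every outcome of seven tosses in a fixed, systematic order, much like an odometer rolling over. It uses the standard library's `itertools.product` and is not necessarily the same procedure as the Exhaustion Method; the helper name `all_outcomes` is hypothetical, introduced only for illustration.

```python
from itertools import product


def all_outcomes(n_tosses):
    """Hypothetical helper: every H/T sequence of length n_tosses.

    product("HT", repeat=n) varies the last toss fastest, so the list
    comes out in a logical, organized (lexicographic) order.
    """
    return ["".join(seq) for seq in product("HT", repeat=n_tosses)]


outcomes = all_outcomes(7)
print(len(outcomes))   # 128, i.e. 2**7 sequences
print(outcomes[:4])    # ['HHHHHHH', 'HHHHHHT', 'HHHHHTH', 'HHHHHTT']
```

Because every sequence appears exactly once, the printed list can be compared line by line with the one you made by hand.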

Statistics 1 Assignment 2 (40 points)

Q.1 (2 pts) In a survey of engineers at a hard drive manufacturer it was found that 18% were female, 7% were black, 35% had degrees in electrical or computer engineering, and 40% were under the age of 35. Would it make sense to present this information in a pie chart? Why or why not?

Q.2 (2 pts)

In basketball, some fouls result in “free throws” (unimpeded shots) by the player fouled. Over his career, a basketball player has scored on 1210 free throw attempts and missed 214 free throw attempts. What is his estimated probability of successfully scoring on a free throw attempt?
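The estimate is the relative frequency of successes among *all* attempts, so the denominator is made plus missed, not the number missed alone. A minimal Python sketch of that arithmetic, using the counts given in the question:

```python
made = 1210     # successful free throws (from the question)
missed = 214    # missed free throws (from the question)

attempts = made + missed       # every attempt counts in the denominator
p_hat = made / attempts        # relative frequency = estimated probability
print(f"{made}/{attempts} = {p_hat:.4f}")
```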

Q.3 (2 pts)

A political commentator made the following observation in 1991: “From 1973 to 1982, the US economy grew at an annual rate of only 2%. From 1983 to 1990, the growth rate doubled to 4%. That’s a big difference.” Review the spreadsheet on GDP presented in this chapter and critique this statement, especially with respect to the choice of comparison periods.

Q.4 (2 pts)

Consider the following data on the median home value in Boston neighborhoods (from the mid 20th century):

22, 13.1, 17.8, 20.3, 15.4, 11.7, 25.3, 15.2, 27.1, 23.2, 23.1, 18.1, 32.9, 20.3, 21.1, 21.1, 19.9, 23.1, 16.1, 10.4

Find the standard normal score (z-score) for the first value (22). (For the purpose of calculating the standard deviation, you may treat the data either as the entire population or as a sample.)
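The standard normal score is z = (x - mean) / SD. The Python sketch below computes it both ways the question allows, using the standard `statistics` module: `pstdev` treats the 20 values as the whole population (divides by n), while `stdev` treats them as a sample (divides by n - 1), so the sample-based z-score comes out slightly smaller.

```python
import statistics

# Median home values from the question, in the order listed
values = [22, 13.1, 17.8, 20.3, 15.4, 11.7, 25.3, 15.2, 27.1, 23.2,
          23.1, 18.1, 32.9, 20.3, 21.1, 21.1, 19.9, 23.1, 16.1, 10.4]

mean = statistics.mean(values)
sd_population = statistics.pstdev(values)   # population SD: divide by n
sd_sample = statistics.stdev(values)        # sample SD: divide by n - 1

x = values[0]                               # the first value, 22
z_population = (x - mean) / sd_population
z_sample = (x - mean) / sd_sample

print(f"mean = {mean:.2f}")
print(f"z (population SD) = {z_population:.3f}")
print(f"z (sample SD)     = {z_sample:.3f}")
```

Either answer is acceptable per the question; a positive z-score means 22 lies above the mean.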

Q.5 (3 pts)

Evidence has been produced that famous people are less likely to die in the month of their birthday than in other months. The (skeptical) hypothesis is that dying is equally likely in any month regardless of birthday.

Now suppose that out of 120 celebrity deaths, only 7 occurred in the month of their birthday.

Imagine a hat with 12 cards, each card a month, as well as a list of the 120 celebrity birthdays. We shuffle and pick a card, noting whether it matched the first celebrity birth month. We then repeat this (replacing the card each time, of course), each time noting whether the month picked from the hat matched the next birth month, etc., until we have gone all the way through the 120 names on the list.

Then we repeat this procedure 100 times, each time recording how many matches we got between the 120 picks from the hat, and the list of 120 birthdays. We got the following frequency distribution. What is your conclusion and why?

| Number dying in birthday month | Frequency |
|---|---|
| 6 | 1 |
| 7 | 3 |
| 8 | 9 |
| 9 | 20 |
| 10 | 32 |
| 11 | 25 |
| 12 | 7 |
| 13 | 1 |
| 14 | 2 |
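One way to read the table, sketched in Python below, is to compare the observed 7 with what the skeptical hypothesis predicts (each pick matches with probability 1/12, so about 10 matches in 120) and to compute the proportion of the 100 simulated runs that gave 7 or fewer matches. Counting only the low side is an assumption: it treats the test as one-sided, in line with the claim that famous people are *less* likely to die in their birthday month. The conclusion you draw from the resulting number is still up to you.

```python
# Simulated frequency table from the question: number of matches -> number of runs
frequency = {6: 1, 7: 3, 8: 9, 9: 20, 10: 32, 11: 25, 12: 7, 13: 1, 14: 2}
observed = 7                         # celebrity deaths in the birthday month
n_runs = sum(frequency.values())     # should be the 100 repetitions

# Expected matches if every month is equally likely: 120 picks, 1/12 chance each
expected = 120 / 12

# One-sided: share of simulated runs at least as low as the observed count
as_low_or_lower = sum(f for matches, f in frequency.items() if matches <= observed)
p_value = as_low_or_lower / n_runs
print(n_runs, expected, p_value)
```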

Q.6 (4 pts) Run the CBC simulation you already ran nine more times and report the ten p-values you obtain. That is, you will (1) toss a coin ten times, (2) repeat step 1 a thousand times, recording what proportion of the 1000 trials got 7 or more heads, and (3) repeat steps 1 and 2 nine more times, for a total of 100,000 tosses. NOTE: here is a <link> to an Excel spreadsheet already set up to do this, and to another Excel for Windows spreadsheet that uses Box Sampler (you can download the macro-enabled workbook or install Box Sampler on your Windows computer). So all you really need to do is press a key or click a couple of menu items ten times and write down what you get. Then make a clear statistical summary of the results and estimate the true p-value. Also estimate how far your estimate might be from the true value.
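The same experiment can also be sketched outside Excel. The Python code below is a hypothetical stand-in for the spreadsheet and Box Sampler, not the provided tool: `simulate_p_value` and the seed 2008 are illustrative choices. It runs ten batches of 1000 ten-toss experiments, records the proportion of experiments with 7 or more heads in each batch, and then summarizes the ten p-values. Taking the standard deviation of the ten values divided by √10 (the standard error of their mean) is one common way to express how far the averaged estimate might be from the true value; reporting the range of the ten values would also be reasonable.

```python
import random
import statistics


def simulate_p_value(n_tosses=10, threshold=7, n_trials=1000, rng=random):
    """Hypothetical helper: proportion of n_trials experiments, each of
    n_tosses fair-coin tosses, that produce at least `threshold` heads."""
    hits = 0
    for _ in range(n_trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        if heads >= threshold:
            hits += 1
    return hits / n_trials


rng = random.Random(2008)   # arbitrary fixed seed so a run can be reproduced
p_values = [simulate_p_value(rng=rng) for _ in range(10)]   # ten batches

estimate = statistics.mean(p_values)
spread = statistics.stdev(p_values)          # batch-to-batch variation
std_error = spread / len(p_values) ** 0.5    # uncertainty of the average
print([round(p, 3) for p in p_values])
print(f"estimate = {estimate:.4f} +/- {2 * std_error:.4f} (about 2 standard errors)")
```

Changing the seed (or dropping it) gives a fresh set of ten p-values, just as pressing the key again does in the spreadsheet.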

Q.7 (5 pts) This exercise continues our work with the CBC story. In the text we remarked that cutting the number of major medical errors in half would have been more impressive if the number of errors had been larger. Redo question 6, but this time imagine we had 20 major medical errors to assign to years. If you use one of the spreadsheets we provided, you will need to make at least these changes:

a. Change the number of tosses from 10 to 20.

b. The formula that counts how many times 2008 came up will have to be changed to point to a range of 20 numbers rather than 10.

c. The table where you record the frequency distribution of the outcomes will have to expand to 21 rows instead of 11. (Make good use of cut-and-paste here.)

d. Cutting the errors in half will now mean 14 or more errors in 2008, so you will have to change what you count in the frequency table when you compute the p-value.

Report a frequency table of outcomes and a p-value. Compare the p-value to what you got with just 10 medical errors.
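Your simulated p-values should land close to the exact binomial tail probabilities, so these make a useful check. As an optional cross-check (not something the question asks for), the Python sketch below counts the favorable sequences exactly with `math.comb` and divides by the number of equally likely sequences, 2^10 or 2^20. This is the same idea as the exact answers obtained from the list of 128 outcomes, scaled up; `exact_tail` is a hypothetical helper name.

```python
from math import comb


def exact_tail(n_tosses, threshold):
    """Hypothetical helper: exact P(at least `threshold` heads in n_tosses
    fair-coin tosses) = favorable sequences / all 2**n_tosses sequences."""
    favorable = sum(comb(n_tosses, k) for k in range(threshold, n_tosses + 1))
    return favorable / 2 ** n_tosses


p10 = exact_tail(10, 7)    # 10 errors, 7 or more in 2008
p20 = exact_tail(20, 14)   # 20 errors, 14 or more in 2008
print(f"10 errors: {p10:.4f}   20 errors: {p20:.4f}")
```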

Q.8 (3 Points) Use the list of 128 outcomes for seven tosses of a coin (the link is in the assignment introduction) to make a table for the frequency and probability distribution of the random variable “number of heads in seven tosses”. (You can do this simply by counting.) There should be eight possible values, and each requires a probability. What do your probabilities add up to?
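Counting by hand is the intended method, but the tally can be checked with a short Python sketch that regenerates the 128 sequences and counts the heads in each. Using `fractions.Fraction` keeps every probability exact (a count over 128) instead of a rounded decimal, so the final sum can be compared with 1 exactly.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**7 = 128 equally likely sequences of seven tosses
outcomes = ["".join(seq) for seq in product("HT", repeat=7)]

# Frequency of each value of X = number of heads
freq = Counter(seq.count("H") for seq in outcomes)

total = 0
for heads in range(8):                       # X can be 0, 1, ..., 7
    prob = Fraction(freq[heads], len(outcomes))
    total += prob
    print(heads, freq[heads], prob, float(prob))
print("sum of probabilities:", total)
```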

Q.9 (2 Points) Use the table you made above to compute the probability that the number of heads will be at least double the number of tails (this translates to “five or more heads”).
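To see why “at least double the number of tails” means “five or more heads”: with h heads there are 7 - h tails, and h ≥ 2(7 - h) simplifies to 3h ≥ 14, that is, h ≥ 14/3 ≈ 4.67, so h must be at least 5. The Python sketch below checks that translation directly against the 128 outcomes and computes the probability as an exact fraction; `heads_at_least_double_tails` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

outcomes = ["".join(seq) for seq in product("HT", repeat=7)]


def heads_at_least_double_tails(seq):
    """Hypothetical helper: True when #heads >= 2 * #tails."""
    heads = seq.count("H")
    tails = len(seq) - heads
    return heads >= 2 * tails


favorable = [seq for seq in outcomes if heads_at_least_double_tails(seq)]
prob = Fraction(len(favorable), len(outcomes))

# With 7 tosses the condition picks out exactly the sequences with 5+ heads
assert favorable == [seq for seq in outcomes if seq.count("H") >= 5]
print(len(favorable), prob, float(prob))
```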

Q.10 (8 Points Total) Suppose you bought some really cheap blank DVDs at the dollar store. Then you looked them up on the web and found that half of these disks are dead on arrival, and when data are recorded on the rest, about half of those become unreadable within the first year, half of the survivors die in the second year, and so on. Let’s see what happens over seven years. We can model the distribution of “time before failure” with a coin toss, where a head represents a failure. Make a table for the probability distribution of the number of tosses before you get the first head. For example, this random variable assigns 3 to TTTHTHT and 0 to HTHTHTH, meaning one disk lasted three years and another was dead on arrival. (Make sure you get the right counts for these two examples before you continue.) If you never get a head (TTTTTTT), assign the value 7.

Use the “list of 128 outcomes” linked in the instructions for this homework to answer the following two questions.

a. (5 points) What is the probability that a disk will last 6 years (i.e. 7 tosses of the coin) before failing?

b. (3 points) The mean time before failure (MTBF) is often of interest in such situations; it is a common specification for computer hard drives. Find the MTBF for the distribution above.
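A sketch of the counting in Python, assuming the random variable is the number of tails before the first head (7 when no head appears), as defined above. `str.find` returns the position of the first "H", which equals the number of tails in front of it, or -1 when there is no head. The helper `years_before_failure` is a hypothetical name. The MTBF is the mean of the distribution: the sum of each value times its probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def years_before_failure(seq):
    """Hypothetical helper: tails before the first head; 7 if no head (TTTTTTT)."""
    first_head = seq.find("H")       # -1 when the sequence has no head
    return len(seq) if first_head == -1 else first_head


# Worked examples from the question
assert years_before_failure("TTTHTHT") == 3
assert years_before_failure("HTHTHTH") == 0

outcomes = ["".join(seq) for seq in product("HT", repeat=7)]
counts = Counter(years_before_failure(seq) for seq in outcomes)
dist = {k: Fraction(counts[k], len(outcomes)) for k in range(8)}

mtbf = sum(k * p for k, p in dist.items())   # mean of the distribution
for k, p in dist.items():
    print(k, counts[k], p)
print("MTBF:", mtbf, "=", float(mtbf), "years")
```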

Note: Could you use a simulation to solve these problems? Yes, and a link to a Box Sampler solution will be included in the model answer. However, the setup is somewhat involved, and we are not asking you to do a simulation for this problem. This type of simulation is better suited to a programming language than to a statistical analysis package.

Q.11 (1 pt) Here is a table of column percentages for Department D in the Berkeley study of graduate admissions.

| | Female | Male | All |
|---|---|---|---|
| Admitted | 34.93 | 33.09 | 33.96 |
| Rejected | 65.07 | 66.91 | 66.04 |
| All | 100.00 | 100.00 | 100.00 |

What does this tell you about the relative admission rates of males and females in this department?

Q.12 (2 pts) Here is a table for Department F at UC Berkeley.

| | Admitted | Rejected | All |
|---|---|---|---|
| Female | 24 | 317 | 341 |
| Male | 22 | 351 | 373 |
| All | 46 | 668 | 714 |

Read it carefully and do an appropriate computation to compare the admission rates of males and females.
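The raw counts (24 versus 22 admitted) are misleading on their own because the two groups differ in size. One appropriate computation, sketched in Python below, divides each row's admitted count by that row's total to get an admission rate (a row percentage) for each group.

```python
# Department F counts from the table: admitted and total applicants per row
female_admitted, female_total = 24, 341
male_admitted, male_total = 22, 373

# Compare rates, not raw admitted counts: the groups have different sizes
female_rate = female_admitted / female_total
male_rate = male_admitted / male_total
print(f"female: {female_rate:.2%}   male: {male_rate:.2%}")
```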

Q.13 (4 pts) Here is a contingency table for the variables Dept. and Admit from Berkeley.

| | A | B | C | D | E | F | All |
|---|---|---|---|---|---|---|---|
| Admitted | 601 | 370 | 322 | 269 | 147 | 46 | 1755 |
| Rejected | 332 | 215 | 596 | 523 | 437 | 668 | 2771 |
| All | 933 | 585 | 918 | 792 | 584 | 714 | 4526 |

Find P(C), P(R), and P(C∩R). (Note: “C” means Department C and “R” means “rejected”; the last quantity is the probability of C intersect R, in case your browser does not display math symbols.)
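Each probability here is a count divided by the grand total of 4526 applicants: a column total for P(C), a row total for P(R), and a single cell for the intersection. A minimal Python sketch of that arithmetic follows. Note that P(C∩R) is read from the cell, not computed as P(C) × P(R); the product would equal it only if department and outcome were independent.

```python
# Counts read from the contingency table
grand_total = 4526      # all applicants (bottom-right cell)
dept_c_total = 918      # column total for Department C
rejected_total = 2771   # row total for Rejected
c_and_rejected = 596    # single cell: Department C and Rejected

# Each probability = (applicants with the property) / (all applicants)
p_c = dept_c_total / grand_total
p_r = rejected_total / grand_total
p_c_and_r = c_and_rejected / grand_total
print(f"P(C) = {p_c:.4f}, P(R) = {p_r:.4f}, P(C and R) = {p_c_and_r:.4f}")
```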

## Possible Outcomes for Seven Tosses of a Fair Coin
