Paul E. Hand
MA 224.1
Fall 2006

Wednesday, 6 Sep:
Today's discussion about the predictive power of two voters was murky. This was somewhat intended. All we actually said was:

If an individual is voting at random, it is highly improbable that s/he will vote for the winning candidate 20 times in a row. Nonetheless, it is probable that in a population of 200,000,000 such individuals there is someone who is right at least 20 times in a row.
The points I hope you got out of the discussion are:
• There is a need for clear, precise statements that do not overstate their claim.
• Real systems are complicated. Make statistical inference with care.
• Some seemingly improbable things are not.
Friday, 8 Sep:
We discussed limitations and problems with the Laplace definition of probability. In class, two of you commented that it would be difficult to use this to describe people's actions and whether the L train is late. These are great answers. Indeed, it would be a challenging task to come up with and justify some collection of equiprobable states of the world that would allow us to calculate probabilities a priori. The Laplace definition is not useful in these cases because the world isn't so simple.

We discussed simulation data. We had ~20 data points corresponding to 100 simulated matches from the problem of the points. We also had ~20 data points corresponding to 10 simulated matches. We came up with important observations and differences between these results. The most important ones are:

• Both were roughly centered around 75%.
• The data with 100 simulated trials had less fractional spread around 75%.
• With 100 simulated trials, no one actually got 75/100 wins for player 1.
• While most people did get closer to a 75% win percentage for player 1, some individual data points were closer to 67%.
We will analyze these statements further throughout the rest of the class. Keep in mind, there will always be variability in random outcomes. With enough trials, it is very likely some will be "extreme."

Wednesday, 13 Sep:
Perhaps the whole point of this lecture was not so clear. Last Friday, we discussed how we could use trees to find probabilities by directly counting the relevant cases. Because that approach quickly becomes infeasible, we wondered whether we could somehow count the cases of interest without enumerating every element of the state space. The big question we are trying to answer now is:

If we flip a coin n times, in how many ways can we get k heads?

The notes enumerated the n=6 and k=1 and 2 cases. We tried to find a formula for general n and k based on the logic in the notes. That logic is:

If we have k heads in n tosses, this must mean that the first heads occurred by toss n-k+1 at the latest. If the first heads didn't show by this toss, then it is not possible to fit k heads into the remaining k-1 tosses. Similarly, the second heads must have occurred by toss n-k+2. The kth heads must have occurred by toss n.

In the end, we got a formula involving k summations. Because this answer is somewhat ugly, we have reached a dead end. We need a different way of looking at the question above in hopes of getting a nice, simple answer. We will discuss this way on Friday.

In lecture, I am trying to show you how difficult problems get solved. This means we will run into dead ends quite often, as an initial approach to solving the problem might not be fruitful.

Friday, 15 Sep:

The most important thing we did in class was counting the number of ways to permute words with some repeated letters. The logic is as follows. Consider the word GOGGLE. We can make each letter distinct by assigning subscripts 1,2,3 to the G's. There are 6! ways to permute the 6 distinct letters G1 O G2 G3 L E. For any permutation of GOGGLE, there are 3! ways to assign the subscripts, so the number of permutations of GOGGLE is 6!/3!.
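
A quick way to convince yourself of the GOGGLE count is brute force. Here is a short Python sketch (Python rather than the Matlab we use for simulations, purely for illustration) that enumerates every ordering of the letters and compares the count of distinct ones against the 6!/3! formula:

```python
from itertools import permutations
from math import factorial

# Brute force: collect the distinct orderings of the letters of GOGGLE.
# permutations() treats the three G's as if they were subscripted, so
# putting the results in a set collapses the duplicates.
distinct_orderings = set(permutations("GOGGLE"))

# Formula from class: 6! permutations of six distinct letters,
# divided by the 3! ways to assign subscripts to the three G's.
formula_count = factorial(6) // factorial(3)

print(len(distinct_orderings), formula_count)  # both are 120
```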

We applied this logic to the case of a word with k H's and (n-k) T's. There are k! (n-k)! ways to distinguish the H's and T's. There are n! ways to permute n distinct letters. So we get the result that there are n! / (k! (n-k)!) permutations of the word with k H's and n-k T's.

Another important remark was that "n choose k" represents the number of ways of selecting k objects from a group of n distinguishable ones, where the order of selection doesn't matter.

A remark on homework. Your homework solutions will need to meet high standards of correctness and explanation. If your score on a HW problem isn't as high as you would like, you are welcome to revise it for full credit. A course grade of A requires getting a lot of 4's and 5's on the homework AND being able to orally explain your answers. Rushing through solutions will not be a good use of your time, as you will likely need to revise the answer and figure out how to explain it. I would rather you give a well-thought-out answer a few days late than a poorly understood answer on time. The homeworks will keep piling up; do not get behind.

Friday, 22 Sep:

What you should understand most from today's lecture was our derivation of n choose k. Recall that n choose k is the number of ways of selecting k objects from a set of n distinguishable objects where the order of selection doesn't matter. To find the value of n choose k, we consider a bag of n distinguishable objects. We reach in and pull out one object. Then we reach in and pull out a second. We continue until we have pulled out k objects. The number of ways of pulling out k objects with order mattering is n*(n-1)*(n-2)* ... *(n-k+1) = n! / (n-k)!. Since each group of k objects can be pulled out in any of k! orders, we divide by k! to arrive at the number of distinct groups of k objects. Hence n choose k = n! / (k! (n-k)!).
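
The derivation translates directly into a short Python sketch (the function names here are mine, not standard notation): the "pull out k objects with order mattering" count is a falling factorial, and dividing by k! collapses each group that was counted k! times:

```python
from math import factorial

def falling_factorial(n, k):
    """n * (n-1) * ... * (n-k+1): ordered ways to pull k of n objects."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

def choose(n, k):
    """Each group of k objects shows up in k! of the ordered pulls."""
    return falling_factorial(n, k) // factorial(k)

print(choose(6, 3))   # 20
print(choose(52, 5))  # 2598960 five-card hands from a deck
```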

If you don't fully believe today's "strange but true," you should consider simulating it. You could write a Matlab program that picks out 5 cards at random from a deck. Do this, say, 100000 times. Count the number of hands with an Ace. Of those, count the number of hands with at least 2 aces. Count the number of hands with the Ace of spades. Of those, count the number of hands with at least two aces. You should get pretty close to the probabilities derived in class. Some useful Matlab functions for this are "deck=randperm(52); hand=deck(1:5)". These two lines first randomly rearrange the numbers 1-52 into the variable deck. Then the variable hand is set to the first 5 elements.
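
If you prefer to work in Python, here is a rough equivalent sketch of that simulation (random.sample plays the role of Matlab's randperm; representing the ace of spades as card 0 and the other aces as cards 1-3 is an arbitrary choice):

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

ACES = {0, 1, 2, 3}   # the card numbers we designate as the four aces
ACE_OF_SPADES = 0     # one particular ace

trials = 100000
with_ace = with_ace_pair = 0
with_spade = with_spade_pair = 0

for _ in range(trials):
    hand = set(random.sample(range(52), 5))  # 5 cards, no replacement
    n_aces = len(hand & ACES)
    if n_aces >= 1:
        with_ace += 1
        if n_aces >= 2:
            with_ace_pair += 1
    if ACE_OF_SPADES in hand:
        with_spade += 1
        if n_aces >= 2:
            with_spade_pair += 1

# P(at least 2 aces | some ace) vs P(at least 2 aces | ace of spades):
# the second conditional probability comes out noticeably larger.
print(with_ace_pair / with_ace)
print(with_spade_pair / with_spade)
```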

Friday, 29 Sep:

Today, we introduced the concept of the significance level of a hypothesis test. A test decides between two hypotheses, Ho and HA, based on the outcome of some random experiment. In class, Ho was the hypothesis that a coin is fair, and HA was the hypothesis that it is unfair. We rejected Ho if the observed number of heads, X, was far from expected. We defined the rejection region as the range of X which occurs with less than 0.05 probability. If X happens to be in this range, we reject Ho; otherwise we accept it.

The significance level was the probability that the test declares the coin biased when the coin is actually fair.
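
As an illustration, here is a Python sketch that builds such a rejection region for a hypothetical version of the class example with 100 tosses. It grows the region outward from the extreme values of X until adding more values would push its probability under Ho past 0.05:

```python
from math import comb

n = 100  # assumed: 100 tosses of a coin that is fair under Ho

def pmf(k):
    # P(X = k) for a fair coin: C(n, k) / 2^n
    return comb(n, k) * 0.5 ** n

# Grow a symmetric rejection region {X <= c} union {X >= n - c},
# stopping at the largest c whose total probability stays under 0.05.
c = 0
while 2 * sum(pmf(k) for k in range(c + 1)) <= 0.05:
    c += 1
c -= 1  # last c for which the two tails together stay under 0.05

alpha = 2 * sum(pmf(k) for k in range(c + 1))  # the significance level
print(c, n - c, alpha)  # reject Ho when X <= c or X >= n - c
```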

Wednesday, 4 Oct:

The most important point of today was that there is more to life than significance levels. That is to say, while good tests have low significance levels, not all tests with low significance levels are good. You should be familiar with the examples that illustrate this point.

A more general point about statistics that these examples illustrate is that in order to believe our statistical inferences, we must understand the underlying process being analyzed. In the case of coins, this understanding tells us that getting a fraction of heads very far from 1/2 is cause to doubt fairness.

Wednesday, 18 Oct:

We considered a population of N people. We compared surveys of n people, with or without replacement. By comparing pairs of such surveys, we concluded that

• The primary difference between surveying with and without replacement is that in the former case, the same people can be surveyed multiple times. This acts to increase the variability of the survey's results.
• Comparing surveys with replacement, N does not matter. Increasing n gives less variability.
• If N is large compared to n, the difference between surveying with and without replacement is small.
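
These observations can be checked numerically. The Python sketch below (the population size, survey size, and 30% support figure are made-up values for illustration) estimates the spread of the surveyed fraction both ways; with replacement comes out slightly more variable, and since N is ten times n, the gap is small:

```python
import random

random.seed(2)

N, n, trials = 1000, 100, 5000
population = [1] * 300 + [0] * (N - 300)  # hypothetical: 30% support A

def spread(sample_fn):
    """Standard deviation, over many surveys, of the sampled fraction."""
    fractions = [sum(sample_fn()) / n for _ in range(trials)]
    mean = sum(fractions) / trials
    return (sum((f - mean) ** 2 for f in fractions) / trials) ** 0.5

with_repl = spread(lambda: random.choices(population, k=n))     # replacement
without_repl = spread(lambda: random.sample(population, n))     # no replacement

print(with_repl, without_repl)  # with replacement is a bit more variable
```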
Friday, 20 Oct:

Consider a population of N people and a survey of n with replacement. Let NA be the number of people who support A and NB the number of people who support B. Note that NA + NB = N. Let fA = NA/N and fB = NB/N be the fractions of people who support A and B. Let X be the number of surveyed people who support A. Note that 0 <= X <= n.

Without doing a calculation, we suspect that E[X] = n * NA/N. That is, we expect the survey and the population to have the same percentage of support for A. Performing the explicit calculation, we showed the above equality. By the same method, we could show that the standard deviation of X is sqrt(n*fA * fB).
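
Both formulas are easy to check by simulation. The following Python sketch (the survey size and fA here are arbitrary example values) surveys with replacement many times and compares the observed mean and standard deviation of X against n*fA and sqrt(n*fA*fB):

```python
import random
from math import sqrt

random.seed(3)

n, fA = 400, 0.25     # hypothetical survey size and support level
fB = 1 - fA
trials = 5000

# X = number of surveyed people supporting A, surveying WITH replacement:
# each of the n surveyed people independently supports A with probability fA.
xs = [sum(random.random() < fA for _ in range(n)) for _ in range(trials)]

mean_x = sum(xs) / trials
sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / trials)

print(mean_x, n * fA)             # simulated vs predicted mean
print(sd_x, sqrt(n * fA * fB))    # simulated vs predicted standard deviation
```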

Fix an n. Is there an fA which maximizes the standard deviation?

Wednesday, 25 Oct:

The state space of a random experiment is defined as the set of all possible outcomes the experiment could produce. Each outcome has a certain probability. To find the probability of an event, we may add up the probabilities of the elements within the event.

When we are solving a probability problem, we have some freedom in what to choose the state space to be. For example, from the homework problem, we could be interested in the probability that P2 and P4 have no diamonds given that P1 has 6 diamonds. We could choose the state space to be the set of pairs of thirteen card hands available to P2 and P4. Alternatively, we could realize that the above probability is the same as the probability that P3 has 7 diamonds given that P1 has 6 diamonds. We could then choose the state space to be the set of hands available to P3. Both ways will give us the correct answer.

As a further example, we could consider flipping a coin once. Naturally, we would choose the state space to be {Heads, Tails}, with P(heads) = P(tails) = 1/2. There is nothing stopping us from considering the state space {Heads, Tails, Side}, where side would indicate the coin lands upright. When we come to assess the probabilities of each state, we would make the reasonable choice that P(heads) = P(tails) = 1/2 and P(side) = 0. Even though the possibility of landing on the side is included in the state space, it is assigned zero probability. Hence it is exactly the same situation as if "side" is not included in the state space.

The choice of the state space and the probabilities assigned to each element comes from understanding the problem.

Friday, 27 Oct:

Our simplest example illustrating the failure of the product rule was the female/long hair example. Suppose that 1/2 of the population is female and that 1/3 of the population has long hair. It does not necessarily follow that 1/6 of the population is both female and has long hair. This is due to the fact that people with long hair are more likely than not to be female.

Given two events, A and B, they are independent, by definition, if P(A and B) = P(A) * P(B). This definition only gives a name to when the product rule is valid. It doesn't elucidate the scenarios under which it is valid.

More meaningfully, A and B are independent if the knowledge of A does not change the probability of B. For example, the knowledge that someone has long hair does change the probability that they are female, hence the events of being female and having long hair are dependent. We showed that A and B are independent exactly when P(B|A) = P(B).

Consider flipping a fair coin 4 times. Let A = {exactly 3 Heads} and B = {first flip is H}. Knowing that there are 3 H's makes it more likely that each flip was an H. Hence A and B are dependent. We can instead consider A = {exactly 2 Heads} and B = {first flip is H}. In this case, since exactly half of the flips were heads, the probability any given position was heads is 1/2. Hence A and B are independent.
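
Since there are only 16 equally likely sequences of 4 flips, you can verify both claims by enumeration. A Python sketch:

```python
from itertools import product

outcomes = list(product("HT", repeat=4))  # all 16 equally likely sequences

def prob(event):
    """Fraction of the equally likely outcomes satisfying the event."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

B = lambda o: o[0] == "H"            # first flip is heads

A3 = lambda o: o.count("H") == 3     # exactly 3 heads
A2 = lambda o: o.count("H") == 2     # exactly 2 heads

both3 = prob(lambda o: A3(o) and B(o))
both2 = prob(lambda o: A2(o) and B(o))

print(both3, prob(A3) * prob(B))  # unequal: A3 and B are dependent
print(both2, prob(A2) * prob(B))  # equal: A2 and B are independent
```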

See if you can determine if the following events are independent:

• Consider a survey of n people from a population of N people WITH replacement. A = {person 1 surveyed}. B = {person 2 surveyed}
• Consider a person who takes the SAT twice. A = {scored above 1400 on first try}. B = {scored above 1400 on second try}

Wednesday, 1 Nov:

Today we discussed Bayes' rule. We considered a population in which each individual had a given disease with probability 0.01. A test for this disease gives the correct result with probability 0.95. That is, P(tests + | has disease) = 0.95 and P(tests - | doesn't have disease) = 0.95. We are interested in how trustworthy the answer given by the disease test is. That is, we want to find P(has disease | tests +). Bayes' rule tells us how to flip the two events in the conditional probability. We ended up finding that P(has disease | tests +) was surprisingly small. Try to explain this discrepancy by considering the prevalence of false positives.
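
The arithmetic is short enough to write out in a few lines of Python. P(tests +) comes from the total probability rule over the two groups, and Bayes' rule divides the true-positive piece by it:

```python
p_disease = 0.01   # prevalence, from the class example
p_correct = 0.95   # probability the test gives the correct result

# P(tests +) = P(+ and diseased) + P(+ and healthy, i.e. a false positive)
p_pos = p_disease * p_correct + (1 - p_disease) * (1 - p_correct)

# Bayes' rule: P(has disease | tests +)
p_disease_given_pos = p_disease * p_correct / p_pos

print(p_disease_given_pos)  # about 0.16
```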

Friday, 3 Nov:

Today was about two things: the geometric distribution and linearity of expected values.

Consider a random experiment resulting in only success or failure (for example, flipping a coin, with heads counting as success). If we flip the coin until we get a success, the number of flips required follows a geometric distribution. If the probability of success on each try is p, then the probability that it takes exactly k tries to get a success is (1-p)^(k-1) * p. We computed the expected value of this to be 1/p in class. That is, if each trial gives success with probability 1/20, then we would expect it to take 20 tries to get our first success.
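
You can see the 1/p expected value emerge from a simulation. This Python sketch repeats the flip-until-success experiment with p = 1/20 and averages the number of tries:

```python
import random

random.seed(4)

p = 1 / 20
trials = 20000

def tries_until_success():
    """Flip until the first success; return how many tries it took."""
    k = 1
    while random.random() >= p:  # failure, with probability 1 - p
        k += 1
    return k

mean_tries = sum(tries_until_success() for _ in range(trials)) / trials
print(mean_tries)  # close to 1/p = 20
```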

Even if two random variables are not independent, the expected value of their sum is the sum of their expected values. We can cleverly exploit this fact by considering indicator variables. Consider n people in a room. They all put their hats into a pile. Then everyone chooses a hat at random. We define the indicator variable I1 to be 1 if the first person gets his or her hat and 0 otherwise, and similarly for I2 ... In. Observe that I1 + I2 + ... + In = the number of people who get their hat back. Also note that E[I1] = E[I2] = ... = 1/n. Hence the expected value of the number of people who get their hat back is n * 1/n = 1.
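
Note that the indicators are certainly not independent (if n-1 people get their own hats, so does the last one), yet the answer 1 still holds. A quick Python simulation (n = 10 is an arbitrary choice) backs this up:

```python
import random

random.seed(5)

n = 10
trials = 20000

def matches():
    """Number of people who draw their own hat from a shuffled pile."""
    hats = list(range(n))
    random.shuffle(hats)
    return sum(hats[i] == i for i in range(n))

avg = sum(matches() for _ in range(trials)) / trials
print(avg)  # close to 1, regardless of n
```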

You should attempt to use indicator variables to find the expected number of pairs of people who share a birthday in a room of n people. Compare the ease of this calculation with the combinatorics required to evaluate the expected value directly.

Wednesday, 8 Nov:

We developed the following model of diffusion. A particle starts at position 0 at time 0. At every integer multiple of delta t, it moves delta x to either the left or right with probability 0.5. We called Y(t) the position of a particle at time t. We showed that E[Y(t)] = 0, which must be true due to symmetry. We also showed that Var[Y(t)] = D*t where D = (delta x)^2 / (delta t). Hence, the standard deviation of the particle position goes like the square root of t. This is a fundamental fact about diffusion.

The mathematics of this calculation were not too difficult. The main issue was in finding E[Y(t)^2]. We wrote out what Y(t)^2 means in terms of each of the individual steps and used linearity of expectations.
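
The square-root growth of the spread is easy to check by simulation. This Python sketch (the particle count, step sizes, and number of steps are arbitrary choices) runs many independent walks out to the same time t and measures the standard deviation of the final positions:

```python
import random
from math import sqrt

random.seed(6)

dx, dt = 1.0, 1.0     # step size and time step, so D = dx**2 / dt = 1
steps = 400           # t = steps * dt
particles = 5000

positions = []
for _ in range(particles):
    y = 0.0
    for _ in range(steps):
        y += dx if random.random() < 0.5 else -dx  # left or right, p = 0.5
    positions.append(y)

mean_y = sum(positions) / particles
sd_y = sqrt(sum((y - mean_y) ** 2 for y in positions) / particles)

D = dx ** 2 / dt
t = steps * dt
print(mean_y, sd_y, sqrt(D * t))  # mean near 0; sd near sqrt(D*t) = 20
```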

Friday, 10 Nov:

We showed the results of simulations of the diffusion process mentioned on Wednesday, where we observed that the standard deviation of the particle positions is sqrt(D*t). By inspecting the width of the bell-like data, we can see that it agrees with the predicted standard deviation.

The binomial distribution can be well approximated by a normal distribution with the same mean and variance. We will use this approximation to aid computation.
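
To get a feel for how good the approximation is, the Python sketch below compares the Binomial(100, 1/2) probabilities with the density of a normal distribution having the same mean 50 and standard deviation 5 (the choice n = 100 is just an example):

```python
from math import comb, exp, pi, sqrt

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))  # matching mean and sd

def binom_pmf(k):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_pdf(x):
    """Density of N(mu, sigma^2) at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

for k in (40, 50, 60):
    print(k, binom_pmf(k), normal_pdf(k))  # the two columns nearly agree
```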

Wednesday, 15 Nov:

Today we showed that the density function of a normal distribution indeed integrates to 1. We also calculated that N(mu, sigma^2) does have mean mu and standard deviation sigma. These calculations start by rescaling x in terms of the number of standard deviations away from the mean and then using some tricks for the integration.

Friday, 17 Nov:

The first major point of today is that all normal distributions are the same when viewed in terms of the number of standard deviations away from the mean. That is, if X ~ N(mu, sigma^2), then (X-mu)/sigma ~ N(0,1). This means that to find probabilities, we only need a table for the standard normal. It also means that to see if normally distributed data is statistically significantly higher than expected, we need only see if it is more than 1.65 standard deviations above the mean.
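
Both halves of this point can be checked with a Python sketch (the mean 70 and standard deviation 12 are made-up numbers): standardized draws from any normal look like N(0,1), and about 5% of them land above 1.65:

```python
import random
from math import sqrt

random.seed(7)

mu, sigma = 70.0, 12.0  # hypothetical mean and standard deviation
samples = 50000

# Draw from N(mu, sigma^2) and standardize each draw.
zs = [(random.gauss(mu, sigma) - mu) / sigma for _ in range(samples)]

# The standardized data should look like N(0, 1) ...
z_mean = sum(zs) / samples
z_sd = sqrt(sum((z - z_mean) ** 2 for z in zs) / samples)

# ... and about 5% of the draws should exceed 1.65.
upper_tail = sum(z > 1.65 for z in zs) / samples

print(z_mean, z_sd, upper_tail)
```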

The second major point was the modus operandi of statistics. Basically, it says that we figure out the variable we are going to use to make our decision, find its distribution, and determine the top 5% most extreme values it can take.