Tutorial 6

1) Nucleic Acids

Question 1a: Transcribe the following DNA sequence into RNA:
AGGCTAGCTAGGCA


Question 1b: Replicate the following DNA sequence into DNA:
AGGCTAGCTAGGCA
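
The two operations above can be sketched in Python. This is a minimal sketch on a hypothetical sequence (not the one in the questions), and the helper names are our own. The questions do not say whether the given strand is the coding strand or the template strand, so both transcription conventions are shown as assumptions.

```python
# Hypothetical example sequence, not the one from the questions.
dna = "ATGCGT"

# Base-pairing rules for DNA: A pairs with T, C pairs with G.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Return the complementary DNA strand, base by base (no reversal)."""
    return seq.translate(DNA_COMPLEMENT)

def transcribe_coding(seq):
    """Assume seq is the coding strand: the RNA matches it with U in place of T."""
    return seq.replace("T", "U")

def transcribe_template(seq):
    """Assume seq is the template strand: the RNA is its complement with U in place of T."""
    return complement(seq).replace("T", "U")

print(complement(dna))           # TACGCA
print(transcribe_coding(dna))    # AUGCGU
print(transcribe_template(dna))  # UACGCA
```

The complement is written in the same left-to-right order as the input; reverse it if you want to read the new strand 5' to 3'.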


Question 1c: Replicate the following DNA sequence into DNA but with a single nucleotide polymorphism event at base 3:
AGGCTAGCTAGGCA


Question 1d: Replicate the following DNA sequence into DNA but with an insertion of three A's (AAA) after base 5, at the position marked by the dashes:
AGGCT---AGCTAGGCA


Question 1e: Replicate the following DNA sequence into DNA but with a 3 base deletion after base 5:
AGGCTAGCTAGGCA
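
The three kinds of mutation in Questions 1c to 1e (substitution, insertion, deletion) can be sketched as string operations. The helper names, the 1-based position convention, and the example sequence below are all assumptions made for illustration; apply the mutation to whichever strand your answer calls for.

```python
# Hypothetical example sequence, not the one from the questions.
seq = "ATGCGTAC"

def substitute(seq, position, new_base):
    """Replace the base at a 1-based position (a point substitution)."""
    i = position - 1
    return seq[:i] + new_base + seq[i + 1:]

def insert_after(seq, position, inserted):
    """Insert bases immediately after a 1-based position."""
    return seq[:position] + inserted + seq[position:]

def delete_after(seq, position, length):
    """Delete `length` bases immediately after a 1-based position."""
    return seq[:position] + seq[position + length:]

print(substitute(seq, 3, "C"))      # ATCCGTAC
print(insert_after(seq, 5, "AAA"))  # ATGCGAAATAC
print(delete_after(seq, 5, 2))      # ATGCGC
```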







2) Generating a Random Poisson Variable



The Poisson probability distribution is:

$p(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$

The mean of the Poisson distribution is the rate $\lambda$.

The Exponential probability distribution is:

$p(x) = \lambda e^{-\lambda x}$

The mean of the exponential distribution is $\frac{1}{\lambda}$.

Question 2: Using the numpy.random.exponential() function, write code that generates 1000 random Poisson variables for each of the following sets of parameters.

a) A Poisson process with rate $\lambda = 5$ over 1 unit of time.

b) A Poisson process with rate $\lambda = 1$ over 5 units of time.

c) Are these two distributions the same or different?
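
One possible approach, sketched below, uses the fact that the waiting times between events of a Poisson process are exponentially distributed: keep adding exponential waiting times until the time window is used up, and count how many events fitted. The function name `poisson_counts` and the parameters in the usage lines (rate 2 over 3 units of time) are hypothetical and chosen so as not to coincide with parts a) and b). Note that numpy.random.exponential() takes the mean as its `scale` argument, which is $\frac{1}{\lambda}$, not $\lambda$.

```python
import numpy as np

def poisson_counts(rate, duration, n_samples):
    """Count events of a Poisson process in [0, duration] by accumulating
    exponential waiting times with mean 1/rate."""
    counts = np.empty(n_samples, dtype=int)
    for k in range(n_samples):
        elapsed = 0.0
        events = 0
        while True:
            # scale is the mean waiting time, i.e. 1/rate
            elapsed += np.random.exponential(scale=1.0 / rate)
            if elapsed > duration:
                break  # the next event falls outside the window
            events += 1
        counts[k] = events
    return counts

np.random.seed(0)  # fixed seed so that reruns give the same samples
samples = poisson_counts(rate=2.0, duration=3.0, n_samples=1000)
print(samples.mean(), samples.var())
```

For a Poisson variable the mean and variance should both be close to rate times duration, which gives a quick sanity check on the output.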






3) Markov Chains

$$A = \begin{bmatrix} 0.2 & 0.7 & 0.1 \\ 0.05 & 0.9 & 0.05 \\ 0.5 & 0.4 & 0.1 \end{bmatrix}$$

$$B = \begin{bmatrix} 0.1 & 0 & 0 & 0.9 \\ 0.2 & 0.5 & 0.3 & 0 \\ 0.5 & 0.4 & 0.05 & 0.05 \\ 0.1 & 0.2 & 0.3 & 0.4 \end{bmatrix}$$

Entry $(i,j)$ in a transition matrix is the probability of transitioning from state $i$ to state $j$. Notice that transition matrices are square and that each row sums to 1.
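
These properties can be checked numerically. The sketch below enters $A$ and $B$ as NumPy arrays, tests that each is a valid transition matrix, and simulates a short path through $A$. The helper names are our own, and states are numbered from 0 here purely because of Python indexing.

```python
import numpy as np

A = np.array([[0.2, 0.7, 0.1],
              [0.05, 0.9, 0.05],
              [0.5, 0.4, 0.1]])

B = np.array([[0.1, 0.0, 0.0, 0.9],
              [0.2, 0.5, 0.3, 0.0],
              [0.5, 0.4, 0.05, 0.05],
              [0.1, 0.2, 0.3, 0.4]])

def is_transition_matrix(P):
    """True if P is square, non-negative, and every row sums to 1."""
    square = P.ndim == 2 and P.shape[0] == P.shape[1]
    return bool(square and np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0))

def step(P, state, rng):
    """Draw the next state using row `state` of P as the probabilities."""
    return int(rng.choice(P.shape[0], p=P[state]))

print(is_transition_matrix(A), is_transition_matrix(B))

rng = np.random.default_rng(0)
path = [0]  # start in state 0 (an arbitrary choice)
for _ in range(10):
    path.append(step(A, path[-1], rng))
print(path)
```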

Question 3: Draw Markov model diagrams for transition matrices $A$ and $B$.

$A:$


$B:$







4) Maximum Likelihood Estimate



The likelihood is the probability of observing the data $D$, assuming that your model/hypothesis $\theta$ is true: $L(D) = P(D | \theta)$.
The maximum likelihood estimate (MLE) is the model/hypothesis $\hat{\theta}$ that gives the data the highest likelihood.
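
As a small illustration of the definition, the sketch below compares a handful of candidate hypotheses and picks the one with the highest likelihood. The coin-toss scenario, the data (7 heads in 10 tosses), and the candidate values are all hypothetical and not part of the questions that follow.

```python
from math import comb

# Hypothetical data: 7 heads observed in 10 tosses of a coin.
heads, tosses = 7, 10

# Hypothetical candidate hypotheses for P(heads).
candidates = [0.3, 0.5, 0.8]

def likelihood(p):
    """Binomial probability of the observed data if P(heads) = p."""
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

for p in candidates:
    print(p, likelihood(p))

# The MLE among the candidates is the one with the largest likelihood.
mle = max(candidates, key=likelihood)
print("MLE:", mle)
```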

Question 4: Find the maximum likelihood estimates for the following situations:

a) You are on a bus. You can see the back of someone's head and they have long hair. Given that 20% of men have long hair and 85% of women have long hair, what is the maximum likelihood estimate for the person's sex?

b) You have observed the amount of time between mutations for 3 independent sites in a genome. These three waiting times are $D = (1, 1.5, 2.5)$. What is the likelihood expression (i.e. the probability density of observing these 3 values, assuming that the mutation rate is $\lambda$)?     $L(D) = p(D|\lambda) = $

c) Find the maximum likelihood estimate for the mutation rate $\lambda$ by differentiating the likelihood expression and finding the turning point. It may be easier if you use the log-likelihood.     $\hat{\lambda}: $
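
An analytical answer can be checked numerically by evaluating the log-likelihood on a grid of candidate rates and taking the largest value. The sketch below assumes independent exponential waiting times, so the log-likelihood is the sum of $\log(\lambda e^{-\lambda x})$ over the observations. The function name, the grid range, and the waiting times used are hypothetical; substitute the data from part b) to check your own result.

```python
import numpy as np

def exp_log_likelihood(lam, waits):
    """Log-likelihood of independent exponential waiting times at rate lam:
    sum of log(lam * exp(-lam * x)) = n*log(lam) - lam*sum(x)."""
    waits = np.asarray(waits, dtype=float)
    return len(waits) * np.log(lam) - lam * waits.sum()

# Hypothetical waiting times, not the data from part b).
waits = [0.5, 1.0, 3.0]

# Candidate rates; the range 0.01 to 5 is an arbitrary choice.
grid = np.linspace(0.01, 5.0, 5000)
log_lik = exp_log_likelihood(grid, waits)

lam_hat = grid[np.argmax(log_lik)]
print("numerical MLE:", lam_hat)
```

A grid search only locates the maximum to within the grid spacing, so treat it as a check on the calculus rather than a replacement for it.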






5) Bayesian Inference

$$ P(\theta | D) = \frac{P(D | \theta)P(\theta)}{P(D)} $$

where $\theta$ is the model/hypothesis and $D$ is the data.
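
For a finite set of hypotheses, $P(D)$ is the sum of $P(D | \theta)P(\theta)$ over all hypotheses, so the posterior can be computed by normalising the products. The sketch below uses the likelihoods from Question 4a together with an assumed 50/50 prior (the bus scenario with equal numbers of men and women, which is our assumption and not stated in the question). The function name `posterior` is also our own.

```python
def posterior(likelihoods, priors):
    """Apply Bayes' rule over a finite set of hypotheses.

    likelihoods[h] is P(D | h) and priors[h] is P(h)."""
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    evidence = sum(joint.values())  # P(D), summed over all hypotheses
    return {h: joint[h] / evidence for h in joint}

# P(long hair | sex), taken from Question 4a.
likelihoods = {"male": 0.20, "female": 0.85}

# Assumed prior: equal numbers of men and women.
priors = {"male": 0.5, "female": 0.5}

post = posterior(likelihoods, priors)
print(post)
```

With equal priors the posterior ranks the hypotheses in the same order as the likelihoods; changing `priors` shows how a strong prior can alter that ranking.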

Question 5: Continuing from Question 4a, suppose that instead of being on a bus you are in an environment that is mostly male (95% male). Calculate the posterior probability of each of your two hypotheses, that the person is (1) male or (2) female, given that they have long hair. Is your maximum posterior estimate the same as your maximum likelihood estimate?

P(female | long hair) =

P(male | long hair) =

Maximum posterior estimate: