Question 5 (6 marks)

Recall that we discussed the problem of approximating the density of the bivariate normal distribution, p(x1, x2), using variational inference (VI) with the mean-field variational family (in the tutorial of week 12). We observed that the approximate density q(x1, x2) obtained by VI with the mean-field variational family can be very different from the true density when the true correlation between X1 and X2 is very strong. The following figure shows the contour plots of q(x1, x2) (in blue) and p(x1, x2) (in black): the mean is successfully captured by VI, but the variance of q(x1, x2) is controlled by the direction of smallest variance of p(x1, x2), and the variance along the orthogonal direction is significantly underestimated. It is a general result that VI tends to underestimate the marginal variance.

[Figure: contour plots of q(x1, x2) (blue) and p(x1, x2) (black).]

It is known that this result is due to the Kullback-Leibler (KL) divergence, which VI uses as the measure of difference between two densities. Recall that VI with the mean-field variational family,

D = \{ q(x_1, x_2) \mid q(x_1, x_2) = q_1(x_1)\, q_2(x_2) \},

aims to find q^*(x_1, x_2) = q_1^*(x_1)\, q_2^*(x_2) which minimises the KL divergence to the exact density p(x1, x2):

q^*(x_1, x_2) = \arg\min_{q(x_1, x_2) \in D} \mathrm{KL}\big(q(x_1, x_2) \,\|\, p(x_1, x_2)\big)
              = \arg\min_{q(x_1, x_2) \in D} \int_{x_1, x_2} q(x_1, x_2) \big[ \log q(x_1, x_2) - \log p(x_1, x_2) \big] \, dx_1\, dx_2.

Explain why the KL divergence above results in the underestimation of the marginal variance in the approximated density.
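The variance underestimation described above can be checked numerically. For a zero-mean Gaussian target p with covariance Sigma, the mean-field (coordinate-ascent) optimum is known in closed form: each factor q_i is Gaussian with variance equal to the reciprocal of the i-th diagonal entry of the precision matrix Sigma^{-1}. The sketch below (the correlation value rho = 0.9 is an assumed illustration, not from the question) compares the VI variances with the true marginal variances:

```python
import numpy as np

# Assumed setup: p is a zero-mean bivariate normal with unit marginal
# variances and correlation rho (an illustrative choice).
rho = 0.9
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# Closed-form mean-field optimum for a Gaussian target: the variance of
# each factor q_i* is 1 / (Sigma^{-1})_{ii}, which here equals 1 - rho^2.
Lambda = np.linalg.inv(Sigma)          # precision matrix of p
vi_var = 1.0 / np.diag(Lambda)         # variances of the VI factors q_1*, q_2*
true_var = np.diag(Sigma)              # true marginal variances of p

print(vi_var)    # approximately [0.19, 0.19], i.e. 1 - rho^2
print(true_var)  # [1., 1.]
```

As the correlation strengthens (|rho| → 1), the VI variance 1 - rho^2 shrinks towards zero while the true marginal variance stays at 1, matching the contour plot in the figure: q is squeezed along the direction of smallest variance of p.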