Recent Question/Assignment

Attached. I need a detailed solution.
3.57 Conjugate of the positive part function. Let f(x) = (x)_+ = max{0, x} for x ∈ R. (This function has various names, such as the positive part of x, or ReLU for Rectified Linear Unit in the context of neural networks.) What is f*?
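For orientation, here is a sketch of the standard conjugate computation (a check on the setup, not a full worked solution), using the definition f*(y) = sup_x (xy − f(x)):

f*(y) = sup_{x ∈ R} ( xy − max{0, x} ).

For x ≤ 0 the argument is xy, which is bounded above (by 0) only if y ≥ 0; for x ≥ 0 it is x(y − 1), which is bounded above (by 0) only if y ≤ 1. Hence f*(y) = 0 for 0 ≤ y ≤ 1 and f*(y) = +∞ otherwise, i.e., f* is the indicator function of the interval [0, 1].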
15.9 Optimal jamming power allocation. A set of n jammers transmit with (nonnegative) powers p_1, ..., p_n, which are to be chosen subject to the constraints

p ⪰ 0,  Fp ⪯ g,

with both inequalities componentwise.
The jammers produce interference power at m receivers, given by

d_i = Σ_{j=1}^n G_ij p_j,  i = 1, ..., m,

where G_ij is the (nonnegative) channel gain from jammer j to receiver i.
Receiver i has capacity (in bits/s) given by

C_i = α log(1 + β_i/(σ_i^2 + d_i)),  i = 1, ..., m,

where α, β_i, and σ_i are positive constants. (Here β_i is proportional to the signal power at receiver i and σ_i^2 is the receiver i self-noise, but you won't need to know this to solve the problem.)
Explain how to choose p to minimize the sum channel capacity, C = C_1 + ··· + C_m, using convex optimization. (This corresponds to the most effective jamming, given the power constraints.) The problem data are F, g, G, α, β_i, and σ_i.
If you change variables, or transform your problem in any way that is not obvious (for example, you form a relaxation), you must explain fully how your method works, and why it gives the solution. If your method relies on any convex functions that we have not encountered before, you must show that the functions are convex.
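One way to meet the convexity requirement, sketched here as a hint rather than the full argument: write

C_i = α log(1 + β_i/(σ_i^2 + d_i)) = α [ log(σ_i^2 + β_i + d_i) − log(σ_i^2 + d_i) ].

As a function of d_i ≥ 0, its second derivative is α [ 1/(σ_i^2 + d_i)^2 − 1/(σ_i^2 + β_i + d_i)^2 ] > 0, since β_i > 0. So each C_i is a convex, decreasing function of d_i; each d_i is affine in p, so C is convex in p, and minimizing C over the polyhedron {p : p ⪰ 0, Fp ⪯ g} is a convex optimization problem.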
Disclaimer. The teaching staff does not endorse jamming, optimal or otherwise.
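The objective is convex but not written in DCP-composable atoms, so one pragmatic sketch (an assumed tooling choice, not the required method) hands it to a general nonlinear solver in Julia; JuMP ≥ 1.15 with Ipopt is assumed, and n, m, F, g, G, alpha, beta, sigma are assumed to hold the problem data:

using JuMP, Ipopt

# Minimize total capacity C(p) over the power polyhedron; the problem is
# convex, so the local solution Ipopt returns is global.
model = Model(Ipopt.Optimizer)
@variable(model, p[1:n] >= 0)            # p ⪰ 0
@constraint(model, F * p .<= g)          # Fp ⪯ g
@objective(model, Min,
    sum(alpha * log(1 + beta[i] / (sigma[i]^2 + sum(G[i, j] * p[j] for j in 1:n)))
        for i in 1:m))
optimize!(model)
p_star = value.(p)                       # minimizing jamming powers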
7.29 Maximum likelihood prediction of team ability. (A more CVX-friendly tweak of problem 7.4.) A set of n teams compete in a tournament. We model each team's ability by a number a_j ∈ [0, 1], j = 1, ..., n. When teams j and k play each other, the probability that team j wins is equal to prob(a_j − a_k + v > 0), where v is a symmetric random variable with density
p(t) = 2σ^{-1} (e^{t/σ} + e^{-t/σ})^{-2},
where σ controls the standard deviation of v. For this question, you will likely find it useful that the cumulative distribution function (CDF) of v is

F(t) = e^{t/σ} / (e^{t/σ} + e^{-t/σ}).
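As a quick consistency check relating the two formulas (and a rewrite that is handy when forming the likelihood): F(t) = e^{t/σ}/(e^{t/σ} + e^{-t/σ}) = 1/(1 + e^{-2t/σ}), and differentiating gives

F′(t) = (2/σ) e^{-2t/σ} (1 + e^{-2t/σ})^{-2} = 2σ^{-1} (e^{t/σ} + e^{-t/σ})^{-2} = p(t).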
You are given the outcome of m past games. These are organized as

(j^(i), k^(i), y^(i)),  i = 1, ..., m,

meaning that game i was played between teams j^(i) and k^(i); y^(i) = 1 means that team j^(i) won, while y^(i) = −1 means that team k^(i) won. (We assume there are no ties.)
(a) Formulate the problem of finding the maximum likelihood estimate of team abilities, â ∈ R^n, given the outcomes, as a convex optimization problem. You will find the game incidence matrix A ∈ R^{m×n}, defined as

A_il = y^(i) if l = j^(i),  A_il = −y^(i) if l = k^(i),  A_il = 0 otherwise,

useful.
The prior constraints â_i ∈ [0, 1] should be included in the problem formulation. Also, we note that if a constant is added to all team abilities, there is no change in the probabilities of game outcomes. This means that â is determined only up to a constant, like a potential. But this doesn't affect the ML estimation problem, or any subsequent predictions made using the estimated parameters.
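With z = A â, the probability of the observed outcome of game i is F(z_i) = 1/(1 + e^{-2 z_i/σ}), so the negative log-likelihood is Σ_{i=1}^m log(1 + e^{-2 z_i/σ}), a convex function of â. A minimal Julia sketch of the resulting problem, again assuming JuMP ≥ 1.15 with Ipopt and that A, m, n, and sigma are already defined:

using JuMP, Ipopt

# Minimize the negative log-likelihood over the prior box [0, 1]^n.
model = Model(Ipopt.Optimizer)
@variable(model, 0 <= a[1:n] <= 1)
z = A * a                                # z[i] = y^(i) * (a_j(i) - a_k(i))
@objective(model, Min, sum(log(1 + exp(-2 * z[i] / sigma)) for i in 1:m))
optimize!(model)
a_hat = value.(a)                        # ML estimate (determined up to a constant shift)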
(b) Find â for the team data given in team_data.jl, in the matrix train. (This matrix gives the outcomes for a tournament in which each team plays each other team once.) You can form A using the commands
using SparseArrays;
A1 = sparse(1:m, train[:, 1], train[:, 3], m, n);   # entry y^(i) in column j^(i)
A2 = sparse(1:m, train[:, 2], -train[:, 3], m, n);  # entry -y^(i) in column k^(i)
A = A1 + A2;
(c) Use the maximum likelihood estimate â found in part (b) to predict the outcomes of next year's tournament games, given in the matrix test, using ŷ^(i) = sign(â_{j^(i)} − â_{k^(i)}). Compare these predictions with the actual outcomes, given in the third column of test. Give the fraction of correctly predicted outcomes.
The games played in train and test are the same, so another, simpler method for predicting the outcomes in test is to just assume the team that won last year's match will also win this year's match. Give the percentage of correctly predicted outcomes using this simple method.
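A minimal sketch of both predictions in Julia, assuming test has the same (j, k, y) column layout as train, the rows of train and test list the games in the same order, and a_hat comes from part (b):

y_hat = sign.(a_hat[test[:, 1]] .- a_hat[test[:, 2]])   # yhat^(i) = sign(ahat_j(i) - ahat_k(i))
ml_frac = count(y_hat .== test[:, 3]) / size(test, 1)   # fraction correct, ML predictions
naive_frac = count(train[:, 3] .== test[:, 3]) / size(test, 1)  # last year's winner wins again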
18.6 Fitting a simple neural network model. A neural network is a widely used model of the form ŷ = f(x; θ), where the n-vector x is the feature vector and the p-vector θ is the model parameter. In a neural network model, the function f is not an affine function of the parameter vector θ. In this exercise we consider a very simple neural network, with two layers, three internal nodes, and two inputs (i.e., n = 2). This model has p = 13 parameters, and is given by
f̂(x; θ) = θ_1 φ(θ_2 x_1 + θ_3 x_2 + θ_4) + θ_5 φ(θ_6 x_1 + θ_7 x_2 + θ_8) + θ_9 φ(θ_10 x_1 + θ_11 x_2 + θ_12) + θ_13,
where φ : R → R is the sigmoid function defined in (18.16). This function is shown as a signal flow graph in figure 18.25. In this graph each edge from an input to an internal node, or from an internal node to the output node, corresponds to multiplication by one of the parameters. At each node (shown as the small filled circles) the incoming values and the constant offset are added together, then passed through the sigmoid function, to become the outgoing edge value.
Figure 18.25 Signal flow graph of a simple neural network.
Fitting such a model to a data set consisting of the n-vectors x^(1), ..., x^(N) and the associated scalar outcomes y^(1), ..., y^(N) by minimizing the sum of the squares of the residuals is a nonlinear least squares problem with objective (18.4).
(a) Derive an expression for ∇_θ f̂(x; θ). Your expression can use φ and φ′, the sigmoid function and its derivative. (You do not need to express these in terms of exponentials.)
(b) Derive an expression for the derivative matrix Dr(θ), where r : R^p → R^N is the vector of model fitting residuals,

r(θ)_i = f̂(x^(i); θ) − y^(i),  i = 1, ..., N.

Your expression can use the gradient found in part (a).
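For concreteness, a hedged Julia sketch of the model value and gradient from part (a); the names phi, dphi, and fhat_and_grad are introduced here, and the sigmoid of (18.16) is assumed to be φ(u) = (e^u − e^{-u})/(e^u + e^{-u}), i.e. tanh, so that φ′ = 1 − φ^2. The rows of Dr(θ) in part (b) are then these gradients: Dr(θ)_{i,:} = ∇_θ f̂(x^(i); θ)^T.

phi(u) = tanh(u)                 # assumed sigmoid of (18.16)
dphi(u) = 1 - tanh(u)^2          # phi'(u) = 1 - phi(u)^2

function fhat_and_grad(x, theta)
    # inner-node activations
    w = (theta[2]*x[1] + theta[3]*x[2] + theta[4],
         theta[6]*x[1] + theta[7]*x[2] + theta[8],
         theta[10]*x[1] + theta[11]*x[2] + theta[12])
    fhat = theta[1]*phi(w[1]) + theta[5]*phi(w[2]) + theta[9]*phi(w[3]) + theta[13]
    g = zeros(13)
    for (t, k) in ((1, 1), (5, 2), (9, 3))   # t = first parameter index of unit k
        g[t] = phi(w[k])                     # ∂f/∂(outer weight)
        s = theta[t] * dphi(w[k])
        g[t+1] = s * x[1]                    # ∂f/∂(weight on x_1)
        g[t+2] = s * x[2]                    # ∂f/∂(weight on x_2)
        g[t+3] = s                           # ∂f/∂(offset)
    end
    g[13] = 1.0                              # ∂f/∂θ_13
    return fhat, g
end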
(c) Try fitting this neural network to the function g(x_1, x_2) = x_1 x_2. First generate N = 200 random points x^(i) and take y^(i) = g(x_1^(i), x_2^(i)) for i = 1, ..., 200. Use the Levenberg-Marquardt algorithm to try to minimize

f(θ) = ‖r(θ)‖^2 + γ‖θ‖^2

with γ = 10^{-5}. Plot the value of f and the norm of its gradient versus iteration. Report the RMS fitting error achieved by the neural network model. Experiment with choosing different starting points to see the effect on the final model found.
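A hedged Levenberg-Marquardt sketch under the same assumptions (fhat_and_grad from the sketch above; residuals and lm_fit are names introduced here). The regularization γ‖θ‖^2 is folded into the normal equations, which is equivalent to stacking √γ·θ onto the residual vector:

using LinearAlgebra

function residuals(X, y, theta)              # X is N×2 data, y the N outcomes
    N = size(X, 1)
    r = zeros(N); Dr = zeros(N, 13)
    for i in 1:N
        fi, gi = fhat_and_grad(X[i, :], theta)
        r[i] = fi - y[i]                     # r(theta)_i
        Dr[i, :] = gi                        # row i of Dr(theta)
    end
    return r, Dr
end

function lm_fit(X, y, theta0; gamma = 1e-5, lambda = 1.0, iters = 500)
    theta = copy(theta0)
    r, Dr = residuals(X, y, theta)
    f = sum(abs2, r) + gamma * sum(abs2, theta)
    for _ in 1:iters
        # regularized LM step: (Dr'Dr + (gamma + lambda) I) step = Dr'r + gamma*theta
        step = (Dr' * Dr + (gamma + lambda) * I) \ (Dr' * r + gamma * theta)
        theta_try = theta - step
        r_try, Dr_try = residuals(X, y, theta_try)
        f_try = sum(abs2, r_try) + gamma * sum(abs2, theta_try)
        if f_try < f
            theta, r, Dr, f = theta_try, r_try, Dr_try, f_try
            lambda *= 0.8                    # accept: relax damping
        else
            lambda *= 2.0                    # reject: increase damping
        end
    end
    return theta
end

The RMS fitting error is then sqrt(sum(abs2, residuals(X, y, theta)[1]) / N).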
(d) Fit the same data set with a (linear) regression model f̂_lin(x; β, v) = x^T β + v, and report the RMS fitting error achieved. (You can add regularization in your fitting, but it won't improve the results.) Compare the RMS fitting error with the neural network model RMS fitting error from part (c).
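For this baseline, an ordinary least squares sketch (X and y as in the previous sketch, assumed N×2 and length N):

Xb = [X ones(size(X, 1))]          # append a column of ones for the offset v
theta_lin = Xb \ y                 # stacked [beta; v] by least squares
rms_lin = sqrt(sum(abs2, Xb * theta_lin - y) / length(y))   # RMS fitting error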
Remarks. Neural networks used in practice employ many more regressors, layers, and internal nodes. Specialized methods and software are used to minimize the fitting objective, and to evaluate the required gradients and derivatives.