Run the AIspace.org neural network learner on the data of Figure 7.1 (page 289).

(a) Suppose that you decide to treat any predicted value from the neural network greater than 0.5 as true, and any value less than 0.5 as false. How many examples are misclassified initially? How many are misclassified after 40 iterations? How many after 80 iterations?

(b) Try the same example and the same initial values, with different step sizes for the gradient descent. Try at least η = 0.1, η = 1.0, and η = 5.0. Comment on the relationship between step size and convergence.

(c) Given the final parameter values you found, give a logical formula for what each of the units is computing. You can do this by considering, for each unit, the truth table over its input values, determining the output for each combination, and then simplifying the resulting formula. Is it always possible to find such a formula?

(d) All of the parameters were set to different initial values. What happens if the parameter values are all set to the same (random) value? Test it out for this example, and hypothesize what occurs in general.

(e) For the neural network algorithm, comment on the following stopping criteria:
    (i) learn for a limited number of iterations, where the limit is set initially;
    (ii) stop when the sum-of-squares error is less than 0.25 (explain why 0.25 may be an appropriate number);
    (iii) stop when the derivatives all become within some ε of zero;
    (iv) split the data into training data and test data, and stop when the error on the test data increases.
Which would you expect to better handle overfitting? Which criteria guarantee the gradient descent will stop? Which criteria would guarantee that, if it stops, the network can be used to predict the test data accurately?
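The step-size comparison can be sketched in code. The Figure 7.1 dataset is not reproduced here, so this sketch substitutes XOR as a stand-in binary-target dataset; the architecture (two sigmoid hidden units, one sigmoid output, sum-of-squares error) and the step sizes follow the exercise, but the function names, the seed, and the dataset are illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Stand-in dataset (the Figure 7.1 data is not reproduced here): XOR,
# a small binary-target problem that a 2-hidden-unit network can fit.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def train(eta, iters, seed=0):
    """Stochastic gradient descent on a 2-2-1 sigmoid network."""
    rnd = random.Random(seed)
    # Two hidden units, each with 2 input weights + bias; one output unit.
    w_h = [[rnd.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    w_o = [rnd.uniform(-1, 1) for _ in range(3)]
    for _ in range(iters):
        for (x1, x2), t in DATA:
            h = [sigmoid(w[0]*x1 + w[1]*x2 + w[2]) for w in w_h]
            o = sigmoid(w_o[0]*h[0] + w_o[1]*h[1] + w_o[2])
            # Backpropagated deltas for sum-of-squares error.
            d_o = (o - t) * o * (1 - o)
            d_h = [d_o * w_o[j] * h[j] * (1 - h[j]) for j in range(2)]
            for j in range(2):
                w_o[j] -= eta * d_o * h[j]
                w_h[j][0] -= eta * d_h[j] * x1
                w_h[j][1] -= eta * d_h[j] * x2
                w_h[j][2] -= eta * d_h[j]
            w_o[2] -= eta * d_o
    return w_h, w_o

def misclassified(w_h, w_o):
    """Count examples misclassified under the 0.5 threshold."""
    errs = 0
    for (x1, x2), t in DATA:
        h = [sigmoid(w[0]*x1 + w[1]*x2 + w[2]) for w in w_h]
        o = sigmoid(w_o[0]*h[0] + w_o[1]*h[1] + w_o[2])
        errs += (o > 0.5) != (t == 1)
    return errs

for eta in (0.1, 1.0, 5.0):
    for iters in (0, 40, 80):
        w_h, w_o = train(eta, iters)
        print(f"eta={eta}  iters={iters}  misclassified={misclassified(w_h, w_o)}")
```

Varying `eta` with the same seed mirrors the exercise: small steps converge slowly but steadily, while large steps can overshoot and oscillate.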
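One way to read off a logical formula is to threshold each unit at 0.5 over every Boolean input combination. A minimal sketch, using hypothetical hand-picked final weights for a single unit (these particular weights happen to implement AND):

```python
import itertools
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def truth_table(weights, bias, n_inputs):
    """Threshold a sigmoid unit at 0.5 over all Boolean input combinations."""
    table = {}
    for xs in itertools.product((0, 1), repeat=n_inputs):
        act = sum(w * x for w, x in zip(weights, xs)) + bias
        table[xs] = sigmoid(act) > 0.5
    return table

# Hypothetical final weights: this unit's table is true only on (1, 1),
# i.e. it computes x1 AND x2.
print(truth_table([5.0, 5.0], -7.5, 2))
```

Note that a sigmoid unit's output exceeds 0.5 exactly when its weighted input sum exceeds zero, so a single unit can only compute a linearly separable Boolean function of its inputs (never, say, XOR); also, hidden-unit outputs fed to later layers are real-valued rather than exactly 0 or 1, so reading each unit's table in isolation gives an approximate description, not a guaranteed one.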
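The effect of identical initial parameter values can be checked directly: with equal weights the hidden units are interchangeable, receive identical gradients, and therefore remain identical after every update. A small sketch (again using XOR as stand-in data; the network shape, step size, and initial value 0.3 are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def one_epoch(w_h, w_o, data, eta=1.0):
    """One pass of stochastic gradient descent on a 2-2-1 sigmoid network."""
    for (x1, x2), t in data:
        h = [sigmoid(w[0]*x1 + w[1]*x2 + w[2]) for w in w_h]
        o = sigmoid(w_o[0]*h[0] + w_o[1]*h[1] + w_o[2])
        d_o = (o - t) * o * (1 - o)
        d_h = [d_o * w_o[j] * h[j] * (1 - h[j]) for j in range(2)]
        for j in range(2):
            w_o[j] -= eta * d_o * h[j]
            w_h[j][0] -= eta * d_h[j] * x1
            w_h[j][1] -= eta * d_h[j] * x2
            w_h[j][2] -= eta * d_h[j]
        w_o[2] -= eta * d_o

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # stand-in data
w_h = [[0.3, 0.3, 0.3], [0.3, 0.3, 0.3]]  # every parameter starts equal
w_o = [0.3, 0.3, 0.3]
for _ in range(100):
    one_epoch(w_h, w_o, data)

# The symmetry is never broken: both hidden units are still identical,
# so the network effectively has one hidden unit.
print(w_h[0] == w_h[1], w_o[0] == w_o[1])
```

This is why implementations break the symmetry with distinct (random) initial values.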
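The rationale behind the 0.25 threshold can be checked numerically: if the sum of squared errors over all examples is below 0.25, then every individual squared error is below 0.25, so every prediction lies within 0.5 of its 0/1 target and is classified correctly under the 0.5 threshold. A small check (the prediction/target pairs below are made-up values):

```python
def sse(pairs):
    """Sum-of-squares error over (prediction, target) pairs."""
    return sum((p - t) ** 2 for p, t in pairs)

def classified_ok(pairs):
    """True iff every prediction is on the correct side of 0.5."""
    return all((p > 0.5) == (t == 1) for p, t in pairs)

# Total error below 0.25 forces every example onto the right side of 0.5.
good = [(0.9, 1), (0.2, 0), (0.7, 1)]   # sse = 0.01 + 0.04 + 0.09 = 0.14
print(sse(good), classified_ok(good))

# One misclassified example alone contributes a squared error of at least
# 0.25, so the total cannot be below 0.25.
bad = [(0.9, 1), (0.4, 1)]              # sse = 0.01 + 0.36 = 0.37
print(sse(bad), classified_ok(bad))
```

The converse does not hold: a network can classify everything correctly while its sum-of-squares error is still above 0.25, so this criterion is sufficient but not necessary for perfect classification of the training data.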