# QUESTIONS (solution)
1)
I- A pattern should exist
II- The pattern should not be expressed mathematically
III- Data should exist
Which of the statements above can be counted as a basis for deciding to solve a
problem using machine learning?
a) Only II
b) I and II
c) I and III
d) II and III
e) All
2) I- A known function is to be learned
II- Parameters of a problem are adjusted according to past data
III- A hypothesis is a function that is supposed to converge to a known function
Which of the statements given above is true?
a) Only I
b) Only II
c) I and III
d) II and III
e) All

3) I- Cannot be known unless the learning model is known
II- Consists of the parameters of the target function to be learned
III- Finite for most learning models
IV- Consists of the probabilities of the training samples being chosen
V- Consists of candidate solutions for the problem
Which of the statements given above is true for a hypothesis set?
a) I and II
b) I and III
c) I and V
d) Only V
e) I, III and V

4) I- Target function
II- Training samples
III- Learning algorithm
IV- Hypothesis set
Which of the concepts given above can be controlled by the designer
when applying machine learning to solve a problem?
a) I and II
b) II and III
c) III and IV
d) I and IV
e) Only III
5) I- Target function
II- Training samples
III-Hypothesis set
IV- Learning algorithm
If we know the learning model, which of the above options can be known?
a) I and II
b) II and III
c) III and IV
d) I and IV
e) Only III

6) I- Determining at what age a medical operation is suitable to perform
II- Classifying the given numbers as prime or not prime
III- Determining possible credit card fraud on credit card receipts
IV- Determining how long it takes for a falling object to crash on the ground
V- Determining the best cycle of the traffic lights in a congested crossroad.
Which of the given problems are suitable for solving with machine learning?
a) I, II and V
b) II, III and IV
c) I, III and V
d) III and IV
e) III and V

7) What is the updated weight vector W of a perceptron model having initial weight vector W = [0.4, -0.2, 0.2] and bias 0.1, when the pattern X = [0.5, 1, 0.5] is shown to it and the desired output Y is 0?
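A quick way to check the update is to run one perceptron learning step. The learning rate (eta = 1) and the step activation used below are assumptions, since the question does not state them:

```python
# One perceptron update step for Question 7.
# Assumptions (not stated in the question): learning rate eta = 1,
# step activation (output 1 if net > 0, else 0), and the update rule
# w <- w + eta * (y_desired - y_predicted) * x applied to the weights
# (the bias update is omitted here for brevity).

w = [0.4, -0.2, 0.2]   # initial weights
b = 0.1                # bias
x = [0.5, 1.0, 0.5]    # input pattern X
y_desired = 0
eta = 1.0              # assumed learning rate

net = sum(wi * xi for wi, xi in zip(w, x)) + b   # 0.2 - 0.2 + 0.1 + 0.1 = 0.2
y_pred = 1 if net > 0 else 0                     # fires, but desired output is 0
w_new = [wi + eta * (y_desired - y_pred) * xi for wi, xi in zip(w, x)]
print([round(v, 2) for v in w_new])  # [-0.1, -1.2, -0.3]
```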
a) W = [1.3, -0.2, 1.6]
b) W = [0.9, 0.8, 0.7]
c) W = [-0.1, -1.2, -0.3]
d) W = [0.2, -0.5, 0.4]
e) W = [0.6, -1.4, 0.9]

8) Find the in-sample error Ein of a linear regression model that uses mean squared error as the error measure, in the case that the model takes input matrix X including 3 training samples and gives output vector y using weight vector w. (Circle the correct option.)

X = [[1, 3],
     [5, 7],
     [9, 8]]
y = [3.2, 8.4, 9.3]^T
w = [0.4, 0.8]^T
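The error measure itself can be sketched generically. Since the matrix entries above came through extraction imperfectly, the sketch below uses illustrative placeholder values rather than the exam's:

```python
# In-sample error Ein for linear regression with mean squared error:
#   Ein(w) = (1/N) * sum_n (x_n . w - y_n)^2
# The values below are illustrative placeholders, not the exam's
# (possibly garbled) matrix entries.

def ein(X, w, y):
    n = len(X)
    residuals = [sum(xi * wi for xi, wi in zip(row, w)) - yi
                 for row, yi in zip(X, y)]
    return sum(r * r for r in residuals) / n

X = [[1.0, 2.0],
     [3.0, 4.0]]
w = [0.5, 0.5]
y = [1.5, 3.5]
print(ein(X, w, y))  # both residuals are 0, so Ein = 0.0
```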
a) 0.56
b) 1.18
c) 1.29
d) 0.43
e) 2.13

9) When a linear regression model takes input matrix X including 2 training samples, it gives output vector y. In this case, find the optimal weight vector w of this linear regression model that will give the minimum in-sample error Ein. (Circle the correct option.)

X = [[2, 4],
     [6, 8]]
y = [9, 10]^T
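Because X here is square and invertible, the weight vector minimizing Ein solves Xw = y exactly. A minimal check by Cramer's rule, assuming the matrix values were recovered correctly from the garbled layout:

```python
# For Question 9: X is 2x2 and invertible, so the minimum-Ein weight
# vector solves X w = y exactly (zero residual). Solved here with
# Cramer's rule.

X = [[2.0, 4.0],
     [6.0, 8.0]]
y = [9.0, 10.0]

det = X[0][0] * X[1][1] - X[0][1] * X[1][0]     # 16 - 24 = -8
w1 = (y[0] * X[1][1] - X[0][1] * y[1]) / det    # (72 - 40) / -8 = -4.0
w2 = (X[0][0] * y[1] - y[0] * X[1][0]) / det    # (20 - 54) / -8 = 4.25
print([w1, w2])  # [-4.0, 4.25]
```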
a) w = [2.5, -4.5]^T
b) w = [-0.25, 3.75]^T
c) w = [-4, 4.25]^T
d) w = [0.7, -0.75]^T
e) w = [0.25, -2.75]^T

10) What is the maximum number of dichotomies for a hypothesis set that classifies the 8 points
of X space as +1 or -1? (Circle the correct option.)
a) 64
b) 65
c) 127
d) 128
e) 256

11) A one-dimensional hypothesis in a hypothesis set H classifies as +1 all the points that are in the range specified by two points, and as -1 the points outside this range. What is the maximum number of dichotomies of this hypothesis set for 20 points?
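The count can be verified by brute force, placing the 20 points at integer positions (an assumption made purely for illustration):

```python
# Brute-force count of dichotomies realizable by a "positive interval"
# hypothesis on 20 collinear points: points inside a chosen range get +1,
# points outside get -1. Point positions are taken as 0..19.

n = 20
dichotomies = {tuple([-1] * n)}          # interval containing no point
for i in range(n):
    for j in range(i, n):                # +1 exactly on points i..j
        dichotomies.add(tuple(1 if i <= k <= j else -1 for k in range(n)))
print(len(dichotomies))  # 211, i.e. C(21, 2) + 1
```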
a) 211
b) 266
c) 331
d) 388
e) 412

12) Which of the following procedures is necessary, sufficient, and most efficient for proving that the VC dimension of a learner is N?
a) Show that the classifier can shatter all possible dichotomies with N points.
b) Show that the classifier can shatter a subset of all possible dichotomies with N points.
c) Show that the classifier can shatter all possible dichotomies with N points and that it
cannot shatter any of the dichotomies with N+1 points.
d) Show that the classifier can shatter all possible dichotomies with N points and that it
cannot shatter one of the dichotomies with N+1 points.
e) Show that the classifier can shatter a subset of all possible dichotomies with N points
and that it cannot shatter one of the dichotomies with N+1 points.

13) What is the maximum number of dichotomies for a machine learning model on 6 points that
has a VC dimension of 4?
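The bound comes from Sauer's lemma, which caps the number of dichotomies on N points at the sum of binomial coefficients up to the VC dimension. A quick numeric check:

```python
# Maximum number of dichotomies on N points for a model with VC
# dimension d, by Sauer's lemma: sum_{i=0}^{d} C(N, i).
# Checked for N = 6 points and d = 4.

from math import comb

N, d = 6, 4
max_dichotomies = sum(comb(N, i) for i in range(d + 1))
print(max_dichotomies)  # 1 + 6 + 15 + 20 + 15 = 57
```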
a) 24
b) 36
c) 57
d) 86
e) 112

14) A neuron with 4 inputs has the weight vector w = [1, 2, 3, 4]^T and a bias of 0 (zero). The activation function is linear with a constant of proportionality of 2; that is, the activation function is given by f(net) = 2 × net. If the input vector is x = [4, 8, 5, 6]^T, then the output of the neuron will be
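The arithmetic can be checked directly:

```python
# Output of the linear neuron in Question 14:
# net = w . x + bias, output = f(net) = 2 * net.

w = [1, 2, 3, 4]
x = [4, 8, 5, 6]
bias = 0
net = sum(wi * xi for wi, xi in zip(w, x)) + bias  # 4 + 16 + 15 + 24 = 59
output = 2 * net
print(output)  # 118
```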
a) 1
b) 56
c) 59
d) 112
e) 118

15) A perceptron with a sign activation function has two inputs with weights w1 = 0.5 and w2 = -0.2, and a bias of 0.3. For a given training example x = [0, 1]^T, the desired output is 1. Does the perceptron produce the desired output?
a) Yes
b) No

16) The VC (Vapnik-Chervonenkis) bound builds a bridge between what we learn on the training set and how the model performs on the test set. The simplified VC bound is as below (in big-O notation):

[formula lost in extraction]

According to the VC bound, mark each of the following statements TRUE or FALSE by circling one option.
( TRUE / FALSE ) I- If the complexity of the model increases, the in-sample error increases.
( TRUE / FALSE ) II- If the complexity of the model increases, the upper bound of the generalization gap between in-sample error and out-of-sample error increases.
( TRUE / FALSE ) III- If the number of samples in the training set increases, the upper bound of the generalization gap between in-sample error and out-of-sample error increases.
( TRUE / FALSE ) IV- If the dimensionality of the feature vectors increases, the complexity of the model decreases, and this leads to an increase in the in-sample error.
( TRUE / FALSE ) V- If we get a high in-sample error, we should increase the complexity of the learning model.
( TRUE / FALSE ) VI- If we get a small in-sample error but a high upper bound of the generalization gap between in-sample error and out-of-sample error, then we should either increase the complexity of the learning model or increase the number of samples in the training set.
( TRUE / FALSE ) VII- When the learning model and feature dimensionality are fixed, increasing the size of the training set generally improves the out-of-sample error.
( TRUE / FALSE ) VIII- When the learning model is fixed, to have the same upper bound of the generalization gap between in-sample error and out-of-sample error, we should increase the number of samples in the training set.
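The trends behind these statements can be checked numerically with one common simplified form of the VC generalization-gap bound. The exact formula used below is an assumption (the exam's formula was lost in extraction); the monotonic trends it exhibits are the point:

```python
# Behavior of a simplified VC-style generalization-gap bound,
#   gap(N, d_vc) = sqrt(d_vc * ln(N) / N).
# This particular form is an ASSUMPTION (the exam's formula image was
# lost); what matters here are the monotonic trends.

from math import log, sqrt

def gap_bound(n_samples, d_vc):
    return sqrt(d_vc * log(n_samples) / n_samples)

# More training samples -> smaller gap bound:
assert gap_bound(10_000, 10) < gap_bound(1_000, 10) < gap_bound(100, 10)
# Higher model complexity (d_vc) -> larger gap bound:
assert gap_bound(1_000, 50) > gap_bound(1_000, 10)
```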
17) In the backpropagation algorithm, how is the error function usually defined?
a) [formula lost in extraction]
b) [formula lost in extraction]
c) [formula lost in extraction]
d) [formula lost in extraction]
e) [formula lost in extraction]

Answer questions 18 through 23 according to the feed-forward network given below.

18) A training pattern, consisting of an input vector x = [x1, x2, x3]^T and desired outputs
t = [t1, t2, t3]^T, is presented to the following neural network. What is the usual sequence of
events for training the network using the backpropagation algorithm?
a) (1) calculate yj = f(Hj), (2) calculate zk = f(Ik), (3) update vji, (4) update wkj.
b) (1) calculate yj = f(Hj), (2) calculate zk = f(Ik), (3) update wkj, (4) update vji.
c) (1) calculate yj = f(Hj), (2) update vji, (3) calculate zk = f(Ik), (4) update wkj.
d) (1) calculate zk = f(Ik), (2) update wkj, (3) calculate yj = f(Hj), (4) update vji.
e) (1) calculate zk = f(Ik), (2) calculate yj = f(Hj), (3) update wkj, (4) update vji.

19) For the same neural network, the input vector to the network is x = [x1, x2, x3]^T, the vector of hidden layer outputs is y = [y1, y2]^T, the vector of actual outputs is z = [z1, z2, z3]^T, and the vector of desired outputs is t = [t1, t2, t3]^T. The network has the following weight vectors: [weight values lost in extraction]. Assume that all units have sigmoid activation functions given by f(net) = 1 / (1 + e^(-net)) and that each unit has a bias b = 0 (zero). If the network is tested with the input vector x = [1.0, 2.0, 3.0]^T, then the output y1 of the first hidden neuron will be
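With the exam's weight vectors lost, the forward computation can still be sketched. The weight vector v1 below is a HYPOTHETICAL placeholder, not the exam's value:

```python
# Forward pass for one hidden unit with a sigmoid activation and zero
# bias, as in Question 19. The exam's weight vectors were lost in
# extraction, so v1 below is a HYPOTHETICAL placeholder.

from math import exp

def sigmoid(net):
    return 1.0 / (1.0 + exp(-net))

x = [1.0, 2.0, 3.0]          # input vector from the question
v1 = [0.5, -0.3, 0.2]        # hypothetical weights of hidden unit 1
net1 = sum(vi * xi for vi, xi in zip(v1, x))  # 0.5 - 0.6 + 0.6 = 0.5
y1 = sigmoid(net1)
print(round(y1, 3))  # 0.622
```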
a) -2.300
b) 0.091
c) 0.644
d) 0.993
e) 4.900

20) Assuming exactly the same neural network and the same input vector as in the previous
question, what is the activation I2 of the second output neuron?
a) 0.353
b) 0.387
c) 0.596
d) 0.662
e) 0.674

21) For the hidden units of the network, the generalized Delta rule can be written as Δvji = η δj xi, where Δvji is the change to the weights from unit i to unit j, η is the learning rate, δj is the error term for unit j, and xi is the ith input to unit j. In the backpropagation algorithm, what is the error term δj?
a) [formula lost in extraction]
b) [formula lost in extraction]
c) [formula lost in extraction]
d) [formula lost in extraction], where f'(net) is the derivative of the activation function f(net).
22) For the output units of the network, the generalized Delta rule can be written as Δwkj = η δk yj, where Δwkj is the change to the weights from unit j to unit k, η is the learning rate, δk is the error term for unit k, and yj is the jth input to unit k. In the backpropagation algorithm, what is the error term δk?
a) [formula lost in extraction]
b) [formula lost in extraction]
c) [formula lost in extraction]
d) [formula lost in extraction], where f'(net) is the derivative of the activation function f(net).
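The two delta rules can be illustrated end to end on a tiny network. The error-term formulas used below (δk = f'(Ik)(tk - zk) for output units, δj = f'(Hj) Σk δk wkj for hidden units) are the standard backpropagation ones, assumed here since the exam's formula options were lost:

```python
# One backpropagation step for a tiny 2-1-1 network, illustrating the
# generalized Delta rules of Questions 21 and 22. Standard error terms
# are ASSUMED (the exam's formula options were lost):
#   output unit:  delta_k = f'(I_k) * (t_k - z_k)
#   hidden unit:  delta_j = f'(H_j) * sum_k delta_k * w_kj
# Sigmoid activations, zero biases, learning rate eta = 0.5.

from math import exp

def sigmoid(net):
    return 1.0 / (1.0 + exp(-net))

eta = 0.5
x = [1.0, 0.5]          # input
v = [0.4, -0.2]         # input -> hidden weights (one hidden unit)
w = 0.3                 # hidden -> output weight
t = 1.0                 # target

# Forward pass: hidden activation first, then output activation.
H = sum(vi * xi for vi, xi in zip(v, x))
y = sigmoid(H)
I = w * y
z = sigmoid(I)

# Backward pass: output delta first, then hidden delta.
delta_out = z * (1 - z) * (t - z)          # f'(I) = z(1-z) for sigmoid
delta_hid = y * (1 - y) * delta_out * w

# Weight updates: Delta-rule increments eta * delta * input.
w += eta * delta_out * y
v = [vi + eta * delta_hid * xi for vi, xi in zip(v, x)]
```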
23) The following figure shows part of the neural network. A new input pattern is presented to the network and training proceeds as follows. The actual outputs of the network are given by z = [0.32, 0.05, 0.67]^T and the corresponding target outputs are given by t = [1.00, 1.00, 1.00]^T. The weights w12, w22 and w32 are also shown below. For the output units, the derivative of the sigmoid function can be rewritten as f'(Ik) = f(Ik) · (1 - f(Ik)). What is the error term for each of the output units?
a) output 1 = -0.2304, output 2 = 0.3402, and output 3 = -0.8476
b) output 1 = 0.1084, output 2 = 0.1475, and output 3 = 0.1054
c) output 1 = 0.1480, output 2 = 0.0451, and output 3 = 0.0730
d) output 1 = 0.4225, output 2 = -0.1056, and output 3 = 0.1849
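Assuming the standard sigmoid-output error term δk = zk(1 - zk)(tk - zk), the three values can be checked directly:

```python
# Error terms for the output units in Question 23, using the standard
# sigmoid-output formula delta_k = z_k * (1 - z_k) * (t_k - z_k).

z = [0.32, 0.05, 0.67]   # actual outputs
t = [1.00, 1.00, 1.00]   # target outputs

deltas = [zk * (1 - zk) * (tk - zk) for zk, tk in zip(z, t)]
print([round(d, 4) for d in deltas])  # [0.148, 0.0451, 0.073]
```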