
1. Based on the following confusion matrix:

Cluster  Entertainment  Financial  Foreign  Metro  National  Sports  Total
#1                   1          1        0     11         4     676    693
#2                  27         89      333    827       253      33   1562
#3                 326        465        8    105        16      29    949
Total              354        555      341    943       273     738   3204

a) Compute the entropy and purity of each cluster and of the overall clustering. Purity is the count of the most frequent class in a cluster divided by the cluster's total.

b) Compute the precision of the class "Sports" with respect to each cluster and the overall clustering. Precision measures the proportion of correct pages returned out of all the pages returned.

c) Compute the recall of the class "Entertainment" with respect to each cluster and the overall clustering. Recall measures the proportion of correct pages returned out of all the correct pages available on the Web.

Hint: The definitions of precision and recall above are from handout2. To calculate the recall of the overall clustering, use SUM (cluster total / overall total * recall in each cluster).

2. Here is a dataset. Using the Euclidean distance measure and the 5-nearest-neighbor technique, predict which class each of the following new data points is likely to belong to, based on simple majority voting. (Hint: combine the weight and height attributes into a ratio = weight/height, which simplifies the two-dimensional data to one dimension.) If you want to use both attributes directly to calculate the distance, you need to normalize first (weight is in the range [130, 289] and height is in the range [1.5, 2.4], so the two are not on the same scale).

Name       Weight  Height  Class
Kristina   160 lb  1.6 m   Average
Jim        210 lb  2.0 m   Average
Maggie     207 lb  1.9 m   Average
Martha     130 lb  1.8 m   Underweight
Stephanie  221 lb  1.7 m   Overweight
Bob        215 lb  1.8 m   Average
Kathy      178 lb  1.6 m   Average
Dave       138 lb  1.7 m   Underweight
Worth      160 lb  2.2 m   Underweight
Steven     190 lb  2.1 m   Average
Debbie     234 lb  1.8 m   Overweight
Todd       285 lb  1.9 m   Overweight
Kim        135 lb  1.9 m   Underweight
Amy        198 lb  1.8 m   Average
Lynette    289 lb  1.7 m   Overweight

New data:

a) John [185 lb, 2.0 m]
b) Kelly [165 lb, 1.5 m]
c) Sam [180 lb, 2.4 m]
d) Laura [195 lb, 1.8 m]
e) Mike [220 lb, 1.7 m]

Solution details:
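For problem 1, hand calculations can be checked with a short script. The following is only a sketch under stated assumptions: entropy uses base-2 logarithms, and every "overall" value is the size-weighted sum over clusters. The hint gives that weighting for recall only; applying it to entropy, purity and precision is an assumption here. All names (MATRIX, overall, and so on) are made up for illustration.

```python
import math

CLASSES = ["Entertainment", "Financial", "Foreign", "Metro", "National", "Sports"]

# Rows of the confusion matrix from problem 1, one list of class counts per cluster.
MATRIX = {
    "#1": [1, 1, 0, 11, 4, 676],
    "#2": [27, 89, 333, 827, 253, 33],
    "#3": [326, 465, 8, 105, 16, 29],
}
GRAND_TOTAL = sum(sum(row) for row in MATRIX.values())


def entropy(row):
    """Entropy of one cluster; base-2 log is an assumption, empty classes are skipped."""
    total = sum(row)
    return -sum((n / total) * math.log2(n / total) for n in row if n > 0)


def purity(row):
    """Count of the most frequent class divided by the cluster total."""
    return max(row) / sum(row)


def precision(row, cls):
    """Share of the cluster's pages that belong to class `cls`."""
    return row[CLASSES.index(cls)] / sum(row)


def recall(row, cls):
    """Share of all pages of class `cls` that ended up in this cluster."""
    j = CLASSES.index(cls)
    class_total = sum(r[j] for r in MATRIX.values())
    return row[j] / class_total


def overall(metric, *args):
    """Size-weighted sum over clusters: SUM(cluster total / overall total * metric)."""
    return sum(sum(row) / GRAND_TOTAL * metric(row, *args) for row in MATRIX.values())


if __name__ == "__main__":
    for name, row in MATRIX.items():
        print(name,
              f"entropy={entropy(row):.4f}",
              f"purity={purity(row):.4f}",
              f"precision(Sports)={precision(row, 'Sports'):.4f}",
              f"recall(Entertainment)={recall(row, 'Entertainment'):.4f}")
    print("overall",
          f"entropy={overall(entropy):.4f}",
          f"purity={overall(purity):.4f}",
          f"precision(Sports)={overall(precision, 'Sports'):.4f}",
          f"recall(Entertainment)={overall(recall, 'Entertainment'):.4f}")
```

If the handout defines the overall precision differently, only the overall() helper needs to change; the per-cluster functions follow the definitions given in the problem statement.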

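For problem 2, the hinted approach (reduce each person to ratio = weight/height, then take the 5 nearest neighbors and vote) can be sketched as below. This is a minimal illustration, not a prescribed solution: in one dimension the Euclidean distance is just the absolute difference of the ratios, and ties in the vote are broken by whichever class was seen first among the nearest neighbors, which is an assumption the problem does not specify. The names TRAIN and classify are hypothetical.

```python
from collections import Counter

# Training data from problem 2: (name, weight in lb, height in m, class).
TRAIN = [
    ("Kristina", 160, 1.6, "Average"),
    ("Jim", 210, 2.0, "Average"),
    ("Maggie", 207, 1.9, "Average"),
    ("Martha", 130, 1.8, "Underweight"),
    ("Stephanie", 221, 1.7, "Overweight"),
    ("Bob", 215, 1.8, "Average"),
    ("Kathy", 178, 1.6, "Average"),
    ("Dave", 138, 1.7, "Underweight"),
    ("Worth", 160, 2.2, "Underweight"),
    ("Steven", 190, 2.1, "Average"),
    ("Debbie", 234, 1.8, "Overweight"),
    ("Todd", 285, 1.9, "Overweight"),
    ("Kim", 135, 1.9, "Underweight"),
    ("Amy", 198, 1.8, "Average"),
    ("Lynette", 289, 1.7, "Overweight"),
]


def classify(weight, height, k=5):
    """Predict a class by majority vote among the k nearest weight/height ratios."""
    query = weight / height
    # 1-D Euclidean distance is the absolute difference between ratios.
    nearest = sorted(TRAIN, key=lambda rec: abs(rec[1] / rec[2] - query))[:k]
    votes = Counter(rec[3] for rec in nearest)
    # most_common keeps first-seen order among equal counts (tie-break assumption).
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    queries = [
        ("John", 185, 2.0),
        ("Kelly", 165, 1.5),
        ("Sam", 180, 2.4),
        ("Laura", 195, 1.8),
        ("Mike", 220, 1.7),
    ]
    for name, weight, height in queries:
        print(name, round(weight / height, 2), classify(weight, height))
```

Using both attributes instead of the ratio would require rescaling weight and height to a common range first, as the problem statement notes; the sketch above sidesteps that by working in one dimension.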
