Final answer:
The total cost on the test set equals tr(C^T A), the trace of the product of the transposed cost matrix C and the confusion matrix A. In imbalanced classification problems, the normalized confusion matrix is preferred because it allows a fair comparison across classes with different sample sizes.
Step-by-step explanation:
The question addresses the concepts of a confusion matrix and a cost matrix within the context of a multi-class classification problem. In part (a), the proof involves matrix multiplication and the trace of a matrix. The trace operation, denoted as tr(), is the sum of the elements on the main diagonal of a matrix.
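Let A be the confusion matrix, where A_ij counts the test samples of true class i that were predicted as class j, and let C be the cost matrix, where C_ij is the cost of one such prediction. The total cost is the sum over all i and j of C_ij A_ij. The j-th diagonal entry of the product C^T A is (C^T A)_jj = sum_i C_ij A_ij, so adding up the diagonal entries gives tr(C^T A) = sum_j sum_i C_ij A_ij, which is exactly the sum of the costs weighted by how often each one occurs.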
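The trace identity can be checked numerically. The sketch below is only an illustration: the 3-class matrices are made-up numbers, and the helper functions (transpose, matmul, trace) are hypothetical names written in plain Python, not taken from the question. It assumes rows index the true class and columns index the predicted class.

```python
# Made-up 3-class example (not data from the question).
# A[i][j] = number of test samples with true class i predicted as class j.
A = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 5, 30],
]
# C[i][j] = cost of predicting class j when the true class is i.
C = [
    [0, 1, 4],
    [2, 0, 1],
    [5, 3, 0],
]


def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Return the matrix product X @ Y for lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def trace(M):
    """Return the sum of the main-diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


# Total cost via the trace formula tr(C^T A).
total_cost_trace = trace(matmul(transpose(C), A))

# Total cost via the direct double sum over all (true, predicted) pairs.
n = len(A)
total_cost_sum = sum(C[i][j] * A[i][j] for i in range(n) for j in range(n))

print(total_cost_trace, total_cost_sum)
```

Both values come out equal, since each diagonal entry of C^T A collects the costs of one predicted class summed over all true classes.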
For part (b), in an imbalanced classification problem the normalized confusion matrix is usually preferred. Raw counts are dominated by the largest classes, so errors on a small class are hard to see next to the much larger numbers for the majority classes.
Normalizing each row by the number of samples in that true class makes every row sum to 1, so a row shows the distribution of predictions for a given true class. Performance can then be compared across classes on the same scale, regardless of class size.
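A small sketch of row normalization, assuming rows index the true class. The 2-class counts are made up to mimic an imbalanced test set, and normalize_rows is a hypothetical helper, not a function from any library.

```python
# Made-up imbalanced 2-class counts (not data from the question).
# Row = true class, column = predicted class.
A = [
    [950, 50],  # true majority class: 1000 samples
    [30, 20],   # true minority class: 50 samples
]


def normalize_rows(M):
    """Divide each row by its sum so that every non-empty row sums to 1."""
    normalized = []
    for row in M:
        total = sum(row)
        # Guard against a class with no samples to avoid dividing by zero.
        normalized.append([x / total if total else 0.0 for x in row])
    return normalized


N = normalize_rows(A)

# Overall accuracy from the raw counts: correct predictions / all samples.
accuracy = (A[0][0] + A[1][1]) / sum(sum(row) for row in A)

# Diagonal entries of the row-normalized matrix: fraction of each true
# class that was predicted correctly (per-class recall).
recall_per_class = [N[i][i] for i in range(len(N))]

print(N)
print(accuracy, recall_per_class)
```

In this made-up case the overall accuracy is about 92%, yet the normalized matrix shows that only 40% of the minority class is classified correctly, which the raw counts make easy to overlook.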