📋 Exam Information

Item	Details
Total Points	100
Time Allowed	90 minutes
Format	Closed book, calculator allowed
Structure	Q1 (20pts) + Q2 (30pts, choose 3/5) + Q3 (25pts) + Q4 (25pts)

Question 1: Python Output Analysis (20 points)

Answer ALL questions. Determine the exact output.

Q1.1 (5 points)

x = 15
y = 4
print((x // y) ** 2 + x % y)

💡 Answer & Explanation

Step-by-step breakdown:

# Given values
x = 15
y = 4
 
# Step 1: Floor division x // y
# 15 // 4 = 3 (integer part of 15/4 = 3.75)
floor_result = 15 // 4  # = 3
 
# Step 2: Modulo x % y  
# 15 % 4 = 3 (remainder when 15 divided by 4)
# 15 = 4 × 3 + 3, so remainder is 3
mod_result = 15 % 4  # = 3
 
# Step 3: Power (floor_result) ** 2
# 3 ** 2 = 9
power_result = 3 ** 2  # = 9
 
# Step 4: Addition
# 9 + 3 = 12
final = 9 + 3  # = 12

Answer: 12

Key operators explained:

Operator	Name	Example
`//`	Floor division	15 // 4 = 3
`%`	Modulo	15 % 4 = 3
`**`	Exponentiation	3 ** 2 = 9
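Floor division and modulo are tied together by the identity `x == y * (x // y) + x % y`, and `divmod()` returns both results in one call. The short sketch below also covers a case the table does not: with a negative operand, `//` rounds toward negative infinity rather than toward zero.

```python
# Floor division (//) and modulo (%) always satisfy: x == y * (x // y) + (x % y)
x, y = 15, 4
print(x // y, x % y)        # 3 3
print(divmod(x, y))         # (3, 3): both results in one call

# With a negative operand, // rounds toward negative infinity (not toward zero),
# so the remainder takes the sign of the divisor
print(-15 // 4, -15 % 4)    # -4 1
assert 4 * (-15 // 4) + (-15 % 4) == -15
```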

Q1.2 (5 points)

numbers = [10, 20, 30, 40, 50]
numbers[1:4] = [100]
print(len(numbers))
print(numbers[2])

💡 Answer & Explanation

Step-by-step breakdown:

# Original list
numbers = [10, 20, 30, 40, 50]
# Indices:   0   1   2   3   4
 
# Slice assignment: numbers[1:4] = [100]
# This replaces elements at indices 1, 2, 3 with a single element 100
# Before: [10, 20, 30, 40, 50]
#              ^^^^^^^^^^^^ <- indices 1:4 (elements 20, 30, 40)
# After:  [10, 100, 50]
#              ^^^ <- replaced with single element
 
# Result after slice assignment
# numbers = [10, 100, 50]
# Indices:    0    1   2
 
# len(numbers) = 3 (was 5, replaced 3 elements with 1)
# numbers[2] = 50 (third element)

Answers:

len(numbers) → 3
numbers[2] → 50

Important concept: Slice assignment can change list size! Here we replaced 3 elements (indices 1, 2, 3) with 1 element, reducing length from 5 to 3.
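Slice assignment works in the other direction too. This small sketch shows a slice growing the list, `del` removing a slice, and an empty slice acting as an insertion point.

```python
nums = [10, 20, 30]

# Replacing a 1-element slice with 3 elements grows the list
nums[1:2] = [7, 8, 9]
print(nums)  # [10, 7, 8, 9, 30]

# del removes a slice entirely
del nums[1:3]
print(nums)  # [10, 9, 30]

# Assigning to an empty slice inserts without replacing anything
nums[1:1] = [99]
print(nums)  # [10, 99, 9, 30]
```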

Q1.3 (5 points)

def mystery(a, b=5, c=10):
    return a * 2 + b - c
 
result = mystery(3, c=4)
print(result)

💡 Answer & Explanation

Step-by-step breakdown:

# Function definition
def mystery(a, b=5, c=10):
    # a: required parameter
    # b: optional, default = 5
    # c: optional, default = 10
    return a * 2 + b - c
 
# Function call: mystery(3, c=4)
# a = 3 (positional argument, first position)
# b = 5 (uses default value, NOT provided in call)
# c = 4 (keyword argument, overrides default of 10)
 
# Calculation:
# a * 2 + b - c
# = 3 * 2 + 5 - 4
# = 6 + 5 - 4
# = 7

Answer: 7

Key concept: Keyword arguments (c=4) allow you to skip over parameters with defaults. Here b uses its default value of 5.
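To see how binding changes with the call style, here are a few alternative calls to the same `mystery` function (the extra calls are illustrative, not part of the exam question).

```python
def mystery(a, b=5, c=10):
    return a * 2 + b - c

print(mystery(3, c=4))         # 7: b keeps its default of 5
print(mystery(3, 4))           # 0: 4 fills b positionally, c stays 10 -> 6 + 4 - 10
print(mystery(c=4, a=3, b=1))  # 3: all keywords, so order does not matter -> 6 + 1 - 4
```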

Q1.4 (5 points)

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
total = 0
for key in data:
    total += sum(data[key])
print(total)

💡 Answer & Explanation

Step-by-step breakdown:

# Dictionary with lists as values
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
 
total = 0
 
# Iterating over a dictionary gives KEYS, not values
for key in data:  # key will be 'A', then 'B'
    
    # First iteration: key = 'A'
    # data['A'] = [1, 2, 3]
    # sum([1, 2, 3]) = 6
    # total = 0 + 6 = 6
    
    # Second iteration: key = 'B'
    # data['B'] = [4, 5, 6]
    # sum([4, 5, 6]) = 15
    # total = 6 + 15 = 21
    
    total += sum(data[key])
 
print(total)  # 21

Answer: 21

Calculation summary:

sum([1, 2, 3]) = 6
sum([4, 5, 6]) = 15
Total = 6 + 15 = 21
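Because the loop only needs the values, it can iterate over `data.values()` directly; `data.items()` yields key/value pairs when both are needed. A minimal equivalent sketch:

```python
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}

# Iterate over the values directly instead of looking each key up
total = sum(sum(values) for values in data.values())
print(total)  # 21

# .items() yields (key, value) pairs when both are needed
for key, values in data.items():
    print(key, sum(values))  # A 6, then B 15
```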

Question 2: Code Writing (30 points)

Choose 3 out of 5 questions. Each is worth 10 points.

Q2.1 - Grade Calculator (10 points)

Write a function grade_calculator(score) that:

Returns letter grade: 90+ → "A", 80+ → "B", 70+ → "C", 60+ → "D", <60 → "F"
Returns "Invalid" for scores < 0 or > 100

💡 Verified Answer

def grade_calculator(score):
    """
    Convert numeric score to letter grade.
    
    Args:
        score: Numeric score (expected range: 0-100)
        
    Returns:
        str: Letter grade (A/B/C/D/F) or "Invalid" for out-of-range scores
    
    Examples:
        >>> grade_calculator(95)
        'A'
        >>> grade_calculator(-5)
        'Invalid'
    """
    # STEP 1: Validate input range FIRST
    # Must check invalid cases before checking grade ranges
    if score < 0 or score > 100:
        return "Invalid"
    
    # STEP 2: Check grades from highest to lowest
    # Using elif ensures only one condition matches
    if score >= 90:
        return "A"  # 90-100
    elif score >= 80:
        return "B"  # 80-89
    elif score >= 70:
        return "C"  # 70-79
    elif score >= 60:
        return "D"  # 60-69
    else:
        return "F"  # 0-59
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        (95, "A"),
        (85, "B"),
        (73, "C"),
        (65, "D"),
        (45, "F"),
        (-5, "Invalid"),
        (105, "Invalid"),
        (100, "A"),  # Edge case: exactly 100
        (0, "F"),    # Edge case: exactly 0
    ]
    
    print("Testing grade_calculator:")
    for score, expected in test_cases:
        result = grade_calculator(score)
        status = "✓" if result == expected else "✗"
        print(f"  {status} grade_calculator({score}) = {result} (expected {expected})")

Test Output:

Testing grade_calculator:
  ✓ grade_calculator(95) = A (expected A)
  ✓ grade_calculator(85) = B (expected B)
  ✓ grade_calculator(73) = C (expected C)
  ✓ grade_calculator(65) = D (expected D)
  ✓ grade_calculator(45) = F (expected F)
  ✓ grade_calculator(-5) = Invalid (expected Invalid)
  ✓ grade_calculator(105) = Invalid (expected Invalid)
  ✓ grade_calculator(100) = A (expected A)
  ✓ grade_calculator(0) = F (expected F)

Common mistakes to avoid:

Not validating input range first
Using separate if statements that assign a grade variable instead of elif (a score of 95 matches both >= 90 and >= 80 and ends up as "B")
Checking in wrong order (e.g., 60+ before 90+)
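The ordering mistake is easy to reproduce. The function below, `grade_wrong_order`, is a hypothetical deliberately broken version written only to show the failure: checking the lowest threshold first makes every passing score stop at "D".

```python
def grade_wrong_order(score):
    # BUG (deliberate): the lowest threshold is checked first,
    # so every score >= 60 stops at "D"
    if score >= 60:
        return "D"
    elif score >= 70:
        return "C"
    elif score >= 80:
        return "B"
    elif score >= 90:
        return "A"
    return "F"

print(grade_wrong_order(95))  # D, although 95 should be an A
```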

Q2.2 - Remove Duplicates (10 points)

Write a function remove_duplicates(lst) that:

Removes duplicates from a list
Preserves the order of first occurrence
Example: [1, 2, 2, 3, 1, 4] → [1, 2, 3, 4]

💡 Verified Answer

def remove_duplicates(lst):
    """
    Remove duplicate elements while preserving order of first occurrence.
    
    Args:
        lst: Input list with possible duplicates
        
    Returns:
        list: New list with duplicates removed, order preserved
        
    Examples:
        >>> remove_duplicates([1, 2, 2, 3, 1, 4])
        [1, 2, 3, 4]
    """
    # Track elements we've already seen
    seen = []
    
    # Iterate through original list
    for item in lst:
        # Only add to result if not seen before
        if item not in seen:
            seen.append(item)
    
    return seen
 
 
# Alternative approach using dictionary (Python 3.7+ preserves order)
def remove_duplicates_v2(lst):
    """
    Remove duplicates using dict.fromkeys() - more efficient for large lists.
    Works because dictionaries preserve insertion order in Python 3.7+.
    """
    return list(dict.fromkeys(lst))
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        [1, 2, 2, 3, 1, 4],        # Basic case
        [5, 5, 5, 5],              # All duplicates
        [1, 2, 3, 4],              # No duplicates
        [],                        # Empty list
        ['a', 'b', 'a', 'c'],      # Strings
    ]
    
    print("Testing remove_duplicates:")
    for test in test_cases:
        result = remove_duplicates(test)
        print(f"  {test} → {result}")

Test Output:

Testing remove_duplicates:
  [1, 2, 2, 3, 1, 4] → [1, 2, 3, 4]
  [5, 5, 5, 5] → [5]
  [1, 2, 3, 4] → [1, 2, 3, 4]
  [] → []
  ['a', 'b', 'a', 'c'] → ['a', 'b', 'c']

Why not use set()? Sets don't preserve order! list(set([1, 2, 2, 3, 1, 4])) might give [1, 2, 3, 4] but order is NOT guaranteed.
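A set can still help if it is used only for membership checks while a separate list keeps the order. This sketch (the name `remove_duplicates_fast` is hypothetical) makes each `in` check O(1) on average, versus O(n) for the list-based `seen` above, at the cost of requiring hashable items.

```python
def remove_duplicates_fast(lst):
    """Order-preserving de-duplication with a set for fast membership checks."""
    seen = set()   # fast lookups (items must be hashable)
    result = []    # keeps the order of first occurrence
    for item in lst:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(remove_duplicates_fast([1, 2, 2, 3, 1, 4]))  # [1, 2, 3, 4]
```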

Q2.3 - Fibonacci Sequence (10 points)

Write a function fibonacci(n) that:

Returns the first n Fibonacci numbers as a list
Sequence: 0, 1, 1, 2, 3, 5, 8, 13...

💡 Verified Answer

def fibonacci(n):
    """
    Generate the first n Fibonacci numbers.
    
    Fibonacci sequence: Each number is the sum of the two preceding ones.
    Starts with 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...
    
    Args:
        n: Number of Fibonacci numbers to generate (non-negative integer)
        
    Returns:
        list: First n Fibonacci numbers
        
    Examples:
        >>> fibonacci(5)
        [0, 1, 1, 2, 3]
        >>> fibonacci(0)
        []
    """
    # Handle edge cases
    if n <= 0:
        return []  # No numbers requested
    if n == 1:
        return [0]  # Only first number
    
    # Initialize with first two Fibonacci numbers
    result = [0, 1]
    
    # Generate remaining numbers
    for i in range(2, n):
        # Each new number = sum of last two numbers
        # Using negative indexing: result[-1] is last, result[-2] is second-to-last
        next_num = result[-1] + result[-2]
        result.append(next_num)
    
    return result
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [0, 1, 2, 5, 8, 10]
    
    print("Testing fibonacci:")
    for n in test_cases:
        result = fibonacci(n)
        print(f"  fibonacci({n}) = {result}")

Test Output:

Testing fibonacci:
  fibonacci(0) = []
  fibonacci(1) = [0]
  fibonacci(2) = [0, 1]
  fibonacci(5) = [0, 1, 1, 2, 3]
  fibonacci(8) = [0, 1, 1, 2, 3, 5, 8, 13]
  fibonacci(10) = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

How it works:

Position:  0   1   2   3   4   5   6   7
Value:     0   1   1   2   3   5   8   13

From position 2 onward, each value is the sum of the two before it:
0+1=1, 1+1=2, 1+2=3, 2+3=5, 3+5=8, 5+8=13
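The same recurrence only ever needs the last two values, so it can also be written as a generator that keeps a two-number window. This is an alternative sketch (the name `fibonacci_gen` is hypothetical), not the required answer.

```python
def fibonacci_gen(n):
    """Yield the first n Fibonacci numbers one at a time."""
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b  # shift the window: (a, b) -> (b, a + b)

print(list(fibonacci_gen(8)))  # [0, 1, 1, 2, 3, 5, 8, 13]
```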

Q2.4 - Prime Number Check (10 points)

Write a function is_prime(num) that:

Returns True if num is prime, False otherwise
Handle edge cases (num < 2)

💡 Verified Answer

def is_prime(num):
    """
    Check if a number is prime.
    
    A prime number is a natural number greater than 1 that has no positive 
    divisors other than 1 and itself.
    
    Args:
        num: Integer to check
        
    Returns:
        bool: True if prime, False otherwise
        
    Examples:
        >>> is_prime(7)
        True
        >>> is_prime(12)
        False
    """
    # Numbers less than 2 are not prime by definition
    # This handles 0, 1, and negative numbers
    if num < 2:
        return False
    
    # 2 is the only even prime number
    if num == 2:
        return True
    
    # All other even numbers are not prime
    # (They're divisible by 2)
    if num % 2 == 0:
        return False
    
    # Check odd divisors from 3 up to √num
    # Why √num? If num = a × b, at least one of a,b must be ≤ √num
    # If no divisor found up to √num, num is prime
    for i in range(3, int(num ** 0.5) + 1, 2):  # Step by 2 (odd numbers only)
        if num % i == 0:
            return False  # Found a divisor, not prime
    
    return True  # No divisors found, it's prime
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        (1, False),   # Not prime (less than 2)
        (2, True),    # Prime (smallest prime)
        (3, True),    # Prime
        (4, False),   # Not prime (2 × 2)
        (7, True),    # Prime
        (9, False),   # Not prime (3 × 3)
        (11, True),   # Prime
        (25, False),  # Not prime (5 × 5)
        (29, True),   # Prime
        (-5, False),  # Negative, not prime
    ]
    
    print("Testing is_prime:")
    for num, expected in test_cases:
        result = is_prime(num)
        status = "✓" if result == expected else "✗"
        print(f"  {status} is_prime({num}) = {result}")

Test Output:

Testing is_prime:
  ✓ is_prime(1) = False
  ✓ is_prime(2) = True
  ✓ is_prime(3) = True
  ✓ is_prime(4) = False
  ✓ is_prime(7) = True
  ✓ is_prime(9) = False
  ✓ is_prime(11) = True
  ✓ is_prime(25) = False
  ✓ is_prime(29) = True
  ✓ is_prime(-5) = False

Optimization: Checking up to √n instead of n reduces time complexity from O(n) to O(√n).
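One refinement, assuming Python 3.8 or later: `math.isqrt` computes the integer square root exactly, whereas `int(num ** 0.5)` goes through a float and can round incorrectly for very large integers. A sketch of the same trial-division idea (the name `is_prime_isqrt` is hypothetical):

```python
import math

def is_prime_isqrt(num):
    """Trial division by odd numbers up to the exact integer square root."""
    if num < 2:
        return False
    if num % 2 == 0:
        return num == 2  # 2 is the only even prime
    # math.isqrt returns floor(sqrt(num)) using integer arithmetic only
    for i in range(3, math.isqrt(num) + 1, 2):
        if num % i == 0:
            return False
    return True

print([n for n in range(30) if is_prime_isqrt(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```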

Q2.5 - Tuple Statistics (10 points)

Write a function tuple_stats(data) that:

Input: tuple of numbers
Return: tuple of (min, max, average rounded to 2 decimals)

💡 Verified Answer

def tuple_stats(data):
    """
    Calculate statistics for a tuple of numbers.
    
    Args:
        data: Tuple of numeric values
        
    Returns:
        tuple: (minimum, maximum, average) where average is rounded to 2 decimals
        
    Raises:
        ValueError: If tuple is empty
        
    Examples:
        >>> tuple_stats((10, 20, 30, 40))
        (10, 40, 25.0)
    """
    # Handle empty tuple edge case
    if len(data) == 0:
        raise ValueError("Cannot compute stats for empty tuple")
    
    # Calculate statistics using built-in functions
    minimum = min(data)        # Smallest value
    maximum = max(data)        # Largest value
    average = sum(data) / len(data)  # Arithmetic mean
    
    # Round average to 2 decimal places
    average = round(average, 2)
    
    # Return as tuple (note: using parentheses to make it clear)
    return (minimum, maximum, average)
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        (10, 20, 30, 40),      # Even spread
        (5, 15, 25),           # Odd count
        (7,),                  # Single element
        (1, 2, 3, 4, 5, 6, 7, 8, 9, 10),  # Larger tuple
    ]
    
    print("Testing tuple_stats:")
    for data in test_cases:
        result = tuple_stats(data)
        print(f"  tuple_stats({data})")
        print(f"    → (min={result[0]}, max={result[1]}, avg={result[2]})")

Test Output:

Testing tuple_stats:
  tuple_stats((10, 20, 30, 40))
    → (min=10, max=40, avg=25.0)
  tuple_stats((5, 15, 25))
    → (min=5, max=25, avg=15.0)
  tuple_stats((7,))
    → (min=7, max=7, avg=7.0)
  tuple_stats((1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
    → (min=1, max=10, avg=5.5)
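As a cross-check, the standard-library `statistics` module computes the same arithmetic mean, and it also refuses empty input (raising `statistics.StatisticsError`, a subclass of `ValueError`), much like the explicit check in `tuple_stats`. A brief sketch:

```python
import statistics

data = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# statistics.mean computes the arithmetic mean, matching sum(data) / len(data)
print(min(data), max(data), round(statistics.mean(data), 2))  # 1 10 5.5

# Like the empty-tuple check in tuple_stats, statistics.mean rejects empty input
try:
    statistics.mean(())
except statistics.StatisticsError as exc:
    print("error:", exc)
```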

Question 3: Pandas & ML Basics (25 points)

Part A: Theory (10 points)

Q3.A1 (5 points) Explain the difference between fit() and predict() in scikit-learn.

💡 Answer

Method	Purpose	When Called	What It Does
`fit()`	Train the model	Once, on training data	Learns patterns/parameters from data
`predict()`	Use the model	On test/new data	Applies learned patterns to make predictions

Workflow example:

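from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
 
# Example data (iris, used here only for illustration), split 80/20
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
 
# Step 1: Create model
model = DecisionTreeClassifier()
 
# Step 2: Train model (learn from training data)
model.fit(X_train, y_train)  # Learns patterns
 
# Step 3: Use model (apply to new data)
predictions = model.predict(X_test)  # Makes predictions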

Analogy:

fit() = studying for an exam
predict() = taking the exam
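The split between learning and applying can be seen without any library. The class below is a hypothetical toy model, not part of scikit-learn: `fit()` learns a single parameter (the most common training label) and `predict()` applies it to new samples. It follows two scikit-learn conventions: `fit()` returns `self`, and learned attributes end in an underscore.

```python
from collections import Counter

class MajorityClassifier:
    """Hypothetical toy model: always predicts the most common training label."""

    def fit(self, X, y):
        # "Training": learn one parameter (the majority label) from the labels
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # "Using" the model: apply the learned parameter to each new sample
        return [self.majority_ for _ in X]

model = MajorityClassifier().fit([[1], [2], [3]], ["spam", "spam", "ham"])
print(model.predict([[4], [5]]))  # ['spam', 'spam']
```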

Q3.A2 (5 points) Why do we need train-test split? Why not use all data for training?

💡 Answer

Why train-test split is essential:

Evaluate on unseen data: We need to test how the model performs on data it hasn't seen during training.
Detect overfitting: If we train and test on the same data, the model might just memorize the answers (overfitting). Train-test split reveals if the model generalizes well.
Simulate real-world usage: In production, the model will encounter new, unseen data. Testing on held-out data simulates this.
Get honest performance estimate: Training accuracy is often misleadingly high; test accuracy gives a realistic measure.

What happens without split:

Model could achieve 100% accuracy on training data
But fail completely on new data
No way to detect this problem until deployment

Typical split ratios:

80/20 (training/test)
70/30 (training/test)
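Mechanically, a split is a shuffle followed by a cut. This is a minimal pure-Python sketch of an 80/20 split; `simple_train_test_split` is a hypothetical helper, and in practice scikit-learn's `train_test_split` does this job (with extra options such as stratification).

```python
import random

def simple_train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows, then hold out the last test_ratio share for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = len(shuffled) - int(len(shuffled) * test_ratio)
    return shuffled[:n_train], shuffled[n_train:]

train, test = simple_train_test_split(range(100))
print(len(train), len(test))  # 80 20
```

Shuffling before cutting matters: if the rows were sorted (for example by date or by class), taking the last 20% unshuffled would give a test set that does not resemble the training data.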

Part B: Pandas Code (15 points)

Given this DataFrame:

import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, None, 28],
    'Salary': [50000, 60000, 75000, 65000, None],
    'Department': ['IT', 'HR', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)

Q3.B1 (5 points) Fill missing Age with mean, missing Salary with 55000.

💡 Verified Answer

import pandas as pd
 
# Create the DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, None, 28],
    'Salary': [50000, 60000, 75000, 65000, None],
    'Department': ['IT', 'HR', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
 
print("Before filling:")
print(df)
print()
 
# Fill missing Age with the mean
# Assign the result back rather than calling df['Age'].fillna(..., inplace=True):
# that chained in-place form warns in recent pandas and may leave df unchanged
age_mean = df['Age'].mean()  # Calculate mean first (ignores NaN)
print(f"Age mean (excluding NaN): {age_mean}")  # = 29.5
df['Age'] = df['Age'].fillna(age_mean)
 
# Fill missing Salary with 55000
df['Salary'] = df['Salary'].fillna(55000)
 
print("\nAfter filling:")
print(df)

Output:

Before filling:
      Name   Age   Salary Department
0    Alice  25.0  50000.0         IT
1      Bob  30.0  60000.0         HR
2  Charlie  35.0  75000.0         IT
3    David   NaN  65000.0    Finance
4      Eve  28.0      NaN         HR

Age mean (excluding NaN): 29.5

After filling:
      Name   Age   Salary Department
0    Alice  25.0  50000.0         IT
1      Bob  30.0  60000.0         HR
2  Charlie  35.0  75000.0         IT
3    David  29.5  65000.0    Finance
4      Eve  28.0  55000.0         HR

Note: mean() automatically ignores NaN values when calculating.

Q3.B2 (5 points) Calculate average salary by Department.

💡 Verified Answer

# Group by Department and calculate mean of Salary
avg_salary_by_dept = df.groupby('Department')['Salary'].mean()
 
print("Average Salary by Department:")
print(avg_salary_by_dept)

Output (after filling missing values):

Average Salary by Department:
Department
Finance    65000.0
HR         57500.0
IT         62500.0
Name: Salary, dtype: float64

Calculation breakdown:

Finance: 65000 (only David)
HR: (60000 + 55000) / 2 = 57500 (Bob + Eve)
IT: (50000 + 75000) / 2 = 62500 (Alice + Charlie)
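What `groupby(...).mean()` does is "split, then aggregate". This plain-Python sketch reproduces the calculation above, assuming the salaries after the Q3.B1 fill step (Eve's missing salary set to 55000).

```python
# Salaries after the fill step in Q3.B1 (Eve's missing salary replaced by 55000)
rows = [
    ("Alice", "IT", 50000), ("Bob", "HR", 60000), ("Charlie", "IT", 75000),
    ("David", "Finance", 65000), ("Eve", "HR", 55000),
]

# Split: collect salaries per department
groups = {}
for _, dept, salary in rows:
    groups.setdefault(dept, []).append(salary)

# Aggregate: mean of each group, sorted by key like groupby's default output
avg_by_dept = {dept: sum(s) / len(s) for dept, s in sorted(groups.items())}
print(avg_by_dept)  # {'Finance': 65000.0, 'HR': 57500.0, 'IT': 62500.0}
```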

Q3.B3 (5 points) Select IT employees with Age > 26.

💡 Verified Answer

# Filter with multiple conditions
# IMPORTANT: Use & for AND, | for OR
# IMPORTANT: Wrap each condition in parentheses
result = df[(df['Department'] == 'IT') & (df['Age'] > 26)]
 
print("IT employees with Age > 26:")
print(result)

Output:

IT employees with Age > 26:
      Name   Age   Salary Department
2  Charlie  35.0  75000.0         IT

Syntax rules for pandas filtering:

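Use & instead of and
Use | instead of or
Use ~ instead of not
Wrap each condition in parentheses, because & and | bind more tightly than ==

Wrong: df[df['A'] == 1 and df['B'] == 2]  (raises ValueError: the truth value of a Series is ambiguous)
Correct: df[(df['A'] == 1) & (df['B'] == 2)]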

Question 4: Decision Tree & Naive Bayes (25 points)

Part A: Theory (10 points)

Q4.A1 (5 points) List THREE advantages of Decision Trees.

💡 Answer

1. Easy to interpret and visualize
   - Can draw the tree and follow decision paths
   - Non-technical stakeholders can understand the logic
   - "If-then" rules are intuitive
2. No feature scaling required
   - Works directly with raw data values
   - Unlike SVM or KNN, doesn't need normalization
   - Saves preprocessing time
3. Handles both numerical and categorical data
   - Can split on continuous values (Age > 30)
   - Can split on categories (Color == 'Red')
   - Versatile for mixed datasets
4. Captures non-linear relationships
   - Can model complex decision boundaries
   - Doesn't assume linear separability
5. Shows feature importance
   - Reveals which features matter most
   - Helps with feature selection

Q4.A2 (5 points) Gini Index vs Information Gain. Which does CART use?

💡 Answer

Metric	Formula	Range (binary)	Used By
Gini Index	1 - Σ(pᵢ²)	0 to 0.5	CART
Entropy/Information Gain	-Σ(pᵢ log₂ pᵢ)	0 to 1	ID3, C4.5

CART (Classification and Regression Trees) uses the Gini Index as its splitting criterion for classification trees.

Why Gini?

Faster to compute (no logarithm)
Similar results to entropy in practice
Slightly favors larger partitions

Interpretation:

Gini = 0 → Pure node (all same class)
Gini = 0.5 → Maximum impurity (50/50 split)
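Both impurity measures from the table can be computed directly from class counts. A minimal sketch (the helper names `gini` and `entropy` are illustrative) that confirms the endpoints of the binary ranges: 0 for a pure node, and 0.5 (Gini) or 1.0 (entropy) for a 50/50 node.

```python
import math

def gini(counts):
    """Gini impurity 1 - sum(p_i^2), computed from class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy -sum(p_i log2 p_i), written as sum(p_i log2(1/p_i)); empty classes add 0."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

print(gini([10, 0]), entropy([10, 0]))  # 0.0 0.0  (pure node)
print(gini([5, 5]), entropy([5, 5]))    # 0.5 1.0  (50/50 split)
```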

Part B: Gini Calculation (15 points)

Scenario: Email classification with 20 emails (12 Spam, 8 Not Spam)

Split 1 - "Contains free":

Contains "free": 10 emails (9 Spam, 1 Not Spam)
No "free": 10 emails (3 Spam, 7 Not Spam)

Split 2 - "Contains meeting":

Contains "meeting": 8 emails (2 Spam, 6 Not Spam)
No "meeting": 12 emails (10 Spam, 2 Not Spam)

Q4.B1 (8 points) Calculate Gini Index for Split 1.

💡 Verified Answer

Formula: Gini = 1 - Σ(pᵢ²)

Step 1: Gini for "Contains free" node (10 emails: 9 Spam, 1 Not Spam)

P(Spam) = 9/10 = 0.9
P(Not Spam) = 1/10 = 0.1

Gini = 1 - (0.9² + 0.1²)
     = 1 - (0.81 + 0.01)
     = 1 - 0.82
     = 0.18

Step 2: Gini for "No free" node (10 emails: 3 Spam, 7 Not Spam)

P(Spam) = 3/10 = 0.3
P(Not Spam) = 7/10 = 0.7

Gini = 1 - (0.3² + 0.7²)
     = 1 - (0.09 + 0.49)
     = 1 - 0.58
     = 0.42

Step 3: Weighted Average Gini

Gini(Split 1) = (10/20) × 0.18 + (10/20) × 0.42
              = 0.5 × 0.18 + 0.5 × 0.42
              = 0.09 + 0.21
              = 0.30

Answer: Split 1 Gini = 0.30

Q4.B2 (7 points) Calculate Gini Index for Split 2. Which split is better?

💡 Verified Answer

Step 1: Gini for "Contains meeting" node (8 emails: 2 Spam, 6 Not Spam)

P(Spam) = 2/8 = 0.25
P(Not Spam) = 6/8 = 0.75

Gini = 1 - (0.25² + 0.75²)
     = 1 - (0.0625 + 0.5625)
     = 1 - 0.625
     = 0.375

Step 2: Gini for "No meeting" node (12 emails: 10 Spam, 2 Not Spam)

P(Spam) = 10/12 = 0.833
P(Not Spam) = 2/12 = 0.167

Gini = 1 - (0.833² + 0.167²)
     = 1 - (0.694 + 0.028)
     = 1 - 0.722
     = 0.278

Step 3: Weighted Average Gini

Gini(Split 2) = (8/20) × 0.375 + (12/20) × 0.278
              = 0.4 × 0.375 + 0.6 × 0.278
              = 0.15 + 0.167
              = 0.317

Answer: Split 2 Gini = 0.317

Comparison:

Split	Gini Index
Split 1 ("free")	0.30 ← Better
Split 2 ("meeting")	0.317

Better split: Split 1 ("Contains free")

Reason: Lower Gini = Lower impurity = Better separation of classes
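The hand calculations for both splits can be checked in a few lines. This sketch (helper names are illustrative) computes each split's size-weighted Gini and, for comparison, the Gini gain: the parent's impurity (12 Spam, 8 Not Spam gives 0.48) minus the weighted child impurity. Lower weighted Gini and higher gain point to the same winner.

```python
def gini(counts):
    """Gini impurity 1 - sum(p_i^2) from class counts, e.g. [spam, not_spam]."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def weighted_gini(children):
    """Size-weighted average Gini impurity of the child nodes of a split."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)

parent = [12, 8]                   # 12 Spam, 8 Not Spam
split_free = [[9, 1], [3, 7]]      # contains "free" / no "free"
split_meeting = [[2, 6], [10, 2]]  # contains "meeting" / no "meeting"

for name, split in [("free", split_free), ("meeting", split_meeting)]:
    w = weighted_gini(split)
    # Gini gain = parent impurity - weighted child impurity (higher is better)
    print(f"{name}: weighted Gini = {w:.3f}, gain = {gini(parent) - w:.3f}")
# free: weighted Gini = 0.300, gain = 0.180
# meeting: weighted Gini = 0.317, gain = 0.163
```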

🏁 End of Exam

Question	Topic	Points
Q1	Python Output Analysis	20
Q2	Code Writing (choose 3/5)	30
Q3	Pandas & ML Theory	25
Q4	Decision Tree & Gini	25
Total		100

📝 Key Formulas Reference

Concept	Formula
Gini Index	1 - Σ(pᵢ²)
Entropy	-Σ pᵢ log₂(pᵢ)
Info Gain	H(parent) - Σ weighted H(children)
Bayes	P(A|B) ∝ P(B|A) × P(A)
Z-score	(x - μ) / σ
MinMax	(x - min) / (max - min)

All code verified and tested. Show your work for partial credit. Good luck!

ABW505 Mock Exam 2 - Python & Machine Learning

📋 Exam Information

Item	Details
Total Points	100
Time Allowed	90 minutes
Format	Closed book, calculator allowed
Structure	Q1 (20pts) + Q2 (30pts, choose 3/5) + Q3 (25pts) + Q4 (25pts)

Question 1: Python Output Analysis (20 points)

Answer ALL questions. Determine the exact output.

Q1.1 (5 points)

a = [1, 2, 3]
b = a
b.append(4)
print(a)
print(a is b)

💡 Answer & Explanation

Step-by-step breakdown:

# Step 1: Create a list and assign to variable 'a'
a = [1, 2, 3]
# Memory: a points to list object [1, 2, 3]
 
# Step 2: Assign 'a' to 'b'
b = a
# IMPORTANT: This does NOT copy the list!
# Both 'a' and 'b' now point to the SAME list object in memory
# Memory: a → [1, 2, 3] ← b
 
# Step 3: Modify list through 'b'
b.append(4)
# Since a and b point to the same object, 
# the change is visible through both variables
# Memory: a → [1, 2, 3, 4] ← b
 
# Step 4: Print results
print(a)        # [1, 2, 3, 4] - modified through b
print(a is b)   # True - same object in memory

Answers:

print(a) → [1, 2, 3, 4]
print(a is b) → True

Key concept: In Python, assignment creates a REFERENCE, not a copy.

To create an independent copy:

b = a.copy()     # Method 1: copy() method
b = a[:]         # Method 2: slice notation
b = list(a)      # Method 3: list constructor
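All three methods above make a shallow copy: a new outer list whose elements are still shared. That matters once the list contains other lists. A short sketch using the standard `copy` module:

```python
import copy

a = [[1, 2], [3, 4]]
shallow = a.copy()       # new outer list, but the inner lists are shared
deep = copy.deepcopy(a)  # recursively copies the inner lists too

a[0].append(99)
print(shallow[0])  # [1, 2, 99]: the change shows through the shared inner list
print(deep[0])     # [1, 2]: fully independent
print(shallow is a, shallow[0] is a[0])  # False True
```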

Q1.2 (5 points)

text = "Hello World"
print(text[0:5:2])
print(text[-5:-1])

💡 Answer & Explanation

Step-by-step breakdown:

text = "Hello World"
 
# Index map:
# Character: H   e   l   l   o       W   o   r   l   d
# Positive:  0   1   2   3   4   5   6   7   8   9   10
# Negative:-11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1
 
# Line 1: text[0:5:2]
# Format: [start:stop:step]
# start=0 (H), stop=5 (exclusive), step=2 (every 2nd char)
# Indices: 0, 2, 4 → Characters: 'H', 'l', 'o'
result1 = text[0:5:2]  # "Hlo"
 
# Line 2: text[-5:-1]
# start=-5 (W), stop=-1 (exclusive, before 'd')
# Indices: -5, -4, -3, -2 → Characters: 'W', 'o', 'r', 'l'
result2 = text[-5:-1]  # "Worl"

Answers:

text[0:5:2] → Hlo
text[-5:-1] → Worl

Slicing syntax: [start:stop:step]

start: inclusive (default: 0 for a positive step)
stop: exclusive (default: end of the sequence for a positive step)
step: increment (default: 1)
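A few more slices of the same string illustrate omitted bounds and a negative step, where the walk runs backwards and the defaults flip (start defaults to the last character).

```python
text = "Hello World"

print(text[::-1])    # dlroW olleH: a negative step walks backwards from the end
print(text[::2])     # HloWrd: every 2nd character from index 0
print(text[-5:])     # World: omitting stop runs to the end
print(text[4:0:-1])  # olle: from index 4 down to (but excluding) index 0
```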

Q1.3 (5 points)

def outer():
    x = 10
    def inner():
        nonlocal x
        x += 5
        return x
    return inner()
 
print(outer())
print(outer())

💡 Answer & Explanation

Step-by-step breakdown:

def outer():
    x = 10  # Local variable in outer's scope
    
    def inner():
        nonlocal x  # Refers to x in enclosing (outer) scope
        x += 5      # Modify outer's x: 10 + 5 = 15
        return x    # Return 15
    
    return inner()  # Call inner() and return its result
 
# First call: outer()
# - x starts at 10 in a NEW local scope
# - inner() adds 5: x = 15
# - Returns 15
print(outer())  # 15
 
# Second call: outer()
# - FRESH call creates NEW local scope
# - x starts at 10 again (not preserved from first call)
# - inner() adds 5: x = 15
# - Returns 15
print(outer())  # 15

Answers:

First print(outer()) → 15
Second print(outer()) → 15

Key concepts:

nonlocal lets inner() rebind x in the enclosing (outer) scope, not the global scope
Each call to outer() creates a fresh local scope
Variable x is NOT preserved between calls
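For contrast, the state does persist if the outer function returns the inner function itself instead of calling it. The enclosing scope then stays alive inside the closure. A small sketch (the name `make_counter` is hypothetical):

```python
def make_counter():
    count = 0  # lives on in the closure after make_counter() returns

    def increment():
        nonlocal count
        count += 5
        return count

    return increment  # return the function itself, not increment()

counter = make_counter()
print(counter())  # 5
print(counter())  # 10: same enclosing scope, so the value persists
```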

Q1.4 (5 points)

nums = [1, 2, 3, 4, 5]
result = [x**2 for x in nums if x % 2 == 1]
print(result)
print(sum(result))

💡 Answer & Explanation

Step-by-step breakdown:

nums = [1, 2, 3, 4, 5]
 
# List comprehension with filter
# Pattern: [expression for item in iterable if condition]
result = [x**2 for x in nums if x % 2 == 1]
 
# Step-by-step execution:
# x=1: 1 % 2 == 1? True  → 1**2 = 1   → include
# x=2: 2 % 2 == 1? False → skip
# x=3: 3 % 2 == 1? True  → 3**2 = 9   → include
# x=4: 4 % 2 == 1? False → skip
# x=5: 5 % 2 == 1? True  → 5**2 = 25  → include
 
# Result: [1, 9, 25]
 
print(result)       # [1, 9, 25]
print(sum(result))  # 1 + 9 + 25 = 35

Answers:

print(result) → [1, 9, 25]
print(sum(result)) → 35

Breakdown:

Filter: odd numbers only (1, 3, 5)
Transform: square each (1, 9, 25)
Sum: 1 + 9 + 25 = 35
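The comprehension is shorthand for a filter-then-transform loop. The explicit version below produces the same result and makes each step visible.

```python
nums = [1, 2, 3, 4, 5]

# Loop equivalent of: [x**2 for x in nums if x % 2 == 1]
result = []
for x in nums:
    if x % 2 == 1:           # filter: keep odd numbers
        result.append(x**2)  # transform: square them

print(result, sum(result))  # [1, 9, 25] 35
```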

Question 2: Code Writing (30 points)

Choose 3 out of 5 questions. Each is worth 10 points.

Q2.1 - Count Vowels (10 points)

Write a function count_vowels(text) that:

Counts vowels (a, e, i, o, u) - case insensitive
Returns the count as an integer

💡 Verified Answer

def count_vowels(text):
    """
    Count the number of vowels in a string.
    
    Vowels are: a, e, i, o, u (case insensitive)
    
    Args:
        text: Input string to analyze
        
    Returns:
        int: Number of vowels found
        
    Examples:
        >>> count_vowels("Hello World")
        3
        >>> count_vowels("AEIOU")
        5
    """
    # Define vowels (both cases for easy comparison)
    vowels = "aeiouAEIOU"
    
    # Initialize counter
    count = 0
    
    # Iterate through each character in the text
    for char in text:
        # Check if character is a vowel
        if char in vowels:
            count += 1
    
    return count
 
 
# Alternative: More Pythonic one-liner
def count_vowels_v2(text):
    """One-liner using generator expression and sum."""
    return sum(1 for char in text.lower() if char in 'aeiou')
 
 
# Alternative: Using count method
def count_vowels_v3(text):
    """Using str.count() for each vowel."""
    text_lower = text.lower()
    return sum(text_lower.count(v) for v in 'aeiou')
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        ("Hello World", 3),      # e, o, o
        ("AEIOU", 5),            # all uppercase vowels
        ("rhythm", 0),           # no vowels
        ("", 0),                 # empty string
        ("AaEeIiOoUu", 10),      # mixed case
    ]
    
    print("Testing count_vowels:")
    for text, expected in test_cases:
        result = count_vowels(text)
        status = "✓" if result == expected else "✗"
        print(f'  {status} count_vowels("{text}") = {result} (expected {expected})')

Test Output:

Testing count_vowels:
  ✓ count_vowels("Hello World") = 3 (expected 3)
  ✓ count_vowels("AEIOU") = 5 (expected 5)
  ✓ count_vowels("rhythm") = 0 (expected 0)
  ✓ count_vowels("") = 0 (expected 0)
  ✓ count_vowels("AaEeIiOoUu") = 10 (expected 10)

Key points:

Handle both uppercase and lowercase
Use in operator for membership test
Simple counter pattern

Q2.2 - Word Frequency Dictionary (10 points)

Write a function word_frequency(words) that:

Input: list of words
Return: dictionary with word counts
Example: ['a', 'b', 'a'] → {'a': 2, 'b': 1}

💡 Verified Answer

def word_frequency(words):
    """
    Count frequency of each word in a list.
    
    Args:
        words: List of words (strings)
        
    Returns:
        dict: Dictionary mapping each word to its count
        
    Examples:
        >>> word_frequency(['a', 'b', 'a'])
        {'a': 2, 'b': 1}
    """
    # Initialize empty frequency dictionary
    freq = {}
    
    # Count each word
    for word in words:
        if word in freq:
            # Word seen before - increment count
            freq[word] += 1
        else:
            # First occurrence - initialize count to 1
            freq[word] = 1
    
    return freq
 
 
# Alternative: Using dict.get()
def word_frequency_v2(words):
    """Using get() method to simplify logic."""
    freq = {}
    for word in words:
        # get(key, default) returns default if key doesn't exist
        freq[word] = freq.get(word, 0) + 1
    return freq
 
 
# Alternative: Using collections.Counter
def word_frequency_v3(words):
    """Using Counter from collections (most Pythonic)."""
    from collections import Counter
    return dict(Counter(words))
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        ['a', 'b', 'a'],
        ['hello', 'world', 'hello', 'hello'],
        [],
        ['single'],
    ]
    
    print("Testing word_frequency:")
    for words in test_cases:
        result = word_frequency(words)
        print(f"  {words} → {result}")

Test Output:

Testing word_frequency:
  ['a', 'b', 'a'] → {'a': 2, 'b': 1}
  ['hello', 'world', 'hello', 'hello'] → {'hello': 3, 'world': 1}
  [] → {}
  ['single'] → {'single': 1}

Key techniques:

Check if key exists before incrementing
Alternative: use dict.get(key, default)
Best practice: use collections.Counter
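A middle ground between `dict.get()` and `Counter` is `collections.defaultdict(int)`, which creates missing keys with a starting value of 0. A sketch (the name `word_frequency_v4` is hypothetical):

```python
from collections import defaultdict

def word_frequency_v4(words):
    """Counting with defaultdict(int): missing keys start at 0 automatically."""
    freq = defaultdict(int)
    for word in words:
        freq[word] += 1
    return dict(freq)  # convert back to a plain dict for display

print(word_frequency_v4(['hello', 'world', 'hello', 'hello']))
# {'hello': 3, 'world': 1}
```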

Q2.3 - Multiplication Table (10 points)

Write a function multiplication_table(n) that:

Prints an n×n multiplication table
Format: 1 x 1 = 1

💡 Verified Answer

def multiplication_table(n):
    """
    Print an n×n multiplication table.
    
    Args:
        n: Size of the table (positive integer)
        
    Example output for n=3:
        1 x 1 = 1
        1 x 2 = 2
        1 x 3 = 3
        2 x 1 = 2
        ...
        3 x 3 = 9
    """
    # Validate input
    if n <= 0:
        print("Please provide a positive integer.")
        return
    
    # Outer loop: rows (first multiplier)
    for i in range(1, n + 1):
        
        # Inner loop: columns (second multiplier)
        for j in range(1, n + 1):
            
            # Calculate product
            product = i * j
            
            # Print formatted result using f-string
            print(f"{i} x {j} = {product}")
 
 
# Alternative: Compact table format
def multiplication_table_compact(n):
    """Print table in grid format."""
    for i in range(1, n + 1):
        row = ""
        for j in range(1, n + 1):
            row += f"{i*j:4}"  # 4-character width for alignment
        print(row)
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    print("3x3 Multiplication Table:")
    print("-" * 20)
    multiplication_table(3)
    
    print("\n3x3 Compact Format:")
    print("-" * 20)
    multiplication_table_compact(3)

Test Output:

3x3 Multiplication Table:
--------------------
1 x 1 = 1
1 x 2 = 2
1 x 3 = 3
2 x 1 = 2
2 x 2 = 4
2 x 3 = 6
3 x 1 = 3
3 x 2 = 6
3 x 3 = 9

3x3 Compact Format:
--------------------
   1   2   3
   2   4   6
   3   6   9

Key concepts:

Nested loops for 2D iteration
range(1, n+1) to start from 1
f-strings for formatted output
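This is a minimal variant for cases where you want the table as data rather than printed output, which also makes it easy to check in tests. A list comprehension with two `for` clauses iterates in the same order as the nested loops: the first clause (`i`) acts as the outer loop and the second (`j`) as the inner loop. The name `multiplication_table_lines` is hypothetical.

```python
def multiplication_table_lines(n):
    """Return the n x n table as a list of strings instead of printing it."""
    if n <= 0:
        return []
    # First for-clause = outer loop (rows), second = inner loop (columns)
    return [f"{i} x {j} = {i * j}"
            for i in range(1, n + 1)
            for j in range(1, n + 1)]


if __name__ == "__main__":
    for line in multiplication_table_lines(2):
        print(line)
    # 1 x 1 = 1
    # 1 x 2 = 2
    # 2 x 1 = 2
    # 2 x 2 = 4
```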

Q2.4 - Find Max and Min (10 points)

Write a function find_max_min(numbers) that:

Input: list of numbers
Return: tuple (maximum, minimum, difference)
Handle empty list by returning (None, None, None)

💡 Click to View Verified Answer

def find_max_min(numbers):
    """
    Find maximum, minimum, and their difference in a list.
    
    Args:
        numbers: List of numeric values
        
    Returns:
        tuple: (maximum, minimum, difference) or (None, None, None) if empty
        
    Examples:
        >>> find_max_min([5, 2, 8, 1, 9])
        (9, 1, 8)
        >>> find_max_min([])
        (None, None, None)
    """
    # Handle empty list edge case
    # IMPORTANT: Check this first to avoid errors with min()/max()
    if not numbers:  # Empty list is falsy in Python
        return (None, None, None)
    
    # Find maximum and minimum using built-in functions
    maximum = max(numbers)
    minimum = min(numbers)
    
    # Calculate difference (range of values)
    difference = maximum - minimum
    
    return (maximum, minimum, difference)
 
 
# Alternative: Without using built-in min/max
def find_max_min_manual(numbers):
    """Manual implementation without min()/max()."""
    if not numbers:
        return (None, None, None)
    
    # Initialize with first element
    maximum = numbers[0]
    minimum = numbers[0]
    
    # Iterate through remaining elements
    for num in numbers[1:]:
        if num > maximum:
            maximum = num
        if num < minimum:
            minimum = num
    
    return (maximum, minimum, maximum - minimum)
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        [5, 2, 8, 1, 9],      # Normal case
        [3],                   # Single element
        [],                    # Empty list
        [-5, -2, -8, -1],     # Negative numbers
        [1, 1, 1, 1],         # All same
    ]
    
    print("Testing find_max_min:")
    for nums in test_cases:
        result = find_max_min(nums)
        print(f"  {nums} → max={result[0]}, min={result[1]}, diff={result[2]}")

Test Output:

Testing find_max_min:
  [5, 2, 8, 1, 9] → max=9, min=1, diff=8
  [3] → max=3, min=3, diff=0
  [] → max=None, min=None, diff=None
  [-5, -2, -8, -1] → max=-1, min=-8, diff=7
  [1, 1, 1, 1] → max=1, min=1, diff=0

Key points:

ALWAYS handle empty list first
Use built-in min() and max() for efficiency
Return a tuple, not a list
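The empty-list check matters because `max()` and `min()` raise `ValueError` on an empty sequence. Since Python 3.4, both functions also accept a `default=` keyword, which gives a compact alternative. This is a sketch, and `find_max_min_default` is a hypothetical name.

```python
def find_max_min_default(numbers):
    """Variant that relies on the default= argument of max()/min()."""
    # default= is returned instead of raising ValueError when numbers is empty
    maximum = max(numbers, default=None)
    minimum = min(numbers, default=None)
    if maximum is None:
        return (None, None, None)
    return (maximum, minimum, maximum - minimum)


if __name__ == "__main__":
    try:
        max([])
    except ValueError as exc:
        print("max([]) raised ValueError:", exc)
    print(find_max_min_default([5, 2, 8, 1, 9]))  # (9, 1, 8)
    print(find_max_min_default([]))               # (None, None, None)
```

Calling `max()` and then `min()` on the same argument assumes a list or another re-iterable sequence. A one-shot iterator would already be exhausted by the time `min()` runs.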

Q2.5 - Factorial (Recursive) (10 points)

Write a function factorial(n) that:

Calculates n! recursively
Handle: 0! = 1, negative returns None

💡 Click to View Verified Answer

def factorial(n):
    """
    Calculate factorial of n using recursion.
    
    Factorial definition:
    - n! = n × (n-1) × (n-2) × ... × 2 × 1
    - 0! = 1 (by definition)
    - Negative numbers: undefined (return None)
    
    Args:
        n: Non-negative integer
        
    Returns:
        int: n! or None for negative input
        
    Examples:
        >>> factorial(5)
        120
        >>> factorial(0)
        1
    """
    # Handle negative input
    if n < 0:
        return None
    
    # Base case: 0! = 1 and 1! = 1
    if n == 0 or n == 1:
        return 1
    
    # Recursive case: n! = n × (n-1)!
    return n * factorial(n - 1)
 
 
# Trace for factorial(4):
# factorial(4) = 4 × factorial(3)
#              = 4 × (3 × factorial(2))
#              = 4 × (3 × (2 × factorial(1)))
#              = 4 × (3 × (2 × 1))
#              = 4 × (3 × 2)
#              = 4 × 6
#              = 24
 
 
# Alternative: Iterative version (no recursion)
def factorial_iterative(n):
    """Calculate factorial using iteration."""
    if n < 0:
        return None
    
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
 
 
# ===== Test Cases =====
if __name__ == "__main__":
    test_cases = [
        (0, 1),      # 0! = 1
        (1, 1),      # 1! = 1
        (5, 120),    # 5! = 120
        (10, 3628800),
        (-5, None),  # Negative
    ]
    
    print("Testing factorial:")
    for n, expected in test_cases:
        result = factorial(n)
        status = "✓" if result == expected else "✗"
        print(f"  {status} factorial({n}) = {result} (expected {expected})")

Test Output:

Testing factorial:
  ✓ factorial(0) = 1 (expected 1)
  ✓ factorial(1) = 1 (expected 1)
  ✓ factorial(5) = 120 (expected 120)
  ✓ factorial(10) = 3628800 (expected 3628800)
  ✓ factorial(-5) = None (expected None)

Recursion components:

Base case: stops recursion (n=0 or n=1)
Recursive case: breaks problem into smaller subproblem
Progress: n decreases each call, eventually reaching base case
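The recursive and iterative versions can be cross-checked against `math.factorial`. This sketch also shows the practical cost of recursion: each call adds a stack frame, and CPython raises `RecursionError` once the depth exceeds `sys.getrecursionlimit()`, which is 1000 by default. Both functions are repeated from the answers above so the block runs on its own.

```python
import math
import sys


def factorial(n):
    """Recursive factorial (same logic as the answer above)."""
    if n < 0:
        return None
    if n == 0 or n == 1:
        return 1
    return n * factorial(n - 1)


def factorial_iterative(n):
    """Iterative factorial: the call stack does not grow with n."""
    if n < 0:
        return None
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


if __name__ == "__main__":
    # Cross-check both versions against the standard library for small n
    for k in range(20):
        assert factorial(k) == factorial_iterative(k) == math.factorial(k)

    # Recursion depth grows with n; past the interpreter's limit it fails
    deep = sys.getrecursionlimit() + 100
    try:
        factorial(deep)
    except RecursionError:
        print(f"factorial({deep}) raised RecursionError")

    # The iterative version has no such depth limit
    print(factorial_iterative(deep) == math.factorial(deep))  # True
```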

Question 3: Pandas & SVM/Random Forest (25 points)

Part A: Data Preprocessing (15 points)

Given this DataFrame:

import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
 
data = {
    'Age': [25, 30, None, 35, 40],
    'Income': [30000, 50000, 45000, None, 60000],
    'Education': ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor'],
    'Purchased': ['No', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(data)

Q3.A1 (5 points) Fill missing Age with median, missing Income with mean.

💡 Click to View Verified Answer

import pandas as pd
 
# Create the DataFrame
data = {
    'Age': [25, 30, None, 35, 40],
    'Income': [30000, 50000, 45000, None, 60000],
    'Education': ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor'],
    'Purchased': ['No', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(data)
 
print("Original DataFrame:")
print(df)
print()
 
# Calculate statistics before filling
age_median = df['Age'].median()    # Median of [25, 30, 35, 40] = 32.5
income_mean = df['Income'].mean()  # Mean of [30000, 50000, 45000, 60000] = 46250
 
print(f"Age median (excluding NaN): {age_median}")
print(f"Income mean (excluding NaN): {income_mean}")
print()
 
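# Fill missing values
# Method 1 (recommended): assign the filled column back to the DataFrame
df['Age'] = df['Age'].fillna(age_median)
df['Income'] = df['Income'].fillna(income_mean)
 
# Method 2 (avoid): calling fillna(..., inplace=True) on a selected column, e.g.
# df['Age'].fillna(age_median, inplace=True)
# This is chained assignment: recent pandas versions emit a FutureWarning,
# and under Copy-on-Write (the default in pandas 3.0) df is not modified at all.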
 
print("After filling missing values:")
print(df)

Calculations:

Age values (excluding NaN): [25, 30, 35, 40]
Age median: (30 + 35) / 2 = 32.5
Income values (excluding NaN): [30000, 50000, 45000, 60000]
Income mean: (30000 + 50000 + 45000 + 60000) / 4 = 46250

Result:

Row 2: Age filled with 32.5
Row 3: Income filled with 46250.0
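The hand calculation above can be double-checked with the standard-library `statistics` module, independently of pandas. This is a small sketch; the two lists are the non-missing values from the DataFrame.

```python
import statistics

age_observed = [25, 30, 35, 40]                 # Age column without the missing value
income_observed = [30000, 50000, 45000, 60000]  # Income column without the missing value

# Even-length list: the median is the mean of the two middle values after sorting
age_median = statistics.median(age_observed)
# Arithmetic mean = sum / count
income_mean = statistics.mean(income_observed)

print(age_median)   # 32.5
print(income_mean)  # 46250
```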

Q3.A2 (5 points) Encode 'Education' using LabelEncoder. Show the mapping.

💡 Click to View Verified Answer

from sklearn.preprocessing import LabelEncoder
 
# Create LabelEncoder instance
le = LabelEncoder()
 
# Fit and transform the Education column
df['Education_Encoded'] = le.fit_transform(df['Education'])
 
print("Encoding result:")
print(df[['Education', 'Education_Encoded']])
print()
 
# Show the mapping (classes are sorted alphabetically)
print("LabelEncoder mapping:")
for i, label in enumerate(le.classes_):
    print(f"  '{label}' → {i}")

LabelEncoder sorts alphabetically then assigns 0, 1, 2, ...:

Original	Encoded
Bachelor	0
High School	1
Master	2
PhD	3

Encoded column: [1, 0, 2, 3, 0]

Important: LabelEncoder assigns integers based on alphabetical order, not order of appearance!
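The mapping can be reproduced in plain Python by sorting the unique labels, which mirrors how `LabelEncoder` builds `classes_`. The sketch also exposes one caveat: `sorted()` compares strings by Unicode code point, so uppercase letters sort before lowercase ones. The order is only truly "alphabetical" when capitalization is consistent.

```python
education = ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor']

# Sorted unique labels, analogous to LabelEncoder's classes_
classes = sorted(set(education))
# Each label's position in the sorted list becomes its integer code
mapping = {label: code for code, label in enumerate(classes)}
encoded = [mapping[label] for label in education]

print(classes)  # ['Bachelor', 'High School', 'Master', 'PhD']
print(encoded)  # [1, 0, 2, 3, 0]
```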

Q3.A3 (5 points) When should you use StandardScaler vs MinMaxScaler?

💡 Click to View Answer

Scaler	Formula	Output	Best For
StandardScaler	(x - mean) / std	Mean=0, Std=1	SVM, Logistic Regression, data with some outliers (less distorted than with MinMaxScaler)
MinMaxScaler	(x - min) / (max - min)	[0, 1] by default	Neural Networks, KNN, image data

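Use StandardScaler when:

Data is approximately normally distributed
The data contains some outliers (both scalers are affected, but one extreme value distorts MinMaxScaler more, because it squeezes all other values into a narrow part of [0, 1])
Using algorithms like SVM or linear/logistic regression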

Use MinMaxScaler when:

You need bounded output (0 to 1)
Working with neural networks or image data
Outliers are not a concern

Quick rule:

SVM, Linear models → StandardScaler
Neural networks, KNN → MinMaxScaler
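Both formulas from the table can be written out by hand as a sketch; the helper names are hypothetical. The standard-scaling helper uses the population standard deviation (ddof=0), which is what scikit-learn's `StandardScaler` uses. The input is the Age column after median imputation in Q3.A1.

```python
import statistics


def standard_scale(values):
    """z = (x - mean) / std, using the population std (ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]


def min_max_scale(values):
    """x' = (x - min) / (max - min), mapping the data onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


ages = [25, 30, 32.5, 35, 40]  # Age after median imputation (Q3.A1)

print(standard_scale(ages))                        # [-1.5, -0.5, 0.0, 0.5, 1.5]
print([round(v, 3) for v in min_max_scale(ages)])  # [0.0, 0.333, 0.5, 0.667, 1.0]
```

Here mean = 32.5 and std = 5, so each standardized value measures how many standard deviations a point lies from the mean.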

Part B: SVM & Random Forest Theory (10 points)

Q3.B1 (5 points) Explain the "kernel trick" in SVM.

💡 Click to View Answer

Kernel Trick Explanation:

Problem: Some data is not linearly separable in its original space.

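Solution: The kernel trick lets the SVM find a linear boundary in a higher-dimensional feature space, where the data can become linearly separable, without ever constructing that space explicitly.

How it works:

Original 2D data might have a circular boundary (no straight line separates the classes)
A feature map φ lifts each point into 3D, e.g. (x, y) → (x, y, x² + y²)
In 3D, the flat plane z = r² separates the classes; projected back to 2D, it is the circle x² + y² = r²
The "trick": the SVM only needs inner products between points, and a kernel function K(a, b) = φ(a) · φ(b) computes them directly from the original features, so φ is never computed explicitly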

Common kernels:

Kernel	Use Case
Linear	Already linearly separable
RBF (Radial Basis Function)	Default choice, works well for most cases
Polynomial	Data with polynomial relationships

Example in code:

from sklearn.svm import SVC
 
# Linear kernel
model_linear = SVC(kernel='linear')
 
# RBF kernel (default)
model_rbf = SVC(kernel='rbf')
 
# Polynomial kernel
model_poly = SVC(kernel='poly', degree=3)
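Here is a small numerical check of the "trick". It assumes the degree-2 homogeneous polynomial kernel K(a, b) = (a · b)² on 2D inputs. Mapping both points through the explicit 3D feature map φ and taking a dot product gives the same number as squaring the 2D dot product. scikit-learn's `'poly'` kernel has the more general form (γ·a·b + r)^d; this example corresponds to γ = 1, r = 0, d = 2. The helper names are illustrative.

```python
import math


def phi(v):
    """Explicit degree-2 feature map for 2D input: (x1², √2·x1·x2, x2²)."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    """Plain inner product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))


def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: K(a, b) = (a · b)²."""
    return dot(a, b) ** 2


a, b = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(a), phi(b))  # map to 3D first, then take the inner product
via_kernel = poly_kernel(a, b)  # stay in 2D: one dot product, then square
print(explicit, via_kernel)     # both ≈ 121 (tiny float rounding in the explicit one)
```

The kernel route never builds the 3D vectors. For RBF kernels the implicit feature space is infinite-dimensional, so the explicit route is not even possible there.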

Q3.B2 (5 points) What is "bagging" in Random Forest? Why does it help?

💡 Click to View Answer

Bagging (Bootstrap Aggregating):

Process:

Draw multiple bootstrap samples from the training data (each the same size as the original set, sampled with replacement)
Train a separate decision tree on each subset
Combine predictions:

Classification: majority voting
Regression: average

Why it helps:

Reduces Overfitting

Each tree sees different data
Individual trees' errors partially cancel out when their predictions are combined
Ensemble is more robust

Reduces Variance

Averaging many predictions is more stable
Less sensitive to noise in training data

Handles Outliers Better

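Each bootstrap sample contains a given training point with probability of about 63% (1 - 1/e for large datasets), so an outlier is absent from roughly a third of the trees
Voting or averaging dilutes its influence in the ensemble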

Better Generalization

An ensemble of many trees usually outperforms a single tree
Works well on unseen data

Analogy: Like asking 100 doctors for diagnosis instead of 1 - the collective opinion is usually more reliable.
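The process above can be sketched in plain Python. The helper names are hypothetical, and the integer list stands in for training rows. The sketch draws one bootstrap sample, lists the rows left out of it, shows both combination rules, and estimates how much of the data a typical bootstrap sample covers.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random *with replacement*."""
    return [rng.choice(data) for _ in range(len(data))]


def majority_vote(predictions):
    """Classification: the most common predicted label wins."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Regression: the mean of the predicted values."""
    return sum(predictions) / len(predictions)


rng = random.Random(0)   # fixed seed so the run is repeatable
data = list(range(10))   # stand-in for 10 training rows
sample = bootstrap_sample(data, rng)
print(sample)                           # some rows appear more than once
print(sorted(set(data) - set(sample)))  # rows left out ("out-of-bag")

print(majority_vote(['Spam', 'Not Spam', 'Spam']))  # Spam
print(average([2.0, 3.0, 4.0]))                     # 3.0

# Average fraction of distinct rows per bootstrap sample;
# for large n this approaches 1 - 1/e ≈ 0.632
n, trials = 1000, 200
coverage = sum(len(set(bootstrap_sample(list(range(n)), rng))) / n
               for _ in range(trials)) / trials
print(round(coverage, 3))
```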

Question 4: Naive Bayes & Decision Tree (25 points)

Part A: Naive Bayes Calculation (15 points)

Dataset: Email classification

Email	Contains "Free"	Contains "Winner"	Spam?
1	Yes	Yes	Spam
2	Yes	No	Spam
3	No	Yes	Spam
4	No	No	Not Spam
5	Yes	No	Not Spam
6	No	No	Not Spam

Q4.A1 (10 points) A new email contains "Free" but not "Winner". Calculate P(Spam|Free=Yes, Winner=No).

💡 Click to View Verified Answer

Naive Bayes Formula: $P(Class|Features) \propto P(Class) \times \prod P(Feature|Class)$

Step 1: Calculate Prior Probabilities

Class	Count	P(Class)
Spam	3 (emails 1,2,3)	3/6 = 0.5
Not Spam	3 (emails 4,5,6)	3/6 = 0.5

Step 2: Calculate Likelihoods

For Spam emails (1, 2, 3):

P(Free=Yes | Spam) = 2/3 (emails 1, 2 have Free)
P(Winner=No | Spam) = 1/3 (only email 2 has Winner=No)

For Not Spam emails (4, 5, 6):

P(Free=Yes | Not Spam) = 1/3 (only email 5)
P(Winner=No | Not Spam) = 3/3 = 1 (all three)

Step 3: Calculate Unnormalized Posteriors

$P(Spam|evidence) \propto P(Spam) \times P(Free=Yes|Spam) \times P(Winner=No|Spam)$ $= 0.5 \times \frac{2}{3} \times \frac{1}{3} = \frac{1}{9} \approx 0.111$

$P(NotSpam|evidence) \propto P(NotSpam) \times P(Free=Yes|NotSpam) \times P(Winner=No|NotSpam)$ $= 0.5 \times \frac{1}{3} \times 1 = \frac{1}{6} \approx 0.167$

Step 4: Normalize

$P(Spam|evidence) = \frac{1/9}{1/9 + 1/6} = \frac{2/18}{5/18} = \frac{2}{5} = 0.40$

Answer: P(Spam | Free=Yes, Winner=No) = 0.40 = 40%

Prediction: NOT SPAM (probability < 50%)
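The same calculation can be done by counting rows, with `fractions.Fraction` keeping the results exact. This is a sketch: the tuple encoding of the table is an assumption of this example, and, like the hand calculation, it applies no Laplace smoothing.

```python
from fractions import Fraction

# (contains_free, contains_winner, label) for emails 1-6 in the table above
emails = [
    (True, True, 'Spam'),
    (True, False, 'Spam'),
    (False, True, 'Spam'),
    (False, False, 'Not Spam'),
    (True, False, 'Not Spam'),
    (False, False, 'Not Spam'),
]


def unnormalized_posterior(label, free, winner):
    """P(label) * P(Free=free | label) * P(Winner=winner | label), by counting."""
    rows = [e for e in emails if e[2] == label]
    prior = Fraction(len(rows), len(emails))
    p_free = Fraction(sum(e[0] == free for e in rows), len(rows))
    p_winner = Fraction(sum(e[1] == winner for e in rows), len(rows))
    return prior * p_free * p_winner


scores = {label: unnormalized_posterior(label, free=True, winner=False)
          for label in ('Spam', 'Not Spam')}
total = sum(scores.values())
posterior = {label: s / total for label, s in scores.items()}

print(scores)     # {'Spam': Fraction(1, 9), 'Not Spam': Fraction(1, 6)}
print(posterior)  # {'Spam': Fraction(2, 5), 'Not Spam': Fraction(3, 5)}
```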

Q4.A2 (5 points) What is the "naive" assumption in Naive Bayes? When might it fail?

💡 Click to View Answer

The "Naive" Assumption:

Features are conditionally independent given the class
P(A, B | Class) = P(A | Class) × P(B | Class)
Each feature contributes independently to the prediction

When it fails:

Correlated features

Example: "Free" and "Prize" often appear together in spam
Treating them as independent overcounts their combined effect

Redundant features

Example: Having both "temperature in °C" and "temperature in °F"
These are perfectly correlated, violating independence

Feature interactions matter

Example: Medical diagnosis where symptom combinations are important
Symptom A alone is harmless, but A+B together indicates disease

Despite this limitation: Naive Bayes often works surprisingly well in practice, especially for:

Text classification
Spam detection
Sentiment analysis
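The "redundant features" failure can be made concrete with the Q4.A1 numbers. Suppose, hypothetically, that the dataset had a second column that is an exact copy of "Contains Free". Naive Bayes would treat the copy as independent evidence and multiply P(Free=Yes | class) in twice, which alone is enough to flip the prediction. The function name `posterior_spam` is illustrative.

```python
from fractions import Fraction

# Per-class probabilities from Q4.A1 (counted from the email table)
prior = {'Spam': Fraction(1, 2), 'Not Spam': Fraction(1, 2)}
p_free_yes = {'Spam': Fraction(2, 3), 'Not Spam': Fraction(1, 3)}
p_winner_no = {'Spam': Fraction(1, 3), 'Not Spam': Fraction(1)}


def posterior_spam(free_copies):
    """P(Spam | Free=Yes, Winner=No) when the 'Free' feature is (hypothetically)
    included free_copies times, each copy treated as independent."""
    scores = {c: prior[c] * p_free_yes[c] ** free_copies * p_winner_no[c]
              for c in prior}
    return scores['Spam'] / sum(scores.values())


print(posterior_spam(1))  # 2/5 -> Not Spam
print(posterior_spam(2))  # 4/7 -> Spam: the duplicate flips the prediction
```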

Part B: Information Gain (10 points)

Q4.B1 (10 points) Calculate Information Gain for the "Contains Free" feature.

Original dataset: 3 Spam, 3 Not Spam

💡 Click to View Verified Answer

Entropy Formula: $H(S) = -\sum p_i \log_2(p_i)$

Step 1: Parent Entropy (3 Spam, 3 Not Spam)

$H(parent) = -0.5 \log_2(0.5) - 0.5 \log_2(0.5)$ $= -0.5 \times (-1) - 0.5 \times (-1)$ $= 0.5 + 0.5 = 1.0$

(Maximum entropy for binary classification = 1.0)

Step 2: Split by "Contains Free"

Free=Yes (3 emails: 2 Spam, 1 Not Spam): $H = -\frac{2}{3} \log_2(\frac{2}{3}) - \frac{1}{3} \log_2(\frac{1}{3})$ $= -0.667 \times (-0.585) - 0.333 \times (-1.585)$ $= 0.390 + 0.528 = 0.918$

Free=No (3 emails: 1 Spam, 2 Not Spam): $H = -\frac{1}{3} \log_2(\frac{1}{3}) - \frac{2}{3} \log_2(\frac{2}{3})$ $= 0.528 + 0.390 = 0.918$

Step 3: Weighted Average Entropy $H(children) = \frac{3}{6} \times 0.918 + \frac{3}{6} \times 0.918 = 0.918$

Step 4: Information Gain $IG = H(parent) - H(children) = 1.0 - 0.918 = 0.082$

Answer: Information Gain = 0.082 bits

Interpretation: "Contains Free" provides a small amount of information for classification. Higher IG would indicate a better split.
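Steps 1 to 4 can be reproduced with a short sketch; the function names are illustrative. Class counts are given as [Spam, Not Spam] lists. Classes with a count of zero are skipped because p·log₂(p) → 0 as p → 0.

```python
import math


def entropy(counts):
    """H = -sum(p * log2(p)) over class counts; zero counts contribute 0."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, child_counts):
    """H(parent) minus the size-weighted average entropy of the children."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in child_counts)
    return entropy(parent_counts) - weighted


# [Spam, Not Spam] counts
parent = [3, 3]
free_yes = [2, 1]  # emails 1, 2, 5
free_no = [1, 2]   # emails 3, 4, 6

print(round(entropy(parent), 3))                                # 1.0
print(round(entropy(free_yes), 3))                              # 0.918
print(round(information_gain(parent, [free_yes, free_no]), 3))  # 0.082
```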

🏁 End of Exam

Question	Topic	Points
Q1	Python Output Analysis	20
Q2	Code Writing (choose 3/5)	30
Q3	Pandas & SVM/Random Forest	25
Q4	Naive Bayes & Decision Tree	25
Total		100

📝 Key Formulas Reference

Concept	Formula
Gini Index	1 - Σ(pᵢ²)
Entropy	-Σ pᵢ log₂(pᵢ)
Info Gain	H(parent) - Σ weighted H(children)
Bayes	P(A|B) ∝ P(B|A) × P(A)
Z-score	(x - μ) / σ
MinMax	(x - min) / (max - min)

All code verified and tested. Show your work for partial credit. Good luck!