Introduction
This is the continuation of our step-by-step guide on building your first machine learning model in Python. If you haven’t read Part 1: Data Preparation & Part 2: Model Training yet, I recommend checking them out first.
In this part, we’ll evaluate our trained model and see how well it performs. We’ll also use it to make predictions on new data.
We’ll cover:
- Evaluating the model’s performance using key metrics
- Understanding accuracy, precision, recall, and confusion matrix
- Making predictions on new data
- Recap & next steps
By the end of this part, you’ll know how to assess and use your trained model for real-world predictions.
6. Step 6: Evaluating the Model
Now that our Decision Tree model is trained, we need to evaluate its performance on unseen data. This helps us understand how well the model generalizes beyond the training set.
6.1 Why Do We Need to Evaluate?
- Avoid Overfitting – A model that performs perfectly on training data but fails on test data has memorized the examples instead of learning general patterns (see the quick check after this list).
- Measure Accuracy – We need to quantify how well the model classifies new samples.
- Identify Weaknesses – Some species may be harder to classify than others.
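A quick way to spot overfitting is to compare training and test accuracy side by side. Here's a minimal sketch, assuming the model, X_train, y_train, X_test, and y_test variables from Parts 1 and 2 are still in scope:
from sklearn.metrics import accuracy_score

# Compare accuracy on data the model has seen vs. data it hasn't
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"Train accuracy: {train_acc:.2f} | Test accuracy: {test_acc:.2f}")
# A large gap (e.g., 1.00 train vs. 0.80 test) is a classic sign of overfitting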
6.2 Making Predictions on the Test Set
Now that the model has learned from the training set, let’s test it on unseen samples (X_test).
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Predict species for test set
y_pred = model.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on test set: {accuracy*100:.2f}%")
What This Does:
- Predicts the species for all test samples.
- Compares predictions (y_pred) with actual labels (y_test).
- Computes accuracy, showing the percentage of correctly classified flowers.
6.3 Understanding Model Accuracy
Example Output (May Vary Slightly):
Accuracy on test set: 96.67%
What This Means:
- If we get 96.67% accuracy, it means the model correctly classified 29 out of 30 test samples.
- Since the Iris dataset is simple, high accuracy is expected.
If accuracy is low: The model might be overfitting or struggling with overlapping species (Versicolor vs. Virginica).
6.4 Detailed Evaluation: Confusion Matrix & Classification Report
Instead of just accuracy, let’s analyze the model’s mistakes using a confusion matrix and classification report.
# Confusion Matrix
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

# Detailed Performance Report
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=iris.target_names))
Example Confusion Matrix Output:
Confusion Matrix:
[[10 0 0]
[ 0 9 1]
[ 0 0 10]]
For comparison, a perfect confusion matrix (every sample classified correctly) would look like this:
Confusion Matrix:
[[10 0 0]
[ 0 10 0]
[ 0 0 10]]
How to Read This?
- First row: All 10 Setosa samples were classified correctly. ✅
- Second row: 1 Versicolor sample was misclassified as Virginica. ❌
- Third row: All 10 Virginica samples were classified correctly. ✅
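If you prefer a visual over raw numbers, recent scikit-learn versions (1.0+) can render the matrix as a heatmap. A minimal sketch, reusing y_test and y_pred from above:
from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Plot the confusion matrix as a heatmap with species names on the axes
ConfusionMatrixDisplay.from_predictions(y_test, y_pred, display_labels=iris.target_names)
plt.show()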
Example Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      0.90      0.95        10
   virginica       0.91      1.00      0.95        10

    accuracy                           0.97        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30
Key Terms in the Report:
- Precision – Of all flowers predicted as a given species, how many actually were that species?
- Recall – Of all actual flowers of a species, how many did the model find?
- F1-score – The harmonic mean of precision and recall, balancing the two.
- Support – The number of test samples per species.
How to Read This?
- Setosa (Precision = 1.00, Recall = 1.00) – The model classifies Setosa perfectly because it's well separated from the other species.
- Versicolor (Precision = 1.00, Recall = 0.90) – One actual Versicolor sample was missed (predicted as Virginica), which lowers its recall.
- Virginica (Precision = 0.91, Recall = 1.00) – Every Virginica sample was found, but one flower predicted as Virginica was actually Versicolor, which lowers its precision.
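These numbers come straight from the confusion matrix, and it's worth verifying one by hand. A small sketch, using the example matrix from above:
import numpy as np

cm = np.array([[10, 0, 0],
               [0, 9, 1],
               [0, 0, 10]])  # rows = actual species, columns = predicted species

# Recall for versicolor (row 1): correct predictions / actual samples
recall_versicolor = cm[1, 1] / cm[1, :].sum()    # 9 / 10 = 0.90
# Precision for virginica (column 2): correct predictions / predicted samples
precision_virginica = cm[2, 2] / cm[:, 2].sum()  # 10 / 11 ≈ 0.91
print(f"Versicolor recall: {recall_versicolor:.2f}, Virginica precision: {precision_virginica:.2f}")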
6.5 Final Outcome of This Step
✅ We tested the model on unseen data.
✅ We measured accuracy, which is likely high (~95-100%).
✅ We analyzed misclassifications to understand weaknesses.
Our model performs well on the Iris dataset, but we saw small errors in Versicolor vs. Virginica classification.
7. Step 7: Using the Model for Predictions
Now that we’ve trained and evaluated our model, it’s time to use it to make predictions on new data!
Let’s say we find a new iris flower in the wild, and we measure its sepal length, sepal width, petal length, and petal width. We can feed these measurements into our trained model to predict the species.
7.1 Predicting a Single New Flower
Let’s say we have a new iris flower with the following measurements:
- 🌱 Sepal Length = 5.0 cm
- 🌱 Sepal Width = 2.9 cm
- 🌱 Petal Length = 1.5 cm
- 🌱 Petal Width = 0.2 cm
We'll use the .predict() method to classify this flower.
import numpy as np

# Define a new flower sample (must be in a 2D array)
new_flower = np.array([[5.0, 2.9, 1.5, 0.2]])

# Predict species
pred_class = model.predict(new_flower)
pred_species = iris.target_names[pred_class][0]

print("Predicted species for flower with features", new_flower.tolist(), "->", pred_species)
Example Output:
Predicted species for flower with features [[5.0, 2.9, 1.5, 0.2]] -> setosa
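One caveat: if you trained the model on a pandas DataFrame in Part 2, scikit-learn may emit a warning that the NumPy input "does not have valid feature names". The prediction still works, but if you want to avoid the warning, one option (assuming the training columns match iris.feature_names) is to wrap the sample in a DataFrame:
import pandas as pd

# Same sample, but with column names matching the training data
new_flower_df = pd.DataFrame([[5.0, 2.9, 1.5, 0.2]], columns=iris.feature_names)
print(iris.target_names[model.predict(new_flower_df)][0])  # -> setosa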
7.2 Predicting Multiple New Flowers at Once
What if we have multiple flowers and want to predict their species in one step? We can pass multiple flower measurements at once.
# Define multiple new flower samples
new_flowers = np.array([
    [5.0, 2.9, 1.5, 0.2],  # Likely Setosa
    [6.1, 2.8, 4.7, 1.2],  # Likely Versicolor
    [7.2, 3.0, 6.0, 2.5]   # Likely Virginica
])

# Predict species
pred_classes = model.predict(new_flowers)
pred_species_list = [iris.target_names[p] for p in pred_classes]

# Print results
for i, species in enumerate(pred_species_list):
    print(f"Flower {i+1}: {new_flowers[i].tolist()} -> {species}")
Example Output:
Flower 1: [5.0, 2.9, 1.5, 0.2] -> setosa
Flower 2: [6.1, 2.8, 4.7, 1.2] -> versicolor
Flower 3: [7.2, 3.0, 6.0, 2.5] -> virginica
What Happened Here?
- We created multiple flower samples in a 2D array.
- The model predicted the species for each sample.
- We mapped the numeric class predictions back to their actual species names using iris.target_names.
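If you also want a rough confidence score for each prediction, scikit-learn classifiers expose a predict_proba() method. For a single decision tree the probabilities are often exactly 0 or 1, since each sample lands in one leaf. A minimal sketch, reusing new_flowers from above:
# Class probabilities for each flower (columns follow iris.target_names order)
probs = model.predict_proba(new_flowers)
for sample, p in zip(new_flowers, probs):
    print(sample.tolist(), dict(zip(iris.target_names, p.round(2))))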
7.3 Visualizing Predictions with a Decision Boundary
To visualize the decision boundary correctly, we need to retrain the model using only petal length and petal width, so that it expects two input features instead of four.
Modify your training step to use only the two relevant features before plotting the decision boundary:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.tree import DecisionTreeClassifier

# Select petal length & petal width for visualization
X_vis = df[['petal length (cm)', 'petal width (cm)']].values
y_vis = df['species'].map({'setosa': 0, 'versicolor': 1, 'virginica': 2}).values

# Train a new Decision Tree model with only these two features
model_vis = DecisionTreeClassifier(random_state=42)  # New model for visualization
model_vis.fit(X_vis, y_vis)  # Train only on petal length & petal width

# Create mesh grid
x_min, x_max = X_vis[:, 0].min() - 0.5, X_vis[:, 0].max() + 0.5
y_min, y_max = X_vis[:, 1].min() - 0.5, X_vis[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                     np.linspace(y_min, y_max, 100))

# Predict for each point in mesh using the new model
Z = model_vis.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Define custom color maps for decision regions and scatter points
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])  # Light red, green, blue
cmap_bold = {0: 'red', 1: 'green', 2: 'blue'}  # Bold colors for scatter points

# Plot decision boundary
plt.contourf(xx, yy, Z, cmap=cmap_light, alpha=0.5)

# Scatter plot of actual data points, labeled with species names
for species, color in cmap_bold.items():
    subset = X_vis[y_vis == species]
    plt.scatter(subset[:, 0], subset[:, 1], c=color, edgecolor='k',
                label=iris.target_names[species].capitalize())

plt.xlabel('Petal Length (cm)')
plt.ylabel('Petal Width (cm)')
plt.title('Decision Boundary of the Trained Model')
plt.legend()
plt.show()
What This Shows:
- Each color region represents a different species.
- The model draws boundaries based on learned rules.
- We can see how well-separated each species is in petal measurements.
7.4 Final Outcome of This Step
✅ We used the model to predict new flower species.
✅ We tested single and multiple predictions successfully.
✅ We visualized the model’s decision-making boundaries.
Our model is now fully functional and ready to classify any new iris flowers!
8. Recap & Next Steps
Congratulations! 🎉 You’ve successfully built and trained your first machine learning model from scratch using Python and scikit-learn. Let’s take a moment to recap everything we’ve covered and explore what’s next.
8.1 What We Did in This Project
📌 Step 1: Selected & Understood the Dataset
- We used the Iris dataset, which contains measurements of three flower species: Setosa, Versicolor, and Virginica.
- We visualized how petal length & width help distinguish between species.
📌 Step 2: Loaded the Data
- Used scikit-learn to load the Iris dataset and converted it into a pandas DataFrame for easier analysis.
📌 Step 3: Exploratory Data Analysis (EDA)
- Checked dataset structure and summary statistics.
- Created scatter plots to see how different species group together based on their measurements.
📌 Step 4: Prepared the Data (Train/Test Split)
- Split the dataset into training (80%) and test (20%) sets.
- Ensured an equal distribution of species in both sets using stratified sampling.
📌 Step 5: Chose & Trained a Model
- Selected a Decision Tree Classifier because it’s easy to understand and works well for small datasets.
- Trained the model using only the training data.
📌 Step 6: Evaluated the Model
- Measured accuracy on the test set (likely 96-100% accuracy).
- Used a confusion matrix & classification report to check for misclassifications.
📌 Step 7: Used the Model for Predictions
- Tested the model on new flower samples.
- Created a decision boundary visualization to see how the model classifies different regions.
8.2 What’s Next?
Now that you’ve built your first ML model, here are some ways to take this further:
✅ Try a Different Model – Experiment with k-Nearest Neighbors (KNN), Logistic Regression, or Random Forest to see if they perform better.
✅ Fine-tune the Decision Tree – Adjust hyperparameters like max_depth, min_samples_split, or criterion to optimize performance.
✅ Feature Scaling & Engineering – Some models (like Logistic Regression or SVM) require normalizing features before training.
✅ Cross-Validation – Instead of relying on a single train-test split, use K-Fold Cross-Validation for a more reliable performance estimate (see the sketch after this list).
✅ Deploy the Model – Turn this into a web app using Flask, Streamlit, or FastAPI to classify new flowers dynamically.
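To give you a head start on the cross-validation idea, here is a minimal sketch using cross_val_score. It assumes X and y hold the full feature matrix and labels from Part 1 (e.g., X = iris.data, y = iris.target):
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 5-fold cross-validation: train and evaluate on 5 different splits of the data
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)
print(f"Fold accuracies: {scores.round(2)}, mean: {scores.mean():.2f}")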
8.3 Final Thoughts
You’ve completed a full machine learning workflow – from data loading and preprocessing to training, evaluating, and visualizing a model.
This is just the beginning! With these fundamentals, you can start applying machine learning to real-world datasets and explore more advanced techniques.