This article covers machine learning preprocessing with Scikit-Learn’s OrdinalEncoder. Preprocessing is a crucial step in any machine learning pipeline, and OrdinalEncoder is the Scikit-Learn encoder intended for ordinal categorical features.

Understanding Ordinal Categorical Data
Ordinal categorical data consists of non-numeric values that have a clear order or ranking, like education levels or customer satisfaction ratings.
The Role of OrdinalEncoder
The OrdinalEncoder is designed to transform ordinal categorical variables into numerical values while preserving the order information.
Handling Ordinal Variables
OrdinalEncoder addresses the challenge of encoding ordinal variables by mapping categories to ordered numerical values.
Working Principle
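OrdinalEncoder assigns each category in a column an integer code starting at 0 (returned as floats by default). With the default categories='auto' setting, the categories are taken from the training data in sorted order, which for strings is alphabetical. To encode a real ranking, pass the ordered list explicitly through the categories parameter.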
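As a minimal sketch of this behavior, the snippet below encodes the same three values twice, once with default settings and once with an explicit order. The sample values and variable names are made up for illustration.

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample values, chosen only for illustration.
levels = [["Low"], ["Medium"], ["High"]]

# Default settings: categories are sorted, so string codes follow
# alphabetical order (High=0, Low=1, Medium=2).
default_codes = OrdinalEncoder().fit_transform(levels).ravel().tolist()

# Explicit order: a category's position in the list becomes its code.
ordered_encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
ordered_codes = ordered_encoder.fit_transform(levels).ravel().tolist()

print(default_codes)  # [1.0, 2.0, 0.0]
print(ordered_codes)  # [0.0, 1.0, 2.0]
```

Only the second encoder reflects the intended Low < Medium < High ranking.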
Use Cases
- Education levels
- Socioeconomic status
- Customer satisfaction ratings
Benefits of OrdinalEncoder
- Preserves the ordinal relationship between categories
- Enables numerical representation of ordinal data for machine learning models
- Useful when applying algorithms that require numeric input
Challenges and Considerations
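OrdinalEncoder assumes a meaningful order in the categories, which might not always be accurate. The integer codes also imply evenly spaced levels, and some models (linear models, for example) will treat those gaps as real distances.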
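A related practical consideration is a category that shows up only after the encoder has been fitted. One possible way to handle it, sketched here with made-up data, is the handle_unknown='use_encoded_value' option together with unknown_value. The choice of -1 as the placeholder code is an assumption for this example, not a recommendation.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical training and new data; "Very High" never appears in training.
train = np.array([["Low"], ["Medium"], ["High"]], dtype=object)
new = np.array([["Medium"], ["Very High"]], dtype=object)

encoder = OrdinalEncoder(
    categories=[["Low", "Medium", "High"]],
    handle_unknown="use_encoded_value",  # do not raise on unseen categories
    unknown_value=-1,                    # assumed placeholder code
)
encoder.fit(train)

encoded_new = encoder.transform(new).ravel().tolist()
print(encoded_new)  # [1.0, -1.0]
```

Without these two parameters, transform raises an error when it meets a category it did not see during fit.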
Applying OrdinalEncoder
OrdinalEncoder is commonly used when dealing with ordinal categorical features, either as a standalone preprocessing step or as part of a more extensive data transformation process.
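As a sketch of the second case, the snippet below places OrdinalEncoder inside a ColumnTransformer and a Pipeline. The toy DataFrame, the column names, and the choice of DecisionTreeClassifier are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one ordinal feature and one numeric feature.
df = pd.DataFrame({
    "education": ["High School", "Bachelor", "Master",
                  "Bachelor", "High School", "Master"],
    "age": [22, 35, 41, 29, 19, 50],
})
y = [0, 1, 1, 1, 0, 1]

# Encode only the ordinal column; pass the numeric column through unchanged.
preprocess = ColumnTransformer(
    transformers=[
        ("education",
         OrdinalEncoder(categories=[["High School", "Bachelor", "Master"]]),
         ["education"]),
    ],
    remainder="passthrough",
)

model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", DecisionTreeClassifier(random_state=0)),
])
model.fit(df, y)

# Inspect what the fitted preprocessing step produces.
encoded = model.named_steps["preprocess"].transform(df)
print(encoded[:, 0].tolist())  # [0.0, 1.0, 2.0, 1.0, 0.0, 2.0]
```

Keeping the encoder inside the pipeline means the same category order is applied at training and prediction time.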
Python Code Examples
Example 1: Using Scikit-Learn Preprocessing OrdinalEncoder
from sklearn.preprocessing import OrdinalEncoder
import numpy as np
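# A single categorical column with three ordered levels
data = np.array([['Low'],
                 ['Medium'],
                 ['High'],
                 ['Medium'],
                 ['Low']])
# Pass the order explicitly so that Low=0, Medium=1, High=2
encoder = OrdinalEncoder(categories=[['Low', 'Medium', 'High']])
encoded_data = encoder.fit_transform(data)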
print("Original Data:")
print(data)
print("\nEncoded Data:")
print(encoded_data)

Visualize Scikit-Learn Preprocessing OrdinalEncoder with Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import OrdinalEncoder
# Load the Iris dataset
iris = load_iris()
X = iris.data
species = iris.target_names[iris.target]
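# Apply OrdinalEncoder to species. Species is a nominal label, used here only
# for illustration; with default settings the codes follow alphabetical order.
encoder = OrdinalEncoder()
species_encoded = encoder.fit_transform(species.reshape(-1, 1))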
# Plot the original and encoded data
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c=iris.target)
plt.title('Original Data')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.subplot(1, 2, 2)
plt.scatter(X[:, 0], X[:, 1], c=species_encoded.ravel())
plt.title('Encoded Data')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.tight_layout()
plt.show()
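This code uses Matplotlib to visualize the output of OrdinalEncoder on the Iris dataset. It loads the dataset, encodes the species labels, and plots the first two features side by side, colored once by the original target and once by the encoded species. Because the species names are already in alphabetical order, the encoded values match iris.target and the two panels look identical. Species is a nominal label, so it is used here only to illustrate the mechanics of the encoder.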

Sklearn Encoders
Three commonly used Scikit-Learn encoders for categorical data are LabelEncoder, OneHotEncoder, and OrdinalEncoder.
LabelEncoder converts categorical labels into sequential integer values, often used for encoding target variables in classification.
OneHotEncoder transforms categorical features into a binary matrix, representing the presence or absence of each category. This prevents biases due to category relationships.
OrdinalEncoder encodes ordinal categorical data by assigning numerical values based on order, maintaining relationships between categories.
These encoders play vital roles in transforming diverse categorical data types into formats compatible with various machine learning algorithms.
Encoder | Advantages | Disadvantages | Best Use Case |
---|---|---|---|
LabelEncoder | Simple and efficient encoding. Useful for target variables. | Doesn’t create additional features. Assigns integers in sorted (alphabetical) order, so it does not preserve a natural ranking. Intended for target labels, not input features. | Encoding the target labels in classification tasks. |
OneHotEncoder | Prevents bias due to category relationships. Useful for nominal categorical features. Compatible with various algorithms. | Creates high-dimensional data. Potential multicollinearity issues. | Machine learning algorithms requiring numeric input, especially for nominal data. |
OrdinalEncoder | Maintains ordinal relationships. Handles meaningful order. Useful for features with inherent hierarchy. | May introduce unintended relationships. Not suitable for nominal data. | Features with clear ordinal rankings, like education levels or ratings. |
Python Example
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder
import pandas as pd
# Create a sample dataset
data = pd.DataFrame({
    'color': ['Red', 'Blue', 'Green', 'Red', 'Blue'],
    'size': ['Small', 'Large', 'Medium', 'Medium', 'Small'],
    'class': ['A', 'B', 'C', 'A', 'C']
})
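# Using LabelEncoder (intended for target labels; applied to every column
# here only to compare outputs). Codes follow sorted (alphabetical) order.
label_encoder = LabelEncoder()
data_label_encoded = data.copy()
for column in data.columns:
    data_label_encoded[column] = label_encoder.fit_transform(data[column])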
# Using OneHotEncoder
onehot_encoder = OneHotEncoder()
data_onehot_encoded = onehot_encoder.fit_transform(data[['color', 'size']]).toarray()
# Using OrdinalEncoder
ordinal_encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
data_ordinal_encoded = ordinal_encoder.fit_transform(data[['size']])
print("Original Data:")
print(data)
print("\nLabel Encoded Data:")
print(data_label_encoded)
print("\nOneHot Encoded Data:")
print(data_onehot_encoded)
print("\nOrdinal Encoded Data:")
print(data_ordinal_encoded)
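To check which integer each category received in an example like the one above, the fitted encoder can be inspected through its categories_ attribute, and inverse_transform maps codes back to labels. The snippet below is a small sketch with made-up data.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical single-column data for illustration.
sizes = np.array([["Small"], ["Large"], ["Medium"]], dtype=object)

encoder = OrdinalEncoder(categories=[["Small", "Medium", "Large"]])
codes = encoder.fit_transform(sizes)

# categories_ holds one array per column; a category's index is its code.
mapping = {category: code
           for code, category in enumerate(encoder.categories_[0])}
print(mapping)  # {'Small': 0, 'Medium': 1, 'Large': 2}

# inverse_transform converts the codes back to the original labels.
restored = encoder.inverse_transform(codes).ravel().tolist()
print(restored)  # ['Small', 'Large', 'Medium']
```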

To learn more, read our blog post on Scikit-learn encoders.
Important Concepts in Scikit-Learn Preprocessing OrdinalEncoder
- Categorical data and its types
- Understanding ordinal categorical data
- Order-preserving encoding techniques
- Handling nominal data vs. ordinal data
- Mapping categories to numerical values
What to Know Before You Learn Scikit-Learn Preprocessing OrdinalEncoder
- Basics of categorical data and its significance in machine learning
- Understanding of ordinal relationships in data
- Familiarity with encoding techniques for categorical variables
- Experience using Scikit-Learn for machine learning tasks
- Appreciation of how different encoders handle categorical data
What’s Next?
- Exploration of other Scikit-Learn preprocessing techniques
- Introduction to feature scaling and normalization
- Handling missing data in machine learning
- Advanced encoding methods (Target Encoding, Frequency Encoding)
- Application of preprocessing techniques in real-world datasets
- Building complete machine learning pipelines
Relevant Entities
Entities | Properties |
---|---|
Scikit-Learn OrdinalEncoder | Converts ordinal categorical variables into numeric values while preserving order. |
Ordinal Categorical Data | Non-numeric values with meaningful order, like education levels. |
Ordinal Variables | Categorical features with a distinct order or ranking. |
Numerical Mapping | Assigning numerical values based on the order of categories. |
Use Cases | Education levels, customer satisfaction ratings, socioeconomic status. |
Preserved Order | Ensuring that ordinal relationships are maintained after encoding. |
Sources
- Scikit-Learn Documentation on OrdinalEncoder (scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html)
- A Comprehensive Guide to Different Types of Categorical Data Encoding
- When to Use OneHotEncoder vs. LabelEncoder vs. DictVectorizer?
- How to Prepare Categorical Data for Deep Learning in Python
- All About Categorical Variable Encoding
Conclusion
The Scikit-Learn OrdinalEncoder is a valuable tool for converting ordinal categorical data into numerical values that retain the order information. By understanding how to use it effectively, data scientists can enhance the quality of their machine learning models when dealing with ordinal features.