Scikit-Learn's preprocessing.OrdinalEncoder in Python (with Examples)

August 21, 2023

By Admin

This article looks at machine learning preprocessing with Scikit-Learn’s OrdinalEncoder. Preprocessing is a crucial step in any machine learning pipeline, and the OrdinalEncoder is one of the Scikit-Learn encoders used for handling ordinal categorical data.

Contents

1 Understanding Ordinal Categorical Data

2 The Role of OrdinalEncoder

3 Handling Ordinal Variables

4 Working Principle

5 Use Cases

6 Benefits of OrdinalEncoder

7 Challenges and Considerations

8 Applying OrdinalEncoder

9 Python Code Examples

9.1 Example 1: Using Scikit-Learn Preprocessing OrdinalEncoder

10 Visualize Scikit-Learn Preprocessing OrdinalEncoder with Python

11 Sklearn Encoders

11.1 Python Example

12 Important Concepts in Scikit-Learn Preprocessing OrdinalEncoder

13 What to Know Before You Learn Scikit-Learn Preprocessing OrdinalEncoder

Understanding Ordinal Categorical Data

Ordinal categorical data consists of non-numeric values that have a clear order or ranking, like education levels or customer satisfaction ratings.

The Role of OrdinalEncoder

The OrdinalEncoder is designed to transform ordinal categorical variables into numerical values. It preserves the order information when the category order is passed through its categories parameter.

Handling Ordinal Variables

OrdinalEncoder addresses the challenge of encoding ordinal variables by mapping categories to ordered numerical values.

Working Principle

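OrdinalEncoder assigns each category an integer code from 0 to n_categories - 1. When a list of categories is passed through the categories parameter, the codes follow the order of that list. With the default categories='auto', the categories found in the data are sorted (alphabetically for strings) before the codes are assigned.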
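The difference between the default ordering and an explicit category list can be sketched as follows. This is a minimal illustration with made-up size labels (the variable names are only for this example), not the only way to configure the encoder.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample data: one column of size labels
sizes = np.array([["Small"], ["Large"], ["Medium"]])

# Default behaviour: categories are sorted, so codes follow alphabetical order
auto_encoder = OrdinalEncoder()
auto_codes = auto_encoder.fit_transform(sizes)
print(auto_encoder.categories_)   # sorted categories: Large, Medium, Small
print(auto_codes.ravel())         # [2. 0. 1.]

# Explicit order: codes follow the list passed to `categories`
explicit_encoder = OrdinalEncoder(categories=[["Small", "Medium", "Large"]])
explicit_codes = explicit_encoder.fit_transform(sizes)
print(explicit_codes.ravel())     # [0. 2. 1.]
```

With the default setting, "Large" receives the lowest code simply because it sorts first, which is why the ranking usually needs to be stated explicitly for ordinal features.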

Use Cases

Education levels
Socioeconomic status
Customer satisfaction ratings

Benefits of OrdinalEncoder

Preserves the ordinal relationship between categories
Enables numerical representation of ordinal data for machine learning models
Useful when applying algorithms that require numeric input

Challenges and Considerations

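The integer codes produced by OrdinalEncoder imply an order. If the categories are nominal (they have no natural ranking), or if the order given to the encoder is wrong, a model may learn relationships that do not exist in the data. By default the encoder also raises an error when transform encounters a category that was not seen during fit.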
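As a small illustration of this pitfall, the sketch below applies the encoder to a nominal feature. The color values are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical nominal feature: colors have no natural ranking
colors = np.array([["Red"], ["Blue"], ["Green"]])

encoder = OrdinalEncoder()
color_codes = encoder.fit_transform(colors).ravel()

# Codes follow alphabetical order: Blue=0, Green=1, Red=2
print(color_codes)  # [2. 0. 1.]
```

A model that treats these codes as magnitudes would see Green as lying between Blue and Red, a relationship that has no meaning for colors. For features like this, one-hot encoding is usually the safer choice.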

Applying OrdinalEncoder

OrdinalEncoder is commonly used when dealing with ordinal categorical features, either as a standalone preprocessing step or as part of a more extensive data transformation process.
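One possible way to use the encoder inside a larger transformation is sketched below with ColumnTransformer. The DataFrame, its column names, and the education ranking are assumptions made up for this example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical dataset with one ordinal, one nominal, and one numeric column
df = pd.DataFrame({
    "education": ["High School", "PhD", "Bachelor", "Master"],
    "city": ["Paris", "Lima", "Paris", "Oslo"],
    "age": [25, 41, 33, 38],
})

# Assumed ranking of the education levels, lowest to highest
education_order = ["High School", "Bachelor", "Master", "PhD"]

preprocessor = ColumnTransformer(
    transformers=[
        # Ordinal feature: integer codes that follow the ranking above
        ("education", OrdinalEncoder(categories=[education_order]), ["education"]),
        # Nominal feature: one binary column per city
        ("city", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    remainder="passthrough",  # keep the numeric "age" column unchanged
    sparse_threshold=0,       # always return a dense NumPy array
)

features = preprocessor.fit_transform(df)
print(features)  # columns: education code, three city indicators, age
```

The same preprocessor could then be placed as the first step of a Pipeline in front of an estimator, so that the encoding is fitted only on the training data.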

Python Code Examples

Example 1: Using Scikit-Learn Preprocessing OrdinalEncoder


from sklearn.preprocessing import OrdinalEncoder
import numpy as np

data = np.array([['Low'],
                 ['Medium'],
                 ['High'],
                 ['Medium'],
                 ['Low']])

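# List the categories in rank order so that Low -> 0, Medium -> 1, High -> 2
encoder = OrdinalEncoder(categories=[['Low', 'Medium', 'High']])
# fit_transform learns the mapping and returns a float array of shape (5, 1)
encoded_data = encoder.fit_transform(data)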

print("Original Data:")
print(data)
print("\nEncoded Data:")
print(encoded_data)
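After fitting, the learned mapping can be inspected and reversed. The following is a minimal sketch using the categories_ attribute and the inverse_transform method on a small array similar to the one above.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

levels = np.array([["Low"], ["Medium"], ["High"]])

encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
codes = encoder.fit_transform(levels)

# categories_ holds one array per feature; a category's position is its code
print(encoder.categories_)

# inverse_transform maps the integer codes back to the original labels
restored = encoder.inverse_transform(codes)
print(restored.ravel())  # ['Low' 'Medium' 'High']
```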

Visualize Scikit-Learn Preprocessing OrdinalEncoder with Python


import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import OrdinalEncoder

# Load the Iris dataset
iris = load_iris()
X = iris.data
species = iris.target_names[iris.target]

# Apply OrdinalEncoder to species
encoder = OrdinalEncoder()
# The encoder expects a 2-D input; flatten the result to 1-D for use as colors
species_encoded = encoder.fit_transform(species.reshape(-1, 1)).ravel()

# Plot the original and encoded data
plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c=iris.target)
plt.title('Original Data')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')

plt.subplot(1, 2, 2)
plt.scatter(X[:, 0], X[:, 1], c=species_encoded)
plt.title('Encoded Data')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')

plt.tight_layout()
plt.show()

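This code uses the Matplotlib library to visualize the effect of the Scikit-Learn Preprocessing OrdinalEncoder on the Iris dataset. It loads the Iris dataset, applies the OrdinalEncoder to encode the species labels, and then plots the first two features side by side, colored by the original integer targets (left) and by the encoded species (right). The two panels look identical: by default the encoder assigns codes in sorted (alphabetical) order, which for the Iris species names (setosa, versicolor, virginica) matches the dataset's own target integers.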

Sklearn Encoders

Scikit-Learn provides several encoders for handling categorical data. Three of the most commonly used are LabelEncoder, OneHotEncoder, and OrdinalEncoder.

LabelEncoder converts categorical labels into sequential integer values and is intended for encoding target variables in classification.
OneHotEncoder transforms categorical features into a binary matrix, representing the presence or absence of each category. This avoids implying any order between categories.
OrdinalEncoder encodes ordinal categorical features by assigning integer values based on category order, maintaining the ranking between categories.

These encoders transform different kinds of categorical data into formats compatible with various machine learning algorithms.

Encoder	Advantages	Disadvantages	Best Use Case
LabelEncoder	Simple and efficient encoding. Designed for target variables.	Works on a single 1-D array of labels. Assigns codes in sorted (alphabetical) order, so it does not preserve a natural order.	Encoding the target labels in classification tasks.
OneHotEncoder	Prevents bias due to category relationships. Useful for nominal categorical features. Compatible with various algorithms.	Creates high-dimensional data. Potential multicollinearity issues.	Machine learning algorithms requiring numeric input, especially for nominal data.
OrdinalEncoder	Maintains ordinal relationships. Handles meaningful order. Useful for features with inherent hierarchy.	May introduce unintended relationships. Not suitable for nominal data.	Features with clear ordinal rankings, like education levels or ratings.

Python Example


from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder
import pandas as pd

# Create a sample dataset
data = pd.DataFrame({
    'color': ['Red', 'Blue', 'Green', 'Red', 'Blue'],
    'size': ['Small', 'Large', 'Medium', 'Medium', 'Small'],
    'class': ['A', 'B', 'C', 'A', 'C']
})

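# Using LabelEncoder
# LabelEncoder is designed for 1-D target labels; it is applied to every
# column here only for illustration. Codes follow sorted (alphabetical) order,
# and the encoder is refitted on each column in turn.
label_encoder = LabelEncoder()
data_label_encoded = data.copy()
for column in data.columns:
    data_label_encoded[column] = label_encoder.fit_transform(data[column])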

# Using OneHotEncoder
onehot_encoder = OneHotEncoder()
data_onehot_encoded = onehot_encoder.fit_transform(data[['color', 'size']]).toarray()

# Using OrdinalEncoder
ordinal_encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
data_ordinal_encoded = ordinal_encoder.fit_transform(data[['size']])

print("Original Data:")
print(data)

print("\nLabel Encoded Data:")
print(data_label_encoded)

print("\nOneHot Encoded Data:")
print(data_onehot_encoded)

print("\nOrdinal Encoded Data:")
print(data_ordinal_encoded)

To learn more, read our blog post on Scikit-learn encoders.

Important Concepts in Scikit-Learn Preprocessing OrdinalEncoder

Categorical data and its types
Understanding ordinal categorical data
Order-preserving encoding techniques
Handling nominal data vs. ordinal data
Mapping categories to numerical values

What to Know Before You Learn Scikit-Learn Preprocessing OrdinalEncoder

Basics of categorical data and its significance in machine learning
Understanding of ordinal relationships in data
Familiarity with encoding techniques for categorical variables
Experience using Scikit-Learn for machine learning tasks
Appreciation of how different encoders handle categorical data

What’s Next?

Exploration of other Scikit-Learn preprocessing techniques
Introduction to feature scaling and normalization
Handling missing data in machine learning
Advanced encoding methods (Target Encoding, Frequency Encoding)
Application of preprocessing techniques in real-world datasets
Building complete machine learning pipelines

Relevant Entities

Entities	Properties
Scikit-Learn OrdinalEncoder	Converts ordinal categorical variables into numeric values while preserving order.
Ordinal Categorical Data	Non-numeric values with meaningful order, like education levels.
Ordinal Variables	Categorical features with a distinct order or ranking.
Numerical Mapping	Assigning numerical values based on the order of categories.
Use Cases	Education levels, customer satisfaction ratings, socioeconomic status.
Preserved Order	Ensuring that ordinal relationships are maintained after encoding.

Conclusion

The Scikit-Learn OrdinalEncoder is a valuable tool for converting ordinal categorical data into numerical values that retain the order information. By understanding how to use it effectively, data scientists can enhance the quality of their machine learning models when dealing with ordinal features.