Regression and classification are essential concepts in machine learning. Both aim to teach machines to predict an outcome from existing data, which we use to train models. The difference is that regression predicts a continuous value, such as a product price, while classification predicts a label, such as whether an email is spam or not. However, one model connects these two areas of machine learning: the logistic regression classifier.
In this post, we briefly look at the intuition behind logistic regression and how to code it in Python.
Intuition
The idea behind the logistic regression classifier is to adapt the linear regression algorithm so that it gives us 0/1 labels as an outcome instead of a continuous value. Linear regression fits a straight line, y = b1X + b0, and its plot looks something like this:
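To see why a plain line is a poor fit for 0/1 labels, here is a minimal sketch that fits y = b1X + b0 to a made-up toy dataset with NumPy. The data and the helper name predict_line are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical toy data: one feature and binary labels (0 or 1)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Least-squares fit of a straight line; polyfit returns [slope, intercept]
b1, b0 = np.polyfit(x, y, deg=1)


def predict_line(x_value):
    """Evaluate the fitted line y = b1 * x + b0."""
    return b1 * x_value + b0


# Far from the training data, the line leaves the [0, 1] range
print(predict_line(-5.0))  # below 0
print(predict_line(15.0))  # above 1
```

The line happily predicts values below 0 and above 1, which cannot be read as labels or as probabilities.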
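The issue is that a line is unbounded, so we need a trick to put the outcome in the [0,1] range. To achieve this, we use probability, since it is always between zero and one. So instead of predicting the actual value of y, we predict the probability that y equals 1. Logistic regression does this by passing the linear output through the sigmoid function, p = 1 / (1 + e^-(b1X + b0)), which squeezes any real number into that range. As a result, the straight line turns into an S-shaped curve, something like this: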
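The function that logistic regression uses to turn the linear output into a probability is the sigmoid (logistic) function. The sketch below applies it to a line with hypothetical coefficients b1 and b0, picked by hand purely for illustration.

```python
import numpy as np


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical coefficients, not learned from any data
b1, b0 = 1.5, -4.0

x = np.array([-10.0, 0.0, 2.0, 4.0, 10.0])
z = b1 * x + b0   # unbounded linear output
p = sigmoid(z)    # squeezed into (0, 1)

print(z)
print(p)
```

However large or small the linear output gets, the result stays strictly between zero and one, and it grows as the linear output grows.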
However, one step remains. Since we are dealing with classification, we need to assign a label (0 or 1) to the outcome instead of a probability. Therefore, we set a threshold that separates the labels based on the probability. For instance, if the probability is less than 0.5 we assign the label 0; otherwise we assign the label 1. In the above plot, all the data points under the y = 0.5 line are labeled 0 and the rest are labeled 1.
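The thresholding step can be sketched in a couple of lines. The probabilities below are made-up values, and 0.5 is the example threshold from the text.

```python
import numpy as np

# Hypothetical predicted probabilities for five data points
probabilities = np.array([0.10, 0.49, 0.50, 0.73, 0.95])
threshold = 0.5

# Below the threshold -> label 0, otherwise -> label 1
labels = (probabilities >= threshold).astype(int)
print(labels)  # [0 0 1 1 1]
```

Moving the threshold up or down trades one kind of mistake for another, so 0.5 is a common default rather than a rule.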
Python
To code logistic regression, we can use the LogisticRegression class from the scikit-learn (sklearn) library. The class is part of the sklearn.linear_model module.
Let's assume these are the variables we have after splitting the data into train and test sets:
- X_train: train features
- X_test: test features
- y_train: train labels
- y_test: test labels
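If you do not have these four variables yet, one possible way to get them is sketched below. It assumes a synthetic dataset generated with make_classification and a 75/25 split. Both choices are illustrative and not part of the original walkthrough.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption: stands in for your real dataset)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)  # (150, 4) (50, 4)
```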
The first thing we do is import the LogisticRegression class from sklearn:
from sklearn.linear_model import LogisticRegression
Then, we create an instance of the imported class:
lr_classif = LogisticRegression(random_state=0)
After that, we train our model on the training features and labels:
lr_classif.fit(X_train, y_train)
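After fitting, the learned parameters can be inspected: coef_ and intercept_ play the role of b1 and b0 from the intuition section. The sketch below uses a tiny made-up dataset with a single feature so there is exactly one coefficient.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, binary labels
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

lr_classif = LogisticRegression(random_state=0)
lr_classif.fit(X_train, y_train)

# coef_ has shape (1, n_features) for a binary problem
b1 = lr_classif.coef_[0][0]
b0 = lr_classif.intercept_[0]
print(b1, b0)
```

A positive b1 here means that larger feature values push the predicted probability toward label 1.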
Now that our model is trained, we use it to make predictions on the test data. The prediction result is named y_pred:
y_pred = lr_classif.predict(X_test)
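The predict method returns labels directly, but the probabilities behind them are available through predict_proba. The sketch below, again on a made-up one-feature dataset, checks that thresholding the probability of class 1 at 0.5 gives the same labels as predict for these points.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, binary labels
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.0], [2.5], [4.5], [7.0]])

lr_classif = LogisticRegression(random_state=0)
lr_classif.fit(X_train, y_train)

# Column 1 holds the probability of class 1
proba = lr_classif.predict_proba(X_test)[:, 1]
y_pred = lr_classif.predict(X_test)

# Applying the 0.5 threshold by hand
manual_labels = (proba >= 0.5).astype(int)

print(proba)
print(y_pred)
print(manual_labels)
```

Working with predict_proba is useful when you want a threshold other than 0.5.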
To evaluate our model, we can compare y_test and y_pred. To do this, we can use confusion_matrix or accuracy_score from sklearn.metrics.
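To make the two metrics concrete, here is a small sketch on hand-written labels. These are made-up values, not output from any model. In the confusion matrix from sklearn, rows are the true labels and columns are the predicted labels.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical true and predicted labels for eight samples
y_test = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)

print(cm)
# [[3 1]   3 true negatives, 1 false positive
#  [1 3]]  1 false negative, 3 true positives
print(acc)  # 0.75, since 6 of the 8 predictions are correct
```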
Here is the final code in one place:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

# Create and train the classifier (X_train and y_train come from the earlier split)
lr_classif = LogisticRegression(random_state=0)
lr_classif.fit(X_train, y_train)

# Predict labels for the test features
y_pred = lr_classif.predict(X_test)

# Compare the predictions with the true test labels
cm_lr_classif = confusion_matrix(y_test, y_pred)
print("Confusion Matrix")
print(cm_lr_classif)
acc_lr_classif = accuracy_score(y_test, y_pred)
print(f"Accuracy for Logistic Regression Classifier: {acc_lr_classif}")
The End.