Logistic Regression


Introduction to Logistic Regression


This article discusses the basics of logistic regression and its implementation in Python. Logistic regression is a supervised classification algorithm: in a classification problem, the target variable (or output), y, can take only discrete values for a given set of features (or inputs), X.

Contrary to popular belief, logistic regression IS a regression model. It builds a regression model to predict the probability that a given data entry belongs to the category labeled “1”. Just as linear regression assumes that the data follow a linear function, logistic regression models the data using the sigmoid function.

Logistic Regression Model


Figure: the sigmoid function
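As a minimal sketch (assuming NumPy and matplotlib, which are also used later in this article), the sigmoid curve above can be reproduced like this:

import numpy as np
import matplotlib.pyplot as plt

# sigmoid: maps any real-valued input into the open interval (0, 1)
def sigmoid(z):
    return 1/(1+np.exp(-z))

z = np.linspace(-8, 8, 200)
plt.plot(z, sigmoid(z), 'r')
plt.xlabel("z", fontsize=15)
plt.ylabel("sigmoid(z)", fontsize=15)
plt.show()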

Logistic Regression Math


Datasets


  • The dataset has p feature variables and n observations
  • The feature matrix is represented as:

X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix}

Here, x_{ij} denotes the value of the jth feature for the ith observation, and the ith observation can be represented as the column vector:  \overrightarrow{x_{i} } =\begin{bmatrix} x_{i1} \\ x_{i2} \\ x_{i3} \\. \\. \\. \\ x_{ip}\end{bmatrix}
h( \overrightarrow{x_{i}} ) represents the predicted response for the ith observation, and the formula we use to calculate it is called the hypothesis.
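As a small illustration (the numbers here are made up), the feature matrix with its leading column of ones can be built with NumPy:

import numpy as np

# hypothetical raw features: n = 4 observations, p = 2 features
raw = np.array([[2.0, 3.0],
                [1.0, 0.5],
                [4.0, 2.2],
                [3.0, 1.1]])

# prepend a column of ones so the bias term is handled by the same dot product
X = np.hstack([np.ones((raw.shape[0], 1)), raw])
print(X.shape)  # (4, 3): n rows, p + 1 columns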

Hypothesis for classification


In linear regression, the hypothesis we used for prediction was: h(  \overrightarrow{x_{i}}  ) =   \beta _{0} + \beta _{1} x_{i1} + \beta _{2} x_{i2}+...+ \beta _{p} x_{ip}

Modifications to Linear Regression

Some modifications are made to this hypothesis for classification:
h( \overrightarrow{ x_{i} })= \frac{1}{1+ e^{-  \beta ^{T} \overrightarrow{ x_{i} }}}

where the regression coefficient vector is: \overrightarrow{\beta}   = \begin{bmatrix}  \beta _{0}  \\ \beta _{1}\\ \beta _{2}\\ .\\ .\\ .\\ \beta _{p}\end{bmatrix}

Note that \beta _{0} represents the bias of the linear regression hypothesis.
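A minimal sketch of this hypothesis in NumPy (the values below are hypothetical, and X is assumed to already contain the leading column of ones):

import numpy as np

def hypothesis(X, beta):
    # h(x_i) = 1 / (1 + exp(-beta^T x_i)), evaluated for every row of X at once
    return 1/(1+np.exp(-np.dot(X, beta)))

X = np.array([[1.0, 0.5, 2.0],
              [1.0, 1.5, 0.3]])
beta = np.array([-1.0, 2.0, 0.5])
print(hypothesis(X, beta))  # two probabilities in (0, 1)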

Conditional Probabilities for the ith observation:


 p(y_{i}=1 |  \overrightarrow{ x_{i} },  \overrightarrow{ \beta } )  = h( \overrightarrow{ x_{i} } )

 p(y_{i}=0 |  \overrightarrow{ x_{i} },  \overrightarrow{ \beta } )  = 1-h( \overrightarrow{ x_{i} } )

Compact form


p(y_{i} |  \overrightarrow{ x_{i} },  \overrightarrow{ \beta } )  = h( \overrightarrow{ x_{i} } ) ^{ y_{i} } (1-h( \overrightarrow{ x_{i} } )) ^{1- y_{i} }

Note that y_{i} represents the categorical target of the ith observation. If the target of the ith observation is 1, we want the predicted response h( \overrightarrow{x_{i}} ) to be as close to 1 as possible, while if the target is 0, we want 1-h( \overrightarrow{x_{i}} ) to be as close to 1 as possible. In general, maximizing p(y_{i} |  \overrightarrow{ x_{i} },  \overrightarrow{ \beta } ) over all observations gives a better-fitting model. Since x and y are observed values, we estimate the parameter \overrightarrow{ \beta } by maximizing this likelihood function: L( \overrightarrow{ \beta } ) = \prod_i^n h( \overrightarrow{x_{i}} ) ^{ y_{i} } (1-h( \overrightarrow{x_{i}} ) )^{(1-y_{i})}
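As a tiny numeric check (with hypothetical predicted probabilities and targets), the likelihood is just the compact form multiplied over the observations:

import numpy as np

# hypothetical predicted probabilities h(x_i) and observed targets y_i
h = np.array([0.9, 0.2, 0.7])
y = np.array([1, 0, 1])

# p(y_i | x_i, beta) = h^y * (1 - h)^(1 - y), multiplied over all observations
likelihood = np.prod(h**y * (1 - h)**(1 - y))
print(likelihood)  # 0.9 * 0.8 * 0.7 = 0.504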

Taking the negative log-likelihood gives the cost function: J( \overrightarrow{ \beta } ) = \sum_i^n - y_{i} \log h( \overrightarrow{x_{i}} ) - (1-y_{i}) \log (1-h( \overrightarrow{x_{i}} ))
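Continuing the tiny numeric check from above (same hypothetical numbers), the cost is the negative logarithm of that likelihood, -log(0.504) ≈ 0.685:

import numpy as np

h = np.array([0.9, 0.2, 0.7])
y = np.array([1, 0, 1])

# J(beta) = -sum( y*log(h) + (1-y)*log(1-h) )
J = -np.sum(y*np.log(h) + (1-y)*np.log(1-h))
print(J)  # ≈ 0.685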

Using the Gradient Descent Algorithm


 \frac{\partial J( \overrightarrow{ \beta } )}{\partial \beta _{j} } = \sum_i^n \left( h( \overrightarrow{x_{i}} ) - y_{i} \right) x_{ij}
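In matrix form, which is what the gradient-descent implementation below computes, the full gradient and the corresponding weight update with learning rate \alpha are:

\nabla J( \overrightarrow{ \beta } ) = X^{T} \left( h(X) - \overrightarrow{y} \right), \qquad \overrightarrow{ \beta } \leftarrow \overrightarrow{ \beta } - \alpha \, X^{T} \left( h(X) - \overrightarrow{y} \right)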

Logistic Regression using Python from scratch


import modules

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

Prepare Data

data = pd.read_csv('/Users/hadley/Documents/pytorch/resources/heart.csv')
data.head()
print(data.shape)
(303, 14)
# create dummy (one-hot) variables for the categorical columns
dummy1 = pd.get_dummies(data['cp'], prefix = "cp")
dummy2 = pd.get_dummies(data['thal'], prefix = "thal")
dummy3 = pd.get_dummies(data['slope'], prefix = "slope")
data = [data, dummy1, dummy2, dummy3]
print(dummy1.shape, dummy2.shape, dummy3.shape)
(303, 4) (303, 4) (303, 3)

At this point, data is a Python list of DataFrames, not a single DataFrame, so concatenate the pieces column-wise:

data = pd.concat(data, axis=1)
data.head()
age sex cp trestbps chol fbs restecg thalach exang oldpeak ... cp_1 cp_2 cp_3 thal_0 thal_1 thal_2 thal_3 slope_0 slope_1 slope_2
0 63 1 3 145 233 1 0 150 0 2.3 ... 0 0 1 0 1 0 0 1 0 0
1 37 1 2 130 250 0 1 187 0 3.5 ... 0 1 0 0 0 1 0 1 0 0
2 41 0 1 130 204 0 0 172 0 1.4 ... 1 0 0 0 0 1 0 0 0 1
3 56 1 1 120 236 0 1 178 0 0.8 ... 1 0 0 0 0 1 0 0 0 1
4 57 0 0 120 354 0 1 163 1 0.6 ... 0 0 0 0 0 1 0 0 0 1

5 rows × 25 columns

Drop the original categorical columns:

data = data.drop(['cp', 'thal', 'slope'], axis=1)
data.head()
age sex trestbps chol fbs restecg thalach exang oldpeak ca ... cp_1 cp_2 cp_3 thal_0 thal_1 thal_2 thal_3 slope_0 slope_1 slope_2
0 63 1 145 233 1 0 150 0 2.3 0 ... 0 0 1 0 1 0 0 1 0 0
1 37 1 130 250 0 1 187 0 3.5 0 ... 0 1 0 0 0 1 0 1 0 0
2 41 0 130 204 0 0 172 0 1.4 0 ... 1 0 0 0 0 1 0 0 0 1
3 56 1 120 236 0 1 178 0 0.8 0 ... 1 0 0 0 0 1 0 0 0 1
4 57 0 120 354 0 1 163 1 0.6 0 ... 0 0 0 0 0 1 0 0 0 1

5 rows × 22 columns

Creating the Logistic Regression Model


Split the data into input features (x) and the output target (y):

y = data.target.to_numpy()
print(y.shape)
(303,)
x = data.drop(['target'], axis = 1)
print(x.shape)
(303, 21)

Normalize Data


Normalize the input features with min-max scaling, and add one extra ‘dummy’ column of ones so that the bias term is included in the weight vector:

x = ((x - np.min(x))/(np.max(x) - np.min(x))).to_numpy()
bias_column = np.ones((x.shape[0],1))
x = np.append(bias_column, x, axis = 1)
print(x.shape)
(303, 22)

Split the data: 80% will be training data and 20% will be test data.

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
(242, 22) (61, 22) (242,) (61,)
# weight initialization: the bias is already included in w, since the added column of ones is counted in n_features
n_samples = x_train.shape[0]
n_features = x_train.shape[1]
w = np.zeros((n_features,))
print(n_samples, n_features, w.shape)
242 22 (22,)
#model prediction: forward pass using the sigmoid function
def forward(x):
    z = np.dot(x, w)
    return 1/(1+np.exp(-z))

Loss function based on maximum likelihood estimation (the averaged negative log-likelihood):

def loss(y_train, y_predicted):
    # cross-entropy (negative log-likelihood), averaged over the training samples
    loss = -(y_train*np.log(y_predicted) + (1-y_train)*np.log(1-y_predicted))
    return np.sum(loss)/y_train.shape[0]

Gradient of the loss with respect to the weights:

def gradient_weight(x,y_train,y_predicted):
    # dJ/dw = X^T (h(x) - y), summed over the training samples
    dw = np.dot((y_predicted - y_train).T, x)
    return dw

## training 
learning_rate = 0.01
n_iters = 100

Forward predict: apply the sigmoid, then threshold at 0.5 to produce class labels:

def forward_predict(x):
    z = np.dot(x, w)
    y_prediction =  1/(1+np.exp(-z))

    for i in range(y_prediction.size):
        if y_prediction[i] <= 0.5:
            y_prediction[i] = 0
        else:
            y_prediction[i] = 1
    return y_prediction
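The thresholding loop above can also be written without the explicit for loop; an equivalent vectorized sketch (same logic, same global w):

def forward_predict_vectorized(x):
    # sigmoid output thresholded at 0.5: label 1 if the probability exceeds 0.5, else 0
    z = np.dot(x, w)
    return (1/(1+np.exp(-z)) > 0.5).astype(float)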

Prepare the loss curve: the number of epochs on the x axis and the loss on the y axis.

plt_x = []
plt_y = []

Training

for epoch in range(n_iters):
    #prediction = forward pass
    y_pred = forward(x_train)

    #loss
    l = loss(y_train, y_pred)

    #gradients
    dw = gradient_weight(x_train, y_train, y_pred)

    #update weights
    w -= learning_rate * dw

    plt_x.append(epoch)
    plt_y.append(l)

    if epoch % 1 ==0:
        print(f'epoch {epoch+1}: loss = {l:.4f}')
epoch 1: loss = 0.5536
epoch 2: loss = 0.4194
epoch 3: loss = 0.3742
epoch 4: loss = 0.3535
epoch 5: loss = 0.3418
epoch 6: loss = 0.3343
epoch 7: loss = 0.3290
epoch 8: loss = 0.3250
epoch 9: loss = 0.3218
epoch 10: loss = 0.3191
epoch 11: loss = 0.3167
epoch 12: loss = 0.3147
epoch 13: loss = 0.3128
epoch 14: loss = 0.3111
epoch 15: loss = 0.3095
epoch 16: loss = 0.3080
epoch 17: loss = 0.3066
epoch 18: loss = 0.3053
epoch 19: loss = 0.3041
epoch 20: loss = 0.3029
epoch 21: loss = 0.3018
epoch 22: loss = 0.3008
epoch 23: loss = 0.2998
epoch 24: loss = 0.2989
epoch 25: loss = 0.2979
epoch 26: loss = 0.2971
epoch 27: loss = 0.2963
epoch 28: loss = 0.2955
epoch 29: loss = 0.2947
epoch 30: loss = 0.2940
epoch 31: loss = 0.2933
epoch 32: loss = 0.2926
epoch 33: loss = 0.2920
epoch 34: loss = 0.2913
epoch 35: loss = 0.2907
epoch 36: loss = 0.2902
epoch 37: loss = 0.2896
epoch 38: loss = 0.2891
epoch 39: loss = 0.2886
epoch 40: loss = 0.2881
epoch 41: loss = 0.2876
epoch 42: loss = 0.2871
epoch 43: loss = 0.2867
epoch 44: loss = 0.2862
epoch 45: loss = 0.2858
epoch 46: loss = 0.2854
epoch 47: loss = 0.2850
epoch 48: loss = 0.2846
epoch 49: loss = 0.2843
epoch 50: loss = 0.2839
epoch 51: loss = 0.2836
epoch 52: loss = 0.2832
epoch 53: loss = 0.2829
epoch 54: loss = 0.2826
epoch 55: loss = 0.2823
epoch 56: loss = 0.2820
epoch 57: loss = 0.2817
epoch 58: loss = 0.2814
epoch 59: loss = 0.2812
epoch 60: loss = 0.2809
epoch 61: loss = 0.2806
epoch 62: loss = 0.2804
epoch 63: loss = 0.2801
epoch 64: loss = 0.2799
epoch 65: loss = 0.2797
epoch 66: loss = 0.2795
epoch 67: loss = 0.2792
epoch 68: loss = 0.2790
epoch 69: loss = 0.2788
epoch 70: loss = 0.2786
epoch 71: loss = 0.2784
epoch 72: loss = 0.2782
epoch 73: loss = 0.2780
epoch 74: loss = 0.2779
epoch 75: loss = 0.2777
epoch 76: loss = 0.2775
epoch 77: loss = 0.2773
epoch 78: loss = 0.2772
epoch 79: loss = 0.2770
epoch 80: loss = 0.2769
epoch 81: loss = 0.2767
epoch 82: loss = 0.2766
epoch 83: loss = 0.2764
epoch 84: loss = 0.2763
epoch 85: loss = 0.2761
epoch 86: loss = 0.2760
epoch 87: loss = 0.2759
epoch 88: loss = 0.2757
epoch 89: loss = 0.2756
epoch 90: loss = 0.2755
epoch 91: loss = 0.2753
epoch 92: loss = 0.2752
epoch 93: loss = 0.2751
epoch 94: loss = 0.2750
epoch 95: loss = 0.2749
epoch 96: loss = 0.2748
epoch 97: loss = 0.2747
epoch 98: loss = 0.2746
epoch 99: loss = 0.2745
epoch 100: loss = 0.2744

Plot the loss curve: the value of the loss function with respect to the number of epochs.

plt.plot(plt_x, plt_y, 'b')
plt.xlabel("number of epochs", fontsize=15)
plt.ylabel("MLE cost function", fontsize=15)
plt.rcParams["figure.figsize"] = (12,7)
plt.show()

Figure: training loss curve

Manually calculate the test accuracy:

y_test_pred = forward_predict(x_test)
print("Manual Test Accuracy: {:.2f}%".format((100 - np.mean(np.abs(y_test_pred - y_test))*100)))
Manual Test Accuracy: 85.25%
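Since scikit-learn's LogisticRegression is already imported at the top, a quick cross-check against the library implementation could look like the sketch below (the exact number will differ, because scikit-learn applies L2 regularization and its own solver by default):

clf = LogisticRegression(max_iter=1000)
clf.fit(x_train, y_train)
print("sklearn Test Accuracy: {:.2f}%".format(clf.score(x_test, y_test)*100))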

Final plot: for each test observation, plot its linear score x·w against its predicted label, overlaid on the sigmoid curve, to see how far each prediction sits from the 0.5 decision boundary:

x_linear = np.dot(x_test, w)
def sigmoid(x):
    return 1/(1+np.exp(-x))
print(x_linear)

t1 = np.arange(-5.0, 5.0, 0.1)
plt.plot(t1, sigmoid(t1), 'r')
plt.plot(x_linear, y_test_pred, 'bo')
plt.xlabel("linear score x · w", fontsize=15)
plt.ylabel("predicted label / sigmoid(x · w)", fontsize=15)
plt.rcParams["figure.figsize"] = (12,7)
plt.show()
[-2.29420457 -0.06308867  0.04797952 -3.46814858 -2.0499066  -1.12642636
 -2.5685351  -2.41063596 -3.50555731 -4.49125426  0.60393739  2.5569347
 -3.00098277  2.30837334  3.2512419   1.25612815 -2.71948701  1.44747319
 -4.28088693  1.39343452  1.46533894 -1.24259217 -1.81542927 -1.63853973
  2.4237399   0.25793775 -1.76087883 -0.68070708  3.49235634  1.37801868
  1.25534473 -4.33003805  3.67578518  0.98659795  2.67926752  0.57409612
 -2.1883085   2.22040851 -1.85645047 -0.36110886 -0.50539455  1.48557235
  0.04146691 -1.75997446  0.44119235  1.63330185  1.78237282 -0.07312052
 -2.78411445  1.1278409   2.39580181  1.31927031  3.61757349  0.59014531
  4.81296128 -2.13979847  2.71081302  2.07815489  2.19097508  3.4580257
  1.06761226]

Figure: predicted test labels overlaid on the sigmoid curve
