
Dimension Reduction using PCA


Configuration

Data Loading

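from sklearn import datasets
from sklearn.decomposition import PCA

import matplotlib.pyplot as plt
# Jupyter/IPython magic that renders plots inline; remove it when running as a plain script
%matplotlib inline

# Load the breast cancer dataset and keep the feature matrix
breast_cancer = datasets.load_breast_cancer()
data = breast_cancer.data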

Principal component analysis (PCA) is carried out with the scikit-learn package.
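Before plotting, it can help to check what was loaded. The following is a minimal sketch that only inspects the dataset; it repeats the loading step so that it runs on its own.

```python
from sklearn import datasets

breast_cancer = datasets.load_breast_cancer()
data = breast_cancer.data

# Rows are samples, columns are features
print(data.shape)

# Names of the first two feature columns
print(breast_cancer.feature_names[:2])

# Class names; their positions (0 and 1) match the values in `target`
print(breast_cancer.target_names)
```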

Data Visualization

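# Use only the first two features: mean radius and mean texture
x = data[:, :2]
y = breast_cancer.target
target_names = breast_cancer.target_names  # ['malignant', 'benign']

plt.figure(figsize=(7, 7))
colors = ['red', 'blue']

# Plot each class (0 = malignant, 1 = benign) in its own color
for color, i, target_name in zip(colors, [0, 1], target_names):
    plt.scatter(x[y == i, 0], x[y == i, 1], color=color, label=target_name)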

plt.legend()
plt.xlabel('Mean Radius')
plt.ylabel('Mean Texture')
plt.show()

Before dimension reduction, only the first two features, mean radius and mean texture, are plotted, with each point colored by its class (malignant or benign).

[Figure: before PCA, mean radius vs. mean texture, colored by class]


PCA

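# Use all 30 features this time
x = data
y = breast_cancer.target
target_names = breast_cancer.target_names

# Reduce the data to its first two principal components
pca = PCA(n_components=2)
x_p = pca.fit_transform(x)
print('Variance of the Top 2 Principal Components : %s' % str(pca.explained_variance_ratio_))

Variance of the Top 2 Principal Components : [0.98204467 0.01617649]
The printed values are explained variance ratios: the first principal component accounts for about 98.2% of the total variance and the second for about 1.6%. Projecting the data onto these two components gives the plot shown here.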
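The post shows the resulting plot but not the code that draws it. A minimal sketch of how the projected data could be plotted follows, assuming the same colors and figure size as the earlier scatter plot; the axis labels are illustrative choices, not taken from the original figure.

```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA

breast_cancer = datasets.load_breast_cancer()
x = breast_cancer.data
y = breast_cancer.target
target_names = breast_cancer.target_names

# Project all 30 features onto the first two principal components
pca = PCA(n_components=2)
x_p = pca.fit_transform(x)

fig, ax = plt.subplots(figsize=(7, 7))
colors = ['red', 'blue']

# One scatter call per class so that each class gets its own legend entry
for color, i, target_name in zip(colors, [0, 1], target_names):
    ax.scatter(x_p[y == i, 0], x_p[y == i, 1], color=color, label=target_name)

ax.legend()
ax.set_xlabel('Principal Component 1')  # assumed label
ax.set_ylabel('Principal Component 2')  # assumed label
plt.show()
```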

[Figure: after PCA, data projected onto the first two principal components, colored by class]
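One caveat about the variance ratio above: the features are on very different scales (area-based features take values in the hundreds or thousands, while several others stay below 1), so the large-valued features dominate the variance when PCA is run on the raw data. The sketch below is not part of the original analysis; it assumes standardization with `StandardScaler` as one common way to put the features on equal footing before PCA, so that the ratios can be compared.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

x = datasets.load_breast_cancer().data

# Rescale every feature to zero mean and unit variance before PCA
x_scaled = StandardScaler().fit_transform(x)

pca_scaled = PCA(n_components=2)
x_ps = pca_scaled.fit_transform(x_scaled)

# With standardized features the variance is spread over more components
print(pca_scaled.explained_variance_ratio_)
```

Whether to standardize depends on whether the original units are meaningful for the comparison; with features measured in unrelated units, standardizing is usually the safer default.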

This post is licensed under CC BY 4.0 by the author.
