RoleXxParker

# Importing Required Libraries

```python
#data reading and manipulation packages
import pandas as pd
import numpy as np
import openpyxl as oxl

#data visualization packages
import matplotlib.pyplot as plt
import matplotlib as mlt
import seaborn as sns

#machine learning packages
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
```

# Reading .csv file & performing EDA

## Exploratory Data Analysis

```python
#reading .csv file using pandas
df = pd.read_csv('boston.csv', sep=",")
```

506 rows × 14 columns
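Before exploring further, it can help to confirm that the file loaded with the columns you expect. The sketch below is only an illustration: it reads a tiny in-memory CSV (the two data rows are made-up placeholder values, and `csv_text`, `expected_columns`, and `sample_df` are hypothetical names) rather than the real `boston.csv`.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'boston.csv': same 14 column names, placeholder values.
csv_text = (
    "crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,black,lstat,medv\n"
    "0.1,10.0,5.0,0,0.5,6.0,60.0,4.0,1,300,15.0,390.0,5.0,25.0\n"
    "0.2,0.0,7.0,1,0.4,6.5,70.0,5.0,2,250,18.0,395.0,9.0,22.0\n"
)

expected_columns = ['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis',
                    'rad', 'tax', 'ptratio', 'black', 'lstat', 'medv']

# Read the CSV exactly as in the notebook, but from an in-memory buffer.
sample_df = pd.read_csv(io.StringIO(csv_text), sep=",")

# List any expected column that did not make it into the DataFrame.
missing = [c for c in expected_columns if c not in sample_df.columns]
print("missing columns:", missing)
print("shape:", sample_df.shape)
```

With the real file, the same check would use `pd.read_csv('boston.csv')` in place of the in-memory buffer.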

### General Information about Dataset

- CRIM: per capita crime rate by town
- ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
- INDUS: proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
- NOX: nitric oxides concentration (parts per 10 million)
- RM: average number of rooms per dwelling
- AGE: proportion of owner-occupied units built prior to 1940
- DIS: weighted distances to five Boston employment centres
- RAD: index of accessibility to radial highways
- TAX: full-value property-tax rate per $10,000
- PTRATIO: pupil-teacher ratio by town
- B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town (stored as the `black` column in this file)
- LSTAT: percentage of lower-status population
- MEDV: median value of owner-occupied homes in $1000s (the target column)

```python
#display first 5 rows
df.head()

#to display first n rows
#df.head(n)
```


```python
#display last 5 rows
df.tail()

#to display last n rows
#df.tail(n)
```


```python
#to display one random sample row
df.sample()
```


```python
#to display n random sample rows from the dataset (here n = 3)
df.sample(3)
```

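Because `sample()` draws rows at random, each run shows different rows. If you want the same rows every time, pandas accepts a `random_state` argument. A minimal sketch on a small made-up frame (`demo` and its values are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
demo = pd.DataFrame({
    "rm":   [6.1, 5.9, 7.2, 6.4, 5.5],
    "medv": [21.0, 19.5, 33.0, 24.1, 16.2],
})

# Fixing random_state makes the random draw repeatable.
first = demo.sample(3, random_state=42)
second = demo.sample(3, random_state=42)

print(first.equals(second))  # both draws contain the same rows
```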

```python
#display shape of the dataset
#returned as (number of rows, number of columns)
df.shape
```

(506, 14)

```python

#display all columns

df.columns

```

    Index(['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
           'ptratio', 'black', 'lstat', 'medv'],
          dtype='object')

```python

#basic information about dataset

df.info()

```

    RangeIndex: 506 entries, 0 to 505
    Data columns (total 14 columns):
     #   Column   Non-Null Count  Dtype
    ---  ------   --------------  -----
     0   crim     506 non-null    float64
     1   zn       506 non-null    float64
     2   indus    506 non-null    float64
     3   chas     506 non-null    int64
     4   nox      506 non-null    float64
     5   rm       506 non-null    float64
     6   age      506 non-null    float64
     7   dis      506 non-null    float64
     8   rad      506 non-null    int64
     9   tax      506 non-null    int64
     10  ptratio  506 non-null    float64
     11  black    506 non-null    float64
     12  lstat    506 non-null    float64
     13  medv     506 non-null    float64
    dtypes: float64(11), int64(3)
    memory usage: 55.5 KB

```python

#to get datatype of each column

df.dtypes

```

    crim       float64
    zn         float64
    indus      float64
    chas         int64
    nox        float64
    rm         float64
    age        float64
    dis        float64
    rad          int64
    tax          int64
    ptratio    float64
    black      float64
    lstat      float64
    medv       float64
    dtype: object

```python
#summary statistics (count, mean, std, min, quartiles, max) for each column
df.describe()
```


### Data Cleaning

- Check for Null Values
- Check for Duplicate Values/Rows
- Check for Outliers
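The first two checks in the list above, and one common way to act on them, can be sketched on a tiny made-up frame. Here `toy` and its values are hypothetical, and dropping rows is only one possible strategy (imputing missing values is another); the Boston data itself turns out to need neither.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value and one repeated row.
toy = pd.DataFrame({
    "rm":   [6.5, 6.5, np.nan, 7.1],
    "medv": [24.0, 24.0, 30.1, 35.0],
})

# Total number of missing cells and of fully duplicated rows.
n_nulls = int(toy.isnull().sum().sum())
n_dupes = int(toy.duplicated().sum())
print("nulls:", n_nulls, "duplicates:", n_dupes)

# One simple cleaning strategy: drop incomplete rows, then drop repeats.
cleaned = toy.dropna().drop_duplicates().reset_index(drop=True)
print(cleaned)
```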

```python
#check for null (missing) values in the dataset
#returns a True/False mask with the same shape as df
df.isnull()
```

506 rows × 14 columns

```python

#sum of null values column-wise

df.isnull().sum()

```

    crim       0
    zn         0
    indus      0
    chas       0
    nox        0
    rm         0
    age        0
    dis        0
    rad        0
    tax        0
    ptratio    0
    black      0
    lstat      0
    medv       0
    dtype: int64

```python

#total count of null values in the dataset

df.isnull().sum().sum()

```

np.int64(0)

```python

#check for duplicate rows

df.duplicated()

```

    0      False
    1      False
    2      False
    3      False
    4      False
           ...
    501    False
    502    False
    503    False
    504    False
    505    False
    Length: 506, dtype: bool

```python
#total number of duplicated rows in the dataset
df.duplicated().sum()
```

np.int64(0)

# Data Visualization

```python
#Histogram
#distribution of all features
df.hist(figsize=(18,14), color="y")
plt.show()
```

![png](output_23_0.png)
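Histograms show visually whether a feature is symmetric or has a long tail. A numeric companion is the sample skewness, which pandas exposes as `DataFrame.skew()`. The sketch below uses a made-up frame (`shapes` and its column names are hypothetical) to show how the sign of the skewness relates to the shape of the distribution.

```python
import pandas as pd

# Hypothetical toy data: one symmetric column, one with a long right tail.
shapes = pd.DataFrame({
    "symmetric":    [1, 2, 3, 4, 5],
    "right_tailed": [1, 1, 1, 2, 10],
})

# Skewness near 0 suggests symmetry; a positive value suggests a right tail.
skewness = shapes.skew()
print(skewness)
```

On the real data, `df.skew()` would give one value per column to read alongside the histograms.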

```python
#Relationship between number of rooms and house price
#plotting a scatter plot
x = df['rm']
y = df['medv']

plt.scatter(x, y, color="r", marker="*", label="Data Points")
plt.xlabel('No. of Rooms', fontsize=13)
plt.ylabel('House Price', fontsize=13)
plt.title("Relationship Plot", fontsize=13)
plt.legend()
plt.show()
```

![png](output_24_0.png)

```python
#Check for outliers using box plots
#A box plot is used for outlier detection

#alternative: one box plot per numeric column
#for i in df.columns:
#    if df[i].dtype != "object":
#        plt.boxplot(df[i])
#        plt.xlabel(i)
#        plt.show()

#all columns in a single figure
plt.figure(figsize=(12,6))
sns.boxplot(data=df)
plt.xticks(rotation = 0)
#plt.legend()
plt.show()
```

![png](output_26_0.png)
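The points a box plot draws beyond its whiskers follow, by default in both Matplotlib and seaborn, the 1.5 × IQR rule. The same rule can be applied numerically. This is a minimal sketch on made-up values; `iqr_outlier_mask` and `values` are hypothetical names, not part of the notebook.

```python
import pandas as pd

def iqr_outlier_mask(series, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Hypothetical toy values with one obvious extreme point.
values = pd.Series([10, 11, 12, 13, 14, 15, 100])
mask = iqr_outlier_mask(values)

print(values[mask].tolist())  # values flagged as outliers
```

Applied to the dataset, `df.apply(iqr_outlier_mask).sum()` would count flagged values per column.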

```python
# Correlation between features
plt.figure(figsize=(12,6))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()
```

![png](output_27_0.png)
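A heatmap with 14 × 14 cells can be hard to scan. One way to read it is to take only the column for the target and sort the features by the absolute value of their correlation with it. The sketch below does this on synthetic data; the column names `strong`, `weak`, and `target` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic data: 'target' depends heavily on 'strong' and barely on 'weak'.
rng = np.random.default_rng(0)
n = 200
strong = rng.normal(size=n)
weak = rng.normal(size=n)
toy = pd.DataFrame({
    "strong": strong,
    "weak": weak,
    "target": 3.0 * strong + 0.1 * weak + rng.normal(scale=0.5, size=n),
})

# Correlation of every feature with the target, strongest first.
ranking = (
    toy.corr()["target"]
    .drop("target")
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```

For the notebook's data the equivalent expression would start from `df.corr()["medv"]`.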

### Feature Selection

```python
#Separate features (X) and target (y)
#X = Independent Variables/Input Columns
#y = Dependent Variable/Target Column
X = df.drop(columns = 'medv')
y = df['medv']
```

```python

#List of Input Columns/Independent Variables

X.columns

```

    Index(['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
           'ptratio', 'black', 'lstat'],
          dtype='object')

```python

#top 5 rows of input columns

X.head()

```


```python

#last 5 rows of input columns

X.tail()

```


```python
#top 5 rows of the target column
y.head()
```

    0    24.0
    1    21.6
    2    34.7
    3    33.4
    4    36.2
    Name: medv, dtype: float64

```python
#last 5 rows of the target column
y.tail()
```

    501    22.4
    502    20.6
    503    23.9
    504    22.0
    505    11.9
    Name: medv, dtype: float64

# Train-Test-Split

```python
#Split the data into training (80%) and testing (20%) sets
#random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
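To see what `test_size = 0.2` means for a dataset of this size, the sketch below splits a placeholder frame with the same number of rows (506). `X_demo` and `y_demo` are hypothetical stand-ins for `X` and `y`; only the row count matches the real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data with 506 rows, matching the dataset's length.
X_demo = pd.DataFrame({"feature": np.arange(506)})
y_demo = pd.Series(np.arange(506), name="target")

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The test share is rounded up, so 506 rows split into 404 and 102.
print(X_tr.shape, X_te.shape)

# Features and targets are shuffled together, so their indices stay aligned.
print((X_tr.index == y_tr.index).all())
```

The shuffled row order is why the outputs of `X_train.head()` and `y_train.head()` in the notebook start at index 477 rather than 0.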

```python

#first 5 rows of input training data

X_train.head()

```


```python

#first 5 rows of input testing data

X_test.head()

```


```python

#first 5 rows of output/target training data

y_train.head()

```

    477    12.0
    15     19.9
    332    19.4
    423    13.4
    19     18.2
    Name: medv, dtype: float64

```python

#first 5 rows of output/target testing data

y_test.head()

```

    173    23.6
    274    32.4
    491    13.6
    72     22.8
    452    16.1
    Name: medv, dtype: float64

# Model Training

```python
#training the model by passing the training data (X_train,y_train)
model = LinearRegression()
model.fit(X_train,y_train)

#the model is now fitted: with 13 input features it has learned
#one coefficient per feature plus an intercept
```

    LinearRegression()

```python

#line of Best-Fit Visualization

xx = df['rm']

yy = df['medv']

# Explicitly define x and y

sns.regplot(x=xx, y=yy, marker="*", color="r",line_kws={"color": "green","label":"Regression"},label = "Data Points")

plt.xlabel("Rooms",fontsize = 13)

plt.ylabel("Price",fontsize = 13)

plt.title("Best Fit Line (Regression)",fontsize = 13)

plt.legend()

plt.show()

```

![png](output_43_0.png)
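Note that the green line drawn by `sns.regplot` is a one-feature fit of `medv` on `rm` only, separate from the 13-feature `model`. A minimal sketch of how such a line can be computed with the closed-form least-squares formulas, in plain Python. The helper name `fit_line` and the toy numbers are assumptions for illustration and are not taken from the Boston data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on y = 2x + 1 (hypothetical values)
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
prices = [9.0, 11.0, 13.0, 15.0, 17.0]

slope, intercept = fit_line(rooms, prices)
print("slope:", slope, "intercept:", intercept)
```

With the real columns, `fit_line(list(df['rm']), list(df['medv']))` should give the slope and intercept of the plotted line, assuming `df` is loaded as earlier in the post.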

```python

model.coef_

# coefficients w1 to w13, one per feature in X.columns

```

array([-1.13055924e-01, 3.01104641e-02, 4.03807204e-02, 2.78443820e+00,

-1.72026334e+01, 4.43883520e+00, -6.29636221e-03, -1.44786537e+00,

2.62429736e-01, -1.06467863e-02, -9.15456240e-01, 1.23513347e-02,

-5.08571424e-01])

```python

model.intercept_

# the intercept c

```

np.float64(30.24675099392349)

```python

# hence the fitted regression equation becomes

# y = w1*x1 + w2*x2 + ... + w13*x13 + c

```

# Equation of Best Fit Regression line

```python

# Display coefficients along with respective features

coeff_df = pd.DataFrame({

'Feature': X.columns,

'Coefficient': model.coef_

})

print(coeff_df)

print("\nIntercept:", model.intercept_)

```

Feature Coefficient

0 crim -0.113056

1 zn 0.030110

2 indus 0.040381

3 chas 2.784438

4 nox -17.202633

5 rm 4.438835

6 age -0.006296

7 dis -1.447865

8 rad 0.262430

9 tax -0.010647

10 ptratio -0.915456

11 black 0.012351

12 lstat -0.508571

Intercept: 30.24675099392349

```python

eq = f"y = {model.intercept_:.2f}"

for coef, col in zip(model.coef_, X.columns):

    eq += f" + ({coef:.2f} * {col})"

print(eq)

```

y = 30.25 + (-0.11 * crim) + (0.03 * zn) + (0.04 * indus) + (2.78 * chas) + (-17.20 * nox) + (4.44 * rm) + (-0.01 * age) + (-1.45 * dis) + (0.26 * rad) + (-0.01 * tax) + (-0.92 * ptratio) + (0.01 * black) + (-0.51 * lstat)

# Prediction

### Comparison of Actual and Predicted

```python

y_pred = model.predict(X_test)

y_pred

```

array([28.99672362, 36.02556534, 14.81694405, 25.03197915, 18.76987992,

23.25442929, 17.66253818, 14.34119 , 23.01320703, 20.63245597,

24.90850512, 18.63883645, -6.08842184, 21.75834668, 19.23922576,

26.19319733, 20.64773313, 5.79472718, 40.50033966, 17.61289074,

27.24909479, 30.06625441, 11.34179277, 24.16077616, 17.86058499,

15.83609765, 22.78148106, 14.57704449, 22.43626052, 19.19631835,

22.43383455, 25.21979081, 25.93909562, 17.70162434, 16.76911711,

16.95125411, 31.23340153, 20.13246729, 23.76579011, 24.6322925 ,

13.94204955, 32.25576301, 42.67251161, 17.32745046, 27.27618614,

16.99310991, 14.07009109, 25.90341861, 20.29485982, 29.95339638,

21.28860173, 34.34451856, 16.04739105, 26.22562412, 39.53939798,

22.57950697, 18.84531367, 32.72531661, 25.0673037 , 12.88628956,

22.68221908, 30.48287757, 31.52626806, 15.90148607, 20.22094826,

16.71089812, 20.52384893, 25.96356264, 30.61607978, 11.59783023,

20.51232627, 27.48111878, 11.01962332, 15.68096344, 23.79316251,

6.19929359, 21.6039073 , 41.41377225, 18.76548695, 8.87931901,

20.83076916, 13.25620627, 20.73963699, 9.36482222, 23.22444271,

31.9155003 , 19.10228271, 25.51579303, 29.04256769, 20.14358566,

25.5859787 , 5.70159447, 20.09474756, 14.95069156, 12.50395648,

20.72635294, 24.73957161, -0.164237 , 13.68486682, 16.18359697,

22.27621999, 24.47902364])

```python

y_pred.shape

```

(102,)

```python

# how close our predictions are to the actual prices

results = pd.DataFrame({"Actual":y_test,"Predicted":y_pred})

results.head()

#displaying first 5 rows

```


# Model Evaluation

```python

print("R2 Score :-", r2_score(y_test, y_pred))  # coefficient of determination (share of variance explained, not classification accuracy)

print("Mean Squared Error :-", mean_squared_error(y_test, y_pred))

```

R2 Score :- 0.6687594935356317

Mean Squared Error :- 24.291119474973538

```python

rmse = np.sqrt(mean_squared_error(y_test,y_pred)) #mse ** 0.5 -- Square root of Mean Squared Error

print("Root Mean Squared Error :- ",rmse)

```

Root Mean Squared Error :- 4.928602182665339
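The three metrics above follow directly from their textbook definitions. A minimal sketch in plain Python, assuming a hypothetical helper `regression_metrics` and toy values that are not from the Boston data; the last line only re-checks that the RMSE reported above is the square root of the reported MSE.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (r2, mse, rmse) computed from their definitions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # residual sum of squares: squared distance between actual and predicted
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # total sum of squares: squared distance between actual and its mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    r2 = 1 - ss_res / ss_tot
    return r2, mse, math.sqrt(mse)

# Toy values (hypothetical)
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]

r2, mse, rmse = regression_metrics(actual, predicted)
print("R2:", r2, "MSE:", mse, "RMSE:", rmse)

# Square root of the MSE reported above
reported_rmse = math.sqrt(24.291119474973538)
print("sqrt of reported MSE:", reported_rmse)
```

An RMSE of about 4.93 is in the same units as `medv`, so it is easier to interpret than the MSE: predictions are typically off by roughly that amount.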


```python

# Example custom input

# Values must follow the order of X.columns; wrapping them in a DataFrame

# keeps the feature names the model was fitted with and avoids sklearn's feature-name warning

custom_input = pd.DataFrame([[0.1, 18, 2.3, 0, 0.5, 6, 65, 4, 1, 300, 15, 390, 5]], columns=X.columns)

predicted_price = model.predict(custom_input)

print("Predicted Price:", predicted_price[0])

```

Predicted Price: 28.31199256486372
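The prediction above can be reproduced by hand from the fitted equation. A minimal sketch in plain Python, assuming the coefficients are copied from the `model.coef_` output shown earlier (they are rounded as printed, so the result agrees with `model.predict` only to several decimal places). The names `coefs`, `x`, and `manual_prediction` are illustrative.

```python
# Coefficients copied from the model.coef_ output (order: crim ... lstat)
coefs = [
    -1.13055924e-01, 3.01104641e-02, 4.03807204e-02, 2.78443820e+00,
    -1.72026334e+01, 4.43883520e+00, -6.29636221e-03, -1.44786537e+00,
    2.62429736e-01, -1.06467863e-02, -9.15456240e-01, 1.23513347e-02,
    -5.08571424e-01,
]
intercept = 30.24675099392349

# Same custom input as above, in the same feature order
x = [0.1, 18, 2.3, 0, 0.5, 6, 65, 4, 1, 300, 15, 390, 5]

# y = c + w1*x1 + w2*x2 + ... + w13*x13
manual_prediction = intercept + sum(w * xi for w, xi in zip(coefs, x))
print("Manual prediction:", round(manual_prediction, 4))
```

The result is approximately 28.312, matching the model's predicted price.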


```python

a = y_test

b = y_pred

plt.scatter(a,b)

plt.show()

```

![png](output_61_0.png)

```python

plt.figure(figsize=(10,5))

plt.plot(y_test.values, color='orange', label='Actual')

plt.plot(y_pred, color='green', label='Predicted')

plt.xlabel("Data Points")

plt.ylabel("Price")

plt.title("Actual vs Predicted Values")

plt.legend()

plt.show()

```

![png](output_62_0.png)

```python

plt.figure(figsize=(10,5))

# Actual values

plt.scatter(range(len(y_test)), y_test.values, color='orange', label='Actual')

# Predicted values

plt.scatter(range(len(y_pred)), y_pred, color='green', label='Predicted')

plt.xlabel("Data Points")

plt.ylabel("Price")

plt.title("Actual vs Predicted Values (Scatter Plot)")

plt.legend()

plt.show()

```

![png](output_63_0.png)

```python

import matplotlib.pyplot as plt

a = y_test

b = y_pred

plt.figure(figsize=(7,5))

# Scatter plot

plt.scatter(a, b, color='blue')

# Labels

plt.xlabel("Actual Values (y_test)")

plt.ylabel("Predicted Values (y_pred)")

# Title

plt.title("Actual vs Predicted Values")

# Optional: add diagonal reference line (very useful 🔥)

plt.plot([a.min(), a.max()], [a.min(), a.max()], color='red', linestyle='--')

plt.show()

```

![png](output_64_0.png)
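The vertical distance of each point from the red diagonal is its residual (actual minus predicted). A minimal sketch of summarizing residuals in plain Python; the numbers are hypothetical toy values, not taken from `y_test` or `y_pred`.

```python
# Toy actual and predicted prices (hypothetical)
actual = [20.0, 15.0, 30.0, 25.0]
predicted = [22.0, 14.0, 27.5, 25.5]

# Residual = actual - predicted; positive means the model under-predicted
residuals = [a - p for a, p in zip(actual, predicted)]

# Mean absolute error: average size of the miss, ignoring direction
mae = sum(abs(r) for r in residuals) / len(residuals)

# Residual with the largest magnitude
worst = max(residuals, key=abs)

print("Residuals:", residuals)
print("Mean absolute error:", mae)
print("Largest residual:", worst)
```

With the real data, `residuals = y_test.values - y_pred` would give the same quantity for every test row, assuming `y_test` and `y_pred` are defined as above.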
