Python Machine Learning: Linear Regression (I)
Have you ever felt that your phone is listening to your conversations? For example, you and a friend talk about new shoes, and the next time you pick up your phone, you get a bunch of ads for them. Or, after you watch a movie or series on Netflix, you get recommendations that match your taste. Well, all of this is possible thanks to Machine Learning!
If you have not heard about it, let me explain. Machine Learning is a very popular topic nowadays. It is a method of data analysis in which a model is built from the data itself rather than from hand-written rules. Machine Learning helps humans with very complicated tasks, such as forecasting the price of bitcoin. The model learns from data, identifies patterns in it, and then uses those patterns to make future decisions.
In this and the following tutorials, you will learn the basics of Machine Learning using Python. This tutorial explains linear regression. Before we start coding, what is linear regression? Linear regression is an algorithm that models the predicted value as a linear function of the input variables. In general, regression is used to find the relationship between variables and to make forecasts. In linear regression, that relationship is assumed to be linear.
In this tutorial, the linear regression will be done using matrix multiplication. If we remember our high school math lessons, a linear relationship between the dependent and independent variables has the form $y = c_0 x^0 + c_1 x^1$, that is, $y = c_0 + c_1 x$, where $c_0$ is the intercept with the y-axis and $c_1$ is the slope of the line.
This relationship can be written in matrix form using three matrices. The first holds the values of y. The second holds the powers of x (in this case only $x^0$ and $x^1$) and is known as a Vandermonde matrix. The third holds the coefficients ($c_0$ and $c_1$).
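As a small sketch of this matrix form, the snippet below uses made-up values (the coefficients and x-values are illustrative only, not taken from 'points.txt') and checks that multiplying the two-column matrix by the coefficient vector gives the same result as the formula y = c0 + c1*x applied point by point.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
c = np.array([2.0, 0.5])             # c0 (intercept), c1 (slope)
x = np.array([0.0, 1.0, 2.0, 3.0])

# Two-column matrix: first column is x**0 (all ones), second is x**1
X = np.column_stack((np.ones_like(x), x))

# Matrix form of the linear relationship
y_matrix = X @ c

# The same relationship written as a scalar formula
y_formula = c[0] + c[1] * x

print(y_matrix)
print(np.allclose(y_matrix, y_formula))
```

Each row of the matrix corresponds to one data point, so the product returns one y-value per point.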
With that said, let's start coding! For this tutorial, the x- and y-values are in a text file named 'points.txt', saved in the same directory as your Python file. The first thing we should do is import the x- and y-values from the text file into Python. As you already learned in the previous tutorial, data can be imported into a DataFrame using the pandas library. In this case, however, we will import the data into an array using the numpy library.
# Import libraries
import numpy as np
# Import the text file, skipping its first two lines
data = np.loadtxt('points.txt', skiprows=2, dtype=float)
print(data)
The picture above shows a small part of the whole data. As you can see, it is a 2D array in which each row is one point, with the x-value in the left column and the y-value in the right column. To get an idea of what the data look like, let's first set our x- and y-values and then plot them. For this, we will use the matplotlib.pyplot library.
# Import libraries
import matplotlib.pyplot as plt
# Set x- and y-values (first and second column)
x = data[:, 0]
y = data[:, 1]
# Plot the data as individual points
plt.plot(x, y, 'o')
plt.title('Original data')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
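If 'points.txt' is not at hand, the loading and column-slicing steps can be tried on an in-memory stand-in. This sketch assumes a file with two header lines followed by whitespace-separated x y pairs; the header text and the numbers are invented for illustration, and the real file may be laid out differently.

```python
import io
import numpy as np

# Hypothetical stand-in for 'points.txt': two header lines, then x y pairs
text = """header line 1
header line 2
0.0 1.0
1.0 3.1
2.0 4.9
"""

# np.loadtxt accepts a file-like object as well as a file name
data = np.loadtxt(io.StringIO(text), skiprows=2, dtype=float)

# First column holds x, second column holds y
x = data[:, 0]
y = data[:, 1]

print(data.shape)
print(x)
print(y)
```

Checking the shape of the array right after loading is a quick way to confirm that the header lines were skipped and that every row has exactly two values.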
Now, let's define our Vandermonde matrix. In linear algebra, a Vandermonde matrix is a matrix with terms of a geometric progression in each row:
$$
\begin{pmatrix}
x_1^0 & x_1^1 & x_1^2 & x_1^3 & \cdots & x_1^d \\
x_2^0 & x_2^1 & x_2^2 & x_2^3 & \cdots & x_2^d \\
x_3^0 & x_3^1 & x_3^2 & x_3^3 & \cdots & x_3^d \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
x_n^0 & x_n^1 & x_n^2 & x_n^3 & \cdots & x_n^d
\end{pmatrix}
$$
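For linear regression, only the first two columns are needed ($x^0 = 1$ and $x^1 = x$), so the matrix reduces to:
$$
\begin{pmatrix}
1 & x_1 \\
1 & x_2 \\
1 & x_3 \\
\vdots & \vdots \\
1 & x_n
\end{pmatrix}
$$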
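A minimal sketch of building this two-column matrix with NumPy and using it to estimate the intercept and slope. The data here are synthetic points lying exactly on a known line, and np.linalg.lstsq is just one possible solver, assumed here for illustration; the approach used later in this tutorial series may differ.

```python
import numpy as np

# Synthetic points on the line y = 1 + 2x (illustrative, not from 'points.txt')
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Vandermonde matrix with two columns; increasing=True puts the ones first
V = np.vander(x, N=2, increasing=True)

# Least-squares solution of V @ c = y
c, residuals, rank, singular_values = np.linalg.lstsq(V, y, rcond=None)
c0, c1 = c

print(V)
print(c0, c1)
```

By default np.vander orders the columns from the highest power to the lowest, so increasing=True is needed to match the layout shown above, with the column of ones on the left.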