Python: Downloading data from the web

 Are you tired of going to the browser, downloading the data you want, and then saving it to your desired folder? Well, here is your solution! You can download the data from the web using Python! Let everything be automated!

Let's get started! The data used for this tutorial was downloaded from the following source: https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations.csv.

After you have found your file on the web (it can be any file from any website), the first thing you should do is right-click on the file and copy its link address, as shown in the figure below.




Now, go to your Python file and paste the link address so that Python can read and download the file. Since we are working with a link address, we have to import the request module from the urllib package.


#Importing libraries

from urllib import request


#Reading the file from the link

file_url = r'https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations.csv?raw=true'


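The letter r before the opening quotation mark makes this a raw string, which means that Python treats backslashes as ordinary characters instead of as the start of an escape sequence. Since this URL contains no backslashes, the prefix is optional here. Note that the link address should be inside the quotation marks (' ').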
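
To see what the r prefix changes, here is a small sketch with made-up strings that are not part of the download script. In a regular string, \n is a single newline character. In a raw string, it stays as two characters: a backslash and the letter n.

```python
# Illustrative strings only; they are not used by the download script.
regular = 'a\nb'   # \n is interpreted as one newline character
raw = r'a\nb'      # \n is kept as a backslash followed by the letter n

regular_length = len(regular)  # 3 characters: a, newline, b
raw_length = len(raw)          # 4 characters: a, backslash, n, b

# A string without backslashes is identical with or without the prefix.
same_url = r'https://example.com/data.csv' == 'https://example.com/data.csv'

print(regular_length, raw_length, same_url)
```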

Now, we will download the file, split it into lines, and save it in a text file (which is not yet created). For this purpose, we will define a function and then, at the end, call it in order to get the data.


#Defining a function to download the file

def file_info(url):

    #Opening the url file

    file_open = request.urlopen(url)

    #Reading the file

    file_content = file_open.read()

    #Converting into string

    content = file_content.decode('utf-8')

    #Splitting the lines

    lines = content.splitlines()


Notice that the function's name is file_info and its parameter is called url. You can name them differently if you prefer. However, if you do so, do not forget to change the corresponding names in the upcoming code lines!

Once the function is defined, the first thing we should do is open the file from the web. For this, the function request.urlopen is needed. Then, to read the whole content of the file, we call the read method on the object that urlopen returns.

The content returned by read is a bytes object, not text, which is awkward to work with directly. Thus, we need to convert it to a string. After doing so, we must split it into lines; otherwise, the whole content of the file will be one long string.
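
The difference between bytes and text can be seen on a small hand-made sample. The bytes below are made up for illustration and are not taken from the vaccinations file. The sketch uses the bytes method decode, assuming the data is UTF-8 encoded, and the string method splitlines.

```python
# Made-up sample standing in for what read() returns.
sample_bytes = b'name,value\nfirst,1\nsecond,2\n'

# decode turns bytes into text (this assumes UTF-8 encoding).
sample_text = sample_bytes.decode('utf-8')

# splitlines breaks the text at every line ending.
sample_lines = sample_text.splitlines()
print(sample_lines)  # ['name,value', 'first,1', 'second,2']

# By contrast, str() on bytes returns their printed representation,
# including the leading b and the surrounding quotes.
as_repr = str(sample_bytes)
print(as_repr.startswith("b'"))  # True
```

This is why decoding is usually the safer way to get text out of downloaded bytes: the result contains only the file's own characters.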

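Now that Python is able to read the file from the web, we will save it as a new file in the current working directory (the folder of our Python script, if we run the script from there). For this purpose, we just need 4 lines of code! They belong inside the function, so they keep its indentation.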


    with open('vaccinations.txt', 'w') as output_file:

        for line in lines:

            save_data = output_file.write(line + '\n')

            print(save_data)


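Python can 'open' a file that does not exist yet when it is opened in write mode. The write mode 'w' creates the text file if it is missing (and overwrites it if it already exists) so that it is ready to be written. In the first line, output_file is the name of the variable that refers to the opened file. It is similar to the following, except that the with statement also closes the file automatically when its block ends: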


output_file = open('vaccinations.txt', 'w')


Then, the second line of the code loops over the lines variable, which contains the content of our web file. Each line is written into the text file using the write function, which returns the number of characters it wrote; this is the value that print(save_data) displays. Because splitting removed the line breaks, '\n' is added to the end of each line so that every line of the text file ends with a new line.

Now that the function is complete, we just need to call it.
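
The same writing pattern can be tried on its own with the sketch below. It uses a temporary folder and made-up lines so that it does not touch the real output file, and the file name demo.txt is hypothetical.

```python
import os
import tempfile

demo_lines = ['first line', 'second line']  # made-up content

with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, 'demo.txt')  # hypothetical file name

    # 'w' creates the file if it does not exist and overwrites it if it does.
    with open(demo_path, 'w') as output_file:
        # write returns the number of characters written each time.
        counts = [output_file.write(line + '\n') for line in demo_lines]

    # The with block has ended, so the file is already closed here.
    file_closed = output_file.closed

    # Reading the file back to check what was written.
    with open(demo_path) as input_file:
        written_text = input_file.read()

print(counts)       # [11, 12]
print(file_closed)  # True
```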


#Calling the function

file_info(file_url)


If we run this code, the text file created by Python will be found in the current working directory, which is the folder of your Python script if you run the script from there.




The final code will look like this:
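
Assembled into one script, the steps of this tutorial might look like the following sketch. It decodes the downloaded bytes as UTF-8, which is an assumption about the file's encoding. It also adds two things the tutorial's function does not have: an output_name parameter, whose default matches the file name used here, and a return value with the number of lines written.

```python
# Importing libraries
from urllib import request

# Reading the file from the link
file_url = r'https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations.csv?raw=true'


# Defining a function to download the file
def file_info(url, output_name='vaccinations.txt'):
    # output_name is an added, hypothetical parameter.

    # Opening the url and reading its content as bytes
    with request.urlopen(url) as file_open:
        file_content = file_open.read()

    # Converting into string (assumes the file is UTF-8 encoded)
    content = file_content.decode('utf-8')

    # Splitting the lines
    lines = content.splitlines()

    # Saving the lines in a text file
    with open(output_name, 'w', encoding='utf-8') as output_file:
        for line in lines:
            output_file.write(line + '\n')

    # Added for convenience: how many lines were written
    return len(lines)


# Calling the function (uncomment to download the file):
# file_info(file_url)
```

The call at the end is left commented out so that nothing is downloaded until you decide to run it.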



Congratulations! Now you can surprise your programming teacher by downloading any file from any website automatically! In the next tutorial, you will learn how to manipulate huge amounts of data!

