# Web Scraping with Python: A Step-by-Step Guide Using Beautiful Soup and Requests
In this video tutorial, we will walk through the process of web scraping with Python, an essential skill for beginners and professional developers alike. We will demonstrate how to scrape weather forecast data from a website and store it in a CSV file or a pandas DataFrame.
## Inspired by Dataquest.io: A Fun Project
This tutorial was inspired by a Dataquest.io tutorial on weather forecasting. Instead of covering general weather forecasts, we decided to make it more specific by scraping weather data for a lake near where one of the creators lives. We will be using [forecast.weather.gov](https://forecast.weather.gov/) for our project.
## Getting Started: Setting Up the Environment
Before diving into the code, ensure you have the necessary libraries installed:
1. **Beautiful Soup**: A Python library for parsing HTML and XML documents.
2. **Requests**: A Python HTTP client library to make requests to web pages.
3. **Pandas**: An open-source data analysis tool that will help us store the scraped data in a structured format.
Run the following commands to install these libraries:
```bash
pip install beautifulsoup4
pip install requests
pip install pandas
```
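To confirm the installs worked, a quick sanity check is to import all three libraries and print their versions (the exact version numbers will vary):

```python
# Quick sanity check: if any install failed, the import raises
# ModuleNotFoundError instead of printing a version string
import bs4
import requests
import pandas as pd

print(bs4.__version__)
print(requests.__version__)
print(pd.__version__)
```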
## Inspecting the Web Page
Using Chrome DevTools, we can inspect the structure of the web page. The goal is to locate the part of the HTML that contains the weather forecast data.
1. Right-click on the webpage and select "Inspect" (or use the shortcut `Ctrl + Shift + C` for Windows or `Command + Shift + C` for Mac).
2. Hover over different elements to see their HTML structure.
3. Look for the container holding the 7-day weather forecast, which has an ID of `seven-day-forecast-body`.
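To make that structure concrete, here is a small sketch that parses a hand-written HTML fragment mimicking the markup described above. The fragment and the class names in it (`tombstone-container`, `period-name`, `short-desc`, `temp`) are illustrative stand-ins for the live page; confirm the real names in DevTools, since sites change their markup over time:

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the forecast markup (not the live page)
sample_html = """
<div id="seven-day-forecast-body">
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <p class="short-desc">Partly Cloudy</p>
    <p class="temp temp-low">Low: 58 F</p>
  </div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# The same two lookups we will use on the real page
week = soup.find(id="seven-day-forecast-body")
items = week.find_all(class_="tombstone-container")

print(items[0].find(class_="period-name").text)  # Tonight
```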
## Writing the Code
### Importing Libraries
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
```
### Making HTTP Requests and Parsing HTML
```python
# Request the forecast page. forecast.weather.gov serves per-location
# pages via MapClick.php; replace the lat/lon with coordinates for your
# own area (the values below are illustrative, roughly Los Angeles)
page = requests.get("https://forecast.weather.gov/MapClick.php?lat=34.05&lon=-118.24")
page.raise_for_status()  # stop early if the request failed

# Create a Beautiful Soup object from the page's HTML
soup = BeautifulSoup(page.content, 'html.parser')

# Print the parsed HTML (prettify() indents it for readability)
print(soup.prettify())
```
### Extracting Specific Data
```python
# Find the container with the 7-day forecast
week = soup.find(id="seven-day-forecast-body")
# Get all items within the container
items = week.find_all(class_="tombstone-container")
```
### Accessing Individual Elements
```python
# Extract period names, descriptions, and temperatures
period_names = [item.find(class_="period-name").text for item in items]
short_descriptions = [item.find(class_="short-desc").text for item in items]
temperatures = [item.find(class_="temp").text for item in items]
print(period_names)
print(short_descriptions)
print(temperatures)
```
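The list comprehensions above assume every tombstone contains all three elements; if one is missing, `item.find(...)` returns `None` and `.text` raises an `AttributeError`. A small helper (hypothetical, not part of the original tutorial) makes the extraction more defensive:

```python
from bs4 import BeautifulSoup

def safe_text(item, css_class):
    """Return the stripped text of the first element with css_class, or None."""
    node = item.find(class_=css_class)
    return node.get_text(strip=True) if node else None

# Illustrative tombstone that is missing its short description
item = BeautifulSoup("""
<div class="tombstone-container">
  <p class="period-name">Tonight</p>
  <p class="temp temp-low">Low: 58 F</p>
</div>
""", "html.parser")

print(safe_text(item, "period-name"))  # Tonight
print(safe_text(item, "temp"))         # Low: 58 F
print(safe_text(item, "short-desc"))   # None
```

`get_text(strip=True)` also trims the stray whitespace that `.text` can leave behind.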
## Using Pandas to Structure Data
```python
# Create a DataFrame using pandas
weather_data = {
    "Period": period_names,
    "Description": short_descriptions,
    "Temperature": temperatures
}
df = pd.DataFrame(weather_data)
# Print the DataFrame
print(df)
```
### Exporting Data to CSV
```python
# Save the DataFrame to a CSV file
df.to_csv("weather.csv", index=False)
```
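As a quick round-trip check, you can read the file back with `pd.read_csv` and compare it to what you wrote (the tiny frame below is illustrative):

```python
import pandas as pd

# Write a small illustrative frame, then read it back to verify the round trip
df = pd.DataFrame({"Period": ["Tonight"], "Temperature": ["Low: 58 F"]})
df.to_csv("weather.csv", index=False)

df2 = pd.read_csv("weather.csv")
print(df2.equals(df))  # True
```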
## Example: Scraping Weather Data for Chicago
If you want to scrape weather data for a different location, browse to that location's forecast on [weather.gov](https://www.weather.gov/) (e.g., search for "Chicago"), then copy the resulting `forecast.weather.gov` page URL into your `requests.get` call.
## Conclusion
This tutorial demonstrates how powerful web scraping can be. By using Python libraries like Beautiful Soup and Requests, we can extract valuable data from websites and organize it into structured formats like CSV or pandas DataFrames.
Stay tuned for more tutorials in our series on automation with Python. If you're interested in learning more about automation and data science, check out our upcoming course: *How to Automate Stuff with Python*.