🐍 Python for Data Science

Master data analysis, visualization, and machine learning with a focus on the Canadian market

8 Comprehensive Modules
40+ Code Examples
6 Real Projects

Course Overview

Duration: 12-16 weeks (self-paced)

Level: Beginner to Intermediate

Prerequisites: Basic Python knowledge

What You'll Learn

  • Data manipulation and cleaning with pandas
  • Numerical computing with NumPy
  • Data visualization with matplotlib and seaborn
  • Statistical analysis and hypothesis testing
  • Machine learning with scikit-learn
  • Introduction to deep learning with TensorFlow
  • Working with APIs and web scraping
  • Real-world Toronto data projects

Learning Modules

📊 Module 1: Python Data Fundamentals

Beginner

Topics Covered:

  • Python environment setup (Anaconda, Jupyter)
  • NumPy arrays and operations
  • Pandas DataFrames and Series
  • Data loading from CSV, JSON, APIs

Code Example - Loading Toronto Open Data:

import pandas as pd

# Load Toronto neighbourhood data, downloaded as CSV from the portal page:
# https://open.toronto.ca/dataset/neighbourhoods/
df = pd.read_csv("neighbourhoods.csv")

# Basic data exploration
print(f"Dataset shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
print(df.head())

# Data info and statistics
df.info()             # info() prints its summary directly
print(df.describe())  # describe() returns a DataFrame, so print it in a script
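The topic list also covers JSON sources. As a minimal sketch, assuming an API that returns nested records under `result.records` (a made-up payload shape with made-up values and placeholder neighbourhood names), `pd.json_normalize` flattens nested fields into dotted column names, and `pd.read_csv` accepts any file-like object, including in-memory text:

```python
import io
import json

import pandas as pd

# Illustrative payload shaped like a typical API response (made-up values)
payload = json.loads("""
{
  "result": {
    "records": [
      {"id": 1, "name": "Neighbourhood A", "stats": {"population": 100, "area_km2": 2.5}},
      {"id": 2, "name": "Neighbourhood B", "stats": {"population": 90, "area_km2": 2.0}}
    ]
  }
}
""")

# Flatten nested dictionaries into columns such as "stats.population"
records_df = pd.json_normalize(payload["result"]["records"])

# Derived column computed on the flattened JSON
records_df["density"] = records_df["stats.population"] / records_df["stats.area_km2"]
print(records_df)

# The same idea for CSV text: read_csv accepts any file-like object
csv_text = "id,name\n1,Neighbourhood A\n2,Neighbourhood B\n"
csv_df = pd.read_csv(io.StringIO(csv_text))
print(csv_df)
```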

Hands-on Project:

Toronto Housing Price Analysis - Analyze real estate data from the Toronto Regional Real Estate Board (TRREB) to understand market trends across different neighbourhoods.

🧹 Module 2: Data Cleaning & Preprocessing

Beginner

Topics Covered:

  • Handling missing data
  • Data type conversion and validation
  • Outlier detection and treatment
  • Data transformation and normalization

Code Example - Cleaning TTC Data:

# Clean TTC ridership data
import pandas as pd

# Load TTC data
ttc_data = pd.read_csv("ttc_ridership.csv")

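# Handle missing values (assign back; fillna(inplace=True) on a single
# column is chained assignment and is unreliable in recent pandas)
ttc_data['ridership'] = ttc_data['ridership'].fillna(ttc_data['ridership'].median())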

# Convert date column
ttc_data['date'] = pd.to_datetime(ttc_data['date'])

# Remove outliers using IQR method
Q1 = ttc_data['ridership'].quantile(0.25)
Q3 = ttc_data['ridership'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

clean_data = ttc_data[
    (ttc_data['ridership'] >= lower_bound) & 
    (ttc_data['ridership'] <= upper_bound)
]
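Type conversion, validation, and normalization from the topic list can be sketched on a small made-up sample (the `route` and `ridership` values below are illustrative, not TTC figures). This assumes ridership arrives as text, as it often does in raw exports:

```python
import pandas as pd

# Small made-up sample: one non-numeric entry and one missing value
raw = pd.DataFrame({
    "route": ["501", "504", "505", "506"],
    "ridership": ["1200", "n/a", "900", None],
})

# Type conversion: invalid strings become NaN instead of raising an error
raw["ridership"] = pd.to_numeric(raw["ridership"], errors="coerce")

# Validation: report how many values were missing or failed conversion
n_missing = raw["ridership"].isna().sum()
print(f"Missing or invalid ridership values: {n_missing}")

# Impute with the median, assigning back instead of using inplace=True
raw["ridership"] = raw["ridership"].fillna(raw["ridership"].median())

# Min-max normalization rescales values to the [0, 1] range
col = raw["ridership"]
raw["ridership_minmax"] = (col - col.min()) / (col.max() - col.min())

# Z-score standardization: mean 0, standard deviation 1
raw["ridership_z"] = (col - col.mean()) / col.std()
print(raw)
```

Min-max scaling preserves the shape of the distribution but is sensitive to extreme values, which is one reason to handle outliers (as with the IQR filter above) before normalizing.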

Hands-on Project:

Toronto Crime Data Preprocessing - Clean and prepare Toronto Police Service crime data for analysis, handling inconsistencies and missing information.

📈 Module 3: Data Visualization

Intermediate

Topics Covered:

  • Matplotlib fundamentals
  • Seaborn statistical plots
  • Interactive visualizations with Plotly
  • Dashboard creation

Code Example - Toronto Weather Visualization:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Load Toronto weather data, parsing dates so the x-axis is chronological
weather_df = pd.read_csv("toronto_weather_2023.csv", parse_dates=['date'])

# Create temperature trend plot
plt.figure(figsize=(12, 6))
plt.plot(weather_df['date'], weather_df['temperature'])
plt.title('Toronto Temperature Trends 2023')
plt.xlabel('Date')
plt.ylabel('Temperature (°C)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Interactive plot with Plotly
fig = px.line(weather_df, x='date', y='temperature',
              title='Interactive Toronto Temperature Chart')
fig.show()
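For seasonal trends like those in the air quality project, it usually helps to smooth and aggregate before plotting. A minimal sketch on synthetic daily temperatures (generated from a cosine curve, not real Toronto observations), using a 7-day rolling mean and a hypothetical `season_of` helper based on meteorological seasons:

```python
import numpy as np
import pandas as pd

# Synthetic daily temperatures for one year (illustrative only, not real data)
dates = pd.date_range("2023-01-01", "2023-12-31", freq="D")
day = np.arange(len(dates))
temps = -5 + 15 * (1 - np.cos(2 * np.pi * day / 365))  # cold in January, warm in July
weather = pd.DataFrame({"date": dates, "temperature": temps})

# 7-day rolling mean smooths day-to-day noise before plotting
weather["temp_7d"] = weather["temperature"].rolling(window=7, min_periods=1).mean()

def season_of(month: int) -> str:
    """Hypothetical helper: map a month number to a meteorological season."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"

# Average temperature per season
weather["season"] = weather["date"].dt.month.map(season_of)
seasonal = weather.groupby("season")["temperature"].mean()
print(seasonal.round(1))
```

The `temp_7d` column can be passed to `plt.plot` or `px.line` in place of the raw series, and `seasonal` works directly as input to a bar chart.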

Hands-on Project:

Toronto Air Quality Dashboard - Create interactive visualizations of Toronto's air quality data, showing pollution trends across different seasons and locations.

📊 Module 4: Statistical Analysis

Intermediate

Topics Covered:

  • Descriptive statistics
  • Correlation and regression analysis
  • Hypothesis testing
  • A/B testing for Canadian businesses

Code Example - Toronto Tech Salary Analysis:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_ind

# Hypothetical file name: one row per respondent, with education, salary,
# experience_years and company_size (numeric) columns
toronto_tech_df = pd.read_csv("toronto_tech_salaries.csv")

# Compare tech salaries by education level
bachelor_salaries = toronto_tech_df.loc[
    toronto_tech_df['education'] == 'Bachelor', 'salary'
]

master_salaries = toronto_tech_df.loc[
    toronto_tech_df['education'] == 'Master', 'salary'
]

# Welch's t-test: does not assume the two groups have equal variances
t_stat, p_value = ttest_ind(bachelor_salaries, master_salaries, equal_var=False)

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("Significant difference in salaries at the 5% level")
else:
    print("No significant difference found at the 5% level")

# Correlation analysis (all three columns must be numeric)
correlation_matrix = toronto_tech_df[
    ['experience_years', 'salary', 'company_size']
].corr()

sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Toronto Tech Career Correlation Matrix')
plt.show()
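A/B testing, listed in this module's topics, often compares conversion rates rather than means. One standard choice is the two-proportion z-test, sketched here with only the standard library; the conversion counts are hypothetical:

```python
import math

def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 120/2400 conversions for A vs 156/2400 for B
z, p = two_proportion_ztest(120, 2400, 156, 2400)
print(f"z = {z:.3f}, p = {p:.4f}")
```

The test relies on a normal approximation, which becomes unreliable for small samples or for rates very close to 0 or 1.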

Hands-on Project:

Toronto Transit Efficiency Study - Analyze TTC performance data to identify patterns and make data-driven recommendations for service improvements.

πŸ€– Module 5: Machine Learning Fundamentals

Intermediate

Topics Covered:

  • Supervised vs unsupervised learning
  • Linear and logistic regression
  • Decision trees and random forests
  • Model evaluation and cross-validation

Code Example - Toronto Housing Price Prediction:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score

# Prepare Toronto housing data (hypothetical file name)
housing_df = pd.read_csv("toronto_housing.csv")
features = ['bedrooms', 'bathrooms', 'sq_ft', 'neighbourhood_score']
X = housing_df[features]
y = housing_df['price']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Random Forest model
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Make predictions
predictions = rf_model.predict(X_test)

# Evaluate model
mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print(f"Mean Absolute Error: ${mae:,.2f}")
print(f"R² Score: {r2:.4f}")

# Feature importance
feature_importance = pd.DataFrame({
    'feature': features,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

print("\nFeature Importance:")
print(feature_importance)
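A single train/test split can give a noisy estimate. Cross-validation, listed in this module's topics, averages over several splits. A minimal sketch with scikit-learn's `KFold` and `cross_val_score`, using synthetic data from `make_regression` in place of the housing dataset:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data standing in for housing features (not real prices)
X, y = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)

# 5-fold CV: each sample lands in the test fold exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=42)
r2_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

# Error metrics are reported negated so that "higher is better" for every scorer
mae_scores = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")

print(f"R² per fold: {np.round(r2_scores, 3)}")
print(f"Mean R²: {r2_scores.mean():.3f} (+/- {r2_scores.std():.3f})")
print(f"Mean MAE: {mae_scores.mean():.2f}")
```

The spread across folds is as informative as the mean: a large standard deviation suggests the single-split score above could be misleading.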

Hands-on Project:

Toronto Tech Job Matching System - Build a machine learning model that matches developers with suitable job opportunities based on skills, experience, and preferences.

🧠 Module 6: Deep Learning Introduction

Advanced

Topics Covered:

  • Neural network fundamentals
  • TensorFlow and Keras basics
  • Image classification and NLP
  • Transfer learning applications

Code Example - Toronto Traffic Pattern Recognition:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Create a simple neural network for traffic prediction (8 input features)
model = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(1, activation='linear')
])

# Compile model
model.compile(
    optimizer='adam',
    loss='mse',
    metrics=['mae']
)

# Stop training once validation loss stops improving
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=10, restore_best_weights=True
)

# Train on Toronto traffic data. X_*_traffic / y_*_traffic are assumed to be
# prepared earlier: 8 numeric (scaled, encoded) features per row, such as
# hour, day_of_week, weather and events, and a target measured in minutes
history = model.fit(
    X_train_traffic, y_train_traffic,
    validation_data=(X_val_traffic, y_val_traffic),
    epochs=100,
    batch_size=32,
    callbacks=[early_stop],
    verbose=1
)

# Evaluate model performance
test_loss, test_mae = model.evaluate(X_test_traffic, y_test_traffic)
print(f"Test MAE: {test_mae:.2f} minutes")
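To connect the Keras model to the neural network fundamentals listed above, here is the same 8 → 64 → 32 → 1 architecture as a plain NumPy forward pass with random, untrained weights. It is a teaching sketch, not how Keras is implemented internally. Dropout is omitted because Keras disables it at inference time:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes mirroring the Keras model: 8 inputs -> 64 -> 32 -> 1
sizes = [8, 64, 32, 1]

# Randomly initialized weights and zero biases (untrained; illustration only)
weights = [rng.normal(0, 0.1, size=(n_in, n_out)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def relu(x):
    return np.maximum(0.0, x)

def forward(x):
    """ReLU on the hidden layers, linear output on the last layer."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:
            h = relu(h)
    return h

batch = rng.normal(size=(5, 8))  # 5 samples with 8 features each
out = forward(batch)

# Each dense layer has n_in * n_out weights plus n_out biases; dropout adds none
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(out.shape, n_params)  # prints: (5, 1) 2689
```

The count of 2,689 parameters (576 + 2,080 + 33) is what `model.summary()` should report as trainable for the Keras model above, since Dropout layers have no weights.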

Hands-on Project:

Toronto Social Media Sentiment Analysis - Build a deep learning model to analyze sentiment about Toronto tech companies from social media data.

🌐 Module 7: Web APIs & Data Collection

Intermediate

Topics Covered:

  • REST API consumption
  • Web scraping with BeautifulSoup
  • Working with JSON and XML data
  • Rate limiting and ethical scraping

Code Example - Toronto Open Data API:

import requests
import pandas as pd
from bs4 import BeautifulSoup

# Access the Toronto Open Data CKAN API (production endpoint)
BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/"

def get_toronto_data(dataset_id):
    """Return the CKAN package metadata for a dataset, or None on failure."""
    response = requests.get(
        BASE_URL + "package_show",
        params={"id": dataset_id},
        timeout=30,
    )

    if response.status_code == 200:
        return response.json()
    return None

# Get parking ticket data
parking_data = get_toronto_data("parking-tickets")

# Process the data: load the first resource published as CSV
if parking_data:
    resources = parking_data['result']['resources']
    for resource in resources:
        if resource.get('format', '').upper() == 'CSV':
            csv_url = resource['url']
            df = pd.read_csv(csv_url)
            print(f"Loaded {len(df)} parking ticket records")
            break

# Web scraping example - Toronto tech job postings
# (the URL and CSS class names are placeholders for a real site's markup)
def scrape_tech_jobs():
    headers = {
        'User-Agent': 'Mozilla/5.0 (compatible; DataScience-Student)'
    }

    # Note: Always check robots.txt and respect website policies
    url = "https://example-job-site.com/toronto-tech-jobs"
    response = requests.get(url, headers=headers, timeout=30)

    if response.status_code != 200:
        return None

    soup = BeautifulSoup(response.content, 'html.parser')

    def text_or_none(parent, tag, css_class):
        # Postings often omit fields (e.g. salary); avoid AttributeError on None
        element = parent.find(tag, class_=css_class)
        return element.get_text(strip=True) if element else None

    job_data = []
    for job in soup.find_all('div', class_='job-listing'):
        job_data.append({
            'title': text_or_none(job, 'h3', 'job-title'),
            'company': text_or_none(job, 'span', 'company'),
            'salary': text_or_none(job, 'span', 'salary'),
        })

    return pd.DataFrame(job_data)
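The rate limiting and ethical scraping topics can be made concrete with the standard library's `urllib.robotparser`. A minimal sketch, assuming a made-up robots.txt parsed from memory (a real crawler would call `set_url` and `read` on the site's actual robots.txt) and a simple hypothetical `RateLimiter` class that honours the `Crawl-delay` directive:

```python
import time
from urllib import robotparser

# Made-up robots.txt content; in practice, fetch it from the site's /robots.txt
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT)

AGENT = "DataScience-Student"

class RateLimiter:
    """Hypothetical helper: at least `min_interval` seconds between wait() calls."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

# Use the site's Crawl-delay if it sets one, otherwise default to 1 second
delay = rp.crawl_delay(AGENT) or 1.0
limiter = RateLimiter(delay)

for url in ["https://example-job-site.com/toronto-tech-jobs",
            "https://example-job-site.com/private/admin"]:
    if rp.can_fetch(AGENT, url):
        limiter.wait()
        print(f"OK to fetch: {url}")  # the requests.get(...) call would go here
    else:
        print(f"Skipping (disallowed by robots.txt): {url}")
```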

Hands-on Project:

Canadian Tech Market Intelligence Dashboard - Create an automated system that collects and analyzes job postings, salary trends, and skill demands in Canada's tech market.

🚀 Module 8: Advanced Projects & Deployment

Advanced

Topics Covered:

  • End-to-end ML pipeline development
  • Model deployment with Flask/FastAPI
  • Cloud deployment (AWS, GCP, Azure)
  • Portfolio project development

Code Example - Deploy ML Model with Flask:

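from flask import Flask, request, jsonify
import joblib
import pandas as pd

app = Flask(__name__)

# Load trained model and scaler
model = joblib.load('toronto_housing_model.pkl')
scaler = joblib.load('feature_scaler.pkl')

# Must match the columns (and their order) used during training
FEATURES = ['bedrooms', 'bathrooms', 'sq_ft', 'neighbourhood_score']

@app.route('/predict', methods=['POST'])
def predict_price():
    # Get data from request (None if the body is not valid JSON)
    data = request.get_json(silent=True)
    if not data:
        return jsonify({'error': 'Request body must be JSON', 'status': 'error'}), 400

    missing = [f for f in FEATURES if f not in data]
    if missing:
        return jsonify({'error': f'Missing fields: {missing}', 'status': 'error'}), 400

    try:
        # One-row DataFrame with columns in training order
        features = pd.DataFrame([data])[FEATURES]

        # Scale features, then predict
        features_scaled = scaler.transform(features)
        prediction = float(model.predict(features_scaled)[0])

        return jsonify({
            'predicted_price': round(prediction, 2),
            'currency': 'CAD',
            'status': 'success'
        })

    except Exception as e:
        return jsonify({
            'error': str(e),
            'status': 'error'
        }), 500

if __name__ == '__main__':
    # debug=True is for local development only; never enable it in production
    app.run(debug=True)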
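The Flask example loads the scaler and the model as two separate files, which makes it easy to apply them out of sync. One common alternative, sketched here under the assumption that you control the training code, is to bundle both steps in a scikit-learn `Pipeline` and save a single artifact. The file name `toronto_housing_pipeline.pkl` is hypothetical and the training data is synthetic:

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["bedrooms", "bathrooms", "sq_ft", "neighbourhood_score"]

# Synthetic training data (illustrative only, not real Toronto prices)
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "bedrooms": rng.integers(1, 6, 200),
    "bathrooms": rng.integers(1, 4, 200),
    "sq_ft": rng.integers(500, 3500, 200),
    "neighbourhood_score": rng.uniform(0, 10, 200),
})
y = 200 * X["sq_ft"] + 50_000 * X["neighbourhood_score"] + rng.normal(0, 20_000, 200)

# One object that scales and predicts, so the two steps cannot drift apart
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestRegressor(n_estimators=50, random_state=42)),
])
pipeline.fit(X[FEATURES], y)

# Save a single artifact; the API then needs only one joblib.load call
path = os.path.join(tempfile.mkdtemp(), "toronto_housing_pipeline.pkl")
joblib.dump(pipeline, path)
loaded = joblib.load(path)

sample = pd.DataFrame([{"bedrooms": 3, "bathrooms": 2, "sq_ft": 1500, "neighbourhood_score": 7.5}])
print(round(float(loaded.predict(sample[FEATURES])[0]), 2))
```

Tree ensembles such as random forests are insensitive to feature scaling, so the `StandardScaler` step mainly matters if you later swap in a scale-sensitive model, such as a regularized linear model or a neural network.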

Capstone Project:

Toronto Smart City Analytics Platform - Develop a comprehensive data science platform that integrates multiple Toronto datasets (traffic, weather, events, demographics) to provide insights for city planning and business decisions.

Career Integration in Canada's Tech Market

🎯 In-Demand Data Science Roles in Toronto

  • Data Scientist ($80-140k CAD): RBC, Shopify, TD Bank
  • Machine Learning Engineer ($90-160k CAD): Wealthsimple, FreshBooks
  • Data Analyst ($60-95k CAD): Government of Canada, Ontario Health
  • AI Research Scientist ($120-200k CAD): Vector Institute
  • Business Intelligence Developer ($70-110k CAD): Canadian Tire, Loblaws

🔧 Essential Toronto Data Science Stack

  • Languages: Python, R, SQL
  • Cloud Platforms: AWS (most common), Azure (government), GCP
  • Big Data: Spark, Hadoop, Databricks
  • MLOps: MLflow, Kubeflow, Docker
  • Visualization: Tableau, Power BI (government), D3.js

Building Your Toronto Data Science Portfolio

GitHub Essentials: Showcase 3-5 projects demonstrating different aspects of data science, with special emphasis on Canadian datasets and business contexts.

Recommended Portfolio Projects:

  1. Toronto Housing Market Predictor - End-to-end ML pipeline
  2. TTC Performance Dashboard - Real-time data visualization
  3. Canadian Tech Salary Analyzer - Statistical analysis and insights
  4. Toronto Tourism Sentiment Analysis - NLP and deep learning
  5. Climate Change Impact on Ontario - Time series forecasting

Networking Opportunities: Toronto Machine Learning Society (TMLS), PyData Toronto, Data Science Toronto Meetup, and Toronto AI conferences.


Additional Resources & Next Steps

📚 Learning Resources

  • Python for Data Analysis (McKinney)
  • Hands-On Machine Learning (Géron)
  • The Elements of Statistical Learning (Hastie, Tibshirani, Friedman)
  • Coursera: Machine Learning by Andrew Ng
  • edX: MIT Introduction to Computer Science and Programming Using Python

🛠️ Practice Platforms

  • Kaggle Competitions
  • Google Colab for free GPU access
  • GitHub for version control
  • Toronto Open Data portal
  • Statistics Canada datasets

🎓 Certification Paths

  • AWS Certified Machine Learning
  • Google Cloud Professional ML Engineer
  • Microsoft Azure Data Scientist Associate
  • Coursera Data Science Specializations
  • University of Toronto Data Science Certificate

Ready to Start Your Data Science Journey?

Join thousands of Toronto developers who are building successful data science careers with our comprehensive tutorials and community support.
