Installation Guide
📦 Required Libraries
This project relies on the following Python libraries for data analysis, machine learning, and deep learning.
Libraries Overview
- pandas: for handling and manipulating tabular data using DataFrames
- NumPy: provides fast numerical operations and multi-dimensional arrays
- Matplotlib: used to create static, animated, and interactive plots
- scikit-learn: a machine learning library with tools for modeling and evaluation
- statsmodels: enables statistical analysis and time series exploration
- TensorFlow: a library for building and training deep learning models
- Keras: the high-level API within TensorFlow for fast neural network development (installed together with TensorFlow)
- pydot: Python interface for creating Graphviz DOT graphs (optional)
- Graphviz: graph visualization tools used to render DOT graphs (optional)
🔧 Installation Methods
Method 1: Using Conda (Recommended)
If you have Anaconda or Miniconda installed, this is the most straightforward method:
conda install -c conda-forge pandas numpy matplotlib scikit-learn statsmodels tensorflow pydot graphviz -y
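Why Conda?
Conda installs compiled, non-Python dependencies as well as Python packages (for example, the Graphviz programs, which pip cannot provide). This makes libraries such as TensorFlow that depend on compiled components easier to set up.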
Method 2: Using pip
For those using standard Python installations:
# Upgrade pip first
pip install --upgrade pip
# Install core packages
pip install pandas numpy matplotlib scikit-learn statsmodels
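# Install TensorFlow
pip install tensorflow
# Optional, Linux with TensorFlow 2.14+: also install the NVIDIA CUDA libraries for GPU support
# (the old tensorflow-gpu package is deprecated and can no longer be installed)
pip install "tensorflow[and-cuda]"
# Optional: for neural network visualization
# (this installs only the Python wrappers; the Graphviz programs must be installed separately, see Troubleshooting)
pip install pydot graphviz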
Method 3: Using Requirements File
Create a requirements.txt file:
pandas>=1.3.0
numpy>=1.21.0
matplotlib>=3.4.0
scikit-learn>=1.0.0
statsmodels>=0.13.0
tensorflow>=2.10.0
pydot>=1.4.0
graphviz>=0.20.0
Then install all at once:
pip install -r requirements.txt
🐍 Setting Up Virtual Environment
Best Practice
Always use a virtual environment to avoid package conflicts!
Windows
# Create virtual environment
python -m venv hydro_env
# Activate it (Command Prompt; in PowerShell run hydro_env\Scripts\Activate.ps1)
hydro_env\Scripts\activate
# Install packages
pip install -r requirements.txt
# To deactivate when done
deactivate
macOS/Linux
# Create virtual environment
python3 -m venv hydro_env
# Activate it
source hydro_env/bin/activate
# Install packages
pip install -r requirements.txt
# To deactivate when done
deactivate
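To confirm that the interpreter you are running really belongs to the activated environment, one minimal sketch is to compare `sys.prefix` with `sys.base_prefix`: the standard `venv` module makes them differ inside an environment such as `hydro_env`. The helper name `in_virtualenv` is our own, and the check assumes a `venv` environment; conda environments are separate Python installs, so it reports `False` there.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment.

    venv points sys.prefix at the environment directory, while
    sys.base_prefix still points at the base Python installation.
    Conda environments are full installs, so this returns False there.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Inside a virtual environment: {in_virtualenv()}")
```

Run it with the same `python` command you use for the project; if it prints `False` after activation, the activation step did not take effect in that shell.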
✅ Verify Installation
After installation, verify everything is working:
import sys
print(f"Python version: {sys.version}")
# Test imports
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import sklearn
import statsmodels
import tensorflow as tf
from tensorflow import keras
# Print versions
print(f"Pandas: {pd.__version__}")
print(f"NumPy: {np.__version__}")
print(f"Matplotlib: {matplotlib.__version__}")
print(f"Scikit-learn: {sklearn.__version__}")
print(f"statsmodels: {statsmodels.__version__}")
print(f"TensorFlow: {tf.__version__}")
print(f"Keras: {keras.__version__}")
# Test TensorFlow: lists the GPUs TensorFlow can see ([] means it will run on the CPU)
print(f"TensorFlow GPU Available: {tf.config.list_physical_devices('GPU')}")
Expected output (exact versions will differ; each should be at or above the minimum in requirements.txt):
Python version: 3.9.x (or higher)
Pandas: 1.3.x
NumPy: 1.21.x
Matplotlib: 3.4.x
Scikit-learn: 1.0.x
statsmodels: 0.13.x
TensorFlow: 2.10.x
Keras: 2.10.x (3.x with TensorFlow 2.16 or newer)
TensorFlow GPU Available: [] (or a list of GPUs if available)
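To compare what is installed against the minimum versions in requirements.txt without importing each library, here is a minimal sketch using the standard-library `importlib.metadata`. The helper names are our own, and the version comparison is deliberately simple: it looks only at the numeric release part (so `2.0.0rc1` counts as `2.0.0`). The third-party `packaging` library implements full PEP 440 comparison if you need it.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements.txt shown earlier.
REQUIREMENTS = """\
pandas>=1.3.0
numpy>=1.21.0
matplotlib>=3.4.0
scikit-learn>=1.0.0
statsmodels>=0.13.0
tensorflow>=2.10.0
pydot>=1.4.0
graphviz>=0.20.0
"""


def release_tuple(v: str) -> tuple:
    """Turn '1.26.4' into (1, 26, 4); suffixes such as 'rc1' are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def at_least(installed: str, minimum: str) -> bool:
    """Compare release numbers after padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_requirements(text: str) -> dict:
    """Map each package to 'ok', 'too old (x.y.z)', or 'missing'."""
    report = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ">=" not in line:
            continue  # this sketch only understands 'name>=version' lines
        name, minimum = (s.strip() for s in line.split(">=", 1))
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if at_least(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements(REQUIREMENTS).items():
        print(f"{package:<14} {status}")
```

To check your actual file instead of the embedded copy, pass `open("requirements.txt").read()` to `check_requirements`.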
🚨 Troubleshooting
Common Issues and Solutions
ModuleNotFoundError: No module named 'tensorflow'
Solution: Make sure your virtual environment is activated, then install TensorFlow into it:
pip install tensorflow
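A frequent cause is that pip installed TensorFlow into a different interpreter than the one running your code. This minimal sketch (the helper name `locate_module` is our own) uses `importlib.util.find_spec` to report whether the current interpreter, shown by `sys.executable`, can see a module, without actually importing it.

```python
import importlib.util
import sys


def locate_module(name: str) -> str:
    """Report where `name` would be imported from by *this* interpreter."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name!r} is not installed for {sys.executable}"
    return f"{name!r} found at {spec.origin}"


if __name__ == "__main__":
    print(locate_module("tensorflow"))
```

If the reported interpreter path is not inside `hydro_env`, activate the environment (or run `python -m pip install tensorflow` with that exact interpreter) and try again.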
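Graphviz not found
Solution: pydot needs the Graphviz programs, which pip does not install. Install them at the system level:
Windows: Download the installer from the Graphviz website and make sure its bin folder is on your PATH
macOS (Homebrew): brew install graphviz
Linux (Debian/Ubuntu): sudo apt-get install graphviz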
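pydot renders graphs by calling Graphviz's `dot` command-line program, so the quickest check is whether `dot` is on your `PATH`. A minimal sketch using the standard-library `shutil.which`; the helper name `find_dot_executable` is our own.

```python
import shutil
from typing import Optional


def find_dot_executable() -> Optional[str]:
    """Return the full path of Graphviz's `dot` program, or None if not on PATH."""
    return shutil.which("dot")


if __name__ == "__main__":
    path = find_dot_executable()
    if path is None:
        print("Graphviz 'dot' not found on PATH; install Graphviz and open a new terminal.")
    else:
        print(f"Graphviz 'dot' found at {path}")
```

On Windows, a new terminal is usually needed after installation so that the updated `PATH` is picked up.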
Memory errors with large datasets
Solution: Consider using:
- Smaller batch sizes when training neural networks
- Reading data in chunks with pandas (the chunksize argument of pd.read_csv)
- Google Colab for free GPU access
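With pandas, `read_csv(..., chunksize=N)` returns an iterator of DataFrames instead of loading the whole file at once. The sketch below computes a column mean in one pass while holding only `N` rows in memory; the `discharge` column, the sample values, and the helper name `chunked_mean` are hypothetical placeholders, not project data.

```python
import io

import pandas as pd

# Hypothetical discharge record; in practice pass a file path instead of StringIO.
CSV_TEXT = """date,discharge
2020-01-01,12.5
2020-01-02,14.0
2020-01-03,13.1
2020-01-04,15.4
2020-01-05,16.0
"""


def chunked_mean(source, column: str, chunksize: int = 2) -> float:
    """Mean of `column`, reading only `chunksize` rows into memory at a time."""
    total = 0.0
    count = 0
    # The reader is a context manager, so the underlying file is closed afterwards.
    with pd.read_csv(source, chunksize=chunksize) as reader:
        for chunk in reader:
            values = chunk[column].dropna()  # skip missing measurements
            total += values.sum()
            count += len(values)
    return total / count if count else float("nan")


if __name__ == "__main__":
    print(chunked_mean(io.StringIO(CSV_TEXT), "discharge"))
```

For real files, a chunk size in the tens or hundreds of thousands of rows is more typical; the tiny value here only makes the chunking visible.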
💻 Alternative: Google Colab
If you prefer not to install locally, use Google Colab:
- Go to Google Colab (https://colab.research.google.com)
- Create a new notebook
- Most libraries are pre-installed
- For additional packages, run pip from a notebook cell: !pip install <package>
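When the same notebook runs both locally and in Colab, a sketch like the following installs extra packages only on Colab. The `'google.colab' in sys.modules` test is a widely used heuristic rather than an official API, and the helper names are our own. Calling pip through `sys.executable -m pip` ensures packages go into the notebook's own interpreter.

```python
import subprocess
import sys


def running_in_colab() -> bool:
    """Heuristic: Colab runtimes have the google.colab module already imported."""
    return "google.colab" in sys.modules


def pip_install(*packages: str) -> None:
    """Install packages into the interpreter that is running this notebook."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])


if running_in_colab():
    # Example package only; skip anything the Colab runtime already provides.
    pip_install("pydot")
```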
🎯 Next Steps
Now that you have all the required libraries installed, proceed to:
- Data Import - Learn how to load and prepare your discharge data
- Performance Metrics - Understand model evaluation metrics
GPU Support
For faster neural network training, consider setting up GPU support:
- NVIDIA GPU with CUDA support
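- Install the CUDA toolkit and cuDNN in versions that match your TensorFlow release, or, on Linux with TensorFlow 2.14 or newer, let pip install them with tensorflow[and-cuda]
- Keep the standard tensorflow package; the separate tensorflow-gpu package is deprecated
- On native Windows, TensorFlow 2.10 was the last release with GPU support; newer releases need WSL2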
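Once the driver, CUDA, and cuDNN are in place, a quick check is whether TensorFlow can see the GPU. This sketch (the helper name `configure_gpus` is our own) uses `tf.config.list_physical_devices` and `tf.config.experimental.set_memory_growth`, which stops TensorFlow from reserving all GPU memory at start-up. Enabling memory growth is an optional choice, not a requirement, and the function returns an empty list when TensorFlow is not installed.

```python
def configure_gpus() -> list:
    """List visible GPUs and enable on-demand memory allocation for each.

    Returns an empty list when TensorFlow is missing or no GPU is visible.
    Call it before creating any tensors: memory growth cannot be changed
    once a GPU has been initialized.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Allocate GPU memory as needed instead of reserving it all up front.
        tf.config.experimental.set_memory_growth(gpu, True)
    return [gpu.name for gpu in gpus]


if __name__ == "__main__":
    names = configure_gpus()
    print(f"GPUs available to TensorFlow: {names or 'none (running on CPU)'}")
```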