Linux Troubleshooting Guide
Category: System Administration & Troubleshooting
Tags: linux, troubleshooting, debugging, system-recovery, performance, diagnostics
Essential Troubleshooting Methodology
What this guide covers: Comprehensive troubleshooting scenarios for common Linux system problems with step-by-step diagnostic procedures, root cause analysis, and resolution strategies.
Why systematic troubleshooting matters: Random fixes waste time and can make problems worse. A methodical approach identifies root causes quickly and prevents recurring issues.
General Troubleshooting Process:
1. Define the problem - What exactly is broken?
2. Gather information - When did it start? What changed?
3. Identify possible causes - What could cause this symptom?
4. Test hypotheses - Systematically eliminate causes
5. Implement solution - Fix the root cause
6. Verify fix - Confirm problem is resolved
7. Prevent recurrence - Monitor and improve
System Performance Issues
Scenario 1: "System is extremely slow"
Problem Description: System responds slowly to commands, applications take forever to load, general sluggish performance.
What causes slow performance: High CPU usage, memory exhaustion, disk I/O bottlenecks, network issues, or system resource contention.
Why this happens: Resource-intensive processes, memory leaks, failing hardware, insufficient system resources, or poorly configured services.
Step-by-step diagnosis:
# Step 1: Quick system health check
uptime # Check load averages
# Normal: load < number of CPU cores
# Problem: load consistently > (CPU cores × 2)
# Example: on a 4-core system (check with nproc), load below 4 is normal and sustained load above 8 is a problem
free -h # Check memory usage
# Look for: low available memory, high swap usage
# Problem indicators: <100MB available, swap usage >50%
df -h # Check disk space
# Problem indicators: any filesystem >90% full
# Root filesystem >95% can cause severe slowdowns
# Step 2: Identify resource-intensive processes
top -o %CPU # Sort by CPU usage
# Look for: processes using >50% CPU consistently
# Note: PID, user, and command of high-CPU processes
ps aux --sort=-%mem | head -10 # Top memory consumers
# Look for: processes using >20% memory
# Check for: obvious memory leaks (increasing memory over time)
# Step 3: Check I/O performance
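iostat -x 1 5 # Five one-second samples; the first report shows averages since boot, so judge by the later ones
# Problem indicators:
# %util consistently >80% (disk bottleneck)
# await >100ms (slow disk response; newer sysstat versions report r_await and w_await instead)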
# High r/s or w/s with low rkB/s, wkB/s (many small operations)
sudo iotop -o # Show processes doing I/O
# Identify: which processes are causing disk activity
# Look for: unexpectedly high I/O from specific applications
# Step 4: Network performance check
ping -c 5 8.8.8.8 # Test external connectivity
# Problem indicators: >100ms latency, packet loss
# If network-dependent apps are slow
ss -s # Summarize sockets by type and state (ss -tuln lists only listening sockets, not connections)
# A high count of established or time-wait sockets might indicate network service issues
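The load thresholds from Step 1 can be turned into a small check. This is a minimal sketch, not a definitive health test: the function name `load_status` and the three labels are illustrative assumptions, and the cut-offs simply restate the rule above (below the core count is normal, above twice the core count is a problem).

```shell
# Classify a load average against the number of CPU cores.
load_status() {
    # $1 = load average (may be fractional), $2 = number of CPU cores
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load > 2 * cores)   print "overloaded"
        else if (load >= cores) print "elevated"
        else                    print "ok"
    }'
}

# Usage on a live Linux system (reads the 1-minute load average):
#   load_status "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
```

Reading `/proc/loadavg` avoids parsing `uptime`, whose field positions shift with the length of the uptime string.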
Root cause analysis and solutions:
# High CPU usage solutions:
# 1. Kill runaway processes
sudo kill -15 [PID] # Graceful termination
sudo kill -9 [PID] # Force kill if unresponsive
# 2. Lower process priority
sudo renice +10 [PID] # Lower priority (higher nice value)
sudo ionice -c 3 -p [PID] # Lower I/O priority
# 3. Check for CPU-intensive services
systemctl list-units --type=service --state=running
# Disable unnecessary services:
sudo systemctl stop [service-name]
sudo systemctl disable [service-name]
# Memory exhaustion solutions:
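# 1. Drop caches (non-destructive, but rarely a real fix: the kernel frees cache on demand, and subsequent reads will be slower)
sudo sync # Flush dirty pages to disk first
echo 3 | sudo tee /proc/sys/vm/drop_caches # Drop page cache, dentries, and inodes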
# 2. Identify and kill memory-hungry processes
# Check process memory growth over time:
while true; do ps aux --sort=-%mem | head -5; sleep 10; done
# 3. Add swap space (temporary solution)
sudo fallocate -l 2G /swapfile # Create 2GB swap file
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Disk I/O bottleneck solutions:
# 1. Identify I/O-heavy processes
sudo iotop -a # Accumulated I/O stats
# Kill or throttle I/O-intensive processes
# 2. Check for failing disk
sudo smartctl -a /dev/sda # Check disk health
dmesg | grep -i error # Check for disk errors
# 3. Optimize disk usage
# Move large files to different disk
# Clean up unnecessary files
sudo find /var/log -name "*.log" -size +100M -mtime +7 # Lists (does not delete) logs over 100MB not modified for more than 7 days
Prevention and monitoring:
# Set up monitoring alerts
# Add to crontab:
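# */5 * * * * out=$(awk '$1 > 4.0 {print "High load: " $0}' /proc/loadavg); [ -n "$out" ] && echo "$out" | mail -s "Load Alert" admin@domain.com
# Monitor disk space (df -P keeps one line per filesystem; $5+0 forces a numeric comparison):
# 0 6 * * * out=$(df -P | awk 'NR>1 && $5+0 > 80 {print "Disk usage alert: " $0}'); [ -n "$out" ] && echo "$out" | mail -s "Disk Alert" admin@domain.com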
# Log performance data:
# */5 * * * * iostat -x 1 1 >> /var/log/iostat.log
Scenario 2: "High memory usage and system freezing"
Problem Description: System becomes unresponsive, applications crash with out-of-memory errors, frequent freezing.
What causes memory issues: Memory leaks, insufficient RAM, swap exhaustion, runaway processes, or memory-intensive applications.
Step-by-step diagnosis:
# Step 1: Memory status assessment
free -h # Current memory usage
cat /proc/meminfo | grep -E "(MemTotal|MemFree|MemAvailable|SwapTotal|SwapFree)"
# Calculate: Memory utilization = (MemTotal - MemAvailable) / MemTotal * 100
# Problem: >90% memory utilization
# Step 2: Identify memory consumption patterns
ps aux --sort=-%mem | head -20 # Top memory consumers
# Look for: processes with unusually high memory usage
# Note: RSS (Resident Set Size) values are in KiB; divide by 1024 for MiB
# Check for memory leaks (run multiple times):
for i in {1..5}; do ps aux --sort=-%mem | head -5; sleep 60; done
# Look for: steadily increasing memory usage in specific processes
# Step 3: System memory analysis
cat /proc/buddyinfo # Memory fragmentation info
# High fragmentation can cause allocation failures
slabtop # Kernel memory usage
# Look for: excessive kernel memory usage
# Step 4: Swap analysis
cat /proc/swaps # Active swap spaces
swapon -s # Swap usage summary
# Problem: swap usage >50% of total swap
vmstat 5 5 # Monitor memory and swap activity
# Look for: high 'si' (swap in) and 'so' (swap out) values
# Continuous swapping indicates memory pressure
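The utilization formula from Step 1, (MemTotal - MemAvailable) / MemTotal * 100, can be computed directly from `/proc/meminfo`. A minimal sketch follows; the function name `mem_util_percent` is an illustrative assumption, and the optional file argument exists only so the calculation can be tried on saved data.

```shell
# Print memory utilization as an integer percentage.
mem_util_percent() {
    # $1 = path to a meminfo-format file (defaults to /proc/meminfo)
    awk '/^MemTotal:/     {total = $2}
         /^MemAvailable:/ {avail = $2}
         END {
             if (total > 0) printf "%d\n", (total - avail) * 100 / total
         }' "${1:-/proc/meminfo}"
}

# Usage: mem_util_percent            # live system
#        mem_util_percent saved.txt  # a saved copy of /proc/meminfo
```

Using MemAvailable rather than MemFree matters here: page cache counts as "not free" but is reclaimable, so MemFree alone overstates memory pressure.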
Solutions for memory problems:
# Immediate memory relief:
# 1. Kill memory-intensive processes
sudo pkill -f [process-name] # Kill by matching the full command line (preview the matches first with pgrep -af [process-name])
# Or use system monitor to identify and kill processes
# 2. Clear system caches
sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
# 3. Emergency process termination
# Enable SysRq keys for emergency:
echo 1 | sudo tee /proc/sys/kernel/sysrq
# Then use Alt+SysRq+F to invoke the OOM killer, which terminates the process it scores as the largest memory consumer
# Long-term solutions:
# 1. Add more swap space
sudo fallocate -l 4G /swapfile2
sudo chmod 600 /swapfile2
sudo mkswap /swapfile2
sudo swapon /swapfile2
# Make permanent by adding to /etc/fstab:
echo '/swapfile2 none swap sw 0 0' | sudo tee -a /etc/fstab
# 2. Optimize swap usage
# Reduce swappiness (default is 60):
echo 10 | sudo tee /proc/sys/vm/swappiness # Temporary
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf # Permanent
# 3. Find and fix memory leaks
# Monitor specific process memory over time:
watch -n 30 'ps aux | grep [process-name]'
# If memory continuously grows, process has memory leak
# Solutions: restart process, update software, report bug
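To confirm a suspected leak, it helps to record one process's resident memory at fixed intervals rather than eyeballing `ps` output. The sketch below is one way to do that, assuming a Linux `/proc` filesystem; the function names `rss_kib` and `sample_rss` are illustrative, not part of any standard tool.

```shell
# Print the resident set size of a process in KiB (from /proc/PID/status).
rss_kib() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Print "epoch-seconds rss-kib" pairs for one PID.
sample_rss() {
    # $1 = PID, $2 = number of samples (default 5), $3 = seconds between samples (default 60)
    pid=$1
    remaining=${2:-5}
    interval=${3:-60}
    while [ "$remaining" -gt 0 ]; do
        printf '%s %s\n' "$(date +%s)" "$(rss_kib "$pid")"
        remaining=$((remaining - 1))
        if [ "$remaining" -gt 0 ]; then sleep "$interval"; fi
    done
}

# Usage: sample_rss 1234 10 30 >> /tmp/rss-1234.log   # 1234 is a placeholder PID
```

A value that climbs steadily across samples under constant workload points to a leak; growth that levels off is more likely a cache warming up.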
Memory monitoring and prevention:
# Set up memory monitoring:
# Create monitoring script:
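sudo tee /usr/local/bin/memory-monitor.sh > /dev/null << 'EOF'
#!/bin/bash
THRESHOLD=90
# Utilization = (MemTotal - MemAvailable) / MemTotal * 100
MEM_USAGE=$(awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END {printf "%.0f", (t-a)*100/t}' /proc/meminfo)
if [ "$MEM_USAGE" -gt "$THRESHOLD" ]; then
echo "Memory usage is $MEM_USAGE%" | mail -s "Memory Alert" admin@domain.com
ps aux --sort=-%mem | head -10 >> /var/log/memory-alert.log
fi
EOF
sudo chmod +x /usr/local/bin/memory-monitor.sh
# Add to root's crontab (sudo crontab -e), since the script writes to /var/log: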
# */10 * * * * /usr/local/bin/memory-monitor.sh
Disk Space and Storage Issues
Scenario 3: "No space left on device" errors
Problem Description: Applications fail with "No space left on device", system becomes unstable, cannot create new files.
What causes disk space issues: Log file growth, temporary file accumulation, large file downloads, database growth, or filesystem corruption.
Step-by-step diagnosis:
# Step 1: Identify full filesystems
df -h # Human-readable disk usage
df -i # Check inode usage (can be full even with disk space)
# Problem indicators:
# >95% disk usage on any filesystem
# >90% inode usage
# Step 2: Locate space consumers
sudo du -xh / 2>/dev/null | sort -hr | head -20 # Top 20 largest directories (-x stays on the root filesystem and skips /proc)
# Focus on: /var, /tmp, /home, /opt
# Check common problem areas:
du -sh /var/log/* | sort -hr | head -10 # Large log files
du -sh /tmp/* 2>/dev/null | sort -hr | head -10 # Temporary files
du -sh /home/* | sort -hr | head -10 # User directories
# Step 3: Find large files
sudo find / -xdev -type f -size +100M 2>/dev/null | head -20 # Files >100MB on the root filesystem
find /var/log -name "*.log" -size +50M # Large log files
find /tmp -type f -mtime +7 # Old temporary files
# Step 4: Check for hidden space usage
lsof +L1 # Find deleted files still held open
# Deleted files held by processes still consume space
# Shows files with link count 0 but still open
# Step 5: Analyze specific directories
ncdu /var # Interactive disk usage analyzer
# If ncdu not available:
du -ah /var | sort -hr | head -50 # Alternative analysis
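The thresholds from Step 1 (95% space, 90% inodes) can be checked mechanically. The following is a sketch under stated assumptions: `full_filesystems` is a hypothetical helper name, and it expects POSIX-format `df -P` output on stdin so that each filesystem occupies exactly one line.

```shell
# Print "mountpoint usage%" for every filesystem above a threshold.
full_filesystems() {
    # $1 = threshold percentage; reads `df -P` or `df -Pi` output on stdin.
    # "$5 + 0" converts values such as "96%" to the number 96 before comparing.
    awk -v threshold="$1" 'NR > 1 && $5 + 0 > threshold {print $6, $5}'
}

# Usage:
#   df -P  | full_filesystems 95   # disk space
#   df -Pi | full_filesystems 90   # inodes
```

The explicit numeric conversion is the important design choice: comparing the raw field would be a string comparison, under which "9%" sorts above "85".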
Solutions for disk space problems:
# Immediate space recovery:
# 1. Clean log files (low risk, but deleted logs cannot be recovered; check retention requirements first)
sudo find /var/log -name "*.log" -mtime +30 -size +10M -exec gzip {} \;
sudo find /var/log -name "*.gz" -mtime +90 -delete
sudo journalctl --vacuum-size=100M # Limit systemd journal size
# 2. Clean temporary files
sudo find /tmp -type f -mtime +7 -delete # Files older than 7 days
sudo find /var/tmp -type f -mtime +30 -delete # Older temporary files
# 3. Package manager cleanup
sudo apt autoremove # Remove unused packages
sudo apt autoclean # Remove cached packages that can no longer be downloaded (sudo apt clean empties the cache entirely)
# For yum/dnf:
sudo yum clean all # Clean package cache
# 4. Clear user caches
# For each user:
rm -rf /home/user/.cache/* # Browser and application caches
rm -rf /home/user/.thumbnails/* # Image thumbnails
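# 5. Handle deleted files still open
sudo lsof +L1 # Review the list first: note the COMMAND, PID, and FD columns
# Space is released only when the holding process closes the file or exits
# Preferred: restart or reload the owning service, e.g. sudo systemctl restart [service-name]
# Do not send SIGHUP blindly: processes that do not handle it are terminated
# To reclaim space without a restart, truncate the file through /proc ([FD] is the numeric part of the FD column):
# sudo truncate -s 0 /proc/[PID]/fd/[FD]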
# Advanced space recovery:
# 1. Find and remove duplicate files
fdupes -r /home # Find duplicates (install: apt install fdupes)
# Manually review and remove duplicates
# 2. Compress old files
find /var/log -name "*.log" -mtime +7 -exec gzip {} \;
find /home -name "*.txt" -size +10M -mtime +90 -exec gzip {} \;
# 3. Move large files to different filesystem
mv /large/file /other/filesystem/location/
ln -s /other/filesystem/location/file /large/file # Leave a symlink at the original path
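Deleted files that a process still holds open (the `lsof +L1` case from Step 4) can also be found without lsof, because the kernel marks such descriptors in `/proc`. This is a minimal, Linux-specific sketch; `deleted_open_files` is an illustrative name, and inspecting another user's process requires root.

```shell
# List file descriptors of one process that point at deleted files.
deleted_open_files() {
    # $1 = PID; prints "fd-number target" for each deleted-but-open file
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') printf '%s %s\n' "${fd##*/}" "$target" ;;
        esac
    done
}

# Usage: deleted_open_files 1234   # 1234 is a placeholder PID
```

The descriptor number it prints is the one to use in a `/proc/[PID]/fd/[FD]` path if the file has to be inspected before the owning service is restarted.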
Long-term disk space management:
# Set up automated cleanup:
# Create cleanup script:
sudo tee /usr/local/bin/disk-cleanup.sh > /dev/null << 'EOF'
#!/bin/bash
# Clean old log files
find /var/log -name "*.log" -mtime +30 -size +10M -exec gzip {} \;
find /var/log -name "*.gz" -mtime +90 -delete
# Clean temporary files
find /tmp -type f -mtime +7 -delete
find /var/tmp -type f -mtime +30 -delete
# Clean package caches
apt autoremove -y
apt autoclean
# Vacuum journal logs
journalctl --vacuum-time=30d
journalctl --vacuum-size=100M
# Log cleanup activity
echo "Disk cleanup completed at $(date)" >> /var/log/disk-cleanup.log
EOF
sudo chmod +x /usr/local/bin/disk-cleanup.sh
# Schedule weekly cleanup:
# Add to crontab:
# 0 2 * * 0 /usr/local/bin/disk-cleanup.sh
# Set up disk space monitoring:
sudo tee /usr/local/bin/disk-monitor.sh > /dev/null << 'EOF'
#!/bin/bash
THRESHOLD=85
ALERT_FILE=$(mktemp)
# df -P keeps each filesystem on one line; $5+0 forces a numeric comparison on values such as "85%"
df -P | awk -v threshold="$THRESHOLD" 'NR>1 && $5+0 > threshold' > "$ALERT_FILE"
if [ -s "$ALERT_FILE" ]; then
mail -s "Disk Space Alert" admin@domain.com < "$ALERT_FILE"
fi
rm -f "$ALERT_FILE"
EOF
sudo chmod +x /usr/local/bin/disk-monitor.sh
# Monitor every hour:
# 0 * * * * /usr/local/bin/disk-monitor.sh
Scenario 4: "Inode exhaustion - cannot create files despite free space"
Problem Description: Error "No space left on device" but df shows available space. Cannot create new files or directories.
What causes inode exhaustion: Too many small files, mail queues, temporary files, or filesystem design limitations.
Diagnosis and solution:
# Step 1: Confirm inode exhaustion
df -i # Show inode usage
# Look for: IUse% close to 100% on any filesystem
# Step 2: Find directories with many files
sudo find / -xdev -type f | cut -d '/' -f 2 | sort | uniq -c | sort -nr | head -10
# Shows the top-level directories holding the most files; repeat inside the largest one to drill down
# Find directories with excessive small files:
sudo find /var -xdev -type d -exec sh -c 'printf "%s %s\n" "$(ls -1A "$1" | wc -l)" "$1"' _ {} \; | sort -nr | head -20
# Step 3: Identify problem areas
find /var/spool -type f | wc -l # Mail queue files
find /tmp -type f | wc -l # Temporary files
find /var/cache -type f | wc -l # Cache files
# Step 4: Clean up small files
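# Mail queue cleanup (if mail server):
sudo postqueue -p | tail -n 1 # Show the queue size before acting
sudo postqueue -f # Attempt delivery of all queued mail
sudo postsuper -d ALL deferred # Delete deferred mail only; never delete files under /var/spool/postfix by hand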
# Temporary file cleanup:
sudo find /tmp -xdev -type f -mtime +1 -delete # Clear temporary files older than 1 day (deleting files in active use can break running programs)
sudo find /var/tmp -type f -mtime +1 -delete # Clear old var temp files
# Cache cleanup:
sudo find /var/cache -type f -mtime +30 -delete # Old cache files
# Application-specific cleanup:
# PHP sessions:
sudo find /var/lib/php/sessions -type f -mtime +1 -delete
# Thumbnail cache:
find /home/*/.thumbnails -type f -mtime +30 -delete
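Inode exhaustion is usually caused by one or two directories holding a very large number of small files, so ranking directories by file count narrows the search quickly. The sketch below shows one way to do it; `files_per_directory` is a hypothetical helper name, and file names containing newlines would be miscounted.

```shell
# Rank directories under a path by the number of regular files they contain.
files_per_directory() {
    # $1 = directory to scan, $2 = number of rows to print (default 10)
    find "$1" -xdev -type f 2>/dev/null |
        sed 's|/[^/]*$||' |
        sort | uniq -c | sort -rn |
        head -n "${2:-10}"
}

# Usage: sudo sh -c '. ./inode-helpers.sh; files_per_directory /var 20'
# (inode-helpers.sh is a placeholder for wherever the function is saved)
```

Unlike the per-directory `ls | wc -l` approach, this makes a single pass over the tree, which matters when the filesystem already holds millions of entries.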
Network Connectivity Issues
Scenario 5: "Cannot connect to network/internet"
Problem Description: No network connectivity, DNS resolution fails, services cannot connect to remote hosts.
Step-by-step network diagnosis:
# Step 1: Basic connectivity tests
ping -c 3 localhost # Test loopback
ping -c 3 127.0.0.1 # Test local IP stack
ping -c 3 $(ip route | grep default | awk '{print $3}') # Test gateway
ping -c 3 8.8.8.8 # Test external connectivity
ping -c 3 google.com # Test DNS resolution
# Results interpretation:
# Localhost fails: IP stack problem
# Gateway fails: Local network problem
# 8.8.8.8 fails: Internet connectivity problem
# google.com fails: DNS problem
# Step 2: Interface configuration check
ip addr show # Check interface status and IPs
ip route show # Check routing table
ip link show # Check physical interface status
# Look for:
# Interfaces in DOWN state
# Missing IP addresses
# Missing default route
# Wrong subnet configuration
# Step 3: DNS configuration check
cat /etc/resolv.conf # Check DNS servers
nslookup google.com # Test DNS resolution
dig @8.8.8.8 google.com # Test with specific DNS server
# Step 4: Service status check
systemctl status NetworkManager # Network management service
systemctl status systemd-networkd # Alternative network service
systemctl status systemd-resolved # DNS resolution service
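The interpretation table from Step 1 (loopback, gateway, external IP, DNS) can be automated so the first failing layer is reported directly. This is a sketch under stated assumptions: the function names and layer labels are illustrative, and `8.8.8.8` and `google.com` are the same probe targets used above. Networks that block ICMP will report a failure even when connectivity is fine.

```shell
# Map four probe results (0 = success) to the first failing layer.
first_failing_layer() {
    # $1 = loopback, $2 = gateway, $3 = external IP, $4 = DNS lookup
    if   [ "$1" -ne 0 ]; then echo "ip-stack"
    elif [ "$2" -ne 0 ]; then echo "local-network"
    elif [ "$3" -ne 0 ]; then echo "internet"
    elif [ "$4" -ne 0 ]; then echo "dns"
    else echo "none"
    fi
}

# Run a command quietly and print its exit status.
probe() { "$@" > /dev/null 2>&1; echo $?; }

# Probe each layer in order and report where connectivity breaks.
run_checks() {
    gateway=$(ip route show default | awk '{print $3; exit}')
    first_failing_layer \
        "$(probe ping -c 1 -W 2 127.0.0.1)" \
        "$(probe ping -c 1 -W 2 "$gateway")" \
        "$(probe ping -c 1 -W 2 8.8.8.8)" \
        "$(probe getent hosts google.com)"
}

# Usage: run_checks
```

Separating the decision logic from the probes keeps the ordering rule easy to verify on its own, independent of the network state of the machine running it.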
Network problem solutions:
# Interface problems:
# 1. Restart network interface (replace eth0 with the name shown by ip link show, e.g. enp0s3 or ens33)
sudo ip link set eth0 down
sudo ip link set eth0 up
# 2. Restart network services
sudo systemctl restart NetworkManager # Most desktop systems
sudo systemctl restart systemd-networkd # Server systems
# 3. Manual IP configuration (temporary)
sudo ip addr add 192.168.1.100/24 dev eth0 # Set IP address
sudo ip route add default via 192.168.1.1 # Set default gateway
# DNS problems:
# 1. Temporary DNS fix (NetworkManager or systemd-resolved may overwrite /etc/resolv.conf later)
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
echo "nameserver 1.1.1.1" | sudo tee -a /etc/resolv.conf
# 2. Flush DNS cache
sudo resolvectl flush-caches # systemd-resolved (systemd-resolve --flush-caches on older releases)
sudo /etc/init.d/nscd restart # Systems with nscd
# DHCP problems:
# 1. Release and renew DHCP lease
sudo dhclient -r eth0 # Release lease
sudo dhclient eth0 # Request new lease
# 2. Check DHCP service
systemctl status isc-dhcp-server # If running DHCP server
journalctl -u NetworkManager # Check for DHCP errors
# Firewall issues:
# 1. Check if firewall is blocking
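sudo iptables -L -n -v # List firewall rules with packet counters, without DNS lookups
sudo nft list ruleset # nftables rules (the default backend on newer distributions)
sudo ufw status # Ubuntu firewall status
# 2. Temporarily disable firewall (DANGEROUS - for testing only; re-enable immediately afterwards)
sudo ufw disable # Ubuntu
sudo systemctl stop firewalld # CentOS/RHEL 7 and later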
Permission and Access Issues
Scenario 6: "Permission denied" errors
Problem Description: Cannot access files/directories, applications fail with permission errors, users cannot perform required operations.
Step-by-step permission diagnosis:
# Step 1: Identify the exact problem
ls -la /path/to/problem/file # Check file permissions
ls -ld /path/to/problem/directory # Check directory permissions
id # Check current user and groups
groups # Check group memberships
# Step 2: Understand permission structure
# Permission format: drwxrwxrwx
# d = directory, - = file
# First rwx: owner permissions
# Second rwx: group permissions
# Third rwx: other permissions
# r=read(4), w=write(2), x=execute(1)
# Step 3: Check ownership
stat /path/to/file # Detailed file information
find /path -not -user $(whoami) -ls # Files not owned by current user
find /path -not -group $(id -gn) -ls # Files not in current user's group
# Step 4: Check special permissions
find / -perm -4000 -ls 2>/dev/null # Files with setuid bit
find / -perm -2000 -ls 2>/dev/null # Files with setgid bit
find / -perm -1000 -ls 2>/dev/null # Directories with sticky bit
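The numeric scheme from Step 2 (r=4, w=2, x=1) is easiest to internalize by decoding an octal mode one digit at a time. A minimal sketch follows; `mode_to_rwx` is an illustrative name, and it handles only the three permission digits, not a leading setuid/setgid/sticky digit.

```shell
# Convert a three-digit octal mode to its rwx string.
mode_to_rwx() {
    # $1 = octal mode such as 644 or 755
    result=""
    for digit in $(printf '%s' "$1" | sed 's/./& /g'); do
        # Each digit is a bit mask: 4 = read, 2 = write, 1 = execute
        if [ $((digit & 4)) -ne 0 ]; then result="${result}r"; else result="${result}-"; fi
        if [ $((digit & 2)) -ne 0 ]; then result="${result}w"; else result="${result}-"; fi
        if [ $((digit & 1)) -ne 0 ]; then result="${result}x"; else result="${result}-"; fi
    done
    printf '%s\n' "$result"
}

# Usage: mode_to_rwx 640
#        mode_to_rwx "$(stat -c %a /path/to/file)"   # GNU stat
```

The output corresponds to the nine permission characters that follow the file-type character in `ls -l`.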
Permission problem solutions:
# File permission fixes:
# 1. Standard file permissions
sudo chmod 644 /path/to/file # Read/write for owner, read for others
sudo chmod 755 /path/to/directory # Full access for owner, read/execute for others
sudo chmod +x /path/to/script # Add execute permission
# 2. Recursive permission fixes
sudo find /var/www -type f -exec chmod 644 {} \; # Set file permissions
sudo find /var/www -type d -exec chmod 755 {} \; # Set directory permissions
# 3. Ownership fixes
sudo chown user:group /path/to/file # Change owner and group
sudo chown -R user:group /path/to/directory # Recursive ownership change
# Application-specific permission fixes:
# Web server files:
sudo chown -R www-data:www-data /var/www/html
sudo find /var/www/html -type f -exec chmod 644 {} \;
sudo find /var/www/html -type d -exec chmod 755 {} \;
# SSH key permissions:
chmod 700 ~/.ssh # SSH directory
chmod 600 ~/.ssh/id_rsa # Private key
chmod 644 ~/.ssh/id_rsa.pub # Public key
chmod 600 ~/.ssh/authorized_keys # Authorized keys (sshd rejects the file if group or others can write to it)
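# Log file permissions (Debian/Ubuntu rsyslog defaults; fix files individually, since many logs are owned by root or by their own service):
sudo chown syslog:adm /var/log/syslog # Standard ownership for rsyslog-managed files
sudo chmod 640 /var/log/syslog # Readable by owner and the adm group only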
# Advanced permission troubleshooting:
# 1. Check ACLs (Access Control Lists)
getfacl /path/to/file # Show ACL permissions
# If ACLs are set, use setfacl to modify:
sudo setfacl -m u:username:rw /path/to/file # Give user read/write access
# 2. SELinux context (if SELinux enabled)
ls -Z /path/to/file # Show SELinux context
sudo restorecon -R /path/to/directory # Restore default SELinux contexts
Boot and System Recovery Issues
Scenario 7: "System won't boot"
Problem Description: System fails to start, gets stuck at boot, drops to emergency shell, or shows kernel panic.
Boot problem diagnosis and recovery:
# Boot process understanding:
# 1. BIOS/UEFI → 2. Bootloader (GRUB) → 3. Kernel → 4. Init system → 5. Services
# Recovery methods by boot stage:
# GRUB Recovery (bootloader problems):
# 1. Boot from rescue media/live USB
# 2. Mount root filesystem:
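sudo mkdir -p /mnt/recovery
sudo mount /dev/sda1 /mnt/recovery # Replace with actual root partition (identify it with lsblk -f)
# If /boot or /boot/efi are separate partitions, mount them under /mnt/recovery as well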
sudo mount --bind /dev /mnt/recovery/dev
sudo mount --bind /proc /mnt/recovery/proc
sudo mount --bind /sys /mnt/recovery/sys
# 3. Chroot into system:
sudo chroot /mnt/recovery
# 4. Reinstall GRUB:
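grub-install /dev/sda # Replace with actual disk (the whole disk, not a partition)
update-grub # Regenerate GRUB configuration (Debian/Ubuntu; on RHEL-family systems use grub2-mkconfig -o /boot/grub2/grub.cfg)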
# Kernel boot problems:
# 1. Boot with older kernel from GRUB menu
# 2. Check kernel logs:
dmesg | less # Kernel messages
journalctl -b # Boot logs
# 3. Common kernel boot fixes:
# Edit GRUB boot parameters (press 'e' in GRUB menu):
# Add: nomodeset # Disable kernel mode setting (works around graphics driver hangs)
# Add: single # Boot to single-user mode
# Add: init=/bin/bash # Boot to bash shell
# Filesystem corruption recovery:
# 1. Boot from rescue media
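# 2. Check filesystem (it must be unmounted; running fsck on a mounted filesystem can corrupt it):
sudo fsck /dev/sda1 # Check and repair interactively
sudo fsck -f /dev/sda1 # Force a check even if the filesystem is marked clean
sudo fsck -y /dev/sda1 # Auto-answer yes to all fixes (use with care)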
# 3. Mount and check critical files:
sudo mount /dev/sda1 /mnt
cat /mnt/etc/fstab # Review fstab for wrong device names, UUIDs, or mount options
ls -la /mnt/boot # Check boot files exist
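A malformed `/etc/fstab` entry is a common reason for dropping to an emergency shell, so a quick structural check of the file from the rescue environment is worthwhile. The sketch below flags entries with fewer than four fields (device, mount point, type, options); `fstab_suspect_lines` is a hypothetical helper, and it does not validate that devices or UUIDs exist. Where util-linux is recent enough, `findmnt --verify` performs a more thorough check.

```shell
# Report fstab entries that have too few fields to be valid.
fstab_suspect_lines() {
    # $1 = path to an fstab file, e.g. /mnt/etc/fstab from a rescue system
    awk '!/^[ \t]*(#|$)/ && NF < 4 {
        printf "line %d: expected at least 4 fields, found %d\n", NR, NF
    }' "$1"
}

# Usage: fstab_suspect_lines /mnt/etc/fstab
```

No output means every non-comment line has at least the four mandatory fields; the last two (dump and fsck pass) are optional and default to 0.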
System recovery procedures:
# Single-user mode recovery:
# 1. Boot to single-user mode (add 'single' to kernel parameters)
# 2. Remount root filesystem as read-write:
mount -o remount,rw /
# 3. Common single-user fixes:
passwd root # Reset root password
systemctl enable sshd # Enable SSH for remote access (the unit is named ssh on Debian/Ubuntu)
vim /etc/fstab # Fix filesystem mount errors
# Emergency shell recovery:
# If system drops to emergency shell:
# 1. Check what caused the emergency:
systemctl --failed # Show failed services
journalctl -xb # Boot logs with explanations
# 2. Fix common issues:
systemctl reset-failed # Clear failed service states
systemctl daemon-reload # Reload systemd configuration
systemctl default # Try to boot to default target
# Rescue mode from installation media:
# 1. Boot from installation USB/DVD
# 2. Choose "Rescue" option
# 3. Mount existing installation
# 4. Common rescue operations:
# Fix broken packages:
apt --fix-broken install # Debian/Ubuntu
yum history undo last # CentOS/RHEL - undo last transaction
# Restore from backup:
# If you have backups in /backup:
rsync -av /backup/etc/ /etc/ # Restore configuration
rsync -av /backup/home/ /home/ # Restore user data
# Network boot recovery:
# 1. Configure network in rescue environment:
ip addr add 192.168.1.100/24 dev eth0
ip route add default via 192.168.1.1
echo "nameserver 8.8.8.8" > /etc/resolv.conf
# 2. Remote access for help:
systemctl start sshd
passwd root # Set password for SSH access
Scenario 8: "Services failing to start at boot"
Problem Description: System boots but essential services don't start, applications unavailable, missing functionality.
Service startup troubleshooting:
# Step 1: Identify failed services
systemctl --failed # List failed services
systemctl list-units --state=failed # Alternative command
# Step 2: Analyze specific service failures
systemctl status service-name # Detailed service status
journalctl -u service-name # Service-specific logs
journalctl -u service-name --since "today" # Today's logs only
# Step 3: Check service dependencies
systemctl list-dependencies service-name # Show service dependencies
systemctl show service-name # Show all service properties
# Step 4: Manual service testing
# Test service manually:
sudo /usr/bin/service-binary --test-config # Test configuration
sudo -u service-user /usr/bin/service-binary --foreground # Run in foreground
# Common service startup problems and fixes:
# Configuration file errors:
# 1. Check configuration syntax:
nginx -t # Nginx configuration test
apache2ctl configtest # Apache configuration test
sshd -t # SSH daemon configuration test
# 2. Fix configuration errors:
sudo vim /etc/service/config.conf # Edit configuration
# Common issues: syntax errors, wrong file paths, invalid options
# Permission problems:
# 1. Check service file permissions:
ls -la /etc/systemd/system/service-name.service
ls -la /usr/lib/systemd/system/service-name.service
# 2. Fix systemd service file:
sudo systemctl daemon-reload # Reload after editing service files
sudo systemctl enable service-name # Ensure service is enabled
# Dependency problems:
# 1. Start dependencies manually:
sudo systemctl start dependency-service # Start required services first
sudo systemctl enable dependency-service # Enable for future boots
# 2. Fix dependency order:
# Edit service file to add:
# After=network.target
# Requires=postgresql.service
# Resource problems:
# 1. Check available resources:
free -h # Memory availability
df -h # Disk space
ulimit -a # Resource limits of the current shell (for a service, use: systemctl show service-name | grep Limit)
# 2. Adjust service resource limits:
# Edit service file:
# [Service]
# LimitNOFILE=65536
# MemoryMax=1G
# (MemoryMax= replaces the deprecated MemoryLimit= on cgroup v2 systems)
# User/group problems:
# 1. Check service user exists:
id service-user # Check if user exists
getent group service-group # Check if group exists
# 2. Create missing users:
sudo useradd -r -s /bin/false service-user # Create system user
sudo usermod -aG service-group service-user # Add to group
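Dependency ordering and resource limits are safer to apply through a systemd drop-in than by editing the packaged unit file, because drop-ins survive package upgrades. The following is a sketch, not a definitive layout: `write_dropin` is an illustrative helper, `myapp.service` is a hypothetical unit name, and the directive values are the examples used above (with `MemoryMax=` as the cgroup v2 form of the memory limit).

```shell
# Write an override.conf drop-in for a unit.
write_dropin() {
    # $1 = unit name, $2 = base directory (default /etc/systemd/system)
    dropin_dir="${2:-/etc/systemd/system}/$1.d"
    mkdir -p "$dropin_dir" &&
    cat > "$dropin_dir/override.conf" << 'EOF'
[Unit]
After=network.target
Requires=postgresql.service

[Service]
LimitNOFILE=65536
MemoryMax=1G
EOF
}

# Usage (as root): write_dropin myapp.service && systemctl daemon-reload
```

Note that `Requires=` pulls the dependency in but does not order it; if the service must start after PostgreSQL is up, `postgresql.service` also needs to appear in `After=`. `sudo systemctl edit myapp.service` creates the same file interactively.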
Application and Service Issues
Scenario 9: "Web server not responding"
Problem Description: Web server returns errors, timeouts, or refuses connections. Website unavailable or performing poorly.
Web server troubleshooting:
# Step 1: Basic connectivity test
curl -I http://localhost # Test local HTTP connection
curl -I https://localhost # Test local HTTPS connection
telnet localhost 80 # Test if port 80 is listening
telnet localhost 443 # Test if port 443 is listening
# Step 2: Check web server status
systemctl status apache2 # Apache status
systemctl status nginx # Nginx status
systemctl status httpd # Apache on CentOS/RHEL
# Step 3: Check web server processes
pgrep -a apache2 # Apache processes (avoids matching the grep command itself)
pgrep -a nginx # Nginx processes
lsof -i :80 # What's using port 80
lsof -i :443 # What's using port 443
# Step 4: Analyze web server logs
tail -f /var/log/apache2/error.log # Apache error log
tail -f /var/log/nginx/error.log # Nginx error log
tail -f /var/log/apache2/access.log # Apache access log
journalctl -u apache2 -f # Systemd logs for Apache
# Common web server problems and solutions:
# Configuration errors:
# 1. Test configuration syntax:
sudo apache2ctl configtest # Apache configuration test
sudo nginx -t # Nginx configuration test
# 2. Common configuration issues:
# - Wrong document root path
# - Syntax errors in virtual host configuration
# - SSL certificate problems
# - Module loading issues
# Fix configuration errors:
sudo vim /etc/apache2/sites-available/000-default.conf # Apache default site
sudo vim /etc/nginx/sites-available/default # Nginx default site
sudo systemctl reload apache2 # Reload after config changes
sudo systemctl reload nginx # Reload Nginx config
# Permission problems:
# 1. Check web directory permissions:
ls -la /var/www/html # Web root permissions
# Should be: owner=www-data, group=www-data, permissions=755 for directories, 644 for files
# 2. Fix web permissions:
sudo chown -R www-data:www-data /var/www/html
sudo find /var/www/html -type d -exec chmod 755 {} \;
sudo find /var/www/html -type f -exec chmod 644 {} \;
# Resource exhaustion:
# 1. Check server resources:
free -h # Memory usage
df -h # Disk space
iostat -x 1 5 # I/O performance
# 2. Check Apache/Nginx worker processes:
# Apache MPM status:
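apache2ctl status # Shows current connections and workers (requires mod_status and a text browser such as lynx)
# If unavailable, enable mod_status: sudo a2enmod status && sudo systemctl reload apache2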
# Nginx status:
curl http://localhost/nginx_status # If nginx status module enabled
# 3. Adjust worker configuration:
# Apache: Edit /etc/apache2/mods-available/mpm_prefork.conf
# Nginx: Edit /etc/nginx/nginx.conf worker_processes and worker_connections
# Port conflicts:
# 1. Check what's using web ports:
sudo lsof -i :80 # HTTP port usage
sudo lsof -i :443 # HTTPS port usage
sudo ss -tulpn | grep -E ':(80|443)[[:space:]]' # Alternative port check (the trailing whitespace match skips ports such as 8080)
# 2. Resolve conflicts:
sudo systemctl stop conflicting-service # Stop conflicting service
# Or change port configuration in web server config
# SSL/TLS certificate issues:
# 1. Check certificate validity:
openssl x509 -in /path/to/certificate.crt -text -noout # Certificate details
openssl x509 -in /path/to/certificate.crt -dates -noout # Expiration dates
# 2. Test SSL configuration:
openssl s_client -connect localhost:443 < /dev/null # Test SSL connection (the redirect makes it exit after the handshake)
curl -k https://localhost # Test ignoring certificate errors
# 3. Common SSL fixes:
# - Renew expired certificates
# - Fix certificate chain issues
# - Correct private key permissions (600)
# - Update cipher suites for security
# Database connectivity issues:
# 1. Test database connection:
mysql -u dbuser -p -h localhost # Test MySQL connection
psql -U dbuser -h localhost dbname # Test PostgreSQL connection
# 2. Check database service:
systemctl status mysql # MySQL service status
systemctl status postgresql # PostgreSQL service status
# 3. Web application database errors:
# Check application logs:
tail -f /var/log/apache2/error.log | grep -i database
# Common issues: wrong credentials, database server down, connection limits
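The HTTP status code returned by the server narrows down which of the problem categories above applies. The sketch below wraps the `curl` test from Step 1; the function names and category labels are illustrative assumptions. curl reports `000` when no HTTP response was received at all, which points to a stopped service, a port conflict, or a firewall rather than a configuration error.

```shell
# Map an HTTP status code to a coarse problem category.
classify_http_status() {
    case "$1" in
        000)     echo "no-response" ;;   # connection refused or timed out
        2??|3??) echo "ok" ;;
        4??)     echo "client-error" ;;  # e.g. 403 permissions, 404 document root
        5??)     echo "server-error" ;;  # e.g. application, database, or proxy failure
        *)       echo "unknown" ;;
    esac
}

# Fetch only the status code for a URL and classify it.
check_url() {
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")
    printf '%s %s %s\n' "$1" "$code" "$(classify_http_status "$code")"
}

# Usage: check_url http://localhost
```

A `client-error` result is the cue to check permissions and the document root; a `server-error` result is the cue to read the error log and test the database connection.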
Scenario 10: "Database performance issues"
Problem Description: Database queries are slow, applications timeout, database locks, or connection errors.
Database troubleshooting methodology:
# MySQL/MariaDB Performance Issues:
# Step 1: Check database service status
systemctl status mysql # Service status
systemctl status mariadb # MariaDB status
mysqladmin ping # Test database responsiveness
mysqladmin status # Basic database statistics
# Step 2: Check database connections
mysqladmin processlist # Show current connections and queries
# Look for: long-running queries, locked queries, too many connections
# Step 3: Monitor database performance
mysqladmin extended-status | grep -E "(Threads_connected|Threads_running|Slow_queries)"
# Threads_connected: current connections
# Threads_running: actively running queries
# Slow_queries: queries taking longer than long_query_time
# Step 4: Check database logs
tail -f /var/log/mysql/error.log # MySQL error log
tail -f /var/log/mysql/slow.log # Slow query log (if enabled)
journalctl -u mysql -f # Systemd logs
# Database performance analysis:
# 1. Enable slow query log (temporarily):
mysql -u root -p -e "SET GLOBAL slow_query_log = 'ON';"
mysql -u root -p -e "SET GLOBAL long_query_time = 2;" # Log queries >2 seconds
# 2. Analyze slow queries:
mysqldumpslow /var/log/mysql/slow.log # Summarize slow query log
# Look for: frequently slow queries, queries without indexes
# 3. Check database engine status:
mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G" | less
# Look for: deadlocks, lock waits, buffer pool efficiency
# Common database fixes:
# 1. Kill long-running queries:
mysql -u root -p -e "SHOW PROCESSLIST;" # Get process IDs
mysql -u root -p -e "KILL [process_id];" # Kill specific query
# 2. Optimize database configuration:
# Edit /etc/mysql/my.cnf or /etc/mysql/mysql.conf.d/mysqld.cnf
# Key parameters to adjust:
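# innodb_buffer_pool_size = up to about 70% of RAM on a dedicated database server (less if other services share the host)
# max_connections = adjust based on load
# query_cache_size = 32M (read-heavy workloads on MySQL 5.7 or MariaDB only; MySQL 8.0 removed the query cache)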
# tmp_table_size = 32M
# max_heap_table_size = 32M
# 3. Database maintenance:
mysql -u root -p -e "ANALYZE TABLE database.table;" # Update table statistics
mysql -u root -p -e "OPTIMIZE TABLE database.table;" # Defragment table
mysqlcheck -u root -p --auto-repair --all-databases # Check and repair all tables
# PostgreSQL Performance Issues:
# Step 1: Check PostgreSQL status
systemctl status postgresql # Service status
pg_isready # Test database readiness
psql -U postgres -c "SELECT version();" # Test connection and version
# Step 2: Monitor active connections and queries:
psql -U postgres -c "SELECT * FROM pg_stat_activity;" # Current connections
# Look for: long-running queries (state != 'idle'), blocked queries
# Step 3: Check PostgreSQL logs:
tail -f /var/log/postgresql/postgresql-*-main.log # Main log file
journalctl -u postgresql -f # Systemd logs
# Step 4: PostgreSQL performance analysis:
# 1. Check database statistics:
psql -U postgres -c "SELECT * FROM pg_stat_database;" # Database stats
psql -U postgres -c "SELECT * FROM pg_stat_user_tables;" # Table statistics
# 2. Find slow queries:
# Enable logging in /etc/postgresql/*/main/postgresql.conf:
# log_min_duration_statement = 1000 # Log queries >1 second
# log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
# 3. Common PostgreSQL fixes:
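# Kill long-running queries (run the SELECT without pg_terminate_backend first to preview; pg_cancel_backend(pid) cancels only the query and is the gentler first step):
psql -U postgres -c "SELECT pid, pg_terminate_backend(pid) FROM pg_stat_activity WHERE state != 'idle' AND pid <> pg_backend_pid() AND query_start < NOW() - INTERVAL '5 minutes';"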
# Vacuum and analyze tables:
psql -U postgres -d database -c "VACUUM ANALYZE;" # Update statistics and clean up
# Monitor locks and blocks:
psql -U postgres -c "SELECT * FROM pg_locks WHERE NOT granted;" # Blocked queries
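For monitoring scripts it is often more convenient to extract a single counter from `mysqladmin extended-status` than to read the whole table. This is a minimal sketch that parses the tabular output shown in Step 3; `status_value` is a hypothetical helper name and assumes the default `| Variable_name | Value |` layout.

```shell
# Extract one status variable from `mysqladmin extended-status` output.
status_value() {
    # $1 = status variable name; reads the tabular output on stdin
    awk -F'|' -v name="$1" '{
        key = $2; val = $3
        gsub(/ /, "", key); gsub(/ /, "", val)
        if (key == name) print val
    }'
}

# Usage: mysqladmin extended-status | status_value Threads_connected
```

Sampling Threads_running and Slow_queries this way every few seconds shows whether a slowdown is a short spike or sustained load.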
System Security Issues
Scenario 11: "Suspected security breach or malware"
Problem Description: Unusual system behavior, unexpected network connections, high resource usage, or suspected unauthorized access.
Security incident response:
# Step 1: Immediate assessment
# Check currently logged-in users:
who # Current users
w # Detailed user activity
last | head -20 # Recent login history
lastb | head -10 # Failed login attempts
# Check unusual processes:
ps aux --sort=-%cpu | head -20 # High CPU processes
ps aux --sort=-%mem | head -20 # High memory processes
ps aux | grep -wE "nc|netcat|ncat|socat|telnet|sshd?|ftp" | grep -v grep # Network tools (-w avoids matching "nc" inside other words)
# Check network connections:
ss -tupln # All listening ports
ss -tuap # All connections with process info
lsof -i # Files/processes using network
# Step 2: Check for unauthorized changes
# File integrity checking:
find /etc -type f -mtime -1 # Recently modified config files
find /bin /sbin /usr/bin /usr/sbin -type f -mtime -7 # Recently modified binaries
find / -name "*.sh" -type f -mtime -1 2>/dev/null # Recent shell scripts
# Check for suspicious files:
find /tmp -type f -executable # Executable files in /tmp
find /var/tmp -type f -executable # Executable files in /var/tmp
find /dev/shm -type f 2>/dev/null # Files in shared memory
# Step 3: Analyze system logs
# Authentication logs (Debian/Ubuntu path; RHEL/CentOS use /var/log/secure):
grep -i "failed\|invalid\|break-in" /var/log/auth.log | tail -50
grep -i "accepted\|session opened" /var/log/auth.log | tail -20
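To turn the raw authentication log into something easier to scan, the sketch below counts failed SSH password attempts per source address. The helper name `failed_logins_by_ip` is an assumption, and it expects the usual OpenSSH "Failed password for ... from ADDRESS port N" line format.

```shell
#!/bin/sh
# Sketch: count failed SSH password attempts per source address.
# failed_logins_by_ip is a hypothetical helper; it reads sshd log lines on stdin.
failed_logins_by_ip() {
    awk '/Failed password/ {
             # Scan from the end so a user literally named "from" cannot confuse the match
             for (i = NF - 1; i >= 1; i--)
                 if ($i == "from") { count[$(i + 1)]++; break }
         }
         END { for (ip in count) print count[ip], ip }' | sort -k1,1nr -k2,2
}

# Usage (run as root so the log is readable):
#   failed_logins_by_ip < /var/log/auth.log | head -10
```

A single address with hundreds of failures points to brute forcing, while many addresses with a few failures each suggests a distributed attempt.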
# System logs:
journalctl --since "24 hours ago" | grep -i -E "(fail|error|attack|intrusion)"
dmesg | grep -i -E "(kill|attack|exploit)"
# Web server logs (if applicable):
tail -1000 /var/log/apache2/access.log | awk '$6 ~ /^"(POST|PUT|DELETE)$/ && $9 != 200' # Write requests that did not return 200
tail -1000 /var/log/nginx/access.log | awk '$9 >= 400' # HTTP error codes
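The web server checks above can be condensed into a per-client summary. This is a minimal sketch assuming the common/combined access log format, where the client address is field 1 and the status code is field 9; `http_errors_by_client` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count 4xx/5xx responses per client address in an access log.
# http_errors_by_client is a hypothetical helper; it assumes the combined log format.
http_errors_by_client() {
    awk '$9 ~ /^[45][0-9][0-9]$/ { count[$1]++ }
         END { for (ip in count) print count[ip], ip }' | sort -k1,1nr -k2,2
}

# Usage:
#   tail -1000 /var/log/nginx/access.log | http_errors_by_client | head -10
```

Clients that generate mostly 401, 403, or 404 responses are often scanning for login pages or known vulnerable paths.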
# Step 4: Network security analysis
# Check for suspicious connections:
netstat -an | awk '$6 == "ESTABLISHED" {print $5}' | sed 's/:[0-9]*$//' | sort | uniq -c | sort -nr # Connections per remote address (strips only the port, so IPv6 addresses stay intact)
# Look for: unusual IP addresses, high connection counts
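On systems without netstat, the same per-address count can be built from ss output. The sketch below assumes the peer address is the last column, which holds for `ss -Htn state established`; `peers_by_count` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count established connections per remote address from ss output.
# peers_by_count is a hypothetical helper; it expects the peer "address:port" in the last column.
peers_by_count() {
    awk 'NF { peer = $NF
              sub(/:[0-9*]+$/, "", peer)     # drop the trailing :port
              gsub(/^\[|\]$/, "", peer)      # drop brackets around IPv6 addresses
              count[peer]++ }
         END { for (p in count) print count[p], p }' | sort -k1,1nr -k2,2
}

# Usage:
#   ss -Htn state established | peers_by_count | head -20
```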
# Check DNS queries (if logging enabled):
tail -1000 /var/log/syslog | grep -i dns # DNS queries
# Look for: suspicious domain names, DNS tunneling attempts
# Step 5: Malware detection
# Install and run security scanners:
sudo apt update && sudo apt install clamav clamav-daemon
sudo freshclam # Update virus definitions
sudo clamscan -r /home /tmp /var/tmp # Scan common infection areas
# Rootkit detection:
sudo apt install rkhunter chkrootkit
sudo rkhunter --update
sudo rkhunter --check # Comprehensive rootkit scan
sudo chkrootkit # Alternative rootkit scanner
Security hardening and recovery:
# Immediate security measures:
# 1. Change all passwords:
sudo passwd root # Root password
passwd # Current user password
# Force a password change at next login (repeat for each affected account):
sudo chage -d 0 username # Expire the password immediately
# 2. Disable/remove suspicious users:
sudo usermod -L username # Lock user account
sudo userdel -r username # Remove user and home directory
# 3. Secure SSH access:
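# Edit /etc/ssh/sshd_config:
# PermitRootLogin no
# PasswordAuthentication no # Only after confirming that key-based login works
# PubkeyAuthentication yes
# Port 2222 # Optional: change default port (then allow 2222/tcp in the firewall below)
sudo sshd -t # Validate the configuration before restarting
sudo systemctl restart sshd # The unit is named "ssh" on Debian/Ubuntu
# 4. Configure firewall (add the allow rules before enabling, or a remote session can be locked out):
sudo ufw default deny incoming # Block all incoming by default
sudo ufw allow ssh # Allow SSH on port 22 (use "sudo ufw allow 2222/tcp" if the port was changed)
sudo ufw allow 80/tcp # Allow HTTP if needed
sudo ufw allow 443/tcp # Allow HTTPS if needed
sudo ufw enable # Enable UFW firewall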
# 5. Install intrusion detection:
sudo apt install fail2ban # Install fail2ban
sudo systemctl enable fail2ban
sudo systemctl start fail2ban
# Configure fail2ban (/etc/fail2ban/jail.local):
sudo tee /etc/fail2ban/jail.local > /dev/null << 'EOF'
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 3
[sshd]
enabled = true
port = ssh
logpath = /var/log/auth.log
maxretry = 3
EOF
sudo systemctl restart fail2ban
# 6. System hardening:
# Disable unnecessary services:
systemctl list-units --type=service --state=running
sudo systemctl disable service-name # Disable unused services
# Set proper file permissions:
sudo chmod 700 /root # Secure root home
sudo chmod 600 /etc/shadow # Secure password file
sudo chmod 644 /etc/passwd # User database
# Update system:
sudo apt update && sudo apt upgrade -y # Update all packages
sudo apt autoremove # Remove unused packages
# 7. Monitoring and logging:
# Enable process accounting:
sudo apt install acct
sudo systemctl enable acct
sudo systemctl start acct
# Set up log monitoring:
sudo apt install logwatch
# Configure logwatch to email daily reports
# File integrity monitoring:
sudo apt install aide
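sudo aideinit # Initialize database (typically written to /var/lib/aide/aide.db.new on Debian/Ubuntu)
sudo cp /var/lib/aide/aide.db.new /var/lib/aide/aide.db # Activate the new database if aideinit did not already install it
sudo aide --check # Check for changes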
Scenario 12: "System crashes and kernel panics"
Problem Description: System randomly crashes, kernel panic messages, system freezes, or unexpected reboots.
Crash analysis and troubleshooting:
# Step 1: Gather crash information
# Check system logs for crash details:
journalctl -b -1 # Previous boot logs
journalctl --since "24 hours ago" | grep -i -E "(panic|oops|bug|crash|segfault)"
dmesg | grep -i -E "(panic|oops|bug|crash)" # Kernel messages
# Check crash dumps (if configured):
ls -la /var/crash/ # Ubuntu crash dumps
ls -la /var/lib/systemd/coredump/ # Systemd core dumps
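When the logs are long, a quick count of crash signatures shows which failure type dominates. This is a minimal sketch; `crash_summary` is a hypothetical helper and the pattern list is an assumption that can be extended for the system at hand.

```shell
#!/bin/sh
# Sketch: count common crash signatures in a log stream read from stdin.
# crash_summary is a hypothetical helper; the pattern list is illustrative, not exhaustive.
crash_summary() {
    grep -i -o -w -E 'kernel panic|oops|segfault|out of memory|hung task' |
        tr '[:upper:]' '[:lower:]' |
        awk '{ count[$0]++ } END { for (k in count) print count[k], k }' |
        sort -k1,1nr -k2
}

# Usage:
#   journalctl -k -b -1 --no-pager | crash_summary    # kernel messages from the previous boot
#   dmesg | crash_summary                              # current boot
```

Many segfaults from one program suggest a software bug, while panics and oopses spread across unrelated code paths point more towards hardware.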
# Step 2: Hardware diagnostics
# Memory testing:
sudo memtester 1G 3 # Test 1GB RAM, 3 passes
# Note: This requires free memory, may need to run from single-user mode
# Check hardware logs:
dmesg | grep -i -E "(hardware|temperature|thermal|mce|edac)"
# Look for: overheating, memory errors, hardware failures
# CPU and temperature monitoring:
sensors # Hardware sensors (install lm-sensors)
# Look for: high temperatures, fan failures
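If lm-sensors is not installed, the kernel's thermal zones can be read directly. The sketch below assumes the standard sysfs layout, where each thermal_zone*/temp file holds millidegrees Celsius; `hot_zones` and its 80 degree default are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: report thermal zones at or above a threshold (degrees Celsius).
# hot_zones is a hypothetical helper; usage: hot_zones [THRESHOLD_C] [SYSFS_DIR]
hot_zones() {
    threshold_c=${1:-80}
    base=${2:-/sys/class/thermal}
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        milli=$(cat "$zone/temp" 2>/dev/null) || continue
        case "$milli" in
            ''|*[!0-9-]*) continue ;;        # skip unreadable or non-numeric values
        esac
        temp_c=$(( milli / 1000 ))           # sysfs reports millidegrees Celsius
        if [ "$temp_c" -ge "$threshold_c" ]; then
            echo "${zone##*/} ${temp_c}C"
        fi
    done
}

# Usage:
#   hot_zones 80        # list zones at or above 80 C
```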
# Hard drive health:
sudo smartctl -a /dev/sda # SMART data for /dev/sda
sudo smartctl -t short /dev/sda # Run short self-test
# Look for: reallocated sectors, pending sectors, hardware errors
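The attributes named above can be filtered out of the SMART table automatically. This sketch assumes the ATA attribute table printed by `smartctl -A`, with the raw value in column 10; NVMe drives use a different output format. `smart_warnings` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print failure-related SMART attributes whose raw value is non-zero.
# smart_warnings is a hypothetical helper; it reads `smartctl -A` output on stdin.
smart_warnings() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 {
             print $2 "=" $10
         }'
}

# Usage:
#   sudo smartctl -A /dev/sda | smart_warnings
```

Empty output means none of the three counters has moved; any non-zero value is a reason to back up the disk and plan a replacement.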
# Step 3: Software analysis
# Check for problematic drivers/modules:
lsmod # Loaded kernel modules
dmesg | grep -i -E "(module|driver)" | tail -20
# Check recent software changes:
tail -50 /var/log/dpkg.log # Recent package changes (Debian/Ubuntu)
tail -50 /var/log/yum.log # Recent package changes (CentOS/RHEL 7; newer releases log to /var/log/dnf.log)
# Step 4: System stability testing
# Stress testing (install stress-ng):
sudo apt install stress-ng
stress-ng --cpu 4 --timeout 300s # CPU stress test for 5 minutes
stress-ng --vm 2 --vm-bytes 1G --timeout 300s # Memory stress test
# Monitor during stress test:
watch -n 1 'dmesg | tail -10' # Watch for kernel messages
watch -n 1 'sensors' # Monitor temperatures
# Common crash solutions:
# 1. Memory issues:
# Boot with reduced memory to test:
# Add to kernel parameters: mem=2G # Limit to 2GB RAM
# Replace or remove faulty RAM modules
# 2. Overheating:
# Clean dust from fans and heat sinks
# Check thermal paste on CPU
# Improve case ventilation
# Reduce CPU frequency temporarily (valid names are listed in scaling_available_governors):
echo 'powersave' | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# 3. Driver issues:
# Boot with older kernel version
# Blacklist problematic modules:
echo 'blacklist module_name' | sudo tee -a /etc/modprobe.d/blacklist.conf
# 4. Filesystem corruption:
# Boot from rescue media and run:
sudo fsck -f /dev/sda1 # Force filesystem check (the filesystem must be unmounted)
sudo fsck -c /dev/sda1 # Also scan for bad blocks (ext2/ext3/ext4 only; slow)
# 5. Power supply issues:
# Check power supply stability
# Monitor voltage levels with hardware tools
# Test with different power supply if available
Recovery and Backup Procedures
Scenario 13: "Data recovery after accidental deletion"
Problem Description: Important files or directories were accidentally deleted, need to recover data before it's permanently lost.
Data recovery procedures:
# IMPORTANT: Stop writing to the affected filesystem immediately!
# Every write operation reduces chances of successful recovery
# Step 1: Assess the situation
df -h # Check available space on filesystems
mount | grep -E "(ext[234]|xfs|btrfs)" # Identify filesystem types
lsof | grep deleted # Find processes with deleted files still open
# Step 2: Check for simple recovery options
# Look in trash/recycle bin:
ls -la ~/.local/share/Trash/files/ # User trash (GNOME/KDE)
ls -la ~/.trash/ # Alternative trash location
# Check backup locations:
ls -la /home/.backup/ # User backups
ls -la /var/backups/ # System backups
ls -la /backup/ # Common backup location
# Step 3: Recover from deleted but open files
# If process still has file open:
lsof | grep deleted | grep filename # Find process with deleted file
# Copy from /proc filesystem:
sudo cp /proc/[PID]/fd/[FD] /path/to/recovery/location
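Where lsof is not available, the PID and FD values for the copy command can be found by walking /proc directly. This is a minimal sketch that relies on the kernel appending " (deleted)" to the link target of an unlinked file; `list_deleted_open_files` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list open file descriptors whose target file has been deleted.
# list_deleted_open_files is a hypothetical helper; output format is "PID FD TARGET".
list_deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        # Processes may exit or be unreadable while we scan; skip those entries
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *' (deleted)')
                pid=${fd#/proc/}
                pid=${pid%%/*}
                printf '%s %s %s\n' "$pid" "${fd##*/}" "$target"
                ;;
        esac
    done
}

# Usage (run as root to see descriptors of all processes):
#   list_deleted_open_files | grep filename
```

Each output line maps straight onto the copy command above: the first column is the PID and the second is the FD.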
# Step 4: Filesystem-specific recovery
# For ext2/ext3/ext4 filesystems:
sudo apt install extundelete # Install recovery tool
# Unmount filesystem (if possible):
sudo umount /dev/sda1 # Replace with actual partition
# Recover specific file:
sudo extundelete /dev/sda1 --restore-file path/to/deleted/file
# Recover all deleted files:
sudo extundelete /dev/sda1 --restore-all
# Recovered files go to ./RECOVERED_FILES/
# For more complex recovery:
sudo apt install testdisk # Advanced recovery tools (the testdisk package provides both testdisk and photorec)
sudo photorec # Recover files by type
sudo testdisk # Recover partitions and boot sectors
# Step 5: Advanced recovery with ddrescue
# Create a disk image first (safer); store the image and log on a different disk:
sudo apt install gddrescue
sudo ddrescue /dev/sda1 /path/to/recovery.img /path/to/recovery.log
# Then work on the image instead of the original disk
# Step 6: Alternative recovery methods
# Using grep to find file content on the raw disk (write the output to a different filesystem):
sudo grep -a -C 500 "unique string from file" /dev/sda1 > recovered_data.txt
# Replace the quoted string with text that was in the deleted file
# Using strings to extract readable text:
sudo strings /dev/sda1 | grep -A 10 -B 10 "known file content" > recovered.txt
Backup strategy implementation:
# Implement comprehensive backup strategy:
# 1. Daily incremental backups:
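sudo tee /usr/local/bin/daily-backup.sh > /dev/null << 'EOF'
#!/bin/bash
BACKUP_DIR="/backup/daily"
SOURCE_DIRS="/home /etc /var/www"
DATE=$(date +%Y%m%d)
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Create incremental backup (unchanged files are hard-linked to the previous snapshot;
# $SOURCE_DIRS is intentionally unquoted so it splits into separate paths)
rsync -av --link-dest="$BACKUP_DIR/latest" $SOURCE_DIRS "$BACKUP_DIR/$DATE/"
# Update latest symlink
rm -f "$BACKUP_DIR/latest"
ln -s "$DATE" "$BACKUP_DIR/latest"
# Clean old backups (keep 30 days); -mindepth 1 keeps find from ever matching $BACKUP_DIR itself
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
# Log backup completion
echo "Backup completed at $(date)" >> /var/log/backup.log
EOF
sudo chmod +x /usr/local/bin/daily-backup.sh
# Schedule daily backup (append to root's crontab; piping a single line to "crontab -" would replace the whole crontab):
(sudo crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/daily-backup.sh") | sudo crontab -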
# 2. Database backups:
sudo tee /usr/local/bin/db-backup.sh > /dev/null << 'EOF'
#!/bin/bash
BACKUP_DIR="/backup/databases"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
# MySQL backup (no password is passed here; assumes credentials in root's ~/.my.cnf or socket authentication):
mysqldump --all-databases --single-transaction --routines --triggers > "$BACKUP_DIR/mysql_$DATE.sql"
gzip "$BACKUP_DIR/mysql_$DATE.sql"
# PostgreSQL backup:
sudo -u postgres pg_dumpall > "$BACKUP_DIR/postgresql_$DATE.sql"
gzip "$BACKUP_DIR/postgresql_$DATE.sql"
# Clean old database backups (keep 14 days)
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +14 -delete
echo "Database backup completed at $(date)" >> /var/log/backup.log
EOF
sudo chmod +x /usr/local/bin/db-backup.sh
# 3. System configuration backup:
sudo tee /usr/local/bin/config-backup.sh > /dev/null << 'EOF'
#!/bin/bash
BACKUP_DIR="/backup/config"
DATE=$(date +%Y%m%d)
mkdir -p "$BACKUP_DIR"
# Backup system configuration
tar -czf "$BACKUP_DIR/etc_$DATE.tar.gz" /etc/
tar -czf "$BACKUP_DIR/boot_$DATE.tar.gz" /boot/
# Backup package lists:
dpkg --get-selections > "$BACKUP_DIR/package_list_$DATE.txt"
apt-mark showmanual > "$BACKUP_DIR/manual_packages_$DATE.txt"
# Clean old config backups (keep 60 days)
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +60 -delete
find "$BACKUP_DIR" -name "*.txt" -mtime +60 -delete
echo "Configuration backup completed at $(date)" >> /var/log/backup.log
EOF
sudo chmod +x /usr/local/bin/config-backup.sh
# 4. Remote backup synchronization:
sudo tee /usr/local/bin/remote-sync.sh > /dev/null << 'EOF'
#!/bin/bash
REMOTE_HOST="backup-server.example.com"
REMOTE_USER="backup"
LOCAL_BACKUP="/backup"
REMOTE_BACKUP="/remote-backup/$(hostname)"
# Sync to remote server and log the result
if rsync -av --delete-after "$LOCAL_BACKUP/" "$REMOTE_USER@$REMOTE_HOST:$REMOTE_BACKUP/"; then
    echo "Remote sync completed successfully at $(date)" >> /var/log/backup.log
else
    echo "Remote sync failed at $(date)" >> /var/log/backup.log
fi
EOF
sudo chmod +x /usr/local/bin/remote-sync.sh
# 5. Backup verification:
sudo tee /usr/local/bin/verify-backups.sh > /dev/null << 'EOF'
#!/bin/bash
BACKUP_DIR="/backup"
# Check if backup directories exist and contain recent files
if [ ! -d "$BACKUP_DIR/daily/latest" ]; then
    echo "ERROR: Daily backup missing" | mail -s "Backup Alert" admin@domain.com
fi
# Check backup age: pick the newest archive, then test whether it is older than 2 days
LATEST_BACKUP=$(find "$BACKUP_DIR" -type f \( -name "*.tar.gz" -o -name "*.sql.gz" \) -printf '%T@ %p\n' | sort -nr | head -1 | cut -d' ' -f2-)
if [ -n "$LATEST_BACKUP" ]; then
    BACKUP_AGE=$(find "$LATEST_BACKUP" -mtime +2)
    if [ -n "$BACKUP_AGE" ]; then
        echo "WARNING: Backup is older than 2 days" | mail -s "Backup Alert" admin@domain.com
    fi
fi
# Test backup integrity (find's exit status ignores failures of -exec ... \;, so collect the failing archives instead)
CORRUPT=$(find "$BACKUP_DIR" -name "*.tar.gz" ! -exec sh -c 'tar -tzf "$1" > /dev/null 2>&1' _ {} \; -print)
if [ -n "$CORRUPT" ]; then
    echo "ERROR: Backup integrity check failed" | mail -s "Backup Alert" admin@domain.com
fi
EOF
sudo chmod +x /usr/local/bin/verify-backups.sh
# Schedule all backup scripts:
# 0 2 * * * /usr/local/bin/daily-backup.sh # Daily at 2 AM
# 0 3 * * * /usr/local/bin/db-backup.sh # Daily at 3 AM
# 0 4 * * 0 /usr/local/bin/config-backup.sh # Weekly at 4 AM on Sunday
# 0 5 * * * /usr/local/bin/remote-sync.sh # Daily at 5 AM
# 0 6 * * * /usr/local/bin/verify-backups.sh # Daily at 6 AM
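Installing these schedule lines by hand is error-prone, because piping text to `crontab -` replaces the whole crontab. The sketch below appends an entry only when it is missing, so it can be run repeatedly. `merge_cron_entry` is a hypothetical helper name, and using root's crontab is an assumption based on the paths the scripts write to.

```shell
#!/bin/sh
# Sketch: add a cron entry without dropping existing ones or creating duplicates.
# merge_cron_entry is a hypothetical helper: it reads a crontab on stdin and prints it
# with ENTRY appended unless an identical line is already present.
merge_cron_entry() {
    # Pass the entry through the environment so awk does not interpret backslashes in it
    CRON_ENTRY=$1 awk '
        { print }
        $0 == ENVIRON["CRON_ENTRY"] { found = 1 }
        END { if (!found) print ENVIRON["CRON_ENTRY"] }'
}

# Usage (root's crontab):
#   sudo crontab -l 2>/dev/null | merge_cron_entry "0 3 * * * /usr/local/bin/db-backup.sh" | sudo crontab -
```

Running the same command twice leaves the crontab unchanged, which makes it safe to keep in a provisioning script.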
Emergency Procedures and System Recovery
Final Emergency Recovery Checklist
When all else fails - Emergency System Recovery:
# 1. Boot Recovery Environment:
# - Boot from USB/DVD installer
# - Choose "Rescue Mode" or "Try Ubuntu/Live Mode"
# - Open terminal
# 2. Mount and Access System:
sudo fdisk -l # Identify system partitions
sudo mkdir /mnt/system
sudo mount /dev/sda1 /mnt/system # Mount root partition
sudo mount /dev/sda2 /mnt/system/boot # Mount boot partition (if separate)
sudo mount --bind /dev /mnt/system/dev
sudo mount --bind /proc /mnt/system/proc
sudo mount --bind /sys /mnt/system/sys
sudo chroot /mnt/system # Enter system environment
# 3. Critical System Repairs:
# Fix boot loader:
grub-install /dev/sda
update-grub
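# Fix filesystem corruption (run from the live environment with the partition unmounted,
# i.e. before the mount commands in step 2; never run fsck on a mounted filesystem):
fsck -f /dev/sda1 # Force filesystem check
fsck -y /dev/sda1 # Auto-repair filesystem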
# Reset root password:
passwd root
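# Fix SSH access:
systemctl enable ssh # Takes effect on the next boot
systemctl start ssh # Ignored inside a chroot; run again after rebooting into the repaired system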
# 4. Data Rescue Priority:
# Copy critical data first:
cp -a /home/user/important /backup/location # -a preserves ownership, permissions, and timestamps
cp -a /etc /backup/location
cp -a /var/www /backup/location
# 5. System Restoration:
# Restore from backups:
rsync -av /backup/location/ /restored/location/
# Reinstall system packages:
apt update
apt install ubuntu-desktop # Or appropriate package group
# 6. Preventive Measures Post-Recovery:
# Implement monitoring:
apt install htop iotop nethogs
# Set up automated backups
# Configure system monitoring
# Document what went wrong and how it was fixed
This comprehensive troubleshooting guide provides systematic approaches to diagnosing and resolving the most common Linux system problems. Each scenario includes step-by-step procedures, command explanations, and prevention strategies to help maintain system stability and recover from various failure modes.