When troubleshooting server bottlenecks on AWS EC2 instances, you can combine your Linux expertise with AWS-specific tools for a comprehensive approach:
- System-level monitoring: Continue using familiar Linux tools like top, htop, free -m, vmstat, iotop, and iftop to monitor CPU, memory, disk I/O, and network usage from within the instance.
- CloudWatch metrics: AWS CloudWatch provides metrics for EC2 instances including CPU utilization, network throughput, and disk operations. However, memory usage isn't included by default.
- CloudWatch agent: Install the CloudWatch agent to collect and monitor detailed system-level metrics, including memory usage, which isn't available in the default EC2 metrics. This is particularly useful for identifying memory constraints.
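- Log analysis: Check system and application logs (/var/log/syslog on Debian/Ubuntu or /var/log/messages on Amazon Linux and RHEL-family systems, /var/log/dmesg, application-specific logs) for clues about performance issues.
- Resource constraints: For T-series instances (like t2.micro, t3.micro), monitor CPU credit usage through the CPUCreditBalance and CPUCreditUsage metrics in CloudWatch. These are burstable instances: they earn credits while running below a baseline CPU level and spend them when bursting above it.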
- Instance sizing: Verify that your workload is appropriate for the instance type. Running resource-intensive applications on smaller instances can lead to performance issues.
- EC2Rescue tool: For more serious issues, AWS provides EC2Rescue, which can help diagnose and troubleshoot problems. It can be run manually or automatically using AWS Systems Manager Automation with the AWSSupport-ExecuteEC2Rescue runbook.
- Swap file: If memory is the bottleneck, consider adding a swap file. Treat this as a stopgap: swap on disk is much slower than RAM, so heavy swapping itself degrades performance.
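Because memory usage is missing from the default EC2 metrics, a quick check from inside the instance can fill the gap until the CloudWatch agent is set up. The following is a minimal sketch, assuming a Linux kernel that reports MemAvailable in /proc/meminfo; the function name mem_used_pct is made up for illustration.

```shell
# mem_used_pct (hypothetical helper): print used memory as an integer percentage.
# "Used" is computed as MemTotal minus MemAvailable.
# Reads /proc/meminfo unless another file is passed as the first argument.
mem_used_pct() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      # Fail instead of guessing if either field is missing.
      if (total > 0 && avail != "") printf "%d\n", (total - avail) * 100 / total
      else exit 1
    }
  ' "${1:-/proc/meminfo}"
}
```

Running mem_used_pct with no arguments prints a single number that can be compared against whatever threshold you consider worrying, for example from a cron job.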
Common bottleneck scenarios include:
- Instances appearing to run normally but becoming unresponsive (often memory-related)
- Periodic high system load despite normal-looking metrics (could be application-specific tasks or I/O bottlenecks)
- Slow response times despite low CPU/memory usage (possibly network latency or application code issues)
If issues persist after your investigation, consider upgrading to a larger instance type or optimizing your application code and resource usage.
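For the first scenario above (an instance that looks healthy and then stops responding), the kernel log usually shows whether the OOM killer was involved. This is a rough sketch rather than an AWS tool: count_oom_kills is a hypothetical helper, and the pattern assumes the kernel's usual "Out of memory: Kill process" or "Out of memory: Killed process" wording, which can differ between kernel versions.

```shell
# count_oom_kills (hypothetical helper): count OOM-killer events in kernel log text.
# Reads the files given as arguments, or standard input if none are given.
count_oom_kills() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps that from
  # looking like a failure to the caller.
  cat -- "$@" | grep -cE 'Out of memory: Kill(ed)? process' || true
}
```

Typical use would be `sudo dmesg | count_oom_kills` or `count_oom_kills /var/log/syslog`. A non-zero count points toward memory pressure rather than CPU or network.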
Sources
Troubleshoot issues with Amazon EC2 instances - Amazon Elastic Compute Cloud
Run the EC2Rescue tool on unreachable instances - AWS Systems Manager
EC2 Ubuntu instance stops working everyday | AWS re:Post
EC2 Server Late response issue | AWS re:Post
AWS EC2 T2 Medium Performance Issue | AWS re:Post
Step-by-Step EC2 Bottleneck Troubleshooting Guide
- Start with AWS CloudWatch. CloudWatch provides instance metrics even if you don't log into the instance:
  - CPUUtilization: if it's above 80% for long periods, CPU is likely a bottleneck.
  - DiskReadOps / DiskWriteOps: high values might indicate I/O issues. These EC2 metrics cover instance store volumes only; for EBS volumes, look at VolumeReadOps / VolumeWriteOps in the AWS/EBS namespace instead.
  - NetworkIn / NetworkOut: check for bandwidth saturation.
  - StatusCheckFailed: shows instance-level issues (hardware or networking).
  Note: Enable detailed monitoring (1-minute granularity) if it's disabled.
- Check OS-level metrics (inside the EC2 Linux instance). From your Linux background:
  - top, htop, vmstat, iostat, free -m, df -h: CPU, memory, swap, disk I/O usage.
  - netstat, ss, iftop, nethogs: network traffic analysis.
  Example:
      top -o %MEM   # Sort by memory usage
      iotop         # Real-time I/O usage (if installed)
      dstat         # All-in-one overview (needs to be installed)
- Enable EC2 instance-level diagnostics. Install the CloudWatch agent to push memory and disk metrics to CloudWatch (the yum command applies to Amazon Linux, where the package is in the default repositories):
      sudo yum install amazon-cloudwatch-agent
      sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
  Log stream monitoring with CloudWatch Logs is optional but recommended. Consider SSM Agent for access without SSH.
- Review EC2 instance type vs. workload. If resource usage is high:
  - Are you using the right instance family (compute-optimized, memory-optimized, storage-optimized)?
  - Would burstable (T-series) behavior be a limiting factor? Check the CPUCreditBalance metric.
- Check EBS performance. If your app is I/O heavy:
  - Is the EBS volume gp2 or gp3? gp2 has burst behavior: check VolumeReadOps / VolumeWriteOps and BurstBalance.
  - Upgrade to gp3, io1, or io2 for more consistent IOPS.
- Use AWS Compute Optimizer (free tool). It can tell you whether the instance is over- or under-provisioned based on recent metrics.
- Capture a performance snapshot. If you are troubleshooting something transient:
  - Create a CPU profile (e.g., perf, flamegraph, py-spy for Python apps).
  - Use dstat or sar to log metrics over time.
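The first step's rule of thumb (CPUUtilization above 80% for long periods) can be checked mechanically. The sketch below is one possible way to do it, not an AWS-provided tool: sustained_above is a hypothetical helper that reads one value per line, oldest first, and succeeds if enough consecutive values exceed a limit. The commented AWS CLI call assumes a configured CLI; the instance ID and time range are placeholders.

```shell
# sustained_above LIMIT COUNT (hypothetical helper)
# Reads one numeric value per line on stdin, in time order.
# Exit status is 0 if COUNT or more consecutive values are greater than LIMIT.
sustained_above() {
  awk -v limit="$1" -v need="$2" '
    ($1 + 0) > (limit + 0) { run++; if (run >= need) hit = 1; next }
    { run = 0 }                      # streak broken, start counting again
    END { exit (hit ? 0 : 1) }
  '
}

# Possible data feed (placeholders: instance ID, start and end times):
#   aws cloudwatch get-metric-statistics \
#     --namespace AWS/EC2 --metric-name CPUUtilization \
#     --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
#     --start-time 2024-01-01T00:00:00Z --end-time 2024-01-01T06:00:00Z \
#     --period 300 --statistics Average \
#     --query 'sort_by(Datapoints, &Timestamp)[].Average' --output text \
#     | tr '\t' '\n' | sustained_above 80 6 && echo "CPU looks like a bottleneck"
```

With a 300-second period, a COUNT of 6 corresponds to half an hour above the limit. Requiring consecutive datapoints rather than a simple average keeps a single short spike from triggering the check.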