PERFORMANCE TUNING

Performance tuning isn’t done in a vacuum - OS, hardware, and process all interact. You debug the bottlenecks of the system as a whole until you narrow down the cause.

System performance bottlenecks can be categorized as follows:

I/O Issues

  • Disk I/O
    • If it takes us a long time to read, everything suffers, including the page faults I reference below

    • Read Disk Queue Length

    • Read time

    • If it takes us a long time to write data, we normally hold write locks while the write completes, which ties up OTHER processes as well. (A sketch for timing writes from application code follows this Disk I/O section.)

    • Write Disk Queue Length

    • Write time

    • Factors
      • disk speed - i.e. SSD vs. legacy spinning drive.

      • If spinning drive, speed of drive
        • 7200 RPM vs 10k RPM

        • amount of on-disk cache

      • SCSI controller caching - write-through vs. write-back (true) caching.

      • If NAS or SAN
        • utilization of fabric connection
          • is your fiber switch bandwidth saturated?

          • do you have clean optics?

        • What RAID strategy is in use?
          • Some are less effective for heavy writes (like RAID 5, which pays a parity read-modify-write penalty on each write)

          • Some are more effective for fast reads (striping, as in RAID 0/10)
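
Counters like disk queue length come from OS tools (perfmon, iostat), but you can also bracket your own I/O with timers to see the latency the application actually experiences. Below is a minimal Java sketch; the file name and sizes are arbitrary choices for illustration, and force(true) is what pushes the write past the OS cache to the device:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class WriteTimer {
        public static void main(String[] args) throws IOException {
            // Hypothetical scratch file; point this at the disk you want to test.
            Path scratch = Path.of("scratch.bin");
            ByteBuffer block = ByteBuffer.allocate(64 * 1024); // one 64 KB block

            try (FileChannel ch = FileChannel.open(scratch,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                long start = System.nanoTime();
                for (int i = 0; i < 1024; i++) {   // 64 MB total
                    block.rewind();
                    ch.write(block);
                }
                ch.force(true);   // flush past the OS cache to the device
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("64 MB written + synced in %d ms%n", elapsedMs);
            }
        }
    }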

  • Memory I/O
    • Usually not really an issue except on older systems or when working with larger datasets

    • Usually this is tied to memory speed and memory bus bandwidth, which are closely coupled with the CPU’s memory bandwidth since they work together. (A rough way to measure it is sketched at the end of this section.)

    • Some architectures handle this better - mostly, newer is better

    • Big Iron, a.k.a. mainframes, have MASSIVE I/O capabilities.
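
To get a feel for memory I/O on a given box, a crude micro-benchmark is to copy a buffer much larger than the CPU caches and compute the throughput. A rough Java sketch (results vary with JIT warm-up, NUMA placement, and cache effects, so treat the number as a ballpark):

    import java.util.Arrays;

    public class MemBandwidth {
        public static void main(String[] args) {
            // 256 MB of longs; large enough to blow out the CPU caches
            long[] src = new long[32 * 1024 * 1024];
            Arrays.fill(src, 42L);
            long[] dst = new long[src.length];

            // Warm up so the JIT compiles the copy path before we time it
            System.arraycopy(src, 0, dst, 0, src.length);

            long start = System.nanoTime();
            int passes = 10;
            for (int p = 0; p < passes; p++) {
                System.arraycopy(src, 0, dst, 0, src.length);
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            // Each pass reads 256 MB and writes 256 MB
            double gb = passes * 2 * (src.length * 8L) / 1e9;
            System.out.printf("Effective memory bandwidth: %.1f GB/s%n", gb / seconds);
        }
    }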

Capacity Issues

  • Memory Capacity
    • Without enough memory, you have a lot of page faults, which hurts performance
      • a page fault is when the page a process needs is not in physical memory and has to be pulled in from swap/disk-backed virtual memory.

    • page faults/sec
      • how often we find that what we need is not currently in physical memory
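
On Linux you can watch the fault rate yourself by sampling the kernel’s counters in /proc/vmstat (on Windows, perfmon’s Memory\Pages/sec plays the same role). A minimal Java sketch, assuming a Linux /proc filesystem:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class FaultRate {
        // Reads a counter like "pgmajfault 12345" out of /proc/vmstat (Linux only)
        static long vmstat(String key) throws IOException {
            for (String line : Files.readAllLines(Path.of("/proc/vmstat"))) {
                if (line.startsWith(key + " ")) {
                    return Long.parseLong(line.substring(key.length() + 1).trim());
                }
            }
            throw new IllegalStateException(key + " not found");
        }

        public static void main(String[] args) throws Exception {
            long before = vmstat("pgmajfault");   // major faults: had to go to disk
            Thread.sleep(1000);
            long after = vmstat("pgmajfault");
            System.out.println("major page faults/sec: " + (after - before));
        }
    }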

  • Processor Capacity
    • If you have limited processor capacity, you will see a lot of context switches as the scheduler round-robins between processes to give each a CPU time slice (see the counter sketch after this list)

    • Process-level context switches are expensive from a performance standpoint.

    • If you have a lot of things that need to run concurrently, it’s better to have a multiprocessor system (which is pretty much the norm now)

    • CPU Utilization
      • if every clock cycle is packed with instructions and you don’t have any more clock cycles to give, you’re done.

      • If your system runs at 88% CPU utilization under a normal load, you don’t have much headroom left when you need to push it hard for a big job.

      • You probably need more cores.
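
On Linux, vmstat 1 reports the context-switch rate directly; the same number can be sampled from the ctxt line of /proc/stat. A minimal Java sketch, again assuming a Linux /proc filesystem:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class CtxSwitchRate {
        // Reads the system-wide context switch counter from /proc/stat (Linux only)
        static long contextSwitches() throws IOException {
            for (String line : Files.readAllLines(Path.of("/proc/stat"))) {
                if (line.startsWith("ctxt ")) {
                    return Long.parseLong(line.substring(5).trim());
                }
            }
            throw new IllegalStateException("ctxt line not found");
        }

        public static void main(String[] args) throws Exception {
            long before = contextSwitches();
            Thread.sleep(1000);
            long after = contextSwitches();
            System.out.println("context switches/sec: " + (after - before));
        }
    }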

Programming/Process Issues

  • Use of blocking I/O
    • It’s better to run I/O in a separate thread or thread pool rather than tie it to the main program’s execution thread, since most I/O is blocking (see the thread-pool sketch after this list).
      • by blocking I mean the program basically halts until the I/O finishes before it continues.

    • Waiting on Network I/O
      • is the network slow? Then so is your I/O - and if it’s blocking I/O, even worse.

    • Thread Contention/locks
      • Concurrent programming requires thread synchronization and lock/unlock/notify mechanisms to access shared objects. Done poorly, this causes thread contention: your threads sit waiting in line to access an object for longer than needed, or in the worst case indefinitely - which is called thread deadlock (see the lock-ordering sketch below).
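
As promised above, here is a minimal sketch of moving blocking I/O off the main thread with a Java ExecutorService. The file name and pool size are arbitrary; the point is that the main thread only blocks when it actually needs the result:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class AsyncRead {
        public static void main(String[] args) throws Exception {
            ExecutorService ioPool = Executors.newFixedThreadPool(4);

            // The slow read happens on a pool thread; the main thread keeps going.
            // "data.txt" is a stand-in file name for illustration.
            Future<String> pending =
                    ioPool.submit(() -> Files.readString(Path.of("data.txt")));

            doOtherWork();   // main thread is NOT blocked on the read

            String contents = pending.get();  // block only when we need the result
            System.out.println("read " + contents.length() + " chars");
            ioPool.shutdown();
        }

        static void doOtherWork() {
            System.out.println("working while I/O is in flight...");
        }
    }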
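
And a minimal illustration of one classic deadlock cause, inconsistent lock ordering. If two threads take the same pair of locks in opposite orders, each can end up holding one lock and waiting forever on the other; acquiring locks in a single global order prevents the cycle:

    public class LockOrdering {
        private final Object lockA = new Object();
        private final Object lockB = new Object();

        // Deadlock-prone: a thread in here holds lockB while another thread
        // holding lockA (via transferGood) waits on lockB - and vice versa.
        void transferBad() {
            synchronized (lockB) {
                synchronized (lockA) { /* ... */ }
            }
        }

        // Safer: every thread acquires the locks in the same global order,
        // so a cycle of waiters can never form.
        void transferGood() {
            synchronized (lockA) {
                synchronized (lockB) { /* ... */ }
            }
        }
    }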

  • Programming paradigm
    • parallel/concurrent programming with multiple threads lets your process utilize multiple CPU cores and take advantage of the horsepower that is available.

    • legacy apps often aren’t multithreaded well, or at all. That means on a 24-core system, the OS can schedule your thread on any one of those cores, but only one: all processing is done in a single thread, so execution is linear, not concurrent.
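
A small Java sketch of the difference: the same computation run on one thread versus fanned out across every available core with a parallel stream. The workload here is a made-up sum, chosen only because it splits cleanly:

    import java.util.stream.LongStream;

    public class ParallelSum {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();
            System.out.println("available cores: " + cores);

            // Single-threaded: one core does all the work, the rest sit idle
            long serial = LongStream.rangeClosed(1, 1_000_000_000L).sum();

            // Parallel: the common fork/join pool fans the range out across cores
            long parallel = LongStream.rangeClosed(1, 1_000_000_000L).parallel().sum();

            System.out.println(serial == parallel ? "same result, more cores used"
                                                  : "unreachable");
        }
    }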

  • Use of on-disk files as caching mechanisms
    • bad idea from an I/O-time perspective (an in-memory alternative is sketched after this list)

    • bad idea from a disk life expectancy standpoint.
      • SSD life expectancy, for instance, is bounded by a finite number of write/erase cycles.

      • The less you write, the longer its life.
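
A minimal in-memory alternative to an on-disk cache file: a small LRU cache built on Java’s LinkedHashMap, which evicts the least recently used entry once a size cap is hit. The capacity of 2 is just to make the eviction visible:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // A tiny in-memory LRU cache: LinkedHashMap in access order evicts the
    // least recently used entry once maxEntries is exceeded. No disk writes.
    public class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        public LruCache(int maxEntries) {
            super(16, 0.75f, true);   // true = access order, needed for LRU
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }

        public static void main(String[] args) {
            LruCache<String, String> cache = new LruCache<>(2);
            cache.put("a", "1");
            cache.put("b", "2");
            cache.get("a");          // touch "a" so "b" becomes the eldest
            cache.put("c", "3");     // evicts "b"
            System.out.println(cache.keySet());   // [a, c]
        }
    }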

  • The use of ill-suited data structures
    • for example, treating a flat file parsed line by line as your data structure works fine when it’s a 200 kB file, not so much with a 4.5 GB file
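
A sketch of that data-structure point: pay the parse cost once and build a map, rather than re-scanning the file for every lookup. The users.txt file and its id,name format are made up for illustration; for data too big to fit in memory, the same idea points you at an indexed store or database rather than a HashMap:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;

    public class IndexedLookup {
        public static void main(String[] args) throws IOException {
            Map<String, String> index = new HashMap<>();

            // Pay the O(n) parse cost once, up front...
            for (String line : Files.readAllLines(Path.of("users.txt"))) {
                String[] parts = line.split(",", 2);
                if (parts.length == 2) {
                    index.put(parts[0], parts[1]);
                }
            }

            // ...then every lookup is O(1) instead of another full-file scan.
            System.out.println(index.get("42"));
        }
    }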