Challenges of testing downscaling of systems
I once came across an unexplainable phenomenon. We were in the middle of a POC, on a very tight schedule, and it was Thursday evening. We decided to leave the server 'on', without injecting any transactions or events, and to come back Saturday evening to continue working on it. After only a few hours (11 or 12, I believe), we found out that the system had crashed!
That was the first time I saw a big system fall so hard while not dealing with anything, just sitting in an idle state. Needless to say, from that day forward we have included an 'Idle state test' in the regression of every release.
This brings us to the question: do we know how to test systems that scale down? We always ask how to test upscaling, but downscaling is a big issue as well. Systems are 'used' to heavy communication and a high volume of events, and they run daemons, loggers, and other mechanisms to make sure everything is 'alive and kicking', but we seldom see tests of big systems that try to simulate small-scale traffic.
What other issues should be taken into consideration in downscaling testing?
- the logging mechanism
- things that should happen synchronously vs. asynchronously
- firing 'by request' processes, queries, and/or reports after a long time of inactivity
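
To make the 'Idle state test' idea a bit more concrete, here is a minimal sketch in Python of what such a test harness might look like. It is not the test we actually ran. The names run_idle_state_test, is_alive and on_demand_request are hypothetical, and the two callbacks stand in for whatever health check and 'by request' operation your own system exposes. The sketch leaves the system alone for a given period, polls that it is still up, and then fires a single on-demand request at the end.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class IdleTestResult:
    """Outcome of one idle-state run (hypothetical structure)."""
    checks: int = 0
    failures: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failures


def run_idle_state_test(
    is_alive: Callable[[], bool],           # assumed health probe for the system
    on_demand_request: Callable[[], bool],  # assumed 'by request' query or report
    idle_seconds: float,
    check_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> IdleTestResult:
    """Leave the system idle, poll its health, then fire one on-demand request."""
    result = IdleTestResult()
    elapsed = 0.0
    while elapsed < idle_seconds:
        # No transactions or events are injected here: the system just sits idle.
        sleep(check_interval)
        elapsed += check_interval
        result.checks += 1
        if not is_alive():
            result.failures.append(f"system down after {elapsed:.0f}s idle")
            return result
    # After the quiet period, exercise a path that is only triggered by request.
    if not on_demand_request():
        result.failures.append("on-demand request failed after idle period")
    return result
```

In a real regression run, idle_seconds would be many hours and the callbacks would talk to the actual server. The sleep parameter is injectable only so that the harness itself can be checked quickly without waiting.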