Trouble shooting performance issues is one of the real differences between an operational DBA (what we used to call Jr. DBA's) and a Sr. DBA.
Even with all the new tools, there is always that support call....It's been working fine for the past 5 months, and today the application is performing poorly, and of course it's a database issue. At this point i'm usually biting my lip and silently protesting, but often they are right, so it's time to dig into it.
What's the error ? Answer: We don't know
What's the issue ? Answer: We don't know, but it's slow.
These issues are particularly troublesome. Poorly written error handlers in the application, code developed by off-site vendors that support personnel don't understand, we've all seen the issues...AND now it's your job to fix it!
We have so many more tools now with SQL Server 2005 and 2008, that it becomes important to know which tool to use when, so the problem can be solved quickly. I even have a PDF document, "Troubleshooting SQL Server 2005/2008 Query Performance/Scalability Issues" that we used our plotter printer on 3x4 paper just to help.
I start in with profiler, as I want to try and see if there are any client side errors, since it seems that the applicaiton is not performing any error handling or trapping any errors. Sure enough we find an error on violation of a unique index, PUNT, back to the application team for more analysis; but why after 5 months would this just now come up (of course there have been no changes.)
We finally find an issue with a stored procedure that seems to be hanging or timing out and then progressing to the next step in the application causing the unique index error, since we aren't trapping for errors on the client side and profiler can't find client side time-outs, it was particulary bothersome to find. Even the procedure seems ok when executed through management studio and returns in sub-seconds, but something happens when the application runs it (still haven't figured this out).
Finally determine there is a degree of parallelism issue (DOP), which we found through analyzing the wait types. The wait types was not on our first list of items to check. Changing the DOP from 0 to 2, immediatly resolved the issue and dropped the CPU from 40% to 15%. Utlimately some indexing would help the query, but we can't build indexes on the fly as the load is so heavy that even a table lock for 10-15 seconds would generate 100's of support calls; DOP can be done on the fly with little effect.
Analysis of the query plans shows that as the data grew the DOP changed the query plan and began to cause issues with a wait type of CXPACKET.
This is why we still need real DBA's.