Challenge: Time wasted troubleshooting
PennWell’s publishing application allows authors to post content to the hundreds of online publications that PennWell supports. To serve PennWell’s customers, this application must be fast and reliable: if a web page loads slowly or fails to respond, customers file complaints with support. For Ben Hofmann, Manager of Web Development at PennWell, troubleshooting these issues was increasingly difficult because his team lacked visibility into the production environment.
“When we received a complaint from support, the infrastructure team would put the log files and thread dumps in one place for us, and then restart the server,” Hofmann said.
But with gigabytes of logs to review, it often took half a day before the architects could even begin troubleshooting the problem. Hofmann noted, “The lack of visibility into production and QA meant that we were always in a reactive mode. We were half a step behind our end users because we don’t have the staff to sit around and watch 110 websites all day.”
Furthermore, troubleshooting performance problems often ate up valuable development time. “We’re an agile shop,” Hofmann said. “We usually release code into production once or twice a week. But outages and fixes interrupt our sprints and impact productivity.”