Monday, November 21, 2016

The value of proactive integration logging and error notifications

By Steve Endow

Logging is often an afterthought with Dynamics GP integrations.  Custom integrations often do not have any logging, and while integration tools may log by default, they often log cryptic information, or log lots of events that are not meaningful to the user.

A few years ago I developed a custom RESTful JSON web service API for Dynamics GP that would allow a customer to submit data from their PHP based operational system to Dynamics GP.  They needed to save new customer information and credit card or ACH payment information to Dynamics GP, and they wanted to submit the data in real time.

I originally developed the integration with my standard logging to daily text log files.  While application logging purists (yes, they do exist) would probably criticize this method, my 12+ years of experience doing this has made it very clear that the simple text log file is by far the most appropriate solution for the Dynamics GP customers that I work with.  Let's just say that Splunk is not an option.

The GP integration worked great, and the logging was dutifully working in the background, unnoticed.  After a few months, the customer observed some performance issues with the GP integration, so I enhanced the logging to include more detailed information that would allow us to quickly identify performance issues and troubleshoot them.  In addition to enhancing the detail that was logged, I added some proactive measures to the logging.  I started tracking any delays in the GP integration, which were logged, and I added email notification in case any errors or delays were encountered.

The logging has worked very well, and has allowed us to identify several very complex issues that would have been impossible to diagnose without detailed, millisecond level logging.

Today there was a great demonstration of the value of the integration logging, and more importantly, the proactive nature of the error notification process.

This is an email that was sent to the finance department at 9:18am Central time.  It notifies the users that an error has occurred, the nature of the error, and recent lines from the log to help me quickly troubleshoot the issue.  The user won't be able to understand all of the details, but they will know within seconds that there was a problem, and they will see the customer record that had the problem.


Subject: GP Web Service - Registration Error - PROD

The Dynamics GP Web Service encountered the following errors on 11/21/2016 9:18:22 AM: 

SubmitRegistration for customer Acme Supply Co exceeded the timeout threshold: 15.61 seconds

Here are the most recent lines from the log file:

11/21/2016 09:18:06.505: 10.0.0.66 SubmitRegistration called for customer Acme Supply Co (client, Credit Card)
11/21/2016 09:18:06.505: (0.00) SubmitRegistration - ValidRegistrationHMAC returned True
11/21/2016 09:18:06.505: (0.00) RegistrationRequest started for customer Acme Supply Co
11/21/2016 09:18:06.739: (0.22) ImportCustomer returned True
11/21/2016 09:18:06.786: (0.28) InsertCustomerEmailOptions returned True
11/21/2016 09:18:22.43:        (15.53) Non-Agency ImportAuthNet returned True
11/21/2016 09:18:22.121: (15.60) Non-Agency ImportAzox returned True
11/21/2016 09:18:22.121: (15.60) RegistrationRequest completed
11/21/2016 09:18:22.121: (15.61) SubmitRegistration - RegistrationRequest returned True
11/21/2016 09:18:22.121: WARNING: SubmitRegistration elapsed time: 15.61


Just by glancing at the email, I was able to tell the customer that the delay was due to Authorize.net.  The log shows that a single call to Authorize.net took over 15 seconds to complete.  This pushed the total processing time over the 10 second threshold, which triggers a timeout error notification.

Subsequent timeout errors that occurred throughout the morning also showed delays with Authorize.net.  We checked the Authorize.net status web page, but there were no issues listed.  We informed the client of the cause of the issue, and let them know that we had the choice of waiting to see if the problems went away, or submitting a support case with Authorize.net.

The client chose to wait, and sure enough, at 10:35am Central time, Authorize.net posted a status update on Twitter about the issue.


That was followed by further status updates on the Authorize.net web site, with a resolution implemented by 11:18am Central time.


Because of the proactive logging and notification, the customer knew about an issue with one of the largest payment gateways within seconds, which was over an hour before Authorize.net notified customers.

We didn't have to panic, speculate, or waste time trying fixes that wouldn't resolve the issue (a sysadmin reflexively recommended rebooting servers).  The users knew of the issue immediately, and within minutes of receiving the diagnosis, they were able to adjust their workflow accordingly.

While in this case, we weren't able to directly resolve the issue with the external provider, the logging saved the entire team many hours of potentially wasted time.

So if you use integrations, particularly automated processes, meaningful logging and proactive notifications are essential to reducing the effort and costs associated with supporting those integrations.



You can also find him on Google+ and Twitter







No comments: