I had a call today with a customer who is implementing several complex Dynamics GP integrations. One of the integrations involves the following steps:
1. A SmartConnect map imports a large AP Invoice batch once per day
2. Once the batch is fully imported, SmartConnect inserts a database record telling Post Master Enterprise that the batch is ready to post
3. Post Master Enterprise posts the AP Invoice batch
5. Once the batch is finished posting, Post Master Enterprise automatically fires a stored procedure containing the customer's custom SQL script
5. The stored procedure will verify the batch posted successfully and then call a SQL command to trigger a second SmartConnect map
6. SmartConnect will then import AP Manual Payments and apply the payments to the posted AP Invoice batch
7. Once SmartConnect is done importing the Payments and Applications, it will then insert another database record telling Post Master to post the AP Manual Payment batch
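Under some simplifying assumptions, the hand-off sequence above can be sketched as a short orchestration outline. None of the function names below are real SmartConnect or Post Master APIs; each is a placeholder for the product-specific step described in the list.

```python
# Hypothetical sketch of the daily hand-off sequence. The three callables
# are placeholders for the SmartConnect and Post Master interactions
# described above, not actual product APIs.

def run_daily_ap_cycle(run_map, insert_post_request, wait_for_posting):
    # Steps 1-2: SmartConnect imports the AP Invoice batch, then inserts
    # the database record telling Post Master the batch is ready to post
    invoice_batch = run_map("AP_INVOICE_IMPORT")
    insert_post_request(invoice_batch)

    # Steps 3-5: Post Master posts the batch; its post-batch stored
    # procedure verifies the result and triggers the second map
    if not wait_for_posting(invoice_batch):
        raise RuntimeError(f"Batch {invoice_batch} failed to post")

    # Steps 6-7: import and apply Manual Payments, then signal Post Master
    # to post the payment batch
    payment_batch = run_map("AP_MANUAL_PAYMENT_IMPORT")
    insert_post_request(payment_batch)
    return payment_batch
```

Writing the chain out this way makes the coupling visible: every line depends on the one before it succeeding.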
In theory, this process will work just fine: both SmartConnect and Post Master can be configured to coordinate in this manner.
In any system integration with many steps or processes, there is a chance that one of the steps will not work properly: it may encounter an error, fail outright, or simply never fire.
And in any system integration that is synchronous and tightly coupled like this one, any such problem in the chain of events can prevent the rest of the process from completing. For example:
- What if, in Step 2, the SmartConnect map has a problem telling Post Master that the batch is ready to post?
- What if, in Step 3, Post Master finds that the Fiscal Period in Dynamics GP is closed, and it is unable to post the batch?
- What if the batch fails to post due to a data error?
- What if 682 out of 700 Invoices post successfully, but 18 fail to post due to invalid distributions?
- What if the custom SQL script in Step 4 encounters an error or table lock that prevents it from completing successfully?
- What if SmartConnect encounters an error applying the Manual Payments to the posted Invoices?
In isolation, each of these issues is relatively small, but collectively, there are probably at least a dozen such minor problems that could potentially prevent the entire sequence from completing successfully. Between the seven-plus steps in this integration and the dozen-plus potential errors that could cause a failure, I would anticipate that this integration will have some reliability issues over time.
But that's okay. That's what happens with complex systems--you often have equally complex failures and reliability challenges.
So how do you deal with this? How do you manage such a complex integration? And how do you increase reliability?
While speaking with the customer, a few things came to mind.
1. Error notifications. Both SmartConnect and Post Master Enterprise can be configured to send notifications in case of an error. As soon as a problem occurs, an email should be sent to a power user or administrator who has the tools to resolve the problem.
2. Proactive monitoring. Sometimes problems occur, but notifications don't get sent or received, or are simply missed in the torrent of email that we all receive daily. To supplement the error notifications, a monitoring job can be created that independently checks the status of the process. Did the Invoices get imported by 1pm? Did the Invoices get posted by 2pm? Did the Manual Payments get imported and applied by 3pm? Did the Manual Payments get posted by 4pm? Each of these checks is just a simple SQL query against the Dynamics GP company database, and these checks serve as a second layer of notification in case there is a problem or delay.
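As a sketch of what such a monitoring job might look like, the checks can be expressed as deadline/query pairs. The batch names and SQL below are illustrative assumptions: SY00500 is the Dynamics GP batch header table and PM20000 holds posted open payables transactions, but you should verify the table names, columns, and your batch naming convention against your own company database.

```python
# Illustrative deadline checks for the monitoring job described above.
# Each check pairs a cutoff time with a SQL query that should return a row
# once that step has completed. The batch names ('APINV_DAILY',
# 'APPMT_DAILY') and queries are placeholder assumptions; confirm them
# against your own Dynamics GP company database.
from datetime import time

CHECKS = [
    (time(13, 0), "AP Invoices imported",
     "SELECT 1 FROM SY00500 WHERE BACHNUMB = 'APINV_DAILY'"),
    (time(14, 0), "AP Invoice batch posted",
     "SELECT 1 FROM PM20000 WHERE BACHNUMB = 'APINV_DAILY'"),
    (time(15, 0), "Manual Payments imported and applied",
     "SELECT 1 FROM SY00500 WHERE BACHNUMB = 'APPMT_DAILY'"),
    (time(16, 0), "Manual Payment batch posted",
     "SELECT 1 FROM PM20000 WHERE BACHNUMB = 'APPMT_DAILY'"),
]

def overdue_steps(now, query_returns_rows, checks=CHECKS):
    """List steps whose deadline has passed but whose query finds nothing."""
    return [desc for deadline, desc, sql in checks
            if now >= deadline and not query_returns_rows(sql)]
```

A scheduled job could run `overdue_steps` every 15 minutes or so and email whatever it returns, giving you the second layer of notification independent of the products' own alerts.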
3. Asynchronous design. In my experience, synchronous, tightly coupled system integrations tend to be less reliable than asynchronous, loosely coupled integrations. So an integration could potentially be modified or redesigned to decouple one or more tasks in the chain, adopting more of a queue-based integration rather than a real-time integration. In this particular integration, that would likely be challenging and would require a redesign.
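To illustrate what "queue-based" means here, this is a minimal in-memory sketch of the pattern, assuming a shared work queue rather than direct step-to-step triggers. In a real deployment the queue would likely be a database table that each product writes to and an independent worker polls; all names below are illustrative.

```python
# Minimal sketch of a queue-based hand-off. Producers enqueue tasks; an
# independent worker drains the queue, and a failed item is retried later
# instead of blocking the rest of the chain. In practice the queue would
# be a database table, not an in-memory deque.
from collections import deque

class WorkQueue:
    def __init__(self):
        self._items = deque()

    def enqueue(self, task, payload):
        self._items.append({"task": task, "payload": payload, "attempts": 0})

    def process(self, handler, max_attempts=3):
        """Drain the queue; failed items are re-queued up to max_attempts."""
        done, dead = [], []
        while self._items:
            item = self._items.popleft()
            try:
                handler(item["task"], item["payload"])
                done.append(item["task"])
            except Exception:
                item["attempts"] += 1
                if item["attempts"] < max_attempts:
                    self._items.append(item)   # retry later
                else:
                    dead.append(item["task"])  # park for manual review
        return done, dead
```

The key property is that one task failing (say, a closed Fiscal Period) parks that item for retry or review while unrelated work keeps flowing.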
4. Integration "Supervisor". Rather than SmartConnect and Post Master working independently and simply handing off messages to each other, a "Supervisor" process or solution could be utilized that manages, or supervises, the entire process. The Supervisor asks SmartConnect to run a map, monitoring that task until the import is complete. It then hands a request to Post Master, monitoring that task until the posting is complete. Instead of monitoring two different applications and receiving notifications from each, administrators would watch a single central service, which could potentially reduce the administration and monitoring of the overall process.
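The supervisor idea can be sketched as a simple polling loop. The start/poll callables stand in for whatever interface SmartConnect and Post Master actually expose (database records, command-line runners, etc.), so they are assumptions here, not real product APIs.

```python
# Sketch of a "Supervisor" that owns the whole sequence: it starts each
# step, polls until that step reports completion, and only then hands off
# the next step. The start/is_done callables are hypothetical stand-ins
# for the real SmartConnect and Post Master interfaces.
import time

def supervise(steps, poll_interval=0.0, max_polls=100):
    """steps: list of (name, start_fn, is_done_fn) tuples, run in order.
    Returns the names of completed steps; raises if a step stalls."""
    completed = []
    for name, start, is_done in steps:
        start()
        for _ in range(max_polls):
            if is_done():
                completed.append(name)
                break
            time.sleep(poll_interval)
        else:
            raise TimeoutError(f"step '{name}' did not finish; alert an admin")
    return completed
```

Because one process sees every step, a single timeout or failure notification covers the whole chain, which is the administrative simplification described above.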
While these steps may help manage the process, they won't directly resolve the issues that cause failures or reliability issues in each step. To increase reliability, I think you have to target the specific failures and eliminate the causes. Increase data validation, add error handling, design each step to be able to proceed if some errors occur, and also utilize diagnostics to quickly track down the causes of errors.
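One concrete example of "design each step to be able to proceed if some errors occur" is the 682-of-700 scenario above: validate each invoice individually, import the good ones, and report the rest, instead of failing the whole batch. The balanced-distributions rule below is just one illustrative check for the invalid-distribution problem; real validation would be broader.

```python
# Sketch of partial-success handling: partition a batch into importable
# invoices and rejected ones, so 18 bad invoices don't block the other 682.
# The only rule shown (debits and credits must net to zero) is an
# illustrative example of a distribution check, not a complete validation.

def split_valid_invoices(invoices):
    """Return (importable_invoices, [(invoice_number, reason), ...])."""
    good, bad = [], []
    for inv in invoices:
        net = sum(d["debit"] - d["credit"] for d in inv["distributions"])
        if abs(net) < 0.005:  # allow for rounding
            good.append(inv)
        else:
            bad.append((inv["number"], f"distributions off by {net:.2f}"))
    return good, bad
```

The rejected list then becomes the input to the error notification and diagnostics steps, so a power user can fix and resubmit only the failures.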
There isn't a single answer for managing complex system integrations or improving reliability, but I think it is always a good idea to approach such complex integrations with an understanding of the potential errors and failures and an expectation that some additional measures may be required to increase reliability.
What have you done to make your complex system integrations easier to manage? Have you used any techniques to increase overall reliability?