Running webMethods Integration Server in containers

Organizations are moving from traditional deployments to containers and Kubernetes to increase business agility. This article discusses a number of topics that are relevant to actually reaching that goal of delivering value to the business faster through software. Because if you do it wrong, you just end up with yet another technology stack. And, as always, technology is not your biggest challenge.

These days it sometimes seems that there is absolutely no way around containers and container orchestration tools like Kubernetes or OpenShift. And there are obviously good reasons for that. In this article I want to look at some aspects specifically from the perspective of running webMethods Integration Server in containers.

My goal is to provide a foundation for discussing things in more detail. There are always specifics that need consideration and sometimes they indeed change the picture completely. But without an overview as a starting point for such a discussion, things get more complicated than they need to be.

I am aware that some points I present as rather black-and-white can be seen differently. I do this because of past experience. When I started in this field, many things we take for granted now didn’t exist or were considerably less sophisticated. So when I experienced the benefits that things like CI and good version control systems brought to my life, I became a strong proponent.

While this is overall a rather technical article, let me emphasize here that the main question is not a technical one. Instead it is how to increase business agility and deliver better software faster. Ultimately those are the decision drivers and we need to be aware of them.

Not everybody needs containers

While containers and their orchestration platforms solve a lot of issues, they also bring their own complexities. Those are considerable, given that these concepts and tools emerged from the context of running hundreds of thousands of application instances. This is not the workload that most organizations have to deal with. Therefore, you should really think long and hard about whether or not this road brings you a benefit, other than running a cool piece of technology.

The main alternative path may be Infrastructure-as-Code (IaC) to automate things within traditional virtual machines. In addition to being much simpler on a technical level, it is also easier to adopt from an organizational perspective.

Problems you need to solve first

You cannot take the third step before the first and expect good results. In other words, there are a number of topics that need to be addressed properly(!) before moving towards containers is even an option. Otherwise you will end up with a complexity monster and have a rather unpleasant life. That is especially true for the operations people who then have to deal with this mess.

If I had to come up with “the single biggest challenge”, it would certainly be manual steps during and after deployment. Containers and their orchestration platforms (like Kubernetes) are built on the premise of complete automation. So if you have not totally automated deployment already, you need to do this first.

Other areas are:

  • CI/CD: Continuous Integration and Continuous Delivery
  • DevOps: How to organize the collaboration of development and operations
  • SDLC: the overall Software Development Life-Cycle
  • Deployment automation
  • Configuration management
  • etc.

While this is an extensive list, the good thing is that a lot of tooling (much of it open source) and easily available knowledge exists. In that respect the situation is considerably better than 15 years ago. So nothing stops you from getting started quickly and with little to no direct investment. All you need is time.

But I want to start with containers anyway

So what do you do if you don’t have full deployment automation yet? Many organizations are in this position, and just saying “no” will typically be a career-limiting move. This means that you have to find a way that gets you started reasonably quickly. At the same time you must be careful not to shoot yourself in the foot.

Ok, but don't do this

Let’s start with what I would strongly advise against. It is the one area where taking a shortcut carries the biggest risk: the need to perform manual activities after the container instance has been started. This is fundamentally at odds with running containers, because a core aspect of using Kubernetes is to take the control over when an instance is started or stopped out of your hands and give it to the container orchestration platform.

Yes, you can probably come up with an approach to circumvent this. So it is possible to imagine a combination of Kubernetes and manual changes after deployment. But that only works under very specific circumstances. Determining whether or not this is the case for you requires expert knowledge of Kubernetes and containers, but also of the application, in this case webMethods Integration Server. However, this is only the technical side. And as usual that is the smaller problem.

It is much more important to fully understand the needs of the business. What does it mean for the business side if you cannot achieve zero downtime because someone has to update a few settings manually? Even worse: it affects not only the actual processing but also the ingress. In other words, you need to coordinate with the team that manages your load balancer.

And then there are the future needs of the business, because they determine what you will have to implement. The problem here is that nobody, including the business, knows what lies ahead. And that is ok, because nobody can look into the future. But it is a dangerous basis, or probably even a non-basis, for making technical decisions today.

My recommendation

So on the implementation side we have already identified your priority 1 goal: get rid of all post-deployment manual activities.

The other activity you should start with right away is on the conceptual level. You need to understand in sufficient detail what other activities are involved. This means things like how to use your VCS (version control system). Hint: if you want to deliver high-quality results in the shortest possible time, you must do trunk-based development. That means no feature branches and multiple commits per day for everybody. There is a video from Dave Farley, one of the inventors of Continuous Delivery, that looks into this in more detail.

Another important topic is the use of a binary repository for the deployment artefacts. It allows you to checksum-control your packages to protect against malicious changes. In certain industries this has been a requirement for decades, and I guess it will sooner or later affect you as well. And even if not, it is a really good idea to have something like this in place.

Which, by the way, also helps to prove what version of the code was in production when. Without that ability it is next to impossible to diagnose errors. Plus it is usually a compliance requirement to have proof-level documentation about this.
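As a minimal sketch of what that can look like in practice (the repository URL, package name, and credential variables are all placeholders; Artifactory, Nexus, and similar products offer comparable HTTP uploads), publishing a checksum-controlled package could be as simple as:

```bash
#!/usr/bin/env bash
# Minimal sketch: publish an Integration Server package ZIP to a binary
# repository. REPO_URL and the credential variables are placeholders.
set -euo pipefail

PACKAGE_ZIP="MyPackage-1.4.2.zip"
REPO_URL="https://repo.example.com/webmethods-packages"

# Record the checksum so the artifact can later be verified against tampering
sha256sum "${PACKAGE_ZIP}" > "${PACKAGE_ZIP}.sha256"

# Upload artifact and checksum; credentials come from the environment
curl -fsS -u "${REPO_USER}:${REPO_PASSWORD}" \
     -T "${PACKAGE_ZIP}"        "${REPO_URL}/${PACKAGE_ZIP}"
curl -fsS -u "${REPO_USER}:${REPO_PASSWORD}" \
     -T "${PACKAGE_ZIP}.sha256" "${REPO_URL}/${PACKAGE_ZIP}.sha256"
```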

Then there is the area of test automation. What kinds of tests do you need? What environments are needed for them? How do you ensure that all critical parts of the software are tested without boiling the ocean? Hint: test coverage metrics are important, but never(!) use them as a KPI. People will game the system and come up with loads of easy-to-write but worthless tests. And it is not the people who are to blame here, but the system.

The important part is not to understand all those things in detail right now. But you need to know what areas exist that need to be worked on. The typical flow is as follows (a pipeline sketch follows the list):

  • Developer with personal environment, completely isolated from others
  • Version Control System (usually Git today) that serves as the central sync point
  • CI server that gets triggered from the VCS when changes occur
    • Build and unit-test code
    • Upload build to binary repo (while you can use Git/GitHub here, that is fundamentally different from its VCS role!)
    • Build container image and upload to local registry
    • Trigger further test stages using the container image
  • Additional tests (all with automated deployment to the respective environment)
    • Integration
    • UI
    • Performance
    • etc.
  • Move-to-production and support

That is only a very rough overview and it needs tailoring to your organization. But it should give you a first impression of the various parts that we are talking about. Understanding all the nitty-gritty details is not needed here. Instead focus on the conceptual side and how it fits into the overall SDLC (software development life-cycle). The latter is really critical.

It should also be noted that the approach is fundamentally the same for containers and traditional deployments. So you can mix and match, and also work with a hybrid model. In other words, while some applications will be deployed using containers, others stay with virtual machines. You can also use the same common packages across both types.

What is different with containers?

In addition to understanding the SDLC and setting up the tool chain, there is one more critical part: Containers are different from virtual machines. I am not talking about the infrastructure level, where containers share the same operating system instance, while virtual machines are completely separate. What I mean is that the application needs to be designed and built differently in some respects.

Self-sufficient

Containers are, amongst other things, a different software delivery mechanism, and that needs to be taken into account. The aim is a container that is as self-sufficient as possible. It should contain everything needed to perform its duties. It is ok if you need to specify credentials for an external database management system. But that should pretty much be it.
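As a hedged illustration of that boundary (image name, database URL, and variable names are made up for this sketch), starting such an instance should need little more than the database coordinates:

```bash
# Illustrative only: a self-sufficient image where the external database
# coordinates are the only inputs. Image name and variable names are made up;
# 5555 is the usual Integration Server primary port.
docker run -d --name my-integration \
  -p 5555:5555 \
  -e DB_URL="jdbc:postgresql://db.example.com:5432/webm" \
  -e DB_USER="is_app" \
  -e DB_PASSWORD="${DB_PASSWORD}" \
  registry.example.com/integrations/my-app:1.4.2
```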

The most problematic anti-pattern for Integration Server is the auto-deploy functionality of Microservices Runtime. It allows you to place a number of packages into a directory, and during startup they will be deployed. While this looks convenient, it comes with various downsides. On the practical side, you will most likely violate several compliance rules. These days it is a common requirement, which auditors also check, that you can prove which software version was running in production at what time.

On the conceptual side it violates the idea of the container being self-sufficient. Which is also the problem with mounting packages into the container. Technically it is a bit different from auto-deployment, but the core problem is the same: The behavior of your container instance is not deterministic.

When contemplating the idea of mounting packages into containers, many people like the idea of a shared folder for all instances. It seems like a convenient way to ensure that the same version of a package is used on all cluster nodes. That is true. But it also means that you can never do a rolling upgrade. Instead, you need to take down the whole cluster every time.

Another implication of this approach is that you cannot easily run different versions of your package. (Of course you can have multiple shared folders, but is that still an easy solution?) Your initial reaction may now be “Yes, that is exactly what I want”. But there are a number of scenarios where it makes a lot of sense, or is even necessary, to have different versions of the same package. Think about the following situations (just a few examples, there are certainly more):

  • Migration scenarios in general
  • A/B testing
  • Running diagnostics in production (yes, sometimes you cannot avoid that)
  • Providing multiple, incompatible versions of the same API
  • Moving trading partners gradually from one interface version to another (usually a variation of the scenario before, but often comes with additional complexities due to the external nature of the API client)
  • Controlled roll-back to earlier version
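For contrast, with self-sufficient images a rolling upgrade or a controlled roll-back becomes a routine operation for the orchestration platform. A minimal sketch (deployment, container, and image names are hypothetical):

```bash
# Roll out a new version instance by instance; old and new versions
# briefly run side by side (names are hypothetical).
kubectl set image deployment/my-integration \
  msr=registry.example.com/integrations/my-app:1.5.0

# Watch the rollout progress...
kubectl rollout status deployment/my-integration

# ...and roll back in a controlled way if needed.
kubectl rollout undo deployment/my-integration
```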

Overall, the concept of being self-sufficient is one of the biggest value propositions of containers. The ideas discussed in this section have the appeal of looking simple at first glance. But they violate the concept of self-sufficiency, are therefore actually the opposite, and increase complexity enormously.

Configuration and run-time data

One crucial difference with containers is that their file system is ephemeral. You cannot persist anything in there in a reliable way; it is similar to the /tmp directory in Unix or Linux, which gets cleaned up at the next reboot. In addition, any change made while the container instance is running increases the size of the file system in a way that it never gets smaller. Even if you delete something, the nature of the underlying file system layering means that you cannot shrink it. Sooner or later your file system fills up, and then all hell breaks loose.

So you need to be aware of the different types of data when it comes to the question whether or not they will change. The non-static data then need to be placed into dedicated locations in the file system, which are mapped to volumes. That is storage which is external to both the container image and the running instance. It is usually backed by NFS shares or something similar.

Many programs are somewhat sloppy when it comes to the location where configuration and run-time data are stored. Integration Server falls into that category to an extent as well (this writing is based on v10.15 and future versions may be different). It doesn’t do a terrible job, but a little bit of extra work is needed.

I have heard people say that Integration Server in its current form is fundamentally unsuitable for containers, and I do not share this view. Yes, some things could be easier or more obvious. But on the other hand, we are looking at an architecture that is over 25 years old and is still highly suitable for the most demanding workloads. When it was devised, most Intel servers had one single-core CPU with less than 500 MHz, and memory was typically 256 MB or less. VMware didn’t exist at all, and almost nobody (outside the mainframe community) even knew what containers were. The fact that today we can make such an architecture work on Kubernetes with minor adjustments says a lot about how well designed it is.

But back to the practical side: what we aim for can be compared to the classical Unix file system. There, configuration is stored in the /etc directory, binaries are in /usr, and run-time data in /var. This is a bit simplified, but in a nutshell something like that is what we need.

There are a number of things with Integration Server that need to be looked at here:

  • $IS_HOME/config directory
  • $IS_HOME/logs directory
  • Packages’ ./config directories
    • Ports
    • Web service endpoint aliases
  • Adapter connections and listeners (also see blog post on updating connection details)

The $IS_HOME/logs directory is an obvious candidate for a volume. It will definitely contain a relatively large amount of data that also needs to be preserved, analyzed, etc. For the others it is a bit more nuanced. While they can potentially be changed at run-time, those changes neither require preservation, nor do they pose an issue in terms of disk space.
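As a hedged sketch of what this could look like in Kubernetes (image name and claim name are made up; the mount path assumes a default installation under /opt/softwareag), the logs directory can be mapped to a persistent volume:

```yaml
# Illustrative Kubernetes snippet: mount $IS_HOME/logs on a volume so log
# data survives container restarts. Names are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-integration
spec:
  replicas: 2
  selector:
    matchLabels: { app: my-integration }
  template:
    metadata:
      labels: { app: my-integration }
    spec:
      containers:
        - name: msr
          image: registry.example.com/integrations/my-app:1.4.2
          volumeMounts:
            - name: is-logs
              mountPath: /opt/softwareag/IntegrationServer/logs
      volumes:
        - name: is-logs
          persistentVolumeClaim:
            claimName: my-integration-logs
```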

To adapt the various settings to a certain environment, different approaches are needed. Some configurations are relatively easy to change from the outside, others require more work. This is a topic in its own right and also specific to the overall situation. I will therefore not go into further details here, but if you have specific questions, please let me know in the comments.

Generic relative to its job

A container image, from which an arbitrary number of instances will be spawned, is by definition generic. This is relative to its purpose, so it means different things for different types of application. A database management system (e.g. PostgreSQL or Microsoft SQL Server) can work with data from an ERP system, a CRM system, or machine telemetry from the shop floor; and that for any industry imaginable. Similarly, a GitLab instance can be used for any kind of software development. It doesn’t care about the programming language you use, or what kind of program you develop.

Integration Server (and comparable software) is a bit more difficult to place in this context. Because of its nature you can draw the line in different places, and not all of them are a good idea. You could be tempted to use a plain image and map all packages in (or even worse: use the auto-deploy feature of MSR). That is like taking a base image of Linux and doing the same with your database server. To stay with that comparison: The plain Integration Server image is your “operating system”, and your integration logic is conceptually the same as PostgreSQL.

This answers the question about environments: a container image should never be environment-specific. As a workaround it is possible to start that way, but sooner or later it will not be a pleasant experience. Because in reality it is not a single image that gets multiplied by the number of your environments. Even with a single application, you have multiple versions that need to be kept for compliance, as a fall-back in case of problems with the new release, etc.

The main problem here is not the storage capacity you need, since even with enterprise-grade systems prices have come down enough. Plus, containers are relatively efficient thanks to de-duplication. Your real problem is complexity. It comes from the multitude of images that you have to understand. You also have a high probability of manual steps in your deployment, because otherwise this whole situation would probably not exist in the first place.

Complexity reduces business agility, while containers are usually introduced to increase it. 

What should go into a container image?

You need to start thinking of Integration Server not as this one big platform, on which many packages are deployed that do very different things. Imagine instead what you would do on a traditional virtual machine, if more and more integrations are needed. Sooner or later you will start to split things up. There will be installations that only handle B2B communication with the external world. Other deployments handle logistics and warehousing. And again others integrate ERP and CRM with in-house custom applications.

The same approach is needed for containers, only a bit more “extreme”. You no longer group integrations, which are basically a specific kind of application, by certain criteria (see the examples in the paragraph above). Instead you have one container image per integration/application. This will greatly reduce the complexity per container image. And from an operations point of view you have much better visibility of what is going on.

Some people will now say “but now I have so many different images to deal with, how do I gain anything overall?” The gain comes from the fact that, due to the reduced complexity, you can automate things a lot easier. Remember: The main technical advantage of containers is that they allow a unified approach to run totally different applications, with the specifics of those applications encapsulated into containers. The savings in resource consumption are typically a nice by-product, but not the biggest lever. (This may be different, if you run millions of containers, but few organizations do that.)

With those conceptual aspects addressed, let’s go back to the practical side and look at the image building process. To make our lives easier (again, by reducing complexity) we should layer the creation of container images. Layering is an explicit feature of container images, and we are well advised to use it as much as possible. We will then typically end up with a hierarchy of images similar to this:

  • OS image: Base Linux image
  • Platform image: Integration Server incl. standard packages
  • Corporate image: Company-wide additions like helper packages, logging, etc.
  • Department image: Specific to the organizational unit
  • Application image: The actual integration logic and the only image that ever runs in production 

The list above is generic, and some changes will probably be needed for your organization. But it should give you a starting point for discussions (a Dockerfile sketch of the idea follows below).
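In Dockerfile terms, each level of the hierarchy is its own small Dockerfile that builds on the image below it. A minimal sketch, where all image names and paths are placeholders for whatever your build actually uses:

```dockerfile
# Illustrative only: the corporate image adds company-wide helper packages
# on top of the platform image (which would itself be built FROM your OS
# base image). Image names and paths are placeholders.
FROM registry.example.com/platform/webmethods-msr:10.15
COPY CorpLogging/ /opt/softwareag/IntegrationServer/packages/CorpLogging/

# In a separate Dockerfile, the application image then builds on this one:
#   FROM registry.example.com/corp/webmethods-msr:10.15
#   COPY MyApp/ /opt/softwareag/IntegrationServer/packages/MyApp/
```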

In closing

Hopefully I could convey the many fascinating and challenging aspects of moving towards containers with Integration Server. Much of the content is broadly applicable and not limited to Integration Server. Or to turn this around: you can leverage a lot of your existing knowledge about containers. And if you start your container journey with Integration Server, much of what you learn will be useful for other systems.

Let me summarize the core points that I see. They are primarily on the business and organizational level, since those are the foundation for the technical side.

  • Understand your business requirements and check whether containers are the best way to address them. There are other ways as well and they often have less complexity.
  • Moving to container orchestration will have an impact on your organizational structure. Also, it will require learning new concepts and tools for all people involved. Be sure you want to take on that burden.
  • Be clear about dependencies in your work streams. Only things that are independent of each other can be worked on in parallel, or by separate people, without a lot of synchronization effort. That is your secret sauce for scalability.
  • Remove the need for synchronization between people/groups wherever possible. It is by far the biggest reason for delays and misunderstandings. A good process has well-defined and easy-to-use interfaces. And I explicitly mean working across organizational boundaries, like teams. Having sync meetings every day usually means that your setup is not good.
  • Start working on the handover points/interfaces between process areas or steps. Example: Integration Server packages should be stored in the binary repository as ZIP files with the Java code already compiled. From there they can be retrieved by the container image build tool (see the sketch after this list). If right now you cannot create and upload those ZIP files to the binary repo automatically, then do it manually. But do not copy them around on the file system to feed the container image build tool; that would mean ignoring the interface.
  • Work backwards. I would start with the build of container images from the local file system. Then move your Integration Server packages to the binary repo. Then automate the creation and upload of those packages with a CI server.
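A minimal sketch of that handover interface (repository URL, artifact name, and build argument are placeholders): the image build retrieves the verified package from the binary repository instead of picking it up from an arbitrary file system location.

```bash
#!/usr/bin/env bash
# Minimal sketch of the binary-repo-to-image-build interface.
# URLs, names, and the Dockerfile build argument are placeholders.
set -euo pipefail

PACKAGE="MyPackage-1.4.2.zip"
REPO_URL="https://repo.example.com/webmethods-packages"

# Fetch the released artifact and verify it against its recorded checksum
curl -fsSO "${REPO_URL}/${PACKAGE}"
curl -fsSO "${REPO_URL}/${PACKAGE}.sha256"
sha256sum -c "${PACKAGE}.sha256"

# Only the verified artifact enters the image build
docker build --build-arg PACKAGE_ZIP="${PACKAGE}" \
  -t registry.example.com/integrations/my-app:1.4.2 .
```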

You can see from the length of this post alone that it is a complex subject. To get a feeling for how much more there is, just look around on YouTube or Google.

If all this is new to you, it may also be a good idea to get external help for some aspects. I am not a fan of huge consulting projects, because it usually means that you pay someone else to learn things, rather than learning them yourself. But it makes sense to me to selectively have someone ask the right questions, to speed things up and avoid common pitfalls.

If you want me to write about other aspects of this topic, please leave a comment or send an email to info@jahntech.com. The same applies if you want to talk how we at JahnTech can help you with your project.

© 2024 by Christoph Jahn. No unauthorized use or distribution permitted.
