News from JahnTech
webMethods Tools & Consulting
Newsletter #2

Thanks a lot for the wonderful reactions to the first newsletter. Apart from making me feel happy about the decision to start this, there is a far more important point: You confirmed that it provides value to you.

Bringing valuable content to you is, after all, the primary goal here. That is why I am always happy to hear your ideas about what should be in the newsletter. One request has already come in: you would like to know a bit more about what is going on "behind the scenes". Please check out the respective section below.

With that, let's jump into the content. I wish you all a great week.

Performance tuning for webMethods Integration Server

A topic that comes up time and again is performance. For me it is one of the most fascinating things, because it has so many facets. And while it can look daunting at first, with a structured approach there is nothing to fear.

Here is what to do when you run into performance issues with a solution that runs on webMethods Integration Server:

  • The biggest lever is almost always your own code. So instead of blaming other parts, start looking there.
  • On the flip side, do not start with generic tuning (e.g. garbage collection settings). It will likely not gain you more than 10-20%, and even that requires that you really know what you are doing.
  • More is not always better, especially when it comes to parallelism. Increasing the number of parallel threads can(!) help up to a point, but it is no silver bullet. There is always a threshold above which performance actually gets worse.
  • Understand how you test. Profiling single invocations allows you to improve exactly that. But is that really how your application runs? If you have 100 parallel threads in production, that is how you need to test as well (see the sketch after this list). Some folks say "I start single-threaded and switch to real-world load later". Don't do this, it is a complete waste of time.
  • You need to know the different types of CPU load. A 100% utilization does not necessarily mean that your CPU is really busy. It can also mean that it has to wait for disk or network I/O.
  • Even with virtualization you still need to be able to monitor the underlying physical hardware at some point. If the latter is over-provisioned, 100% CPU load in your VM is not really 100% of a physical core, but only whatever is left over by the other VMs.
  • While hardware has become a lot cheaper than 60 years ago, there is still a point where you need to spend developer time to make your code more "hardware-friendly". Martin Thompson came up with the term "mechanical sympathy" for this. You can work with the hardware or against it; it is your choice.
  • You almost always work with a database. While you don't need to become a vendor-certified DBA, you must know how to do at least a basic performance analysis (see the JDBC sketch at the end of this section). You also need to understand how "expensive" a certain operation can be. The same goes for things like indices and the data model in general.
  • Expect to spend most of your time finding the root cause of the issue. In many cases the fix itself will take considerably less time. A typical exception is when a data structure needs to be changed.
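To make the points about realistic parallelism and CPU load a bit more tangible, here is a minimal Java sketch of a load-test harness. It is only an illustration under assumptions: invokeService() is a placeholder for your real client code (for example an HTTP call into Integration Server), and the thread count and timings are made up. It compares wall-clock time with the summed per-thread CPU time; a large gap means the threads spent most of their time waiting (disk, network, locks) rather than computing.

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class ParallelLoadSketch {

        // Stand-in for the real work, e.g. an HTTP call that invokes an
        // Integration Server service. Replace with your actual client code.
        static void invokeService() throws Exception {
            Thread.sleep(50); // simulated I/O wait: the CPU is idle here
        }

        public static void main(String[] args) throws Exception {
            int threads = 100;        // mirror the parallelism you see in production
            int callsPerThread = 20;  // arbitrary number for this sketch
            ThreadMXBean tmx = ManagementFactory.getThreadMXBean();
            AtomicLong cpuNanos = new AtomicLong();

            ExecutorService pool = Executors.newFixedThreadPool(threads);
            long startWall = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                pool.submit(() -> {
                    long cpuStart = tmx.getCurrentThreadCpuTime();
                    try {
                        for (int i = 0; i < callsPerThread; i++) {
                            invokeService();
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        cpuNanos.addAndGet(tmx.getCurrentThreadCpuTime() - cpuStart);
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
            long wallNanos = System.nanoTime() - startWall;

            // If the summed CPU time is far below the wall-clock time, the test
            // spent most of its time waiting rather than computing.
            System.out.printf("Wall clock: %d ms, summed CPU time: %d ms%n",
                    wallNanos / 1_000_000, cpuNanos.get() / 1_000_000);
        }
    }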

This is just scratching the surface. Overall, you need to be able to think in layers, so that you can dissect the problem and work your way from the symptom to the root cause.
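And for the database point above: a basic performance analysis often starts with looking at the query plan. The following JDBC sketch assumes a PostgreSQL database with a hypothetical orders table and customer_id column; the connection URL and credentials are placeholders, and other databases use different syntax (e.g. EXPLAIN PLAN FOR in Oracle). The plan output tells you whether an index is used or the database falls back to a full table scan.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class QueryPlanSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection details; adjust to your environment
            // and make sure the PostgreSQL JDBC driver is on the classpath.
            String url = "jdbc:postgresql://localhost:5432/appdb";
            try (Connection con = DriverManager.getConnection(url, "app_user", "secret");
                 Statement st = con.createStatement()) {

                // EXPLAIN (PostgreSQL syntax) shows whether the database can
                // use an index or has to perform a full table scan.
                String sql = "EXPLAIN SELECT * FROM orders WHERE customer_id = 42";
                try (ResultSet rs = st.executeQuery(sql)) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }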

What should we learn from the CrowdStrike disaster?

In the video linked below, Dave Farley talks about the "meta-level" of what we should learn as an industry from the recent global outage caused by the Falcon security product from CrowdStrike.

Apart from the overall danger that comes with running software at the kernel level, we simply need to assume that sooner or later something will go wrong.

The latter may be due to a programming error. So, assuming I am the developer, that is something I need to try really hard to avoid. But that is the relatively easy part.

When we take a broader perspective, this is not only about proper software engineering. Instead, my view is that we are talking about BCM (business continuity management). And that comprises much more:

  • The software that I am responsible for
  • The rest of the computer system (hard-/software)
  • Environmental conditions like possible power outages
  • Recovery/roll-back mechanisms on the technical level
  • Worst case: Alternative processes if all else fails

There is certainly more to say, but that is a good starting point.

My "mantra": Be paranoid, assume failure, and have multiple levels of resilience (provided the consequences of failure warrant all this).

Behind the scenes

Over the next couple of weeks there will be a strong focus on CI/CD as a topic. So if there are particular aspects you would like to see covered, please let me know. This can be about the conceptual side (e.g. what are the underlying reasons why CI/CD is so powerful), the communication part (how to sell CI/CD internally), or the technical side.

Specifically, I am currently looking into ways to provide a "learning environment". You need a number of server instances (Integration Server, CI server, binary repository), and they can be set up in very different ways. Do you have a preference here? Please get in touch if you would like to provide input.

Quick links

Here are some curated links that might be interesting for you:

JahnTech, Inhaber Christoph Jahn
Nussbaumallee 61, D-64297 Darmstadt, Germany