IBM’s AS/400 was introduced in 1988, when PC servers didn’t exist yet, at least not as we have known them since roughly the mid-1990s. As a mid-range system the AS/400 is less powerful than a mainframe, but way above all the Intel-based PC technology available back then.
For a long time the AS/400 was a combination of specific hardware and operating system. Today it is an operating system, called “IBM i”, running on IBM Power servers. For more details, check the IBM product page. Since IBM has changed the name a couple of times, in my experience many people still use the original “AS/400”, which is what I will do in this article as well.
There are a number of AS/400 topics that I will not address here, either because they were not relevant to the project or because I was not involved deeply enough to write something meaningful about them.
Scenario and environment
The project that forms the basis for this post was done in the mid-2000s, which means that hardware was substantially less powerful than today. Especially the limited amount of RAM and the absence of SSDs had a severe impact in some situations. This not only meant that the same operation would take longer than today; more importantly, some approaches were simply impossible.
One quick example of the latter is what was considered a reasonable maximum size for an Oracle database on somewhat normal server hardware. With a cluster of Sun Fire E25K machines, each node equipped with 72 CPUs and 1 TB of RAM, you could probably have done something similar to what is possible today on a machine that costs around 50k Euros. But at the time, on typical server hardware, you would try hard to keep the amount of data below 10 GB.
That is of course a generalization, but it should illustrate the boundaries. As to the financial side, I have not been able to find a period price for the aforementioned Sun machine. Yet I’m pretty certain that “only” one of them would have cost at least a six-figure amount of money. And since Oracle was usually licensed per core, you can imagine what 72 dual-core CPUs would have added.
Project goal
We were tasked with replacing an existing system that basically routed flat files (some up to around 5-6 GB) between machines. The transport was handled by a message broker, roughly comparable to something like Apache ActiveMQ. The actual processing was done by scripts (shell and Perl, if I remember correctly).
The fact that we had to “only” replace an existing system “and add some minor enhancements” created some friction at the start of the project. This has nothing to do with the AS/400 but is worth mentioning, because it had a considerable impact on the “atmosphere”. The initial specification I got was literally one and a half pages long and the customer’s expectation was that it should be sufficient. After some back and forth they accepted the need for a specification, which took between 2 and 3 months to complete and had close to 200 pages.
Challenges
Below I will go into the details of the biggest challenges we faced. As an overall theme I would say that it was the sum of those aspects that made things really difficult. Yes, individually the various points also had their “interesting sides”. But it was really their coming together, often in quite unexpected ways, that turned some of my hair grey ;-).
File size
One of the core issues with the old system was that it had no notion of priorities and was also single-threaded. So a file that was only needed in 3 days and took 6 hours to process could block another file that was needed in 2 hours and would be completed in 5 minutes. The high-priority file would then have to wait for hours until the blocking (and low-priority) file was completed. This caused massive problems with the clients of our customer, and sometimes even cost a lot of money.
In order to compensate for such situations we had to implement something like a fast-track route for processing high-priority files. It meant that there was a dedicated “thread pool” in the system only for such urgent files. Processing of regular files would continue in parallel, while resource consumption on that side was limited.
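To illustrate the idea, here is a minimal sketch in plain Java. It is not how the feature was actually built inside Integration Server, and all names are made up: one small pool is reserved for urgent files, so they never queue behind bulk work, while the pool for regular files is kept deliberately small to limit its resource consumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FastTrackRouter {

    // Small pool reserved exclusively for high-priority files.
    private final ExecutorService fastTrack = Executors.newFixedThreadPool(2);

    // Regular files share a deliberately limited pool so that they cannot
    // starve the rest of the system of CPU and memory.
    private final ExecutorService regular = Executors.newFixedThreadPool(3);

    public void submit(FileJob job) {
        if (job.isHighPriority()) {
            fastTrack.submit(job);   // never waits behind a 6 GB bulk file
        } else {
            regular.submit(job);
        }
    }

    public void shutdown() {
        fastTrack.shutdown();
        regular.shutdown();
    }

    // Placeholder for the real processing logic.
    public interface FileJob extends Runnable {
        boolean isHighPriority();
    }
}
```

The pool sizes here are arbitrary; in reality they would have to match the resources of the machine.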
Also, we had to be able to process multiple files in parallel. That is not such a big deal in and of itself. But imagine how that changes if the size of each of those files is multiple times the amount of RAM in your machine. Today, processing 5 files in parallel that each have a size of 6 GB is not a huge problem. Five times 6 GB makes 30 GB, and we multiply it by, let’s say, 5 again, because with things like double-byte encoding etc. they need a lot more RAM than disk space. So we end up with 150 GB. Today my 12-year-old VM server at home has 256 GB of RAM. But our server (mid-2000s, remember 😉 ) only had 1 GB.
This meant that we could never load a file in its entirety, but had to rely on streaming. Initially we had also asked for a RAM expansion, but the vendor quoted 10k Euros per additional gigabyte, so that was not an option. Besides, it is not a good idea to work with large files without streaming anyway. You never know when one comes around the corner that is bigger than anticipated. What if that file contains some really critical transactions? Even if you have it in writing from your customer that this will never happen, the damage is done. And the person who confirmed the maximum size will always find a way to blame you, rather than risk losing their job.
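To make the memory argument concrete, here is a minimal sketch (plain Java, not the project code) of what streaming means in practice: the file is processed chunk by chunk with a small fixed buffer, so memory usage stays the same no matter whether the file has 5 MB or 6 GB.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamingProcessor {

    public static void main(String[] args) throws IOException {
        long total = 0;
        byte[] buffer = new byte[64 * 1024];   // fixed 64 KB buffer, regardless of file size

        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Hand the chunk over to the actual processing logic here.
                total += read;
            }
        }
        System.out.println("Processed " + total + " bytes with a 64 KB buffer");
    }
}
```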
File encodings
The machines we had to connect with were literally located across the globe. And since names and addresses were part of the game, staying ASCII-only was not an option. In theory the AS/400 is very powerful at automatically handling and converting between different encodings.
That power, however, comes with a certain complexity, and over the years too many people had been involved with the many AS/400 servers. So in many cases, but of course not all ;-), the built-in capabilities had been disabled. That meant we had to handle the various conversions in Integration Server.
Technically this was one of the smaller issues, but it meant a lot of configuration. And sometimes, due to “left-overs”, a system would still handle some files on its own. Over time we all (incl. the customer folks) got used to the fact that every now and then we had to add handling for yet another exception.
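For illustration, here is a hedged sketch of such a conversion in plain Java, again done in a streaming fashion so that large files remain manageable. The mapping of a code page to a Java charset name (I use “Cp037”, an EBCDIC code page, as an example) is an assumption and depends on the sending system and the JDK in use; UTF-8 as the target reflects what Integration Server used internally.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingConverter {

    public static void convert(String inFile, String outFile, String sourceCharsetName)
            throws IOException {
        // "Cp037" (EBCDIC US/Canada) is only an example; the correct charset
        // depends on the code page actually used by the sending system.
        Charset source = Charset.forName(sourceCharsetName);

        try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(new FileInputStream(inFile), source));
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(outFile), StandardCharsets.UTF_8))) {

            char[] buffer = new char[64 * 1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);   // convert chunk by chunk, never the whole file
            }
        }
    }
}
```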
Server performance
When people think about servers, one of the most common associations is that of speed. And while today’s notebooks often put the most expensive servers from 20 years ago to shame, the current server models are usually still far ahead of those powerful notebooks. Having owned PCs since 1990 with a strong interest in hardware (I even sold hardware professionally in the late 1990s), I remember that it has always been like that. So imagine my surprise when I discovered that my decent but certainly not high-end notebook was faster than the customer’s server. At least when it came to using the built-in flat file capabilities of Integration Server. So what had happened?
This project was the first time I came in touch with the WmFlatFile package, which, while being powerful, is not exactly beginner-friendly. The complexity stems from its EDI background. So while it is great that you can do much more than handle a simple CSV file, that power gets in the way if all you have is a CSV file. In our case it was not a CSV file but one with fixed-length fields; the argument still applies, though.
So I had burned some midnight oil in the hotel and was very proud when my first implementation with WmFlatFile finally worked. A considerable portion of that time had gone into the use of streaming, something I had not used on Integration Server before either. So you can probably imagine how happy I was that it not only worked but also showed acceptable performance.
Guess what happened the next morning. I had proudly announced to the customer that things looked really promising and had even shown my prototype on the laptop. Then we “transplanted” things over to the server, ran a test, and were shocked: the processing took more than twice as long there compared to my notebook (single-core CPU, conventional hard disk at 4200 RPM, etc.).
So that evening in the hotel I re-implemented everything, but this time in pure Java. At the time my Java skills were, shall we say, limited, so it was a stressful effort. And without my good friend Philipp K. I never could have done it.
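Conceptually, the pure-Java replacement was not much more than the following sketch. The field layout is invented for illustration, and it assumes the content has already been converted to UTF-8 with one record per line.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class FixedLengthReader {

    // Invented example layout: customer number (10), name (30), amount (12).
    private static final int[] FIELD_LENGTHS = {10, 30, 12};

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(args[0]), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {   // one record at a time, streamed
                int pos = 0;
                for (int len : FIELD_LENGTHS) {
                    String field = line.substring(pos, pos + len).trim();
                    pos += len;
                    // Hand the field over to the actual mapping logic here.
                    System.out.print("[" + field + "] ");
                }
                System.out.println();
            }
        }
    }
}
```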
FTP transfer mode
One of the really hard things was moving the flat files around via FTP. How can that be? After all, it is a really basic transfer method that has been around for decades and just works. Well, usually.
Our problem was the EBCDIC encoding of the files. That is a system coming from mainframes, and as such it is different from the encodings we non-mainframe people usually deal with. So you cannot use the ASCII transfer mode of FTP.
So what do you do then? Of course you transfer in BINARY mode and handle the encoding in your code. That is great, except that it does not work on an AS/400 because of its file system. The traditional file system on an AS/400 is very different from what you know from the open systems world. It is underpinned by a built-in version of the DB2 database, and every line is technically stored as a row in the database. There is also the Integrated File System (IFS), which is similar to what you find on Linux/UNIX systems. At the time of the project, however, the customer was not using the IFS, so we had to live with the traditional file management, also known as data management.
Why is that such a problem? Because the database approach is used not only for text files but also for truly binary files. So when you retrieve a file in BINARY mode with FTP, here is what the AS/400 FTP server does: it reads the first line (think of a database table with just a single, really wide column) and sends it over the wire to the FTP client. Then it jumps to the next row in the database, reads the column, and sends the contents to the FTP client again. That continues until the end of the file is reached.
But because it is a binary file, the contents must not be altered in any way. That is of course the only reasonable approach. So what happens when you transfer a file this way that is actually a text file with discrete lines? You may now realize that “on disk” the end-of-line does not exist as such. Instead, it is implicitly marked by the width of the column in the database. An FTP transfer in ASCII mode handles this by inserting the end-of-line in the right place, but BINARY mode does not.
So the BINARY transfer mode is not really an option either. Theoretically you could insert the appropriate character (which is not LF or CR-LF, but hex 15) every n bytes. But apart from the performance penalty this comes with two caveats: 1) What if some systems, but not all, use double-byte encodings? 2) You need to know the length of each file type and manage it as a configuration value or query the metadata. The former will likely also create issues, should the file format ever be changed (which it will). And the latter is not trivial to implement.
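Just to illustrate what that theoretical approach would have looked like (we never built it), here is a sketch that injects the EBCDIC newline 0x15 after every n bytes. The record length has to come from configuration or metadata, which is exactly the weak point described above; and as soon as double-byte encodings enter the picture, counting bytes is no longer the same as counting characters.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class RecordSeparatorInjector {

    private static final int EBCDIC_NEWLINE = 0x15;

    /**
     * Copies in to out and writes 0x15 after every recordLength bytes.
     * The record length must be known per file type, which is the weak
     * point of this whole approach.
     */
    public static void inject(InputStream in, OutputStream out, int recordLength)
            throws IOException {
        int bytesInRecord = 0;
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
            bytesInRecord++;
            if (bytesInRecord == recordLength) {
                out.write(EBCDIC_NEWLINE);
                bytesInRecord = 0;
            }
        }
    }
}
```

In practice you would at least wrap the streams in buffered variants to keep the byte-by-byte loop from becoming a bottleneck.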
Luckily, the FTP server on the AS/400 knows an additional transfer mode, CCSID (short for “Coded Character Set Identifier”, the system used by the AS/400 to handle encodings and code pages). So you can tell the FTP server to transfer the file in a certain encoding by issuing the FTP command “QUOTE TYPE C xxx”, where xxx is a valid CCSID. Problem solved, right? Well, no.
The problem is that the FTP client of Integration Server only knows ASCII and BINARY as transfer modes. Yes, you can issue a custom command using the pub.client.ftp:quote service. But once you invoke the pub.client.ftp:get service, it will automatically switch back to ASCII under the covers.
In the end I built a custom FTP client, which worked fine. Fun fact: The actual build took less time than our research into the various possible approaches.
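My client back then was written from scratch; today you would more likely reach for a library. Purely as a sketch of the idea, and explicitly not the code from the project, this is roughly how it could look with Apache Commons Net. The CCSID 1141 is just an example, and whether the library really leaves such a custom TYPE setting untouched during the transfer would need to be verified against the version in use.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPReply;

public class CcsidFtpDownload {

    public static void download(String host, String user, String password,
                                String remoteFile, String localFile) throws IOException {
        FTPClient ftp = new FTPClient();
        try {
            ftp.connect(host);
            if (!FTPReply.isPositiveCompletion(ftp.getReplyCode())) {
                throw new IOException("FTP server refused connection");
            }
            if (!ftp.login(user, password)) {
                throw new IOException("FTP login failed");
            }
            ftp.enterLocalPassiveMode();

            // Ask the AS/400 FTP server for CCSID transfer mode (1141 is just an example).
            if (!FTPReply.isPositiveCompletion(ftp.sendCommand("TYPE", "C 1141"))) {
                throw new IOException("Server did not accept TYPE C 1141");
            }

            InputStream remote = ftp.retrieveFileStream(remoteFile);
            if (remote == null) {
                throw new IOException("RETR failed: " + ftp.getReplyString());
            }
            try (InputStream in = remote;
                 OutputStream out = new FileOutputStream(localFile)) {
                byte[] buffer = new byte[64 * 1024];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);   // stream to disk, never into memory
                }
            }
            if (!ftp.completePendingCommand()) {
                throw new IOException("File transfer did not complete cleanly");
            }
            ftp.logout();
        } finally {
            if (ftp.isConnected()) {
                ftp.disconnect();
            }
        }
    }
}
```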
PKZIP
Some of the files were also transmitted as ZIP archives, because quite a few locations had really slow Internet access (I seem to remember 64 kBit/s). And since the files were fixed-length flat files, they contained a lot of spaces. So with compression the size would often be reduced by at least 80% (sometimes more than 95%), which easily offset the time for packing/unpacking.
When I first learned that there is a version of PKZIP for the AS/400, I was a bit surprised, having worked with PKZIP in the early 1990s on my first PC. So when I got my first test files, I unpacked them on Integration Server using a custom Java service and then ran the conversion to UTF-8, the encoding used on Integration Server. That didn’t work at all; it just produced garbage.
So I pulled up my trusted UltraEdit editor, which was really the reference in those days. At the time it was the only editor (to my knowledge) that worked with really large files, had a great hex mode, and offered many other useful features. Today I prefer Notepad++, but that didn’t exist back then. Imagine my surprise when I discovered that PKZIP had inserted LF (hex 0a) as the end-of-line character, instead of hex 15 for the EBCDIC encoding. So we had received a wonderful mix of EBCDIC content and ASCII end-of-line characters.
At the time it was not possible to tell PKZIP which character to use for end-of-line, so I ended up with a service that replaced 0x0a with 0x15 on the fly. Again a long evening in the hotel with my, at the time, less-than-impressive Java knowledge. So the next day I informed the customer that the problem had been solved. They liked that and wanted to see it in action right away. The creation of a test file was triggered, which I then retrieved using my custom FTP client from above.
To my utter surprise and shock it didn’t work, however, and we got messed-up data once more. I felt really embarrassed. So I turned to UltraEdit again and pretty quickly found the issue: in this file PKZIP had not inserted LF but a CR-LF sequence. The customer’s AS/400 people did some digging and finally found that the character used for end-of-line depends on the version of PKZIP. There was no way to customize it on the AS/400 side, and bringing all machines (dozens across the globe) to the same version of PKZIP was not an option either.
We ended up adding a check to detect what we were dealing with. The alternative would have been a configuration parameter. But since we could exclude the possibility of CR or LF being a valid character at the beginning of the next line, the check was better from a maintenance perspective and also avoided the risk of a misconfiguration.
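As a rough sketch of that logic (the real version was an Integration Server service, and this one relies on the assumption just mentioned, namely that CR and LF never occur as regular data bytes), here is a streaming filter that turns both LF and CR-LF into the EBCDIC newline 0x15:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class EbcdicNewlineNormalizer {

    private static final int LF = 0x0a;
    private static final int CR = 0x0d;
    private static final int EBCDIC_NEWLINE = 0x15;

    /**
     * Copies in to out, replacing both LF and CR-LF with 0x15 on the fly.
     * Relies on the assumption that CR and LF do not occur as regular
     * data bytes in these files.
     */
    public static void normalize(InputStream in, OutputStream out) throws IOException {
        int b;
        while ((b = in.read()) != -1) {
            if (b == CR) {
                int next = in.read();
                if (next == LF) {
                    out.write(EBCDIC_NEWLINE);   // CR-LF sequence
                } else {
                    out.write(b);                // lone CR, pass through unchanged
                    if (next != -1) {
                        out.write(next);
                    }
                }
            } else if (b == LF) {
                out.write(EBCDIC_NEWLINE);       // plain LF
            } else {
                out.write(b);
            }
        }
    }
}
```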
In closing
Given that it has been almost 20 years since the project was done, I have probably forgotten some details. Also, technology has changed a lot, so doing a similar project now might present different challenges. Yet I am convinced that the details above can still provide guidance today.
I would also like to mention that despite the huge challenges and sometimes stressful situations, I look back at this project with a lot of fondness. It is, in hindsight, one of the best projects I have done during my entire career. Later the customer even made me a job offer, so it probably wasn’t too bad from their end either :-).
If you want me to write about other aspects of this topic, please leave a comment or send an email to info@jahntech.com. The same applies if you want to talk about how we at JahnTech can help you with your project.
© 2024, 2025 by JahnTech GmbH and/or Christoph Jahn. No unauthorized use or distribution permitted.