IRS clutches its modernization holy grail

Could 2018 be the year the IRS crosses the final frontier in its 25-year modernization drive?

At the moment, the agency faces a short-term crisis: it has to translate the complicated new tax law into computer code. Its programmers must find every affected line among millions of lines of code. Their challenge is mainly one of time and scale.

The agency revises some code every year. But this year's scope is much larger than anything since the Reagan-era tax reforms. Plus, the underlying code has accumulated 36 more years’ worth of alterations since then.

IT is only part of the challenge. IRS needs new forms, new advisories, new instructions, and new training for its people.


IRS takes knocks because so much of its code is written in assembler. Successive commissioners have described this code in countless hearings. Assembler is like Shakespearean English. It’s dated. A shrinking number of people can deal with it. But it’s also elegant and highly functional.

That this code has functioned through more than 50 filing seasons testifies to its durability. That it still works after countless alterations driven by ever-changing policy testifies to the talent of the IRS work force. But assembler experts are becoming scarce, and IRS is called on to do much more than the batch processing for which its assembler applications were optimized. Electronic filing, business intelligence and anti-fraud activities have all spurred development of systems in a variety of architectures and languages.

Thus IRS operates many interrelated systems. Its modernization efforts have produced some successes. But none of the efforts, and none of the companies involved (principally the former Computer Sciences Corp. and IBM), has been able to solve what everyone understands is an essential key to modernizing: the individual and business master files, which still exist as entities coded in assembler.

Now IRS is on the verge of solving this problem. The solution was engineered by a group of about eight people, and not under a multi-hundred-million-dollar systems integration contract. A leader of the group was Jian Wang, a Chinese emigre who is now a naturalized citizen. Wang told me his solution isn’t a silver bullet but rather a carefully worked-out methodology. It has three components so potentially powerful that the IRS has filed patent applications for them.

I say “was” because he’s left the agency, and the status of the project is dark.

Wang was working under the streamlined critical pay authority the agency has had since its landmark 1998 restructuring. It gave the IRS 40 slots under which it could pay temporary, full-time employees above GS rates. Former Commissioner John Koskinen pointed out that Congress did not renew this authority in 2013, despite his entreaties to the Committee on Oversight and Government Reform, then chaired by former Congressman Jason Chaffetz.

“The last one ran out this past summer,” Koskinen said. The departures included Wang. He says he applied to become a GS-15 or Senior Executive Service member so he could see the assembler-to-Java project through. But his approval didn’t come through until a week before his employment authority expired. By then he’d accepted another job. Wang says he had a house to pay for and kids to educate. Koskinen confirms the agency wanted to convert Wang to a permanent position. But the approval process at Treasury headquarters and the Office of Personnel Management simply took too long.

In a speech to the National Press Club last April, Koskinen mentioned Wang and his colleague Mark Yu, “who developed a method for translating the programming language used in our legacy tax processing applications into the Java language.”

Wang explained his solution to assembler conversion to me in some detail. It proceeds from the fact that “in theory, there’s no way to translate assembler code. The way it runs is not how it reads.” Indeed, because it is so tightly coupled to machine instruction sets, assembler looks totally cryptic to 21st-century programmers.

Wang and his team nonetheless developed three components: a logical translation component; a “technical rule language” that acts as an intermediate stage to retain the logic extracted from the assembler; and a data extractor. Wang says separating out the data made it possible to trace the assembler’s logic flows, then abstract them into structured code in the technical rule language. He says testing showed the three parts together could produce a Java program that accurately reproduces what the assembler code does, and that this was proven using production-sized data sets.

An early, comic application of motion picture technology titled “Dog Factory” shows pooches being “translated” into coils of sausages and back again by a big, hand-cranked machine. (Thank you, Library of Congress, for preserving this gem.) Converting assembler code won’t be such a simple input-output matter. The IRS has thousands of assembler modules. The resulting Java must be tested for security and for interoperability with all of the agency’s other systems. But Wang’s work shows it can be done.

I checked with former IRS chief technology officer Terry Milholland. He said, “They were literally at the stage of converting assembler to Java when Wang left.” In fact, Milholland’s own special employment authority also ended then. He believes IRS has bid out this work. “It’s frustrating to see what’s possible but not happening.”

I’ve asked IRS repeatedly over a period of weeks to clarify what is going on, but so far it hasn’t made available anyone who might know.

When the individual and business master files were coded in the early 1960s, assembler was an ideal solution. Computers and memory of the day were expensive; an IBM computer typically came with 512K of memory. But the IRS knew what it was doing. For example, in 1975, Computerworld reported how IRS programmers handled a tax rebate program signed into law by President Gerald Ford. IRS used six IBM S/360s, each with 512K of memory, and an S/370 with 2M of memory. The machines drew on a total of 68 tape drives. The rebate job was scheduled to take five batch cycles of 320 hours each.

In many ways, assembler is still excellent for this application. Milholland said of the code, “The assembler is well written. It’s incredibly efficient and effective.” But a shrinking number of people understand it. And it’s not optimized for the online, transactional mode toward which the IRS needs to keep moving. Java, relatively inefficient as it may be, is the current standard and has legions of people who know it.

Now Wang is at another agency as a GS-15. This agency also has lots of legacy code, but it’s Cobol. Wang chuckled when I joked that, compared to assembler, Cobol would be a piece of cake.

IRS techies seem to have a solution to their ultimate modernization puzzle in hand. If so, the question is whether they’ll use it.