Washington just can’t catch a break. Between the debt ceiling, sequesters, shutdowns, and now the epic fail of Healthcare.gov, the single portal meant to provide primary access to the new healthcare law, the hits keep coming.

Love or hate the law, Healthcare.gov, the model for electronic access to health care, is definitely an epic fail of a technology roll-out. And as always, whenever an epic fail occurs, people are naturally inclined to pile on. As architects, we not only wish to pile on but also to offer our own opinions. That way, things can get fixed in the next update.

What else is an architect to do if we can’t pile on and offer our own opinions? At the same time, we are hoping to catch someone’s attention, or at least get the sense that someone is thinking about this from a design point of view, since it appears the media and the hearings are not. In doing so, hopefully we can arrive at solutions that improve things, and God knows they could use solutions.

Let me be blunt: Healthcare.gov is a bad design. I’m not commenting on the policy part – the pundits do enough of that. I’m commenting on the architecture itself.  The following captures our internet sleuthing, colleague discussions, some musings about oddly similar historical events and our current deductions that arise as a result. And when we put it all together, we can hopefully arrive at a nice solution for a very controversial problem. Or at least, paint a picture of what not to do in the future.

UX And Web Design Is Fine – The Front End Is Not The Issue

Much fanfare has gone to the Healthcare.gov website glitches. A lot of that fanfare was written back in June by Alex Howard in The Atlantic. I had the pleasure of connecting with Alex during data.gov work back in 2010, and we still correspond on social media off and on. I respect his writing and what he follows, and I have generally found that he is spot on about bringing collaboration across traditional boundaries into the world of information and technology.

That said, I may be a bit biased.

First of all, client-side tweaking is a really good practice, especially for large sites with lots of traffic. Every little bit of debris cleanup helps. There were definitely several issues with front-end performance, as the above articles suggest. Those issues are easy for web technophiles to spot, since most of the web processing is on the client side and the tools are easy to use (e.g. hit F12 in Chrome on a PC, or use the YSlow plugin for Firefox). Services like Google Analytics can also analyze performance, content safety, usage statistics and so much more. But it appears that the problem is primarily a server-side architecture problem. The point being: are you going to put more resources into fixing the leaky faucet first, or the gushing, gaping hole in the water main?

That Said…

Now, Alex Howard only reported on the development of the front end and UX component. There is well-deserved praise for Prose.io, Jekyll, open source concepts, garage organizations breaking beltway development stereotypes, and persona development. All are typically viewed as a way to develop the navigation for this brand new pattern. The point being, the UX and web design part is fine and dandy. Alex was right back in June and still is. It’s the same architecture I did for united.com back in ’99. It’s just different tech. When 9/11 hit, mine was the only airline site (check the Internet Wayback Machine) and call center that stayed up. So, this model for hc.gov is fine.

Unfortunately, Alex did not report on the back-end part. Mind you, this part is underreported, and the complexities of the iceberg under the water have been misunderstood. A lot of folks have joined in bashing Alex’s journalism and attacked his integrity as a bandit of sorts. In other articles, I saw some purist nerds talk about how some poorly grouped JavaScript added extra callbacks and how the site wasn’t as static as it should have been. I felt pretty bad for him and, as a citizen, generally embarrassed. So like many of us architect weenies, I dug into the problem, as many of my colleagues have. Regardless of politics, we all want a working country.

Why The Logic For Real-time Data Aggregation Architecture?

For comparison, the Tacoma Narrows Bridge (see Figure 1 below) didn’t fall because the construction contractor failed. It fell because the architecture forgot one piece of logic: the wind through the Narrows was very strong. There was nothing to dampen the motion, and when the wind blew, it drove the bridge deck into growing oscillations that tore the bridge apart. It was built to specification by the contractors, but the design was horrible because it had flawed logic.

OK, where is the parallel? If the back end is the real issue for healthcare.gov, then where exactly is the problem? My background and recent work on “hero” architecture made me hope it was a minor performance issue that could be fixed by some horizontal scaling of servers. They did that, and nothing much changed. Maybe it could be fixed by some technical server or software tuning, but no such luck there either. That only leaves bad logic, like with the Tacoma Narrows Bridge.

Figure 1: The Tacoma Narrows Bridge

While We’re On The Subject…

Now, it’s possible the contractor did do faulty construction, and that there wasn’t enough foresight to do more parallel testing, load testing, integration testing and the “7 steps of doneness”. The architects came out and said as much today. That sounds a little passive aggressive now, though. As the architect of a building, would you say that after the bridge collapsed? It sounds like either buck passing, a gag order, or a Droopy Dog “no one is listening to me.”

But even that could have been fixed by now. So, aside from the obvious lack of engineering discipline, what is the flawed logic I am talking about, the equivalent of the one that brought down the Tacoma Narrows Bridge?

I believe it comes down to the logic of the architecture in this case. They treated the architecture of healthcare.gov like a controlled, real-time, distributed MIS system, built on data that has not been standardized or proven, through the test of time, to be easily integrated. And consequently, we got a poorly designed product that collapsed like a bridge.
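To put a rough number on why real-time aggregation is risky, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that a single request needs a live answer from every back-end system in the chain and that each system fails independently of the others. The function name, the 99% availability figure and the system counts are all made up for the example; none of them are measurements of healthcare.gov.

```python
def chain_availability(per_system: float, n_systems: int) -> float:
    """Chance that one request succeeds when it needs a live answer
    from every one of n_systems back-end systems.

    Assumption (hypothetical): failures are independent and every
    system has the same availability, per_system.
    """
    if not 0.0 <= per_system <= 1.0:
        raise ValueError("per_system must be between 0 and 1")
    if n_systems < 0:
        raise ValueError("n_systems must not be negative")
    # Every system must answer, so the individual chances multiply.
    return per_system ** n_systems


if __name__ == "__main__":
    # Hypothetical numbers: each system is up 99% of the time.
    for n in (1, 5, 10, 20):
        share = chain_availability(0.99, n)
        print(f"{n:>2} systems in the chain: {share:.1%} of requests succeed")
```

Under these assumptions, ten chained systems at 99% each already leave roughly one request in ten failing, and it only gets worse as more systems join the chain. Horizontal scaling of the front door does nothing for that math, which is why I keep pointing at the logic rather than the servers.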

But Enough Piling, Where Do We Go From Here?

Hard to say. While we did not really want to pile on the website (everyone else did enough of that), we cannot really deny the issues it has. And when you consider the examples we have laid out, which show the sort of “historical precedent” for what not to do in these scenarios, it’s actually surprising things went as badly as they did. To state the obvious, though, it can be a lot better. If a website is rolled out with issues, you need to find solutions so that the next version is better. It’s the same with apps. At Xentity, we strive to help our clients improve and evolve their current tools as well, and it is our hope that the people running Healthcare.gov do the same.