
Migration of Public Web Records

The National Archives and Records Administration (NARA) wanted to migrate its public web records to a cloud-hosted architecture that would speed up delivery and discovery for anyone who needs those records. The Oracle-based, non-RAC Amazon Web Services (AWS) solution that was developed was the first of its kind on Amazon. In 2013, Xentity provided Oracle Database Administrator (DBA) performance tuning, both advisory and services. Xentity also prototyped alternative architectures built on rapid NoSQL search indices, with a working Google-like web form entry that supplied the required default search signals.

So Much Data, Not Enough Power

At the time, NARA had invested in an architecture model developed before the “big data” era. It could handle full-text search as well as faceted search. However, the model was taxing on computing power when returning accurate results, and those results were not contextually aware of popularity, typos, synonyms, and similar factors. For its cloud-hosted database transaction management solution, NARA needed a fast document search solution: one that could search by keyword across both metadata and full text, and that offered an intuitive user experience while handling searches over millions of documents, authorities, and lookup lists, along with thousands of monthly transactions.

NoSQL engines can maintain multiple index-like “signals” that the query engine consults to infer what the user may be looking for. The search engine solution developed by Xentity placed increased investment in these interpretive signals. Google solutions and open-source solutions like MongoDB are fast at addressing these “signals”, but are limited in full-text search of extremely long documents, which NARA requires.
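To make the idea of search “signals” concrete, here is a minimal sketch in Python. It is not Xentity's implementation: the sample records, the synonym table, the field weights, and the view counts are all hypothetical, and the typo handling simply borrows `difflib` from the standard library. It only illustrates how typo, synonym, and popularity signals can feed one ranking score.

```python
from difflib import get_close_matches

# Hypothetical sample data; none of these records or values come from NARA.
DOCS = {
    "doc1": {"title": "Census records 1940",
             "body": "population census schedules by state", "views": 900},
    "doc2": {"title": "Military service files",
             "body": "army enlistment records and census of veterans", "views": 50},
    "doc3": {"title": "Immigration manifests",
             "body": "ship passenger lists and arrival records", "views": 300},
}
# Hypothetical synonym signal: query term -> related terms.
SYNONYMS = {"immigration": ["arrival", "passenger"]}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


# Vocabulary of every indexed token, used for the typo signal.
VOCAB = sorted({tok for d in DOCS.values()
                for tok in tokenize(d["title"] + " " + d["body"])})


def expand(term):
    """Apply the typo and synonym signals to a single query term."""
    term = term.lower()
    if term not in VOCAB:
        # Typo signal: fall back to the closest known token, if any.
        close = get_close_matches(term, VOCAB, n=1, cutoff=0.8)
        if close:
            term = close[0]
    return {term, *SYNONYMS.get(term, [])}


def search(query):
    """Return matching document ids, best score first."""
    terms = set()
    for tok in tokenize(query):
        terms |= expand(tok)
    scored = []
    for doc_id, doc in DOCS.items():
        title_hits = len(terms & set(tokenize(doc["title"])))
        body_hits = len(terms & set(tokenize(doc["body"])))
        if title_hits + body_hits == 0:
            continue
        # Assumed weights: title matches count double; popularity breaks ties.
        score = 2.0 * title_hits + 1.0 * body_hits + doc["views"] / 1000.0
        scored.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

In this sketch, a misspelled query such as `search("cenus")` still resolves to the token `census`, and `search("immigration")` also matches documents that only mention `arrival` or `passenger`.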

Testing, Testing, 1, 2, 3

Newer search engine technologies, like the ones discussed above, have the potential to address all of NARA’s requirements. For example, a NARA user may want to pull 10,000 results into their software. They then move into transaction/edit mode and, finally, commit those edits in Oracle. The commit immediately updates the NoSQL index, and the edited records become available almost immediately for full-text search, either in Oracle or in search engine solutions that support full text.
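The commit-then-reindex flow described above can be sketched as follows. This is an illustration under stated assumptions, not the system that was built: SQLite (via Python's standard `sqlite3` module) stands in for Oracle, an in-memory inverted index stands in for the NoSQL search index, and the `RecordStore` class and its method names are hypothetical.

```python
import sqlite3
from collections import defaultdict


class RecordStore:
    """Toy system of record plus search index (stand-ins, see lead-in)."""

    def __init__(self):
        # SQLite is only a stand-in for the Oracle database.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
        # Stand-in for the NoSQL index: token -> set of record ids.
        self.index = defaultdict(set)

    def _unindex(self, rec_id, body):
        for tok in set(body.lower().split()):
            self.index[tok].discard(rec_id)

    def _index(self, rec_id, body):
        for tok in set(body.lower().split()):
            self.index[tok].add(rec_id)

    def save(self, rec_id, body):
        """Commit an edit to the database, then refresh the search index."""
        old = self.db.execute(
            "SELECT body FROM records WHERE id = ?", (rec_id,)).fetchone()
        # The connection context manager commits on success and
        # rolls back if the statement raises.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO records (id, body) VALUES (?, ?)",
                (rec_id, body))
        # Only after a successful commit is the index brought up to date.
        if old is not None:
            self._unindex(rec_id, old[0])
        self._index(rec_id, body)

    def search(self, token):
        """Return the ids of records whose text contains the token."""
        return sorted(self.index.get(token.lower(), set()))
```

Updating the index only after the commit succeeds keeps the database as the source of truth; a failed transaction never leaves stale edits searchable.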

For the prototype alternative architectures, Xentity took a 1% sampling of the billions of records in a two-week sprint. On AWS EC2, search time dropped from 45 seconds to under a second in the NoSQL environment, and loading the 1% sample and building the search index took mere minutes. For the second phase, Xentity tuned Oracle storage and other configuration parameters, which involved advanced Oracle components and settings such as ASM, Active Data Guard, XML chunk sizing, and Oracle block sizing. This reduced Oracle query time from 45 seconds to under 3 seconds.
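As a quick check on what those timings imply, the snippet below computes lower bounds on the speedup. Because the results are reported only as upper bounds ("under a second", "under 3 seconds"), the factors are minimums; the helper name `min_speedup` is hypothetical.

```python
def min_speedup(before_s, after_upper_bound_s):
    """Lower bound on the speedup when only an upper bound on the new time is known."""
    return before_s / after_upper_bound_s


# 45 s -> under 1 s in the NoSQL prototype: at least 45x faster.
nosql_factor = min_speedup(45.0, 1.0)
# 45 s -> under 3 s after Oracle tuning: at least 15x faster.
oracle_factor = min_speedup(45.0, 3.0)
```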

Speeding Things Up A Notch

According to a 2018 Forbes article, 90% of the world’s data had been created in the preceding two years. It is therefore important for organizations like NARA to have a cloud-hosted architecture capable of hosting and returning their own data. We at Xentity were happy to help with the process.

For starters, Xentity’s efforts reduced query and search times throughout the project. This matters because, in an organization with as much data as NARA, the new architecture had to handle millions of documents and other kinds of data. With faster query and search times, NARA should see smarter results in query returns. Users can also act more quickly on their intended “transactions”, because the search results will be more context-aware, particularly with respect to language issues, popularity, and personalized user needs. In addition, NARA was able to quickly rule out hard constraints that some database platforms cannot support.