When Moving To The Cloud, Consider Changing Your Discovery Approach

We do not want to pave the cowpath (“What cow paths, space shuttles, and chariots have in common” and “What are some patterns or anti-patterns where architecture and governance can help” cover this point). We want more than the cost savings of moving to an IT commodity utility model. There is also an important question to consider: do we just take the Management Information System (MIS) architecture and pattern and put that in the cloud? Or do we look at new patterns, such as new search indexes, engines, or NoSQL models that allow rapid, near real-time, smart discovery on the read side of the solution?

Adopting new patterns will increase the relevancy of your data and digital assets, especially as the market continues to make things easier, simpler, and more instantly gratifying.

Traditional: Keyword Search Matching

This applies to many new large, cloud-hosted database transaction management solutions. Several organizations need fast searches of documents, records, objects, or content by facets and keywords, across both metadata and full text. These organizations strive to create a simple search experience, one that can handle millions of documents, authorities, and lookup lists along with thousands of monthly transactions.

Currently, the architecture models that clients have invested in were developed before “big data”. These models emulated the MIS form and were searched by trained users. Typically, they used a supporting search engine that does a full scan for any keyword, category, or facet filter and returns ALL matching records, weighted by closest keyword match. This can handle not only full-text search but also facet search. However, it tends to be taxing on computing power to return accurate results. Also, the results are not aware of popularity, typos, synonyms, etc.
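To make the traditional pattern concrete, here is a minimal Python sketch of a full-scan keyword search with facet filtering. The `Record` class, the `full_scan_search` function, and the sample records are all hypothetical and are not taken from any particular product; the scoring (a simple count of exact token matches) is an assumption chosen for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: int
    title: str
    body: str
    facets: dict = field(default_factory=dict)


def full_scan_search(records, keyword, facet_filters=None):
    """Scan every record, keep exact keyword matches, rank by match count."""
    facet_filters = facet_filters or {}
    needle = keyword.lower()
    hits = []
    for rec in records:  # full scan: cost grows with the size of the collection
        # Facet filtering: every requested facet must match exactly.
        if any(rec.facets.get(k) != v for k, v in facet_filters.items()):
            continue
        text = f"{rec.title} {rec.body}".lower()
        score = text.split().count(needle)  # exact token matches only
        if score > 0:
            hits.append((score, rec.record_id))
    # Return ALL matches, best keyword match first (ties broken by record id).
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [record_id for _, record_id in hits]


# Hypothetical sample data for illustration only.
RECORDS = [
    Record(1, "Zoning policy", "policy for zoning permits and policy appeals", {"type": "policy"}),
    Record(2, "Permit form", "application form for a zoning permit", {"type": "form"}),
    Record(3, "Zoning regulation", "regulation text that cites the zoning policy", {"type": "policy"}),
]
```

In this sketch, `full_scan_search(RECORDS, "policy")` returns `[1, 3]`, while the misspelled query `"polcy"` returns an empty list. Every record is touched on every query, and nothing in the ranking knows about typos, synonyms, or popularity.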

New Searching Architecture Is Not Just Faster, But Is Smarter

With NoSQL models, queries are entered into a query box as well. However, NoSQL engines can have multiple index-like “signals”. The query engine can look these signals up to help interpret the query and infer what the user may be looking for. The search engine solution would handle, and have an increased investment in, interpretative signals: for example, fuzzy logic support for popular-search weighting, typo recognition, thesaurus and synonym integration, community-based signals, event/trending signals, business rules, profile favorite patterns, and more. This could also include researching improvements to the description framework, such as improving overlapping categories and alignment with schema.org, and moving towards the Resource Description Framework in Attributes (RDFa).
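As a rough illustration of how a few such signals might combine, the Python sketch below applies a typo signal, a synonym signal, and a popularity signal to a query. The vocabulary, thesaurus, popularity counts, and function names are all hypothetical, and the typo handling uses the standard library's `difflib` as a stand-in for whatever fuzzy matching a real engine provides.

```python
import difflib

# Hypothetical signal data; a real engine would maintain these as indexes.
VOCABULARY = ["permit", "policy", "zoning", "license"]
SYNONYMS = {"license": {"permit"}, "permit": {"license"}}
POPULARITY = {1: 120, 2: 15, 3: 40}  # e.g. recent views per document id

DOCS = {
    1: "zoning policy for permit appeals",
    2: "application form for a business license",
    3: "zoning regulation text",
}


def interpret(query):
    """Expand a raw query into the terms the user probably meant."""
    terms = set()
    for word in query.lower().split():
        # Typo signal: snap the word to the closest known vocabulary term.
        close = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.75)
        term = close[0] if close else word
        terms.add(term)
        # Synonym signal: add thesaurus entries for the corrected term.
        terms |= SYNONYMS.get(term, set())
    return terms


def smart_search(query):
    """Rank documents by number of matched terms, then by popularity."""
    terms = interpret(query)
    scored = []
    for doc_id, text in DOCS.items():
        matches = len(terms & set(text.split()))
        if matches:
            scored.append((matches, POPULARITY.get(doc_id, 0), doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, _, doc_id in scored]
```

Here the misspelled query `"permt"` is interpreted as `{"permit", "license"}`, so `smart_search("permt")` returns `[1, 2]`: both documents match one term, and the more popular document ranks first. A production engine would weight many more signals, but the shape is the same: interpret first, then rank.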

When These Solutions Do Not Apply

As Apache states about Hadoop (and the point applies to NoSQL more generally), Hadoop is NOT:

Apache Hadoop is not a substitute for a database – you need something on top for high-performance updates
MapReduce is not always the best algorithm – if you need each MR job to know about the previous one, then you lose the parallelization benefits
Hadoop and MapReduce are not the place to learn Java, Linux, or error debugging – it is open source and still emerging, so many of the technologies built on top help with that and are worth the extra layering
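
The second point, about losing parallelization, can be sketched in plain Python without Hadoop. The example below is only an analogy under stated assumptions: the chunk data and function names are hypothetical, and a thread pool stands in for a cluster. Word counting works because each map step is independent, while the chained digest cannot be split up because every step needs the previous result.

```python
import hashlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical input, already split into independent chunks.
CHUNKS = ["zoning policy permit", "permit form", "zoning permit appeal"]


def map_chunk(chunk):
    """Map step: count words in one chunk without looking at any other chunk."""
    return Counter(chunk.split())


def merge(left, right):
    """Reduce step: adding counters is associative, so merge order is free."""
    return left + right


def word_count(chunks):
    # The chunks are independent, so the map step can be spread across workers.
    with ThreadPoolExecutor(max_workers=3) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge, partials, Counter())


def chained_digest(chunks):
    """Each step hashes the previous digest, so the steps must run in order."""
    digest = b""
    for chunk in chunks:
        digest = hashlib.sha256(digest + chunk.encode()).digest()
    return digest.hex()
```

`word_count` gives the same totals no matter how the chunks are ordered or distributed, which is what makes the work easy to parallelize. `chained_digest` changes if the order changes, and no step can start until the one before it has finished.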

Initial Newer Search Engines Are Better At Metadata Search, But Not Full Text And Full Results

Google solutions or open source solutions like MongoDB are quick at addressing these “signals”. However, they are weaker at full-text searches of extremely long documents, the kind that legal, policy, or other regulations sometimes require. For instance, whether it is Craigslist or Groupon, a user is searching against metadata fields. The top few hundred results come back in milliseconds. However, the queries are hitting neither the raw data nor every record. Instead, they hit these index-like constructs, which carry the record ID. If the user finds the right record, they can then call a URL to go into transaction mode back in Oracle. If editing occurs, a trigger can update the NoSQL index immediately. Full-text indexes can be updated in Oracle reasonably fast, but definitely not as fast as the NoSQL index.
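
The flow described above (search an index that holds only record IDs, edit in the system of record, and let a trigger refresh the index) can be sketched in Python as follows. This is an in-memory illustration under stated assumptions: `RecordStore` is a hypothetical stand-in for Oracle, `MetadataIndex` is a hypothetical stand-in for the NoSQL index, and a callback list plays the role of the database trigger.

```python
from collections import defaultdict


class RecordStore:
    """Stand-in for the transactional system of record (Oracle in the text)."""

    def __init__(self):
        self.rows = {}
        self.listeners = []  # callbacks playing the role of an update trigger

    def save(self, record_id, metadata):
        old = self.rows.get(record_id)
        self.rows[record_id] = dict(metadata)
        for notify in self.listeners:
            notify(record_id, old, self.rows[record_id])


class MetadataIndex:
    """Index-like construct: metadata term -> set of record IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    @staticmethod
    def _terms(metadata):
        return {w for value in metadata.values() for w in str(value).lower().split()}

    def on_change(self, record_id, old, new):
        # Remove the stale terms, then add the current ones.
        for term in self._terms(old or {}):
            self.postings[term].discard(record_id)
        for term in self._terms(new):
            self.postings[term].add(record_id)

    def search(self, query, limit=100):
        # Only the index is consulted; no full record is read here.
        sets = [self.postings.get(t, set()) for t in query.lower().split()]
        ids = set.intersection(*sets) if sets else set()
        return sorted(ids)[:limit]


# Hypothetical wiring and sample data.
store = RecordStore()
index = MetadataIndex()
store.listeners.append(index.on_change)
store.save(1, {"title": "Used bicycle", "city": "Denver"})
store.save(2, {"title": "Road bicycle", "city": "Austin"})
```

With this setup, `index.search("bicycle")` returns `[1, 2]` without touching `store.rows`. After an edit such as `store.save(1, {"title": "Used bicycle", "city": "Boulder"})`, the callback updates the index at once, so `index.search("boulder")` returns `[1]` and `index.search("bicycle denver")` returns an empty list.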

There are other solutions among the newer search engine technologies that can address all of the aforementioned requirements. For example, there may be 10,000 results, and the user wants to pull all of them into their software. The user then moves into transaction/edit mode and commits those edits in Oracle. Finally, the NoSQL index is updated immediately, and the edits become available for near-immediate use in full-text search, either in Oracle or, in some search engine solutions, in the search engine itself.

Exploring NoSQL And New Signals Will Yield Faster And Smarter Results

The point is that the improved discovery system will be faster not only from a “query return” point of view, but also because it returns smarter results. Search results that are more aware of context (language issues, popularity, and personalized user needs) make the discovery process itself faster and move the user more quickly to their intended transactions. This can be achieved with technologies such as Elasticsearch, possibly Sphinx, or possibly a combination of MongoDB for fast search and the already-existing Oracle for full-text search.