Features Comparison Matrix

There are many new Document Oriented Databases out there to choose from for managing your documents. Here is a quick high-level comparison of features of five of these newer technologies that were compared when creating the prototype concepts discussed in the previous blog post: When moving to the cloud, consider changing your discovery approach

Oracle on AWSMongoDB on AWSElasticSearchSphinxUse Mongo for quick search and Oracle for Full-Text
TypeSQLBSONJSONMix1Mix
EC2 CompatibleYesYesYesYesYes
Scale HorizontallyNon-RAC on AWSYesYesYesYes
LicensePaidOpen AGPL v3Open Apache 2OpenCombined #1 and #2
FullText (FT)YesUp to 1GB docsYesYesYes for Oracle
Near/ProximityYesNoYesYesYes on Oracle
Conditional QueriesYesYesYesTBDYes
RegExYesYesYes+NoYes
FacetsWould need to be coded into formsAggregationYesYesYes
Document LimitMeets Complicated document needs16MB/GridFS2GB*?Combined #1 and #2
Paging (FT) ResultsYesNo (16M limit)YesYesCombined #1 and #2
Speeds
Inserts?Fast?Combined #1 and #2
Updates?Fast?Combined #1 and #2
Indexing?FastReally Fast10-15MB of text/secCombined #1 and #2

Pros / Cons

Oracle

Pros: Likely already invested, easy to do updates in Oracle, ACID for transactions, large workforce

Cons: RAC not on AWS yet, if on XML database, index updates are complicated and high CPU/memory regardless of tuning efforts, no smart search components (i.e. no “signals” to provide more search or semantic context yet), public-facing licenses often priced different than for internal enterprise license

Mongo

Pros: Proves fast to sprint, improve, and add new signals, Proves fast for the metadata load, index update, batch load, search requirement for non full-text document search

Cons: Mongo is good for lots of things but for full-text search requirement, MongoDB cannot do that.

ElasticSearch

Pros: Solr is also a solution for exposing an indexing/search server over HTTP, but ElasticSearch provides a much superior distributed model and ease of use. Elasticsearch uses Lucene v4 to provide the most powerful full text search capabilities available in any open source product. Search comes with multi-language support, a powerful query language, context aware did-you-mean suggestions, autocomplete and search snippets. All fields are indexed by default, and all the indices can be used in a single query, to return results at breathtaking speed. And, can do still can do updates in Oracle or traditional RDMS directly and just sync with ElasticSearch

Cons: No built in security access to RESTful services but there are 2 plugins https://github.com/Asquera/elasticsearch-http-basic and https://github.com/sonian/elasticsearch-jetty as well as just nginx as a reverse proxy. Technology is maturing, new releases often, so your configuration management will be tested. This may require additional optimization and debug period, but other similar feature and document repository and search solutions have been created with this technology.

Sphinx 

Pros: Currently very new JSON support but do support the following. SQL database indexing – Sphinx can directly access and index data stored in MySQL (all storage engines are supported), PostgreSQL, Oracle, Microsoft SQL Server, SQLite, Drizzle, and anything else that supports ODBC. Non-SQL storage indexing – Data can also be streamed to batch indexer in a simple XML format called XMLpipe, or inserted directly into an incremental RT index.

Cons: Sphinx is maturing but marketing and overview is not as clear to get up and running. It is not really JSON friendly and is a bit more cryptic to plug and play.

Mongo (Read) / Oracle (Transaction / Sync

Pros: Re-uses Oracle investment for ACID and licenses, Still can do updates in Oracle directly, Mongo can be updated near real-time and fast, best of both worlds. Oracle could do the full-text part as a secondary search requirement which would likely get less use, and Mongo could to the rest. If a quick partial migration or architecture change is digestible, but not ready for the full swap out, this is something to pass on fast, easy to maintain, supports interpretative signals, google like experience, scales. Mongo can do < 1GB searches and Oracle can do full text on > 1GB

Cons: It is the Prius or Volt model – its a hybrid, so maintaining two tech stack for long period of time, which can happen, can be more than a nuisance.

Recommendation:

Our recommendations depend on your sunk investment costs, system constraints, workforce, and your ultimate needs. Your mileage may vary, but:

  • If you are sunk in Oracle, Mongo/Oracle is recommended
  • Also, if you can move away or searching, and do not have full-text search requirement, and want to move fast, Mongo is the winner
  • Furthermore, if you want to move away from Oracle, and do have full-text search requirement, ElasticSearch is the big brother of Solr and has a little more steam and the winner.

Better yet…the best way to find out which system is best for you is to conduct a prototype with light architecture definition upfront. A prototype project usually can be done by 2-3 Full Time Employees (FTEs)  in 2-4 weeks, assuming 10GB test data slice, Cloud access, data load, some performance testing, and an AJAX UI test harness. If you need help, let us know. Reach out to us via email at sales@xentity.com or over the phone at 303-376-6217. Anyway, the best way to get buy-in on architecture beyond definition and rigor is demonstrating it has legs. In other words…can you show that it is able to stand on its own?