Features Comparison Matrix

There are many new Document Oriented Databases Out there. Here is a quick high-level comparison of features of five of these newer technologies that were compared when creating the prototype concepts discussed int he blog post – When moving to the cloud, consider changing your discovery approach

 Oracle on AWSMongoDB on AWSElasticSearch SphinxUse Mongo for quick search and Oracle for Full-Text
TypeSQLBSONJSONMix1Mix
EC2 CompatibleYesYesYesYesYes
Scale HorizontallyNon-RAC on AWSYesYesYesYes
LicensePaidOpen AGPL v3Open Apache 2OpenCombined #1 and #2
FullText (FT)YesUp to 1GB docsYesYesYes for Oracle
Near/ProximityYesNoYesYesYes on Oracle
Conditional QueriesYesYesYesTBDYes
RegExYesYesYes+NoYes
FacetsWould need to be coded into formsAggregationYesYesYes
Document LimitMeets Complicated document needs16MB/GridFS2GB*?Combined #1 and #2
Paging (FT) ResultsYesNo (16M limit)YesYesCombined #1 and #2
Speeds     
Inserts?Fast?Combined #1 and #2
Updates?Fast?Combined #1 and #2
Indexing?FastReally Fast

10-15MB of text/sec

Combined #1 and #2

Pros / Cons

Oracle

Pros: Likely already invested, easy to do updates in Oracle, ACID for transactions, large workforce

Cons: RAC not on AWS yet, if on XML database, index updates are complicated and high CPU/memory regardless of tuning efforts, no smart search components (i.e. no “signals” to provide more search or semantic context yet), public-facing licenses often priced different than for internal enterprise license

Mongo

Pros: Proves fast to sprint, improve, and add new signals, Proves fast for the metadata load, index update, batch load, search requirement for non full-text document search

Cons: Mongo is good for lots of things but for full-text search requirement, MongoDB cannot do that.

ElasticSearch

Pros:  Solr is also a solution for exposing an indexing/search server over HTTP, but ElasticSearch provides a much superior distributed model and ease of use. Elasticsearch uses Lucene v4 to provide the most powerful full text search capabilities available in any open source product. Search comes with multi-language support, a powerful query language, context aware did-you-mean suggestions, autocomplete and search snippets. All fields are indexed by default, and all the indices can be used in a single query, to return results at breath taking speed. And, can do still can do updates in Oracle or traditional RDMS directly and just sync with ElasticSearch

Cons: No built in security access to RESTful services but there are 2 plugins https://github.com/Asquera/elasticsearch-http-basic and https://github.com/sonian/elasticsearch-jetty as well as just nginx as a reverse proxy. Technology is maturing, new releases often, so your configuration management will be tested. This may require additional optimization and debug period, but other similar feature and document repository and search solutions have been created with this technology.

Sphinx (http://sphinxsearch.com/about/sphinx/ )

Pros: Currently very new JSON support but do support the following. SQL database indexing –  Sphinx can directly access and index data stored in MySQL (all storage engines are supported), PostgreSQL, Oracle, Microsoft SQL Server, SQLite, Drizzle, and anything else that supports ODBC. Non-SQL storage indexing – Data can also be streamed to batch indexer in a simple XML format called XMLpipe, or inserted directly into an incremental RT index.

Cons: Sphinx is maturing but marketing and overview is not as clear to get up and running. It is not really JSON friendly and is a bit more cryptic to plug and play.

Mongo (Read) / Oracle (Transaction / Sync

Pros: Re-uses Oracle investment for ACID and licenses, Still can do updates in Oracle directly, Mongo can be updated near real-time and fast, best of both worlds.  Oracle could do the full-text part as a secondary search requirement which would likely get less use, and Mongo could to the rest. If a quick partial migration or architecture change is digestable, but not ready for the full swap out, this is something to pass on fast, easy to maintain, supports interpretative signals, google like experience, scales. Mongo can do < 1GB searches and Oracl can do full text on > 1GB

Cons: Its the Prius or Volt model – its a hybrid, so maintaining two tech stack for long period of time, which can happen, can be more than a nuisance.

Recommendation:

Depends on your sunk investment, constraints, workforce, and needs. Your mileage may vary, but:

  • If you are sunk in Oracle, Mongo/Oracle is recommended
  • If you can move away or searching, and do not have full-text search requirement, and want to move fast, Mongo is the winner
  • If you want to move away from Oracle, and do have full-text search requirement, ElasticSearch is the big brother of Solr and has a little more steam and the winner.

Better yet, the best way to find out is do a prototype with light architecture definition upfront. The project usually can be done by 2-3 FTE in 2-4 weeks, assuming 10GB test data slice, Cloud access, data load, some performance test, and an AJAX UI test harness. If needing help, let us know. Best way to get buy-in on architecture beyond definition and rigor is demonstrating it has legs.