What cow paths, space shuttles, and chariots have in common

Blog post
edited by
Wiki Admin

A colleague recently sent me a chain email (they do still exist) retelling the old adage that new technology is driven by thousand-year-old standards. I had seen it before, and I remember liking it then. But my new habit with chain emails and viral urban legends is to poke around. Being childlike, I hope for fun new ways to see things, but being a problem-solver as well, I am skeptical of these amazing discoveries of trivial connections. Regardless, it’s still a fun story from which one can mine some good nuggets.

The anecdote essentially traces how historical inventions are connected and draws a moral. Reading it backwards, it claims that the width of the space shuttle’s rocket boosters is due to the width of a railroad tunnel, that the width of railroad tracks is due to carriage wheel width, and that this width is tied to chariot width because of the width of two horses. Point being, the boosters’ width derives from the width of two horses’ rear ends.

Like I said, it is fun, but the tangents are more loosely coupled and coincidental than the “six degrees of Kevin Bacon” concept. Snopes nicely walks us through how this is true, but only in generalities, not unlike how someone could say the clothes we wear now are sized the way they are because a medieval tailor sized them that way. Snopes can be a party pooper sometimes, but they did also note a few things about people and change (insert my agenda HERE). This is why I like stories like this: I can tie my own tangential takeaways to them.

Snopes points out humans’ presets on change:

Although we humans can be remarkably inventive, we are also often resistant to change and can be persistently stubborn (or perhaps practical) in trying to apply old solutions to new conditions. When confronted with a new idea such as a “rail,” why go to the expense and effort of designing a new vehicle for it rather than simply adapting ones already in abundant use on roadways? If someone comes along with an invention known as an “iron horse,” wouldn’t it make sense to put the same type of conveyance pulled by “regular” horses behind it?

It goes on with several more examples noting how new innovations leverage the blueprints of previous-generation inventions, regardless of their direct influence. The tone felt a bit down when noting this, but I feel this continuity is not wholly a bad thing.

As a physical society that builds infrastructure to share, we need this compatibility to limit the impact of disruption while progressing towards addressing the societal challenges of Maslow’s Hierarchy of Needs globally.

For example, let’s say there is a future decision to stop using dams for hydroelectric power and move to a series of nano-electric generators that work off river flow, impede water less, and generate more power. This is great, as we have a lower-cost, simpler, more efficient solution that also does not disrupt the ecosystem (riparian development, fish spawning, etc.) like dams have for decades.

How do we transition to the new nano solution? The railroad story says we would use the previous footprint of the dam and, once ready, slowly migrate to the new solution to allow the water flow to slowly come back into place. This would allow the wetlands and riparian ecosystem to grow back at nature’s pace, and allow fish and river life to adapt generationally.

Yet the new solution does not require the same footprint. We could build it anywhere along the river. It could even be set up as a series of micro generators, and once the level of energy put into the grid matches the dam’s, in theory, the dam could just be blown up, and we could progress on without anyone in the future, looking at the historic footprint of the Anthropocene, being aware that a dam was ever there.

But removing the previous infrastructure in a responsible way will be key. Blowing up a dam means the water release would cause major sediment displacement and kill the riparian and wetland ecosystems that adapted to the dam, and generations of fish and river life would die as a result. The dismantling process, though not required for the new direct human energy need, is critical when considering the indirect impact on the evolved ecosystem.

If still interested, check out the follow-up blog post, “So what is the point of this metaphoric drivel”.

Comparing NoSQL Search Technology Features


Features Comparison Matrix

There are many new document-oriented databases out there. Here is a quick high-level comparison of the features of five of these newer technologies that were compared when creating the prototype concepts discussed in the blog post “When moving to the cloud, consider changing your discovery approach”.

 

| Feature | Oracle on AWS | MongoDB on AWS | ElasticSearch | Sphinx | Use Mongo for quick search and Oracle for Full-Text |
| --- | --- | --- | --- | --- | --- |
| Type | SQL | BSON | JSON | Mix1 | Mix |
| EC2 Compatible | Yes | Yes | Yes | Yes | Yes |
| Scale Horizontally | Non-RAC on AWS | Yes | Yes | Yes | Yes |
| License | Paid | Open AGPL v3 | Open Apache 2 | Open | Combined #1 and #2 |
| FullText (FT) | Yes | Up to 1GB docs | Yes | Yes | Yes for Oracle |
| Near/Proximity | Yes | No | Yes | Yes | Yes on Oracle |
| Conditional Queries | Yes | Yes | Yes | TBD | Yes |
| RegEx | Yes | Yes | Yes+ | No | Yes |
| Facets | Would need to be coded into forms | Aggregation | Yes | Yes | Yes |
| Document Limit | Meets Complicated document needs | 16MB/GridFS | 2GB* | ? | Combined #1 and #2 |
| Paging (FT) Results | Yes | No (16M limit) | Yes | Yes | Combined #1 and #2 |
| Speeds | | | | | |
| Inserts | ? | Fast | ? | | Combined #1 and #2 |
| Updates | ? | Fast | ? | | Combined #1 and #2 |
| Indexing | ? | Fast | Really Fast | 10-15MB of text/sec | Combined #1 and #2 |

Pros / Cons

Oracle

Pros: Likely already invested, easy to do updates in Oracle, ACID for transactions, large workforce

Cons: RAC is not on AWS yet; if on an XML database, index updates are complicated and CPU/memory intensive regardless of tuning efforts; no smart search components (i.e. no “signals” to provide more search or semantic context yet); public-facing licenses are often priced differently than internal enterprise licenses

Mongo

Pros: Proves fast to sprint, improve, and add new signals. Proves fast for the metadata load, index update, batch load, and search requirements for non-full-text document search.

Cons: Mongo is good for lots of things, but it cannot meet a full-text search requirement.

ElasticSearch

Pros: Solr is also a solution for exposing an indexing/search server over HTTP, but ElasticSearch provides a much superior distributed model and ease of use. ElasticSearch uses Lucene v4 to provide the most powerful full-text search capabilities available in any open source product. Search comes with multi-language support, a powerful query language, context-aware did-you-mean suggestions, autocomplete, and search snippets. All fields are indexed by default, and all the indices can be used in a single query to return results at breathtaking speed. And you can still do updates in Oracle or a traditional RDBMS directly and just sync with ElasticSearch.

Cons: No built-in security access to RESTful services, but there are 2 plugins, https://github.com/Asquera/elasticsearch-http-basic and https://github.com/sonian/elasticsearch-jetty, as well as the option of just using nginx as a reverse proxy. The technology is maturing, with new releases often, so your configuration management will be tested. This may require an additional optimization and debug period, but other similar feature, document repository, and search solutions have been created with this technology.

Sphinx (http://sphinxsearch.com/about/sphinx/)

Pros: JSON support is currently very new, but Sphinx does support the following. SQL database indexing – Sphinx can directly access and index data stored in MySQL (all storage engines are supported), PostgreSQL, Oracle, Microsoft SQL Server, SQLite, Drizzle, and anything else that supports ODBC. Non-SQL storage indexing – Data can also be streamed to the batch indexer in a simple XML format called XMLpipe, or inserted directly into an incremental RT index.

Cons: Sphinx is maturing, but its marketing and overview are not as clear on how to get up and running. It is not really JSON friendly and is a bit more cryptic to plug and play.

Mongo (Read) / Oracle (Transaction / Sync)

Pros: Re-uses the Oracle investment for ACID and licenses. You can still do updates in Oracle directly, and Mongo can be updated near real-time and fast: the best of both worlds. Oracle could do the full-text part as a secondary search requirement, which would likely get less use, and Mongo could do the rest. If a quick partial migration or architecture change is digestible, but you are not ready for the full swap-out, this is an option that is fast, easy to maintain, supports interpretative signals, offers a Google-like experience, and scales. Mongo can do < 1GB searches and Oracle can do full text on > 1GB.

Cons: It’s the Prius or Volt model – it’s a hybrid, so maintaining two tech stacks for a long period of time, which can happen, can be more than a nuisance.
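
As a rough sketch of the hybrid idea above, writes commit to the system of record first and are then synced to the read index, while searches are routed by whether full text is needed. Everything here is an assumption for illustration: the `HybridStore` class is hypothetical, and plain dictionaries stand in for Oracle and MongoDB rather than real drivers.

```python
class HybridStore:
    """Toy model of the hybrid: one store for transactions, one for fast reads."""

    def __init__(self):
        self.oracle = {}  # stand-in for Oracle: record ID -> full record
        self.mongo = {}   # stand-in for MongoDB: record ID -> metadata only

    def save(self, record_id, metadata, body):
        # Commit to the system of record first, then sync the read index.
        self.oracle[record_id] = {"metadata": dict(metadata), "body": body}
        self.mongo[record_id] = dict(metadata)

    def search(self, term, full_text=False):
        term = term.lower()
        if full_text:
            # Secondary, less-used path: scan document bodies in the Oracle stand-in.
            return sorted(
                rid for rid, rec in self.oracle.items()
                if term in rec["body"].lower()
            )
        # Primary path: metadata-only lookup in the Mongo stand-in.
        return sorted(
            rid for rid, meta in self.mongo.items()
            if any(term in str(value).lower() for value in meta.values())
        )


store = HybridStore()
store.save(1, {"title": "Vintage Bike"}, "Long contract text about warranty terms")
store.save(2, {"title": "Lamp"}, "Notes mentioning a bike rack")
print(store.search("bike"))                  # metadata search
print(store.search("bike", full_text=True))  # full-text search
```

The sketch also shows the con: every write touches two stores, so both have to be kept running and in sync.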

Recommendation:

Depends on your sunk investment, constraints, workforce, and needs. Your mileage may vary, but:

  • If you are sunk in Oracle, Mongo/Oracle is recommended
  • If you can move away from Oracle, do not have a full-text search requirement, and want to move fast, Mongo is the winner
  • If you want to move away from Oracle and do have a full-text search requirement, ElasticSearch is the big brother of Solr, has a little more steam, and is the winner.

Better yet, the best way to find out is to do a prototype with light architecture definition up front. The project can usually be done by 2-3 FTEs in 2-4 weeks, assuming a 10GB test data slice, cloud access, data load, some performance tests, and an AJAX UI test harness. If you need help, let us know. The best way to get buy-in on architecture, beyond definition and rigor, is demonstrating it has legs.

 

When moving to the cloud, consider changing your discovery approach


As we do not want to pave that cowpath (“What cow paths, space shuttles, and chariots have in common” or “What are some patterns or anti-patterns where architecture and governance can help” cover this point), we want not only to save money in moving to an IT commodity utility model, but also to consider: do we just take the MIS architecture and pattern and put that in the cloud, or do we look at new patterns, such as new search index, engine, or NoSQL models that allow rapid, near real-time smart discovery on the read part of the solution?

This will increase the relevancy of your data and digital assets as the market demands that things be easier, simpler, and instantly gratifying.

 

Traditional: Keyword Search Matching 

For many new large, cloud-hosted database transaction management solutions, an organization needs fast document, record, object, or content search by facets and keywords, across both metadata and full text, with a quick, nice experience that can handle millions of documents, authorities, and lookup lists along with thousands of monthly transactions.

Currently, the architecture clients have invested in is a model that was developed pre-“big data”. These models emulate MIS form-based searches by trained users, with a supporting search engine that does a full scan for any keyword, or some category or facet filtering, to return ALL matching records weighted by closest keyword match. This can handle full-text search as well as facet search, but it tends to be more taxing on computing power, and while the results are accurate, they will not be context-aware of popularity, typos, synonyms, etc.
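
To make the traditional pattern concrete, here is a minimal sketch of a full-scan keyword search. It is only an illustration under stated assumptions, not how any particular engine is implemented: the `full_scan_search` function and the sample records are hypothetical.

```python
def full_scan_search(records, keywords):
    """Scan every record and return ALL matches, ranked by keyword hit count."""
    terms = [keyword.lower() for keyword in keywords]
    scored = []
    for record_id, text in records.items():  # touches every record on every query
        words = text.lower().split()
        hits = sum(words.count(term) for term in terms)
        if hits:
            scored.append((hits, record_id))
    # Most hits first; ties broken by record ID so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [record_id for _, record_id in scored]


# Hypothetical sample data.
records = {
    1: "vintage bike frame vintage parts",
    2: "road bike",
    3: "vintage lamp",
    4: "vintge bycicle",  # typos are simply missed by exact matching
}
print(full_scan_search(records, ["vintage", "bike"]))
```

Note that record 4 never comes back, since exact keyword matching has no notion of typos or synonyms, and the cost of each query grows with the total number of records.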

New Searching architecture is not just Faster, but is Smarter

With NoSQL models, queries are typed into a query box as well, but NoSQL engines can have multiple index-like “signals” that the query engine can look up to help interpret the query and infer what the user may be looking for. The search engine solution would handle, and have an increased investment in, interpretative signals (i.e. fuzzy logic support for popular search weighting, thesauri integration, synonyms, typo recognition, community-based, event/trending, business rules, profile favorite patterns, etc.). This could also include researching description framework improvements, such as improved overlapping categorical alignment, schema.org, and a move towards RDFa.
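
A few of these signals can be sketched in a handful of lines. This is a toy illustration under stated assumptions, not how ElasticSearch or MongoDB implement them: the synonym table, vocabulary, popularity weights, and function names are all hypothetical, and typo recognition is approximated with the standard library’s `difflib.get_close_matches`.

```python
import difflib

SYNONYMS = {"bicycle": "bike", "cycle": "bike"}   # hypothetical thesaurus signal
VOCABULARY = ["bike", "vintage", "lamp", "road"]  # hypothetical indexed terms


def interpret(query):
    """Normalize a raw query using synonym and typo signals."""
    terms = []
    for word in query.lower().split():
        word = SYNONYMS.get(word, word)
        if word not in VOCABULARY:
            # Typo signal: fall back to the closest known term, if any.
            close = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
            if close:
                word = close[0]
        terms.append(word)
    return terms


def smart_search(records, popularity, query):
    """Rank matches by keyword hits, boosted by a popularity signal."""
    terms = interpret(query)
    scored = []
    for record_id, text in records.items():
        words = text.lower().split()
        hits = sum(words.count(term) for term in terms)
        if hits:
            score = hits * (1 + popularity.get(record_id, 0))
            scored.append((score, record_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [record_id for _, record_id in scored]


records = {1: "vintage bike frame", 2: "road bike", 3: "vintage lamp"}
popularity = {2: 5}  # hypothetical: record 2 is clicked far more often
print(interpret("vintge bicycle"))
print(smart_search(records, popularity, "vintge bicycle"))
```

The misspelled, synonym-laden query still resolves to the intended terms, and the popular record outranks one with more literal keyword hits.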

When these solutions do not apply

As Apache states about Hadoop (and this arguably applies to NoSQL more generally), there are things it is NOT:

  1. Apache Hadoop is not a substitute for a database – you need something on top for high-performance updates
  2. MapReduce is not always the best algorithm – if one MR job needs to know about the previous one, then you lose the parallelization benefits.
  3. Hadoop and MapReduce are not for beginners in Java, Linux, or error debugging – it’s open source and emerging, so many of the techs built on top address that and are worth the extra layering.

Initial newer search engines are better at metadata search, but not full text and full results

Google solutions or open source solutions like MongoDB are fast at addressing these “signals”, but are limited at full-text searches of extremely long documents, which is sometimes required by legal, policy, or other regulations. For instance, when using a CraigsList or Groupon, a user is searching against metadata fields, i.e. “Bike Vintage” between 1950 and 1960, and not most of the text, and what is returned in milliseconds is the top x hundred results. The results are not hitting the raw data nor every record, but instead are hitting these index-like constructs that hold the record ID. For the results that succeed, the user can then call a URL to go into transaction mode back in Oracle. If an edit is made, a trigger can update the NoSQL index immediately, and full-text updates can be made in Oracle reasonably fast, but definitely not as fast as in the NoSQL index.
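
The “index-like constructs with the record ID” can be pictured as a small inverted index over metadata. The sketch below is an assumption-laden toy, not a real engine: the `MetadataIndex` class, its fields, and the sample records are hypothetical, and the `upsert` call stands in for the trigger that refreshes the index after an edit is committed in the transactional store.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy inverted index: terms map to record IDs, never to the raw records."""

    def __init__(self):
        self._terms = defaultdict(set)  # term -> set of record IDs
        self._years = {}                # record ID -> year (range-filter field)

    def upsert(self, record_id, title, year):
        # Called after an edit is committed elsewhere, so the index stays current.
        self.remove(record_id)
        for term in title.lower().split():
            self._terms[term].add(record_id)
        self._years[record_id] = year

    def remove(self, record_id):
        for ids in self._terms.values():
            ids.discard(record_id)
        self._years.pop(record_id, None)

    def search(self, query, year_from, year_to, limit=100):
        term_sets = [self._terms.get(term, set()) for term in query.lower().split()]
        if not term_sets:
            return []
        matches = set.intersection(*term_sets)  # records containing every term
        in_range = [rid for rid in matches if year_from <= self._years[rid] <= year_to]
        return sorted(in_range)[:limit]  # top-N record IDs only


index = MetadataIndex()
index.upsert(101, "Vintage Bike", 1955)
index.upsert(102, "Vintage Bike", 1972)
index.upsert(103, "Vintage Lamp", 1958)
print(index.search("bike vintage", 1950, 1960))

index.upsert(102, "Vintage Bike", 1959)  # an edit lands; index refreshed at once
print(index.search("bike vintage", 1950, 1960))
```

The search only returns record IDs; fetching or editing the full record would still go back to the transactional database by ID.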

There are other solutions among the newer search engine technologies that can address all requirements. For example, there may be 10,000 results, and a user wants to pull all of those results into their software, then move into transaction/edit mode and commit those edits in Oracle. The NoSQL index can be updated immediately and be available for near-immediate use, with full text handled in Oracle or, in some search engine solutions, in the engine itself.

Exploring NoSQL and new signals will yield faster and smarter results

Point being, the improved discovery will not only be faster from a query-return point of view, but will also return smarter results. This will also make the discovery process itself faster, moving the user more quickly to action on their intended transactions, as the search results will be more context-aware of language issues, popularity, and personalized user needs. This can be achieved by technologies such as ElasticSearch, possibly Sphinx, or possibly a combination of MongoDB for fast search and existing Oracle for full-text search.

Xentity holding Methodology Development review for ACT-IAC, Smart Lean Government


Xentity, Methodology Development example (2002-2008: How TAA grew to MBT grew to OMB FSAM)

Over 6 years, it took several iterations of the method, application to projects while scaling up method scope, application to programs/segments, lots of information sharing, and then promotion to become the base of FSAM. This is not to direct the content of the ACT-IAC Smart Lean Government (SLG) efforts, but to show how the method got developed, how training material and trainers were built out, and how the team went out information sharing, promoting, and then helping apply it to support true business transformation and modernization blueprints.

Summary of Objectives:
– Cover how Xentity and partners supported the Dept of the Interior in iterating/developing the “Methodology for Business Transformation”, showing some sample deliverables, templates, and visuals
– Discuss how we iterated from Bureau TAA (02-03) to DOI MBT (04-06) to OMB FSAM (07-08), maturing from bureau, to agency, to federal, not only through iterative method development but through applying it in iterations: first a few blueprints at bureau, then four at agency cutting across differently driven blueprints (legal, organization development, target architecture, and shared services), then improving the method and doing more blueprints, then doing more communications and training, and finally, promotion to OMB through collaborative integration with other methods.
– Cover lessons learned and critical success factors (i.e. framework agnostic, mapping to frameworks, culture understood, tested in 3 cycles of blueprints, breaking into services, first attempted foray into doing as NGO in 2006)
– Cover pitfalls (i.e. overselling, under-training, lacking certification, not emphasizing earlier phases, limited bureau accountability, perception and politics, moving to FSAM and losing key aspects, calling it a method instead of an approach, perceived as waterfall and not presented as situational, training/maintenance budget limits)
– Cover things we would now do differently (i.e. wiki, more examples, more new media communications)


Some favorite TED talks


A business partner last night said “I don’t wake up and turn on my phone, or watch TV, or check email right away. I try to keep it simple…” as several of us waxed rhapsodic about the pre-pocket-tech and pre-internet days and how teenagers’ patterns know no other world. But then he continued, “OK, well that’s not true, I do get my morning dose of TED for inspiration”.

It’s just one more to add to the many morning intake mediums. People seek personal philosophical guidance in the morning through religion, scripture, reading a story, meditation, prayer, mind-body engagement, or quiet time. People seek temporal context in morning TV news, the newspaper, internet and feeds, websurfing (can I still use that term?), or tablet time. People seek social engagement with morning coffee at the diner with the guys/gals, spouse and/or kid quality time, the Facebook rise-and-shiner, or other social media digests. People seek inspiration in any of the above.

Personally, I have yet to find my morning ritual, and I bounce between different mediums. Sometimes it’s playing trains or toys or some activity with the family when we get a good rhythm going that morning; sometimes it is tablet browsing when feeling curious about various news or video feeds; sometimes it is mindless TV news digestion; and, probably more rarely than it should be, sometimes it is outside quiet time on a run, bike, or walk, or reading or whatnot. Other times, the day gets going too fast, there is no interstitial time, and an east coast call to this mountain time zone starts right up.

Though I haven’t found my rhythm, over the last partial decade here are a few of the greatest TED hits I’ve tweeted out and found inspirational:

Hans Rosling: Stats that reshape your world-view (Jun 2007)

Geoffrey West: The surprising math of cities and corporations (July 2011)

TEDxUofM – Jameson Toole – Big Data for Tomorrow (May 2011)

Eli Pariser: Beware online “filter bubbles” (Mar 2011)

Sugata Mitra: Build a School in the Cloud (Feb 2013)

Deb Roy: The birth of a word (Mar 2011)

-mt