Some favorite TED talks

Blog post
edited by
Wiki Admin

A business partner last night said, “I don’t wake up and turn on my phone, or watch TV, or check email right away. I try to keep it simple…” as several of us waxed rhapsodic about the pre-pocket-tech-and-internet days and how teenagers’ patterns know no other world. Yet he continued, “OK, well that’s not true, I do get my morning dose of TED for inspiration.”

It’s just one more to add to the many morning intake mediums. People seek personal philosophical guidance in the morning through religion, scripture, reading a story, meditation, prayer, mind-body engagement, or quiet time. People seek temporal context in morning news TV, the newspaper, internet feeds, websurfing (can I still use that term?), or tablet time. People seek social engagement over morning coffee at the diner with the guys/gals, spouse and/or kid quality time, the facebook rise-and-shiner, or other social media digests. And people seek inspiration in any of the above.

Personally, I have yet to find my morning ritual, and I bounce between different mediums. Sometimes it’s playing trains or toys or some activity with the family when we get a good rhythm going that morning; sometimes it is tablet browsing through various news or video feeds when feeling curious; sometimes it is mindless TV news digestion; and, more rarely than it should be, sometimes it is outside quiet time in a run, bike, walk, reading, or whatnot. Other times, the day gets going too fast, there is no interstitial time, and an east coast call to this mountain time zone starts right up.

Though I haven’t found my rhythm, here are a few of the greatest TED hits I’ve tweeted out and found inspirational over the last several years:

Hans Rosling: Stats that reshape your world-view (Jun 2007)

Geoffrey West: The surprising math of cities and corporations (July 2011)

TEDxUofM – Jameson Toole – Big Data for Tomorrow (May 2011)

Eli Pariser: Beware online “filter bubbles” (Mar 2011)

Sugata Mitra: Build a School in the Cloud (Feb 2013)

Deb Roy: The birth of a word (Mar 2011)

-mt

The Four V's of BigData – Variety, Veracity, Velocity, Volume

Blog post
edited by
Matt Tricomi

If we are simply talking about lots of retail data, lots of sales data, lots of management data, or lots of metadata, we aren’t talking BigData, though for some reason those data are going through and into the new architectures anyway. Yeah, sure, the retail, defense, and intelligence worlds have been sifting through huge data stores for years.

But the marketing-coined term BigData is not just referring to the Volume of the data. There are four V’s of big data, which we have enjoyed using as learned through our information exchanges and partnering with IBM.

The first two V’s focus on the mission side: Variety and Veracity.

Variety (how many categories of data it covers) also includes technology (sources; formats, e.g. numeric, text, objects, geocoded, vector, raster, structured, and unstructured such as email, video, and audio; methods) and legal aspects (complexity, privacy, jurisdiction). Essentially, it is working with various types of data across various dimensions (temporal, geospatial, sentiment, metadata, logs, etc.).

Veracity (understanding the authority of data) includes known data quality, type of data, and data management maturity, so that you can understand how much you can trust that the data is right and accurate.

The other two focus on speed and amount: Velocity and Volume.

Volume (how much data) – capturing, processing, reporting on, and managing a large volume of data.

Velocity (how often it changes, or how close to real time it must be handled) – analyzing and exploiting lots and lots of new data in real time.
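The four definitions above can be sketched as a simple dataset profile. This is a minimal illustrative sketch in Python; the field names, thresholds, and scoring heuristic are our own assumptions for illustration, not part of any IBM or Gartner definition:

```python
from dataclasses import dataclass

@dataclass
class FourVProfile:
    """Illustrative profile of a dataset along the four V's."""
    volume_tb: float       # Volume: how much data (terabytes)
    velocity_per_sec: int  # Velocity: new records arriving per second
    variety: list          # Variety: categories/formats the data covers
    veracity: float        # Veracity: 0-1 trust/quality score

    def is_big_data(self) -> bool:
        # Rough heuristic: more than one V must be "big", since volume
        # alone does not make BigData (per the post's argument).
        big = [self.volume_tb > 10,
               self.velocity_per_sec > 1000,
               len(self.variety) > 3]
        return sum(big) >= 2

profile = FourVProfile(volume_tb=50, velocity_per_sec=5000,
                       variety=["text", "geocoded", "video"],
                       veracity=0.8)
```

Here `profile.is_big_data()` is true because two V's (volume and velocity) cross the illustrative thresholds, even though variety does not.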

Changes to the V’s over the last 15 years

Veracity is the newer one. The concept of data quality has always been the orphaned step-child. It is the I in IT. It’s the part of the iceberg under the water. All IT vendors want to sell speed, handling lots of data, and some commodity variety support, but once sold, you are on your own (or paying $300/hour for mission customization). But we are happy to see IBM got there and added it.

As of late, Value has been introduced as a fifth V as well. Paraphrasing the Spanish article: even if you can produce information, if there is no real action that can be taken with it, it is not Valuable to the organization.

Then again, some have accused IBM of stealing the V’s from Gartner’s 2001 development of the concept. Call it the V-gate controversy.

Nonetheless, if anything it proves that this is a concept worth fighting over; once you dig into its meaning, you see it has its merits.

Will geoscience go for a shared service environment

Blog post
edited by
Wiki Admin


As the previous “How can we help geoscience to move their data to shared services” blog noted, unless we align the stakeholders, get a clear line of sight on their needs, and focus on earning trust and demonstrating value, the answer is no. But let’s say we are moving that way. How do we get started to fund such an approach?

Well, first off, the current grant and programmatic funding models are not designed to develop shared services or interoperable data for the geosciences. Today, many geoscientists are collaborating across disciplines and, as a result, improving the quality of knowledge and scaling the impact of their research. It is also well established that the vast majority operate individually or in small teams. Geoscientists, rightly so, remain very focused on targeted scientific objectives and not on enabling other scientists; it is a rare case when they have the necessary resources or skills. With the bright shiny object of data-driven science and Big Data, do we have the Big Head wagging the body of the geoscientist community? Xentity sees opportunities to develop funding strategies to execute collaborative, performance-based, cross-discipline geoscience. Funding has worked this way since World War II, which greatly expanded the war-time success of onesy-twosy grants to universities. There has been some movement toward hub-and-spoke grant funding models, but we are still out to earn our PhD stripes, pad our CVs, and keep working with the same folks. I know it is a surly and cynical view. OK, the reality is they are doing amazing work, but in their own fields, and anything that slows down their work for the greater good lacks incentive.

Also, there are few true shared services that are managed and extended to the community as products and services should be. Data-driven science, our fourth paradigm of science, has been indirectly “demanding” that scientific organizations and their systems become 24×7 service delivery providers. We have been demanding that IT programmers become service managers and that scientists become product managers or data managers. With a few exceptions, it has not worked. Geoscientists are still struggling to find and use the basic data/metadata and to produce quality metadata (only 60% meet quality standards per EarthCube studies) for their own purposes, let alone making the big leap to Big Data and analytics. Data-driven science requires not only a different business or operating model, but a much clearer definition of the program, as well as of scientists’ roles and expectations. It requires new funding strategies, incentive models, and a service delivery model underpinned by the best practices of product management and service delivery.
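The metadata-quality gap mentioned above can be made concrete with a simple completeness check. Here is a minimal sketch; the required field names are hypothetical, loosely inspired by common geoscience metadata practice rather than taken from any specific standard or from the EarthCube studies cited:

```python
# Minimal metadata completeness check. Field names are illustrative,
# not tied to any specific standard such as ISO 19115.
REQUIRED_FIELDS = ["title", "abstract", "spatial_extent",
                   "temporal_extent", "contact", "license"]

def completeness(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for f in REQUIRED_FIELDS if record.get(f))
    return filled / len(REQUIRED_FIELDS)

record = {"title": "Stream gauge data", "abstract": "Daily discharge",
          "spatial_extent": None, "contact": "jdoe@example.org"}
score = completeness(record)  # 0.5: only 3 of 6 required fields filled
```

A real quality standard would also validate field content (controlled vocabularies, coordinate validity, date formats), but even a presence check like this surfaces how much of a catalog falls short.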

Currently, and my favorite, there is limited to no incentive for most geoscientists to think beyond their immediate needs. If geoscientists are to be encouraged to increase the frequency and volume of cross-discipline science, there need to be enablement services, interoperable data, and information products that solve repetitive problems and provide incentive for participation. We need to develop the necessary incentive and management models to engage and motivate geoscientists, develop a maturity plan for the engineering of shared geoscience services, and develop resourcing strategies to support its execution. Is this new funding models, new recognition models, new education, gamification, crowdsourcing, increased competition, changed performance evaluation? Not sure, as any change to the “game” rules can, and usually does, introduce new loopholes and ways to “game” the system.

The concept of shareable geoscience data, information products, and commodity or analytical computing services has an existing operating precedent in the IT domain: shared services. Shared services could act as a major incentive for participation. An approach would identify the most valuable cross-cutting needs based on community stakeholder input. The team would use this information to develop a demand-driven plan for shared service planning and investment. As an example, a service-based commodity computing platform can be developed to support both the Big Head and the Long Tail, act as an incentive to participation, and perform highly repetitive data exchange operations.

How does one build and sustain a community as large and diverse as the geosciences? 

The ecosystem of geoscience is very complex from a geographic, discipline, and skill-level point of view. How does one engage so diverse a community in a sustainable manner? “Increased visibility of stakeholder interests will accelerate stakeholder dialogue and alignment – avoiding ‘dead ends’ and pursuing opportunities.” The stakeholders can range from youthful STEM students to stern old-school emeritus researchers; from high-volume, high-frequency producers of macro-scale data to a single scientist with a geographically targeted research topic. It is estimated that 80-85% of the science is done in small projects. That is an enormous intellectual resource that, if engaged, can be made more valuable and productive.

Here is a draft target value chain:

The change or shift puts a large emphasis on upfront collaborative idea generation, team building, knowledge sharing via syndication, and new forms of work decomposition in the context of crowd participation (Citizen Science and STEM). The recommended change in the value chain begins to accommodate the future needs of the community. However, the value chain becomes actionable based on the capabilities associated with the respective steps. Xentity has taken the liberty to alliteratively define these four classes of capabilities, or capability clusters, as:

Encouragement, Engagement, Enablement, and Execution.

Encouragement capabilities are designed to incentivize or motivate scientists and data suppliers to participate in the community and to earn their trust. They are designed to increase collaboration and the quality and value of idea generation, and they will have a strong network and community building multiplier effect.

Questions

  • How can new scientific initiatives be collaboratively planned for and developed?
  • How can one identify potential collaborators across disciplines?
  • How can one’s scientific accomplishments and recognition be assured and credited?
  • What are the data possibilities, and how can I ensure that the data will be readily available?
  • How can scientific idea generation be improved?

Capabilities

  • Incentives based on game theory
  • Collaboration, crowd funding, crowd sourcing and casting
  • Needs Analysis
  • Project Management and work definition
  • Credit-for-work Services

Engagement capabilities include the geoscience participant outreach and communication capabilities required to build and maintain the respective communities within the geoscience areas. These are the services that will provide the community the ability to discuss and resolve where the most valued changes will occur within the geosciences community and who else should be involved in the effort.

Questions

  • What participants are developing collaborative key project initiatives?
  • What ideas have been developed and vetted within the broadest set of communities?
  • Who, with similar needs, may be interested in participating in my project?
  • How can Xentity cost share?

Capabilities

  • Customer Relationship Management
  • Promotions
  • Needs Analysis
  • Communications and Outreach
  • Social and Professional Networking

Enablement capabilities are technical and infrastructure services designed to eliminate acquisition, data processing, and computing obstacles and to save scientists time and resources. They are designed to solve frequently recurring problems that keep a wide variety and number of geoscience stakeholders from focusing on their core competencies – the creation of scientific knowledge. Enablement services will have a strong cost-avoidance multiplier effect for the community on the whole if implemented and supported.

Questions

  • How does one solve data interoperability challenges for data formats and context?
  • How do I get data into the same geographic coordinate system or scale of information?
  • How can I capture and bundle my meta-information and scientific assets to support publication, validation, and curation easily?
  • How can I get access to extensible data storage throughout the project lifecycle?
  • Where and how can I develop an application with my team?
  • How can I bundle and store my project datasets and other digital assets for later retrieval?
  • How can I get scalable computing resources without having to procure and manage servers to complete my project?

Capabilities

  • Workflow
  • Process Chaining
  • Data Interoperability
    • Data transformations
    • Semantics
    • Spatial Encoding and Transformation
    • Data Services
  • Publishing
  • Curation
  • Syndication
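One of the recurring enablement questions, getting data into the same geographic coordinate system, can be made concrete. Below is a minimal sketch of the standard WGS84 longitude/latitude to Web Mercator (EPSG:3857) forward projection in pure Python; a real shared service would use a projection library (e.g. PROJ) rather than hand-coded math, and the example city coordinates are approximate:

```python
import math

R = 6378137.0  # Web Mercator sphere radius in meters (WGS84 semi-major axis)

def wgs84_to_web_mercator(lon_deg: float, lat_deg: float):
    """Project WGS84 lon/lat (degrees) to Web Mercator x/y (meters)."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# Denver, CO: roughly 105.0 degrees W, 39.74 degrees N
x, y = wgs84_to_web_mercator(-105.0, 39.74)
```

The value of a shared service here is exactly that no individual scientist should need to write or verify this math; the transformation becomes a vetted, reusable operation.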

Execution capabilities comprise the key management-oriented disciplines required to support shared infrastructure and services, or to help a highly federated set of valuable asset “edges” evolve to be more usable and valuable to the community over time.

Questions

  • How do we collectively determine what information might require a greater future investment?
  • What are the right incentives in the grant processes?
  • What are the future funding models?
  • What models should be invested in?
  • Which technologies should be evaluated for the shared assets?
  • What upcoming shared data or technology needs are common to a large number of participants?

Capabilities

  • Governance
  • IT Service Management (ITSM)
  • Product Management
  • Performance Management
  • Requirements Management
  • Data Management
  • Data Supply Management
  • Data Life Cycle Management
  • Funding
  • Grants and processing

So, why did we develop these classes of capabilities? 

They represent, at the macro level, a way to organize a much larger group of business, operating, and technical services that have been explicitly discussed in NSF EarthCube efforts over the last 3-4 years. We then derived these outputs from analysis and associated them with the most important business drivers. Check out this draft relationship of capabilities, drivers, and rationale:

Engage
  Rationale: The best way to create communities and identify common needs and objectives; begin to build trust and value awareness; bring the respective communities into an environment where they can build out their efforts and sustain collaborative approaches.
  Drivers: Agency (how to navigate planned versus emergent change), intellectual property rights, infrastructure winners and losers, agreement on data storage, preservation, curation policies and procedures, incentives to share data and data sharing policies, and trust between data generators and data users.

Encourage
  Rationale: The best models to incentivize scientists and data producers to participate and collaborate. Xentity has developed game theory based approaches and large scale customer relationship management solutions.
  Drivers: Social and cultural challenges: motivations and incentives, self-selected or closely-held leadership, levels of participation, types of organizations, and collaboration among domain and IT specialists.

Enable
  Rationale: The most costly data processing obstacles – the lowest common denominator, highest impact problem, a common problem found in shared service environments. We have developed enterprise service analysis tools for cost-benefit for the DOI geospatial community, so we have seen this work.
  Drivers: 80% of scientist data needs can be expressed as standard data products, and 80% of scientist time is spent getting data into proper form for research analysis.

Execute
  Rationale: A governance model that will increase the “edge effect” between the legacy and future capabilities and a very diverse set of communities; simple planning capabilities that empower scientists to work complex cross-discipline ideas amongst themselves, define work, and coordinate with the power of the crowd. We have designed collaborative environments and crowd-based frameworks for data collection and analysis with corresponding performance management systems.
  Drivers: Conceptual and procedural challenges: time (short-term funding decisions versus the long-term time-scale needed for infrastructures to grow); scale (choices between worldwide interoperability and local optimization).

So why don’t we do it?

Well, this does introduce an outside approach into a close-knit geoscience community that is very used to solving things for itself. Having a facilitated method from outside consulting, or even teaming with agency operations that have begun moving this route for their national geospatial data assets, is not seen as something that fits their culture. We are still learning of hybrid ways we can collaborate and help the geoscientists set up such a framework, but for now it is still a bit of a foreign concept; and while there is some willingness in the geoscientist community to adopt models that work for other sectors, industries, and operational models, the lack of familiarity is causing a lot of hesitation – which goes back to the earn-trust factor and finding ways to demonstrate value.

Until then, we will keep plugging away, connecting with the geoscience community in hopes that we can help them advance their infrastructure, data, and integration to improve earth science initiatives. Until that happens, we will remain one of the few top nations without an operational, enterprise national geoscience infrastructure.

Data Science Research Areas Punch List

Blog post
added by
Wiki Admin

In launching this Data Science Research service area, the following are areas of data science research we are actively pursuing.

Solutions 

  • Tactical Industry and Trend Context Reports: Data Visualizations
  • Implementable Changes in Industrial Engineering Practices
  • Linking Research & Commercial Industries
  • Crowd and Commercial Effectivity
  • Place-based and Geospatial business cases and impact levels by data theme, product, and service types
  • Semantic vs. machine learning applications for integrating large Corporate or Government common datasets
  • Real-Time National or World Scale Topological Big Data Modeling and Decision Support 
  • Proving existing major corporate and government datasets, social information data quality and semantic readiness, and existing or new platforms and applications to support Smart Cities in simulation environment such as urban planning, decision making, and policy/rules-based intelligence (aka Real-world SimCity and Civilization models)
  • Remote Sensing Integration with BigData Sources and Analytics
  • New Energy Model Research & Development Repository and Social Network Enhancement
  • Information Patterns & Historical Analysis
  • Integrating Computer and Library Science Techniques
  • Blending Machine Learning and Semantic Web
  • Historical Timeline Visualizations – knowledge, technical evolution
  • Roadmap Prediction Visualizations
  • AI/Robotic Integration with Decision Making
  • Data Supply Chain models analysis in support of creating data ecosystem flow for major static and real-time datasets.
  • Impacts of Next Generation or Internet2 architectures on existing content and dataset

Management

Data Science and Architecture Management Research

  • Integrating Academic GeoScience Communities using Architecture Methods 
  • Investigating how Bill/Policy Motives align with Federal Portfolio
  • Leveraging Architecture concepts to advise and improve bills
  • Real-World Enterprise Architecture analysis
  • Federal readiness for architecture and change management maturity by agency using 
  • Performance Measurement analysis for management and budget policies
  • Reduction and impact evaluation of burden on government agencies for data calls – 
  • Value-measurement on policies and metadata
  • Strategic progression of maturing datasets (i.e. What dataset to build next and butterfly effect?)
  • Realistic blending of private sector and public sector best of breed techniques
  • Historical context analysis for current information management policy and bills for future decisions
  • Analyze policy shaping techniques (i.e. market-driven policy, policy reformation, protectionism policy, new value transition or adoption) diversity by industry.
  • Improving Product Management Subjectivity
  • Agile Project Management
  • Architecture Methodology
  • Data Supply Chain Management efficiency patterns
  • Integrating Geospatial Architectures into Industry
  • Industry Acceleration & Stabilization Evaluations
  • Gaming theory application readiness for Corporate and Government policy and increasing energy and quality output (i.e. MMORPG, Social Network, Strategy games, incentive models, talent/skill development, state of integration such as Mechanical Turk models).

Data Science Research Concepts

Blog post
added by
Wiki Admin

Our corporate goal is two-fold:

  • Analyze and assure forward-thinking concepts are applicable
  • Increase our expert staff knowledge base

We first need to assure that our concepts and consultants are current, relevant to our partners, clients, and industry, and forward thinking; second, this allows us to excite, retain, and grow our talent knowledge base for longer periods than typical consulting-only firms, which lets us lower personnel costs and maintain lower overhead costs.

We are actively working on and establishing new research partnerships with academia (i.e. major university research hubs, local STEM programs), government (i.e. federal programs, state, local), municipal services (i.e. water utilities), and engineering/scientific service companies.

In considering the types of innovative data science research Xentity is pursuing, we seek to make a transformative impact in two areas: Data Science Solutions Research, and Data Science and Architecture Management Research.