Xentity makes Top 50 Fastest Growing Tech Companies List

Silicon Review has selected Xentity to be included in its Top 50 Fastest Growing Tech Companies List. Silicon Review described its selection approach as weighing multiple dimensions, including size, markets, and penetration factors, to determine those acknowledged in the list. SR conducted a detailed interview with Xentity and published an article on Xentity: Web: How are […]

Xentity has a blast at our 2nd Culture Jam supporting Intercambio

Xentity recently supported one of the Summer Events hosted by Left Hand Brewing Co., which specifically raises money to benefit Intercambio Uniting Communities: Culture Jam 2015. The event is truly a fun, family-friendly summer night that brings together many cultures and a variety of music. The folks at Left Hand and Intercambio put on a great show, raising money to support an […]

How Understanding Complexity may help you change the game

In Data and IT, there are various levels and classes of complexity to which our design methods need to adapt. Choosing our methods for design, architecture, and strategic consulting requires understanding the environment. For instance, Enterprise Architecture came of age during the heyday of MIS in the 1980s as standalone applications on mainframes […]

Delivering Open Data in Bulk on the Cloud

Blog post added by Wiki Admin

We just finished some work for a large National Government data provider whose file counts are in the millions, record counts in the tens to hundreds of millions, and storage in the sub-petabyte range. Below are the obfuscated general requirements to consider if you are looking to deliver your bulk data in the cloud: storage, access, access methods, discovery, communications, and applications.

These requirements have been generalized, completely redacted, or in some cases added to, so that anyone in Government delivering Open Data with large public datasets can consider them. These are simply the business requirements. They do not consider technologies, vendors, cost models, capacity planning, etc. That analysis was done separately.

1. Storage – storage supporting a range of file form factors

Investigate the free public data set clearinghouse areas, such as http://aws.amazon.com/publicdatasets/ or equivalents on Azure, etc.

Consider the various form factors of files or services:

  • Gigabyte Size Files
  • Medium Size Files, but totaling more than Gigabyte Size Files
  • Many Terabyte or Gigabyte files that have been broken into medium files for transfer
  • Millions of small files, usually delivered in a buffered stream
  • Data-driven file delivery via services
  • Terabyte Files only deliverable via Sneakernet Import/Export
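One form factor above is a large file broken into medium-size parts for transfer. The Python sketch below splits a file into fixed-size parts, records a SHA-256 digest for each part, and verifies every part before reassembling. The part naming scheme and part size are assumptions made for illustration and are not prescribed by the requirements. Each part is read fully into memory, so part_size should stay medium-sized.

```python
import hashlib
from pathlib import Path


def split_file(src: Path, out_dir: Path, part_size: int) -> list[tuple[str, str]]:
    """Split src into numbered parts of at most part_size bytes.

    Returns a manifest of (part file name, SHA-256 hex digest) in part order.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = []
    with src.open("rb") as f:
        index = 0
        # Read one part at a time until the source file is exhausted
        while chunk := f.read(part_size):
            name = f"{src.name}.part{index:05d}"  # hypothetical naming scheme
            (out_dir / name).write_bytes(chunk)
            manifest.append((name, hashlib.sha256(chunk).hexdigest()))
            index += 1
    return manifest


def join_parts(parts_dir: Path, manifest: list[tuple[str, str]], dest: Path) -> None:
    """Verify every part against the manifest, then concatenate them into dest."""
    for name, digest in manifest:
        if hashlib.sha256((parts_dir / name).read_bytes()).hexdigest() != digest:
            raise ValueError(f"checksum mismatch in {name}")
    with dest.open("wb") as out:
        for name, _ in manifest:
            out.write((parts_dir / name).read_bytes())
```

The function checks all parts before it writes anything. A corrupted transfer therefore fails early and does not leave a half-assembled file.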

2. User Access – easy access for users to copy files to their target environment

Public Read-Only Users should not be required to pay for access to the end solution (i.e., users should not need a cloud account on the hosted solution).

Internal Users will require access to private directories for files that are not, or not yet, to be publicly released (e.g., in response to emergencies, licensed data, interim work products).

Internal Users will benefit from lower-latency access than public users. Solutions to consider include cached volumes, integration between on-premises IT and the cloud environment, and secure file transfer.

3. Multiple Access Methods – Service, Download, Media, Cloud-to-Cloud

Users will look to have data provided in bulk in one of three ways: Web Service, Bulk Media, or Cloud to Cloud.

Admins should have access to user traffic statistics for viewing, exporting statistics logs, and retrieving statistics logs via hosted applications.
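The following Python sketch shows the kind of traffic statistics an admin might view or export. It aggregates successful requests and bytes served per top-level directory. It assumes the host writes access logs in the Common Log Format. Real cloud storage access logs use their own formats, so the regular expression would need to be adapted.

```python
import re
from collections import defaultdict

# Common Log Format: host ident authuser [date] "METHOD path PROTOCOL" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)')


def summarize(lines) -> dict[str, dict[str, int]]:
    """Aggregate request counts and bytes served per top-level directory."""
    stats = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed lines rather than failing the whole report
        _host, _method, path, status, size = m.groups()
        if not status.startswith("2"):
            continue  # count only successful (2xx) responses
        top = "/" + path.lstrip("/").split("/", 1)[0]
        stats[top]["requests"] += 1
        stats[top]["bytes"] += 0 if size == "-" else int(size)
    return dict(stats)
```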

Users pull a directory, set of directories, set of files, or a mix via online web access over HTTP, REST, FTP, UDP, or SCP.

Learn which high-performance file transfer solutions are possible, such as Edge Network publishing to move data closer to users, or high-performance file transfer protocols such as UDT (UDP-based Data Transfer Protocol).

For faster and likely larger file requests, users request that a directory, set of directories, set of files, or a mix be put onto a storage device by the service provider, and the device is delivered back to the user. Bulk Media minimum specifications for external hard drives should be documented.
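When a request is fulfilled on Bulk Media, the provider has to plan how many drives the selection will fill. The hypothetical helper below uses the first-fit decreasing bin-packing heuristic. This heuristic is our choice for illustration and is not specified by the requirements. Capacity is given in bytes and ignores file system overhead, so in practice leave some headroom on each drive.

```python
def plan_drives(files: dict[str, int], drive_capacity: int) -> list[list[str]]:
    """Assign files to drives using first-fit decreasing.

    files maps path -> size in bytes. Returns one list of paths per drive.
    Raises ValueError if a single file is larger than a whole drive.
    """
    drives: list[list[str]] = []
    free: list[int] = []  # remaining bytes on each drive, parallel to drives
    # Place the largest files first; they are the hardest to fit
    for path, size in sorted(files.items(), key=lambda kv: kv[1], reverse=True):
        if size > drive_capacity:
            raise ValueError(f"{path} ({size} bytes) exceeds drive capacity")
        for i, space in enumerate(free):
            if size <= space:
                drives[i].append(path)
                free[i] -= size
                break
        else:
            # No existing drive has room, so start a new one
            drives.append([path])
            free.append(drive_capacity - size)
    return drives
```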

Users who have existing cloud storage accounts, or virtual machine processing points in the cloud, will request or pull a directory, set of directories, set of files, or a mix, and the data is pushed to the user's cloud endpoint.

4. Discovery – increased visibility and discovery of staged products in catalogs and search engines

Data Products are usually downloaded through keyword, geospatial, or temporal product discovery applications, where users filter their search, create an order, and download the products in small groups.
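The keyword, geospatial, and temporal filtering described above can be sketched in Python. The Product record, its fields, and the batch size are hypothetical. The bounding-box test is a plain axis-aligned overlap check and does not handle boxes that cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

BBox = tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass
class Product:
    title: str
    keywords: set[str]
    bbox: BBox
    acquired: date


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """Axis-aligned overlap test; antimeridian-crossing boxes are not handled."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(products, keyword: Optional[str] = None, bbox: Optional[BBox] = None,
           start: Optional[date] = None, end: Optional[date] = None) -> list[Product]:
    """Apply keyword, geospatial, and temporal filters; omitted filters match all."""
    hits = []
    for p in products:
        if keyword and keyword.lower() not in {k.lower() for k in p.keywords}:
            continue
        if bbox and not bbox_intersects(p.bbox, bbox):
            continue
        if (start and p.acquired < start) or (end and p.acquired > end):
            continue
        hits.append(p)
    return hits


def order_batches(hits: list[Product], batch_size: int = 10) -> list[list[Product]]:
    """Split an order into small groups for download."""
    return [hits[i:i + batch_size] for i in range(0, len(hits), batch_size)]
```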

Public file directory listings should be discoverable and optimized for discovery by search engines.

Public collections should be discoverable and optimized for discovery by search engines
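A common way to make a public file directory listing crawlable is an XML sitemap that follows the sitemaps.org protocol. The sketch below builds one with the standard library. The base URL is a placeholder. A single sitemap is capped at 50,000 URLs, so a collection of millions of files would need many sitemaps tied together by a sitemap index file.

```python
import xml.etree.ElementTree as ET
from datetime import date
from urllib.parse import quote

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-sitemap limit in the sitemaps.org protocol


def build_sitemap(base_url: str, entries: list[tuple[str, date]]) -> str:
    """Return sitemap XML for (relative file path, last-modified date) pairs."""
    if len(entries) > MAX_URLS:
        raise ValueError("split into multiple sitemaps and list them in a sitemap index")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for rel_path, modified in entries:
        url = ET.SubElement(urlset, "url")
        # Percent-encode the path (quote keeps "/" intact); ElementTree escapes &, <, >
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + quote(rel_path.lstrip("/"))
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```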

Explicitly demonstrate how bulk data will be registered in, and discoverable through, both Sciencebase.gov and data.gov.

Catalogs should be able to harvest, by pull or push, public FGDC, ISO 19115, or RDF metadata for files in the directories, for transactional or bulk loading into their catalog.

The File Directory Listing can be queried via an open-standard discovery service to assist in developing a download filter list.
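A download filter list can be built from include and exclude patterns applied to a file listing returned by such a discovery service. This sketch assumes the listing arrives as relative POSIX-style paths. It uses shell-style patterns via fnmatchcase, where * also matches across "/" separators.

```python
from fnmatch import fnmatchcase


def build_download_list(listing, include, exclude=()) -> list[str]:
    """Select paths from a directory listing using shell-style patterns.

    A path is kept if it matches any include pattern and no exclude pattern.
    Because * also matches "/", "elevation/*" covers all subdirectories.
    """
    return [
        path for path in listing
        if any(fnmatchcase(path, pat) for pat in include)
        and not any(fnmatchcase(path, pat) for pat in exclude)
    ]
```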

The National Map can be discoverable in the proposed service provider's catalog, but the catalog reference needs to follow the metadata provided with each file. At minimum, it should present the source, created date, updated date, title, basic description, and the provided DOI link for the file or directory.
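The minimum catalog reference fields can be captured as a small validated record. The class and field names below are hypothetical. https://doi.org/ is the public DOI resolver, and every DOI name begins with the "10." prefix. That prefix is the only DOI check made here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogReference:
    """Minimum fields a catalog entry should present for each file or directory."""
    title: str
    description: str
    source: str
    created: date
    updated: date
    doi: str  # taken from the metadata provided with the file

    def __post_init__(self):
        if not self.title or not self.source:
            raise ValueError("title and source are required")
        if self.updated < self.created:
            raise ValueError("updated date precedes created date")
        if not self.doi.startswith("10."):
            raise ValueError("DOI names start with the '10.' prefix")

    @property
    def doi_url(self) -> str:
        # Link through the public DOI resolver
        return f"https://doi.org/{self.doi}"
```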

The Service Provider should support being called via a Digital Object Identifier (DOI).

5. Publishing – support batch file release updates for thousands of files monthly

Consider whether files within datasets will be published and updated incrementally, which will require service or bulk media methods to update the datasets.

Files published will be stored in original formats.

Updates are expected monthly and, on average, should affect no more than 10% of files or file storage.
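The monthly update expectation is stated as an average, so a single heavy month need not violate it. The sketch below checks a history of monthly releases against the 10% figure. The (changed, total) tuple layout is an assumption. The same function works whether the counts are files or bytes of storage.

```python
from statistics import mean


def within_expected_churn(history: list[tuple[int, int]], limit: float = 0.10) -> bool:
    """Check that the average monthly fraction of changed items stays within limit.

    history holds one (changed, total) pair per monthly release.
    """
    if not history:
        raise ValueError("history must contain at least one release")
    if any(total <= 0 for _, total in history):
        raise ValueError("totals must be positive")
    return mean(changed / total for changed, total in history) <= limit
```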

Updates to files should be logged to trigger notifications to subscribed users.

File updates should maintain a success and parity-check status for each file.
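One simple way to maintain parity-check status is to compare each published file against a manifest of expected SHA-256 digests. The manifest format (relative path mapped to hex digest) is assumed for illustration, and the three status strings are hypothetical labels.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files are not loaded into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify_release(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Return a status per manifest entry: 'ok', 'missing', or 'mismatch'."""
    status = {}
    for rel_path, expected in manifest.items():
        p = root / rel_path
        if not p.is_file():
            status[rel_path] = "missing"
        elif sha256_of(p) != expected:
            status[rel_path] = "mismatch"
        else:
            status[rel_path] = "ok"
    return status
```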

Offline file transfer should support processing of delivered storage devices, with clear instructions.

Online upload transfer should not incur per-storage-unit (i.e., per-gigabyte) transfer charges akin to transactional charges into the bulk download area.

Online upload should have high-performance data transfer capabilities, such as UDT (UDP-based Data Transfer Protocol), for transfers between on-premises data and the cloud.

Cloud-to-cloud moves (e.g., from the transactional area to the public dataset hosting area) should have very high transfer speeds and should consider location proximity issues.

6. Notifications – providing ways for users to subscribe to update notifications for staged product files

Users can subscribe to changes to directories, sub-directories, or specific files.
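Matching subscriptions to a batch of changed files comes down to a path-prefix test. A subscription to a directory covers everything beneath it, and a subscription to a file covers only that file. This sketch assumes subscriptions and changed paths are relative POSIX-style paths.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def subscribers_to_notify(subscriptions: dict[str, list[str]],
                          changed_paths: list[str]) -> dict[str, list[str]]:
    """Map each subscriber to the changed files covered by their subscriptions.

    Users with no matching changes are omitted from the result.
    """
    matches = defaultdict(set)
    for user, watched in subscriptions.items():
        watched_paths = [PurePosixPath(w) for w in watched]
        for changed in changed_paths:
            cp = PurePosixPath(changed)
            # Exact file match, or the watched path is an ancestor directory
            if any(cp == w or w in cp.parents for w in watched_paths):
                matches[user].add(changed)
    return {user: sorted(paths) for user, paths in matches.items()}
```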

Users can be notified of such changes through push notifications, such as per-change alerts, daily change digests, RSS updates, or other notification techniques.
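For the RSS option, a release's changed files can be published as items in an RSS 2.0 feed. The sketch below uses only the standard library. Channel titles and URLs are placeholders. Each item's GUID combines the URL and the change time, which is our assumption; it makes every new version of a file appear as a new item in feed readers.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def changes_feed(channel_title: str, channel_link: str,
                 changes: list[tuple[str, str, datetime]]) -> str:
    """Build an RSS 2.0 document from (path, url, timezone-aware changed_at) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "File updates for " + channel_title
    for path, url, changed_at in changes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"Updated: {path}"
        ET.SubElement(item, "link").text = url
        # URL plus timestamp, so each new version of a file is a distinct item
        guid = ET.SubElement(item, "guid", isPermaLink="false")
        guid.text = f"{url}#{changed_at.isoformat()}"
        # RSS 2.0 expects RFC 822 style dates, which format_datetime produces
        ET.SubElement(item, "pubDate").text = format_datetime(changed_at)
    return ET.tostring(rss, encoding="unicode")
```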

Users can use the notifications to trigger requests for the bulk file updates.

7. Download API – Supporting applications or including applications that help the user download in bulk

Provide a download API that can be managed through api.data.gov. The API should uniquely identify each caller by API key, accept that key as a GET parameter in the URL query string, and enforce an hourly limit on the number of requests based on API key settings. If the api.data.gov rate limit is exceeded, an HTTP status code of 503 should be returned.
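The hourly per-key limit and the 503 response described above can be modeled with a fixed-window counter. This is an in-process sketch and not api.data.gov's implementation. The api_key query parameter name, the default limit, and the injectable clock are assumptions made for illustration and testing.

```python
import time
from collections import defaultdict
from typing import Callable, Optional
from urllib.parse import parse_qs, urlparse


def api_key_from_url(url: str) -> Optional[str]:
    """Extract the (assumed) api_key GET parameter from a request URL."""
    values = parse_qs(urlparse(url).query).get("api_key")
    return values[0] if values else None


class HourlyKeyLimiter:
    """Fixed-window limiter: each API key gets a request quota per clock hour."""

    def __init__(self, limits: dict[str, int], default_limit: int = 1000,
                 clock: Callable[[], float] = time.time):
        self.limits = limits                # per-key overrides from API key settings
        self.default_limit = default_limit  # hypothetical default hourly quota
        self.clock = clock
        self.counts = defaultdict(int)      # (api_key, hour index) -> requests seen

    def check(self, api_key: str) -> int:
        """Return 200 if the request is allowed, or 503 once the hourly quota is used."""
        window = int(self.clock() // 3600)  # index of the current clock hour
        bucket = (api_key, window)
        if self.counts[bucket] >= self.limits.get(api_key, self.default_limit):
            return 503  # status code named in the requirement
        self.counts[bucket] += 1
        return 200
```

A production deployment would share counters across servers and expire old windows. This version keeps every window in memory.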

3rd party applications should be able to support HTTP, REST, FTP, or SCP calls.

Software Development Kit (SDK) access (Java, Python, .NET, PHP, etc.) should be allowed as well.

The file download should support multiple file requests, allow parallel downloads, handle restarting partial file downloads, and govern (throttle) anonymous volume requests.
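Parallel and restartable downloads are typically built on HTTP Range requests. The helpers below split a file's size into inclusive byte ranges for parallel workers. They also build a Range header that resumes a partial download from its current length. The function names are hypothetical. A client should trust the appended bytes only when the server answers 206 Partial Content. A plain 200 means the server ignored the range, and the download must restart.

```python
from pathlib import Path
from typing import Optional


def byte_ranges(total_size: int, parts: int) -> list[tuple[int, int]]:
    """Split a file of total_size bytes into at most `parts` inclusive (start, end) ranges."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    step = -(-total_size // parts)  # ceiling division so no bytes are left over
    return [(start, min(start + step, total_size) - 1)
            for start in range(0, total_size, step)]


def range_header(start: int, end: Optional[int] = None) -> dict[str, str]:
    """Build an HTTP Range header; omitting end requests everything from start onward."""
    return {"Range": f"bytes={start}-" if end is None else f"bytes={start}-{end}"}


def resume_header(partial: Path) -> dict[str, str]:
    """Headers to resume a partial download from the bytes already on disk."""
    done = partial.stat().st_size if partial.exists() else 0
    return range_header(done) if done else {}
```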

Peer-to-Peer solution support (e.g., BitTorrent) must comply with Federal Regulations.

Identify what User Training and sanctioned or third-party consulting for Software Developers is available, along with its availability and cost.

8. Applications – Support the end-user experience for unzipping files and loading them into a geospatial database

If users receive multiple zipped files that require them to click each link to download, unzip each file, and then manually load each file into a database using the provided metadata, can this be automated?
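The download, unzip, and load sequence can be automated. The sketch below assumes the archives have already been downloaded and contain CSV files with a header row. It loads them into SQLite from the standard library as a stand-in. A geospatial database such as PostGIS or SpatiaLite would be the real target, with geometry columns built from the provided metadata. Column names are taken from the CSV headers and are assumed to be trusted.

```python
import csv
import io
import sqlite3
import zipfile
from pathlib import Path


def load_zips(zip_paths, db_path, table: str = "features") -> int:
    """Unzip each archive and load every CSV member into one SQLite table.

    Assumes all CSVs share the same header columns. Returns rows inserted.
    """
    conn = sqlite3.connect(db_path)
    inserted = 0
    try:
        for zp in zip_paths:
            with zipfile.ZipFile(zp) as zf:
                for member in zf.namelist():
                    if not member.lower().endswith(".csv"):
                        continue  # skip readmes, metadata files, etc.
                    with zf.open(member) as raw, \
                            io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
                        reader = csv.reader(text)
                        header = next(reader, None)
                        if header is None:
                            continue  # empty CSV
                        cols = ", ".join(f'"{c}"' for c in header)
                        marks = ", ".join("?" for _ in header)
                        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols}, source_file TEXT)')
                        # Record which archive member each row came from
                        rows = [row + [f"{Path(zp).name}/{member}"] for row in reader]
                        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks}, ?)', rows)
                        inserted += len(rows)
        conn.commit()
    finally:
        conn.close()
    return inserted
```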

The vendor can create premium offerings, such as accelerators, increased access, or additional formats, as part of the delivery. These must be branded separately as a vendor-branded product, and at least one version must be published and clearly marked as the Authoritative Government version, published and controlled by the Government in its original published form.


Remembering Henry Wang

Blog post edited by Matt Tricomi

Henry Wang

Technology Architect Senior Consultant 

With great sorrow we share that our friend and colleague, Henry Wang, passed away on November 8, 2014, after losing his battle with leukemia. Henry impacted many of us at Xentity, teaching us values, leadership, and impact.

In 1999, he was my director and mentor at Sapient Corporation, working on projects for United Airlines, Star Alliance, Navigant International, and more. He helped me grow my architecture skills in communication, patterns, capacity planning, issue triage, deducing client needs into concepts, and much more. Beyond that, he taught principles of values, ethics, and leadership. One of the great premises Henry provided was “to get to the value of what someone needs, think one level higher than you thought you needed, and you may get it.”

We traveled weekly together back and forth between Boston and Chicago, Boston and Denver, Denver and Memphis, and beyond, for months. He shared valuable knowledge, and over time, we came to reciprocate. Henry and I nearly perished on September 11, 2001, but our UA 175 Boston-to-LA flight plans were changed, hours and a day before respectively, after a meeting was pushed back a day. Two weeks later, after conversations with him, family, and very few others, and after discussions with many folks about making an impact, with Henry’s guidance among them, Xentity was founded.

Later, in 2002, Henry joined Xentity’s first project in the travel sector as an independent consultant himself. Through that work, a mutual partner saw his talent, and he became the CIO at OneTravel. We kept in close touch over the years, watching his kids grow and my family grow, and discussing business. Whenever we caught up, whether in Texas, in Boston, or recently over gchats while he was in the hospital, our discussions on transformation, life, family, faith, leadership, and more carried on as if we were still traveling together.

Those times will be missed by us and by his family.

-mt