Monday, November 17, 2014

Teiid 8.9 Final Released

The Teiid community is excited to announce yet another (nearly) time-boxed release of its Data Virtualization software - version 8.9. This release took an extra 4 weeks on top of the regular 12 weeks to accommodate a new version of the host environment - JBoss EAP 6.3.0 Alpha.

You can find the Teiid downloads at http://teiid.jboss.org/downloads/

A little over 100 issues were resolved and this is looking to be a solid release. The major features include:
  • TEIID-3009 WITH project minimization - common table expressions will have their project columns minimized.
  • TEIID-3038 geoSpatial support for MongoDB translator
  • TEIID-3050 Increased Insert Performance with sources that support batching or insert with iterator.
  • TEIID-3044 Function Metadata is available through system tables and DatabaseMetaData.
  • TEIID-1910 TeiidPlatform for EclipseLink integration is now provided via the teiid-eclipselink-platform jar in maven.
  • TEIID-3119 Performance improvements in grouping and duplicate removal as well as general improvements to memory management.
  • TEIID-3156 Collation aware prevention of order by pushdown via the collationLocale translator property and the org.teiid.requireTeiidCollation system property.
  • TEIID-3042 Usage Information on views and procedures in the system table SYSADMIN.Usage.
  • OData4 Support - There is partial support for OData4 using the Apache Olingo project. OData2 support is still intact. We still consider OData4 support an experimental feature.
Thank you for your continued support in the community, especially to the individuals who found issues, entered JIRAs, and provided other valuable contributions. Huge thanks to Alex K., Bram Gadeyne, Cristiano Nicolai, Devesh Mishra, Gary Gregory, Haifen Bi, Harrison Gentry, Ivan Chan, Joao Viragine, Joseph CHIDIAC, Mark Ackert, Mark Addleman, Michael Farwell, Pranav K, Salvatore R, Sanjeev Gour, Sunil Varma, and Tom Arnold for all your contributions.

The development of Teiid 8.10 is already well underway, but there is plenty of time to add more features. The target release date will be February 15th, 2015. If you have been thinking about a feature for Teiid, this is a great time to engage the leads and submit your request/proposal. We really need experts out there to help with our big data story.  If you have experience with NoSQL stores, please write a translator in Teiid or improve what we already have in the build.  Check out the Open To Community issue bucket to see some existing issues that could be good to start with.  We are also looking for web smiths, as we really want to update the main Teiid website - any volunteers are welcome.

If you are thinking about production support, look into Red Hat Data Virtualization.  You can also download a free developer version of the upcoming DV 6.1.0 there.

Thank you. 

Steve and Ramesh.

Thursday, November 6, 2014

Teiid 8.9 CR3

It seems that CR2 was well received, or at least there were no major issues with the switch to EAP 6.3 Alpha.  So now we are moving on to 8.9 CR3, which is available in the downloads and maven.  The final release should be expected within the next 10 days.

There are a couple of fixes beyond what was in CR2 and we are up to 116 total issues addressed for this release. 

Work has begun on 8.10.  Expect an 8.10 Alpha about a week after the 8.9 Final.  Now is a great time to vote for or log issues to be included in that release.

Thanks,
Steve

Tuesday, October 28, 2014

Teiid 8.9 CR2

We are happy and trepidatious about the availability of 8.9 CR2 in the downloads and maven.  This moves us closer to the final, but introduces two important and late changes:

  • The target platform is now EAP 6.3 Alpha.  This platform change better aligns us with more recent product releases and removes the need for applying the RESTEasy patch. We also have a high degree of confidence in the changes as we have already been through this effort for productizing DV 6.1. You may, however, need to be signed into a jboss.org account for the EAP 6.3 Alpha download - we are working with the EAP team to get that restriction removed.
  • The semantics of the pg transport security configuration have changed. By default the socket configuration has SSL disabled.  If login security is used, then clients will be required to use a secure login - which is currently only GSS.  If SSL is fully enabled, then clients will be required to use SSL. This more closely matches the behavior of the JDBC transport.  There is a backwards compatibility option if needed - see System Properties.
Given these changes, expect another CR release before the final.  We apologize for the delay, as we typically want to stick as close as possible to a quarterly release.

Thanks again for all of the community effort in logging issues.  We are up to 110 addressed for this release, so we are expecting a high quality result.  Work will begin shortly on 8.10, which means that now is the time to vote for or log issues to be included in that release.

Steve

Sunday, October 26, 2014

The 3 big problems with data and how to avoid them


Register Now

You’re invited to attend the Beyond Big Data Webinar Series, a set of 5 Red Hat webinars. Read more about the entire series.

Webinar 1: The 3 big problems with data—and how to avoid them
November 5, 2014, 11:00 a.m. - 12:00 p.m.

No matter what your organization looks like, chances are you're wrestling with at least one of the following data challenges:
  • Data silos that are difficult to access when needed
  • Point-to-point integration that doesn't scale
  • Data sprawl leading to security and compliance risks
Join this webinar to learn how to implement a data strategy and architecture to avoid these problems.

Speakers:
Syed Rasheed, solution marketing manager, Red Hat
Syed Rasheed, solution marketing manager at Red Hat, coordinates marketing, evangelism, and consulting activities. In addition to helping customers address integration challenges, he is responsible for working with customers, partners, and industry analysts to ensure next-generation Red Hat technology meets customer requirements for building business process automation and integration solutions. Syed is an 18-year veteran of the IT industry with extensive experience in business process management systems, business intelligence, and data management technologies. His work spans several industries, including financial services, banking, and telecommunications.

Ken Johnson, director of product management, Red Hat
Ken Johnson, Red Hat director of product management, is responsible for SOA and data integration products and technologies. Prior to joining Red Hat, Ken was a senior engineering manager at MetaMatrix, Inc., a pioneer in the enterprise information integration (EII) market. He has also held technical leadership positions at Vignette Corporation, Oberon Software, and Sybase, Inc., with a focus on application integration and data management technologies.

Wednesday, October 22, 2014

Single sign-on (SSO) with Teiid with Kerberos secured Web Service

With the upcoming release of Teiid 8.9, users can achieve single sign-on (SSO) when accessing any REST or SOAP based web services that are secured through Kerberos.

In the previous installment, I shared a few articles on how to do Kerberos authentication through Teiid/JBoss EAP when consuming a REST web service.

In this blog I will showcase a few more articles that are designed around SOAP web services. These articles are mostly written in tutorial fashion, so that one can easily follow along to create as well as consume the web services in JBoss EAP and Teiid.

1) How to implement a SOAP Web Service with Kerberos authentication in JBoss EAP

2) How to implement Kerberos authentication to a SOAP Web Service using Teiid


Article (1) shows how to create a sample SOAP web service that can be secured through Kerberos.

Article (2) shows how to consume a web service in Teiid that is secured through Kerberos.

Article (3) shows an SSO scenario, where Teiid is secured through Kerberos, and how the authenticated token can be used to access a web service that is also secured through Kerberos.

As a next step, we want to support consuming services that are secured through SAML and OAuth. If you are familiar with the implementation of these technologies in CXF and can help implement a solution, please let us know.

Thanks.

Ramesh..

Friday, October 17, 2014

Teiid Platform Sizing Guidelines and Limitations

Users and customers often ask us how to size their Data Virtualization infrastructure based on Teiid or the JDV product from Red Hat. This is typically a very involved question and not an easy one to answer in plain terms, because it involves taking into consideration questions like:
  • What kinds of sources is the user working with? Relational, file, CRM, NoSQL, etc.
  • How many sources are they trying to integrate? 10, 20, 100?
  • What volumes of data are they working with? 10K, 100K, 1M+?
  • What are the query latency times from the sources?
  • How is Teiid being used to implement the data integration/virtualization solution? What kinds of queries is the user executing? Even small federated results may take a lot of server side processing - especially if the plan needs tweaking.
  • Is materialization being used?
  • Are the queries written in an optimal way?
  • and so on.
Each of these questions affects performance profoundly, and if you have a mixture of them, it becomes that much harder to give a specific configuration.

Before you start thinking about beefing up your DV infrastructure, the first things you want to check are:
  • Is my current infrastructure serving my current needs and future expectations?
  • What kind of changes are you expecting?
  • Is there a change in the type of sources coming, like using Hadoop or cloud based solutions?
We need to build the DV infrastructure based on the available resources combined with the mandated requirements for your use case. Since Teiid is a real time data virtualization engine, it depends heavily upon the underlying sources for data retrieval (there are caching strategies to minimize this). If Teiid is working with slow data sources, no matter how much hardware you throw at it, you are still going to get a slow response.  Where more memory and faster hardware can help DV is when the Teiid engine is doing lots of aggregation, filtering, grouping and sorting as the result of a user query over large sets of rows. That means all of the questions raised above may directly affect the CPU and memory needed by each individual query.

The Teiid engine itself has some limitations:

1. Hard limits, which break down along several lines in terms of the number of storage objects tracked, disk storage, streaming data size/row limits, etc.
  • Internal tables and result sets are limited to 2^31 rows.
  • The buffer manager has a max addressable space of 16 terabytes - but due to fragmentation you'd expect that the max usable would be less (this is relatively easy to scale up with a larger block size when we need to).  This is the maximum amount of storage available to Teiid for all temporary lobs, internal tables, intermediate results, etc.
  • The max size of an object (batch or table page) that can be serialized by the buffer manager is 32 GB - but no one should ever get near that (the default limit is 8 MB). A batch is a set of rows flowing through the Teiid engine.
Handling a source that has tera/petabytes of data doesn't by itself impact Teiid in any way.  What matters is the processing operations that are being performed and/or how much of that data we need to store on a temporary basis in Teiid.  With a simple forward-only query, as long as the result row count is less than 2^31, Teiid will be perfectly happy to return a petabyte of data.

2. Soft limits, which depend upon the Teiid configuration and can therefore impact sizing.

Each batch/table page requires an in-memory cache entry of approximately 128 bytes - thus the total number of tracked batches is limited by the heap. This is also why we recommend increasing the processing batch size on larger-memory systems or in scenarios that make use of large internal materializations. The actual batch/table itself is managed by the buffer manager, which has a layered memory buffer structure with a spill-over facility to disk.
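
As a rough illustration of this soft limit, the sketch below estimates how much heap goes purely into tracking batches, using the approximate 128-byte cache entry mentioned above. The helper name, the row count, and the two batch sizes are illustrative assumptions, not Teiid APIs or defaults, and real overhead will vary with the workload.

```python
# Approximate in-memory cache entry per batch/table page, as stated in the post.
ENTRY_BYTES = 128


def tracking_overhead_bytes(total_rows: int, batch_size_rows: int) -> int:
    """Estimate heap used only to track batches (hypothetical helper, not a Teiid API)."""
    if total_rows < 0 or batch_size_rows <= 0:
        raise ValueError("total_rows must be >= 0 and batch_size_rows must be > 0")
    # Ceiling division: a partially filled last batch still needs a cache entry.
    batches = (total_rows + batch_size_rows - 1) // batch_size_rows
    return batches * ENTRY_BYTES


if __name__ == "__main__":
    rows = 1_000_000_000  # assumed size of a large internal materialization
    for batch_size in (256, 2048):  # assumed processing batch sizes, in rows
        mb = tracking_overhead_bytes(rows, batch_size) / 1_000_000
        print(f"batch size {batch_size}: about {mb:.1f} MB of heap for tracking")
```

With these assumed numbers, an eight times larger batch size cuts the tracking overhead roughly eightfold, which is the reasoning behind the batch size recommendation.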

3. Open file handles and other resource considerations (such as buffers being allocated by drivers) are somewhat indirect from Teiid's perspective, but depending upon the particulars of the data source configurations, they can have an impact as well.


4. Internal materialization is based on the buffer manager, so it is directly dependent upon the buffer manager's limits.

5. When using XA, source access is serialized; otherwise source access happens in parallel. This can be controlled using the number of source threads per user query.

Some scenarios may not be appropriate for Teiid.  Something contrived, such as a 1M x 1M row cross-join in Teiid, may not be a good fit for the virtualization layer.  But is that a real use case, where you are going to cursor over a trillion rows to find what you are looking for? Is there a better targeted query? These are the kinds of questions you need to be asking yourself when designing a data virtualization layer.
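
To put numbers on the contrived cross-join, here is a small arithmetic sketch that compares its result size with the 2^31 row limit for internal tables and result sets stated earlier. The function names are hypothetical and exist only for this illustration.

```python
# Row limit for internal tables and result sets, as stated earlier in the post.
MAX_ROWS = 2**31


def cross_join_rows(left_rows: int, right_rows: int) -> int:
    """A cross join produces every pairing of left and right rows."""
    return left_rows * right_rows


def fits_row_limit(row_count: int) -> bool:
    """True when the row count is below the 2^31 limit (hypothetical helper)."""
    return row_count < MAX_ROWS


if __name__ == "__main__":
    rows = cross_join_rows(1_000_000, 1_000_000)
    print(f"cross join rows: {rows:,}")
    print(f"fits the limit: {fits_row_limit(rows)}")
    print(f"times over the limit: {rows / MAX_ROWS:.0f}")
```

Under these figures the cross-join result is several hundred times larger than the row limit, so a more targeted query is the only practical option.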

Take a look at the query plan and the command log, and record the source latencies for a given query to see if your Teiid instance is performing optimally for your use case. Is it CPU bound or IO bound (large source results and long source wait times)? See if your submitted queries have been waiting in the queue (you can check the queue depth). Wherever you see the shortfall is where you may need additional resources.
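
One rough way to turn recorded source latencies into a CPU bound vs IO bound call is sketched below. It assumes a single round of source access per query, sums the latencies when access is serialized (as with XA) and takes the slowest source when access is parallel. The 0.5 threshold and all names are hypothetical choices for illustration, not values Teiid reports or recommends.

```python
from typing import Sequence


def estimated_source_wait_ms(latencies_ms: Sequence[float], serialized: bool) -> float:
    """Serialized access (e.g. XA) waits for each source in turn; parallel waits for the slowest."""
    if not latencies_ms:
        return 0.0
    return float(sum(latencies_ms)) if serialized else float(max(latencies_ms))


def classify(total_ms: float, latencies_ms: Sequence[float],
             serialized: bool = False, io_threshold: float = 0.5) -> str:
    """Label a query by the share of its time spent waiting on sources (hypothetical rule)."""
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    # Cap the wait at the total so the share never exceeds 1.0.
    wait = min(estimated_source_wait_ms(latencies_ms, serialized), total_ms)
    return "IO bound" if wait / total_ms >= io_threshold else "CPU bound"


if __name__ == "__main__":
    latencies = [400.0, 300.0]  # assumed per-source latencies taken from the command log
    print(classify(1000.0, latencies))                   # parallel source access
    print(classify(1000.0, latencies, serialized=True))  # serialized source access
```

The same latencies can lead to different labels depending on whether source access is serialized, which is worth keeping in mind before adding cores.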

Our basic hardware recommendation for a smaller departmental use case is (double it if you need HA or disaster recovery):
  • 16 core processor
  • Minimum of 32 GB RAM
  • 100+ GB of buffer manager temp disk (an SSD based device may give better results when there are a lot of cache misses or swapping of results)
  • Red Hat Enterprise Linux 6+
  • Gigabit Ethernet
Then do a simple pilot with your own use case(s), with your own data, in your own infrastructure, under the anticipated load. If you think that a DV server is totally CPU bound and queries are being delayed because of that, then you can consider adding cores to the server or nodes to a cluster. Note again that you need to make sure your source infrastructure is built to handle the load that DV is executing against it.

It would be really great if you shared the hardware profiles that you selected for your Teiid environments, and the techniques you used to get to the decision.

Thank you.

Ramesh & Steve.