IBM Insight 2014: The PureData System for Analytics Session

By Joe Clabby, Clabby Analytics

There is something very interesting happening in the computing industry right now with respect to system designs. In days gone by, the industry relied on advances in microprocessor fabrication (die shrinkage) to add more processing power at the chip level and so increase system performance. But microprocessor designers are reaching the end of the road in how far they can shrink a chip: they are now working with silicon features at the atomic scale, where physical limits such as current leakage make further shrinkage increasingly difficult. As a result, performance improvements need to be found elsewhere in computing system design – more specifically in improving system bus speeds (input/output); in memory design and deployment (main memory, Flash cache memory); and in the use of accelerators (specialized system designs that may use multiple different types of processors to speed the processing of certain elements of a given workload).

IBM’s PureData System for Analytics (evolved from IBM’s “Netezza” acquisition) fits into this class of “accelerator systems”. This environment uses the latest Intel processors for data processing as well as field programmable gate arrays (FPGAs) to speed data transfer and handling – resulting in a system design that can process complex analytics workloads far faster than traditional systems designs that rely on only one type of processor. It also exploits over 200 in-database analytics functions to speed the processing of highly parallel analytics workloads, and offers built-in support for the Predictive Model Markup Language (PMML), which enables fast prediction scoring within a given database (because scoring can be done in-line, against the data where it resides on the PureData System for Analytics, rather than after moving that data to another system). This in-line processing enables near real-time results.
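To make the idea of in-line scoring concrete, here is a minimal sketch in Python. It is not IBM's implementation and does not use any PureData or Netezza API: SQLite stands in for the database, and the function `fraud_score`, the table `txn`, and the model coefficients are all hypothetical, standing in for a model that would normally be imported from a PMML document. The point it illustrates is only that the scoring function is invoked by the query engine, row by row, so the data never has to be exported to a separate scoring system.

```python
import math
import sqlite3

# Hypothetical logistic-regression coefficients (assumptions for this sketch),
# standing in for a model that would normally come from a PMML document.
INTERCEPT = -3.0
COEF_AMOUNT = 0.002
COEF_NEW_ACCOUNT = 1.5


def fraud_score(amount, is_new_account):
    """Return a probability in (0, 1) from the hypothetical logistic model."""
    z = INTERCEPT + COEF_AMOUNT * amount + COEF_NEW_ACCOUNT * is_new_account
    return 1.0 / (1.0 + math.exp(-z))


def score_in_database(rows):
    """Load rows into an in-memory database and score them inside one query."""
    conn = sqlite3.connect(":memory:")
    # Register the model as a SQL function so the engine calls it per row.
    conn.create_function("fraud_score", 2, fraud_score)
    conn.execute(
        "CREATE TABLE txn (id INTEGER PRIMARY KEY, amount REAL, is_new_account INTEGER)"
    )
    conn.executemany("INSERT INTO txn VALUES (?, ?, ?)", rows)
    scored = conn.execute(
        "SELECT id, fraud_score(amount, is_new_account) FROM txn ORDER BY id"
    ).fetchall()
    conn.close()
    return scored


if __name__ == "__main__":
    sample = [(1, 50.0, 0), (2, 2500.0, 1)]
    for txn_id, score in score_in_database(sample):
        print(txn_id, round(score, 3))
```

On an appliance of the kind described above, the equivalent scoring step would run in parallel next to the stored data; the single-process SQLite version only shows the shape of the query.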

And, at IBM Insight 2014, IBM showed how its PureData System for Analytics has gotten even better. In the new N3001, processors have gotten faster and there are more cores; security has been greatly improved; and packaging has been changed to accommodate businesses of all sizes. As a result, this new system can process even larger volumes of “Big Data”; and it offers stronger protection for data that enters the system (using self-encrypting disk drives that don’t degrade performance). Software enhancements speed up data load rates, and performance management has been improved through enhancements to the system’s performance portal.

Having access to a faster system that can handle even larger volumes of Big Data is nice, but packaging this system in a more modular design should also have a positive impact on sales, as it makes IBM PureData System for Analytics more affordable to small and mid-sized businesses. The new N3001 family is a rack-mountable appliance that offers the same processing speed as larger models – and because it is an appliance, it can be easily deployed. The new rack-mount design allows businesses to start small and expand to increasingly larger configurations (an 8-rack configuration is available that allows enterprises to work with over a petabyte of data using a single system).

Security improvements are also important. To Clabby Analytics, security has become a white hot topic this year. Accordingly, we have written two reports on security in 2014: the first on counter fraud, and the second on advanced data security. In these reports we have written about security for data-at-rest (where data must be protected within a systems environment using authorization and authentication techniques as well as other approaches), as well as about security for data-on-the-fly (data that is transferred between systems). To protect data-at-rest and data-on-the-fly, the N3001 uses the industry standard Kerberos authentication protocol – and new, self-encrypting disk technologies that have no impact on system performance (this is important because the system is encrypting very large amounts of data, and other encryption approaches on other hardware have the potential to slow overall systems performance).

It should not be overlooked that there are several types of analytic workloads, including ad hoc reporting, complex query (deep analytics) processing, on-line analytical processing (OLAP), operational analytics, predictive analytics and more. In its PureData designs, IBM prebuilds each system with the right hardware and software to tackle a particular analytics workload in an optimal fashion. PureData System for Analytics is particularly good at processing complex parallel analytics workloads – whereas its sibling system, PureData System for Operational Analytics, has been built and optimized to process operational analytics workloads. Both architectures have been designed to process their respective analytics workloads faster than traditional systems designs. And I expect that, over time, IBM will introduce even more system designs that are optimized for specific analytics workloads.
