Many disciplines today share a common problem: the storage and management of large amounts of data. Cineca, through its Department of Supercomputing, Applications and Innovation (SCAI), has recently launched a new program to promote Big Data Science, addressing the challenges that arise when the volume of data, its structure, or the velocity at which it is collected makes it difficult to process.

These challenges are already well known in fields where massive data sets are acquired from instruments or from simulations: the rate at which data can be produced has far outpaced the rate at which they can be analyzed, and new algorithms and infrastructures have become necessary. The volume and variety of scientific data have typically been associated with astroparticle physics, climatology, and environmental science, but the need to grapple with Big Data and the desire to unlock the information hidden within it are now a key theme in all the sciences, perhaps the key scientific theme across all fields (examples include spatial data, machine data analytics, text analytics, socio-economic data analytics, energy efficiency, and clinical data).

The specific needs of data analytics projects were addressed in the design and construction of Cineca's latest infrastructure, PICO, which has just entered production. The machine will be devoted to data-intensive computing as a complement to highly parallel computation. The new platform meets the particular hardware requirements (large memory per node, massive shared storage, fast data access and transfer, etc.) and provides the software tools and high-throughput technologies needed by data-oriented projects, such as an accelerated visualization environment, cloud computing, and Hadoop. It also offers a well-suited environment for developing techniques and methodologies of key relevance to Big Data problems.

The projects hosted on PICO will address critical challenges in data management, data analytics, or scientific discovery that depend on processing vast amounts of data, either structured or unstructured. Approaches can be computational, statistical, or mathematical.

The main areas of interest are:

  • I/O-bound data analytics applications dealing with either structured (e.g. database) or unstructured data, which could take advantage of PICO's high-end solid-state storage.

  • High throughput applications based on the MapReduce paradigm.

  • Applications that require the integration of multiple data sets, or of observational and simulation data, to glean new insights through data mining and machine learning techniques.

  • Cloud computing applications and services that require an elastic environment to scale up performance.

  • Legacy applications that require a dedicated, high-performance environment in which to run.
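
As an illustration of the MapReduce paradigm mentioned in the list above, the following minimal sketch counts words in a few lines of text. It is only a single-process toy written in plain Python, not the Hadoop API and not a description of how jobs are submitted on PICO; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for this example only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence over an iterable of text records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    sample = ["big data big science", "data intensive computing"]
    print(word_count(sample))
```

In a real high-throughput setting, the map and reduce steps would run in parallel across many nodes and the shuffle would be handled by the framework; the sketch only shows how the work is split into the two user-defined functions.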

The Cineca team will support projects during the deployment of selected applications and services, both to ensure that the system is used properly and achieves the best possible performance, and to help select and apply appropriate algorithms for data exploration and predictive modeling.

