The Importance of Distributed Systems Development

In an age of ever-increasing information collection, and an ever-increasing need to evaluate what is collected, the untapped compute resources sitting in everyone’s home and hands should be driving the development of more sophisticated distributed computing systems. Today, large data processing facilities provide significant compute capabilities. Harnessing the worldwide abundance of distributed resources in a coherent way would be much more powerful.

Distributed programming and processing tools and techniques are currently a reality but are in their infancy. Potential rapid growth of distributed systems is already supported by:

  • Storage, bandwidth and CPUs staying on course to become nearly free. (Free: The Future of a Radical Price)
  • The number of people and devices connected to the internet continuing to grow.
  • Data storage requirements increasing as both the number of data sources and the data accumulated from each of them grow.

It is becoming more common to see terabyte storage devices in homes. Desktops and laptops have become something of a commodity, affordable to the average consumer. You can stake a claim to a table at your local coffee shop for a while and access the internet for free. Becoming a social citizen of the internet with portable compute resources was once cost prohibitive; the price is now plummeting to a level affordable to a significant portion of the population.

The distribution of cheap and free compute devices to the general public continues to grow, and most of these resources sit idle much of the time. Game consoles, cell phones, tablets, laptops, desktops, etc. can now all participate in the storage and processing of data.

Like it or not, capturing and sharing data is becoming increasingly easy. You can watch your favorite gorilla in the jungle, or use collective intelligence to extract social and individual patterns through service APIs provided by large corporations like Google and Amazon. Today’s transient data, arriving from its sources in real time, will eventually be stored. Much of it is, and will be, captured and kept in perpetuity within corporations and in data centers. Some of what should be available may be accessible through the gates of these data centers.

The approach to managing and controlling processing remains focused on huge data centers. In this sense, social and engineering thought is still akin to the 19th-century practice of building monolithic systems with centralized control. As data generation increases and the cost of storage decreases, huge data centers are being built to house and process data by Google, Apple, Codera and NTT America, to name just a few. What will they do with all this data, and how much will be shared?

IBM announced plans to build a petaflop machine for the SKA telescope program. It is a laudable and beneficial effort, and the research and lessons learned from it will undoubtedly be valuable. But efforts should also be made to build distributed systems of equal or greater benefit. Efforts such as BOINC provide a rudimentary but effective start. File-sharing peers using DHTs have already demonstrated power and influence. Both illustrate the cost-effective use of existing distributed compute resources where most data is accessible to everyone.

Distributed computing is in its infancy (I’m not referring to cloud computing). A number of technologies supporting distributed computing have been developed. Some have survived and some have waned. Sophisticated distributed systems are on par in importance with nanotechnology and artificial intelligence, and will support those technologies as well. They have the potential to distribute the energy needed for processing rather than requiring a power plant dedicated to running a data center. They have the potential to distribute data storage so that data is never lost, and to give individuals a means to control their own personal information. They have the potential to provide mechanisms which capture data in real time and process it as needed, where needed, with the most efficient use of resources. In so doing, they would mirror the real world (à la Gelernter’s Mirror Worlds).

So although building data center citadels and powerful HPC computers is valuable, so is developing and building sophisticated distributed computing systems. In fact, it’s likely much more important.
