Wanted: New architectures for IoT and Augmented Reality

Software technology changes rapidly. Many new tools and techniques arrive in the software engineering realm to take on old and new problems, yet big architectural and implementation gaps remain to be addressed. For example, as a few billion more smartphones, tablets and internet-connected sensing devices come online across the world, how are they all going to discover and utilize the available resources collaboratively?

One of the current problems with most existing architectures is that data gets routed through central servers in a data center somewhere. Software systems are typically still built using client/server architectures, and even if an application uses multiple remote sources for data, that is still really just a slight variation. Service and data lookups are done using statically defined addresses rather than through discovery. Even remote sensing and home automation devices barely collaborate locally, requiring a local router to communicate with a remote server in a data center.

In the past month, I have been to both the Internet of Things World and Augmented World Expo (AWE). At both of these excellent conferences, there was at least some discussion about the need for a better infrastructure:

  • to connect devices in a way that makes them more useful through collaboration of resources, and
  • to connect devices to provide the ability to share experiences in real time.

But it was just talk about the need; no one is yet demonstrating functional capabilities along these lines.

On a side note: I saw only one person, out of 3,000 or so, at the AWE conference wearing a device for augmentation. It was Steve Mann, who is considered the father of wearables. I dare say that most proponents of the technology are not yet ready to exhibit it, nor is the infrastructure in place to support its use effectively. There is great work progressing, though.

The peer-to-peer architectures used in file sharing, and the architecture Skype uses, provide directional guidance for what is to come in truly distributed architectures. What is still needed is to enhance these architectures with dynamic discovery of software and hardware resources and with orchestration of dynamic resource utilization.
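To make that concrete, here is a minimal sketch, in Python, of the kind of decentralized lookup these overlays rely on. The peer addresses and the resource key are made up for illustration; the point is that node addresses and resource keys share one hash space, so any peer can work out which node owns a resource without asking a central server, even as peers join and leave.

```python
# A rough sketch of decentralized key/node lookup, in the spirit of the DHT
# overlays used by file-sharing systems. Peer addresses and the resource key
# are made-up examples.
import hashlib
from bisect import bisect_right

def ring_id(value: str) -> int:
    """Map any string (node address or resource key) to a position on the hash ring."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self):
        self.nodes = []  # sorted list of (ring position, node address)

    def join(self, node: str) -> None:
        self.nodes.append((ring_id(node), node))
        self.nodes.sort()

    def leave(self, node: str) -> None:
        self.nodes = [(pos, n) for pos, n in self.nodes if n != node]

    def lookup(self, key: str) -> str:
        """Return the node responsible for `key`: the first node clockwise from it."""
        positions = [pos for pos, _ in self.nodes]
        idx = bisect_right(positions, ring_id(key)) % len(self.nodes)
        return self.nodes[idx][1]

ring = Ring()
for peer in ("peer-a:4000", "peer-b:4000", "peer-c:4000"):
    ring.join(peer)

print(ring.lookup("sensor-feed/backyard"))  # every peer computes the same answer
ring.leave("peer-b:4000")                   # a peer drops off the network...
print(ring.lookup("sensor-feed/backyard"))  # ...ownership shifts, no central server involved
```

Real DHTs such as Chord or Kademlia add per-node routing tables so lookups take a logarithmic number of hops rather than needing the full membership list, but the ownership idea is the same.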

A few efforts in development are beginning to address the internet-wide distributed computing platforms needed for data sharing, augmented reality and machine learning. If you are thinking of batch jobs or of wiring up business services as distributed computing, that is not what I am talking about. I mean a small-footprint software stack able to execute on many different hardware devices, with those devices able to communicate directly with one another.

If you know about development efforts in this vein, I would like to hear about them.


The Importance of Distributed Systems Development

In an age of ever-increasing data collection and the need to evaluate it, building systems that utilize the untapped compute resources in everyone’s homes and hands should be driving the development of more sophisticated distributed computing systems. Large data processing facilities already provide significant compute capability, but utilizing the worldwide plethora of distributed resources in a coherent way is far more powerful.

Distributed programming and processing tools and techniques are a reality today, but they are in their infancy. The potential for rapid growth of distributed systems is already supported by the following:

  • Storage, bandwidth and CPUs stay on course to become nearly free (see Free: The Future of a Radical Price).
  • The number of people and devices connected to the internet continues to grow.
  • Data storage requirements keep increasing as both the volume of accumulated data and the number of sources grow.

It is becoming more common to see terabyte storage devices in homes. Desktops and laptops have become something of a commodity, affordable to the average consumer. You can stake out a claim to a table at your local coffee shop for a while and access the internet for free. Being a social citizen on the internet with portable compute resources was once cost prohibitive; the price has now plummeted to a level affordable to a significant portion of the population.

The distribution of affordable, cheap and even free compute devices to the general public continues to grow, and most of those resources sit idle much of the time. Game consoles, cell phones, tablets, laptops, desktops and more can now all participate in the storage and processing of data.

Like it or not, capturing and sharing data is becoming increasingly easy. You can watch your favorite gorilla in the jungle, or use collective intelligence to extract social and individual patterns with service APIs provided by large corporations like Google and Amazon. Today’s transient, real-time data will eventually be stored. Much of it is, and will be, captured and kept in perpetuity within corporations and data centers. Only some of what should be available may be accessible through the gates of those data centers.

The approach to managing and controlling processing remains focused on huge data centers. In this sense, social and engineering thought is still akin to the 19th-century practice of building monolithic systems with centralized control. As data generation increases and the cost of storage decreases, huge data centers are being built to house and process the data: Google, Apple, Codera and NTT America, to name just a few, are building them. What will they do with all this data, and how much of it will be shared?

IBM announced its plans to build a petaflop machine for the SKA telescope program. It is a laudable and beneficial effort, and undoubtedly the research and lessons learned will be valuable. But efforts should also be made to build distributed systems of equal or greater benefit. Efforts such as BOINC provide a rudimentary but effective start, and file-sharing peers using DHTs have already demonstrated their power and influence. Both illustrate the cost-effective use of existing distributed compute resources where most data is accessible to everyone.

Distributed computing is in its infancy (I’m not referring to cloud computing). A number of technologies supporting distributed computing have been developed; some have survived and some have waned. A sophisticated distributed system is on par in importance with nanotechnology and artificial intelligence, and it will support those technologies as well. It has the potential to distribute the energy needed for processing rather than requiring a power plant dedicated to running a data center. It has the potential to distribute data storage so it is never lost, while providing a means for individuals to control their own personal information. And it has the potential to capture data in real time and process it as needed, where needed, with the most efficient use of resources, in effect mirroring the real world (à la Gelernter’s Mirror Worlds).

So although building data center citadels and powerful HPC machines is valuable, so is developing and building sophisticated distributed computing systems. In fact, the latter is likely much more important.


TED 2009 and Distributed Computing

The 2009 TED conference was this week. This is its 25th year, although it is the first time I have heard of it, thanks mostly to Twitter and those twittering the experience. I didn’t attend, but I followed some of the activities and sessions. It is a gathering and sharing of great minds, their visions, aspirations and creations in both science and art. I hope to be able to attend in person at some point.

With all the talk and demos about technological advances and the need to capture, mine and process the vast amounts of electronic data produced, I’m surprised there was no mention of harnessing the compute power available in phones, desktops, clouds, supercomputers and all the devices everywhere, or of how that might be done. Maybe I missed it (since I wasn’t there), or maybe it wasn’t the proper forum for that kind of discussion, but it seems to me that the topic was glaringly absent.

If anyone knows about such discussions taking place at the conference in sessions or even breakout groups, I am interested in finding out about them.

Moving Away from a Statically Defined Distributed Environment

Most distributed computing environments consist of a set of known devices in well-known configurations. The devices are usually cataloged in a database somewhere by a person. If the state of the environment changes because a server is removed, replaced or taken offline, the database must be updated. I refer to this kind of environment as statically defined, and it is inflexible.

This is a feasible way of accounting for a small number of machines in a network. However, as data centers grow to hundreds or thousands of servers, and as computing devices of all kinds come and go at random, keeping track of changes must become dynamic.
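As a sketch of the difference in spirit, compare a hand-maintained host table with a registry that devices announce themselves to and that forgets them when the announcements stop. The registry class, the TTL value and the addresses below are illustrative assumptions, not a description of any particular product.

```python
# A toy registry where devices announce themselves and are dropped when their
# announcements stop. The TTL and addresses are arbitrary illustrative values.
import time

# The statically defined approach: a table someone has to edit by hand.
STATIC_HOSTS = {"db01": "10.0.0.5", "app01": "10.0.0.6"}

class DynamicRegistry:
    TTL = 30.0  # seconds a device stays listed after its last heartbeat

    def __init__(self):
        self._last_seen = {}  # address -> timestamp of last announcement

    def announce(self, address: str) -> None:
        """Called periodically by each device; no human bookkeeping required."""
        self._last_seen[address] = time.time()

    def live_hosts(self) -> list:
        """Membership is whatever has been heard from recently."""
        cutoff = time.time() - self.TTL
        return [addr for addr, seen in self._last_seen.items() if seen >= cutoff]

registry = DynamicRegistry()
registry.announce("10.0.0.5")   # the old server checks in
registry.announce("10.0.0.23")  # a tablet nobody cataloged shows up
print(registry.live_hosts())    # reflects reality, not a database row
```

The point is that membership is derived from what the devices themselves report, so nothing breaks when a device disappears without anyone updating a database.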

Software architectures are typically built on the assumption that a desired server will always be reachable at a known location. To handle the worst case, it is typical to configure a cluster of servers so that if one fails the rest of the software can ignore the failure. This only marginally protects against failure. More realistic designs must account for the fact that almost anything can fail at any given moment. The architect’s philosophy should be that all things will fail and all failures should be handled.
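Here is a minimal sketch of that philosophy in code, assuming nothing more than a list of replicas and a fallible call to each: every request is allowed to fail, and the caller only gives up after every known replica has been tried. The fetch function and replica names below are stand-ins for whatever transport a real system would use.

```python
# A stand-in for a network call; here the first two replicas happen to be down.
def fetch(replica: str, key: str) -> str:
    if replica in ("peer-a", "peer-b"):
        raise ConnectionError(f"{replica} unreachable")
    return f"value-of-{key}@{replica}"

def resilient_get(replicas: list, key: str) -> str:
    """Treat every call as fallible; give up only when every replica has failed."""
    errors = []
    for replica in replicas:
        try:
            return fetch(replica, key)
        except ConnectionError as exc:  # expected, not exceptional: note it and move on
            errors.append(str(exc))
    raise RuntimeError(f"all replicas failed for {key}: {errors}")

print(resilient_get(["peer-a", "peer-b", "peer-c"], "config/home-hub"))
```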

Consider an architecture for a system that is hard to shut down, rather than one that merely handles a few failure scenarios. One example is the peer-to-peer (p2p) file-sharing systems running across the internet. From the perspective of any client, the system is always running and available; as long as the client has access to the internet, accessing shared files is almost always possible.

Core to the p2p architecture is a network overlay using distributed hash table (DHT) algorithms to manage mappings across internet hosts that dynamically join and leave. Add to this

  • a mechanism to determine the attributes of each host, such as hardware, OS, storage capacity and so on,
  • software deployment and installation capabilities at each host,
  • an algorithm to match each service to the host best suited to executing it, and
  • monitoring capabilities to ensure services are executing within defined SLAs,

and you have an architecture that dynamically scales and maintains itself. Assimilator is one of the few systems capable of doing this today. A sketch of the matching step follows.
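The sketch assumes a simple attribute model: each host advertises what it has, each service declares what it needs, and the service goes to the best-scoring eligible host. The attribute names and the scoring rule are illustrative assumptions on my part; a real platform such as Assimilator has a far richer model.

```python
# Hosts advertise attributes; services declare requirements; the service is
# placed on the best-scoring eligible host. Attribute names and the scoring
# rule are illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Host:
    address: str
    os: str
    cpus: int
    free_storage_gb: int

@dataclass
class ServiceSpec:
    name: str
    needs_os: str
    min_cpus: int
    min_storage_gb: int

def place(service: ServiceSpec, hosts: list) -> Optional[Host]:
    """Pick the eligible host with the most headroom, or None if nothing fits."""
    eligible = [h for h in hosts
                if h.os == service.needs_os
                and h.cpus >= service.min_cpus
                and h.free_storage_gb >= service.min_storage_gb]
    if not eligible:
        return None
    return max(eligible, key=lambda h: (h.cpus, h.free_storage_gb))

hosts = [Host("10.0.0.5", "linux", 2, 100),
         Host("10.0.0.9", "linux", 8, 500),
         Host("10.0.0.12", "windows", 16, 250)]
spec = ServiceSpec("image-indexer", needs_os="linux", min_cpus=4, min_storage_gb=50)
print(place(spec, hosts).address)  # -> 10.0.0.9
```

Monitoring against SLAs then closes the loop: when a host drops out or a service falls behind, the same matching step simply runs again.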

Some cloud computing vendors claim massive scaling capabilities. This of course assumes the vendor has many thousands of servers and that clients have statically defined their usage of those servers in advance. True massive scaling will come with resources that are allocated automatically and managed dynamically, without human intervention.


Corporate Lock-in vs. Open Clouds

Lately there has been publicity about how major corporate cloud computing offerings are really just a play to lock you into vendor-specific solutions while the vendors collect information about you and your customers.

Richard Stallman says cloud computing is ‘stupidity’ that ultimately will result in vendor lock-in and escalating costs.

Oracle’s Larry Ellison says cloud computing is basically more of what we already do. I think he is saying Oracle will continue business as usual, jump on the bandwagon and use the term.

Tim Bray blogged about cloud computing vendor lock-in, defining freedom from it as: “deploying my app on Vendor X’s platform, there have to be other vendors Y and Z such that I can pull my app and its data off X and it’ll all run with minimal tweaks on either Y or Z.”

Even Steve Ballmer seems to be against cloud computing, claiming that consumers don’t want it. I’m not sure I understand his argument, other than that it essentially requires some proprietary software to run in the context of someone’s cloud.

Tim O’Reilly wrote an excellent blog post on Open Source and Cloud Computing. He provides this bit of laudable advice: “if you care about open source for the cloud, build on services that are designed to be federated rather than centralized. Architecture trumps licensing any time.”

While some of the paranoia about being locked in and vulnerable to a corporation is warranted, there is also an undercurrent of revulsion toward the marketing. This stems from the fact that the term ‘cloud computing’ has already achieved a high silliness factor through its use to brand everything (à la 2.0). Also, this computing model is not yet sorted out and should evolve into something better with input and guidance from those who are technically savvy.

A well-constructed architecture for a distributed execution platform will provide a truly open and scalable solution for clouds and for distributed computing in general. By distributed execution platform I mean primarily a platform which can, among other things:

  • dynamically discover resources on a network,
  • provision software services dynamically, where execution is most efficient,
  • manage services as needs dynamically change, and
  • detect failures and automatically reconfigure itself to accommodate them.

A non-proprietary platform with these capabilities must be architected to execute in data centers, across individual computers connected to the internet, and out to the edge. UIs access the remote services and data independently of where they may be running, much like browsers accessing web sites. Distributed data is accessible across peers in a p2p model and available to all services, accessed either through common APIs or through proprietary interfaces shared by collections of collaborating services.
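To make the shape of such a platform concrete, here is a sketch of the interface those four capabilities imply. This is not any vendor’s API; the class and method names are invented for illustration.

```python
# An invented interface meant only to make the four capabilities concrete.
from abc import ABC, abstractmethod

class DistributedExecutionPlatform(ABC):
    @abstractmethod
    def discover(self, service_type: str) -> list:
        """Dynamically find endpoints on the network offering a given type of service."""

    @abstractmethod
    def provision(self, service_image: str, placement_hint: dict) -> str:
        """Deploy a service where execution is most efficient; return its endpoint."""

    @abstractmethod
    def rebalance(self) -> None:
        """Move or scale services as needs and available resources change."""

    @abstractmethod
    def on_failure(self, endpoint: str) -> None:
        """React to a detected failure by reconfiguring around the lost resource."""

# A client written against such an interface would not care whether the
# services it uses run in a data center, on a neighbor's desktop, or at the edge:
#
#   storage_endpoints = platform.discover("storage")
```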

The payoff for using services executing in an open distributed execution platform will be for:

  • small companies needing to exist on strict budgets,
  • individual developers looking to create the next killer application and
  • large corporations who run virtualized services (for free or for a fee) in their own data centers.

The above characteristics enable large corporations and individuals to compete basically on the same playing field.

Note: I was hoping to coin the phrase ‘distributed execution platform’, but it has been used periodically elsewhere, most commonly at the moment by the Dryad project. Maybe it will become the next overused buzzword.

Open Source and Cloud Computing

These are some comments I have on Tim O’Reilly’s insightful post about open source and cloud computing.

There are interesting thoughts in the post about clouds becoming monolithic and about how control of data by a few privileged companies will drive the development of services which access and manipulate our information. Entry into the market by smaller organizations with new and better ideas becomes more difficult. This is all probably true. One of the main contributing factors is that we all let it happen. Most people are not technical and are primarily concerned with an application performing some function adequately for their needs. If it happens to be a service built and hosted by a monopoly, most people don’t care. At least not until they grow weary of the application, perceive there may be better alternatives and then want them. So the evolution of monopolies with monolithic systems arises from organizations pushing their services for profit (which is fine) and from the majority of service users being concerned only with their own satisfaction. Open source, open APIs and standards don’t solve this problem.

Open source does make it easier for those who are technically savvy to build new software systems and services. It doesn’t solve the issue of being able to easily publish services for wide usage, and it doesn’t solve the problem of having access to the network, server and storage resources those services may need to use. If you have a choice of services that represent these resources, you begin to solve the problem; if those services can be discovered dynamically rather than referenced at static locations, the solution is better still. The process becomes (a small sketch follows the list):

  1. I decide I want a type of service (maybe storage),
  2. I look up what my choices might be,
  3. I discover which ones are currently available, and
  4. I select one.
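Here it is as code against a toy in-memory registry; the registry entries, the “storage” service type and the selection rule are assumptions for illustration, not an existing API.

```python
# A toy in-memory registry; the entries and the "storage" service type are
# invented for illustration.
from dataclasses import dataclass

@dataclass
class Offer:
    endpoint: str
    service_type: str
    reachable: bool
    free_gb: int

REGISTRY = [
    Offer("peer-a:7000", "storage", reachable=True, free_gb=40),
    Offer("peer-b:7000", "storage", reachable=False, free_gb=200),
    Offer("peer-c:7000", "compute", reachable=True, free_gb=10),
]

def choose(service_type: str, min_free_gb: int):
    candidates = [o for o in REGISTRY if o.service_type == service_type]  # 2. my choices
    available = [o for o in candidates if o.reachable]                    # 3. which are up right now
    suitable = [o for o in available if o.free_gb >= min_free_gb]
    return max(suitable, key=lambda o: o.free_gb, default=None)           # 4. select one

print(choose("storage", min_free_gb=20).endpoint)  # 1. I decided I want storage -> peer-a:7000
```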

Standards don’t necessarily help out either. Many of the existing protocols are sufficient for communication and data transfer, and standard APIs satisfy groups of service providers that may share resources and software. But if everyone uses the same standard, doesn’t it become monolithic and antiquated once it no longer serves the needs of, or provides access to, newly emerging technologies? Having multiple standards and options is usually a better alternative. I wish I could credit the original author of this quote, which has been around for at least 15 years: “The great thing about standards is that there are so many to choose from.”

So the answer to keeping monolithic organizations from squeezing out small companies’ new ideas does not lie in open source and standards alone (although open source is beneficial). The answer lies in creating a platform which executes on compute resources across the internet and allows anyone, among other things, to:

  • look up desired services,
  • identify whether they have the desired capabilities,
  • discover where those services may be available, and
  • select the desired services for use.

By services I mean software that represents a set of capabilities, implemented as:

  • a standalone software component,
  • a component working in conjunction with other components or services, or
  • a software component utilizing hardware resources such as CPU, storage and network bandwidth,

with the services dynamically hosted wherever they can run most efficiently.

The answer lies in allowing everyone the opportunity to create and publish services for use on a platform accessible by everyone. Think of it as a layer on top of the existing internet: a network overlay within which everyone has access to services. There would be a large collection to choose from, with an always-changing selection. This is analogous to the selection of services we choose from in our everyday lives for food, auto repair, home services and so on. The answer doesn’t lie in enforcing open source and standards; it lies in creating an open execution platform enabling all to create and provide services.
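As a companion to the lookup sketch above, here is what the publish side of such an overlay might look like: anyone describes a service and announces it to whatever registry the overlay provides. The descriptor fields and the announce call are assumptions, not an existing protocol.

```python
# An invented service descriptor and a hypothetical announce call; no real
# overlay protocol is being described here.
import json
import time

def make_descriptor(name: str, capabilities: list, endpoint: str) -> str:
    """Bundle what the service offers and where it answers into one announcement."""
    return json.dumps({
        "name": name,
        "capabilities": capabilities,  # components, storage, bandwidth offered, etc.
        "endpoint": endpoint,
        "published_at": time.time(),
    })

descriptor = make_descriptor(
    name="photo-archive",
    capabilities=["storage", "thumbnailing"],
    endpoint="peer-d:7000",
)
# overlay.announce(descriptor)  # hypothetical call into the overlay's registry
print(descriptor)
```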

World Community Grid

Tackling the world’s problems with technology: at least that’s the goal of the World Community Grid. The organization uses BOINC as the infrastructure to run software on your computer, using idle CPU time for processing power. Remember running SETI@home on your computers to process chunks of data looking for radio signals? Same technology. So there is not much new here, but BOINC is still going strong, and the World Community Grid is a worthy cause and a simple model for utilizing idle compute resources in a worldwide grid. Among its active projects, the latest, launched in May 2008, aims to determine the structure of proteins in rice. With this knowledge, better strains of rice can be developed with higher yields, stronger resistance to disease and pests, and higher nutritional value, hopefully relieving some of the pains of world hunger.