Most distributed computing environments consist of a set of known devices in well-known configurations. The devices are usually cataloged in a database by a person. If the state of the environment changes because a server is removed, replaced, or taken offline, the database must be updated by hand. I refer to this kind of environment as statically defined and inflexible.
This is a feasible way of accounting for a small number of machines in a network. However, as data centers grow to hundreds or thousands of servers, and as computing devices of all kinds come and go unpredictably, keeping track of changes must become dynamic.
Software architectures typically assume that a desired server will always be reachable at a known location. To handle the worst case, it is common to configure clusters of servers so that if one fails, the architect can otherwise ignore failures. This only marginally protects against failure. More realistic designs must account for the fact that almost anything can fail at any given moment. The architect's philosophy should be that everything will fail and every failure should be handled.
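One way to put that philosophy into practice is to treat every remote call as fallible and make the failure handling explicit. The sketch below is a minimal, hypothetical illustration (the function and its parameters are not from any particular system): it retries a failing operation with exponential backoff and jitter, and surfaces the failure to the caller only after all attempts are exhausted.

```python
import random
import time

def call_with_retries(operation, attempts=3, base_delay=0.1):
    """Invoke a fallible operation, retrying with exponential backoff.

    Treats every call as one that can fail; the caller decides what to do
    when all attempts are exhausted (the final exception is re-raised).
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Jittered backoff so retries from many clients don't synchronize.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Example: a simulated server call that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("server unavailable")
    return "ok"

print(call_with_retries(flaky))  # prints "ok" after two retried failures
```

The key design point is that failure is part of the function's contract, not an afterthought: the caller always knows it may receive an exception and must decide how to degrade.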
Consider an architecture designed to be hard to shut down, rather than one that merely handles a few failure scenarios. The peer-to-peer (p2p) file sharing systems executing across the internet are one such architecture. From the perspective of any client, the system is always running and available. As long as the client has access to the internet, accessing shared files is almost always possible.
Core to p2p architecture is a network overlay that uses distributed hash table (DHT) algorithms to manage mappings across hosts that dynamically join and leave the internet. Add to this:
- a mechanism to determine the attributes of each host, such as hardware, OS, and storage capacity,
- software deployment and installation capabilities at each host,
- an algorithm to match each service to the host best suited to execute it, and
- monitoring capabilities to ensure services are executing within defined SLAs.
Then you have an architecture that scales and maintains itself dynamically. Assimilator is one of the few systems capable of doing this today.
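The DHT overlay idea can be sketched with consistent hashing, the technique underlying overlays such as Chord. The toy ring below is illustrative only, not how any particular p2p system implements it: hosts join and leave dynamically, each key is owned by the first host clockwise from the key's hash, and when a host leaves only its keys move to a surviving host.

```python
import bisect
import hashlib

class HashRing:
    """A minimal consistent-hashing ring, the core idea behind DHT overlays.

    Each host is placed at several pseudo-random positions (virtual nodes)
    on a circular hash space; a key is owned by the first host at or after
    the key's position. Membership changes move only a fraction of keys.
    """

    def __init__(self, replicas=64):
        self.replicas = replicas   # virtual nodes per host, for load balance
        self._ring = []            # sorted list of (position, host)

    def _hash(self, value):
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def join(self, host):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{host}#{i}"), host))

    def leave(self, host):
        self._ring = [(p, h) for p, h in self._ring if h != host]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("no hosts in the ring")
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the circle

ring = HashRing()
for host in ("host-a", "host-b", "host-c"):
    ring.join(host)
owner = ring.lookup("shared-file.iso")
ring.leave(owner)                      # the owning host goes offline...
print(ring.lookup("shared-file.iso"))  # ...and the key maps to a survivor
```

Production DHTs (Chord, Kademlia) add routing tables so each host knows only a logarithmic number of peers, but the ownership rule is the same.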
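The matching step in the list above, pairing a service with the host best suited to execute it, can be sketched as a constraint filter plus a ranking. The host attributes, requirement fields, and scoring rule here are all hypothetical; real provisioning systems (the text cites Assimilator) use far richer matching, but the shape is similar.

```python
def best_host(hosts, requirements):
    """Pick the host best suited to run a service (illustrative matcher).

    `hosts` maps host name -> discovered attributes (os, cpus, free_gb);
    `requirements` holds hard constraints. Hosts failing any constraint
    are excluded; among the rest, the one with the most free storage wins.
    Returns None when no host qualifies.
    """
    candidates = [
        (name, attrs) for name, attrs in hosts.items()
        if attrs["os"] == requirements["os"]
        and attrs["cpus"] >= requirements["min_cpus"]
        and attrs["free_gb"] >= requirements["min_free_gb"]
    ]
    if not candidates:
        return None
    # Prefer the qualifying host with the most headroom.
    return max(candidates, key=lambda c: c[1]["free_gb"])[0]

hosts = {
    "h1": {"os": "linux",   "cpus": 4,  "free_gb": 50},
    "h2": {"os": "linux",   "cpus": 16, "free_gb": 500},
    "h3": {"os": "windows", "cpus": 32, "free_gb": 800},
}
need = {"os": "linux", "min_cpus": 8, "min_free_gb": 100}
print(best_host(hosts, need))  # prints "h2": the only linux host with capacity
```

In a self-maintaining architecture this matcher would run continuously against live host attributes, so a service displaced by a failed host is simply re-matched to the next best survivor.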
Some cloud computing vendors claim massive scaling capabilities. This, of course, assumes the vendor has many thousands of servers and that clients have statically defined their server usage in advance. True massive scaling will come when resources are allocated automatically and managed dynamically, without human intervention.