Network Overlays

A network overlay is a network built on top of another network. It uses the underlying network as support infrastructure without changing it and defines its own protocols for communication between nodes. This adds capabilities to the underlying network. Many peer-to-peer networks are built this way.

The nodes in a peer-to-peer network are defined logically, change dynamically, and have their own protocols for discovering nodes and transferring data, all carried over internet protocols. They are overlays on the internet.

Overlays create the potential for new and disruptive network and service architectures. They are easily deployed on existing host machines connected to the internet. They can provide a mechanism for services to migrate across the internet, perform dynamic discovery of remote services and adapt to handle failures and distributed load. These capabilities could be available across any and all computers connected to the internet.

There are a handful of network overlay software packages available as open source today. They provide features beyond what the internet offers. Some of the packages include:

  • RON - to improve the availability and reliability of data packet routing
  • Chord - used to build scalable distributed peer-to-peer systems
  • Bamboo - implementation of a DHT algorithm for use in peer-to-peer architectures

If the following can be combined:

  • a good DHT algorithm to manage mappings of hosts across the internet that dynamically join and leave
  • software that discovers and hosts services at each node in the network

Together, these would create a very powerful implementation of an architecture supporting many different kinds of dynamic services and social networks that execute not just in the isolation of data centers but across any and all computers wishing to join the network.
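
As a rough illustration of the first ingredient, here is a minimal sketch of consistent hashing, the idea underlying DHTs such as Chord. It is not the Chord or Bamboo implementation: the class name, the host names and the 32-bit ring size are assumptions made for the example, and a real DHT spreads this table across the nodes rather than keeping it in one process.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map any string onto a 32-bit identifier ring (ring size is an assumption)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Toy consistent-hash ring: a key belongs to the first node at or after it."""

    def __init__(self):
        self._positions = []  # sorted ring positions of the live nodes
        self._nodes = {}      # ring position -> node name

    def join(self, node: str) -> None:
        pos = ring_position(node)
        # A node whose position collides with an existing one is ignored here;
        # a real system would have to resolve the collision.
        if pos not in self._nodes:
            bisect.insort(self._positions, pos)
            self._nodes[pos] = node

    def leave(self, node: str) -> None:
        pos = ring_position(node)
        if self._nodes.get(pos) == node:
            self._positions.remove(pos)
            del self._nodes[pos]

    def lookup(self, key: str) -> str:
        """Return the node currently responsible for the key."""
        if not self._positions:
            raise LookupError("no nodes in the ring")
        i = bisect.bisect_left(self._positions, ring_position(key))
        # Wrap around to the first node when the key falls past the last one.
        return self._nodes[self._positions[i % len(self._positions)]]
```

The useful property is that when a host leaves, only the keys it was responsible for move to another host. Every other mapping stays where it was, which is what makes frequent joins and leaves tolerable.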

Cloud Computing Thoughts

Cloud computing is becoming one of the next industry buzzwords. It joins the ranks of terms including grid computing, utility computing, virtualization and clustering. Cloud computing overlaps some of the concepts of distributed, grid and utility computing; however, it does have its own meaning if used correctly in context. The conceptual overlap is partly due to changes in technology, usage and implementations over the years. Trends in usage of the terms in Google searches show that cloud computing is a relatively new term introduced in the past year. There has also been a decline in general interest in grid, utility and distributed computing. They will likely remain in use for quite a while to come. But cloud computing has become the new buzzword, driven largely by marketing and service offerings from big corporate players like Google, IBM and Amazon.

[Figure: Google search trends for the terms distributed computing, grid computing, utility computing and cloud computing]

The term cloud computing probably comes (at least partly) from the use of a cloud image to represent the Internet or some large networked environment. We don’t care much what’s in the cloud or what goes on there except that we depend on reliably sending data to and receiving data from it. Cloud computing is now associated with a higher-level abstraction of the cloud. Instead of data pipes, routers and servers, there are now services. The underlying networking hardware and software is of course still there, but higher-level service capabilities are now available for building applications. Behind the services are data and compute resources. A user of the service doesn’t necessarily care how it is implemented, what technologies are used or how it’s managed, only that it is accessible and reliable enough to meet the application requirements.

In essence this is distributed computing. An application is built using resources from multiple services, potentially in multiple locations. At this point, you typically still need to know the endpoint to access a service rather than having the cloud provide you with available resources. This is also known as Software as a Service. Behind the service interface is usually a grid of computers that provides the resources. The grid is typically hosted by one company and consists of a homogeneous environment of hardware and software, making it easier to support and maintain. (Note: my definition of a grid is different from the Wikipedia definition, but homogeneous environments in data centers are typically what I have run across.) Once you start paying for the services and the resources utilized, well, that’s utility computing.

Cloud computing is really about accessing the resources and services needed to perform functions whose needs change dynamically. An application or service developer requests access from the cloud rather than from a specific endpoint or named resource. The cloud manages multiple infrastructures across multiple organizations and consists of one or more frameworks overlaid on those infrastructures, tying them together. The frameworks provide mechanisms for:

  • self-healing
  • self-monitoring
  • resource registration and discovery
  • service level agreement definitions
  • automatic reconfiguration
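
To make resource registration, discovery and a simple form of self-healing concrete, here is a minimal sketch of a lease-based registry. It is an illustration only, not the API of Assimilator or Jini: the class, the method names and the endpoint strings are hypothetical. A provider registers with a lease and must renew it periodically. A provider that crashes simply stops renewing and drops out of discovery results without anyone having to remove it.

```python
import time


class ServiceRegistry:
    """Toy lookup service: registrations expire unless the provider renews them."""

    def __init__(self, lease_seconds: float = 30.0, clock=time.monotonic):
        self._lease_seconds = lease_seconds
        self._clock = clock   # injectable so the behavior can be tested without waiting
        self._entries = {}    # (service type, endpoint) -> lease expiry time

    def register(self, service_type: str, endpoint: str) -> None:
        """Register or renew a lease; providers call this periodically as a heartbeat."""
        self._entries[(service_type, endpoint)] = self._clock() + self._lease_seconds

    def discover(self, service_type: str) -> list:
        """Return the endpoints of the given type whose lease is still valid."""
        now = self._clock()
        # Forget every registration whose lease has run out.
        self._entries = {k: exp for k, exp in self._entries.items() if exp > now}
        return sorted(ep for (stype, ep) in self._entries if stype == service_type)
```

A client asks for a service type ("storage") rather than a named endpoint, which is the distinction drawn above between requesting from the cloud and requesting from a specific resource.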

The cloud is a virtualization of resources that maintains and manages itself. There are of course still people who keep hardware, operating systems and networking in proper order. But from the perspective of a user or application developer only the cloud is referenced. The Assimilator project is a framework that executes across a heterogeneous environment in a local area network providing a local cloud environment. In the works is the addition of a network overlay to start providing an infrastructure across the Internet to help achieve the goal of true cloud computing.

Assimilator 1.1 Release

I just released Assimilator 1.1 on SourceForge. This release contains:

  • corrected cybenode.sh script to properly accept parameter designating codebase location
  • JMX interface added to Monitor, Cybernode and Webster core services
  • added drag and drop of oar files onto console service graph to deploy services
  • corrected oar deployment code to handle operational strings with Include tags allowing deployment of dependent operational strings.
  • removed old viewer and opmon utility code, documentation and build file references.
  • changed helloworld example files and documentation to consistently use steps 1-6.
  • new webster using MINA 1.1.4 (JDK 5 version) and integrated it
  • moved jini ServiceStarter class to assimilator and modified it to add loggers from top level config file (we had a really hard time getting jini loggers to turn on properly before and this should fix it), plus quite a bit of exception refactoring and other general refactoring
  • moved all the script and config files to use the new service starter and new webster.

The new distribution file can be downloaded here.

In the works is a network overlay for Assimilator so that distributed services can be managed over the internet.

Passing Objects Around Networks

With the advent of XML has come, in some cases, its overuse for transferring information between endpoints in a distributed networked environment. It’s true that XML and JavaScript Object Notation (JSON) are more interoperable, allowing multiple languages to be used for application development between client and server. They are also more human readable, which mostly doesn’t matter except for simplifying debugging during development. I just don’t quite understand why the developer community chose primarily to adopt XML over transferring the bytes of serializable objects without having to transform them into readable characters.

Binary is more efficient for serializable objects. Less transformation is required at the endpoints before and after transferring data. Less code and fewer libraries are required to develop and deploy the distributed components of an application. Maintenance of the code is typically easier. And more sophistication can be applied in passing serializable objects as method parameters, return values and exception handling information. Java RMI, for example, with its use of stubs and skeletons, provides a mechanism to invoke methods on remote objects, which is truly more powerful than making HTTP requests with XML data.
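
The size difference is easy to see with a small experiment. The sketch below encodes the same record once with a fixed binary layout and once as XML. The record and its field names are made up for the example, and it uses Python's struct module rather than Java serialization, so treat it as an illustration of the principle only. General-purpose object serializers add their own metadata, and the actual ratio depends heavily on the data.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record: an integer id and two floating point values.
reading = {"id": 42, "x": 1.5, "y": -2.25}


def to_binary(r: dict) -> bytes:
    """Fixed layout: little-endian 32-bit int followed by two 64-bit floats."""
    return struct.pack("<idd", r["id"], r["x"], r["y"])


def from_binary(data: bytes) -> dict:
    rid, x, y = struct.unpack("<idd", data)
    return {"id": rid, "x": x, "y": y}


def to_xml(r: dict) -> bytes:
    """Same record as an XML element with one child element per field."""
    root = ET.Element("reading")
    for name, value in r.items():
        ET.SubElement(root, name).text = repr(value)
    return ET.tostring(root)


def from_xml(data: bytes) -> dict:
    # Every value arrives as text and has to be parsed back into a number.
    root = ET.fromstring(data)
    return {
        "id": int(root.findtext("id")),
        "x": float(root.findtext("x")),
        "y": float(root.findtext("y")),
    }
```

The binary form of this record is 20 bytes (4 + 8 + 8), and the XML form is longer because every field carries its own opening and closing tag. The price of the binary form is that both ends must agree on the layout in advance, which is the interoperability trade-off mentioned above.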

I suppose it’s likely that XML is just easier to use for now. The tools and IDE plugins were easier to develop and arrived faster than any tools to support building applications with remoting capabilities. HTTP and XML are just easier to deal with than configuring EJBs, although EJB3 components are now much easier to deal with since the wiring is largely defined using annotations.

From the perspective of distributed computing being web servers scattered everywhere, it’s likely XML and RESTful style web services will continue to dominate. True next generation distributed computing capabilities where all computers are peers will require greater sophistication to handle mobile code, mobile services, service discovery and the transfer of data between the services. It’s not going to be HTTP and XML!

Crummy Scrum Development Process

Having been on a number of different projects attempting to use Scrum in multiple organizations, I have noticed recurrent patterns of it being applied incorrectly. The decision to use Scrum (or in some cases force its use) is well intentioned. But when the rubber meets the road and a project gets underway, teams typically find out Scrum is more difficult than originally thought. This leads to skipping parts of the methodology and process or trying to shortcut them. The result is that the team does not reap the benefits Scrum has to offer, which gives team members the impression that Scrum doesn’t work. Here are some of the behaviors I have observed.

Skipping the Planning Session
I have seen this happen because the stakeholders, project managers and architects/engineers can’t or don’t want to all meet up to do the planning. I have also seen this happen when the primary project driver or stakeholder refuses to do anything more than throw a bulleted list of ‘things’ desired at the development team with a completion date.

This results in an incorrect or incomplete set of backlog items for the team to deal with. Of course, this increases risk and the probability of failure.

To help alleviate this problem, a surrogate should be appointed for those key stakeholders and team members who can’t or won’t participate. The team can also rally and not start any further work until a planning session is held, making it clear that the non-participants are the roadblock.

Missing the Point of the Scrum
It is very common for team members to chat way beyond necessity during the scrum meeting. People just like to chat. It’s also common for members to talk about things other than their assigned tasks. I have seen scrums complete in the allotted 15 minute time box where none of the defined tasks were truly discussed.

This behavior unequivocally leads to a project out of control, and probably back to the typical chaotic behavior that was being practiced before Scrum was mandated. There are a number of causes, including the Scrum Master not enforcing proper behavior, probably due to a lack of training and experience with Scrum. It is also caused by tasks not being properly defined. If a developer is discussing working on things other than assigned tasks, that person is either goofing off (probably not the case) or has identified another set of tasks which need to be captured. This probably means the backlog items weren’t defined correctly (see Skipping the Planning Session) or the project development has gone way off track.

It helps to have some training or an experienced person lead or help run the Scrum process. It is difficult to learn from scratch and stumble over all the typical mistakes. It also helps to keep the sprints short, adding only a few backlog items with a small number of tasks: 2 weeks max for the sprint, 4-8 hours max for each task.

Task Lengths Way Too Long
Defined tasks with a length greater than 12 hours mean either you don’t have a good understanding of how to break down the task or you’re padding the time you think it will take so there is no pressure to complete it. Probably the former. I have seen task length definitions of 40+ hours.

Not taking the time to plan out the tasks and not making an effort to scope them properly does not help the project and defeats the purpose of using Scrum. Part of Scrum is learning how to estimate properly and learning the capabilities of the team. This knowledge helps in future planning sessions.

Assuming Nothing Changes
Change happens on virtually all projects whether or not you use Scrum. The fallacy is that once the project plan is in place nothing will change. Most management teams put in place a fixed set of defined product features, to be completed by a fixed date with a fixed set of resources. I have never seen a plan not waver while in progress.

Scrum and Agile methodologies are specifically designed to help address changes during development. It’s OK to set long-range goals, but define short sprints and spend time planning them. Feed the results of each sprint, and the changes that happened during it, back into planning, and expect that anything may change mid-flight during development. You will be able to adjust better if you have flexible development cycles.

There are other wrong Scrum practices; these are the ones I have seen recur most frequently. If you plan on using Scrum, take the time to learn it first and decide if it’s appropriate for your organization. If you use it, make a concerted effort to follow the methodology correctly. It will be beneficial!

http://www.mountaingoatsoftware.com/scrum
http://en.wikipedia.org/wiki/Scrum_(development)
http://www.codeproject.com/KB/architecture/scrum.aspx
http://www.danube.com/scrumworks/basic

Assimilator Introduction

Just a quick introduction to the Assimilator project that I have been working on. I will discuss it to some extent here as the project develops. Assimilator is an open source project hosted at SourceForge. It represents the next step in the evolutionary development of distributed computing. It provides a platform for distributed application development which includes the key capabilities required to execute distributed applications. Applications developed on this platform do not have to handle the difficult aspects of distributed computing themselves. The platform enables applications to be built as a collection of services that can be executed on a network where resources are available and may dynamically change. Service lookup and discovery, distributed events and transactions, service provisioning and deployment, and service management are all provided by the platform.

The open source environment allows free usage of the platform and encourages feedback from users on how to enhance features, extend capabilities and eradicate defects. Open source allows greater distribution of the platform hopefully leading to wider adoption of its use. Custom applications built with the platform will demonstrate platform capabilities and provide a means for individuals and organizations to generate revenue.

The primary goals of the project include:

  • Develop a distributed computing platform capable of dynamically managing and monitoring services in a network.
  • Develop the platform in and for the open source community to drive adoption of its use.
  • Develop the platform with the intent that it can be used for easy development of distributed applications.

“Network” here includes the internet and any device connected to it. The current version executes in WANs and work is underway to get it working across the internet, at which point it becomes feasible to start building social network applications and gaming environments that utilize available resources across peered devices.

NEPOMUK

NEPOMUK, or Networked Environment for Personal Ontology-based Management of Unified Knowledge, is a project I ran across that appears to have goals similar to mine: building a self-organizing community of information from the perspective of the individual and their devices (or tools). In particular its goals include:

  • supporting the knowledge life cycle, including articulation, organization, sharing and exchange of information;
  • the management of all relevant desired information through linking, browsing and tagging;
  • knowledge communication across distributed social networked communities with capabilities for distributed search, storage and massive scaling.

It is an open source framework that accommodates plug-ins, so that those who wish to develop this kind of social networking capability can contribute. I have similar goals for Assimilator, which is a platform for managing distributed services: services that can migrate, work in groups with dependencies and can be used to build knowledge management tools for social networks.

Achieving the goal of allowing individuals to manage their own information in their own context using their own personal tools will require a lightweight framework that accommodates and manages services or small components which collaborate in a dynamically changing environment distributed across the internet. I believe NEPOMUK is on the right track. I hope I am as well.

A Different Social Networking Paradigm

Wouldn’t it be nice if you had control of all your social networking information in one location? Forget about keeping track of how many social network sites you have signed up with. Stop trying to remember which friends are using which networking site. Ignore having to cross link and make duplicate uploads. Wouldn’t it be nice to just consolidate all the information you want to share into topics and groupings the way you want to present it and only have to maintain it in one place?

It could be a location on your own personal laptop or one location hosted by your favorite ISP. It would be your collection of information, relevant to you, organized according to your preferences. Everyone already does this with their belongings. Why should I have to place my things someplace outside of my control just to share them, when I can define a link to the definitive resource?

Having your belongings scattered about across many websites is cumbersome and difficult to maintain. It’s much easier to have it generally located in the same place. I have all my stuff where I want it, you have yours where you want it.

Now I can share some of my stuff. I share only what I want to. The rest is private. Essentially each individual defines links to the information they want to share with others. Each individual presents a point of view of who they are. Nothing new here, we all do this in real life already. But remember I’m talking about information on my laptop.

Next, friends share information with each other about where their stuff is accessible. It would be a portal to their information. Now friends can see the presentation of other friends. They can see each other’s point of view or perspective. Each person’s collection of information is still controlled locally but accessible between each other.

There will likely be common interests between friends. They could start a topic of interest. A set of information of similar interest. Essentially it is a collection of links back to the original information. Friends discover there is information about others being shared. Those friends can establish a group. They share similar interests, similar topics of interest.

So now individuals share their point of view, create topics of interest which others can contribute to and they can create, join or leave groups. Others may discover topics of interest and subsequently join groups based on similar interests. This starts to sound more like a real social network which individuals create from their perspective, all from their own location, not across many different web sites. There are no accounts, no sign-up, no limitation on how information should be placed, categorized or presented.

The concept of groupings can then be extended to collections of groups. These would be communities. Groups would join and leave communities. Individuals would come and go as desired from the groups. It is very dynamic. The thing that always remains is the presentation of an individual’s point of view and all the corresponding information that belongs to it.

All the information comes from individuals, just as the information posted on the internet today does: audio, video, photos, text, etc. It is shared and discovered across peers rather than on web sites, much like P2P sharing, but sharing links rather than files. Information will be tagged in multiple ways depending on its context and meaning. Hierarchical referencing should make it possible to access information of interest from multiple different points of view rather than from one perspective. Redundancy of information with multiple links of access discoverable across a distributed network approaches the mechanism required for universal memory. There are multiple entry points and multiple pathways to get to any bit of information.
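
A minimal sketch of the "multiple pathways" idea, assuming a simple in-memory index: the content stays where its owner keeps it, and only links are indexed, once per tag and once per owner. The class, the method names and the peer:// style links in the usage note are hypothetical, and a real system would distribute this index across peers.

```python
from collections import defaultdict


class LinkIndex:
    """Toy index of shared links, reachable by tag or by owner."""

    def __init__(self):
        self._by_tag = defaultdict(set)    # tag -> links carrying that tag
        self._by_owner = defaultdict(set)  # owner -> links that owner shares

    def share(self, owner: str, link: str, tags) -> None:
        """Publish a link under any number of tags; the content itself is not copied."""
        self._by_owner[owner].add(link)
        for tag in tags:
            self._by_tag[tag.lower()].add(link)

    def find_by_tag(self, tag: str) -> set:
        return set(self._by_tag.get(tag.lower(), ()))

    def find_by_owner(self, owner: str) -> set:
        return set(self._by_owner.get(owner, ()))
```

For example, after `share("alice", "peer://alice/photos/hike", ["hiking", "photos"])` the same link can be reached through either tag or through its owner, so no single entry point is required.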

Bootstrapping

Distributed computing in all its aspects continues to become more prevalent at an accelerating pace. In this blog I will capture some of the ideas in this area, both mine and those of others. I will also look at how it is influencing and should influence the sharing of knowledge and the development of a consciousness through social participation and the use of tools for sharing and increasing open computing resources.