Something I see often in system implementations is a lack of understanding of when and how to use concurrency effectively, and specifically of how to break a large set of data into discrete problems without making the whole system slower. Too often the concurrent solution is worse, because the benefits of concurrency are not clear or the limitations of the system itself are not understood well enough. Like cars in the mountains waiting for an accident to be cleared, the whole system stops while doing I/O. Having multiple processes all stopping at I/O blocks doesn’t really solve the problem; it only uses more memory and doesn’t necessarily speed up the process.
Concurrency in relation to processing and storage is almost always misunderstood where databases are concerned. I once saw a process built to use 20 threads to run a massive set of data through an aggregation routine that took well over 6 hours. The fix was to move the aggregation to the database layer and hold the writes until the end, which allowed the entire job to be done in 10 minutes in a single process. The modern programmer really needs to understand the whole stack in order to deliver a great solution, but this isn’t usually the case. The tools that developers use to accelerate their solutions should help with this.
Pollen will be designed to encourage seamless concurrency. This is partially enabled through Go’s incredible, but not original, CSP model. Pollen will drive its data processing through channels, which will make it quite simple to add concurrency where necessary. The developer will still need to understand the right place to perform specific activities (like not writing to the database on every loop iteration), but it will be much easier to conceive of and implement concurrent processing.
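To make the channel idea concrete, here is a minimal sketch in Go of a processing stage driven by channels. Nothing here is Pollen's actual API; `Record`, `produce`, `transform`, `double` and `sumValues` are names I made up for illustration. The point is only that the number of workers is a single argument, so a stage can go from sequential to concurrent without the rest of the pipeline changing.

```go
package main

import (
	"fmt"
	"sync"
)

// Record is a hypothetical unit of data flowing through the pipeline.
type Record struct {
	ID    int
	Value int
}

// produce sends n records on a channel and closes it when done.
func produce(n int) <-chan Record {
	out := make(chan Record)
	go func() {
		defer close(out)
		for i := 1; i <= n; i++ {
			out <- Record{ID: i, Value: i}
		}
	}()
	return out
}

// transform starts the given number of worker goroutines. Each worker reads
// from in, applies fn, and writes the result to the returned channel, which
// is closed once every worker has finished.
func transform(in <-chan Record, workers int, fn func(Record) Record) <-chan Record {
	out := make(chan Record)
	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func() {
			defer wg.Done()
			for r := range in {
				out <- fn(r)
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// double is a stand-in for real mapping logic.
func double(r Record) Record {
	r.Value *= 2
	return r
}

// sumValues drains the channel and totals the values.
func sumValues(in <-chan Record) int {
	total := 0
	for r := range in {
		total += r.Value
	}
	return total
}

func main() {
	// Changing 4 to 1 makes the stage sequential; nothing else changes.
	total := sumValues(transform(produce(100), 4, double))
	fmt.Println(total) // prints 10100
}
```

Note that adding workers only helps if the work inside `fn` is the bottleneck. If every worker ends up waiting on the same I/O, this is the cars-in-the-mountains situation again, just with more cars.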
The main concept that will protect the developers from themselves will be the separation of data processing from other kinds of processing. Data mapping will be encouraged as an atomic activity (other methods will be possible, of course), with the I/O activities performed afterwards on the entire result dataset.
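A rough sketch of that separation in Go, under my own assumptions: the `Source` and `Target` types, the `Sink` interface and the in-memory sink are all hypothetical stand-ins, not a Pollen design. Mapping is a pure function with no I/O, and the write happens once, on the whole result set.

```go
package main

import (
	"fmt"
	"strings"
)

// Source and Target are hypothetical record shapes for illustration.
type Source struct {
	Name  string
	Cents int
}

type Target struct {
	Label   string
	Dollars float64
}

// mapRecord is pure: it touches no database, file or network.
func mapRecord(s Source) Target {
	return Target{Label: strings.ToUpper(s.Name), Dollars: float64(s.Cents) / 100}
}

// mapAll maps the whole dataset before any I/O takes place.
func mapAll(in []Source) []Target {
	out := make([]Target, 0, len(in))
	for _, s := range in {
		out = append(out, mapRecord(s))
	}
	return out
}

// Sink is an assumed abstraction over whatever receives the results.
type Sink interface {
	WriteBatch(rows []Target) error
}

// memorySink stands in for a database and counts how often it is written to.
type memorySink struct {
	calls int
	rows  []Target
}

func (m *memorySink) WriteBatch(rows []Target) error {
	m.calls++
	m.rows = append(m.rows, rows...)
	return nil
}

// run performs all mapping first, then a single batched write.
func run(in []Source, sink Sink) error {
	return sink.WriteBatch(mapAll(in))
}

func main() {
	sink := &memorySink{}
	data := []Source{{Name: "a", Cents: 250}, {Name: "b", Cents: 100}}
	if err := run(data, sink); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println("writes:", sink.calls, "rows:", len(sink.rows))
}
```

Because `mapRecord` has no side effects, it is also the part that is safe to run concurrently; the single `WriteBatch` call is what keeps the database from being hit on every loop iteration.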
An Enterprise Service Bus is a methodology, not a product. There are software systems sold as ESBs but they don’t do much for you beyond providing a framework. Truly a great thing to have, but not a solution to a problem. You still have to decide what the best solution is and figure out how the product supports the solution, if it does. The overall design of the integrated solution, the services it relies on, and the services it exposes are defined and maintained separately. This documentation is critical to the success of the enterprise and is very rarely properly maintained. Most products offer some form of this but they are ill-defined and operate as separate systems which require other licensing costs, more maintenance and have their own documentation headaches.
What I would like to use is a product in which you first document and then implement, all in one system. The ESB is the obvious choice for this kind of solution, as it will hold the connection points for all systems in the enterprise landscape if an ESB-centric model is chosen. Even if the ESB is only a peripheral, having the entire landscape documented in a way that’s easy to understand, is comprehensive and can be used by technical and managerial types would be a huge boon. The documented endpoints can then be leveraged as starting points for development of the ESB solution, which would be in the same system.
I have a vision of Pollen being capable of housing this enterprise repository. Enterprises can no longer afford–not sure how they ever could, really–to invest time and money into IT projects that languish as wreckage on the border of the enterprise. Maintenance and good governance should be supported and enforced by the software and shouldn’t be a financial burden. Of course, it also needs to be flexible enough to support different business models and types of system landscapes. An open source solution that can be bolted onto any enterprise at any point in the life-cycle would be killer.
Information overload is an often-cited “fact” of modern life, at least partly brought on through the use of the Internet. People say that the amount of advertising, news, etc. is overwhelming our senses. I don’t think that’s accurate. Spend an evening in the woods; get bitten by mosquitoes, hear unexplained cracks on the nearby forest floor and strange noises from the trees above.
How does that feel? A fair bit more intrusive and overloading of the senses, I presume, than sitting in your comfortable living room reading this blog post (or wherever you are; maybe in the woods with mosquito bites). My point is that the amount of information is not the issue. Exposing yourself to a natural environment makes this quite clear. There is an amazing amount of information that our senses are tuned to receive, comprehend and respond to.
This really isn’t about sensory input, where we have built-in filters, but about what happens once the information gets into the brain. Information overload happens when we consume too much; it’s our own fault, caused by our choices in what we would like to or are expected to comprehend. What’s interesting to me is that it is only after our sensory apparatus has processed the primary data on the screen, page, sign, etc. that it becomes a problem. This suggests that what we are calling information overload doesn’t even consider the fact that there is a lot more information floating around than what we perceive. The amount of data that is slipping by us without our knowledge must also be causing anxiety and must also be part of information overload, because we are aware of it happening.
We need tools to help us filter and comprehend that data before it becomes overload in the brain. This is especially true in the realm of the enterprise.
Pollen will help reduce the overload through process visibility and through creative means not yet known. Fundamentally, our computer systems must start being useful enough to no longer contribute to the overload but, through good design and better implementation, reduce it.
In a certain Canadian province there is a popular plant that would normally be pollinated by the wind. It just so happens that this is not desired with this plant, and other methods must be used by breeders. I’ve heard there are cherry and plum orchards that are hand-pollinated in China because of the lack of pollinating insects. This is a little different in the province I speak of, since the seeds in said plant are not desirable unless you want to grow their offspring, which mostly isn’t what people like. My father was a small player in this business. That was a matter of public record shortly before he died. He was also a master breeder.
The brush that is pictured in the header of this blog and on this post was made by him to pollinate. He acted as a go-between for the differently sexed plants (or as sex itself, if you will; he used the more explicit word “f**k”). He would separate the boys into the “boys club,” to keep them from inadvertently fertilizing the females, and take some of their pollen when the time was right. The pollen would then be brushed onto the “girls” and a bag put around the flower in order to keep others from being fertilized too. The brush bristles are some of his hair. He had many brushes.
What’s striking to me is the amount of information that this little process entails. I mean the information of life. Think of the bits moving around from plant to plant. Think of the flow of information that is happening everywhere all the time in the natural world. Think of the forces that are used incidentally or harnessed with purpose to ensure that the right result is met both by humans and non-humans. Think of the wasted bits that do not meet any purposeful end. Is the information of the Internet not approaching this complexity?
Imagine a coming world where artificial intelligence lives in or at least on the boundaries of the Internet. What kind of data stream would we see then? Would it not resemble bees flying between trees and funny little processes trying to ensure that a desired outcome is reached? What will be the pollen brushes of tomorrow?
Software design is complicated. But keeping things straight within a modern IT environment is next to impossible. There are methodologies to be followed, like ITIL, that really do help, but there is a lack of tools that do a satisfactory job of actually tracking the design, build and maintenance process, let alone the artifacts. I think this is failing for a number of reasons, but rather than list them I would like to talk about my personal experience.
I began as a salaried consultant 7 years ago after doing my own consulting gig in the EAI space for the 7 years before that. An SAP implementation in 2005 was my first time as a member of a large-team COTS implementation project. I was dumbstruck by the lack of tools to support collaboration. Things have gotten significantly better in general for collaboration, but there is not much that we in consulting can leverage for our core activities beyond the latest features of SharePoint 2010, which at least make editing spreadsheets possible in a “web context.” Not sure what kind of issues that will create yet. Can’t be much worse than the norm of storing Excel workbooks on SharePoint today.
Ever since joining the company that was then purchased by my current firm I have been thinking, and I can’t stop, about how it could be better. The challenge that I keep returning to is a technical one. How can you capture information in a way that is useful, as in searchable, sortable, etc. but not restrictive? Restrictions in a tool that is to be used for tracking design, implementation and maintenance will kill its usefulness. The tool needs to be nimble.
How does this relate to an ESB? One of the main problems I have run into throughout my career is a lack of knowledge about other systems (usually legacy systems, but not exclusively), or a lack of anywhere to store the knowledge that has been found or extracted. Wouldn’t it be something if the full system landscape definition was right in the tool that would be leveraged to integrate such systems?
This blog is about what I am calling “Pollen” and how to get it actualized. A first step to that is to define the problem and the approach to solving it.
The problem:
- Enterprise software is incredibly difficult to implement
- Business processes become brittle with use of enterprise systems
- Enterprise systems are generally difficult to connect to each other (and the literature generally lies about capabilities where systems are supposed to be integrated seamlessly)
- Vendors provide solutions but licensing costs are very high and expensive teams are still required
- Lock-in to any solution is a fact of life because of costs to change

The approach:
- Provide tools for implementation
  - Designer would capture a system design while guiding the design team through the process
  - Business requirements and functional design feed into technical design, all of which are captured in a single repository
  - System specifics would be captured in a pluggable architecture that is easily extended
- Pollen is the platform for the design and the process
  - Pollen implementation would be accelerated and augmented by design inputs
  - Other systems, like SAP, Oracle EBS and Salesforce, would be accelerated through custom transformations of design properties
  - Pollen’s process engine would support barely repeatable or one-off processes where the users can track and change activities according to what works best and what is learned on the job
- Pollen will be able to process common data formats and deliver via common transports out of the box
  - Pluggable architecture will allow vendors or implementors to augment the system (augmentations will have a mechanism for being shared back to the community)
  - Data mapping will be inherent and connected directly to design
- Licensing will be free, leaving the client able to spend money on the service rather than the product
- Lock-in to Pollen is just as likely as with any open-standards-based implementation
  - Goal of the project is to attract a range of people interested in doing things better
  - Many people interested in and using the software means many people able to support it, making it less likely that lock-in is detrimental
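
As a thought experiment on what “pluggable” could mean for data formats, here is a small Go sketch of a registry that format handlers plug into. This is purely my assumption of one possible shape: the `Decoder` interface, `Registry` type and the two sample decoders are hypothetical, and only the standard library calls (`json.Unmarshal`, `strings.Cut`) are real.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Decoder is a hypothetical plugin contract: raw bytes in, generic record out.
type Decoder interface {
	Decode(data []byte) (map[string]string, error)
}

// DecoderFunc lets a plain function satisfy Decoder.
type DecoderFunc func([]byte) (map[string]string, error)

func (f DecoderFunc) Decode(data []byte) (map[string]string, error) { return f(data) }

// Registry maps a format name to the plugin that handles it.
type Registry struct {
	decoders map[string]Decoder
}

func NewRegistry() *Registry {
	return &Registry{decoders: map[string]Decoder{}}
}

// Register is the extension point a vendor or implementor would call.
func (r *Registry) Register(format string, d Decoder) {
	r.decoders[format] = d
}

// Decode looks up the plugin for the format and delegates to it.
func (r *Registry) Decode(format string, data []byte) (map[string]string, error) {
	d, ok := r.decoders[format]
	if !ok {
		return nil, fmt.Errorf("no decoder registered for %q", format)
	}
	return d.Decode(data)
}

// decodeJSON handles a flat JSON object whose values are all strings.
func decodeJSON(data []byte) (map[string]string, error) {
	out := map[string]string{}
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out, nil
}

// decodeKV handles a made-up "key=value;key=value" format.
func decodeKV(data []byte) (map[string]string, error) {
	out := map[string]string{}
	for _, pair := range strings.Split(string(data), ";") {
		k, v, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, errors.New("malformed pair: " + pair)
		}
		out[k] = v
	}
	return out, nil
}

func main() {
	reg := NewRegistry()
	reg.Register("json", DecoderFunc(decodeJSON)) // shipped "out of the box"
	reg.Register("kv", DecoderFunc(decodeKV))     // added by a third party
	rec, err := reg.Decode("kv", []byte("id=42;system=SAP"))
	fmt.Println(rec, err)
}
```

The core never needs to know which formats exist; a shared community plugin is just another `Register` call.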
These are just some initial thoughts. More to come on this in further posts to the problem space category.