
Wednesday, February 13, 2008

Data Warehousing Dilemma

We are in the midst of a long-running Business Intelligence project. The original thought behind the initiative was a solution allowing analysts to explore their data. Through a combination of historical data and predictive modeling, the solution would provide key metrics to manage their business. After several false starts, we discovered the wisdom of Ralph Kimball.


Kimball's Data Warehouse Toolkit was an epiphany and an inspiration. Suddenly before us was the solution to our performance woes. His solution? A star schema, in which the warehouse measures are retained in a single fact table linked to the related descriptive data. All the descriptive data (dimensions) relate to one another through the single fact table containing the measures.

Our project started simply enough. Using sample report templates from our Product Management team, we determined a grain for the warehouse and computed the appropriate measures. The grain, by the way, is the lowest level of detail needed to answer the questions asked of the data.
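
To make that concrete, here is a minimal sketch of a star schema. The table and column names are hypothetical, invented for illustration rather than taken from our actual warehouse:

    -- Hypothetical star schema: one fact table at the order-line grain,
    -- linked to dimension tables holding the descriptive attributes.
    CREATE TABLE dim_date (
        date_key   INT PRIMARY KEY,
        full_date  DATETIME NOT NULL,
        month_name VARCHAR(20) NOT NULL,
        year_num   INT NOT NULL
    );

    CREATE TABLE dim_product (
        product_key  INT PRIMARY KEY,
        product_name VARCHAR(100) NOT NULL,
        category     VARCHAR(50) NOT NULL
    );

    CREATE TABLE fact_order_line (
        date_key     INT NOT NULL REFERENCES dim_date (date_key),
        product_key  INT NOT NULL REFERENCES dim_product (product_key),
        customer_key INT NOT NULL,  -- references dim_customer, omitted for brevity
        quantity     INT NOT NULL,  -- the measures live on the fact table
        amount       DECIMAL(18,2) NOT NULL
    );

Each row of the fact table is one order line; that is the grain.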

It soon became apparent that there was a flaw in our design. Not the design of the warehouse, per se, but the design of the system. The reports designed by our Product Management team were at the level of the grain. Running them would produce thousands of pages of detail. Nowhere were we taking advantage of the warehouse's dimensions to drill into these reports. Seeing the flaw, we tasked our Product Managers with spec'ing the entry points to their reports.

What was returned to us was a disaster. Instead of taking advantage of the warehouse or even the capabilities of the Business Intelligence technology, the PMs designed more reports. These new reports were summaries with drill paths to the detail provided earlier. They also contained an entirely new set of metrics, all calculated at a different grain.

The problem of the different grain was exacerbated by many of the new metrics. These metrics were computed by dividing aggregated counts. The counts, however, were not on the fact table; instead they were distinct counts of dimension values. Therein lies the dilemma: we needed figures that could not be pre-computed into our cubes. The metrics were computed "on-the-fly" and resulted in tremendous performance problems.
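
To illustrate the kind of query we ended up with (a hypothetical sketch against the tables above), a metric built by dividing distinct counts of dimension keys cannot be pre-aggregated into the cube, because COUNT(DISTINCT ...) is not additive across cells:

    -- Hypothetical on-the-fly metric: distinct products sold per distinct
    -- customer, by month. Neither count is a column on the fact table, and
    -- the ratio cannot be rolled up from pre-computed totals.
    SELECT d.year_num,
           d.month_name,
           COUNT(DISTINCT f.product_key) * 1.0
               / COUNT(DISTINCT f.customer_key) AS products_per_customer
    FROM fact_order_line f
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.year_num, d.month_name;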

I believe the solution is simple and obvious. We need additional fact tables at different grains. The purists among my team didn't see it so clearly (they will by the time we're finished). The problem is that Kimball's treatise on data warehouses discourages multiple fact tables in the database schema.

Kimball oversimplifies warehouses with the star schema. Any complex set of data will have measures that cannot be summarized into a single fact table. In truth, multiple fact tables will be an integral part of any practical solution based on a data warehouse.
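
What I am proposing looks something like this hypothetical sketch: a second fact table at a coarser, monthly grain, populated during the ETL load, so that the summary reports and their distinct-count metrics never touch the detail rows:

    -- Hypothetical second fact table at the monthly grain. The expensive
    -- distinct counts are computed once, at load time, not at query time.
    CREATE TABLE fact_order_monthly (
        month_key          INT NOT NULL,  -- references a month-level date dimension
        product_key        INT NOT NULL REFERENCES dim_product (product_key),
        order_count        INT NOT NULL,
        distinct_customers INT NOT NULL,  -- pre-computed at this grain
        total_amount       DECIMAL(18,2) NOT NULL,
        PRIMARY KEY (month_key, product_key)
    );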

Wednesday, January 23, 2008

Are we dumbing down programming?

Her resume looked great. She was coming to us as a high-end (read: high-priced) developer. But as I listened to her describe an abstract class in response to my asking her about abstraction, I had to wonder: have we made it too easy to be a programmer?

It's true that abstraction is the most intangible concept of OOP. During my phone screens of candidates, though, I find myself wishing I had tipped them off: I will be asking about OOP; go look up "object oriented programming" on Wikipedia.

I generally blame our universities for this failure. OOP is largely conceptual, and should be introduced and reinforced by any Computer Science department worth its accreditation. Once in the workplace, software engineers rarely get the proper mentoring on solid coding habits.

We really can't lay the blame entirely on universities and trade schools. No, much of the problem lies with the technologies we use to build applications. High on the list of offenders are Visual Basic, Visual Studio, Java, and .Net. Throw in HTML, XHTML, XML, and all the markup-language derivatives. Then add in any of the web development tools, like ColdFusion and Flash. Of course, the scripting languages (JavaScript, VBScript, and Perl) virtually prevent solid coding practices.

My obsession with OOP stems from a very specific business need. I have to support ten software products with a very modest staff. The most basic way to accomplish this is by reducing the amount of code. An obvious way to reduce code is to reuse code. Unfortunately, I inherited a situation based on copy-and-paste code. After three years of fighting copy-and-paste habits, we still support multiple versions of code that perform the same task.
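
A contrived example of the pattern we keep fighting (the names are mine, not from our products): the same formatting logic pasted into several code bases, versus one shared helper that every product references:

    using System;

    // One reusable implementation. Fix a bug here and every product that
    // references this assembly picks up the fix; with copy-and-paste, the
    // same fix must be made once per pasted copy.
    public static class AccountFormatter
    {
        public static string FormatAccountNumber(string raw)
        {
            if (raw == null || raw.Trim().Length == 0)
                throw new ArgumentException("Account number is required.");

            string digits = raw.Replace("-", "").Trim();
            if (digits.Length <= 4)
                return digits;
            return digits.Substring(0, 4) + "-" + digits.Substring(4);
        }
    }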

Many developers confuse using objects with OOP. Dropping a control onto a form does not constitute object oriented programming. In fact, there will be nothing reusable in the result. In addition, the code automatically generated by dropping the control is almost certainly unreadable. But then, many developers today don't even realize code is being generated.

So our tools have dumbed down programming skills, especially for those developers who rely on the designers and tools built into their development environments (IDEs). For me, I'd like to find an engineer or two who would love to create the next Visual Studio, instead of dragging controls from a toolbar like some pre-programmed automaton.

Thursday, January 10, 2008

How much detail in a requirement?

I mentioned that over the course of twenty-five-plus years, I've tried nearly every development process you can name. In my experience, projects tend to break down at the start. Yep. Failure is built into the project during the requirements stage.

Requirements are written in varying detail. During my tenure with DoubleClick in 2000, we had initiated a project for the advertiser side of our industry. In my ten months with the firm, the document was never finished. The last time I saw it, it exceeded 300 pages. We appointed a steering committee to oversee the project and they would debate the document endlessly. And the result of the debate? A bigger document.

In short, the entire exercise was useless. In parallel to the requirements fiasco, an engineering team was already well into the project, virtually ignoring the written requirements. I have seen this practice of overdoing requirements at several firms, including DoubleClick, Information Builders, and Ameritech. In most cases the resulting document is either too large to be useful, or is ignored by the developers.

Of course the reverse is often true too. In fact, the reverse is probably much more frequent. For example, my team recently received the following requirement for a component of a new product:

"Management reports – using multiple selection criteria, produce reports that track turnaround time and other metrics."

Think about the number of questions opened by this simple one-sentence requirement. What reports? How many? What columns, rows, sorting, totals? What selection criteria? What is turnaround time? What other metrics? In fact, this requirement tells a developer nothing. There is absolutely nothing the system architect can do except return to the business analyst and ask questions. To make matters worse, the business analyst believes he has provided a useful document.

Then there are requirements that mask the business need in technical details. Some might read something like:

"Add a column to the x table to hold a flag. Display the flag on the detail page. When transactions are received for records with the flag set to 0, ignore the transaction."

This might look like the appropriate middle ground between the extreme examples illustrated earlier. But it's not. The business analyst has interpreted the business requirement and supplied a solution. Although many business analysts fancy themselves system architects, I have yet to meet one who is better at it than the engineer. What is the requirement above? A method of disabling master records? A method to manually override batch operations?

The plain fact is that it is hard to write useful requirements. And a project with poor requirements is destined for delays and cost overruns. If you wish to keep your requirements meaningful, stick to the following rules of thumb:

  • Describe the business need (not the technical solution)
  • Keep it brief (if a single capability takes more than a couple of pages, you're too verbose)
  • Describe everything (if there are four reports, describe each with its own requirement)

When you avoid the pitfalls of poor requirements, your project will be constructed on a solid foundation. The chances of success will be far greater.

Friday, January 04, 2008

SDLC

Agile. Waterfall. CMM. Spiral. Test Driven. Blah, blah, blah. I believe I have researched a dozen software development methodologies and have tried many of them in practice. Unfortunately, I have yet to find one that works. One that combines rapid development with high quality and delivers projects within their targeted timeframes.

I suppose if it were easy, there wouldn't be entire shelves at Barnes and Noble devoted to it. If it were easy, everyone would deliver with great success. I'm becoming convinced that no single methodology is portable across development shops, and that a successful SDLC (System Development Life Cycle) is a painful trial-and-error process that adopts aspects of several disciplines.

I've tried the standard waterfall. It generally has two problems. First, it is nearly impossible to completely define all the functions of a system prior to writing any code. And second, it is very difficult to prevent scope creep.

In the first case, a business analyst has to be imaginative enough to define all aspects of the system. Then she must be able to accurately write these into requirements that are understandable by developers. I've never seen this done well. When overdone, the volumes of requirements become impossible to sift through; when underdone, entire aspects of logic are left undocumented.

Waterfall projects tend to be very long, running months or even years. This causes the inevitable panic when business analysts realize a pet feature is not included. The panic often results in scope creep, which in turn causes the project to run longer.

We've tried incremental methodologies, such as Agile, too. OK, you say, Agile isn't a methodology but a class of methodologies. But does anyone really implement strict Extreme Programming or Scrum or EVO?

Regardless, Agile methods have their built-in weaknesses too. For instance, how do you know when you are done? Or, as is the case with my current projects, resources get diverted to new critical projects, leaving others unfinished.

When it's all said and done, some hybrid method seems to work best. Concrete, but overlapping, phases are necessary for project management. Within each phase, iterative cycles with feedback have great benefit. Who knows, maybe I'll develop my own methodology, and then write a book (that no one will read).

Tuesday, April 24, 2007

What happened to OOP?

When recruiting engineers I always start with a discussion on object oriented programming. I try not to completely surprise the candidate, so I always list the four topics: abstraction, encapsulation, polymorphism, and inheritance. Then I ask the candidate to define each term, and tell me how (and why) it is used in real world situations.
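
For the record, here is the minimal sketch I hope a candidate could talk through. The shapes are invented for the interview, not production code:

    using System;

    // Abstraction: the base class captures what all shapes share without
    // committing to any particular shape.
    public abstract class Shape
    {
        // Encapsulation: the name is a private field, exposed only
        // through a read-only property.
        private readonly string name;

        protected Shape(string name) { this.name = name; }

        public string Name { get { return name; } }

        // Polymorphism: each subclass supplies its own Area.
        public abstract double Area();
    }

    // Inheritance: Circle reuses everything Shape provides.
    public class Circle : Shape
    {
        private readonly double radius;

        public Circle(double radius) : base("circle") { this.radius = radius; }

        public override double Area() { return Math.PI * radius * radius; }
    }

    public class Square : Shape
    {
        private readonly double side;

        public Square(double side) : base("square") { this.side = side; }

        public override double Area() { return side * side; }
    }

A candidate who can explain why Shape s = new Circle(2.0); still invokes Circle's Area() has the "how"; recognizing the reuse across the hierarchy is the "why".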

I have to admit that I am amazed at the percentage of developers who do not know the fundamentals of OOP. Even developers coming right out of school struggle with this conversation. This despite the fact that most of our entry-level developers are coming out of Master's programs in Computer Science.

It has caused me to wonder if OOP is falling out of favor. If this is the case, then what is replacing it? Gang of Four patterns? Something else?

You can say I am an old-school developer. I learned to code when the style was structured programming. There was no concept of OOP when I earned my Computer Science degree. I was introduced to OOP several years later, when building my first applications for Windows. I immediately saw the beauty of maintaining a single piece of reusable code; it was a logical extension of function libraries.

OOP was a natural evolution of structured programming, and yet I was amazed at the number of my colleagues who did not make the switch. Those who didn't were relegated to mainframe jobs and maintenance of legacy systems. The best engineering opportunities were given to those who were evangelists of object oriented programming.

But as Internet development took off during the first dot com boom, a couple of trends started. One was the adoption of Visual Basic, and the other was design patterns from the Gang of Four (Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides).

In my mind, Visual Basic is and was a horrible trend in the practice of software development. The language, especially in its early versions, encouraged poor programming practices. And Microsoft's CASE-style designer tools only exacerbated the problem. Departmental developers at corporations picked up the easy-to-learn language and churned out applications that were impossible to maintain. Although recent versions of the Basic language implement OOP constructs, traditional VB developers generally do not use them.

Design patterns should have been an advancement in OOP development. In practical use, however, programmers often use patterns without practicing OOP themselves, much the same way that procedural developers use Java or .Net and still fail to implement OOP.

The problem for me, as a manager and leader of engineering teams, is finding people who write code that is easy to maintain. To a degree, I equate maintainability with reusability, because code that is reused is not rewritten. And code that is reused is tested frequently. Coders who do not understand or practice disciplined OOP will fall into the copy-and-paste trap. When this happens, many versions of similar code appear throughout the source code, creating a maintenance nightmare.

The irony here is that descriptions of object oriented programming are very common. Wikipedia, for instance, has an entry for OOP that a developer could review and understand in a couple of minutes. I hope, too, that Computer Science programs strive to instill these basic concepts in their students. In the meantime, I continue my search for solid OOP engineers who will help evolve our products.

Thursday, April 12, 2007

Recruiting headaches; how not to get hired

I spend much of my time recruiting people for my technology teams. Currently we have openings for software engineers, QA Analysts, and a DBA. Thankfully I have corporate resources to post ads and review resumes. After candidates successfully navigate through HR, I do a short phone screen to gauge the candidate's ability to carry a conversation, and to match their knowledge with their experience (as it appears on paper).

I believe the best developers and testers have a strong academic knowledge of their profession. From this knowledge, good coding habits and solid testing methodology are learned. So my phone conversations always start with a discussion of fundamentals.

For software engineers, the conversation begins with a discussion of object oriented programming. Because I often catch candidates by surprise with this line of questioning, I always list the specific terms we will discuss: abstraction, encapsulation, inheritance, and polymorphism. I have been applying these concepts for nearly twenty years and expect every competent developer to understand how they work in real-world applications.

Of course, some candidates cannot describe OOP concepts. I usually give a lot of latitude on abstraction and encapsulation. These tend to be more conceptual than inheritance and polymorphism, which are implemented with specific language constructs. Imagine my surprise, though, when a candidate recently told me he did not know any of the terms.

When a candidate struggles with my OOP discussion, I try to give him some relief by asking him to define inheritance. Inheritance, after all, is fundamental to all modern programming and is easy to describe: simply define the word. But I was shocked when the candidate admitted that he did not know of inheritance.

This story should end right here, for at this point I politely ended my line of questioning and suggested that we did not have an appropriate "fit". The candidate, however, wasn't quite finished. He assured me that given proper requirements he could finish any project. Then he claimed that academic concepts weren't that important and if necessary he could easily look up the information he needed.

Of course he was wrong. A developer cannot possibly build a class library or reusable object without understanding inheritance. In fact, you cannot write a meaningful Java, C++, or .Net application without inheriting from an object. And you cannot possibly know when to use an interface without understanding polymorphism. You cannot implement objects without understanding encapsulation, and your objects will be a mess if you do not apply abstraction. Most of all, you cannot hope to get hired into a position if you downplay knowledge the interviewer deems important.
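
To make the point concrete, here is the level of understanding I probe for, as a minimal sketch with invented types: knowing that an interface is the right tool when unrelated classes must be handled polymorphically.

    using System;
    using System.Collections.Generic;

    // An interface fits here because the exporters share no implementation,
    // yet callers must be able to treat them uniformly.
    public interface IExporter
    {
        void Export(string reportName);
    }

    public class FileExporter : IExporter
    {
        public void Export(string reportName)
        {
            Console.WriteLine("Writing " + reportName + " to disk");
        }
    }

    public class EmailExporter : IExporter
    {
        public void Export(string reportName)
        {
            Console.WriteLine("Mailing " + reportName);
        }
    }

    public class ExportJob
    {
        // Polymorphism: the job never knows which concrete exporter it runs.
        public static void Run(IList<IExporter> exporters, string reportName)
        {
            foreach (IExporter exporter in exporters)
                exporter.Export(reportName);
        }
    }
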
My quality assurance conversations cover testing methods in much the same manner as the OOP concepts. For QA, I have twelve types of tests that I expect a candidate to discuss. In the course of the discussion, I ask the candidate to describe how he applied each type of test in his experience.

I am guilty of acquiring my definitions of the tests from the web. I can't even remember where they came from, but we have a list of a few dozen QA terms. From this list I have flagged twelve with which to drill candidates. I always start by asking the candidate to describe Black Box Testing.

It was surprising when a recent candidate quoted the exact phrase "not based on any knowledge of internal design or code." Surprising because that phrase is an exact match of the definition on my sheet. I assumed this was a coincidence and continued.

But the next item was quoted exactly too. And the third, Unit Test, was again quoted exactly, using the phrase "the most 'micro' scale of testing." Nobody talks like that. She could have at least used the word "smallest".

At this point I stopped her and asked what she was reading from. She denied reading the responses. I mentioned that I was reading my terms from a sheet that we had acquired long ago, and asked where she had gotten her material. I do not believe it is possible for anyone to have memorized those specific terms using such precise language. Again, she denied using reference material.

Generally speaking, I would not fault a person for using notes or reference material during a phone screen. But I expect they can show how an academic concept makes sense in real situations. I would also expect a person to acknowledge using reference material when it has become obvious to the interviewer. She had the opportunity to look resourceful, but instead looked like a cheat.

In both these cases, the candidates' resumes looked great, but they weren't a good fit for our team. These candidates simply didn't apply common sense to our conversation. Want to know how not to get hired? Simple: be unprepared, lie, or ridicule the interviewer's questions; it's guaranteed to keep you out of the position.

Tuesday, March 06, 2007

Searching for Code

Have you tried Krugle yet? Krugle is a search engine for open source code and technology pages. It is an indispensable tool for any developer. Need a special snippet of code for a specific task? Type in your search keywords and review the results.

Krugle searches code, tech pages, and projects. Code and projects return essentially the same thing: links to source code. Tech pages will find your search term in blogs and newsgroups.

I find the code search especially helpful. I have searched for grid computing, sorting, and EBCDIC; and have found useful code in each case. For me, the projects are most interesting, as I want a complete solution.

The site has a sweet Web 2.0 interface that includes plenty of Ajax. It also allows sharing of notes. One point of contention, though: I tagged several files with comments, but was unable to find them (my comments) later.

Google also has a code search capability, but I believe it is inferior. In Google's case, the search results will highlight keywords in source code files. True to Google form, its code search is very spartan. Google does not have the Web 2.0 capabilities of Krugle, but it is fine for quick and dirty searches for a very specific algorithm.

Kudos to Krugle. I suggest you add it to your bag of tricks.


Friday, February 09, 2007

Implementing Agile Development

Sometime late last year I was introduced to the concept of Agile Development. Actually, I had been formulating iterative techniques over the past 10 years. Craig Larman's book simply reinforced my thinking. That is, software is best developed in manageable increments.

The bigger problem is moving a team steeped in serial waterfall methods to iterative methods. Some people simply don't get it; they don't get collaboration; they don't get accepting change; and they don't get emphasizing software over documentation. All of which is surprising, because most development teams never receive adequate documentation, constantly deal with change, and usually brainstorm to solve problems.

For us, de-emphasizing documentation shouldn't be so bad; after all, we don't receive good requirements anyway. But strangely enough, there are developers who still believe that one big master document is necessary for successful projects. These people are wrong. It is wasteful and expensive to attempt to write a complete specification prior to engineering the software.

I saw this firsthand at DoubleClick, Inc. several years back. The firm hired a team (larger than my current department) dedicated to writing specs. The documents produced by this group were huge, numbering hundreds of pages. And the details were debated ad nauseam, leading to stagnation. What was produced was documents; what wasn't produced was working software.

A greater problem for us is managing multiple projects. Our technologies are not constructed on a common code base. Therefore each project becomes its own set of increments. Currently we have five projects under development. That's a lot of work for a staff of seven plus four consultants. With all these concurrent projects, managing increments becomes difficult. After all, it isn't practical to deliver an increment every week. Or is it?

We seem to do all right with collaboration, but there is room for improvement. It helped to implement a daily stand-up, or scrum, meeting. The meetings are short and focus on the goals for the day. We have not attempted to implement strict pair programming, although there is very frequent teaming on tasks.

I remain a strong proponent of agile and iterative development. Over the next couple of months we will be able to take an objective look at the results of these methods.

Wednesday, January 24, 2007

Strategies for Performance

Like many growing technology companies, we frequently wrestle with the performance and scalability of our offerings. In our case, we're tied pretty tightly to Microsoft; a holdover from a time when the products were built using Access. We've since moved on to Visual Basic, C#, .Net, and SQL Server, but we are not platform agnostic.

We're also taking steps to move off a pure client-server architecture to an n-tier architecture delivered through a browser. Typically our applications have a small number of users who submit long-running queries. There are two primary stress points for performance: loading the data repository and running queries (reports).

We are attacking the problem on several fronts. First, we're throwing hardware at it. Second, we are upgrading the OS and database platform. And finally, we are optimizing the applications. Note that we are not considering a server farm for the application servers. We believe the low number of hits to the web apps makes scaling the application server a lower priority.

Throwing hardware at the problem is the easiest and quickest way to scale. In our case, that means moving the application server to a separate box, and purchasing more power. More memory, more speed, and more processors. We all know, however, that this type of solution simply covers up bottlenecks in the application.

We are also stepping up to SQL Server 2005. In the standard edition, which most of our customers deploy, SQL Server 2005 will use four CPUs and as much memory as the OS can give it. Some of our customers are CPU and memory bound when using SQL Server 2000, so stepping up to 2005 is a significant boost. SQL Server 2005 also runs on Windows Server 2003 64-bit.

The 64-bit OS appears, in our sample testing, to give a huge performance boost. Unfortunately, it also gives us problems with some of our applications. Most significantly, we have not successfully deployed .Net 1.1 on the OS, so all our web applications must be migrated to Visual Studio 2005. We found that moving our web applications from VS 2003 to VS 2005 required some work, and we are still working through problems with the deployment projects for these applications. Our legacy Visual Basic clients flat out do not run in the Terminal Services environment.

Finally, we are confronting the code in the applications themselves. The products evolved from a client-server architecture, so the software requires an active user who performs key functions synchronously. We will move file operations and reporting to asynchronous classes. This frees up the UI and gives the user a responsive experience. But asynchronous execution does not make queries run faster. Improving query performance requires reviews of the execution plan, indexes, and indexed views.
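
As a rough sketch of the direction, with hypothetical class names: .Net 2.0's BackgroundWorker is one way to push a long-running report off the calling thread so the UI stays responsive.

    using System;
    using System.ComponentModel;
    using System.Threading;

    public class ReportData
    {
        public int RowCount;  // placeholder payload for the sketch
    }

    public class ReportRunner
    {
        private readonly BackgroundWorker worker = new BackgroundWorker();

        public ReportRunner()
        {
            worker.DoWork += delegate(object sender, DoWorkEventArgs e)
            {
                // Runs on a worker thread; the UI thread is not blocked.
                e.Result = ExecuteReportQuery((string)e.Argument);
            };
            worker.RunWorkerCompleted += delegate(object sender, RunWorkerCompletedEventArgs e)
            {
                // Raised when the work finishes; bind results to the UI here.
                ReportData data = (ReportData)e.Result;
                Console.WriteLine("Report ready: " + data.RowCount + " rows");
            };
        }

        public void RunAsync(string reportName)
        {
            worker.RunWorkerAsync(reportName);  // returns immediately
        }

        private static ReportData ExecuteReportQuery(string reportName)
        {
            Thread.Sleep(2000);  // stand-in for the real long-running query
            ReportData data = new ReportData();
            data.RowCount = 42;
            return data;
        }
    }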

It is tedious work, but it will pay off in greater revenue.
