Archive for May, 2009

David Montero: A Death in Swat

May 25, 2009

My brother, David Montero, is a journalist who has done numerous stories for Frontline/World. A little bit about his next segment:

In “A Death in Swat,” reporter David Montero investigates the mysterious murder of Musa Khan Khel, a Pakistani journalist who covered the army’s failed campaign against the Taliban in the Swat Valley and was killed not long after a peace deal was negotiated, with speculation pointing as much to the intelligence services as to the Taliban for responsibility.

“Musa’s reporting generated a growing sense of outrage that the war in Swat was not going well. … People are still dying, and the Taliban are growing stronger,” Montero says. “For someone you know to have been killed, to end up dead in a field, it just shows you how hard this story is getting to report.” While the investigation into Musa’s death continues, Montero finds Musa’s younger brother has taken his place, reporting the latest chapter of the army’s anti-Taliban campaign in Swat for Pakistani TV.

The original press release is here:

A SPECIAL EDITION OF FRONTLINE/WORLD ON THE GROWING TALIBAN THREAT TO THE PAKISTANI STATE

It airs on television Tuesday, May 26 at 9/10pm EST; his story is second in the hour.

Active & Passive Code Generation

May 21, 2009

Code generation is extremely important to me and thus an integral part of IOTK.  As a single developer on a number of products operating in a startup environment, I need to focus my attention on the real challenges of a particular product.  Nowadays I find myself spending a lot more time focused on the UI and page integration since I’ve standardized most of how I write my back end code.

I’ve standardized it so much that the IOTK Code Generator can write (by way of rough estimation) half of the code that I would normally do by hand.  This gives me the freedom to focus on what’s different or special about a particular product over having to first recreate the basics of what appears in nearly all of the products I have built to date.

In order to accomplish this I had to make the Code Generator entirely active – the code that it produces can be recreated any time (and is with each build of the product), can be generated from both metadata and the Oracle data dictionary, and can be passively extended (using object orientation) so that the generated code is never directly modified.

In the last few hours I’ve read a number of articles discussing the differences between active and passive code generation.  I didn’t come across a single article or posting that pointed out really good reasons to build a passive system and, in a small number of cases, the authors were arguing that any code generation at all was a negative.  I certainly don’t agree with that but I do agree that a passive code generator is a deficiency.

Essentially the difference between active and passive code generation is that passively generated code is modified by the programmer and thus can never be re-generated.  Actively generated code, on the other hand, is never modified by a programmer and can be recreated at will with no impacts to any part of the system that relies on it.  My programmer brain says that determining which to build or use is a pretty straightforward decision – giving a framework and a computer the ability to maintain vast amounts of code over the life of a product is a huge benefit.  That hasn’t always been the case and for many years I was writing systems that generated “stubs” and large amounts of passively generated code.  I took a different course in IOTK and I don’t regret it so I’ve walked away with 2 conclusions:

  • Absolutely use a code generator!  Let your computer and framework help you enforce its standards by writing as much of your code for you as possible.  More importantly, since we’ll spend more time maintaining a system than building it, let your computer maintain as much of the code as possible as well.
  • If you want to move fast and have a high quality, consistent product, don’t use passive code generation.
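To make the distinction concrete, here is a minimal sketch of active generation. It is not IOTK's generator: the function generate_base_class(), the column names, and the use of a temporary file are all assumptions made for illustration. The point it shows is that the generated file is a pure function of the metadata, so it can be rewritten on every build, while the hand-written class that extends it is never touched.

```php
<?php
// Hypothetical sketch of active code generation; not IOTK's actual generator.

/**
 * Build the source of a generated base class from table metadata.
 * The output depends only on its inputs, so it can be recreated on
 * every build and never needs to be edited by hand.
 */
function generate_base_class(string $table, array $columns): string
{
    $class = 'ex_Base_' . ucfirst($table);
    $list  = implode(', ', array_map(function ($c) {
        return "'" . $c . "'";
    }, $columns));

    return "class {$class}\n"
         . "{\n"
         . "    public function get_columns()\n"
         . "    {\n"
         . "        return array({$list});\n"
         . "    }\n"
         . "}\n";
}

// "Build" step: rewrite the generated file from metadata, then load it.
$path = sys_get_temp_dir() . '/ex_Base_Employee.php';
file_put_contents(
    $path,
    "<?php\n" . generate_base_class('employee', array('employee_id', 'name', 'social_security_num'))
);
require $path;

// Hand-written extension: it would live in its own file and survive every rebuild.
class ex_Employee extends ex_Base_Employee
{
    public function count_columns()
    {
        return count($this->get_columns());
    }
}

$employee = new ex_Employee();
echo $employee->count_columns(), "\n"; // prints 3
```

Adding a column to the metadata changes only the generated file; ex_Employee picks up the new column without being edited.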

The “base” level entities (classes, functions, PL/SQL packages, unit tests, etc.) are what the Code Generator will create in IOTK.  Where a programmer would need to extend the functionality of one of these entities, a class is used so the base class can literally be extended.  For example, IOTK will create a base level ut_Data_Mapper for the tables you request.  One such ut_Data_Mapper might be called ex_Base_Employee.  Perhaps you want to create a function that looks up and loads the ut_Data_Mapper by social security number.  You would create the following:

//
// +-------------+
// | ex_Employee |
// +-------------+
//

class ex_Employee
extends ex_Base_Employee
{
    static function make()
    {
        return new self();
    }

    // +----------------+
    // | Public Methods |
    // +----------------+

    final public function lookup_by_social_security_num($ssn)
    {
        $this->import(ut_Record_Set::make(
                             __METHOD__,
                             "select " . $this->get_select_cols() . "
                                from ex_employee
                               where social_security_num = :ssn")
                             ->bind(":ssn", $ssn)
                             ->fetch_row_hash()
                             ->one());

        return $this;
    }
}

In this case I’m adding something to the base level object to accomplish my requirement. I find, though, that often I’m removing things from what the Code Generator has created. In essence it is providing the rule and I’m coding the exceptions – IOTK might create a ut_Data_Mapper object with all 8 columns of a table but my form only requires 4 so I will write code to remove the 4 that aren’t used.
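The "coding the exceptions" case might look something like the sketch below. The base class here is only a stand-in: the real ex_Base_Employee is generated by IOTK and its API is not shown in this post, so the $columns property, get_columns(), the column names, and the ex_Employee_Form class are all assumptions.

```php
<?php
// Stand-in for a generated class. The real ex_Base_Employee is produced by
// IOTK; this property and method are assumed for the sake of the example.
class ex_Base_Employee
{
    protected $columns = array(
        'employee_id', 'first_name', 'last_name', 'email',
        'social_security_num', 'hire_date', 'salary', 'manager_id',
    );

    public function get_columns()
    {
        return $this->columns;
    }
}

// Hand-written exception to the generated rule: this form needs only 4 of
// the 8 columns, so the other 4 are removed in the extending class.
class ex_Employee_Form extends ex_Base_Employee
{
    public function __construct()
    {
        $unused = array('social_security_num', 'hire_date', 'salary', 'manager_id');
        $this->columns = array_values(array_diff($this->columns, $unused));
    }
}

$form = new ex_Employee_Form();
echo implode(', ', $form->get_columns()), "\n";
// prints: employee_id, first_name, last_name, email
```

Because the removal lives in the extending class, regenerating the base class never undoes it.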

ex_Base_Employee is 70 lines of code (not including the unit tests generated for it, amounting to hundreds of lines of code) and it sits on top of the ut_Data_Mapper class which is hundreds or thousands of lines of code. For the entire system to work as I need it to, my effort was 32 lines of code (I’m including blank lines, comments, etc.). Writing 32 lines of code and having the system manage hundreds to thousands clearly illustrates the benefit of the Code Generator.

You can download the Code Generator documentation here.

IOTK Stats

May 20, 2009

Quick check of where IOTK is:

  • 1,227 unit tests assure its quality
  • 9 Web sites are run by it
  • 4 more are in active development
  • 3 back end developers and 3 front end developers are using it
  • 3 teams located in New York, California, and Montreal have built products using it
  • 1 high availability firewall/load balancer configuration is deployed (using solid state disks)
  • 7 firewalls are managed by it
  • 2 platforms are currently live on it
  • It is being used for both online video and real estate
  • 1 hosting service is using it to manage DNS
  • The Schema Diff Tool has been executed successfully 82 times

…and growing!

Discovery & Unit Tests

May 16, 2009

I wanted to pass along my advice for two situations I encounter pretty often that I have learned are big mistakes in project management. The first has to do with the process of discovery. Often when we’re planning our tasks for an iteration, someone will say something like, “I am not quite sure how to do that task. I’ll need some discovery time.” When the project manager asks how much they’ll need, their response is something like, “Give me 2 days.”

I’ve worked on some complex projects and some large ones but I have never seen anyone need days to figure out how to move forward with a particular task. You might need two days to learn how to do something or experiment with a potential solution but just to figure out what to do next, days is wrong. This time is usually wasted and I would imagine most people use discovery the same way I do: I perform a number of Google searches to figure out what is available to solve the task at hand or what other people have written about it.  It’s rare that I don’t find some information that leads me in a particular direction.  I might end up with 3 or 4 ideas – I’ve done the discovery by then – now I need to move forward with figuring out which of the solutions is right and what implementation is best.

I’m also a little wary of “discovery time” as something that is industry speak for spending time on unnecessary projects or unfocused work (the way “refactoring” is code for going back and doing things the right way because I cut corners the first time).  If you don’t know what to do or where to go next you need to figure that out quickly and start figuring out which direction to move in.  There is only so much planning that can be done before you need to start learning the real lessons of the project or task simply by doing what you need to do.

My rule around discovery time is this: you get 30 minutes.  After 30 minutes you need to report to the team 1 of 2 things:

  • What you found, what directions you are going to investigate, and how much time you think you need for that.
  • That you haven’t found anything yet and need another 30 minutes.

The importance of the second part is that you might be in a stand up meeting discussing someone’s third 30 minutes.  The person might clearly need help and people can make suggestions: “Did you search for this?”, “Did you ask so and so?”, “How about reading this paper or talking to this person?”  By the fifth 30 minutes you have to start to wonder about the task – can it be done?  Can it be done in a timely manner?  Is the person on the task the right person if they are struggling with finding a direction?  Perhaps the task is complex enough that it needs more resources?  Perhaps it’s altogether wrong?

I’ve used the 30 minute rule for a while now and I can’t think of a situation where a person needed more than 2 blocks of 30 minutes before they were prepared to offer a direction and an estimate for it.  The other aspect of this that I think is important is that it communicates to the person doing the discovery that they have to concentrate on finding a solution because they are going to be in front of their peers explaining what they did.  They have to be thorough because they are going to explain to others their process and if they are talking to people that understand their specific job function it’s going to be difficult to fake it.

The second situation that I have found difficult to explain to literally everyone (especially developers that have not written unit tests before) usually sounds something like this: “Well, I’m short on time so I’m going to stop unit testing so I can hit my deadline.”  Unit tests definitely can feel like “extra work” when a deadline is looming and you’ve got a lot of code to write.  But the mid-to-long term ramifications of not having unit tests are even slower development and much worse quality.

I usually get very interested in a project when someone tells me they have to cut out unit testing, because it leads me to one of a few conclusions:

  • The project is very under-staffed (this is the rare case).
  • The developers in question don’t know how to write unit tests efficiently (sometimes the case).
  • It’s crunch time so all of the slacking that was done in the beginning of the project is starting to show (the dominant case).

Interestingly enough, no developer I’ve worked with has ever debated the need for unit tests – everyone knows they are critical.  But it is very easy to abandon writing them when the pressure is on.

Non-technical people generally understand testing is good for quality. But it’s difficult to explain that unit tests also help increase speed because developers can make changes and determine if their changes work by running all of the unit tests including the new ones they’ve written. Paying clients are often the most interested in unit testing because it sounds like “extra” work for which they have to pay. Sadly, I have had a number of people ask me if I thought producing high quality software was “just overkill.” I want to address that, but in a posting for another time.

ioerr & ORA-04068

May 12, 2009

iotk-explain-oracle-error.php is a facility that helps to provide substantive documentation for common Oracle errors that you might encounter while using IOTK.  It was added to the framework because of the often abstract or unrelated messages given by Oracle’s error reporting facility oerr.

For example, if you receive an ORA-04068 error you can execute the following Oracle command:

oerr ora 4068

oerr will produce the following documentation:

04068, 00000, "existing state of packages%s%s%s has been discarded"
// *Cause:  One of errors 4060 - 4067 when attempt to execute a stored
//          procedure.
// *Action: Try again after proper re-initialization of any application's
//          state.

For someone new to IOTK, Oracle, or this error it might take some time to figure out exactly what to do. But after using IOTK for a while, the solution is quite simple: restart Apache.

Instead of executing oerr to get a description of the issue, you can execute IOTK’s ioerr using the same command line parameters passed to oerr:

ioerr ora 4068

ioerr will produce the following documentation:

ORA-4068
--------

Oracle's (oerr ora 4068) official description:
existing state of packages%s%s%s has been discarded

PROBLEM DESCRIPTION
-------------------

This issue occurs when interacting with stored procedures through a Web
browser.  You will receive this error after hitting submit on a form.  Prior
to receiving this error you either rebuilt your application or a part of it
that included reloading PL/SQL.

This error is occurring because Oracle has determined that the cached version
of your stored procedure code no longer matches changes to the database.  In
essence the application is out of sync with the database.

Here is what Oracle will report:

    exception 'ut_Oracle_Error' with message '
    ==================================================
    Array
    (
        [code] => 4068
        [message] => ORA-04068: existing state of packages has been discarded
    ORA-04061: existing state of package "IOTK.CO_PKG" has been invalidated
    ORA-04065: not executed, altered or dropped package "IOTK.CO_PKG"
    ORA-06508: PL/SQL: could not find program unit being called: "IOTK.CO_PKG"
    ORA-06512: at line 3
        [offset] => 0
        [sqltext] => [Full text of query that was being executed]
        [p_id] => inout
        [p_local_time] => 1241799814
        [p_name] => IOTK
    )
    ==================================================

RESOLUTION
----------

Restart your Web server and than go back to your browser and hit the reload
button.

Restarting your Web server will cause Oracle to re-cache the stored procedure
code.  Reloading the page in your browser will cause your browser to re-post
the form.

NOTES
-----

It is common to get this error after rebuilding your application or performing
a push to production.  To ensure this issue does not occur, always restart your
Web server after either of these occurrences.

In total, documentation is provided for 18 errors ranging from issues with straightforward solutions (like this one) to more complex issues with solutions requiring code changes or upgrades of Oracle.
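The core of a tool like ioerr can be pictured as a lookup from a normalized error key to a block of plain-English text. The sketch below is an assumption about how such a lookup could be written, not the actual iotk-explain-oracle-error.php; the function names and the sample text are invented for illustration.

```php
<?php
// Hypothetical sketch of an ioerr-style lookup; not the actual IOTK script.

/** Normalize a facility and number (e.g. "ora", "4068") to a key like "ORA-04068". */
function error_key(string $facility, string $number): string
{
    return sprintf('%s-%05d', strtoupper($facility), (int) $number);
}

/** Return the plain-English explanation for an error, or null if none is documented. */
function explain_error(array $docs, string $facility, string $number): ?string
{
    $key = error_key($facility, $number);

    return $docs[$key] ?? null;
}

// One sample entry; a real library would hold one block of text per error.
$docs = array(
    'ORA-04068' => 'Restart your Web server, then reload the page in your browser.',
);

// Same arguments as "oerr ora 4068".
$text = explain_error($docs, 'ora', '4068');
echo $text ?? 'No documentation found for this error; try oerr instead.', "\n";
```

Normalizing the key first means "ora 4068" and "ORA 04068" resolve to the same entry, matching how oerr accepts its arguments.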

I will write about all of the errors that I’ve tracked and documented while building and using IOTK. I suspect a few of my solutions will be helpful to people. Here are the specific errors that I have documented:

ORA-01000
An issue caused by a memory leak in specific versions of older PHP OCI 8 drivers. Proud to say I helped provide reproducible test cases for this one for Oracle.

ORA-01001
ORA-01031
ORA-01036
ORA-12154
ORA-12528
ORA-12541
ORA-01745
ORA-01861
ORA-02429
ORA-28002

ORA-03114
If accompanied by an ORA-07445 and core dump you might need to upgrade Oracle to 11g. This was a nasty one.

ORA-00054
ORA-06550
ORA-06553
ORA-00904

OCI-22303
An issue that used to arise when calling oci_new_collection() to bind user defined types to stored procedures. This is essentially another memory leak in the PHP OCI 8 driver, one for which I also provided reproducible test cases to Oracle.

Oracle Schema Diff Tool

May 11, 2009

Built into IOTK is an Oracle schema difference tool. It takes a source Oracle schema and a target Oracle schema and writes the exact SQL commands necessary to modify the target so that it is identical to the source. Presently it can detect differences in:

  • Reference tables
  • Reference data
  • Tables
  • Indexes
  • Sequences
  • Functions
  • Procedures
  • Packages
  • Types
  • Columns
  • Constraints

I have found that this type of tool is invaluable for increasing the speed of development while maintaining a high level of quality and consistency.  All too often, modifications to the database of an application are tracked by a single person or each developer is required to record their changes so they can manually be applied to QA or production later.  People make mistakes or, under the pressure of a forthcoming deadline, forget to record things accurately if at all.  I advised a group that told me that their “database schema diff tool” was an Excel spreadsheet that was “checked out” and modified with DDL statements to be executed at the next release.  Their check out process was a meeting in which the person that wanted to modify the spreadsheet verified that no one else was either using it or had left it open on their computer.

A developer whom I respect highly once told me that computers are great at exactly the types of operations that make IOTK’s schema difference tool possible.  I would not have been able to imagine the amount of stress and anxiety such a tool can relieve a team of until I found that I could never build a product without one.  That is the reason I wrote IOTK’s version.

Before I give more details on the schema diff tool, let me explain how I typically do releases.  Everything I work on is managed by the Subversion revision control system.  The main line of my code is called “trunk” and it’s the active copy of the software against which all of the developers involved in the project are actively developing.  The team will develop a product or a set of features for a pre-determined amount of time, called a release cycle.  During the coding of the release, the trunk repository is very active with updates and additions.

At the end of the release, a copy of the code is created by what is called branching.  A branch is a free standing repository that is not affected by changes made to trunk.  Branching essentially creates a stable version of the software because ad hoc modifications to it are frozen.  The branch is then pushed to a QA environment where thorough testing is done.  In the event a defect is found, trunk is modified to correct the problem and the changes are patched to the release branch.  This guarantees that the fix is present both in the active line of development (and thus future releases) and the current (soon to be production) branch.

Once QA has been completed you know that what is on the QA environment is exactly what you want in production.  Because QA has no legacy value, it can always be thrown away and rebuilt from scratch guaranteeing that how it operates is the new way in which you want production to operate.  In essence you have the environment that is currently live and the one you want to be live.  You just need to know what’s different between them.

I achieve this in two ways.  First I use rsync to determine the difference in the code files and I have rsync transmit only those differences to the production environment.  Once I run my “sync scripts” I know that the production environment contains exactly the code and logic I want it to.  However, it might not have the proper database structures to operate against.  That’s what the schema difference tool is for.  It will literally tell me exactly what SQL commands to execute to get production to look just like QA.

Below is what the schema difference tool looks like when you run it:

nobody@qa:/tmp$ iotk-schema-diff.php source/p@inst target/p@inst

// +--------------------------------------------------------------------------+
// | Executing Schema Diff                                                    |
// +--------------------------------------------------------------------------+

+ The source schema is source/p@inst.
+ The target schema is target/p@inst.
+ Upgrade scripts will be written to:
    ./schema_diff

+ Would you like to proceed? [y|n]

Pressing “y” will start the process. I have used the tool extensively on small to medium sized schemas and it will typically finish in 1-2 minutes. When it is done, the directory ./schema_diff will contain the following files:

10-start.sql       30-columns.sql      50-functions.sql   70-ref_tables.sql
15-ref_tables.sql  35-constraints.sql  55-procedures.sql  75-finish.sql
20-tables.sql      40-ref_data.sql     60-indexes.sql
25-types.sql       45-sequences.sql    65-packages.sql

You can safely execute each of these files in SQL*Plus on the target schema. You execute these files by following the numbers: execute 10-start.sql first and 75-finish.sql last. Once I have completed doing this I like to take two last steps:
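If you would rather not type each file name by hand, the ordering rule is simple enough to script. The helper below is hypothetical and not part of IOTK; it sorts a few of the file names by their numeric prefix and prints one "@file" line per script, which is the SQL*Plus syntax for running a script file.

```php
<?php
// Hypothetical helper; the post runs the files by hand in SQL*Plus.

/** Order upgrade scripts by numeric prefix (10-start.sql first, 75-finish.sql last). */
function order_upgrade_scripts(array $files): array
{
    usort($files, function ($a, $b) {
        // Casting "75-finish.sql" to int yields its leading number, 75.
        return (int) $a <=> (int) $b;
    });

    return $files;
}

// A few of the generated files, deliberately out of order.
$files = array('75-finish.sql', '20-tables.sql', '10-start.sql', '35-constraints.sql');

// Emit a master script; "@file" tells SQL*Plus to run that file.
foreach (order_upgrade_scripts($files) as $file) {
    echo '@', $file, "\n";
}
```

The output can be saved as a single script and run once against the target schema.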

  1. I add the above files to the branch for archiving.  If there are issues later on with a release I can look into whether or not the issue was caused by something executed from these files.  I can also use these files to manually roll back the database (if possible).  To date I have never had to do that.
  2. I execute the schema diff tool again to make sure that it produces no output.  This means the source and target are the same.

Building this type of tool is surprisingly easy, though time consuming, due to the Oracle Data Dictionary.  Since it is a relational model describing a database’s structure, it is easy enough to execute SQL statements against it to determine differences or to pull data back into an application to compute differences.
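As a rough illustration of the "pull data back into an application to compute differences" approach, the sketch below compares the columns of two schemas. It is not IOTK's implementation: the diff_columns() function and the sample tables are assumptions, and the arrays stand in for rows that could be read from a data dictionary view such as USER_TAB_COLUMNS. Only added and dropped columns are handled; type changes and missing tables are left out.

```php
<?php
// Hypothetical sketch of a column comparison; not IOTK's implementation.
// Each schema is an array of table => (column => type), standing in for rows
// that could be loaded from Oracle's USER_TAB_COLUMNS view.

/** Return the SQL needed to make the target's columns match the source's. */
function diff_columns(array $source, array $target): array
{
    $sql = array();

    foreach ($source as $table => $columns) {
        if (!isset($target[$table])) {
            continue; // a missing table belongs to the table-level diff, not this one
        }

        // Columns in the source but not in the target must be added.
        foreach (array_diff_key($columns, $target[$table]) as $name => $type) {
            $sql[] = "ALTER TABLE {$table} ADD ({$name} {$type});";
        }

        // Columns in the target but not in the source must be dropped.
        foreach (array_diff_key($target[$table], $columns) as $name => $type) {
            $sql[] = "ALTER TABLE {$table} DROP COLUMN {$name};";
        }
    }

    return $sql;
}

$source = array('EMPLOYEE' => array('EMPLOYEE_ID' => 'NUMBER', 'EMAIL' => 'VARCHAR2(255)'));
$target = array('EMPLOYEE' => array('EMPLOYEE_ID' => 'NUMBER', 'FAX' => 'VARCHAR2(20)'));

echo implode("\n", diff_columns($source, $target)), "\n";
// prints:
// ALTER TABLE EMPLOYEE ADD (EMAIL VARCHAR2(255));
// ALTER TABLE EMPLOYEE DROP COLUMN FAX;
```

Running the same comparison after applying the output should return an empty array, which mirrors the "run it again and expect no output" check described above.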

The time I spent on it, though, was worth it.  I can develop my applications without being burdened with having to remember what changes I made or how I am going to push to production.  I do my thing and let the servers sort it out.

Part of my development of the tool was also building the unit testing framework around it.  I literally build two schemas as part of the unit tests and load a source with known objects and a target with some of the same objects and some different ones.  The similarities and differences are enough to test every part of the schema difference code.  The unit tests then execute the schema diff tool’s processes to ensure the resultant upgrade scripts are correct.

Current IOTK Features (May 2009)

May 1, 2009

I thought it might be interesting to give a brief overview of what features are available in IOTK right now. Since I am continuously developing, more will be added and I will attempt to document new features like this from time to time.

API
The IOTK style of programming separates the “back end” PHP API from the HTML, user interface layer, etc. As a result, the data returned from your API functions can be delivered in any format. With minimal effort you can publish an API over HTTP, JSON, REST, XML, etc.

Amazon Web Services
Integration with Simple Storage Service (S3) for things like CDN delivery or file backups.

Bin
A script repository that contains a large number of scripts for managing your code releases, for providing information about what an Oracle database is doing, and an Oracle error library that explains common Oracle errors in plain English and provides the IOTK standard fix for an issue.

Build
A structured and automated build system.

CLI
A command line interface for making script writing and back end job creation easier. One of its main goals is to allow for command line code and functions to be placed inside of your API so that it can be properly unit tested. This system handles things like PID locking, output logging, and providing a mechanism for logging messages and errors to a monitored database.

Code Generator
You build your data structures (SQL) and the Code Generator will handle writing most of the PHP and PL/SQL for you. This is not “stub” code – you can rerun the Code Generator as often as you want and it will not affect the code it created in the past. Furthermore, the system is entirely object oriented so you can expand the Code Generator’s functionality without having to touch what it creates.

Comments
Data structures and code for putting comments into your applications. This feature has been tested for 2 million users commenting on 1 million entities generating 6 million comments. It scales.

Contacts
Contact (address book) management including integration with a third party service for scraping almost all modern email services. The “tell a friend” feature can potentially generate a lot of leads if people use the importer to bring in their entire address book from their email account. This feature has been tested for 3 million entities (users, organizations, address books, etc.) generating 8 million contacts. It scales.

Content
A system for storing, searching, and retrieving large blocks of text content.

DataStore
A generalized file management system that has specialized support for images and video.

Dictionary
Localization support for text.

DNS
A zone file management and compilation system designed for managing a large number of domains and guaranteeing the quality and accuracy of the individual DNS entries.

Documentation
Over 55 documents made up of hundreds of slides that cover best practices, coding standards, scalability issues, product development process, release process, and technical training. In addition a full example API is provided that is written in IOTK standard code that covers the entire system. This code is heavily commented and is provided for more senior developers that just want code references.

Encryption
A simple library for handling encryption of data and armored tokens to make data exchange extremely safe.

Event Log
Typical “feed” functionality common on social sites today. It can also be used to track any type of event within the system you are building.

Examples
A full example API that implements every part of IOTK using standardized IOTK code that is heavily documented.

Form
An HTML form abstraction layer that allows for the programmatic creation of forms. This system also handles all of the tasks involved in form processing and can handle more complicated types like dates, file uploads, etc.

FW
A high availability firewall/load balancer implementation that sits on top of iptables. I am going to write more about this particular feature soon but high availability load balancers are essential for excellent uptime and being able to scale requests over many servers. There is, however, another interesting reason I built this which has to do with SPAM prevention challenges that many social sites face today.

Image
A robust image manipulation and delivery system. Put in front of a content delivery network (CDN) this system can save money and time by storing a single image in the file system while being able to deliver it, on the fly, in any orientation. This system sits on top of the ImageMagick graphics library.

Location
Location and geo based data structures and functionality.

Mail Merge
The ability to associate large email lists with email message templates. The templates can be customized based on profile data from the email list.

Oracle
Full integration with Oracle. This has been heavily tested against their amazing 11.1.0.7.0 database.

PDF
A library for converting HTML documents to PDF documents.

Rate
Allows users to rate entities in your system like restaurants or other users. This feature has been tested for 2 million users rating 1 million distinct entities generating 4 million ratings. It scales.

Scheduler
A system for programmatically creating crontab entries from within your API. Eases system administration and back end job processing.

Schema Diff Tool
Compares two Oracle databases and writes the exact SQL scripts required to upgrade the target Oracle database instance so that it is functionally compatible with the source. Extremely powerful tool that makes code roll outs take a tenth of the time and raises the quality on all releases. Using IOTK there is no reason for a person to track what database changes they are making. The system is designed to allow computers to figure it out and inform you what steps to take to get your database up to the latest release.

SEO
Both an API and documentation around good search engine optimization (SEO) practices. A few of the important ones are built into the IOTK framework (like auto-generation of a sitemap.gz file).

Text Search
The ability to add keyword searching to existing tables and LOB based columns.

Tiny URL
Code for shortening URLs so that you can email links without worrying about them breaking lines. More relevant today, it creates links that won’t eat up your 140 characters per Twitter status update.

Unit Test
A unit testing framework. Presently IOTK is covered by over 1,200 tests.

User Interface
A host of functionality for standardizing AJAX interactions, enforcing HTML form rendering and processing, enforcing navigation and link creation (for SEO purposes), and for providing a general request handling mechanism that is entirely SEO driven and separates URLs from their mechanism of delivery.

Version
A feature management system for ensuring backwards compatibility of your API while allowing you to upgrade libraries and code.

Video
A full video manipulation and transcoding library that sits on top of ffmpeg and mencoder.

XML
An XML processing and creation library.