IOTK Stats

September 30, 2009 by iotk

Updated check of where IOTK is:

  • 1,336 unit tests assure its quality
  • 14 sites are run by it
  • 5 more are in active development
  • 3 back end developers and 5 front end developers are using it
  • 4 teams have built products using it
  • 2 high availability firewall/load balancer configurations are deployed
  • The IOTK Firewall/Load Balancer is handling a maximum of over 10 Mb/sec
  • 4 platforms are currently running on it
  • It is being used for online video, real estate, and digital music
  • Facebook Connect, Sign in with Twitter, and Login with MySpaceID are integrated into IOTK

…and continuing to grow!

Testimonial

September 11, 2009 by iotk

From one of the developers using IOTK:

“New code generator is indeed kick ass, lots of saved time…”

Thank you! I put the hard work into it precisely for this reason. This developer has good perspective, too, because until now he has been developing against IOTK without the Code Generator. He can fully appreciate the decrease in development time that he’s experiencing now.

ioerr & ORA-00257

July 30, 2009 by iotk

iotk-explain-oracle-error.php is a facility that helps to provide substantive documentation for common Oracle errors that you might encounter while using IOTK. It was added to the framework because of the often abstract or unrelated messages given by Oracle’s error reporting facility oerr.

For example, if you receive an ORA-00257 error you can execute the following Oracle command:

oerr ora 257

oerr will produce the following documentation:

00257, 00000, "archiver error. Connect internal only, until freed."
// *Cause:  The archiver process received an error while trying to archive
//       a redo log.  If the problem is not resolved soon, the database
//       will stop executing transactions. The most likely cause of this
//       message is the destination device is out of space to store the
//       redo log file.
// *Action:  Check archiver trace file for a detailed description
//        of the problem. Also verify that the
//       device specified in the initialization parameter
//       ARCHIVE_LOG_DEST is set up properly for archiving.

Instead of executing oerr to get a description of the issue, you can execute IOTK’s ioerr using the same command line parameters passed to oerr:

ioerr ora 257

Will produce the following documentation:

ORA-257
-------

Oracle's (oerr ora 257) official description:
archiver error. Connect internal only, until freed.

PROBLEM DESCRIPTION
-------------------

Attempts to service requests by querying or making transactions against Oracle
produce the following error:

exception 'ut_Oracle_Error' with message '
==================================================
Array
(
     => 257
    [message] => ORA-00257: archiver error. Connect internal only, until freed.
    [offset] => 0
    [sqltext] =>
)
==================================================

As a result of this issue, Oracle is essentially useless for retrieving or
manipulating data.

RESOLUTION
----------

o Log into Oracle as sysdba by executing the following as the oracle user:

    sqlplus / as sysdba

o Execute this command at the SQL*Plus prompt:

    SQL> show parameter db_recovery

  (Note, "SQL>" is the SQL*Plus prompt.)

o You should see something like the following:

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    db_recovery_file_dest                string      /path/to/flash_recovery_area
    db_recovery_file_dest_size           big integer 2G

o Exit SQL*Plus and execute:

    df -k

o You need to determine if the path set for "db_recovery_file_dest" (in this
  case "/path/to/flash_recovery_area") is 100% full.  If so, Oracle can no
  longer write archive logs.  If the disk is full you need to free up space.

o If the disk is not full you most likely need to increase the amount of space
  Oracle is allowed to use for archive logs, defined by "db_recovery_file_dest
  _size".  In this case, the value is set to "2G" or 2 gigabytes.  Oracle might
  need more than 2 gigs for archive logs.  You need to increase this.

o Log into Oracle as sysdba by executing the following as the oracle user:

    sqlplus / as sysdba

o Execute the following:

    SQL> alter system set db_recovery_file_dest_size = 5g;

  (In this case, I'm increasing "db_recovery_file_dest_size" from 2 gigs to
   5 gigs thus giving Oracle 3 more gigs to use for archive logs.)

o When this completes successfully you should see:

    System altered.

o Next, execute the following:

    SQL> alter system switch logfile;

o When this completes successfully you should see:

    System altered.

The system should be recovered.

In total, documentation is provided for 21 errors ranging from issues with straightforward solutions (like this one) to more complex issues with solutions requiring code changes or upgrades of Oracle.

David Montero: Winner of the Daniel Pearl Award

July 15, 2009 by iotk

My brother found out recently that he was awarded the South Asian Journalist Association’s (SAJA) Daniel Pearl Award for outstanding reporting about South Asia for his documentary entitled “State of Emergency” about the Taliban’s rise in Pakistan. It is the highest award that the SAJA gives each year. Congratulations, David! We’re all extremely proud! And in case that wasn’t enough of an honor, Dave found out that the documentary was nominated for an Emmy Award. Check it out here:

The 30th Annual Emmy Awards for News & Documentary The Nominations

ioerr & ORA-03137

July 11, 2009 by iotk

iotk-explain-oracle-error.php is a facility that helps to provide substantive documentation for common Oracle errors that you might encounter while using IOTK. It was added to the framework because of the often abstract or unrelated messages given by Oracle’s error reporting facility oerr.

For example, if you receive an ORA-03137 error you can execute the following Oracle command:

oerr ora 3137

oerr will produce the following documentation:

03137, 00000, "TTC protocol internal error : [%s] [%s] [%s] [%s] [%s] [%s] [%s] [%s]"
// *Cause:  TTC protocol internal error.
// *Action: Contact Oracle Support Services.

The right solution for this problem may not require you to call Oracle Support Services. My experience with them is that getting resolution to an issue is a rarity and I would call a number of consulting firms I’ve worked with in the past before calling them. The solution might be quite simple though: disable statement caching.

Instead of executing oerr to get a description of the issue, you can execute IOTK’s ioerr using the same command line parameters passed to oerr:

ioerr ora 3137

Will produce the following documentation:

ORA-3137
--------

Oracle's (oerr ora 3137) official description:
TTC protocol internal error : [%s] [%s] [%s] [%s] [%s] [%s] [%s] [%s]

PROBLEM DESCRIPTION
-------------------

Occasionally you get this error when you attempt to execute queries.  The error
appears both in the PHP error log and in the Oracle alert log and looks like
this:

ORA-03137: TTC protocol internal error : [12333] [10] [84] [101] [] [] [] []

RESOLUTION
----------

Find out what the value is for the PHP configuration setting:

    oci8.statement_cache_size

You find this either by looking in the php.ini file, if you know where it is,
or by executing the following command on the command line:

    php -i | grep statement_cache_size

If the value for the statement_cache_size is not 0, change it to zero and
re-execute your tests/queries.

NOTES
-----

We encountered this error using the following software packages:

    Oracle 11.1.0.6.0
    Oracle Instant Client 11.1.0.1
    PHP 5.2.4
    OCI 1.3.5
    APC 3.0.14
    openSuSE 10.3

In total, documentation is provided for 19 errors ranging from issues with straightforward solutions (like this one) to more complex issues with solutions requiring code changes or upgrades of Oracle.

I will document all of the errors that I’ve tracked and documented while building and using IOTK. I suspect a few of my solutions will be helpful to people. Here are the specific errors that I have documented:

ORA-01000
An issue caused by a memory leak in specific versions of older PHP OCI 8 drivers. Proud to say I helped provide reproducible test cases for this one for Oracle.

ORA-01001
ORA-01031
ORA-01036
ORA-12154
ORA-12528
ORA-12541
ORA-01745
ORA-01861
ORA-02429
ORA-28002

ORA-03114
If accompanied by an ORA-07445 and core dump you might need to upgrade Oracle to 11g. This was a nasty one.

ORA-03137
ORA-00054
ORA-06550
ORA-06553
ORA-00904

OCI-22303
An issue that used to arise when calling oci_new_collection() to bind user defined types to stored procedures. This is essentially another memory leak in the PHP OCI 8 driver, one for which I also provided reproducible test cases to Oracle.

Apache Process Consuming All Memory

June 20, 2009 by iotk

My team and I recently encountered a difficult to debug issue that was causing one of our servers to crash.  I’m hoping that by writing about it here I might save someone the time we spent reproducing and ultimately resolving the issue, even if it was a bit obscure.

The server in question ran just fine for days until a single Apache process would start consuming all of the available memory.  Once the server’s memory was around 60-70% utilized by this Apache process, kswapd would kick in forcing all processes to swap to disk.  Eventually the machine became so slow that we would have to either restart Apache (if we could) or reboot the machine.  The interesting thing about this issue was that Apache did not crash nor did any part of the system log an issue – no core dumps, error log entries, or PHP warnings or errors.

By chance a project manager was testing on one of my developer’s local environments when his machine exhibited the exact same behavior as the production server.  This was interesting to us because it meant the issue had nothing to do with the operating system (one was RedHat Enterprise 5 the other openSuSE 10.3) and it meant that we could reproduce the issue on a machine other than one in production.

It took us 2 days to track the issue down.  We attempted to quickly resolve it by upgrading all of the software involved in our stack but nothing worked.  But this provided us with a clue – it had to be something in the code that we had written.  Indeed it was.

The following are the versions of software involved in our configuration:

  • RedHat Enterprise 5 or openSuSE 10.3 (both 64-bit)
  • Apache 2.2.3
  • PHP 5.2.4
  • OCI 1.3.2
  • Oracle 11.1.0.7.0
  • Oracle Instant Client 11.1.0.1
  • IOTK

The issue turned out to be a query with 7 cursors, something like this:

select [columns],
       cursor (select [columns]
                 from table_a) cursor_1,
       cursor (select [columns]
                 from table_b) cursor_2,
       cursor (select [columns]
                 from table_c) cursor_3,
       cursor (select [columns]
                 from table_d) cursor_4,
       cursor (select [columns]
                 from table_e) cursor_5,
       cursor (select [columns]
                 from table_f) cursor_6,
       cursor (select [columns]
                 from table_g) cursor_7
  from parent_table
 where id = :id

The parent table had a number of 1-to-many relationships and the display often called for all of the information to be displayed on the same screen. To avoid too many round trips, this is the solution the developer came to. The query was fetching what represented the core of the product, the asset the site was selling so it was used a lot. What’s interesting is that there was no rhyme or reason to why it crashed – it would run fine for some time and then just fall over.

By simply removing this query the problem was resolved. We chose to break this query into 3 and use lazy loading in the object that represented these entities. We created 1 query that contained 2 cursors only (2 cursors in a single query has never given us a problem), another query (lazy loaded) contained 4 queries joined using “union all”, and the last query (also lazy loaded) was a single query.

I’m curious to know how a single query can go on a tear like this and consume so much memory.

David Montero: A Death in Swat

May 25, 2009 by iotk

My brother, David Montero, is a journalist who has done numerous stories for Frontline/World. A little bit about his next segment:

In “A Death in Swat,” reporter David Montero investigates the mysterious murder of Musa Khan Khel, a Pakistani journalist who covered the army’s failed campaign against the Taliban in the Swat Valley and was killed not long after a peace deal was negotiated-with speculation pointing as much to the intelligence services, as to the Taliban, for responsibility.

Musa’s reporting generated a growing sense of outrage that the war in Swat was not going well. … People are still dying, and the Taliban are growing stronger,” Montero says. “For someone you know to have been killed, to end up dead in a field, it just shows you how hard this story is getting to report.” While the investigation into Musa’s death continues, Montero finds Musa’s younger brother has taken his place, reporting the latest chapter of the army’s anti-Taliban campaign in Swat for Pakistani TV.

The original press release is here:

A SPECIAL EDITION OF FRONTLINE/WORLD ON THE GROWING TALIBAN THREAT TO THE PAKISTANI STATE

It’s airing on television Tuesday May 26 at 9/10pm EST, his story is second in the hour.

Active & Passive Code Generation

May 21, 2009 by iotk

Code generation is extremely important to me and thus an integral part of IOTK.  As a single developer on a number of products operating in a start up environment, I need to focus my attention on the real challenges of a particular product.  Nowadays I find myself spending a lot more time focused on the UI and page integration since I’ve standardized most of how I write my back end code.

I’ve standardized it so much that the IOTK Code Generator can write (by way of rough estimation) half of the code that I would normally do by hand.  This gives me the freedom to focus on what’s different or special about a particular product over having to first recreate the basics of what appears in nearly all of the products I have built to date.

In order to accomplish this I had to make the Code Generator entirely active – the code that it produces can be recreated any time (and is with each build of the product), can be generated from both metadata and the Oracle data dictionary, and can be passively extended (using object orientation) so that the generated code is never directly modified.

In the last few hours I’ve read a number of articles discussing the differences between active and passive code generation.  I didn’t come across a single article or posting that pointed out really good reasons to build a passive system and, in a small number of cases, the authors were arguing that any code generation at all was a negative.  I certainly don’t agree with that but I do agree that a passive code generator is a deficiency.

Essentially the difference between active and passive code generation is that passively generated code is modified by the programmer and thus can never be re-generated.  Actively generated code, on the other hand, is never modified by a programmer and can be recreated at will with no impacts to any part of the system that relies on it.  My programmer brain says that determining which to build or use is a pretty straightforward decision – giving a framework and a computer the ability to maintain vast amounts of code over the life of a product is a huge benefit.  That hasn’t always been the case and for many years I was writing systems that generated “stubs” and large amounts of passively generated code.  I took a different course in IOTK and I don’t regret it so I’ve walked away with 2 conclusions:

  • Absolutely use a code generator!  Let your computer and framework help you enforce its standards by writing as much of your code for you as possible.  More importantly, since we’ll spend more time maintaining a system than building it, let your computer maintain as much of the code as possible as well.
  • If you want to move fast and have a high quality, consistent product, don’t use passive code generation.

The “base” level entities (classes, functions, PL/SQL packages, unit tests, etc.) are what the Code Generator will create in IOTK.  Where a programmer would need to extend the functionality of one these entities, a class is used so the base class can literally be extended.  For example, IOTK will create a base level ut_Data_Mapper for the tables you request.  One such ut_Data_Mapper might be called ex_Base_Employee.  Perhaps you want to create a function that looks up and loads the ut_Data_Mapper by social security number.  You would create the following:

//
// +-------------+
// | ex_Employee |
// +-------------+
//

class ex_Employee
extends ex_Base_Employee
{
    static function make()
    {
        return new self();
    }

    // +----------------+
    // | Public Methods |
    // +----------------+

    final public function lookup_by_social_security_num($ssn)
    {
        $this->import(ut_Record_Set::make(
                             __METHOD__,
                             "select " . $this->get_select_cols() . "
                                from ex_employee
                               where social_security_num = :ssn")
                             ->bind(":ssn", $ssn)
                             ->fetch_row_hash()
                             ->one());

        return $this;
    }
}

In this case I’m adding something to the base level object to accomplish my requirement. I find, though, that often I’m removing things from what the Code Generator has created. In essence it is providing the rule and I’m coding the exceptions – IOTK might create an ut_Data_Mapper object with all 8 columns of a table but my form only requires 4 so I will write code to remove the 4 that aren’t used.

ex_Base_Employee is 70 lines of code (not including the unit tests generated for it, amounting to hundreds of lines of code) and it sits on top of the ut_Data_Mapper class which is hundred or thousands of lines of code. For the entire system to work as I need it to, my effort was 32 lines of code (I’m including blank lines, comments, etc.). Writing 32 lines of code and having the system manage hundreds to thousands clearly illustrates the benefit of the Code Generator.

You can download the Code Generator documentation here.

IOTK Stats

May 20, 2009 by iotk

Quick check of where IOTK is:

  • 1,227 unit tests assure its quality
  • 9 Web sites are run by it
  • 4 more are in active development
  • 3 back end developers and 3 front developers are using it
  • 3 teams located in New York, California, and Montreal have built products using it
  • 1 high availability firewall/load balancer configuration is deployed (using solid state disks)
  • 7 firewalls are managed by it
  • 2 platforms are currently live on it
  • It is being used for both online video and real estate
  • 1 hosting service is using it to manage DNS
  • The Schema Diff Tool has been executed successfully 82 times

…and growing!

Discovery & Unit Tests

May 16, 2009 by iotk

I wanted to pass along my advice for two situations I encounter pretty often that I have learned are big mistakes in project management. The first has to do with the process of discovery. Often when we’re planning our tasks for an iteration, someone will say something like, “I am not quite sure how to do that task. I’ll need some discovery time.” When the project manager asks how much they’ll need, their response is something like, “Give me 2 days.”

I’ve worked on some complex projects and some large ones but I have never seen anyone need days to figure out how to move forward with a particular task. You might need two days to learn how to do something or experiment with a potential solution but just to figure out what to do next, days is wrong. This time is usually wasted and I would imagine most people use discovery the same way I do: I perform a number of Google searches to figure out what is available to solve the task at hand or what other people have written about it.  It’s rare that I don’t find some information that leads me in a particular direction.  I might end up with 3 or 4 ideas – I’ve done the discovery by than – now I need to move forward with figuring out which of the solutions is right and what implementation is best.

I’m also a little weary of “discovery time” as something that is industry speak for spending time on unnecessary projects or unfocused work (the way “refactoring” is code for going back and doing things the right way because I cut corners the first time).  If you don’t know what to do or where to go next you need to figure that out quickly and start figuring out which direction to move in.  There is only so much planning that can be done before you need to start learning the real lessons of the project or task simply by doing what you need to do.

My rule around discovery time is this: you get 30 minutes.  After 30 minutes you need to report to the team 1 of 2 things:

  • What you found, what directions you are going to investigate, and how much time you think you need for that.
  • That you haven’t found anything yet and need another 30 minutes.

The importance of the second part is that you might be in a stand up meeting discussing someone’s third 30 minutes.  The person might clearly need help and people can make suggestions, “Did you search for this?”, “Did you ask so and so?”, “How about reading this paper or talking to this person.”  By the fifth 30 minutes you have to start to wonder about the task – can it be done?  Can it be done in a timely manner?  Is the person on the task the right person if they are struggling with finding a direction?  Perhaps the task is complex enough that it needs more resources?  Perhaps it’s altogether wrong?

I’ve used the 30 minute rule for a while now and I can’t think of a situation where a person needed more than 2 blocks of 30 minutes before they were prepared to offer a direction and an estimate for it.  The other aspect of this that I think is important is that it communicates to the person doing the discovery that they have to concentrate on finding a solution because they are going to be in front of their peers explaining what they did.  They have to be thorough because they are going to explain to others their process and if they are talking to people that understand their specific job function it’s going to be difficult to fake it.

The second situation that I have found difficult to explain to literally everyone (especially developers that have not written unit tests before) usually sounds something like this: “Well, I’m short on time so I’m going to stop unit testing so I can hit my deadline.”  Unit tests definitely can feel like “extra work” when a deadline is looming and you’ve got a lot of code to write.  But the mid-to-long term ramifications of not having unit tests are even slower development and much worse quality.

I usually get very interested in a project when someone tells me they have to cut out unit testing.  This leads me to draw a few conclusions:

  • The project is very under-staffed (this is the rare case).
  • The developers in question don’t know how to write unit tests efficiently (sometimes the case).
  • It’s crunch time so all of the slacking that was done in the beginning of the project is starting to show (the dominant case).

Interestingly enough, no developer I’ve worked with has ever debated the need for unit tests – everyone knows they are critical.  But it is very easy to abandon writing them when the pressure is on.

Non-technical people generally understand testing is good for quality. But it’s difficult to explain that unit tests also help increase speed because developers can make changes and determine if their changes work by running all of the unit tests including the new ones they’ve written. Paying clients are often the most interested in unit testing because it sounds like “extra” work for which they have to pay. Sadly, I have had a number of people ask me if I thought producing high quality software was “just over kill.” I want to address that but in a posting for another time.