Archive for the ‘IOTK’ Category

IOTK Stats

September 30, 2009

Updated check of where IOTK is:

  • 1,336 unit tests assure its quality
  • 14 sites are run by it
  • 5 more are in active development
  • 3 back end developers and 5 front end developers are using it
  • 4 teams have built products using it
  • 2 high availability firewall/load balancer configurations are deployed
  • The IOTK Firewall/Load Balancer is handling a maximum of over 10 Mb/sec
  • 4 platforms are currently running on it
  • It is being used for online video, real estate, and digital music
  • Facebook Connect, Sign in with Twitter, and Login with MySpaceID are integrated into IOTK

…and continuing to grow!

Testimonial

September 11, 2009

From one of the developers using IOTK:

“New code generator is indeed kick ass, lots of saved time…”

Thank you! I put the hard work into it precisely for this reason. This developer has good perspective, too, because until now he has been developing against IOTK without the Code Generator. He can fully appreciate the decrease in development time that he’s experiencing now.

IOTK Stats

May 20, 2009

Quick check of where IOTK is:

  • 1,227 unit tests assure its quality
  • 9 Web sites are run by it
  • 4 more are in active development
  • 3 back end developers and 3 front developers are using it
  • 3 teams located in New York, California, and Montreal have built products using it
  • 1 high availability firewall/load balancer configuration is deployed (using solid state disks)
  • 7 firewalls are managed by it
  • 2 platforms are currently live on it
  • It is being used for both online video and real estate
  • 1 hosting service is using it to manage DNS
  • The Schema Diff Tool has been executed successfully 82 times

…and growing!

Oracle Schema Diff Tool

May 11, 2009

Built into IOTK is an Oracle schema difference tool. It takes a source Oracle schema and a target Oracle schema and writes the exact SQL commands necessary to modify the target so that it is identical to the source. Presently it can detect differences in:

  • Reference tables
  • Reference data
  • Tables
  • Indexes
  • Sequences
  • Functions
  • Procedures
  • Packages
  • Types
  • Columns
  • Constraints

I have found that this type of tool is invaluable for increasing the speed of development while maintaining a high level of quality and consistency.  All too often, modifications to the database of an application are tracked by a single person or each developer is required to record their changes so they can manually be applied to QA or production later.  People make mistakes or, under the pressure of a forthcoming deadline, forget to record things accurately if at all.  I advised a group that told me that their “database schema diff tool” was an Excel spreadsheet that was “checked out” and modified with DDL statements to be executed at the next release.  Their check out process was a meeting in which the person that wanted to modify the spreadsheet verified that no one else was either using it or had left it open on their computer.

A developer whom I respect highly once told me that computers are great at exactly the types of operations that make IOTK’s schema difference tool possible.  I would not have been able to imagine the amount of stress and anxiety such a tool can relieve a team of until I found that I could never build a product without one.  Thus the reason I wrote IOTK’s versions.

Before I give more details on the schema diff tool, let me explain how I typically do releases.  Everything I work on is managed by the Subversion revision control system.  The main line of my code is called “trunk” and it’s the active copy of the software against which all of the developers involved in the project are actively developing.  The team will develop a product or a set of features for a pre-determined amount of time, called a release cycle.  During the coding of the release, the trunk repository is very active with updates and additions.

At the end of the release, a copy of the code is created by what is called branching.  A branch is a free standing repository that is not affected by changes made to trunk.  Branching essentially creates a stable version of the software because ad hoc modifications to it are frozen.  The branch is than pushed to a QA environment where thorough testing is done.  In the event a defect is found trunk is modified to correct the problem and the changes are patched to the release branch.  This guarantees that the fix is present both in the active line of development (and thus future releases) and the current (soon to be production) branch.

Once QA has been completed you know that what is on the QA environment is exactly what you want in production.  Because QA has no legacy value, it can always be thrown away and rebuilt from scratch guaranteeing that how it operates is the new way in which you want production to operate.  In essence you have the environment that is currently live and the one you want to be live.  You just need to know what’s different between them.

I achieve this in two ways.  First I use rsync to determine the difference in the code files and I have rsync transmit only those differences to the production environment.  Once I run my “sync scripts” I know that the production environment contains exactly the code and logic I want it to.  However, it might not have the proper database structures to operate against.  That’s what the schema difference tool is for.  It will literally tell me exactly what SQL commands to execute to get production to look just like QA.

Below is what the schema difference tool looks like when you run it:

nobody@qa:/tmp$ iotk-schema-diff.php source/p@inst target/p@inst

// +--------------------------------------------------------------------------+
// | Executing Schema Diff                                                    |
// +--------------------------------------------------------------------------+

+ The source schema is source/p@inst.
+ The target schema is target/p@inst.
+ Upgrade scripts will be written to:
    ./schema_diff

+ Would you like to proceed? [y|n]

Pressing “y” will start the process. I have used the tool extensively on small to medium sized schemas and it will typically finish in 1-2 minutes. When it is done, the directory ./schema_diff will contain the following files:

10-start.sql       30-columns.sql      50-functions.sql   70-ref_tables.sql
15-ref_tables.sql  35-constraints.sql  55-procedures.sql  75-finish.sql
20-tables.sql      40-ref_data.sql     60-indexes.sql
25-types.sql       45-sequences.sql    65-packages.sql

You can safely execute each of these files in SQL*Plus on the target schema. You execute these files by following the numbers: execute 10-start.sql first and 75-finish.sql last. Once I have completed doing this I like to take two last steps:

  1. I add the above files to the branch for archiving.  If there are issues later on with a release I can look into whether or not the issue was caused by something executed from these files.  I can also use these files to manually roll back the database (if possible).  To date I have never had to do that.
  2. I execute the schema diff tool again to make sure that it produces no output.  This means the source and target are the same.

Building this type of tool is surprisingly easy, though time consuming, due to the Oracle Data Dictionary.  Since it is a relational model describing a database’s structure, it is easy enough to execute SQL statements against it to determine differences or to pull data back into an application to compute differences.

The time I spent on it, though, was worth it.  I can develop my application’s without being burdened with having to remember what changes I made or how I am going to push to production.  I do my thing and let the servers sort it out.

Part of my development of the tool was also building the unit testing framework around it.  I literally build two schemas as part of the unit tests and load a source with known objects and a target with some of the same objects and some different ones.  The similarities and differences are enough to test every part of the schema difference code.  The unit tests than execute the schema diff tool’s processes to ensure the resultant upgrade scripts are correct.

Current IOTK Features (May 2009)

May 1, 2009

I thought it might be interesting to give a brief overview of what features are available in IOTK right now. Since I am continuously developing more will be added and I will attempt to document new features like this from time to time.

API
The IOTK style of programming separates the “back end” PHP API from the HTML, user interface layer, etc. As a result, the data returned from your API functions can be delivered in any format. With minimal effort you can publish an API over HTTP, JSON, REST, XML, etc.

Amazon Web Services
Integration with Simple Storage Service (S3) for things like CDN delivery or file back ups.

Bin
A script repository that contains a large number of scripts for managing your code releases, for providing information about what an Oracle database is doing, and an Oracle error library that explains common Oracle errors in plain English and provides the IOTK standard fix for an issue.

Build
A structured and automated build system.

CLI
A command line interface interface for making script writing and back end job creation easier. One of its main goals is to allow for command line code and functions to be placed inside of your API so that it can be properly unit tested. This system handles things like PID locking, output logging, and providing a mechanism for logging messages and errors to a monitored database.

Code Generator
You build your data structures (SQL) and the Code Generator will handle writing most of the PHP and PL/SQL for you. This is not “stub” code – you can rerun the Code Generator as often as you want and it will not affect the code it created in the past. Furthermore, the system is entirely object oriented so you can expand the Code Generator’s functionality without having to touch what it creates.

Comments
Data structures and code for putting comments into your applications. This feature has been tested for 2 million users commenting on 1 million entities generating 6 million comments. It scales.

Contacts
Contact (address book) management including integration with a third party service for scraping almost all modern email services. The “tell a friend” feature can potentially generate a lot of leads if people use the importer to bring in their entire address book from their email account. This feature has been tested for 3 million entities (users, organizations, address books, etc.) generating 8 million contacts. It scales.

Content
A system for storing, searching, and retrieving large blocks of text content.

DataStore
A generalized file management system that has specialized support for images and video.

Dictionary
Localization support for text.

DNS
A zone file management and compilation system designed for managing a large number of domains and guaranteeing the quality and accuracy of the individual DNS entries.

Documentation
Over 55 documents made up of hundreds of slides that cover best practices, coding standards, scalability issues, product development process, release process, and technical training. In addition a full example API is provided that is written in IOTK standard code that covers the entire system. This code is heavily commented and is provided for more senior developers that just want code references.

Encryption
A simple library for handling encryption of data and armored tokens to make data exchange extremely safe.

Event Log
Typical “feed” functionality common on social sites today. It can also be used to track any type of event within the system you are building.

Examples
A full example API that implements every part of IOTK using standardized IOTK code that is heavily documented.

Form
An HTML form extraction layer that allows for the programmatic creation of forms. This systems also handles all of the tasks involved in form processing and can handle more complicated types like dates, file uploads, etc.

FW
A high availability firewall/load balancer implementation that sits on top of iptables. I am going to write more about this particular feature soon but high availability load balancers are essential for excellent up time and being able to scale requests over many servers. There is, however, another interesting reason I built this which has to do with SPAM prevention challenges that many social sites face today.

Image
A robust image manipulation and delivery system. Put in front of a content delivery network (CDN) this system can save money and time by storing a single image in the file system while being able to deliver it, on the fly, in any orientation. This system sits on top of the ImageMagick graphics library.

Location
Location and geo based data structures and functionality.

Mail Merge
The ability to associate large email lists with email message templates. The templates can be customized based on profile data from the email list.

Oracle
Full integration with Oracle. This has been heavily tested against their amazing 11.0.7.1.0 database.

PDF
A library for converting HTML documents to PDF documents.

Rate
Allows users to rate entities in your system like restaurants or other users. This feature has been tested for 2 million users rating 1 million distinct entities generating 4 million ratings. It scales.

Scheduler
A system for programmatically creating crontab entries from within your API. Eases system administration and back end job processing.

Schema Diff Tool
Compares two Oracle databases and writes the exact SQL scripts required to upgrade the target Oracle database instance so that it is functionally compatible with the source. Extremely powerful tool that makes code roll outs take a tenth of the time and raises the quality on all releases. Using IOTK there is no reason for a person to track what database changes they are making. The system is designed to allow computers to figure it out and inform you what steps to take to get your database up to the latest release.

SEO
Both an API and documentation around good search engine optimization (SEO) practices. A few of the important ones are built into the IOTK framework (like auto-generation of a sitemap.gz file).

Text Search
The ability to add keyword searching to existing tables and LOB based columns.

Tiny URL
Code for shortening URLs so that you can email links without worrying about them breaking lines. More relevant for today to create links that won’t eat up your 140 characters per Twitter status update.

Unit Test
A unit testing framework. Presently IOTK is covered by over 1,200 tests.

User Interface
A host of functionality for standardizing AJAX interactions, enforcing HTML form rendering and processing, enforcing navigation and link creation (for SEO purposes), and for providing a general request handling mechanism that is entirely SEO driven and separates URLs from their mechanism of delivery.

Version
A feature management system for ensuring backwards compatible of your API while allowing you to upgrade libraries and code.

Video
A full video manipulation and trancoding library that sits on top of ffmpeg and mencoder.

XML
An XML processing and creation library.

Table Builder

April 24, 2009

An important aspect of IOTK is its ability to force a standard way of writing code. I have learned that this promotes higher quality software, decreases debugging time, and makes it much easier for developers to pick up the work of others.

There are several ways I am trying to accomplish this, one of which is having committed to PHP as the language to use for all aspects of IOTK. API code, command line interface programs (CLI), even certain types of configuration files are written entirely in PHP. I want things to be consistent because I can move more quickly if I’m constantly working in a familiar, standard environment.

Until recently IOTK really consisted of 3 languages: the PHP API, the SQL used for DDL, and the PL/SQL used for DML. Each of these languages is very different requiring their own standards and knowledge of their ins and outs.

An essential part of IOTK is its Code Generator facility that is able to write a lot of the PHP and PL/SQL that is required to interact with a table. I will spend more time explaining the Code Generator another time. But because of it I found myself spending more and more time editing SQL.

So the goal in creating the Table Builder was to create a PHP interface that allows me to “describe” a table but to separate what I want from the code that produces it. That is precisely what the Table Builder does. I am now able to construct a table using code that looks like this:

co_Table::make('new_table')
    ->int('id', true)
        ->pk()
    ->int('parent_id')
        ->fk('parent_table', true)
    ->varchar('value', 50, true)
    ->date_col('date_created', true, 'sysdate')
    ->iot()
    ->seq()
    ->idx(array('parent_id', 'date_created'), 1);

The indentation is deliberate and I will explain each line:

  • co_Table::make(‘new_table’) Makes a new table object to be used to define a table named “new_table”.
  • ->int(‘id’, true) Defines a column of type integer with the name “id”.  The “true” parameter indicates that the column should be set to not null.
  • ->pk() co_Table keeps track of the active column – the last column that was defined.  Certain co_Table functions can be called that specifically apply to the active column.  pk() is one of those functions.  It tells co_Table to define the primary key, in this case, as the “id” column.  The indentation used for ->int() and ->pk() is used to make it visually clear that the primary key definition belongs to integer column that was just created.
  • ->int(‘parent_id’) Defines a column of type integer with the name “parent_id”.
  • ->fk(‘parent_table’, true) Is similar to pk() except that it defines a foreign key constraint.  “parent_table” is the name of the parent table in which the foreign key can be found and true indicates that the “on delete cascade” clause of the constraint should be enabled.
  • ->varchar(‘value’, 50, true) Defines a column of type varchar2 with the name “value”.  It has a a maximum length of 50 characters and is defined as not null (the true).
  • ->date_col(‘date_created’, true, ‘sysdate’) Defines a column of type date with the name “date_created.”  The true indicates that the column is not nullable and “sysdate” is the default value to be assigned to the column.
  • ->iot() Defines the entire table as an index organized table.
  • ->seq() Signifies that a sequence should be created for this table.  Note that co_Table will handle naming the sequence in the IOTK standard manner.
  • ->idx(array(‘parent_id’, ‘date_created’), 1) Signifies that an index should be created using the columns “parent_id” and “date_created” and compressed by the first column (the 1).

Calling ->get_definition() on the resultant co_Table object will produce the exact SQL required to create the entities I have described above.  You can literally cut and paste the output in SQL*Plus and it is always outputted in the IOTK standard way for DDL.

Here is the code I would normally type to create the entities described above and the output that co_Table would produce:

create table new_table(
    id integer not null
        constraint new_table_pk primary key,
    parent_id integer default null
        constraint new_table_parent_fk foreign key
            references parent_table on delete cascade,
    value varchar2(50) not null,
    date_created date default sysdate not null)
organization index;

create sequence new_table_seq nocache;

create index new_table_prnt_id_dt_crtd_idx
    on new_table (parent_id, date_created)
        compress 1;

A lot less typing goes into the co_Table implementation. As mentioned previously, co_Table will handle naming entities to make sure they are Oracle compliant as well as standardized according to IOTK.

co_Table supports all of the features you would need to create tables in the IOTK framework including multi-column primary and foreign keys, check constraints, LOB columns, unique constraints, etc. In addition, it defines a number of functions for IOTK defined “data types” – ways to validate and represent data throughout the IOTK framework – like booleans, passwords, email addresses, and IOTK specific types like DataStore and Content IDs. Many of these data types include not just column definitions but constraint definitions as well for ensuring the data meets the integrity requirements set for them.

One missing piece is how you might use co_Table to actually load your SQL. It seems silly to have to create a PHP file than execute a script to create SQL that you would cut and paste into SQL*Plus. In fact, none of that happens when you’re using IOTK. One of the core facilities (and perhaps the most important part of the system) is the DB Builder. It automates the ordered tear down and rebuilding of all of the database objects defined by your application. All you have to do is create your SQL and put it in the right place and the DB Builder will handle finding and loading it or detecting that it has changed and reloading it. More on the DB Builder some time in the near future.

I have yet to deploy the Table Builder into a real product yet as it is currently the project I’m actively developing. I have high hopes for it and should have it completed in the next few days. In the mid-term I will investigate how to modify IOTK (and facilities like the Table Builder) to be able to produce either Oracle database code or, say, MySQL database code based on a configuration setting. This is a potentially powerful feature that is enabled by abstracting how the database entities are defined (described) and how they are actually written.