Partitioned tables

Many databases have examples of tables that can naturally be partitioned based on a single column. For instance the stock held in branches can be partitioned by branch identifier. To help smooth out I/O and to reduce contention it is often advantageous to split these partitions across multiple tablespaces in effect considering each partition as a distinct table but still allowing a view over all the partitions. In classic KDB databases this was the chosen solution. However other databases have alternative implementations.

Database partitioning scores over separate tables when SQL is involved as table names cannot be parameterized in SQL and the number of SQL statements required is multiplied by the number of partitions reducing dramatically the effectiveness of SQL caching.

If partitioning is used, KCML treats the partition to be a separate table. It uses a naming convention to determine if partitioning is appropriate when creating or opening a table. In the current implementation if a table name ends in an underscore and exactly 4 digits then it is considered to be a partitioned table with the 4 digits as the partitioning column value. Thus BR_01_HEADR_0001 and BR_01_HEADR_1001 are two partitions in the base table BR_01_HEADR. It is also possible to have a further suffix after the partition number provided it is not 4 digits so BR_01_HEADR_0001_UEXIT is a partition of the table BR_01_HEADR_UEXIT.

The SQL CREATE TABLE statement has been extended to support a PARTITION clause defining the partitioning column and fixing its value for that instance. The CREATE TABLE might look something like

CREATE TABLE BR_01_HEADR_1001 (ORDERNO CHAR(8), BRANCH INTEGER(2), ...) PARTITION BY BRANCH VALUES (1001)

When KI_OPEN is used to open a table with a name containing partition number then it opens the base table and but allows access only to the rows which have the appropriate value for the partitioning column as in

CALL KI_OPEN h, "BR_01_HEADR_1001", "W" TO s

However if you need access to all the rows across all the partitions you can open the base table directly:

CALL KI_OPEN h, "BR_01_HEADR", "W" TO s

The column schema and indexing must be identical on all partitions however each partition will have its own distinct default row.

Partitioned indexes

Each partition can be considered to have its own private index called a partitioned index. Effectively the partition number is prefixed to the key. You would use these indexes when you have opened a partitioned table directly. You can add an partitioned index like this:

CREATE UNIQUE INDEX BR_01_HEADR_0001_A1 ON BR_01_HEADR_0001 (ORDERNO)

Creating an index on any partition effectively creates it on all the partitions. If you create the same index on another partition then KCML will not do anything as it will already be there. Thus, given the above SQL statement, the following is redundant:

CREATE UNIQUE INDEX BR_01_HEADR_0002_A1 ON BR_01_HEADR_0002 (ORDERNO)

Similarly dropping a partition index has no effect unless it is the last partition in the table.

Partitioned indexes can also be used with the base table. KCML will expect to get the partition number from the row in KI_READ_RANDOM or KI_READ_NEXT.

Global non-partitioned indexes

It is also possible to create an index on the whole table that is not organized by partition. These are called global non-partitioned indexes and they are created on the base name of the table e.g.

CREATE UNIQUE INDEX BR_01_HEADR_A2 ON BR_01_HEADR (ORDERNO)

These indexes can be used with the base table or on a partitioned table. Because the index will be bigger, access will be slower than with a partitioned index.

Serial columns

Auto incrementing columns (set with the SERIAL clause in CREATE TABLE) can be for the whole table (A) or by partition (P). The latter is recommended in clustered databases where particular database nodes can better cache different partitions.

Enabling partitioning

Partitioning is on by default for databases that support it. It can be controlled per connection with the _KDB_CONN_ATTR_PARTITION connection attribute and the KI_CONNECT_SET_ATTR call. Several values can be set:

Value	Purpose
0	The default. Use Oracle partitioning if supported or simulate it if not available. The value returned for this attribute for Oracle will reflect whether or not it was enabled.
1	Do not use partitioning and create partitions as distinct tables.
2	Simulate partitioning even if supported by Oracle.

After a connection is established the partitioning in use can be checked with KI_CONNECT_GET_ATTR.

The PARTITIONS environment variable can also be set before a connection to set a system wide default. Setting it to "TRUE" gives the default. Setting it to "FALSE" will disable partitions and setting it to "FAKE" will force simulated partitions.

Simulated partitioning

KCML will put the rows for partitioned tables into a single table with the base name. The partitioning column will be used to select each table. An index will apply to all partitions and while you can issue individual CREATE INDEX and DROP INDEX statements, the index is created on the first such call and dropping an index on a partition has no effect. DROP TABLE for a partition will be slow because it will be implemented as a DELETE for the rows of the partition. This arrangement allows for the memory saving in the SGA by reusing the same SQL for all partitions but some operations such as dropping a partition will be much slower as they involve a selective delete. Because the partitions are not discrete objects there will be more contention and simulated partitioning cannot take advantage of parallel operations. Simulated partitioning is the default for PostgreSQL databases.

Native KDB Implementation

Partitioning is not supported in native KDB. Each partitioned table is a separate database table and there is no mechanism to open the base table.

Oracle Implementation

Oracle supports an optional feature called list partitioning which splits the table, its indices and its metadata across mutiple partitioned tables and KCML can exploit this feature if enabled. The various data segments and index segments for each partition can be managed separately and in particular you can create and drop per partition indices. Partitions of the same table can be placed in different tablespaces. If one partition index is damaged it does not affect the other partitions.

Oracle partitioning potentially can make dramatic memory savings in the SQL cache portion of the SGA as it allows KCML to in effect parameterize table names. It also allows for much more parallelism.

If not supported by the Oracle installation KCML can simulate partitioning by using a single table as in the general SQL database case.

PostgreSQL implementation

The default is to use simulated partitions though partitioning can be disabled by setting the _KDB_CONN_ATTR_PARTITION environment variable immediately after connecting with KI_CONNECT.