Arthur Whitney
Kdb+ is an RDBMS and a general purpose programming language: http://kx.com/q/d/q.htm
* OLTP from 100 thousand to 1 million records per second per cpu.
* OLAP from 1 million to 100 million records per second per cpu.
* Solaris, Linux and Windows. 32bit and 64bit.
The realtime OLTP+OLAP part of the database is generally in memory with an optional update log. There is no limit to the size of the OLAP historical databases on disk. Typical customers run realtime analysis on 100 million transactions per day and historical analysis on the accumulated billions of records.
Unzip kdb+ (and ) in QHOME (default: ).
Executable is . Put on PATH.
The simplest database is just a process which may be initialized with a script. Each process handles any number of concurrent http and kdbc (jdbc/odbc) clients. For example, start a trade.q database listening on port 5001.
View the database with a browser: http://localhost:5001
Kdb+ returns text (html, txt, csv or xml) to http queries.
Kdb+ returns data to java, .net, c, jdbc/odbc or q clients. The results are atom, list, dict or table.
Data is faster and more flexible than text. You should get 100,000 single record selects, inserts, updates [and deletes] per second. You can insert or update a million records per second in bulk. Kdb+ is generally faster than the network.
All but boolean and byte have nulls. The int, float, char and symbol literal nulls are: . The rest use type extensions, e.g. . Each type also a list notation, e.g. in the following nested list:
There must be no spaces in symbol lists, e.g. . Symbols can have any non-zero characters (e.g. `$"a-b") but identifiers must be alphanumeric.
The q
is similar to the sql
It will check primary key violations. The q
is insert for tables and upsert (update existing and insert new) for keyed tables.
q is an extension of sql capabilities. Expressions are short and simple.
These all run at several million records per second. In general:
These are similar to (but more powerful than) the sql
In sql the and clauses are atomic and the and clauses are atomic or aggregate if grouping. In q the and clauses are uniform and the and clauses are uniform or aggregate if grouping ( ). All clauses execute on the columns and therefore q can take advantage of order. Sql can't tell the difference. Sql repeats the expressions in the and the clause is one boolean expression. The q clause is a cascading list of constraints which nicely obviates some complex sql correlated subqueries and also gets rid of some parentheses. q relational queries are generally half the size of the corresponding sql. Ordered and functional queries do things that are difficult in sql. See http://kx.com/q/e for examples. is a special token that indicates record handle.
Create, select and update will create names from expressions if there is no assignment. The name is last token (first token if operation is +, -, *, %, & or |) in the expression or if this token isn't a valid name, e.g.
is short for
It is common to want to do arithmetic with tables. Extending arithmetic to tables replaces complicated sql coalescing and outer joins.
Foreign keys are useful in q, e.g. given
then
See http://kx.com/q/e/tpcd.q for more examples.
Joins generally run at 10 to 100 million records per second.
See http://kx.com/q/e/tpcd.txt for an 8-way join
Following is a map of how to do sql things in q. There is no reverse map since q is a much richer language than sql. With respect to CJDate "The SQL Standard"
Server functions and procedures are written in q. Why not java? q is faster, smaller and better than java. For example, a server function to subtract a client table p from a server table P is
obviates common correlated subqueries, e.g.
see also http://kx.com/q/e/tpcd.q
Choose layout and optimizations to fit the important queries:
think about the queries. think about disk. e.g. if you want fast access on terabytes by time and customer store it both ways.
Tables are:
large are usually sorted and parted so that important queries are one seek.
64bit can generally handle realtime tables around 10 times bigger than 32bit.
64bit has an advantage on medium (and large) because queries on 32bit are dominated by mapping.
One day's worth of tick(trades, quotes, orders, ..) is a few GB and fits in RAM.
Queries are around 10 microseconds.
dayend append (and `g#mas) is fast.
once data is cached in memory
Queries are around 1 millisecond.
Queries run from disk at 10ms per date/sym/field.
The partition field is and is virtual.
Kdb+taq is more than 2500 partitions of daily trades and quotes. Each new day is more than 1GB. [Queries on partitioned databases can run in parallel.] To set a subview -- handy for test and development:
Each database runs in memory and/or disk map-on-demand -- possibly partitioned. There is no limit on the size of a partitioned database but on 32-bit systems the main memory OLTP portion of a database is limited to about 1GB of raw data, i.e. 1/4 of the address space. The raw data of a main memory 64bit process should be limited to about 1/2 of available RAM.
A kdb+ historical database is a file system directory. Small tables are simply binary files inside this directory. Medium-sized tables (<100 million rows) are splayed (a file for each column) across a subdirectory. Big tables (billions of rows) are horizontally partitioned (often by date) and then splayed across a directory. An example layout for kdb+taq is:
Kdb+ databases run 24*7. After a new date is added a simple reset message is sent to the server. Small tables are in memory. The big table columns are mapped in and out on demand. To avoid swapping RAM should be at least 10* size of the largest column, e.g. if one table in one partition can have 100 million rows there should be 8GB (10*8(byte float)*100,000,000) of RAM. [Address usage is 20* size of the largest column so on 32bit systems tables shouldn't have more than 16.7 million rows.] The number of partitions, tables and columns doesn't matter. Some customers have 1000's of partitions. Some have 1000's of tables. Some have 1000's of columns in the tables.
Tables with many columns are good splayed table candidates because the bulk of these tables will always be out of memory (queries rarely require access to more than a few columns at a time). The performance characteristics of splayed tables are independent of the number of columns in the table. They depend only on the number of columns that must be in memory at any one time, which in turn depends on the specific queries being executed. Columns of Kdb+ splayed tables are stored contiguously on disk and laid out contiguously in real memory when accessed. This organization gives optimal disk access performance. The operating system caches accessed columns in real memory. Consequently, if the real memory hit-rate is high, the performance is real memory performance. (This is often the case because many database applications have a few heavily used queries that reference the same few columns). There can be one or many instances of the database. All instances point to the same data. All processes will reside on the server including a gateway should one be required. The system can be implemented with or without a gateway process. In a system without a gateway, users connect directly to a database instance and send queries, with a gateway the users connect and send queries to the gateway process which allocates the query based on which instance is free. This can be handy for running more than one query at a time but overall throughput can of course suffer to the extent that the queries are competing for disk. But this is a way to let an important user get something quick while some bigger analysis is going on in another instance.
Partitioned databases can run in parallel. But if we run in parallel it is best to have parallel i/o. Each kdb+ process can easily consume 100MB per second. Therefore the fastest multi-terabyte kdb+ databases use parallel i/o with guaranteed no contention. Customers do this with DAS (direct attached storage). U320 scsi controllers are fast and inexpensive. Customers sometimes use SAN (200MB/sec total) or NAS (80MB/sec total) because the IT departments insist on it. That is ok. But this is the limiting factor. If you use NAS be sure to have at least 1Gbit ethernet.
If you use NAS there is no point in going parallel -- i.e. one kdb+ process can easily consume everything the filesystem throws at it. 4 slaves will saturate SAN. With DAS (direct attached storage) we can have one slave per disk array and ideally at least 2 disk arrays per cpu core.
If all queries access just one partition/date then there is no point in going parallel. If you have DAS and have queries that span more than one partition then the best setup is to have one or more disk arrays per slave process (thus no contention) with approximately 2 slaves processes per cpu. (one is crunching while the other is reading). But be sure to have enough RAM per process so that there is no swapping, e.g. a machine with 4cpu/32GBram/4controllers(2channels each)/8diskarray(14*146GB drives each) can have 8 slave processes on 50 million row (per partition) tables and 4 slave processes on 100 million row (per partition) tables. The o/s independent way to allocate partitions to different disk-arrays is to use a file (par.txt) that lists the different directories, e.g.
where par.txt is
and these directories are:
Partitions are round-robin allocated mod N where N is the number of partitions. Round-robin is to get maximum parallelization on date ranges.
Log/commit strategies are none, file (-l) or synch (-L). None is good for test, readonly, readmostly, trusted, duplicated or cache db's. File (100,000 transactions per second) is useful if you trust (or duplicate) the machine. Synch is 100 times per second with SCSI disk and 10,000 times per second with SSD(solid state disk). In any case a single drive can commit more than 40MB of updates per second. A logging database is [script/][state/]log, e.g.
i.e. put code in d.q in directory .u(utilities). all update messages are logged. To send a message from the console use 0, e.g. . An error after a partial update will cause rollback. d can start empty. d, d.q and d.log should all be in the same launch directory. In general only log in production. Keep the database as simple as possible.
The windows version comes with an odbc loader, e.g.
You should get 10MB/sec -- if the source rdbms and network are up to it. Also,
$bcp t out t.txt -c
It is trivial to incrementally archive the large parallel databases: simply move the old directories away and send the reset message.
Servers can control access with a file of usr:pwd's, e.g.
Clients connect with:
Even when there is no server -u the q client still sends username. All incoming messages reflect:
Clients can only do file i/o in the server directory.
Kdb+ is a parallel/grid programming system.
The overhead is conditional threads - 1 microsecond - so use accordingly. Test overhead with:
Ideal parallel cpu candidate has high cpu%data, e.g.
Peach is also good for going after many different drives at once even if you only have one cpu.