Dimeric Software logo

Scalability

This document describes several specific ways in which scalability has been designed into DSUtil and VDB. Rather than simply stating that our libraries are scalable, we believe in pointing out some of the algorithms, data structures, and other design decisions that make these libraries scalable. Each point is discussed in further detail following the list.

The following data structures and algorithms are just a small sample of what DSUtil and VDB have to provide. Having these features and utilities means that we can use the right algorithm or data structure where where we could have just as well used a less efficient alternative. Using the right algorithm or data structure means that we're getting optimal efficiency. While it may seem like a small consideration, using the right data structures and algorithms can have as much to do with performance and scalability as other more obvious considerations. Details make a difference.


Contents

  1. VDB - connection pooling
  2. VDB - automatic adaptive query preparing and caching
  3. VDB - streamlined parameterized queries
  4. DSMap - hash tables instead of linear searching
  5. DSStreams - buffered streams and putback streams
  6. TinyMemoryStream - a memory stream that starts small, but grows geometrically
  7. ILockManager - efficiently protect an unknown number of resources
  8. DSObject - abstract factories for creating resources on demand


#1: VDB - connection pooling

Database connections are resource intensive, particularly on the database server. Using connection pooling allows your application server to dramatically reduce the number of connections it makes to the database. Essentially, your application server shares connections among all users. You only need as many connections as there are concurrent requests, instead of needing a connection per user. For example, if you have 100 users logged into your application server, but there are at most 10 concurrent requests from those users, you only need 10 connections to the datbase instead of 100. As the number of concurrent requests changes, so do the number of connections to the database. By reducing the number of connections to the database, you reduce the load on the database, improving scalability for all applications connected to the database.


#2: VDB - automatic adaptive query preparing and caching

Using prepared queries can significantly improve performance. It can also reduce the load on the database server. Instead of forcing the database to re-parse and prepare a query plan each time you execute the same SQL, the database only has to do that work once if a query is prepared before being executed repeatedly. Requiring the database does less work to achieve the same result improves scalability.

The catch is that preparing queries can actually hurt performance and decrease scalability. The problem is that preparing a query holds resources in the database. The database has to store the query plan for the query. Resources used in storing this information cannot be used elsewhere. So, when you prepare a query, you should be reasonably certain that it's the right thing to do. If you make the wrong choices, you may be limiting your scalability.

It gets worse when you consider that your application server may be doing different things at different times of day. While it might make sense to prepare a certain query in the morning, it may not be used much in the afternoon much, and would just be dead weight.

Also, in order to prepare a query, you need to be executing the same SQL repeatedly. Not just similar SQL, but the exact same SQL. Often, this means using parameterized SQL. In traditional Delphi programming this can be annoying and error prone.

The final insult comes when you realize that, because prepared queries are associated with a particular connection, using prepared queries in conjunction with connection pooling is difficult, if not downright impossible.

In traditional Delphi database programming, you might just write off prepared queries altogether, or at best, only use them when you know that you have a performance issue and you know for certain that using a prepared query will help. Typically, only then will you go to the trouble of using a prepared query. For example, if you have a loop that inserts a large number of records, you might prepare an insert statement with parameters.

Fortunately, VDB provides a simple answer that hardly requires any thought on your part. VDB automatically caches and prepares queries that are executed repeatedly. VDB maintains a query cache inside each IDatabase object that holds recently executed queries. By default, if a query is executed more than once, within a minute, through the same connection, it will prepared. This works especially well with connection pooling because prepared queries are cached with the connection, they can be shared among users, just like connections are. And because the query cache has a timeout, the cache of prepared queries can vary during the day. If certain SQL is executed frequently during a certain part of the day, it will be prepared. If it is rarely executed at another time, it won't be prepared. This requires no profiling or analysis on the part of the programmer. Of course, you can tweak this behavior by explicitly preparing queries, etc., but the default behavior is much superior to the default behavoir of the underlying component libraries.

Finally, because VDB provides a vastly simplified syntax for using parameterized queries, you can use it anywhere you would normally use dynamically generated SQL. See the next point for details.


#3: VDB - streamlined parameterized queries

Using parameterized SQL in VDB is far simpler than using parameterized SQL in traditional Delphi database programming. The following illustrates inserting a "name" and a "value" from a string list.

  for i := 0 to StringList.Count - 1 do begin
    Name := StringList.Names[i];
    Value := StringList.Values[Name];
    Database.ExecSQL(
      'insert some_table (name, value) values (:name, :value)',
      [ParamValue('name', Name), ParamValue('value', Value)]);
  end;

Not only is this code easy to write (and read) but it works well with automatic query preparing and caching. See the previous point for more. See the VDB documentation for details on using parameterized SQL in VDB.


#4: DSMap - hash tables instead of linear searching

A "map", "dictionary" or "associative array" is a common abstract data type, and a hash table is a common, and efficient, data structure to use to implement a map. Many programmers have just such a utility in their bag of tricks. What makes the IMap interface (defined in DSMap.pas) particularly nice is that it is a map from Variant to Variant. This means that it can be used in many situations where you'd otherwise have to invent a specialized class. Whether you need a map from integer to interface, or from string to datetime, the IMap can handle it. We've profiled, tweaked, and optimized the map for the highest performance.


#5: DSStreams - buffered streams and putback streams

The DSStreams.pas unit defines an interface-based, layered streaming library. Summarizing this library in a few paragraphs wouldn't do it justice. See the DSUtil documentation for a detailed discussion. An example of how the DSStreams layered approach enhances efficiency is the buffering layer. DSStreams makes it a snap to buffer data access to any underlying stream, whether it's a file, socket, pipe, etc. Using a buffered approach means that fewer data access calls are made to the underlying stream. Even using a small buffer (e.g. 4k) can significantly reduce data-access call overhead. Reduced overhead means improved efficiency, which translates into greater scalability.


#6: TinyMemoryStream - starts small, but grows geometrically

The TinyMemoryStream is a memory stream that starts small, but grows geometrically. The built-in TMemoryStream starts larger, but grows linearly. Each increase may cause an entirely new buffer to be allocated, with the existing contents of the stream copied into the new location. If you were to write one byte at a time to a TMemoryStream, it would increase its capacity 8k at a time. For example, if you were to write 64k to a TMemoryStream, one byte at a time, it would reallocate its internal storage 8 times. For 128k, it would reallocate 16 times. For 256k it would reallocate 32 times, and so on. The TTinyMemoryStream initial allocation is 1k and it increases by a customizable growth factor--typically 2. So writing 64k would require 7 reallocations, 128k would require 8, while 256k would require 9. While buffering in memory is best avoided, when it is necessary, it is best to use a scalable solution like TTinyMemoryStream. When small, it uses little memory. When large it reallocates less frequently. Both aspects enhance scalability. Note: The TTinyMemoryStream is the default memory stream used by DSStreams.


#7: ILockManager - efficiently protect an unknown number of resources

The LockManager is an object that can provide synchronization for an artbitrary number of resources. The lock manager has two methods: AcquireLock which returns a named synchronization object and ReleaseLock which "releases" the synchronization object. A reference count is used to control the lifetime of the synchronization object. The first time it is "acquired" it is created. The last time it is "released" it is freed. The code that uses the lock manager simply refers to a synchronization object in the lock manager by name, but the synchronization objects held by the LockManager are released while the synchronzation objects are not actively referenced. This means that a lot more resources can be protected with many fewer, transient, synchronization objects. Because synchronization objects are themselves resources, this reduces overall resource use, and, therefore, increases scalability.


#8: DSObject - abstract factories for creating resources on demand

The DSObject unit defines an object factory framework that can be used to delay creation of objects until they are needed. This is particularly helpful for heavy-weight objects, like database connections. VDB defines a database connection factory (that implements IObjectFactory) for delayed creation of a database connection from a profile string. This is only one example where this pattern can deliver greater scalability by minimizing resource allocation until the last possible moment.

   Privacy Policy © 2002 Dimeric LLC. All Rights Reserved. Contact Us   
> Powered by VDB, DSP and Rinse <