Skip to main content

skip to main content

developerWorks  >  Information Management  >

Making DB2 Use All the Memory

developerWorks
Document options

Document options requiring JavaScript are not displayed


New site feature

Check out our new article design and features. Tell us what you think.


Rate this page

Help us improve this content


Level: Introductory

Blair Adamache (adamache@ca.ibm.com), DB2 Technology Development, IBM Toronto Lab

03 Jul 2001

Whether your deployment is on one PC or a network of servers, this article helps you maximize the available memory. Learn about the DB2 memory model in AIX and Solaris, and how to create buffer pools.

©2001 International Business Machines Corporation. All rights reserved.

Introduction

Ever heard of creative tension? It's one of those pseudo-spiritual philosophies that asserts forces opposed to one another can create things as a byproduct of their struggle. Sort of like the battle between good and evil in comic books. Now I wouldn't say that all software engineers are good, or that all hardware engineers are evil, but there is a creative tension between them. As Joseph Campbell said, "Don't let your Romantic aversion to science blind you to the Buddha in the computer chip." And what could be more romantic than an entire table surging off disk and into memory like the tide?

Part of the time, software engineers lament the slow pace of hardware advances: machines with slow disks, small memory sets, and crawling clock speeds. (When the hardware catches up, maybe we'll forget that Java" applications were ever slow.) As a new generation of hardware appears, the operating system folks adapt first, leaving frustrated users with a 32-bit architecture running 16-bit or (gasp) 8-bit DOS applications. The pressure now shifts to the software engineers: when will they recompile their applications and take advantage of the new data types and memory addressability provided by the new hardware? In the final analysis, when you compare BASIC on an 8086 with C++ running on a 24-way SMP, your "Hello World" program takes about the same amount of time to write as run.

However, databases are about more than printing "Hello World" to the display. As much as Web server software looks forward to higher speed lines, database software looks forward to soaking up every advance in disk speed, capacity, and addressable memory. While application programmers may have grumbled about having to recompile 16-bit programs (which already worked well) for 32-bit machines, the database engineer loves the thought of having required data in memory and not on disk, before it is sorted, aggregated, or sent to the user. I/O is the killer of so many ambitious workloads – it's the reason you spread a terabyte of data over five terabytes of disk (more disk = more spindles, which means more parallel I/O – at least in the benchmark world).

64-bit architectures are becoming standard in the RISC and Sparc worlds now, allowing commercial UNIX®'s like AIX®, HP-UX and Solaris to offer lots of memory to your favorite relational database. 32 bits of memory addressability equals approximately 4 gigabytes, and many large UNIX machines ship with 20 to 100 gigabytes of memory, which you want to make sure gets used. The Intel world is not far behind: Linux and Windows 2000 running on 64-bit Intel chips are a reality in operating system, compiler, and database software laboratories today, and will soon be for sale on a Web site near you.

So if the hardware and operating systems are ready for large memory, and databases can take advantage of it, how do you put these together and make it work? With DB2® Version 7, the first thing to recognize is that, out of the box, DB2 assumes 32-bit memory and hardware. To take advantage of larger memory, you have to tell DB2 that it's available and how to use it. Don't blame DB2 -- most DB2 clients and many DB2 servers will be on 32-bit Intel machines for years to come. And even if DB2 detects 96 GB on your machine, who's to say that you want DB2 to use it all, rather than share this memory with other applications?

When it's time to use this memory, you have several choices. The most obvious is to create 64-bit DB2 instances. This is supported today by DB2 Version 7 on AIX, Solaris, and HP-UX. If you have Version 7.1, you'll have to download Fixpak 1 to install the 64-bit DB2 libraries. If you have Version 7.2 or later, you won't need to install a fixpak to create 64-bit DB2 instances. To create a 64-bit DB2 instance, use the db2icrt command, specifying the parameter -w with a value of 64. For example:

db2icrt -w 64 -u db2fenc1 db2inst1

A manual describing the use of DB2 in 64-bit environments is available at: http://www-4.ibm.com/cgi-bin/db2www/data/db2/udb/winos2unix/support/document.d2w/report?fn=db2q9e71frm3toc.htm.



Back to top


1 + 1 = 2. 2 to the power of 32 = a lot.

Each 32-bit DB2 instance is capable of addressing 4 GB of memory. Normally, you want to devote most of this memory to bufferpools. However, the memory segmentation that occurs on AIX, HP-UX and Windows will limit the size of your largest bufferpool to less than 4 GB. Even Solaris, which has a very clean memory model in the 32-bit world, can have no more than 3.35 GB of addressable memory for DB2 bufferpools; the rest of the memory under the 4 GB line must be devoted to other DB2 uses of shared memory. (Fortunately, memory models are cleaner for all operating systems in the 64-bit world.) On HP-UX, the largest bufferpool you can create with a 32-bit DB2 instance is about 800 MB. You can exploit more than 1 GB on HP-UX only by running multiple instances with Memory Windows with 32-bit HP-UX. (HP Memory Windows are described in the DB2 Release Notes.) Bufferpools are limited to 3 GB on Windows, 1.75 GB on AIX, and approximately 1 GB on Linux. Figures 1 and 2 below illustrate the 32-bit memory models for DB2 on AIX and Solaris, respectively.


Figure 1. DB2 32-bit Memory Model on AIX
DB2 32-bit Memory Model on AIX

Figure 2. DDB2 32-bit Memory Model on Solaris
DDB2 32-bit Memory Model on Solaris

On large memory systems running 32-bit DB2, the simplest way to get a lot of memory devoted to bufferpools is to run multiple logical DB2 instances in a DB2 Enterprise-Extended Edition (EEE) configuration. You only need to run one instance of the operating system; this will help save overhead and allow the DB2 instances to communicate with one another through shared memory rather than over TCP/IP or a communications switch. With DB2's shared-nothing architecture, each instance can happily address 4 GB of memory with its own partition of the database. On most DB2 TPC-H benchmarks -- which typically have DB2 EEE running decision support queries across a database of 300 GB or more -- a large SMP is divided up with 4 GB for each DB2 node (each node is a database partition running its own DB2 instance).

There are three other methods with which DB2 can exploit large memory machines. On AIX, Solaris, and Windows, DB2 supports Extended Storage (also known as ESTORE). This allows DB2 to use memory above the maximum usable memory in the 32-bit memory model for system temporary tables (used for sorts) and read-only user data. It's up to DB2 to figure out which data can be considered read-only as it gets data off disk, but you'll need to configure DB2 to use Extended Storage.

Let's tackle the classic case: you are designing a database in which you want one table to be in memory as much as possible. To start with, you update the database manager configuration and tell it how many extended storage segments (num_estore_segs) to use. This is set to zero by default. The value you choose for n will depend on how big the table is, how much memory you have available, and how much you want to devote to this particular table. Let's say we're running Solaris with 6 GB of memory -- the 2 GB above the 4 GB line will be used for Extended Storage (also known as estore):

update db cfg for sample using num_estore_segs n

You define the size of the estore segments with the Extended Storage Memory Segment Size (estore_seg_sz) database configuration parameter:

update db cfg for sample using estore_seg_sz 32000

Now you create a bufferpool. For our example, we'll use an 8K page size, although 16K and 32K page sizes are also allowed. (If this were on Windows, you would have to use page sizes larger than 4K to use memory above 2 GB.) The bufferpool must be enabled for extended storage, which is done with the EXTENDED STORAGE keywords. highmem is the name I chose for this bufferpool. The size n depends on the amount of memory you want this bufferpool to occupy:

CREATE BUFFERPOOL highmem SIZE n
PAGESIZE 8K EXTENDED STORAGE

Now you create a tablespace and assign it to the bufferpool:

CREATE TABLESPACE highmem_tbsp PAGESIZE 8K
      MANAGED BY SYSTEM
      USING ('C:\highmemdir)
      BUFFERPOOL highmem

Note that the page size of the tablespace must match the page size of the bufferpool, and that the bufferpool is identified by name. If you only create one table in this tablespace, and this tablespace is the only one in the bufferpool, you increase the chance that data in this table will stay in memory as it is accessed. Sorts on the table may still spill, so ensure that you have a system temporary tablespace with a matching page size created:

CREATE SYSTEM TEMPORARY TABLESPACE highmem_temp PAGESIZE 8K
MANAGED BY SYSTEM
USING ('C:\highmemtemp') BUFFERPOOL highmem

Now you're ready to create your table in the tablespace:

create table memory_hog (col1 int) in highmem_tbsp



Back to top


AWEsome or AWEful? You be the judge.

Windows 2000 is able to map data above the 4 GB line in the 32-bit world, through Microsoft Address Windowing Extensions (AWE). DB2 Version 7.2 (or Version 7.1 with Fixpak 3) supports this. Windows 2000 Advanced Server provides support for up to 8 GB of memory, while Windows 2000 Data Center Server provides support for up to 64 GB of memory.

In the final analysis, you may need to make a choice that goes against the grain of high performance database design. Presumably, the database you would consider for the techniques above requires optimum performance – after all, why put it on a machine with lots of memory if performance isn't critical? Such databases tend to have user data stored in Database Managed Storage (DMS), using raw devices for containers. The conventional wisdom says: If you manage one database with a high profile and demanding users, devote a lot of time to tuning and planning so you can predict data growth and allocate adequate space on the raw device containers. In this mindset, System Managed Storage (SMS) table spaces are appropriate only for system data (temporary table spaces and the catalog table space) and databases that have to be low-maintenance because they are too numerous to warrant a lot of investment in design and maintenance.

What if you had the worst of both worlds: a database that had to be fast, but one where you were unable to predict data growth or spend a lot of time on maintenance? You might question why you wouldn't spend DBA resources on something if it's important, but bosses aren't always logical. Luckily, relational database architects are not the only ones spending a lot of time deciding how to use large memory machines: the operating system mavens study this problem as well. You'll need to make a leap of faith, putting all the data in SMS table spaces and using the file system for containers. (You can do this with DMS as well, but you'll still have to plan for data growth when you define the file sizes for the containers). Now it's the operating system's job to look at all that memory above the 4 GB line and use it for file system caching. Fortunately, with a prevailing number of databases running on large SMPs and able to achieve their performance goals by leveraging applications, operating systems designers are motivated to tap the potential of large memory machines to keep frequently accessed data in memory. And that's exactly what a database bufferpool is supposed to do: keep the data you access often in memory and leave the rest on disk only.



Back to top


Don't try this at home! (Do it at work, on the boss' system.)

It's probably wise to conclude an article on memory by telling you not to forget the rest of the system. After all, if you have 96 GB of memory, you probably have more than 96 GB of data sitting on disk -- waiting to overwhelm your system and send it into paging purgatory. The focus of this article has been on using as much memory as possible. After all, you paid for it, so why not use it and get your money's worth? In the spirit of this philosophy, don't forget the rest of the hardware you bought. You want the software to use all the memory, but you also want all processors to be doing their share -- and all disks as well. Just as an application restricted to 640 K on a 1 MB PC is underutilizing the available resources (and is therefore slower than it might be), if one of your processors is running at 100%, and the rest are at 25%, you have a dilemma -- a workload that is CPU-bound on one processor, while the rest of the processors do nothing. (A scenario not unlike those construction sites you see where one guy is shoveling while his coworkers stand around and watch.) This is where DB2 EEE is your ally; its shared-nothing architecture is designed to neatly slice all work and data equally across the memory, CPUs, and disks available to you. For this reason, the shared-nothing architecture is ideal for decision support. For transactional workloads, you have to watch for hot spots: if a subset of CPUs or disks is working too hard, why is that happening? Are heavyweight clients all connecting to the same node? Fortunately, EEE allows you to spread client connections across all nodes. Should you move data into many small tables on the nodes where they get updated rather than a large table hashed across all nodes? The downside of small tables means using UNIONs when you want a result set that needs rows from every node. Also, don't be afraid to run OLTP on a shared-nothing architecture. At the writing of this article (April, 2001) the top six TPC-C results were all on clustered shared-nothing databases.

Workload balancing also means spreading table spaces out across your disks, so that multiple disks can return data at the same time and thus enable parallel I/O. Here, we have creative tension between three variables: disks, memory, and processors. With any luck, you have plenty of each, and you can balance the workload among them to build a system that's worth the time you've spent on it and the money you've invested in it. As for the hardware that doesn't get used? Well, at least it allowed a hardware sales rep to score the big commission that paid for his vacation to Hawaii. And, if nothing else, should you ever find yourself in Hawaii fishing while your little boat gets buffeted by the surf, that extra hardware plus a piece of rope will make a great anchor.

This article is originally published on DB2 Developer Domain, and must be reprinted with permission. The views expressed are those of the author, and not IBM.


IBM, AIX, and DB2 are trademarks or registered trademarks of IBM Corporation in the United States, other countries, or both.

Windows and Windows NT are registered trademarks of Microsoft Corporation in the United States, other countries, or both.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Other company, product, and service names may be trademarks or service marks of others.

IBM copyright and trademark information



About the author

Blair is a seventeen-year veteran of the IBM Toronto Lab. His first SQL statement (written in 1986 using 10base, the predecessor to FoxPro) eventually led to a full time job in relational database technology at IBM, beginning in 1987. In his current role Blair manages the DB2 Service Team, with the responsibility of ensuring that DB2 customers continue to enjoy the best software support in the database industry. Blair has many years of experience with customers through stints in marketing, service, development, and management.




Rate this page


Please take a moment to complete this form to help us better serve you.



 


 


Not
useful
Extremely
useful
 


Share this....

digg Digg this story del.icio.us del.icio.us Slashdot Slashdot it!



Back to top