Wednesday 26 December 2012

An Introduction To MySQL Storage Engines


-->
Introduction
MySQL provides support for thirteen different storage engines which act as varying table type handlers. Most people who use MySQL on a regular basis already know about the two most common storage engines, MyISAM and InnoDB. Most of the time, the default storage engine as defined by the store_engine option in the MySQL config file is typically MyISAM, and this is usually what most people go with. In fact, many people do not even take the time to select a storage-engine, and just use the default. You can also assign a storage engine to a specific table with the following syntax: CREATE TABLE tablename (column1, column2, [etc...]) ENGINE = [storage_engine].

What is a Storage Engine?
For those of you who do not know, a storage engine is what stores, handles, and retrieves information from a table. There is no "perfect" or "recommended" storage engine to use, but for most applications the default MyISAM is fine. In MySQL there are 10 different storage engines, though all of them may not be available to you. To get a list of supported storage engines on your 
MySQL Server, you can do:

mysql> show engines;MyISAM
MyISAM is the default storage engine of MySQL until version 5.5. This engine is an implementation of the well defined database storage ISAM architecture and has been available since MySQL 3.x.
 

Key features
• Non transactional
• Fast insert rate
• Btree based indexes
• Supports FULLTEXT index
• 16k data pages
• 4k index pages
 

Limitations
• Does not support transactions
• Table level locking on DML and DDL statements
• Not crash safe
• Relies on File System level cache to cache data pages
 

Important Parameters
• key_buffer_size This buffer is used for holding data from MyISAM indexes. MyISAM can support multiple index caches and enables pinning of specific table indexes per named buffer.
• table_cache This buffer holds information of open tables that are used when running queries. Will applicable to all storage engines, due to additional files, it is important to tune this parameter when there a lot of tables and complex queries.
• bulk_insert_buffer_size This buffer is used for improving INSERT statements with a large number of VALUES or INSERT . . . SELECT as well as LOAD DATA INFILE.
• myisam_recover This parameter defines the storage engine default recovery mode. A recommended value is FORCE, BACKUP
A MyISAM table is represented as 3 separate files in the file system located in the defined data directory for the MySQL instance. These files are:
 

• table.frm This is the table format definition file
• table.MYD This is the MyISAM data file
• table.MYI This is the MyISAM index file

MyISAM has three different row formats. By default MyISAM will determine whether to use Fixed or Dynamic format based on the column definitions specified in the table. The Fixed row format provides a calculation to determine the offset of the row within the data file, and can provide a small improvement in performance. You can force MyISAM to used Fixed format however this can lead to an increased disk footprint.  

InnoDB:
Compared to MyISAM, InnoDB provides many more feature to increase performance. There is some additional time spent during initial setup, as opposed to MyISAM, but the benefits far outweigh the time spent. One major difference is the ability to do row-level locking, as opposed to table-level locking, to increase performance time. This allows parallel INSERT/UPDATE/DELETE queries to be ran on the same table, unlike MyISAM where each query has to wait its turn to run.

Additionally, InnoDB provides foreign key functionality. This allows you to ensure that dependent data in table one is present before inserting data into table two. Likewise, it prevents data in table one from being deleted if there is data in table two that depends on it.

InnoDB also provides caching for data and indexes in memory, as well as on disk, which provides a large increase in performance gain. For those low on RAM this might not be the ideal solution, but with memory as cheap as it is this is becoming less of an issue.

Memory Storage Engine:
The Memory storage engine, also known historically as the Heap storage engine as the name implies is an in memory only table that does not provide data persistence. The Memory storage engine is actually used internally by the mysql kernel when a temporary table is required.
 

Key features
• Very fast, in memory
• Non transactional
• Support the hash index by default
• Btree indexes are also supported
• Ideal for primary key lookups

Limitations
• Does not support transactions
• Table level locking on DML and DDL statements
• Data is not persistent
• Does not support TEXT/BLOB data types
• Fixed row width
• No ability to limit the total amount of memory for all Memory tables

Important Parameters
• max_heap_table_size – Defines the maximum size of a single Memory table.
• tmp_table_size – Defines the maximum size of the table when used for internally temporary tables.
• init_file – Defines a SQL file of commands that are executed when the MySQL instance is started. Used as a means to seed Memory tables.

Memory tables are confined to a maximum table size as defined by the max_heap_table_size system variable. In the following example we can demonstrate the table full error message, and correct the maximum size, based on our size of the comparison Fixed MyISAM table size to show the data in a Memory table

Blackhole Storage Engine:
The Blackhole storage engine represents the same general characteristics of it’s celestial namesake. Information is accepted via DML statements, however this information cannot be retrieved as it is never actually stored.

Key features
Very fast
No disk space requirements
Ideal for benchmarking
Row level locking

Limitations
Does not support transactions
Does not store data

Important Parameters
N/A

While it may seem that this table is impractical in a production MySQL system it does serve specific benefits in isolated cases. The two primary reasons for using Blackhole are for security and for replication performance. When important information is stored in the MySQL table outside a firewall for example credit card details on a public website, the use of Blackhole and replication can ensure this data is never actually stored in the external database, accessible for possible attack. When used in replication, it can be used in a relay slave to speed up performance. It may also be used for selective tables on a master or slave servers to preserve the structure of the database schema and DML statements, however the data is never stored.

MERGE

The MERGE storage engine was added in MySQL 3.23.25. It enables users to have a collection of identical MyISAM tables to be handeled by a single table. There are constraints to this type, such as all tables needing to have the same definition, but the usefullness here quickly becomes apparently. If you store sales transactions on your site in a daily table, and want to pull a report for the month, this would allow you to execute a query against a single table to pull all sales for the month.

BDB (BerkeleyDB)

The BDB handles transaction-safe tables and uses a hash based storage system. This allows for some of the quickest reading of data, especially when paired with unique keys. There are, however, many downfalls to the BDB system, including the speed on un-index rows, and this makes the BDB engine a less than perfect engine choice. Because of this, many people tend to overlook the BDB engine. I feel, however, that it does have a place in database design when the right situation calls for it.

EXAMPLE

This storage engine was added in MySQL 4.1.3. It is a "stub" engine that serves no real purpose, except to programmers. EXAMPLE provides the ability to create tables, but no information can be inserted or retrieved. 

CSV

CSV, added in MySQL 4.1.4, stores data in text files using comma-separated values. As such this is not an ideal engine for large data storage, tables requiring indexing, etc. The best use-case for this is transferring data to a spreadsheet for later use. 

No comments:

Post a Comment