Saturday, February 16, 2008

Transaction in .Net

What is a Transaction?
Frequently in a database application you come across a situation where you need to execute two or more SQL commands in such a way that if any one of the statements fails, then no other statement will be able to change the database. The classic example of this is transferring money from a bank account into another bank account:

UPDATE Accounts SET Balance = Balance – 10 WHERE Customer = 1;
UPDATE Accounts SET Balance = Balance + 10 WHERE Customer = 2;

If the first SQL statement was to execute and the second SQL statement was to fail, then ten dollars would be removed from the first customer’s account but will never be added to the second customer’s account. This would, naturally, be bad – ten dollars has just disappeared entirely.

One way to deal with this would be to handle the situation in your database access code. You could do this by catching System.Data.SqlClient.SqlException when you do your database access. The situation is not as easy as it first appears, however. The SQL statements could have failed in a number of places:

Before the first SQL statement is executed.
After the first SQL statement is executed.
After the second SQL statement is executed.
This means that you have to determine what the balances of both the accounts were before the transfer and, after the exception was thrown, determine which account’s current balance doesn’t match the starting balance, and execute an UPDATE statement. This can, potentially, be a lot of code – especially if your example is non-trivial.

The situation is made even worse by the fact that SQL Server is a multi-user environment. At any given point of time, other users could be accessing the Accounts table – and if they access that table between your statement failing and your corrections to the table, they could very well access invalid data. This would be extraordinarily bad, and could lead to immensely hard to find bugs in your code. (bugs relating to data synchronization across several threads or processes are so difficult to find, that they have their own name: ‘Heisenbugs’)

Modern database theory postulates that in a perfect transaction world, a database would have a series of properties known as ACID. These properties are:

Atomic – All statements in a group must execute, or no statement in a group must execute.
Consistent – This follows naturally from atomic – a group of SQL statements must take the database from a known starting state to a known ending state. If the statements execute, the database must be at the known ending state. If the statements fail, the database must be at the known starting state.
Isolated – A group of statements must execute independently from any other statement groups being executed at the same time. If this wasn’t the case, it would be impossible for statement groups to be consistent – the known ending state could be altered by a code you have no control over or knowledge of. This is one of those concepts that are great in theory, but total isolation has important performance implications in the real world. More on how SQL Server implements this is explained later.
Durable – Once the group of SQL statements execute, the results need to be stored in a permanent media – if the database crashes right after a group of SQL statements execute, it should be possible to restore the database state to the point after the last transaction committed.
In SQL Server, ACID-ness is provided by the concept of transactions. Simply put, a transaction is a way to group SQL statements together so that, when executed, the transaction obeys the ACID principles. A transaction is enabled on a single connection to the database (over the lifetime of a SqlConnection object, in .NET parlance) and will apply to all commands executed over that connection until the transaction ends. Once you have a transaction, there are two things you can do with it. You can either commit the transaction to the database at the end of the transaction, or you can abandon the transaction and rollback the changes made in the transaction.

In terms of Transact-SQL, there are three importance commands to manage a transaction. BEGIN TRANSACTION will begin a transaction, COMMIT TRANSACTION will commit the transaction to the database, and ROLLBACK TRANSACTION will roll the transaction back. These statements can actually be a little more complicated – feel free to refer to the MSDN documentation regarding these statements.

.NET Transactions
Within .NET, transactions are managed with the System.Data.SqlClient.SqlTransaction class. Again, a transaction exists over a SqlConnection object – and thus all the SqlCommand objects you create using that connection. Let's look at a quick example:

Collapsepublic class TransactionDemo
{
public TransactionDemo()
{

}

[STAThread]
public static void Main()
{
Demo1();
}

private static void Demo1()
{
SqlConnection db = new SqlConnection("connstringhere");
SqlTransaction transaction;

db.Open();
transaction = db.BeginTransaction();
try
{
new SqlCommand("INSERT INTO TransactionDemo " +
"(Text) VALUES ('Row1');", db, transaction)
.ExecuteNonQuery();
new SqlCommand("INSERT INTO TransactionDemo " +
"(Text) VALUES ('Row2');", db, transaction)
.ExecuteNonQuery();
new SqlCommand("INSERT INTO CrashMeNow VALUES " +
"('Die', 'Die', 'Die');", db, transaction)
.ExecuteNonQuery();
transaction.Commit();
}
catch (SqlException sqlError)
{
transaction.Rollback();
}
db.Close();
}
}

As you can see from this example, we first open a connection to the SQL database. We then call the BeginTransaction method on the connection, keeping a reference to the object that it returns. At this point, the connection is bound to the SqlTransaction object that was returned. This means that any SqlCommand executed on that connection will be within the transaction. You can see, within the try block, we create and execute three SqlCommand objects. You’ll notice, though, that in this case we’re using the strings, SqlConnection, SqlTransaction, overload of the SqlCommand constructor. This is because the SqlCommand object requires passing in the transaction bound to a connection – failing to do so will cause an exception to be thrown. This is, in my opinion, a weakness of the model – since a transaction is bound on a per-connection basis, and a SqlCommand object should be able to simply pull the SqlTransaction object out of the supplied SqlConnection.

In the example above, the first two SqlCommand executes are perfectly valid – TransactionDemo exists in the database. The CrashMeNow table, however, does not. Since the table does not exist, a SqlException object will be thrown on ExecuteNonQuery object. It is important to realize that having a transaction "does not replace the standard exception handling mechanism". If you think your statements might not execute, you have to catch SqlException and, within your catch block, rollback the transaction.

There are essentially, two operations you can use on the SqlTransaction object. Rollback will cancel your transaction, undoing all the changes that have been made. Commit will cause the transaction to be written to the database permanently. Either case will end the transaction.

If you execute the code above and look at the TransactionDemo table, you’ll see that no rows have been added – the transaction was rolled back after the exception was thrown. If you remove the offending line of SQL, though, and run the program and look again, you’ll see that two rows have been added. That is, essentially, Transactions in action.

Advanced Transactions – isolation levels
This is not, though, the limit of transactions. As I stated earlier in my description of ACID properties, transactions don’t necessarily meet the strictest definition of Isolated. This is because the isolation level of a transaction is configurable by you, when you create the transaction.

Why would you want to do this? Performance. While full isolation is wonderful in theory, so was communism. In reality, exclusively locking a set of rows while a transaction works on them may be unfeasible due to performance – you don’t necessarily want to stop every reader in the system from reading a table because your transaction locked the whole thing.

To alleviate this concern .NET (through SQL Server), provides the ability to specify isolation levels when you create a transaction. Doing this simply requires supplying a System.Data.IsolationLevel value to the BeginTransaction method. The available values for SQL Server 2000 are:

ReadUncommitted – This is, essentially, no isolation. Anyone can read the data placed in a table or updated immediately after the SQL statement causes the change – no commit is required. This could lead to a process having out-of-date data: it may be using a version of the data that was then rolled back out of the table!
ReadCommitted – This is slightly more isolated. In this case, a transaction can only read data from the table that has already been committed. When a transaction wants to update data, it acquires a shared lock on that data and (if successful getting the lock) updates the data. Transactions outside of that transaction cannot update the data in that table until the locking transaction commits. This is only slightly more isolated, however: a SQL statement executed twice within a transaction could return a different result-set if a second transaction changes and commits the data the SQL statement executes on between the two statements. This is the default isolation level for SqlTransaction.
RepeatableRead – Slowly getting more isolated. In this case, a shared lock is applied on all data queried within a transaction. This means that no other transaction can alter the data used in your transaction. This prevents the case where data you had queried once changes on subsequent queries. It does not, though, prevent the case where rows are added to the table that may be returned in subsequent queries.
Serializable – Locks are placed on ranges of the tables you are using, preventing other users from changing your data or adding new rows underneath you. This is the most isolated isolation level, but it does come with the drawback of locking more data than your transaction may strictly need.
In SQL Server 2005, a new isolation level will be added: snapshot isolation. In snapshot isolation, rows are versioned once they are accessed in a transaction. This essentially means that once a transaction accesses a set of values, they are guaranteed to remain the same until you commit or rollback the transaction. Other transactions starting in the middle of the first will get a ‘copy’ of the original database to operate on. Before any transaction commits, though, SQL Server will test to ensure that the original data they were operating on is the same as the current data in the database. If this is the case, the transaction will commit. Otherwise, the transaction will be rolled back and the user will have to try the batch once again.

Conclusion
Transactions are useful for several other things. First, they provide a way to rollback a group of SQL statements if a single one should. Remember that failure can mean more than just an error being returned. A failure can also be logically related - in the example above, perhaps the account being transferred from doesn’t have enough money to do the deposit. In that case, you can roll back the transaction when you discover that fact. Secondly, they provide a way to isolate the data your transaction is working on so that you don’t have to worry about surprises. In all cases, though, you should examine what isolation level you really need and be aware of the performance implications of all of them.

No comments: