Transactions
From Neo4j Wiki
The basics
All operations that work with the graph, whether reads or writes, must be performed in a transaction. Neo4j transactions are fully ACID.
There is no fetch/store cycle other than the transaction itself. This means that the Neo4j storage interaction cycle looks like this:
- Begin a transaction.
- Operate on the graph.
- Mark the transaction as successful (or not).
- Finish the transaction.
Consider both logical units of work and performance when designing your transactions: by choosing where a transaction begins and finishes, you divide the storage interactions into logical units. Fortunately, the Transaction class makes this very easy. Here's the idiomatic use of transactions in Neo4j:
Transaction tx = graphDb.beginTx();
try
{
    ... // operations that work with the graph
    tx.success();
}
finally
{
    tx.finish();
}
Let's walk through this example line by line. First we retrieve a Transaction object by invoking the GraphDatabaseService.beginTx() method. This creates a new Transaction instance which has internal state to keep track of whether it is successful or not. The default state is unsuccessful. Then we wrap all operations that work with the graph in a try-finally block. At the end of the try block, we invoke the tx.success() method to indicate that the transaction is successful. As we exit the block, the finally clause kicks in and tx.finish() commits the transaction (since it is now marked as successful).
Finally, transactions are thread-confined: a transaction belongs to the thread that began it.
Controlling success
Neo4j takes a pessimistic approach to the success of a transaction: you must explicitly mark it as successful to be able to commit it. Controlling whether a transaction is successful or not is most naturally done using normal exception handling. In the example above, if an exception were thrown from somewhere in the try block, tx.success() wouldn't be called. The transaction would then still be unsuccessful when tx.finish() was called, which in turn would cause it to be rolled back.
Another, slightly different, way is to call failure(), which marks the transaction as failed so that it must be rolled back. This is mostly useful in nested transactions.
Important: unless success() is invoked, the transaction will not be committed upon finish().
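The rules above can be modelled in a few lines of plain Java. This is only a sketch of the described behaviour: SketchTx and ControllingSuccessSketch are hypothetical names invented for illustration and are not part of Neo4j, and "commit" and "roll back" are reduced to a string.

```java
// Toy model of the success()/failure()/finish() state described above.
// SketchTx is a hypothetical class for illustration, not Neo4j's Transaction.
class SketchTx {
    private boolean success = false; // default state: unsuccessful
    private boolean failure = false;
    private String outcome = "open";

    void success() { success = true; }
    void failure() { failure = true; }

    // Commits only if success() was called and failure() was not.
    void finish() {
        outcome = (success && !failure) ? "committed" : "rolled back";
    }

    String outcome() { return outcome; }
}

class ControllingSuccessSketch {
    // Runs work inside the try/finally idiom and reports what finish() decided.
    static String run(Runnable work) {
        SketchTx tx = new SketchTx();
        try {
            try {
                work.run();
                tx.success(); // skipped if work.run() throws
            } finally {
                tx.finish();
            }
        } catch (RuntimeException e) {
            // Swallowed only so that this sketch can return the outcome.
        }
        return tx.outcome();
    }

    public static void main(String[] args) {
        System.out.println(run(() -> { }));
        System.out.println(run(() -> { throw new IllegalStateException("boom"); }));
    }
}
```

The first call prints "committed" and the second prints "rolled back", because the exception prevents success() from ever being reached.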
Nested transactions
Neo4j does not support true nested transactions. Instead, Neo4j uses flat nested transactions. This means that if you're already in a transaction when calling beginTx(), Neo4j won't begin a new transaction for you but will instead hook into the one that is already running. What you get back is something like a placebo transaction, which works like this:
- success() has no effect. It's only the top level transaction which cares about that state.
- failure() works the same as for the top level transaction - it will force the entire transaction (wherever it was started) to be rolled back when finished.
- finish() has no effect. It's only the top level transaction which can commit/roll back the transaction.
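The three placebo rules above can be sketched as a small in-memory model. FlatTxManager, its inner Tx class and FlatNestingSketch are hypothetical names made up for this illustration; how Neo4j implements flat nesting internally is not shown here and may well differ.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of flat nesting. FlatTxManager and Tx are hypothetical names
// for illustration; they are not Neo4j classes.
class FlatTxManager {
    private int depth = 0;                 // number of open beginTx() calls
    private boolean success = false;       // state of the top-level transaction
    private boolean rollbackOnly = false;  // set by failure() at any level
    final List<String> log = new ArrayList<>();

    Tx beginTx() {
        boolean topLevel = (depth == 0);
        if (topLevel) {
            success = false;
            rollbackOnly = false;
        }
        depth++;
        return new Tx(topLevel);
    }

    class Tx {
        private final boolean topLevel;

        Tx(boolean topLevel) { this.topLevel = topLevel; }

        // Ignored for placebo transactions.
        void success() { if (topLevel) success = true; }

        // Honoured at every level: the whole transaction must roll back.
        void failure() { rollbackOnly = true; }

        // Only the top-level transaction commits or rolls back.
        void finish() {
            depth--;
            if (!topLevel) return;
            log.add(success && !rollbackOnly ? "commit" : "rollback");
        }
    }
}

class FlatNestingSketch {
    // innerFails decides whether the inner (placebo) transaction calls failure().
    static String run(boolean innerFails) {
        FlatTxManager db = new FlatTxManager();
        FlatTxManager.Tx outer = db.beginTx();      // real transaction
        try {
            FlatTxManager.Tx inner = db.beginTx();  // placebo transaction
            try {
                if (innerFails) inner.failure();
                inner.success();                    // no effect
            } finally {
                inner.finish();                     // no effect
            }
            outer.success();
        } finally {
            outer.finish();                         // the only commit/rollback
        }
        return String.join(",", db.log);
    }
}
```

In this model run(false) yields a single "commit" and run(true) a single "rollback": the inner finish() never records anything on its own.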
Nested transactions example 1
Here's an example of a nested transaction scenario.
public void methodA()
{
    Transaction tx = graphDb.beginTx(); // real transaction
    try
    {
        Node node1 = graphDb.createNode();
        methodB();
        Node node3 = graphDb.createNode();
        tx.success();
    }
    finally
    {
        tx.finish(); // will commit the creation of node1 and node3 as well as node2 (created in methodB)
    }
}

public void methodB()
{
    Transaction tx = graphDb.beginTx(); // placebo transaction when called from methodA
    try
    {
        Node node2 = graphDb.createNode();
        tx.success(); // has no effect when called from methodA!
    }
    finally
    {
        tx.finish(); // has no effect when called from methodA!
    }
}
Nested transactions example 2
Here's an example of how an exception in a nested transaction could cause the top level transaction to be rolled back.
public void methodA()
{
    Transaction tx = graphDb.beginTx(); // real transaction
    try
    {
        Node node1 = graphDb.createNode();
        methodB();
        Node node3 = graphDb.createNode();
        tx.success();
    }
    catch ( SomeException e )
    {
        // Exception handling code of your choice
    }
    finally
    {
        tx.finish(); // will cause all changes (node1, node2 and node3) to be rolled back
                     // if the SomeException is thrown.
    }
}

public void methodB() throws SomeException
{
    Transaction tx = graphDb.beginTx(); // placebo transaction when called from methodA
    try
    {
        Node node2 = graphDb.createNode();
        if ( ... )
        {
            throw new SomeException( "Something went wrong" );
        }
        tx.success(); // has no effect when called from methodA!
    }
    finally
    {
        tx.finish(); // has no effect when called from methodA!
    }
}
Note that failure() doesn't need to be used to cause a transaction to be rolled back. Here you can see how natural it is to control the success of the transaction, using normal exception handling.
Nested transactions example 3
This is like example 2, but using failure() instead of exception handling. It's not used very often and should only be used when a certain condition arises which makes it impossible/invalid to commit the transaction and you can't/won't signal that failure with an exception.
public void methodA()
{
    Transaction tx = graphDb.beginTx(); // real transaction
    try
    {
        Node node1 = graphDb.createNode();
        methodB();
        Node node3 = graphDb.createNode();
        tx.success();
    }
    finally
    {
        tx.finish(); // will roll back all changes (node1, node2 and node3)
                     // if methodB called tx.failure().
    }
}

public void methodB()
{
    Transaction tx = graphDb.beginTx(); // placebo transaction when called from methodA
    try
    {
        Node node2 = graphDb.createNode();
        if ( ... )
        {
            tx.failure();
        }
        // success() has no effect when called from methodA, and it cannot
        // undo an earlier tx.failure() either
        tx.success();
    }
    finally
    {
        tx.finish(); // has no effect when called from methodA!
    }
}
See Best practices for when and where to wrap your code in transactions.
Isolation
By default a read operation (e.g. reading a property value on a node) will always read the last committed value and will never block. This means that a scenario like this is possible:
- Thread (T1) starts a transaction and reads a property value from node (N). (T1) then goes on to do some other work in the same transaction.
- Thread (T2) starts a transaction and changes that property value for node (N) and commits the transaction.
- (T1) reads the property value (still in the same transaction) again from node (N) which will now have the value which (T2) assigned.
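The scenario above can be imitated in plain Java. In this sketch a ConcurrentHashMap stands in for the last committed state and a second thread plays (T2); the class name, the map and the key "N.name" are all made up for illustration, and no Neo4j API is involved.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Plain-Java stand-in for "reads always see the last committed value".
// The map plays the role of committed state; none of this is Neo4j API.
class NonRepeatableReadSketch {
    // Returns the two values T1 reads for the same property.
    static String[] run() {
        ConcurrentMap<String, String> committed = new ConcurrentHashMap<>();
        committed.put("N.name", "old");

        // T1: first read inside its transaction
        String first = committed.get("N.name");

        // T2: changes the property and commits while T1 is still running
        Thread t2 = new Thread(() -> committed.put("N.name", "new"));
        t2.start();
        try {
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }

        // T1: second read in the same transaction now sees T2's value
        String second = committed.get("N.name");
        return new String[] { first, second };
    }
}
```

Here run() returns "old" followed by "new": the same read, repeated inside one unit of work, gives two different answers.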
A write operation will grab a write lock on the specific node or relationship and block other transactions from modifying that resource as long as the transaction lives. Such locks will be released when the transaction finishes.
You can grab and release read/write locks yourself if the situation requires it (see LockManager). Remember these characteristics though:
- A thread can grab a read lock on a resource (node or relationship) in the current transaction if there are no write locks taken for that resource. I.e. there can be many different threads having read locks on the same resource at any given moment. If there should be a write lock taken the thread grabbing the read lock will wait for the resource to become available.
- A thread can grab a write lock on a resource (node or relationship) in the current transaction if there are no other locks taken for that resource. If there should be a lock taken the thread grabbing the write lock will wait for the resource to become available.
- A thread can always grab a lock if it has already grabbed (and not yet released) it in the current transaction.
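These rules closely resemble those of a standard read/write lock. As an analogy only, the sketch below uses java.util.concurrent's ReentrantReadWriteLock rather than Neo4j's own lock manager, and it uses tryLock() so that "would have to wait" shows up as false instead of blocking. LockRulesSketch is a hypothetical name.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Analogy using java.util.concurrent, not Neo4j's lock manager.
class LockRulesSketch {
    // Calls tryLock() on another thread and reports whether it succeeded.
    private static boolean tryOnOtherThread(Lock lock) {
        boolean[] got = new boolean[1];
        Thread t = new Thread(() -> {
            got[0] = lock.tryLock();
            if (got[0]) lock.unlock();
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return got[0];
    }

    static boolean[] run() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        boolean[] results = new boolean[4];

        rw.readLock().lock();                           // this thread holds a read lock
        results[0] = tryOnOtherThread(rw.readLock());   // another reader: allowed
        results[1] = tryOnOtherThread(rw.writeLock());  // a writer: would have to wait
        rw.readLock().unlock();

        rw.writeLock().lock();                          // this thread holds the write lock
        results[2] = tryOnOtherThread(rw.readLock());   // a reader: would have to wait
        results[3] = rw.writeLock().tryLock();          // same thread again: allowed
        rw.writeLock().unlock();                        // release the re-entrant hold
        rw.writeLock().unlock();                        // release the original hold
        return results;
    }
}
```

run() returns true, false, false, true. One difference worth keeping in mind: ReentrantReadWriteLock does not allow a held read lock to be upgraded to a write lock, so the third rule is only shown here for re-acquiring the same kind of lock.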
Deadlocks
Deadlocks can occur if your code manages its own synchronization in a way that conflicts with Neo4j. They can also occur under normal circumstances, so you may have to be aware of them and handle them in certain parts of your application.
They can happen because the Neo4j graph database grabs write locks on the resources (nodes and relationships) it modifies and holds those locks until the transaction is finished. If thread (A) wants to grab a write lock that is held by another thread (B), thread (A) waits indefinitely for it to become available. If thread (A) in turn already holds a lock which thread (B) wants, thread (B) would also have to wait indefinitely. When such a scenario arises, Neo4j detects it and throws a DeadlockDetectedException rather than grabbing the final lock that would deadlock threads (A) and (B).
To avoid it you can either:
- Rewrite your code, making sure that such scenarios won't happen.
- Run your deadlock-prone code in a try-catch(DeadlockDetectedException) block and just rerun the entire transaction if such an exception is caught, like this:
for ( int i = 0; i < 10; i++ )
{
    Transaction tx = graphDb.beginTx();
    try
    {
        doGraphDbOperations();
        tx.success();
        break; // done: the finally block still runs and commits before the loop exits
    }
    catch ( DeadlockDetectedException e )
    {
        // Log this occurrence of deadlock, then let the loop retry
    }
    finally
    {
        tx.finish();
    }
}
Note that Neo4j components, such as indexes, handle rollbacks correctly, so rerunning transactions like this won't be a problem.
Both these options are viable for handling deadlocks in Neo4j.
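For the first option, one widely used technique is to make every thread take its locks in the same order, for instance by ascending id. The sketch below illustrates that idea with plain java.util.concurrent locks standing in for the write locks Neo4j takes on modified resources; Resource, move() and LockOrderingSketch are hypothetical names, and applying this in a real application would mean modifying nodes and relationships in a consistent order.

```java
import java.util.concurrent.locks.ReentrantLock;

// Lock-ordering sketch; ReentrantLock stands in for Neo4j's write locks.
class LockOrderingSketch {
    static final class Resource {
        final long id;
        int value;
        final ReentrantLock lock = new ReentrantLock();

        Resource(long id, int value) {
            this.id = id;
            this.value = value;
        }
    }

    // Moves amount between two resources, always locking the lower id first,
    // so two threads can never hold one lock each while waiting for the other.
    static void move(Resource from, Resource to, int amount) {
        Resource first = from.id < to.id ? from : to;
        Resource second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.value -= amount;
                to.value += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Two threads move values in opposite directions at the same time.
    static int[] run() {
        Resource a = new Resource(1, 1000);
        Resource b = new Resource(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) move(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) move(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new int[] { a.value, b.value };
    }
}
```

Without the ordering step (locking from first and to second in each thread), the two threads could each hold one lock while waiting for the other, which is exactly the situation described above.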
Best practices
So you're asking: "Which parts of my code should be wrapped in transactions?" The answer is the boring "it depends...", but there are a few pointers which I think cover most cases.
Library code
For library code (code which is intended to be used by other components/apps) it's good to have every public method manage its own transaction, e.g.
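public class MyClass
{
    private final GraphDatabaseService graphDb;

    public MyClass( GraphDatabaseService graphDb )
    {
        this.graphDb = graphDb; // injected from outside
    }

    public void doStuff( ... )
    {
        Transaction tx = graphDb.beginTx();
        try
        {
            // Do your graph stuff here
            tx.success();
        }
        finally
        {
            tx.finish();
        }
    }

    private String myInternalMethod( ... )
    {
        // Do your graph stuff here, we can assume we're in a transaction
    }
}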
It's common practice to use dependency injection to inject the GraphDatabaseService into your object. Wrapping public methods like this gives the most flexibility, since code which uses such a method can call it with or without wrapping it in its own transaction, depending on the situation.
Application code
For application code (top level code which uses other libraries to form a product or service of some sort) it's generally a good idea to manage transactions in a few key places so that the majority of the application code can completely ignore transaction management. For example:
- In a Model-View-Controller setup a good fit is to wrap each "user action" in its own transaction, i.e. in the Controller. This helps keep the domain data consistent, since each "user action" (often consisting of several Neo4j operations) will either completely succeed (all those operations will be committed) or not succeed at all (none of the operations will be committed). See the MVC example.
- Even if an MVC model isn't used it's generally a good idea to manage transactions at a similar level.
Big transactions
The problem
The state of all operations made in a transaction in Neo4j is kept in memory. This, of course, limits the number of (uncommitted) operations a transaction can hold before the JVM runs out of memory. However, if you're using a reasonable heap size, a transaction should be able to hold several hundred thousand operations before such boundaries are hit.
Solutions
Problems with big transactions running out of memory only really occur when doing large batch jobs, e.g. a first-time insertion of a big data set. In such cases the batch inserter could be the solution, since it doesn't use transactions but instead writes directly to the persistence layer.
If you can't use the batch inserter in your particular use case, you'll have to commit your transaction regularly instead. This is most commonly done at a predefined interval, e.g. every 50 000 operations or so. The drawback is that you cannot, at a late stage in your batch job, roll back the entire job, since some of the operations have already been committed. See the code below for an example of such a solution:
public void doBigBatchJob()
{
    Transaction tx = graphDb.beginTx();
    try
    {
        Traverser myTraverser = ... // Traverse some huge data set
        int counter = 0;
        for ( Node node : myTraverser )
        {
            // Do some operations, e.g. set a property, create node/relationship
            if ( ++counter % 50000 == 0 )
            {
                // Commit the transaction every now and then
                tx.success();
                tx.finish();
                tx = graphDb.beginTx();
            }
        }
        tx.success();
    }
    finally
    {
        tx.finish();
    }
}
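To see what the periodic commit buys you, the counting logic of the loop can be run on its own. The sketch below involves no database: PeriodicCommitSketch is a hypothetical name, each loop iteration stands for one unit of work, and the commit points are only counted.

```java
// Counting-only model of the periodic commit loop; no database involved.
class PeriodicCommitSketch {
    // Returns {number of commits, units of work covered by the final commit}.
    static int[] run(int totalOps, int interval) {
        int commits = 0;
        int pending = 0;                  // work not yet committed
        for (int counter = 1; counter <= totalOps; counter++) {
            pending++;                    // one unit of work
            if (counter % interval == 0) {
                commits++;                // stands for success(), finish(), beginTx()
                pending = 0;
            }
        }
        int coveredByFinalCommit = pending;
        commits++;                        // the success()/finish() after the loop
        return new int[] { commits, coveredByFinalCommit };
    }
}
```

For 120 000 units of work and an interval of 50 000, run() reports three commits, the last of which covers the remaining 20 000 units. The amount of uncommitted work held at any one time never exceeds the interval, which is what keeps memory use bounded.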
Examples
MVC example
In an MVC environment it's quite natural to manage your transactions in the Controller. In a basic servlet environment, for example, the transactions are managed in the servlet. This takes that load off both the Model and the Views. There may be cases where Views need to manage transactions as well, e.g. JSP pages which communicate directly with the Model.
Assume a simple model:
public class Person
{
    private final Node underlyingNode;

    public Person( Node underlyingNode )
    {
        this.underlyingNode = underlyingNode;
    }

    public String getName()
    {
        return ( String ) underlyingNode.getProperty( "name" );
    }

    public void setEmailAddress( String emailAddress )
    {
        underlyingNode.setProperty( "email", emailAddress );
    }
}

public class Controller extends HttpServlet
{
    @Override
    public void doGet( HttpServletRequest req, HttpServletResponse resp )
    {
        Transaction tx = graphDb.beginTx();
        try
        {
            // Execute actions depending on the request, e.g.:
            // 'set person P's email address to xxx@gmail.com'
            // Forward to an appropriate View.
            tx.success();
        }
        finally
        {
            tx.finish();
        }
    }
}

