Design Guide
From NeoWiki
So you have read and understood the Getting Started In One Minute Guide, completed the Getting Started Guide, and have a basic understanding of the Neo API, but still don't really know how to begin using Neo in your project? Then this is the guide for you. In this document we provide how-tos and design guidelines for a number of common scenarios that you will face in a typical project.
Over the summer of 2008, we're going to make a real effort to shape up our documentation. As part of that project, we will retrofit this guide into several other documents. We've decided to leave it here until then, but please note that there may still occasionally be some rough edges. If you disagree with something, please join the discussions on the mailing list or feel free to edit this page directly.
Expect this document to keep growing with more topics as they emerge, and if you think there are topics that deserve attention, feel free to suggest them.
How to wrap nodes in POJOs
- POJO is an acronym for Plain Old Java Object
Basic structure
Let us consider an example. You are building a shiny new community website with e-commerce functionality, and this time you want to store all your customers in Neo. You have concluded that you need a Customer interface to represent your customers. So how do you implement this in Neo?
public interface Customer
{
public void setFirstName( String firstName );
public String getFirstName();
// ...
}
A common pattern is to wrap a Neo node within a POJO that implements the domain interface. In our Customer example, this means:
import org.neo4j.api.core.Node;
public class CustomerImpl implements Customer
{
private final Node underlyingNode;
private static final String KEY_FIRST_NAME = "firstName";
CustomerImpl( Node underlyingNode )
{
this.underlyingNode = underlyingNode;
}
public void setFirstName( String firstName )
{
underlyingNode.setProperty( KEY_FIRST_NAME, firstName );
}
public String getFirstName()
{
return ( String ) underlyingNode.getProperty( KEY_FIRST_NAME );
}
// ...
}
So we have a POJO called CustomerImpl that wraps a Neo node. CustomerImpl is a wrapper object that delegates all operations to the underlying node.
The code that creates the Customer object is responsible for first creating a new Node and supplying it in the constructor when creating the Customer object. You might prefer centralizing the creational code in a factory rather than spreading it around in the application. See the section below for more tips on how to do that.
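Because the wrapper holds no state of its own, every getter and setter is a one-line delegation to the node. The sketch below shows the same pattern without a running Neo instance so it can be tried in isolation. FakeNode is a hypothetical in-memory stand-in for org.neo4j.api.core.Node (it only mimics setProperty/getProperty) and is not part of the Neo API; SketchCustomer and SketchCustomerImpl mirror the Customer/CustomerImpl pair above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Neo Node: nothing but a property map.
class FakeNode
{
    private final Map<String, Object> properties = new HashMap<String, Object>();

    void setProperty( String key, Object value )
    {
        properties.put( key, value );
    }

    Object getProperty( String key )
    {
        Object value = properties.get( key );
        if ( value == null )
        {
            // Neo throws NotFoundException here; the stand-in uses a JDK exception
            throw new IllegalStateException( "No such property: " + key );
        }
        return value;
    }
}

interface SketchCustomer
{
    void setFirstName( String firstName );
    String getFirstName();
}

// Stateless wrapper: every operation is delegated to the underlying node.
class SketchCustomerImpl implements SketchCustomer
{
    private static final String KEY_FIRST_NAME = "firstName";
    private final FakeNode underlyingNode;

    SketchCustomerImpl( FakeNode underlyingNode )
    {
        this.underlyingNode = underlyingNode;
    }

    public void setFirstName( String firstName )
    {
        underlyingNode.setProperty( KEY_FIRST_NAME, firstName );
    }

    public String getFirstName()
    {
        return ( String ) underlyingNode.getProperty( KEY_FIRST_NAME );
    }
}
```

Two wrappers created around the same node see the same data, which is what makes it safe to recreate a wrapper from a bare node at any time.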
How to reach the enclosing POJO from the node instance
So you have a Node instance and you know what type it's supposed to be, but don't know how to turn it into its wrapper? If that feels hard, maybe you've been guilty of adding state to your POJOs. As long as your POJO holds no state other than the Node itself, all you have to do is:
Customer customer = new CustomerImpl( node );
Coding relationships between objects
Now consider the possibility that your customers buy things on your website. You want to model this by creating an Order object and connecting that object to the Customer.
public interface Order
{
public void setOrderNumber(long orderNumber);
public long getOrderNumber();
public Customer getCustomer();
}
The Order implementation would hold the underlying node and implement setOrderNumber()/getOrderNumber() the same way CustomerImpl implements setFirstName()/getFirstName(), but what about the getCustomer() method? How do we connect an order to a customer? By using relationships! We want to create a relationship as illustrated by the ASCII art below:
(Customer Node)---CUSTOMER_TO_ORDER--->(Order Node)
This is the same as saying that the Customer has a relationship to a node representing an Order, meaning the Customer "ordered" the Order.
To code this, we first need to define the CUSTOMER_TO_ORDER relationship type in our realization of the RelationshipType interface (adding relationship types is covered in the Getting Started Guide and in the API documentation for NeoService).
import org.neo4j.api.core.Direction;
import org.neo4j.api.core.Node;
public class OrderImpl implements Order
{
private final Node underlyingNode;
// ...
public Customer getCustomer()
{
Node customerNode = underlyingNode.getSingleRelationship(
RelationshipTypes.CUSTOMER_TO_ORDER, Direction.INCOMING ).getStartNode();
return new CustomerImpl( customerNode );
}
}
So every time getCustomer() is called on OrderImpl, it retrieves the incoming relationship of type CUSTOMER_TO_ORDER and fetches the start node of that relationship. The start node of a CUSTOMER_TO_ORDER relationship represents a customer, so we can simply pass that node into the CustomerImpl constructor.
What if there is no such relationship to our Order node? getSingleRelationship() will return null and we'll get a NullPointerException! In this example, it seems reasonable that an Order has exactly one Customer (never zero, never two), and if for some reason this is not the case, it represents a fatal error that should generate an unchecked exception. getSingleRelationship() is a convenience method designed for this commonly occurring scenario and thus does the Right Thing™ by returning null.
Actually, getSingleRelationship() behaves correctly "out of the box" in two scenarios:
- The first is when we have a domain invariant stating that a concept should at all times have an association to exactly ONE instance of another concept. This is true of our Customer/Order example where, if we find ourselves without a Customer for an Order, we're in big trouble and should raise an unchecked exception. (See item 40, Effective Java.) In this scenario, we simply use getSingleRelationship() and assume that it will return a valid relationship. If it returns null, well, there we have our unchecked exception.
- The second is when we have either ZERO or ONE associations to another concept. You could imagine in our example that an Order could optionally be put in a Cart. Sometimes it wouldn't be (ZERO relationships), sometimes it would (ONE relationship), but it would never belong to two or more Carts. In this scenario, we use getSingleRelationship() as above but add a check for whether it returns null and handle that case appropriately.
In both scenarios, if getSingleRelationship() finds more than one relationship of the given type and direction, it raises an unchecked exception.
For more information, see the API documentation of getSingleRelationship().
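The contract described above (null for zero matches, the match itself for exactly one, an unchecked exception for more than one) can be sketched over a plain list, together with the two calling styles. singleOrNull, customerOf and cartOf are hypothetical helpers for illustration only, not Neo API; in Neo you call getSingleRelationship() directly. Unlike the guide, which lets the NullPointerException surface, customerOf throws an explicit IllegalStateException; both are unchecked.

```java
import java.util.List;

class SingleRelationshipSketch
{
    // Hypothetical helper mimicking the contract of getSingleRelationship():
    // zero matches -> null, one match -> that element, more -> unchecked exception.
    static <T> T singleOrNull( List<T> matches )
    {
        if ( matches.isEmpty() )
        {
            return null;
        }
        if ( matches.size() > 1 )
        {
            throw new IllegalStateException( "More than one match: " + matches.size() );
        }
        return matches.get( 0 );
    }

    // Scenario 1: exactly ONE is a domain invariant, so null is a fatal error.
    static String customerOf( List<String> customerRels )
    {
        String customer = singleOrNull( customerRels );
        if ( customer == null )
        {
            throw new IllegalStateException( "Order without a Customer" );
        }
        return customer;
    }

    // Scenario 2: ZERO or ONE is legal, so null is handled explicitly.
    static String cartOf( List<String> cartRels )
    {
        String cart = singleOrNull( cartRels );
        return cart != null ? cart : "no cart";
    }
}
```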
Adding a relationship between wrapped nodes
A common need is to connect two objects. For example, we would need to connect an Order object to a Customer object.
Example:
To add an Order to a Customer object, we define the following method in the Customer interface:
public void addOrder( Order order );
As we have described above, we have a relationship type CUSTOMER_TO_ORDER that is used to relate a Customer node to an Order node. So our implementation should then look something like:
public void addOrder( Order order )
{
underlyingNode.createRelationshipTo( orderNode, RelationshipTypes.CUSTOMER_TO_ORDER );
}
But wait, we have a problem here: we don't actually have access to orderNode, since we only have the wrapped Order object, not its node.
This problem will come up so frequently that we need to make the inner node available.
Solution: as long as you place all Neo wrapper classes in the same package, you can give each of them the following package-private method:
Node getUnderlyingNode()
{
return underlyingNode;
}
Since the method has no access modifier (package-private), the underlying node is available to all classes in the same package as the wrapper, including subclasses in that package, but not to code outside it. With this in mind, we can now rewrite our addOrder method so that it works:
public void addOrder( Order order )
{
Node orderNode = ( ( OrderImpl ) order ).getUnderlyingNode();
getUnderlyingNode().createRelationshipTo( orderNode, RelationshipTypes.CUSTOMER_TO_ORDER );
}
Going from a collection of nodes to a collection of wrappers
Going from a node to its enclosing wrapper POJO is trivial with a "properly" designed POJO domain layer, as shown earlier in this section. Going from a collection of nodes to a collection of their wrapper POJOs is equally easy, but necessarily requires a bit more code. The basic idea is to iterate through the collection, create a wrapper for every node, and put it into a new collection.
For example, if we wanted to provide a method that returns all orders from a specific customer, it could look something like this:
public Collection<Order> getOrders()
{
Collection<Node> orderNodes = getOrderNodes();
return wrapNodeCollection( orderNodes );
}
private Collection<Node> getOrderNodes()
{
Traverser traverser = underlyingNode.traverse(
Traverser.Order.BREADTH_FIRST, StopEvaluator.DEPTH_ONE,
ReturnableEvaluator.ALL_BUT_START_NODE,
RelationshipTypes.CUSTOMER_TO_ORDER, Direction.OUTGOING );
return traverser.getAllNodes();
}
private Collection<Order> wrapNodeCollection( Collection<Node> orderNodes )
{
Collection<Order> orderCollection = new LinkedList<Order>();
for ( Node node : orderNodes )
{
orderCollection.add( new OrderImpl( node ) );
}
return orderCollection;
}
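Since a method like wrapNodeCollection() would otherwise be repeated for every domain type, it can be generalized once. A minimal sketch, assuming Java 8 or later for java.util.function.Function (the Neo API of this era targets Java 5, so on an older JVM you would declare a small one-method wrapper interface instead); Wrappers.wrapAll is a hypothetical name, not part of Neo.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Function;

final class Wrappers
{
    private Wrappers()
    {
    }

    // Eagerly wraps every element, e.g. wrapAll( orderNodes, OrderImpl::new )
    static <N, W> List<W> wrapAll( Collection<N> nodes, Function<? super N, ? extends W> wrapper )
    {
        List<W> wrapped = new ArrayList<W>( nodes.size() );
        for ( N node : nodes )
        {
            wrapped.add( wrapper.apply( node ) );
        }
        return wrapped;
    }
}
```

With it, getOrders() reduces to wrapping the result of getOrderNodes() with OrderImpl's constructor, which works as a method reference as long as the caller is in the same package as OrderImpl.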
However, if you're dealing with large numbers of nodes (millions), it's more efficient to use the iterator idiom, since wrappers are then created one at a time as the caller asks for them instead of all up front. Neo makes use of the iterator idiom internally, which makes this approach even more efficient. The basic idea is to create an Iterator that wraps the iterator of a traverser (a Traverser is a Java Iterable) representing our collection. Rewritten with this in mind, the new getOrders() looks like this:
public Iterator<Order> getOrders()
{
return new Iterator<Order>()
{
private final Iterator<Node> iterator = underlyingNode.traverse(
Traverser.Order.BREADTH_FIRST,
StopEvaluator.DEPTH_ONE,
ReturnableEvaluator.ALL_BUT_START_NODE,
RelationshipTypes.CUSTOMER_TO_ORDER,
Direction.OUTGOING ).iterator();
public boolean hasNext()
{
return iterator.hasNext();
}
public Order next()
{
Node nextNode = iterator.next();
return new OrderImpl( nextNode );
}
public void remove()
{
iterator.remove();
}
};
}
Does this seem like an ample opportunity for a utility that creates and manages Neo-backed POJOs and their collection views of each other? You're right. It is. But more about that later.
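In the meantime, the anonymous iterator above is easy to generalize on your own. Below is a hedged sketch of such a utility: a lazy Iterable that wraps each element only when next() is called. WrappingIterable is a hypothetical class (not part of Neo), and Function again assumes Java 8 or later. To keep the sketch read-only it rejects remove() instead of forwarding it.

```java
import java.util.Iterator;
import java.util.function.Function;

// Lazily maps each element of an underlying Iterable (for example a Traverser)
final class WrappingIterable<N, W> implements Iterable<W>
{
    private final Iterable<N> source;
    private final Function<? super N, ? extends W> wrapper;

    WrappingIterable( Iterable<N> source, Function<? super N, ? extends W> wrapper )
    {
        this.source = source;
        this.wrapper = wrapper;
    }

    public Iterator<W> iterator()
    {
        final Iterator<N> it = source.iterator();
        return new Iterator<W>()
        {
            public boolean hasNext()
            {
                return it.hasNext();
            }

            public W next()
            {
                // The wrapper is created only when the caller asks for it
                return wrapper.apply( it.next() );
            }

            public void remove()
            {
                throw new UnsupportedOperationException( "read-only view" );
            }
        };
    }
}
```

Usage could look like `for ( Order order : new WrappingIterable<Node, Order>( traverser, OrderImpl::new ) )`, assuming the caller sits in the same package as OrderImpl.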
Checking for equality between two wrapper POJOs
Maybe you have an Order instance and wonder whether it is equal to some other instance. Just override equals() by forwarding it to the underlying node, and override hashCode() the same way so that the two stay consistent.
@Override
public boolean equals( Object obj )
{
if (obj instanceof OrderImpl)
{
return getUnderlyingNode().equals(
( (OrderImpl) obj ).getUnderlyingNode() );
}
return false;
}
@Override
public int hashCode()
{
return getUnderlyingNode().hashCode();
}
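Overriding hashCode() alongside equals() is what lets wrappers act as keys in hash-based collections. In this self-contained sketch, a plain Object is a hypothetical stand-in for the Neo Node, and its own equals() (object identity) plays the role of Node.equals(). Two wrappers created around the same node collapse into one entry in a HashSet, while a wrapper around a different node does not.

```java
import java.util.HashSet;
import java.util.Set;

final class SketchOrder
{
    // Hypothetical stand-in for org.neo4j.api.core.Node
    private final Object underlyingNode;

    SketchOrder( Object underlyingNode )
    {
        this.underlyingNode = underlyingNode;
    }

    Object getUnderlyingNode()
    {
        return underlyingNode;
    }

    @Override
    public boolean equals( Object obj )
    {
        if ( obj instanceof SketchOrder )
        {
            return getUnderlyingNode().equals( ( (SketchOrder) obj ).getUnderlyingNode() );
        }
        return false;
    }

    @Override
    public int hashCode()
    {
        return getUnderlyingNode().hashCode();
    }

    // Counts distinct orders, regardless of how many wrappers were created
    static int distinctCount( SketchOrder... orders )
    {
        Set<SketchOrder> set = new HashSet<SketchOrder>();
        for ( SketchOrder order : orders )
        {
            set.add( order );
        }
        return set.size();
    }
}
```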
Organizing your Nodespace
Of course, there are probably many valid ways to structure your nodes for a given data model, and the good news with Neo is that the structure can easily evolve to meet new demands and requirements. Here we present a basic structure that we believe is a convenient way to represent a traditional data model. We have a Neo Meta Model in the pipeline that will be included in Neo 1.1 and uses OWL to define your domain model.
Subreferences
In our data model we have two major types of data: Customers and Orders. The question is how to add them to the node space. Starting out in Neo, we have a reference node, which you can think of as a known entry point into the node space. One idea is to connect all Customer and Order nodes directly to the reference node, but that has the drawback of cluttering that node with lots of relationships as the application grows.
Instead, we propose creating a subreference node and connecting it to the reference node (shown as Start Node in the diagrams). The subreference node is the connection point for all nodes of the same type.
Let's take our Customer example. Instead of connecting the Customer nodes directly to the start node, we create a new node which will be our subreference node for customers.
The node space would then be:
(Start Node)---CUSTOMERS--->(Customer Subref Node)---CUSTOMER--->(Customer Node 1)
|
---CUSTOMER--->(Customer Node 2)
The benefits of this approach are:
- Easier to follow and understand the node space.
- The subreference node can be used to gather global data (about customers and orders).
We will see more of how the subreference node is wrapped in the section on creating a Neo-independent API.
How to create a Neo-independent API
It is generally a very good idea to hide implementation details between components so that they are not tightly coupled to each other. For example, in most cases there is no reason to let the business layer implementation depend on the inner workings of the data layer. A lot has been written and said about the importance of structuring your code to minimize coupling, and if you feel uncertain in this area we recommend spending some time reading up on the topic.
In this section, we highlight some traditional ways of doing this.
Use Interfaces
As we have already seen, our beans need to be somewhat dependent on Neo; for example, the constructor takes a Neo Node object as a parameter. It is therefore a good design principle to extract all public methods into an interface and use that interface in all calls.
Make use of Factories
To avoid spreading object creation throughout your code, it is a good idea to gather it in a few factory classes. For example, to create Order objects that wrap a Neo Node, we could create a factory called OrderFactory which is responsible for creating Orders.
Define an interface for the Factory
Let's define an interface called OrderFactory.
public interface OrderFactory
{
public Order createOrder();
}
The Factory can wrap a Subreference Node
As we saw earlier we can group nodes of the same type under something we called a subreference node. This node can hold common information regarding the subnodes, for example an id counter for generating unique ids. We will see how this can be done further down.
So if we have structured our nodes under a subreference node, we could then actually wrap the subreference node inside the factory implementation for that type of object.
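import org.neo4j.api.core.Direction;
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.Node;
import org.neo4j.api.core.Relationship;
public class OrderFactoryImpl implements OrderFactory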
{
private final NeoService neo;
private final Node orderFactoryNode;
public OrderFactoryImpl(NeoService neo)
{
this.neo = neo;
Relationship rel = neo.getReferenceNode().getSingleRelationship(
MyRelationshipTypes.ORDERS, Direction.OUTGOING);
if (rel == null)
{
orderFactoryNode = neo.createNode();
neo.getReferenceNode().createRelationshipTo(orderFactoryNode,
MyRelationshipTypes.ORDERS);
} else
{
orderFactoryNode = rel.getEndNode();
}
}
public Order createOrder()
{
Node node = neo.createNode();
orderFactoryNode.createRelationshipTo(node,
MyRelationshipTypes.ORDER);
return new OrderImpl(node);
}
}
The constructor sets up the subreference node if it didn't already exist. In our case it creates a node that is related to the start node (global reference node) using an ORDERS relationship.
The createOrder method creates a new node and connects it to the subreference node using the relationship type ORDER. This gives us the node graph:
(Start Node)---ORDERS--->(Orders Subref Node)---ORDER--->(Order Node)
A simple Id Generator
It's common to use internal ids for data objects. In a relational database these are often auto-generated by the database itself. Neo does the same in the sense that every node gets a unique id. However, we often want a sequential counter for specific types of nodes, such as Customers or Orders.
To create an automated sequential id for orders we could extend the OrderFactoryImpl with:
private static final String KEY_COUNTER = "counter";
private synchronized long getNextId()
{
Long counter = null;
try
{
counter = (Long) orderFactoryNode.getProperty(KEY_COUNTER);
} catch (NotFoundException e)
{
// Create a new counter
counter = 0L;
}
orderFactoryNode.setProperty(KEY_COUNTER, counter + 1); // autoboxed to a Long
return counter;
}
The constant KEY_COUNTER holds the name of the counter property on the OrderFactory's subreference node. The method getNextId returns the next unused id for orders. It is synchronized, so concurrent callers of the same factory cannot receive the same id, and private, so that it is only used by the createOrder method, which we have now rewritten as:
public Order createOrder()
{
Node node = neo.createNode();
orderFactoryNode.createRelationshipTo(node,
MyRelationshipTypes.ORDER);
Order order = new OrderImpl(node);
order.setId(getNextId());
return order;
}
Right after creating the Order object, we fetch the next available id number and set it on the new order object (implies that the Order interface has a setId(long) method).
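The counter logic can be exercised without Neo. In this sketch a Map<String, Object> is a hypothetical stand-in for the properties of the orders subreference node, and Map.getOrDefault (Java 8+) replaces the try/catch on NotFoundException. Note that synchronized only serializes callers sharing this object within one JVM.

```java
import java.util.HashMap;
import java.util.Map;

final class OrderIdGenerator
{
    private static final String KEY_COUNTER = "counter";

    // Hypothetical stand-in for the properties of the orders subreference node
    private final Map<String, Object> subrefProperties = new HashMap<String, Object>();

    // Returns the current counter value and stores counter + 1 for the next call
    synchronized long getNextId()
    {
        long counter = (Long) subrefProperties.getOrDefault( KEY_COUNTER, 0L );
        subrefProperties.put( KEY_COUNTER, counter + 1 );
        return counter;
    }
}
```

As in the factory above, the first id handed out is 0 and each call advances the stored counter by one.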
Summary
This is one way of making your code more structured and resilient to changes in the inner workings. If you go with this approach, your business logic should use only the general interfaces for the data objects and factories when it needs access to data.
Search
Searching by relations
Traversing nodes is where Neo shines, so we should use this as much as possible. Many searches are for a certain category or characteristic of an object. Basically, if an object has a property with a limited set of possible values, you may consider turning that property into a node structure of its own and relating your object nodes to the proper value node.
For example: "Find all customers from Sweden."
Consider creating a Countries Node, where you add Country nodes.
(Start Node)--COUNTRIES-->(Countries Node)---COUNTRY-->(Sweden Node)
|
---COUNTRY-->(Denmark Node)
Now when you set the country for a customer, you set it to the country node:
(Customer Node)--LIVES_IN-->(Sweden Node)
This can be implemented the same way as customers and orders, and once LIVES_IN relationships exist between customers and countries, getting all customers from a country is easy:
public Iterable<Customer> getCustomers()
{
// Only incoming LIVES_IN relationships: customer --LIVES_IN--> this country
Iterable<Relationship> rels = countryNode
.getRelationships(MyRelationshipTypes.LIVES_IN, Direction.INCOMING);
ArrayList<Customer> custs = new ArrayList<Customer>();
for (Relationship rel : rels)
{
Node customerNode = rel.getStartNode();
Customer cust = new CustomerImpl(customerNode);
custs.add(cust);
}
return custs;
}
This method retrieves all LIVES_IN relationships where the current country is the end node. For each one, we take the start node, wrap it in a CustomerImpl object, and add it to the returned list of customers.
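To see why this is cheap, consider a hypothetical in-memory model in which every node keeps a list of its incoming relationships, each a (start, type, end) record. Finding Swedish customers then touches only the relationships attached to the Sweden node, instead of reading a country property on every customer. CountrySearchSketch and Rel are illustrative names, not Neo API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CountrySearchSketch
{
    // Hypothetical relationship record: (start)-[type]->(end)
    static final class Rel
    {
        final String start, type, end;

        Rel( String start, String type, String end )
        {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    // Incoming relationships per node, the way a node "knows" its relationships
    private final Map<String, List<Rel>> incoming = new HashMap<String, List<Rel>>();

    void relate( String start, String type, String end )
    {
        List<Rel> rels = incoming.get( end );
        if ( rels == null )
        {
            rels = new ArrayList<Rel>();
            incoming.put( end, rels );
        }
        rels.add( new Rel( start, type, end ) );
    }

    // Equivalent of getCustomers(): only the country's own incoming LIVES_IN rels are visited
    List<String> customersIn( String country )
    {
        List<String> customers = new ArrayList<String>();
        List<Rel> rels = incoming.get( country );
        if ( rels != null )
        {
            for ( Rel rel : rels )
            {
                if ( rel.type.equals( "LIVES_IN" ) )
                {
                    customers.add( rel.start );
                }
            }
        }
        return customers;
    }
}
```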
Searching using traversing
The Traverser API is a very powerful tool that can be used to mine and modify data in the node space. For example, to retrieve all customer nodes from Sweden we could create a simple traverser:
Traverser trav = swedenNode.traverse(Traverser.Order.DEPTH_FIRST, StopEvaluator.DEPTH_ONE,
ReturnableEvaluator.ALL_BUT_START_NODE, LIVES_IN, Direction.INCOMING);
// iterate over traverser...
Or, if we wanted all orders placed by customers from Sweden, we just add one more relationship type and adjust the evaluators:
Traverser trav = swedenNode.traverse(Traverser.Order.DEPTH_FIRST, StopEvaluator.END_OF_NETWORK,
new ReturnableEvaluator()
{
public boolean isReturnableNode( TraversalPosition pos )
{
if (pos.notStartNode() && pos.lastRelationshipTraversed().isType(CUSTOMER_TO_ORDER ))
{
return true;
}
return false;
}
},
LIVES_IN, Direction.INCOMING,
CUSTOMER_TO_ORDER, Direction.OUTGOING );
// iterate over traverser...
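The second traverser walks LIVES_IN backwards from Sweden to the customers, then CUSTOMER_TO_ORDER forwards to their orders, and returns only nodes reached through a CUSTOMER_TO_ORDER relationship. The sketch below mimics that logic on a hypothetical list of (start, type, end) records using an explicit depth-first stack. It illustrates the traversal rules only; it is not how the Neo Traverser is implemented, and its linear scan over all relationships per step is deliberately naive.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TraversalSketch
{
    // Hypothetical relationship record: (start)-[type]->(end)
    static final class Rel
    {
        final String start, type, end;

        Rel( String start, String type, String end )
        {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    private final List<Rel> rels = new ArrayList<Rel>();

    void relate( String start, String type, String end )
    {
        rels.add( new Rel( start, type, end ) );
    }

    // LIVES_IN is followed INCOMING, CUSTOMER_TO_ORDER OUTGOING; only nodes
    // reached via CUSTOMER_TO_ORDER are returned (like the ReturnableEvaluator above)
    List<String> ordersFrom( String country )
    {
        List<String> result = new ArrayList<String>();
        Set<String> visited = new HashSet<String>();
        Deque<String> stack = new ArrayDeque<String>();
        stack.push( country );
        visited.add( country );
        while ( !stack.isEmpty() )
        {
            String node = stack.pop();
            for ( Rel rel : rels )
            {
                String next;
                boolean viaOrder;
                if ( rel.type.equals( "LIVES_IN" ) && rel.end.equals( node ) )
                {
                    next = rel.start;
                    viaOrder = false;
                }
                else if ( rel.type.equals( "CUSTOMER_TO_ORDER" ) && rel.start.equals( node ) )
                {
                    next = rel.end;
                    viaOrder = true;
                }
                else
                {
                    continue;
                }
                if ( visited.add( next ) )
                {
                    if ( viaOrder )
                    {
                        result.add( next );
                    }
                    stack.push( next );
                }
            }
        }
        return result;
    }
}
```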
Searching by index
Sometimes it is necessary to perform a lookup to get an entry point into the node space. For example, an administrator may enter an order id and expect details about the order tied to that id in return. If we have a lot of Orders, traversing over them to find the right one may not be very efficient; instead we would like to index the order id. This is an exact-match lookup on the order id property and can be achieved by using various index utilities. Neo has a B-tree implementation, with various index utilities under development. To access it, add the following dependency to your pom:
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>index-util</artifactId>
<version>0.2-SNAPSHOT</version>
</dependency>
You will now have access to org.neo4j.util.index.SingleValueIndex and org.neo4j.util.index.MultiValueIndex (see Component APIs), which implement the org.neo4j.util.index.Index interface. Each Index has an underlying node. You pass the underlying node into the constructor together with a name and the NeoService. If we go back to our Order example and want to index the order id, we could first add a relationship called INDEX from the order subreference node.
(Start Node)---ORDERS--->(Order Subref Node)---ORDER--->(Order Node 1)
|
|--INDEX--->(Index Node)
So our OrderFactory implementation will instantiate a SingleValueIndex that is used to index the order id in the createOrder method, and we can add a getOrderById method that uses the same index to retrieve the order.
private Index orderIndex; // ...
public Order createOrder()
{
Node node = neo.createNode();
orderFactoryNode.createRelationshipTo(node,
MyRelationshipTypes.ORDER);
long orderId = getNextId();
// add index
orderIndex.index(node, orderId);
Order order = new OrderImpl(node);
order.setId(orderId);
return order;
}
public Order getOrderById(long orderId)
{
// use the index to look up the order node
Node orderNode = orderIndex.getSingleNodeFor(orderId);
if (orderNode != null)
{
return new OrderImpl(orderNode);
}
// no order with this id: handle as appropriate (here we return null)
return null;
}
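A single-value index maps one value to one node. To show the shape of this exact-match lookup without index-util, here is a hedged stand-in built on java.util.TreeMap (a red-black tree, used only as an ordered-map analogue of the B-tree; it is not the index-util implementation). OrderIndexSketch and its method names are hypothetical, chosen to echo index() and getSingleNodeFor() above, and rejecting duplicate ids is an assumption of this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

final class OrderIndexSketch<N>
{
    // Ordered map from order id to node; lookups are O(log n)
    private final Map<Long, N> idToNode = new TreeMap<Long, N>();

    void index( N node, long orderId )
    {
        // Single-value semantics (assumed here): an id maps to at most one node
        if ( idToNode.containsKey( orderId ) )
        {
            throw new IllegalArgumentException( "Order id already indexed: " + orderId );
        }
        idToNode.put( orderId, node );
    }

    // Returns null when no order has this id, matching getOrderById() above
    N getSingleNodeFor( long orderId )
    {
        return idToNode.get( orderId );
    }
}
```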
Searching by wildcard
Neo does not support wildcard searches such as "give me all customers with a first name that starts with A*" out of the box. Instead we rely on other tools to solve that problem for us. One example of such a tool is Lucene. We are currently working on a component similar to index-util but for wildcard searching. Actually, such a component already exists but is tied up in a commercial project (we're confident that we can release that source under the AGPL too, so stay tuned).
Explaining in detail how Lucene can be used to search Neo data is out of the scope of this document. Once the neo-wildcard-search-engine component is released, examples of how to use it will of course be added here.
If you really need wildcard search now, go ahead and have a look at the Lucene API and extend your Neo application with wildcard searching. Basically, all you have to do is create one Lucene document per node, with a field for each Neo property you want to search on plus a field holding the node id, so that search hits can be mapped back to nodes.
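Lucene itself is beyond the scope here, but the underlying idea of storing a searchable field next to the node id can be illustrated with a sorted map. The sketch below is not Lucene: it uses TreeMap.subMap to answer a trailing-wildcard query such as A* over first names, keeping node ids as the payload. PrefixSearchSketch is a hypothetical name; it handles only prefix patterns, is case-sensitive, and ignores names containing the character U+FFFF right after the prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class PrefixSearchSketch
{
    // firstName -> ids of the nodes with that first name (field + node id)
    private final NavigableMap<String, List<Long>> byFirstName = new TreeMap<String, List<Long>>();

    void add( String firstName, long nodeId )
    {
        List<Long> ids = byFirstName.get( firstName );
        if ( ids == null )
        {
            ids = new ArrayList<Long>();
            byFirstName.put( firstName, ids );
        }
        ids.add( nodeId );
    }

    // Answers "prefix*": every key k with prefix <= k < prefix + '\uffff'
    List<Long> search( String prefix )
    {
        List<Long> result = new ArrayList<Long>();
        for ( List<Long> ids : byFirstName.subMap( prefix, true, prefix + Character.MAX_VALUE, false ).values() )
        {
            result.addAll( ids );
        }
        return result;
    }
}
```

The returned ids are what you would then resolve to nodes (and wrap in Customer objects) through Neo.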
Transaction handling
The basics
All operations that work with the node space (even read operations) must be wrapped in a transaction. Fortunately, the Transaction class makes this very easy. Here's the idiomatic use of transactions in Neo:
Transaction tx = neo.beginTx();
try
{
// ... any operation that works with the node space
tx.success();
}
finally
{
tx.finish();
}
Let's walk through this example line by line. First we retrieve a Transaction object by invoking the NeoService.beginTx() method. This creates a new Transaction instance with internal state that keeps track of whether the current transaction is successful. Then we wrap all operations that work with the node space in a try-finally block. At the end of the try block, we invoke tx.success() to indicate that the transaction is successful. As we exit the block, the finally clause kicks in and tx.finish() commits the transaction if the internal state indicates success, or else marks it for rollback.
If an exception is raised in the try-block, tx.success() will never be invoked and the internal state of the transaction object will cause tx.finish() to roll back the transaction. This is very important: unless success() is invoked, the transaction will fail upon finish().
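The rule "no success(), no commit" is easy to model. Below, SketchTx is a hypothetical stand-in for Neo's Transaction (not the real class) that only records whether finish() committed or rolled back; runInTx reproduces the idiom above with the unit of work passed in as a Runnable (a Java 8 lambda in the usage).

```java
// Hypothetical stand-in for Neo's Transaction, modelling only success()/finish()
final class SketchTx
{
    private boolean successful = false;
    private String outcome = "open";

    void success()
    {
        successful = true;
    }

    // Commits if success() was called, otherwise rolls back
    void finish()
    {
        outcome = successful ? "committed" : "rolled back";
    }

    String outcome()
    {
        return outcome;
    }

    // The idiom from the guide: success() is only reached if the work completes normally
    static void runInTx( SketchTx tx, Runnable work )
    {
        try
        {
            work.run();
            tx.success();
        }
        finally
        {
            tx.finish();
        }
    }
}
```

If the work throws, the exception still propagates to the caller, but only after finish() has rolled the transaction back.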
Neo does not support true nested transactions. Instead, if a transaction is already running, opening a nested transaction will just hook on to the running transaction and add all the work to it. A more detailed description can be found in Neo Transactions.
Design of flat nested transactions
By flat nested transactions, we mean the scenario when all nested transactions are added to the scope of the top level transaction, which is what Neo supports.
The description in "The basics" section is enough to handle all cases except when an application error occurs in a nested transaction that requires a rollback of the entire transaction. If this rollback isn't triggered automatically by the Neo framework, you need to handle it yourself. Simply not calling tx.success() in the nested transaction does not guarantee a rollback, because when the outer transaction calls tx.success(), the whole transaction will succeed, including the failed nested part.
Solution? Use Exception handling!
Handle all cases where you need to control rollback in a nested transaction with exceptions, and document that callers must treat these exceptions as a sign that the transaction has failed (and therefore must not call tx.success()).
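A toy model of this advice, with hypothetical names throughout and following the guide's description of flat nesting (a nested begin joins the running transaction, and only the top-level success() decides the outcome): placeOrder reports failure by throwing TxFailedException, so the exception skips the outer success() and the whole transaction rolls back.

```java
final class FlatNestedSketch
{
    // Hypothetical unchecked exception meaning "the whole transaction must fail"
    static final class TxFailedException extends RuntimeException
    {
        TxFailedException( String message )
        {
            super( message );
        }
    }

    private int depth = 0;             // 0 means no transaction is running
    private boolean successful = false; // only the top-level success() counts
    private String outcome = "none";

    void begin()
    {
        depth++; // a nested begin simply joins the running transaction
    }

    void success()
    {
        if ( depth == 1 )
        {
            successful = true;
        }
    }

    void finish()
    {
        depth--;
        if ( depth == 0 )
        {
            outcome = successful ? "committed" : "rolled back";
            successful = false;
        }
    }

    // Nested unit of work: signals failure by throwing, not by skipping success()
    void placeOrder( boolean valid )
    {
        begin();
        try
        {
            if ( !valid )
            {
                throw new TxFailedException( "invalid order" );
            }
            success();
        }
        finally
        {
            finish();
        }
    }

    // Top-level transaction: the exception bypasses the outer success()
    String checkout( boolean valid )
    {
        begin();
        try
        {
            placeOrder( valid );
            success();
        }
        catch ( TxFailedException e )
        {
            // the caller treats the exception as "transaction failed"
        }
        finally
        {
            finish();
        }
        return outcome;
    }
}
```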

