FAQ

From NeoWiki

Frequently Asked Questions

Why do I get an OutOfMemoryError injecting data?

You are most likely creating too many new nodes, relationships and properties in a single transaction. Try splitting the injection into several transactions. If you still get an OutOfMemoryError, you are probably not calling finish() on the top-level transaction, or the individual transactions are still too big.
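The advice above can be sketched as a small batching loop. This is a minimal, self-contained illustration, not Neo4j code: Tx is a stand-in interface that only mirrors the success() and finish() calls named in this FAQ, and BatchedInjection, inject and batchSize are hypothetical names. In a real application the supplier would begin a Neo4j transaction and the writer would create nodes, relationships and properties.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Stand-in for the transaction calls named in this FAQ; NOT the real Neo4j type. */
interface Tx {
    void success();
    void finish();
}

class BatchedInjection {
    /**
     * Applies write to every item, using one transaction per batchSize items.
     * Returns the number of transactions that were run.
     */
    static <T> int inject(List<T> items, int batchSize,
                          Supplier<Tx> beginTx, Consumer<T> write) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int transactions = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            Tx tx = beginTx.get();
            try {
                for (T item : items.subList(start, end)) {
                    write.accept(item);
                }
                tx.success();   // mark this batch as good
            } finally {
                tx.finish();    // always finish, even if a write threw
            }
            transactions++;
        }
        return transactions;
    }
}
```

With 10 items and a batch size of 4 this runs three transactions (4 + 4 + 2 items). A suitable batch size depends on the available heap and is best found by experiment.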

How do I query/search for a property?

You need the index component. See http://components.neo4j.org/neo4j-index/

Why do I get a RuntimeException "Could not create data source..."?

Another Neo4j instance is already running against the same physical Neo4j database, i.e. the same physical path. Look at the thrown IllegalStateException and you should see the path in one of the nested exception messages.

How concurrent is Neo4j, when are locks taken?

By default, concurrent reads can be performed even while the data being read is modified in another transaction. Concurrent writes to the same node or relationship have to wait for each other. In other words, reads take no locks, while a write operation takes a lock on the node or relationship being modified and holds that lock until the transaction is committed or rolled back.

Why are my changes to the graph not persisted?

Either you forgot to call tx.success() before tx.finish(), or tx.failure() was invoked, which marks the transaction as rollback-only.
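The rule can be illustrated with a toy model. The class below is not Neo4j code: ModelTx, its write method and CommitRuleDemo are hypothetical, and only the names success(), failure() and finish() come from this FAQ. It buffers writes and applies them in finish() only if success() was called and failure() was not.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the commit rule; NOT the real Neo4j Transaction. */
class ModelTx {
    private final List<String> store;                        // the "persisted" data
    private final List<String> pending = new ArrayList<>();  // uncommitted writes
    private boolean successCalled = false;
    private boolean failureCalled = false;

    ModelTx(List<String> store) {
        this.store = store;
    }

    void write(String value) { pending.add(value); }
    void success() { successCalled = true; }
    void failure() { failureCalled = true; }

    /** Commits only if success() was called and failure() was not; otherwise rolls back. */
    void finish() {
        if (successCalled && !failureCalled) {
            store.addAll(pending);
        }
        pending.clear();
    }
}

class CommitRuleDemo {
    /** Runs three transactions against one store and returns what was persisted. */
    static List<String> run() {
        List<String> store = new ArrayList<>();

        ModelTx forgotten = new ModelTx(store);
        forgotten.write("a");
        forgotten.finish();      // success() never called: rolled back

        ModelTx failed = new ModelTx(store);
        failed.write("b");
        failed.success();
        failed.failure();        // rollback-only overrides success()
        failed.finish();

        ModelTx committed = new ModelTx(store);
        committed.write("c");
        committed.success();
        committed.finish();      // the only write that is persisted

        return store;
    }
}
```

CommitRuleDemo.run() returns a list containing only "c": the first write is lost because success() was never called, the second because failure() was invoked.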

I am injecting a big data set into Neo4j and injection speed is not that fast. Why?

Neo4j is written to be fast for 1) many concurrent reads and 2) smaller concurrent transactional updates, which is the common use case for most applications. Try grouping more operations into a single transaction to get higher injection speed. This is typically only a problem during testing ("I need to load all this data to test something") and not in production, where data growth is better suited to smaller transactional updates. The batch inserter may also help you here.

How fast are all the different Neo4j API operations?

The characteristics are:

  • constant time for add/remove/get property and create/delete/get node or relationship
  • linear time for getting relationships on a node.

Speed depends very much on hardware, but today's standard hardware should give 1000-3000 traversals or property gets per ms (reads) and about 10-100 inserts/updates per ms (writes). Transactions that modify data come with a lot of overhead, so when you perform only a few operations in each transaction, your write speed will drop dramatically and be tied to how fast your storage media can perform flush operations.
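As a worked example of what these rough figures imply, the sketch below converts a throughput in operations per millisecond into a wall-clock estimate. ThroughputEstimate and seconds are hypothetical helper names, and the result is only as good as the ballpark rates quoted above, which assume many operations per transaction.

```java
class ThroughputEstimate {
    /** Seconds needed for the given number of operations at opsPerMs operations per millisecond. */
    static double seconds(long operations, double opsPerMs) {
        if (opsPerMs <= 0) {
            throw new IllegalArgumentException("opsPerMs must be positive");
        }
        return operations / opsPerMs / 1000.0;
    }
}
```

For 10 million inserts, seconds(10_000_000L, 10) gives 1000.0 and seconds(10_000_000L, 100) gives 100.0, so the quoted write rates put such a load at roughly 2 to 17 minutes.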

Further details of performance tuning are explained in the Neo Performance Guide.

How big a graph can machine X handle?

With normal rotating media, here are some guidelines:

  • A laptop with 1-2 GB RAM handles tens of millions of primitives.
  • A standard server with 4-8 GB RAM handles hundreds of millions of primitives.
  • A more expensive server with 16-32 GB RAM can handle billions of primitives.

With Solid State Drives (SSDs) you can handle larger graphs on less RAM.

How does Neo4j play in an OSGi environment?

All components in APOC, and Neo4j itself, behave nicely in an OSGi environment: the jars are packaged as bundles, and there is a sample test OSGi IMDB application in the examples section [1]. However, right now the API for Neo4j is not really tightened up, so neo4j-kernel.jar exposes not only the org.neo4j.graphdb package but all internal packages as well. This will be straightened out in a future release.

Can I access a running Neo4j instance from more than one machine?

Neo4j is at its core an in-process database, accessible only from the JVM it runs in. However, remote-graphdb lets other clients use the same functionality over RMI through a very similar client API.

What ways of encoding/holding metadata and type of nodes and properties are there?

There are a number of interesting approaches to this, involving both holding the metadata in the graph and outside the graph (in code):

  • Use the navigational context

This approach builds on the basic assumption that you know the types of the properties if you know the type that the node represents an instance of (the "type of the node"). The type of the node is deduced from how the node is reached. From a node of a known type, you know the type of each node at the other end of a relationship based on the type of the relationship. This means that given a start node of a known type, you will know the type of all nodes you can reach from it. In order to know the type of a start node you can use different indexes for different types, so that the nodes in one specific index always represent the same type.

To differentiate between subtypes, some of the other approaches can be used.

See http://lists.neo4j.org/pipermail/user/2008-October/000848.html

  • RDF and OWL

Basically, every node maintains a relationship to its type node (your shadow node), something like x --RDF:TYPE--> type_node, which contains information on what the type is, what properties it has, etc.

This is the concept of describing the types of things in code (Java in this case) and enforcing the restrictions and type conversions on properties through that code. This does not capture any meta information in the graph, but it is easy to do.

  • Annotate the nodes with type info

In this approach, every node has a "type" or "classname" property that is used to derive the type to serialize/deserialize the object into; the rest of the meta information is contained in the upper code layers. Andreas Ronge's JRuby bindings use this approach.

  • Encode everything into a String property

This approach means shuffling everything into a string property, basically treating properties as BLOBs. It works in some cases, but certainly locks your data down in these properties.
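To make the first approach (navigational context) concrete, here is a minimal sketch in plain Java. It does not use the Neo4j API: NavigationalTypes, declare and typeAfter are hypothetical names, and the Person/Company schema is invented for illustration. The table records which node type sits at the other end of each relationship type, so the type of any reached node follows from the start type and the path taken.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: deduces node types from how a node is reached. */
class NavigationalTypes {
    // start node type -> (relationship type -> type of the node at the other end)
    private final Map<String, Map<String, String>> endTypes = new HashMap<>();

    /** Declares that following relationshipType from a fromType node reaches a toType node. */
    void declare(String fromType, String relationshipType, String toType) {
        endTypes.computeIfAbsent(fromType, k -> new HashMap<>())
                .put(relationshipType, toType);
    }

    /** Returns the type of the node reached from startType via the given relationship types. */
    String typeAfter(String startType, String... relationshipTypes) {
        String current = startType;
        for (String rel : relationshipTypes) {
            Map<String, String> outgoing = endTypes.get(current);
            String next = (outgoing == null) ? null : outgoing.get(rel);
            if (next == null) {
                throw new IllegalArgumentException(
                        "No " + rel + " relationship declared for type " + current);
            }
            current = next;
        }
        return current;
    }
}
```

After declare("Person", "KNOWS", "Person") and declare("Person", "WORKS_AT", "Company"), the call typeAfter("Person", "KNOWS", "WORKS_AT") returns "Company". The type of the start node itself would come from the index it was looked up in, as described above.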

How can I get the total number of nodes and relationships currently in Neo4j?

For the time being, you can use the following unofficial API:

EmbeddedGraphDatabase.getConfig().getNeoModule().getNodeManager().getNumberOfIdsInUse(Class);

where Class is Node.class, Relationship.class or PropertyStore.class.
