Java Persistence/Persisting
Persisting
JPA uses the EntityManager API for runtime usage. The EntityManager represents the application session or dialog with the database. Each request, or each client, will use its own EntityManager to access the database. The EntityManager also represents a transaction context, and in a typical stateless model a new EntityManager is created for each transaction. In a stateful model, an EntityManager may match the lifecycle of a client's session.
The EntityManager provides an API for all required persistence operations.
These include the following CRUD operations: persist (INSERT), merge (UPDATE), remove (DELETE), and find (SELECT).
The EntityManager is an object-oriented API, so it does not map directly onto database SQL or DML operations. For example, to update an object you just need to read the object, change its state through its set methods, and then call commit on the transaction. The EntityManager figures out which objects you changed and performs the correct updates to the database; there is no explicit update operation in JPA.
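The change tracking described above can be pictured with a small stand-alone model. This is only a sketch in plain Java, not JPA and not how any particular provider works: ToyCustomer, ToyUnitOfWork, and the SQL text it produces are hypothetical names chosen for illustration, and real providers may detect changes quite differently (for example through bytecode weaving rather than snapshots).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical entity used only for this illustration.
class ToyCustomer {
    final long id;
    private String name;

    ToyCustomer(long id, String name) {
        this.id = id;
        this.name = name;
    }

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
    }
}

// Toy stand-in for a persistence context that remembers the state each object had when read.
class ToyUnitOfWork {
    private final Map<Long, ToyCustomer> managed = new HashMap<>();
    private final Map<Long, String> snapshots = new HashMap<>();

    // Registers an object as read and records a snapshot of its current state.
    void register(ToyCustomer customer) {
        managed.put(customer.id, customer);
        snapshots.put(customer.id, customer.getName());
    }

    // Compares every managed object with its snapshot and returns the updates a commit would need.
    List<String> commit() {
        List<String> updates = new ArrayList<>();
        for (ToyCustomer customer : managed.values()) {
            if (!Objects.equals(snapshots.get(customer.id), customer.getName())) {
                updates.add("UPDATE CUSTOMER SET NAME = '" + customer.getName()
                        + "' WHERE ID = " + customer.id);
                snapshots.put(customer.id, customer.getName());
            }
        }
        return updates;
    }
}

class ChangeTrackingDemo {
    public static void main(String[] args) {
        ToyUnitOfWork unitOfWork = new ToyUnitOfWork();
        ToyCustomer bob = new ToyCustomer(1L, "Bob");
        unitOfWork.register(bob);

        bob.setName("Robert");                    // no explicit update call
        System.out.println(unitOfWork.commit());  // one update for the changed object
        System.out.println(unitOfWork.commit());  // nothing changed since the last commit
    }
}
```

The application code only calls a setter; the decision about what to write is made at commit time by comparing current state with the remembered state.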
Detached vs Managed
JPA defines two main states for an object for a given persistence context: managed and detached.
A managed object is one that was read in the current persistence context (EntityManager/JTA transaction). A managed object is registered with the persistence context, and the persistence context will track changes to that object and maintain its object identity. If the same object is read again in the same persistence context, or traversed through another managed object's relationship, the same identical (==) object will be returned. Calling persist on a new object will also make it managed. Calling merge on a detached object will return the managed copy of the object. An object should never be managed by more than one persistence context. An object will be managed by its persistence context until the persistence context is cleared through clear, or the object is forced to be detached through detach. A removed object will no longer be managed after a flush or commit. On a rollback, all managed objects will become detached. In a JTA managed EntityManager all managed objects will be detached on any JTA commit or rollback.
A detached object is one that is not managed in the current persistence context. This could be an object read through a different persistence context, or an object that was cloned or serialized. A new object is also considered detached until persist is called on it. An object that was removed and then flushed or committed will become detached. An object could be considered both managed in the context of one persistence context, and detached in the context of another persistence context.
A managed object should only ever reference other managed objects, and a detached object should only reference other detached objects. Avoid relating or mixing detached and managed objects. This will normally lead to issues, as your application could access two copies of the same object, causing loss of changes or stale data. Incorrectly relating managed and detached objects is probably one of the most common issues users run into in JPA.
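The identity guarantee described above can be pictured with a small stand-alone model. The sketch below is plain Java, not JPA, and does not reflect any provider's implementation: ToyContext and ToyEmployee are hypothetical names, and the "database" is just a map. It only illustrates that one context hands back the same instance for the same Id, while a second context hands back a different copy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this illustration.
class ToyEmployee {
    final long id;
    String firstName;

    ToyEmployee(long id, String firstName) {
        this.id = id;
        this.firstName = firstName;
    }
}

// Toy stand-in for a persistence context: it keeps exactly one instance per Id.
class ToyContext {
    private final Map<Long, String> database;   // simulated table: id -> first name
    private final Map<Long, ToyEmployee> managed = new HashMap<>();

    ToyContext(Map<Long, String> database) {
        this.database = database;
    }

    // Returns the already managed instance if there is one, otherwise "reads" a new one.
    ToyEmployee find(long id) {
        return managed.computeIfAbsent(id, key -> new ToyEmployee(key, database.get(key)));
    }

    // True only if this exact instance is the one managed by this context.
    boolean contains(ToyEmployee employee) {
        return managed.get(employee.id) == employee;
    }
}

class IdentityDemo {
    public static void main(String[] args) {
        Map<Long, String> database = new HashMap<>();
        database.put(1L, "Bob");

        ToyContext first = new ToyContext(database);
        ToyContext second = new ToyContext(database);

        ToyEmployee a = first.find(1L);
        ToyEmployee b = first.find(1L);
        ToyEmployee c = second.find(1L);

        System.out.println(a == b);              // same context: same instance
        System.out.println(a == c);              // different context: different copy
        System.out.println(second.contains(a));  // a is detached with respect to the second context
    }
}
```

In this model, `a` is managed by the first context and detached from the point of view of the second, which is why mixing the two copies in one object graph leads to stale data or lost changes.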
Persist
The EntityManager.persist() operation is used to insert a new object into the database. persist does not directly insert the object into the database: it just registers it as new in the persistence context (transaction). When the transaction is committed, or if the persistence context is flushed, then the object will be inserted into the database.
If the object uses a generated Id, the Id will normally be assigned to the object when persist is called, so persist can also be used to have an object's Id assigned. The one exception is IDENTITY sequencing: in this case the Id is only assigned on commit or flush, because the database will only assign the Id on INSERT. If the object does not use a generated Id, you should normally assign its Id before calling persist.
The persist operation can only be called within a transaction; an exception will be thrown outside of a transaction. The persist operation is in-place, in that the object being persisted will become part of the persistence context. The state of the object at the point of the commit of the transaction will be persisted, not its state at the point of the persist call.
persist should normally only be called on new objects. It is allowed to be called on existing objects if they are part of the persistence context, but this is only for the purpose of cascading persist to any related new objects. If persist is called on an existing object that is not part of the persistence context, then an exception may be thrown, or an insert may be attempted and a database constraint error may occur, or, if no constraints are defined, duplicate data may be inserted.
persist can only be called on Entity objects, not on Embeddable objects, collections, or non-persistent objects. Embeddable objects are automatically persisted as part of their owning Entity.
Calling persist is not always required. If you relate a new object to an existing object that is part of the persistence context, and the relationship is cascade persist, then the new object will be automatically inserted when the transaction is committed, or when the persistence context is flushed.
Example persist
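EntityManager em = getEntityManager();
em.getTransaction().begin();
// Build the new object graph; nothing is written to the database yet.
Employee employee = new Employee();
employee.setFirstName("Bob");
Address address = new Address();
address.setCity("Ottawa");
employee.setAddress(address);
// Register the employee as new in the persistence context.
em.persist(employee);
// The INSERT is issued when the transaction commits.
em.getTransaction().commit();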
Cascading Persist
Calling persist on an object will also cascade the persist operation across any relationship that is marked as cascade persist. If a relationship is not cascade persist, and a related object is new, then an exception may be thrown if you do not first call persist on the related object. Intuitively you may consider marking every relationship as cascade persist to avoid having to worry about calling persist on every object, but this can also lead to issues.
One issue with marking all relationships cascade persist is performance. On each persist call all of the related objects will need to be traversed and checked to see if they reference any new objects. This can actually lead to O(n²) performance issues if you mark all relationships cascade persist, and persist a large new graph of objects. If you just call persist on the root object, this is ok. However, if you call persist on each object in the graph, then you will traverse the entire graph for each object in the graph, and this can lead to a major performance issue. The JPA spec should probably define persist to only apply to new objects, not those already part of the persistence context, but it requires that persist apply to all objects, whether new, existing, or already persisted, so it can have this issue.
A second issue is that if you remove an object to have it deleted, and then call persist on the object, it will resurrect the object, and it will become persistent again. This may be desired if it is intentional, but the JPA spec also requires this behavior for cascade persist. So if you remove an object, but forget to remove a reference to it from a cascade persist relationship, the remove will be ignored.
I would recommend only marking relationships that are composite or privately owned as cascade persist.
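The quadratic cost mentioned above can be made concrete by counting traversal steps. The sketch below is a plain Java model, not JPA: ToyNode and CascadeCost are hypothetical names, the graph is simplified to a chain in which each object has one cascade-persist relationship to the next, and the toy persist always walks the whole reachable chain, as the text says the spec requires.

```java
// Hypothetical node in an object graph; `next` stands for a cascade-persist relationship.
class ToyNode {
    ToyNode next;
}

class CascadeCost {
    // Toy persist: visits the object and everything reachable through the cascade,
    // even if those objects were already visited by an earlier persist call.
    static long persist(ToyNode node) {
        long visited = 0;
        for (ToyNode current = node; current != null; current = current.next) {
            visited++;
        }
        return visited;
    }

    // Builds a chain of n nodes (n >= 1) and returns them in order, root first.
    static ToyNode[] chain(int n) {
        ToyNode[] nodes = new ToyNode[n];
        for (int i = 0; i < n; i++) {
            nodes[i] = new ToyNode();
        }
        for (int i = 0; i + 1 < n; i++) {
            nodes[i].next = nodes[i + 1];
        }
        return nodes;
    }

    // One persist call on the root: each object is visited once.
    static long persistRootOnly(int n) {
        return persist(chain(n)[0]);
    }

    // One persist call per object: the rest of the chain is walked again each time.
    static long persistEveryObject(int n) {
        long total = 0;
        for (ToyNode node : chain(n)) {
            total += persist(node);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(persistRootOnly(1000));     // prints 1000
        System.out.println(persistEveryObject(1000));  // prints 500500
    }
}
```

For a chain of n objects the second approach performs n(n+1)/2 visits, which grows quadratically; a more densely connected graph would be worse.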
Merge
The EntityManager.merge() operation is used to merge the changes made to a detached object into the persistence context. merge does not directly update the object in the database; it merges the changes into the persistence context (transaction). When the transaction is committed, or if the persistence context is flushed, then the object will be updated in the database.
Normally merge is not required, although it is frequently misused. To update an object you simply need to read it, then change its state through its set methods, then commit the transaction. The EntityManager will figure out everything that has been changed and update the database. merge is only required when you have a detached copy of a persistent object. A detached object is one that was read through a different EntityManager (or in a different transaction in a JEE managed EntityManager), or one that was cloned or serialized. A common case is a stateless SessionBean where the object is read in one transaction, then updated in another transaction. Since the update is processed in a different transaction, with a different EntityManager, it must first be merged. The merge operation will look up the managed object for the detached object, and copy each of the detached object's attributes that changed into the managed object, as well as cascading to any related objects marked as cascade merge.
The merge operation can only be called within a transaction; an exception will be thrown outside of a transaction. The merge operation is not in-place, in that the object being merged will never become part of the persistence context. Any further changes must be made to the managed object returned by merge, not to the detached object.
merge is normally called on existing objects, but can also be called on new objects. If the object is new, a new copy of the object will be made and registered with the persistence context; the detached object will not be persisted itself.
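The "not in-place" behavior can be pictured with a small stand-alone model. This is a plain Java sketch, not JPA: ToyPerson and ToyMergeContext are hypothetical names, and for simplicity the toy merge copies every attribute rather than only the changed ones. It illustrates that the object passed to merge stays outside the context and that later changes to it are not seen.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this illustration.
class ToyPerson {
    final long id;
    String firstName;

    ToyPerson(long id, String firstName) {
        this.id = id;
        this.firstName = firstName;
    }
}

// Toy stand-in for a persistence context with a copy-based merge.
class ToyMergeContext {
    private final Map<Long, ToyPerson> managed = new HashMap<>();

    // Copies the detached state into the managed instance for the same Id,
    // creating that instance if the context has none, and returns the managed one.
    ToyPerson merge(ToyPerson detached) {
        ToyPerson target = managed.computeIfAbsent(detached.id, key -> new ToyPerson(key, null));
        target.firstName = detached.firstName;
        return target;
    }

    // True only if this exact instance is the one managed by this context.
    boolean contains(ToyPerson person) {
        return managed.get(person.id) == person;
    }
}

class MergeDemo {
    public static void main(String[] args) {
        ToyMergeContext context = new ToyMergeContext();
        ToyPerson detached = new ToyPerson(1L, "Bob");

        ToyPerson managed = context.merge(detached);
        System.out.println(managed == detached);          // the argument is not the managed object
        System.out.println(context.contains(detached));   // and never becomes part of the context

        detached.firstName = "Robert";                    // change made to the wrong object
        System.out.println(managed.firstName);            // the managed copy is unaffected
    }
}
```

This is why further changes after a merge should go to the returned object rather than the one that was passed in.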
merge can only be called on Entity objects, not on Embeddable objects, collections, or non-persistent objects. Embeddable objects are automatically merged as part of their owning Entity.
Example merge
// Read the object, then close the EntityManager so the object becomes detached.
EntityManager em = createEntityManager();
Employee detached = em.find(Employee.class, id);
em.close();
...
// Merge the detached copy into a new persistence context.
em = createEntityManager();
em.getTransaction().begin();
Employee managed = em.merge(detached);
em.getTransaction().commit();
Cascading Merge
Calling merge on an object will also cascade the merge operation across any relationship that is marked as cascade merge. Even if the relationship is not cascade merge, the reference will still be merged. If the relationship is cascade merge, the relationship and each related object will be merged. Intuitively you may consider marking every relationship as cascade merge to avoid having to worry about calling merge on every object, but this is normally a bad idea.
One issue with marking all relationships cascade merge is performance. If you have an object with a lot of relationships, then each merge call may need to traverse a large graph of objects.
Another issue arises if your detached object is corrupt in some way. For example, say you have an Employee who has a manager, but that manager has a different copy of the detached Employee object as its managedEmployee. This may cause the same object to be merged twice, or at least make it inconsistent which object will be merged, so you may not get the changes you expect merged. The same is true if you didn't change an object at all, but some other user did: if merge cascades to this unchanged object, it will revert the other user's changes, or throw an OptimisticLockException (depending on your locking policy). This is normally not desirable.
I would recommend only marking relationships that are composite or privately owned as cascade merge.
Transient Variables
Another issue with merge is transient variables. Since merge is normally used with object serialization, if a relationship was marked as transient (Java transient, not JPA transient), then the detached object will contain null, and null will be merged into the object, even though it is not desired. This will occur even if the relationship was not cascade merge, as merge always merges the references to related objects. Normally transient is required when using serialization to avoid serializing the entire database when only a single object, or a small set of objects, is required.
One solution is to avoid marking anything transient, and instead use LAZY relationships in JPA to limit what is serialized (lazy relationships that have not been accessed will normally not be serialized). Another solution is to manually merge in your own code.
Some JPA providers provide extended merge operations, such as allowing a shallow merge or deep merge, or merging without merging references.
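The source of the null is ordinary Java serialization, which can be shown without JPA at all. In the sketch below, SerialEmployee and SerialAddress are hypothetical classes, and the round trip through a byte array stands in for sending an object to a client and receiving it back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical related object.
class SerialAddress implements Serializable {
    private static final long serialVersionUID = 1L;
    String city;

    SerialAddress(String city) {
        this.city = city;
    }
}

// Hypothetical object whose relationship is marked with the Java transient keyword.
class SerialEmployee implements Serializable {
    private static final long serialVersionUID = 1L;
    String firstName;
    transient SerialAddress address;   // skipped by Java serialization

    SerialEmployee(String firstName, SerialAddress address) {
        this.firstName = firstName;
        this.address = address;
    }
}

class TransientDemo {
    // Serializes the object to bytes and reads it back, producing a detached copy.
    static SerialEmployee roundTrip(SerialEmployee employee) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(employee);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerialEmployee) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerialEmployee original = new SerialEmployee("Bob", new SerialAddress("Ottawa"));
        SerialEmployee copy = roundTrip(original);

        System.out.println(copy.firstName);        // prints Bob
        System.out.println(copy.address == null);  // prints true
    }
}
```

If such a copy were then passed to merge, the null reference would be what gets merged, which is the problem the text describes.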
Remove
The EntityManager.remove() operation is used to delete an object from the database. remove does not directly delete the object from the database; it marks the object to be deleted in the persistence context (transaction). When the transaction is committed, or if the persistence context is flushed, then the object will be deleted from the database.
The remove operation can only be called within a transaction; an exception will be thrown outside of a transaction. The remove operation must be called on a managed object, not on a detached object. Generally you must first find the object before removing it, although it is possible to call EntityManager.getReference() on the object's Id and call remove on the reference. Depending on how your JPA provider optimizes getReference and remove, it may not require reading the object from the database.
remove can only be called on Entity objects, not on Embeddable objects, collections, or non-persistent objects. Embeddable objects are automatically removed as part of their owning Entity.
Example remove
EntityManager em = getEntityManager();
em.getTransaction().begin();
// remove must be called on a managed object, so find it first.
Employee employee = em.find(Employee.class, id);
em.remove(employee);
// The DELETE is issued when the transaction commits.
em.getTransaction().commit();
Cascading Remove
Calling remove on an object will also cascade the remove operation across any relationship that is marked as cascade remove.
Note that cascade remove only affects the remove call. If you have a relationship that is cascade remove, and remove an object from the collection, or dereference an object, it will not be removed. You must explicitly call remove on the object to have it deleted. Some JPA providers provide an extension for this behavior, and JPA 2.0 adds an orphanRemoval option on OneToMany and OneToOne mappings to provide it.
Reincarnation
Normally an object that has been removed stays removed, but in some cases you may need to bring the object back to life. This normally occurs with natural ids, not generated ones, where a new object would always get a new id. Generally the desire to reincarnate an object arises from a bad object model design, normally the desire to change the class type of an object (which cannot be done in Java, so a new object must be created). Normally the best solution is to change your object model to have your object hold a type object which defines its type, instead of using inheritance. But sometimes reincarnation is desirable.
When done in two separate transactions, this is normally fine: first you remove the object, then you persist it back. This can be more complex if you wish to remove and persist an object with the same Id in the same transaction. If you call remove on an object, then call persist on the same object, it will simply no longer be removed. If you call remove on an object, then call persist on a different object with the same Id, the behavior may depend on your JPA provider, and probably will not work. If you call flush after calling remove, then call persist, then the object should be successfully reincarnated. Note that it will be a different row: the existing row will have been deleted, and a new row inserted. If you wish the same row to be updated, you may need to resort to using a native SQL update query.
Advanced
Refresh
The EntityManager.refresh() operation is used to refresh an object's state from the database. This will revert any non-flushed changes made in the current transaction to the object, and refresh its state to what is currently defined in the database. If a flush has occurred, it will refresh to what was flushed. Refresh must be called on a managed object, so you may first need to find the object with the active EntityManager if you have a non-managed instance.
Refresh will cascade to any relationships marked cascade refresh, although it may be done lazily depending on your fetch type, so you may need to access the relationship to trigger the refresh.
refresh can only be called on Entity objects, not on Embeddable objects, collections, or non-persistent objects. Embeddable objects are automatically refreshed as part of their owning Entity.
Refresh can be used to revert changes, or if your JPA provider supports caching, it can be used to refresh stale cached data.
Sometimes it is desirable to have a Query or find operation refresh the results. Unfortunately JPA 1.0 does not define how this can be done. Some JPA providers offer query hints to allow refreshing to be enabled on a query.
- TopLink / EclipseLink: Define a query hint "eclipselink.refresh" to allow refreshing to be enabled on a query.
JPA 2.0 defines a set of standard query hints for refreshing, see JPA 2.0 Cache APIs.
Example refresh
EntityManager em = getEntityManager();
em.refresh(employee);
Lock
See Read and Write Locking.
Get Reference
The EntityManager.getReference() operation is used to obtain a handle to an object without requiring it to be loaded. It is similar to the find operation, but may return a proxy or unfetched object. JPA does not require that getReference avoid loading the object, so some JPA providers may not support it and just perform a normal find operation. The object returned by getReference should appear to be a normal object; if you access any method or attribute other than its Id, it will trigger itself to be loaded from the database.
The intention of getReference is that it could be used on an insert or update operation as a stand-in for a related object, if you only have its Id and want to avoid loading the object.
Note that getReference does not verify the existence of the object as find does. If the object does not exist and you try to use the unfetched object in an insert or update you may get a foreign key constraint violation, or if you access the object it may trigger an exception.
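The idea of an unfetched stand-in can be pictured with a small stand-alone model. This is a plain Java sketch, not JPA and not how any provider builds its proxies: LazyEmployeeRef and the `fakeDatabase` function are hypothetical, and the simulated database only knows one row. It illustrates that the Id is available without a load, that other attributes trigger the load, and that a missing row is only noticed on access.

```java
import java.util.function.LongFunction;

// Hypothetical stand-in for an unfetched object: it knows its Id but loads the rest on demand.
class LazyEmployeeRef {
    private final long id;
    private final LongFunction<String> loader;  // simulated database read: id -> first name
    private String firstName;
    private boolean loaded;

    LazyEmployeeRef(long id, LongFunction<String> loader) {
        this.id = id;
        this.loader = loader;
    }

    // The Id is already known, so no load is triggered.
    long getId() {
        return id;
    }

    // Any other attribute triggers the load on first access.
    String getFirstName() {
        if (!loaded) {
            firstName = loader.apply(id);
            loaded = true;
        }
        return firstName;
    }

    boolean isLoaded() {
        return loaded;
    }
}

class LazyReferenceDemo {
    public static void main(String[] args) {
        // Simulated database that contains a single row with id 7.
        LongFunction<String> fakeDatabase = id -> {
            if (id != 7L) {
                throw new IllegalStateException("No row with id " + id);
            }
            return "Alice";
        };

        LazyEmployeeRef manager = new LazyEmployeeRef(7L, fakeDatabase);
        System.out.println(manager.getId() + " loaded=" + manager.isLoaded());
        System.out.println(manager.getFirstName() + " loaded=" + manager.isLoaded());

        // Creating a reference to a missing row succeeds; only the access fails.
        LazyEmployeeRef missing = new LazyEmployeeRef(99L, fakeDatabase);
        try {
            missing.getFirstName();
        } catch (IllegalStateException e) {
            System.out.println("access failed: " + e.getMessage());
        }
    }
}
```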
Example getReference
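EntityManager em = getEntityManager();
em.getTransaction().begin();
// Obtain a stand-in for the manager without necessarily loading it.
Employee manager = em.getReference(Employee.class, managerId);
Employee employee = new Employee();
...
em.persist(employee);
employee.setManager(manager);
// EntityManager has no commit(); the commit is done through its transaction.
em.getTransaction().commit();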
Flush
The EntityManager.flush() operation can be used to write all changes to the database before the transaction is committed. By default JPA does not normally write changes to the database until the transaction is committed. This is normally desirable as it avoids database access, resources and locks until required. It also allows database writes to be ordered and batched for optimal database access, to maintain integrity constraints and avoid deadlocks. This means that when you call persist, merge, or remove, the database DML (INSERT, UPDATE, DELETE) is not executed until commit, or until a flush is triggered.
flush() does not execute the actual commit: the commit still happens when an explicit commit() is requested in the case of resource local transactions, or when a container managed (JTA) transaction completes.
Flush has several usages:
- Flush changes before a query execution to enable the query to return new objects and changes made in the persistence context.
- Insert persisted objects to ensure their Ids are assigned and accessible to the application if using IDENTITY sequencing.
- Write all changes to the database to allow error handling of any database errors (useful when using JTA or SessionBeans).
- To flush and clear a batch for batch processing in a single transaction.
- Avoid constraint errors, or reincarnate an object.
Example flush
public long createOrder(Order order) throws ACMEException {
    EntityManager em = getEntityManager();
    em.persist(order);
    try {
        // Flush now so that database errors surface here and can be wrapped.
        em.flush();
    } catch (PersistenceException exception) {
        throw new ACMEException(exception);
    }
    return order.getId();
}
Clear
The EntityManager.clear() operation can be used to clear the persistence context. This will clear all objects read, changed, persisted, or removed from the current EntityManager or transaction. Changes that have already been written to the database through flush, or any changes made to the database, will not be cleared. Any object that was read or persisted through the EntityManager is detached, meaning any changes made to it will not be tracked, and it should no longer be used unless merged into the new persistence context.
clear can be used similarly to a rollback to abandon changes and restart a persistence context. If a transaction commit fails, or a rollback is performed, the persistence context will automatically be cleared.
clear is similar to closing the EntityManager and creating a new one, the main difference being that clear can be called while a transaction is in progress. clear can also be used to free the objects and memory consumed by the EntityManager. It is important to note that an EntityManager is responsible for tracking and managing all objects read within its persistence context. In an application managed EntityManager this includes every object read since the EntityManager was created, across every transaction the EntityManager was used for. If a long-lived EntityManager is used, this is an intrinsic memory leak, so calling clear, or closing the EntityManager and creating a new one, is an important application design consideration. For JTA managed EntityManagers the persistence context is automatically cleared across each JTA transaction boundary.
Clearing is also important on large batch jobs, even if they occur in a single transaction. The batch job can be split into smaller batches within the same transaction, and clear can be called between batches to keep the persistence context from getting too big.
Example clear
public void processAllOpenOrders() {
    EntityManager em = getEntityManager();
    // Read only the Ids up front so the orders can be loaded batch by batch (typed query, JPA 2.0).
    List<Long> openOrderIds = em.createQuery(
        "SELECT o.id FROM Order o WHERE o.isOpen = true", Long.class).getResultList();
    em.getTransaction().begin();
    try {
        for (int batch = 0; batch < openOrderIds.size(); batch += 100) {
            for (int index = 0; index < 100 && (batch + index) < openOrderIds.size(); index++) {
                Long id = openOrderIds.get(batch + index);
                Order order = em.find(Order.class, id);
                order.process(em);
            }
            // Write this batch, then release its objects from the persistence context.
            em.flush();
            em.clear();
        }
        em.getTransaction().commit();
    } catch (RuntimeException error) {
        if (em.getTransaction().isActive()) {
            em.getTransaction().rollback();
        }
        // Rethrow so the caller learns that the batch failed.
        throw error;
    }
}
Close
The EntityManager.close() operation is used to release an application managed EntityManager's resources. JEE JTA managed EntityManagers cannot be closed, as they are managed by the JTA transaction and JEE server.
The life-cycle of an EntityManager can last a transaction, a request, or a user's session. Typically the life-cycle is per request, and the EntityManager is closed at the end of the request. The objects obtained from an EntityManager become detached when the EntityManager is closed, and any LAZY relationships may no longer be accessible if they were not accessed before the EntityManager was closed. Some JPA providers allow LAZY relationships to be accessed after close.
Example close
public Order findOrder(long id) {
    EntityManager em = factory.createEntityManager();
    Order order = em.find(Order.class, id);
    // Touch the LAZY relationship so it is loaded before the EntityManager is closed.
    order.getOrderLines().size();
    em.close();
    return order;
}
Get Delegate
The EntityManager.getDelegate() operation is used to access the JPA provider's EntityManager implementation class in a JEE managed EntityManager. A JEE managed EntityManager will be wrapped by a proxy EntityManager by the JEE server that forwards requests to the EntityManager active for the current JTA transaction. If a JPA provider specific API is desired, the getDelegate() API allows the JPA implementation to be accessed to call the API.
In JEE a managed EntityManager will typically create a new EntityManager per JTA transaction. Also, the behavior is somewhat undefined outside of a JTA transaction context. Outside a JTA transaction context, a JEE managed EntityManager may create a new EntityManager per method, so getDelegate() may return a temporary EntityManager or even null. Another way to access the JPA implementation is through the EntityManagerFactory, which is typically not wrapped with a proxy, but may be in some servers.
In JPA 2.0 the getDelegate() API has been superseded by the unwrap() API, which is more generic.
Example getDelegate
public void clearCache() {
    EntityManager em = getEntityManager();
    // JpaEntityManager is a provider-specific interface (EclipseLink), not part of JPA.
    ((JpaEntityManager) em.getDelegate()).getServerSession().getIdentityMapAccessor().initializeAllIdentityMaps();
}
Unwrap (JPA 2.0)
The EntityManager.unwrap() operation is used to access the JPA provider's EntityManager implementation class in a JEE managed EntityManager. A JEE managed EntityManager will be wrapped by a proxy EntityManager by the JEE server that forwards requests to the EntityManager active for the current JTA transaction. If a JPA provider specific API is desired, the unwrap() API allows the JPA implementation to be accessed to call the API.
In JEE a managed EntityManager will typically create a new EntityManager per JTA transaction. Also, the behavior is somewhat undefined outside of a JTA transaction context. Outside a JTA transaction context, a JEE managed EntityManager may create a new EntityManager per method, so unwrap() may return a temporary EntityManager. Another way to access the JPA implementation is through the EntityManagerFactory, which is typically not wrapped with a proxy, but may be in some servers.
Example unwrap
public void clearCache() {
    EntityManager em = getEntityManager();
    em.unwrap(JpaEntityManager.class).getServerSession().getIdentityMapAccessor().initializeAllIdentityMaps();
}