Centric connect.engage.succeed

How Entity Framework ruined our day (or did it?)

Geschreven door Arnoud van Bokkem - 05 oktober 2016

Arnoud van Bokkem
We were working on a project that used Entity Framework Code First. A new work item required us to add a few extra properties to an entity, which we wanted to tackle using table splitting. But after a day of frustration, we threw in the towel.

The problem looked simple enough. The application supported two types of projects — Consumer projects and Business projects. Since the two types shared some properties and even part of the UI, they shared a common ancestor. Our goal was to introduce project options that were specific to a business project.

You will see a simplified version of the class diagram below. Of course, both the BusinessProject and the ConsumerProject had relations to several other dependent entities that described the contents of the project. For the sake of simplicity, I have only included one of these component entities below.

We wanted to add the project options to a new entity, marked in red, that had a one-to-one relation to the existing BusinessProject entity. And because of the one-to-one relation between the business project and its options, we wanted to use the same table for both the business project and its options. In Entity Framework, this is called a table split. Unfortunately, we ran into serious problems using the table split approach, from migrations that couldn’t be created to Entity Framework not understanding that all three keys (the Project, BusinessProject and BusinessProjectOptions IDs) needed to use the same value. As soon as we'd fixed one problem, a new one popped up. In the end, we abandoned the table split and mapped the options to their own table.

But what went wrong? Had we found a bug in Entity Framework or simply reached the limits of its capabilities? Or had we made a mistake in the implementation? Let’s take a moment to find out.

To properly investigate the problem let's look at it step by step. First of all, we'll create a one-to-zero-or-one relation and perform some CRUD operations on it. Next, we'll implement a split table and perform the same type of CRUD operations. Finally, we'll combine these two components into what we tried to do and see if we can reproduce the problem.

One-to-zero-or-one scenario

For the first scenario, let’s create an example with employees that use laptops. Most employees have a laptop, but certainly not more than one. The class diagram for this scenario is simple:

Writing the above class diagram in POCO classes gives us the following code:

public class Employee
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int Id { get; set; }

    public int PersonnelNumber { get; set; }

    public string FirstName { get; set; }

    public string LastName { get; set; }

    // Navigation properties
    public virtual Laptop Laptop { get; set; }
}

public class Laptop
{
    [Key, ForeignKey("Employee")]
    public int Id { get; set; }

    public string Code { get; set; }

    public string Type { get; set; }

    // Navigation properties
    public virtual Employee Employee { get; set; }
}

Employee is the principal end of the relation and has a navigation property to the dependent end of the relation, Laptop. Being the dependent end of the relation, Laptop has a foreign key defined as Employee on the Id and, since the relation is a one-to-zero-or-one relation, Laptop has a navigation property to Employee. So far, this is all by the book, as can be found all over the internet.

The CRUD operations in this scenario work as expected. Because of the relation, you can’t add a Laptop without an Employee or delete an Employee without also deleting the associated Laptop. The first operation will result in a DbUpdateException, the latter will be handled by the framework — deleting an employee also deletes the laptop. You can add an employee without a laptop, since the dependent end of the relation is zero-or-one.

Console.WriteLine("Insert laptop without employee");
 
var laptop = new Laptop
{
    Code = "DELL01",
    Type = "Dell XPS 15"
};
 
context.Laptops.Add(laptop);
 
context.SaveChanges();
 

Table splitting scenario

My example for table splitting involves a customer and its address. All customers have an address, and the address has no meaning without the customer. This is the class diagram:


Of course, you can’t see that the entities share a table by looking at this diagram. In order to see that, we have to look at the code for the POCO classes:

[Table("Customer")]
public class Customer
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int Id { get; set; }

    public string Name { get; set; }

    public double MaxCredit { get; set; }

    // Navigation properties
    [Required]
    public virtual CustomerAddress Address { get; set; }
}

[Table("Customer")]
public class CustomerAddress
{
    [Key, ForeignKey("Customer")]
    public int Id { get; set; }

    public string Address { get; set; }

    public string City { get; set; }

    // Navigation properties
    public virtual Customer Customer { get; set; }
}

Now you can see that both entities share the same table, Customer. Furthermore, the reference to the CustomerAddress dependent relation is required, meaning this is essentially a one-to-one relation.

Again, all CRUD operations work as expected. Like in the previous example, adding an address without a customer results in a DbUpdateException and deleting a customer will also delete the customer address. However, unlike before, you can’t create a customer without an address. This results in a DbEntityValidationException, because the address is required.

Console.WriteLine("Insert customer without address");
 
var customer = new Customer
{
    Name = "My invalid Customer",
    MaxCredit = 1
};
 
context.Customers.Add(customer);
 
context.SaveChanges();

Combined scenario

Once we know how to implement both concepts separately, it’s time to join them together in a combined solution. Let’s use the project classes for this solution. The class diagram is almost the same as the one at the top of the page:

Project has a one-to-zero-or-one relation with both ConsumerProject and BusinessProject. BusinessProject has a one-to-one relation with BusinessProjectOptions. This relation is implemented as a table split, as can be seen in the code:

public class Project
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int Id { get; set; }

    public string Name { get; set; }

    public DateTime DateAndTimeCreated { get; set; }

    public DateTime DateAndTimeLastModified { get; set; }

    // Navigation properties
    public virtual BusinessProject BusinessProject { get; set; }

    public virtual ConsumerProject ConsumerProject { get; set; }
}

public class ConsumerProject
{
    [Key, ForeignKey("Project")]
    public int Id { get; set; }

    // Navigation properties
    public virtual Project Project { get; set; }
}

public class BusinessProject
{
    [Key, ForeignKey("Project")]
    public int Id { get; set; }

    // Navigation properties
    public virtual Project Project { get; set; }

    [Required]
    public virtual BusinessProjectOptions Options { get; set; }
}

[Table("BusinessProject")]
public class BusinessProjectOptions
{
    [Key, ForeignKey("BusinessProject")]
    public int Id { get; set; }

    public bool HasDiscount { get; set; }

    public string Contact { get; set; }
}

Again, all CRUD operations work. Like in the previous examples, we can't add a dependent entity without the principal entity and deleting a principal entity also deletes the dependent entities. So we're still having no luck in finding out what went wrong in our project.

Final scenario

There’s just one more item to add to the equation. As you may recall, the original application had a number of other dependent entities that described the contents of the project. Let’s add one of those to the combined scenario, as outlined in the class diagram at the top of the page.

The updated BusinessProject entity and the new ProjectComponent entity look like this:

public class BusinessProject
{
    [Key, ForeignKey("Project")]
    public int Id { get; set; }
 
    // Navigation properties
    public virtual Project Project { get; set; }
 
    [Required]
    public virtual BusinessProjectOptions Options { get; set; }
 
    public virtual ICollection<ProjectComponent> Components { get; set; }
}

public class ProjectComponent
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int Id { get; set; }
 
    [ForeignKey("BusinessProject")]
    public int ProjectId { get; set; }
 
    public string Name { get; set; }
 
    public string ManufacturerCode { get; set; }
 
    // Navigation properties
    public virtual BusinessProject BusinessProject { get; set; }
}

The new ProjectComponent entity has its own primary key and a foreign key linked to the BusinessProject principal entity.

Once again, all CRUD operations work as expected. We can’t insert dependent entities without the principal entity, and deleting the principal entity also deletes all dependent entities.

Conclusion

In the end, it turns out that Entity Framework is indeed capable of supporting our table split scenario. It may take some time to properly define all relations in code. And that is probably where we went wrong before: if you make a mistake in defining the relations, you could end up with some pretty unreadable error messages. For instance, omitting the [Required] attribute on the Options property in the BusinessProject gives the following error message when you add a new migration:

errormessage when omitting required attribute

Reading this message gives me no clue as to what went wrong. Similar mistakes likewise result in incomprehensible error messages. I think I need to delve into the inner workings of Entity Framework and migrations some more. Maybe in a later blog?

The code I used to test all scenarios is available on GitHub: https://github.com/ArnoudBM/TableSplitting.

Craft Expert Arnoud van Bokkem is part of the .NET team within Craft, the development programme for IT professionals (powered by Centric). If you would like to follow his blog, sign up for Craft updates.

Tags:.NET

     
Reacties
  • Thingks
    Henri Koppen
    05 oktober 2016
    Leuk stuk Arnoud, delen is altijd mooi. In 2008 kwam het Linq-to-SQL framework uit (voorloper van entity) en oh wat hebben we in de krochten daarvan gedwaald....
  • Solutions Design
    Frans Bouma
    05 oktober 2016
    Some remarks:

    1) Address is a valuetype, so it would have been better to model it as a 'complex type' in EF, as it has no identity. A lot of people model it as a real entity but that's not useful, you're not going to re-use an address entity instance across entity instances anyway.
    2) Why aren't you using better tooling for EF? LLBLGen Pro allows you to work model first, let the changes to the model flow to the relational model and mappinngs, generate EF code first mappings / classes. Changes in the relational model data is easily exported as ddl sql change scripts so your DBA can use these to update production/test etc. without magic 'migration' tooling.
    3) table splitting is 'nice' but it also brings a lot of overhead. EF more or less introduced it to make it possible to exclude fields with large data from entities (e.g. an 'image') when fetching these. It's a convoluted way to do this, better would have been if they would have implemented field exclusion.
    4) Don't fall into the trap to design your tables through the entities: entity models are projected onto relational model schemas and code, (see e.g. NIAM how the transformations from an abstract entity model to table schema works). While I can understand you want to achieve a certain table layout, if you want that, start from the other side, change the table, then map the entities to it (or reverse engineer it). In code/model first scenarios, the model drives the database schema.

    Life's too short to fight with tooling, Arnoud! ;)
  • Centric
    Arnoud van Bokkem
    10 oktober 2016
    Thanks Henry and Frans! Looking back, I remember Linq2SQL to be easier. But hey, everything used to be better, right?

    And Frans, I agree with a lot you say. But one cannot always dictate the customers architecture. Since code first was a given, we tried to be fancy and learn something in the process. And we did learn from the experience, so I don't count it as a failure.

    And it got me acquainted with your tool, which I did not know before. Something to read up on.
Schrijf een reactie
  • Captcha image
  • Verzenden