The argument against Entity Framework, and for micro-ORMs.

Preface

I've worked with Entity Framework (since the .NET 3.5 days, both code-first and database-first) as well as the latest .NET Core version. It was my preferred solution for a while and I've gotten pretty good with it. Looking back, I regret having to learn the hard way that EF is very taxing and it just isn't a good choice for most solutions.

There are many ORMs in the .NET world, but I think my point can be made by picking one of each: a fully-fledged ORM and a micro ORM.

  • Full ORM - Entity Framework Core - Chosen because it is the unofficial official version for .NET. It is front-and-center in most of the "Getting Started" docs and is what most junior devs will choose when beginning their journey into .NET.
  • Micro ORM - ServiceStack.OrmLite - For the sake of argument, I could easily have chosen other solutions, such as Dapper or PetaPoco, but I'm a fan of the API/features that ServiceStack.OrmLite provides. NOTE: OrmLite is free for open-source, but paid for closed-source.

Surface area/exposure

As with picking any dependency on your project, you must step back and take a 20,000-foot view of things to determine its impact on your solution. One way to do this is to consider the size of the dependency via the lines-of-code.

Using cloc, here is the overview of the size of each codebase: OrmLite comes in at roughly 89k lines of code, versus roughly 514k for EF Core.

Purely considering lines-of-code can be a fool's errand, but there is more to the story here. As the saying goes "more money, more problems", right? As you increase your surface area (including your dependency graph), you increase your chances of running into bugs/issues. As you sit on top of more layers of abstraction and indirection, the problems that you begin to run into begin to get more cryptic and harder to isolate/fix.

You can get a sense of this by spending a few minutes on the issue pages for each ORM (here and here). You'll find that the issues in OrmLite are generally about the problem-domain (getting data in-and-out of the database) or the underlying ADO provider, whereas the issues in EF generally involve the layers/types that are involved in the abstractions.

When it comes to the scope of your dependency and the exposure it brings to your project, I wouldn't take this point lightly. It is often overlooked, and when it does eventually tax your solution, it can go unnoticed/unrealized.

You can't escape the issues of just "getting data in and out" and the underlying database. However, there is a huge swath of issues that can be completely avoided by just choosing not to expose yourself. Smaller targets are harder to hit. Keep your dependencies small.

Bare metal

Micro ORMs are usually just extensions on top of raw ADO types (IDbCommand, IDbConnection, etc.) and OrmLite is no exception. These extensions usually go only so far as to prevent the user from having to manually manage SQL strings, which is an obvious maintenance nightmare.
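As a sketch of what "extensions on top of raw ADO types" means in practice, here is a toy mapper in a single extension method. This is not OrmLite's actual implementation; the `MapTo` name and `Person` type are hypothetical, and a real micro ORM adds caching, null handling, and SQL generation on top of the same idea.

```csharp
using System.Collections.Generic;
using System.Data;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class TinyOrm
{
    // Map each row of an open IDataReader onto a new T by matching
    // property names to column names. A real implementation would
    // also handle missing columns and cache the reflection lookups.
    public static List<T> MapTo<T>(this IDataReader reader) where T : new()
    {
        var results = new List<T>();
        var props = typeof(T).GetProperties();
        while (reader.Read())
        {
            var item = new T();
            foreach (var prop in props)
            {
                var ordinal = reader.GetOrdinal(prop.Name);
                if (!reader.IsDBNull(ordinal))
                    prop.SetValue(item, reader.GetValue(ordinal));
            }
            results.Add(item);
        }
        return results;
    }
}
```

The entire "ORM" is one method over an interface your database provider already implements, which is why issues at this layer tend to point straight at the provider or your SQL.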

At this point of abstraction, no real complaints can be made. You get fully type-checked access to your underlying database, with a 1-to-1 relationship between your types and the data they represent. Your data types (POCOs) are expressed clearly. No magic. No voodoo. No "secret" tables for mapping many-to-many relationships. No complex graph management. No virtual proxies and lazy collections. No navigation properties leaked. These things usually end up just adding friction to your project, caking on additional features that silently tax you.

If exceptions happen when using OrmLite (or micro ORMs), they are usually a result of the underlying ADO provider, key constraints, etc. It's better to be closer to the metal when an issue arises because the cause/fix is usually more clear.

A common misconception is that "because you're bare metal, you need more boilerplate!" This just isn't true. Sure, in the case of working with HTTP over a raw TCP connection, you'd want a higher-level abstraction. But this just isn't the case with OrmLite (or micro ORMs in general). I'd wager that I'd get by with fewer lines of code when using OrmLite over EF.

Let's say you have a business requirement of storing/retrieving data. You've ruled out the need for non-conventional databases (graph, Cassandra, etc) and have decided that a relational database will work. The following code illustrates the bare minimum needed to tackle your problem, using OrmLite.

using System.Data;
using ServiceStack.OrmLite;

class Program
{
    public class Person
    {
        [AutoIncrement]
        public int Id { get; set; }

        public string Name { get; set; }
    }

    public static void Main(string[] _)
    {
        var factory = new OrmLiteConnectionFactory(":memory:", SqliteDialect.Provider);

        using (var db = factory.OpenDbConnection())
        {
            db.CreateTable<Person>();

            var person = new Person { Name = "Paul" };
            db.Save(person); // Save also populates the auto-incremented Id.
            var personId = person.Id;

            using (var trans = db.OpenTransaction(IsolationLevel.ReadCommitted))
            { 
                person = db.Single(db.From<Person>().Where(x => x.Name == "Paul"));
                person = db.SingleById<Person>(personId);

                person.Name = "Another name";

                db.Save(person);

                trans.Commit();
            }
        }
    }
}

Any business requirement can be achieved with the above code. There is little standing in the way of defining your solution/architecture how you'd like. OrmLite (and micro ORMs generally) focuses on exactly what is needed to solve your problem. Nothing more, nothing less. This brings me to my next point.

Heavy ORMs impose artificial abstractions that force you into a unique style of development, introducing a further disconnect and layer of indirection between your code and your database, and requiring the usage of augmented and proxied EF-specific models. Using this abstraction isn't going to make you a better OOP or FP programmer, or make you more knowledgeable about SQL or any RDBMS-specific features.

Sitting on EF's layers limits your ability to clearly predict the behavior and functionality of each query, relying instead on EF-specific behavior. This makes it harder to reason about your code, as you'll need to keep a hidden mental context of the incidental complexity in EF's behavior when reviewing code. You'd have to know exactly what EF does, when it does it, and why it does it when diagnosing unwanted behavior like unintended data access.

Missing features

I believe I've set a high bar up until this point when it comes to choosing EF over OrmLite, but this doesn't factor in the additional features that developers have come to love with EF.

  • Migrations
  • Change tracking
  • Unit of work
  • Lazy collections
  • Navigation properties (joins and projections)
  • Result caching
  • Graph persistence
  • ...the list goes on

In my opinion, each of these features is unlikely to address a business concern directly. However, they are still typically highly valued by developers for various reasons.

These features must each be carefully considered. Even if you won't use or benefit from a feature, there is still a cost to having it exist at all. They typically only exist in heavy ORMs (EF) found in enterprise languages (Java/.NET).

I'd prefer to code against clean APIs that leverage the DB's underlying functionality and features.

With that said, let me try to address a few of these features.

Migrations

Migrations are a requirement of just about every solution. A few things to consider.

First, just because you didn't write the code, doesn't mean someone didn't write the code. Choosing a batteries-included approach doesn't make your solution any simpler. You can put the engine under the hood, or in the trunk, but it will still break down.

Secondly, choosing a batteries-included, one-size-fits-all solution often means there are additional edge cases for use-cases that just don't apply to you. This may seem irrelevant, but even if you are only using 20% of the feature, that doesn't mean you aren't sitting on the abstractions specifically needed for the other 80% you don't use.

Lastly, what happens when something goes wrong? Things are a lot easier to debug/fix when you own the solution and there isn't any white noise. What happens when you run into an issue like this? What about the time spent debugging? Or fixing a database that the migration failed on? At that point, you've already spent more time using a feature that you didn't implement than it would have taken to just implement migrations yourself.

Seriously, write your own migration layer. I wrote this in 3 minutes.

using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using ServiceStack.OrmLite;

class Program
{
    public interface IMigration
    {
        void Run(IDbConnection connection);
        
        int Version { get; }
    }

    public class Migrator
    {
        private readonly OrmLiteConnectionFactory _connectionFactory;
        private readonly IList<IMigration> _migrations;

        public Migrator(OrmLiteConnectionFactory connectionFactory, IList<IMigration> migrations)
        {
            _connectionFactory = connectionFactory;
            _migrations = migrations;
        }
        
        public void Migrate()
        {
            using (var connection = _connectionFactory.OpenDbConnection())
            {
                connection.CreateTableIfNotExists<Migration>();

                var installedMigrations = connection.Select<Migration>();

                using (var transaction = connection.OpenTransaction())
                {
                    foreach (var migration in _migrations.OrderBy(x => x.Version))
                    {
                        if (installedMigrations.Any(x => x.Version == migration.Version))
                        {
                            // Already done!
                            continue;
                        }
                        
                        migration.Run(connection);

                        connection.Insert(new Migration
                            {Version = migration.Version, AppliedOn = DateTimeOffset.UtcNow});
                    }
                    
                    transaction.Commit();
                }
            }
        }

        public class Migration
        {
            [AutoIncrement]
            public int Id { get; set; }
            
            public int Version { get; set; }
            
            public DateTimeOffset AppliedOn { get; set; }
        }
    }

    public class TestMigration1 : IMigration
    {
        public void Run(IDbConnection connection)
        {
            // Raw and auditable SQL.
            // Create tables, add/drop columns, etc.
        }

        public int Version => 1;
    }

    public class TestMigration2 : IMigration
    {
        public void Run(IDbConnection connection)
        {
            // Raw and auditable SQL.
            // Create tables, add/drop columns, etc.
        }

        public int Version => 2;
    }
    
    public static void Main(string[] _)
    {
        var factory = new OrmLiteConnectionFactory(":memory:", SqliteDialect.Provider);

        var migrator = new Migrator(factory, new List<IMigration>
        {
            new TestMigration1(),
            new TestMigration2()
        });

        migrator.Migrate();
    }
}

With less than 100 lines of code, you now have a solution that will have near-zero issues. And if there happens to be an issue, there is a good chance that any developer could fix it within minutes. There is no learning curve. No documentation to read. No CLIs to invoke. No hidden tax bill that will be paid in the future.

Change tracking

@ardave2002: I find having a giant ball of mutable state with change detection via virtual proxy that spans my application from one edge to the other to provide huge benefits to my ability to reason about code </s>

In my opinion, this feature is just annoying. There is a performance overhead that you introduce when using this feature, and it causes you to leak concerns into your application layer, adding AsNoTracking() to all of your read-only queries. Also, having ambient state in your application is generally a bad idea. There are risks associated with having the semantics of SaveChanges() differ depending upon factors that are outside scope. It makes things very difficult to reason about at first glance.
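To illustrate why ambient tracked state is hard to reason about at a glance, here is a toy identity map, the mechanism a tracking context is built on. This is not EF's implementation; the `IdentityMap` and `Tracked` names are hypothetical. The point is that a mutation made in one corner of the app silently shows up in a later load elsewhere.

```csharp
using System.Collections.Generic;

public class Tracked
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class IdentityMap
{
    private readonly Dictionary<int, Tracked> _cache = new Dictionary<int, Tracked>();

    // Hand out the same mutable instance for a given key, as a tracking
    // context does. Any mutation is visible to every subsequent "load".
    public Tracked Load(int id)
    {
        if (!_cache.TryGetValue(id, out var entity))
            _cache[id] = entity = new Tracked { Id = id, Name = "from-db" };
        return entity; // same instance every time, not a fresh copy
    }
}
```

Whether the second load reflects the database or someone else's in-flight edit now depends on context you can't see at the call site.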

Unit of work

I have a hard time discussing this feature because people sometimes conflate this pattern with simple transactions, which exist in raw ADO. You will only have issues if you intend to use TransactionScope from EF, which is a little more than a simple transaction. If you don't intend on using this class, then this isn't a missing feature when choosing OrmLite over EF.

But if you need TransactionScope-like behavior, there are multiple ways in which this could be done. First, you could invert the creation of these objects so that implicitly shared/scoped instances can be used for every request. I've also used AsyncLocal successfully to use cached instances of IDbConnection and IDbTransaction for every nested method call. This is something that could be hand-rolled with minimal lines of code, similarly to the migration approach above.

But in the end, this only matters if you intend to use TransactionScope. Otherwise, this isn't a feature you're missing.
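The AsyncLocal approach mentioned above can be sketched in a few lines, assuming all you need is an ambient value that flows across nested (and async) calls. The `AmbientScope` name is hypothetical, not an OrmLite or ServiceStack API; in practice T would be your IDbConnection or IDbTransaction.

```csharp
using System;
using System.Threading;

// A hypothetical ambient-scope holder: nested method calls can reach the
// current connection/transaction without it being passed as a parameter.
public sealed class AmbientScope<T> : IDisposable where T : class
{
    private static readonly AsyncLocal<T> _current = new AsyncLocal<T>();
    private readonly T _previous;

    public AmbientScope(T value)
    {
        _previous = _current.Value; // remember the outer scope so nesting works
        _current.Value = value;
    }

    public static T Current => _current.Value;

    public void Dispose() => _current.Value = _previous; // restore the outer scope
}
```

A request handler would open a connection, wrap it in a scope, and any repository method underneath could read `AmbientScope<IDbConnection>.Current` instead of taking the connection as a parameter.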

Lazy collections

This is a feature that can seem appealing at first but can be very problematic. Do you want database queries to happen implicitly in your views? This is compounded when you are enumerating a collection of objects that have nested lazy properties, causing an additional database query on every iteration of the loop (the classic N+1 problem).

This is just a really bad idea. Define your query model upfront and fully load it to avoid unintended side effects that only show themselves when unbounded collections inevitably grow.
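The cost is easy to demonstrate without a database. This sketch uses a hypothetical repository that counts round trips, contrasting lazy per-item loading (1 + N queries) with loading the lines for all orders upfront (2 queries).

```csharp
using System.Collections.Generic;
using System.Linq;

public class CountingRepo
{
    public int QueryCount { get; private set; }

    // Query 1: fetch the ids of 100 orders.
    public List<int> GetOrderIds()
    {
        QueryCount++;
        return Enumerable.Range(1, 100).ToList();
    }

    // Lazy style: one extra query fired per order, as each lazy
    // collection is first touched inside the loop.
    public List<string> GetLinesForOrder(int orderId)
    {
        QueryCount++;
        return new List<string> { "line-for-" + orderId };
    }

    // Eager style: a single query fetches every order's lines at once.
    public Dictionary<int, List<string>> GetLinesForOrders(IEnumerable<int> orderIds)
    {
        QueryCount++;
        return orderIds.ToDictionary(id => id, id => new List<string> { "line-for-" + id });
    }
}
```

With lazy collections the 100-order loop costs 101 round trips; with an upfront query model it costs 2, and the difference grows with the collection.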

Misc

What happens when things just aren't performant with EF? It's great that it allows you to analyze the SQL being executed, but at that point, you are still at the mercy of the underlying SQL generation. You may decide to jump out of EF in these cases and just execute raw SQL, but why subject yourself to this in the first place?

I also don't like hearing the caveats of "if you know how to use it" or "when used right" when describing features of EF. I'm uncomfortable with the idea of playing hot potato with loaded guns, hoping every person catches the gun just right. Not everyone is as knowledgeable as you. Code reviews are a good thing, but depending upon them isn't a good idea. Things get past code reviews. All developers suck, including me and you. Why even risk it?

Final words

An ORM should only serve to give you a type-safe approach to writing SQL. Anything else is a tax that gets compounded as your project evolves.

You could use EF and everything turns out just fine, but the standard deviation between success and failure is wide. When you choose a micro ORM, that standard deviation is much smaller.

Quite frankly, you'd be hard-pressed to ever find a business requirement that only EF could solve. The features that EF provides and OrmLite doesn't are welcomed by developers as time-saving nice-to-haves. However, when you factor in the taxing nature of using a framework as large as EF, you'll spend more time using it than if you were to just keep things minimal and bare-metal. This is especially true for larger projects involving people with a wide range of experience.

If you'd like to add anything, please comment. I love the discussion.

Update: I created SharpDataAccess. It is a thin layer that sits on top of ServiceStack.OrmLite that adds migrations and ambient connections/transactions.


Comments

Lux44 commented Sep 23, 2019  

Thank you for not advocating building queries from strings :).

The change tracking downside of Entity Framework could have an easy enough workaround: use a new DbContext for applying changes, then the effects are more obvious.

What doesn't have an easy workaround: startup time of Entity Framework, which is quite noticeable in desktop apps.

Right now is not really a great time to look at EF bug list, or jump into EF, for that matter. The query translator got rewritten, but hasn't really stabilized, as the issue list rightfully indicates. Their test coverage for new/rewritten parts is also not great. "Please try again with nightly builds" has been the standard response for over a month now. Let's hope by the time 3.1 releases in November things have stabilized.

I stumbled into this article on reddit, and really liked it. I agree with a lot of what you're saying. 😊
However, the quote you've included in the "Change Tracking" paragraph feels like a straw man. I agree that it can be dangerous to throw around change tracking entities to every corner of your system, and thus lose sight of what is actually happening in your system, but I feel like most people know this shouldn't be done, and that if you have this problem in your code base, it might be a code smell.
Currently I like to implement a command-handler approach in my system, where a command corresponds to a particular business concern, and has only one handler, where I centralize the preparation and execution of my business logic.
I agree that the change tracking gets in the way when all you need is to load read-only data for display. Not only that but if you're eager loading, and have only a couple of nested collections in your entities, the amount of data EF is querying the database for can get huge. We recently had to debug this problem where I work. A request for ~1000 entities could result in result sets of well over 100,000 lines. Surely this could be optimized in EF, but when I encounter something like this, I like to take a step back and evaluate the tool we're using.
If you're querying read-only data from the database, what exactly do you need an ORM for? I'd advocate for introducing a micro-ORM (Dapper is my current favorite, but I might check out ServiceStack.OrmLite soon πŸ˜‰). Nobody says you can't use more than one data-access library in your solution.
This approach of using full-fledged ORMs to retrieve and save business entities in your command handlers, and using bare metal ADO.net or a micro-ORM to very efficiently query for read-only data goes very well with the architectural pattern of CQRS, and is why I recommend it a lot to developers who are dealing with large systems and complex domains.

Anyway, the blog post was great, and I generally agree with you. 😊 Just don't write ORMs off right off the bat, and consider them as a tool as you would any other dependency in your system.
Hope this brain dump makes sense to anybody. πŸ˜…

This blog post is good up to a certain level of application complexity. Most large enterprise projects are going to be dealing with data sets with millions of rows of data, with billions of relational outcomes. Advocating eager loading in such a scenario is nonsense. EF allows you to cherry-pick relationships to eager load when using lazy loading by default. If you are using a webserver and constantly eager loading large graphs then you are placing a huge burden on the server to load data that gets dumped in the garbage when the controller returns. If it's a desktop app, you may get away with it for a while, but eventually your app will be holding a couple of gigs of data in RAM and user experience will suffer. On mobile, you should be just as strict with memory usage as on a REST server.

EF was designed to work in all these scenarios and when used correctly, does so admirably. Enterprise code bases need consistency and reliability as their top concerns for maintainability. Designing a system under the assumption that only you or someone of your skill level will be maintaining code is both arrogant and dangerous.

There's other options for ORMs.

For example, Tortuga Chain (which I work on) uses database reflection. Rather than just assuming the class exactly matches the table or doing everything using SQL string literals, it compares the table and class definitions at runtime. This dramatically reduces the boilerplate, especially when you don't want every column.

Another is SQLAlchemy, which allows you to build complex SQL expressions using an object model. Unfortunately it is Python-only at this time.

Regarding boilerplate, consider this line:

dataSource.Update("dbo.Person", new { ID = personId, Name = "Another Name"}).Execute();

Why can't all ORMs do this? Why do they usually require manually dealing with connections/contexts and an extra round trip to the database just to perform a simple update?

In my opinion, the only time I should see a using statement in my DB code is when I actually need a transaction. And that should only be needed if I'm updating multiple records.

TehWardy commented Mar 12, 2020  (edited)

Having read this .... many of the points here are exactly where i'm at right now ... the hidden technical debt, abstraction issues I can't solve, complexity with no explanation, poor performance in scenarios that often seem trivial on face value.

I'm currently looking in to an alternative to EF Core to resolve a ton of issues I have that the EF Core team seems to be either confused about, not interested in, or simply willing to pick at my description rather than focusing on the issue at hand.

The fact is, since EF6, each and every subsequent release has removed or broken functionality that my stack has depended on and i'm sick of swallowing that with the reasoning being "this is the cost of progress".

So here's where i'm at ...
I'm trying to find an alternative that as this blog post states should not be an issue because "there shouldn't be a problem that EF solves that another framework couldn't also solve".

So here's my core functionality requirements from the EF functionality that I currently use ...

  • Mapping LINQ to SQL
  • Mapping SQL query results to Entities
  • Managing / migrating the DB (ideally without having to manually crank out SQL myself)
  • Complex filtration when "nested questions" happen.

That last point appears to be the sticking point for most "micro-ORMs" .. the "micro" prefix usually means like with say Dapper that it does the SQL query to Entity mapping but won't do the bit before that to get from the LINQ expression tree to SQL.

Assuming this ORM can handle that, I'm looking for examples of doing things like applying set filters so I can achieve something like ...

var results = Db.GetAll<BaseT>()
   .Include(t => t.Property)
       .ThenInclude(i => i.SubProperty)
   .ToArray();

... the key things to note here are what EF solves that I can't seem to find a solution to in other ORMs: the include, and then the sub-include, are both filtered by the relationship but also by filter conditions applied to the table, regardless of the context in which that table is questioned.

This seems to be a feature missing in all but EF.

Does this ORM support this scenario?

This requires you to know all the possible combinations of questions that you might want to ask the API up front, or manually wiring up a second model so that the traversal is possible with sub-queries.
That's never a desired result when you have a DB with potentially billions of rows to manage and you want a joined set of maybe 1,000 of them from the results of a single dynamically generated SQL query.

Consider putting an OrmLite managed DB behind an OData API where the questions are virtually limitless but every possible combination of scenarios has to be considered and handled.
With EF this is trivial as I can tell EF that the Set has a filter and then any time EF sees any portion of a linq query hitting a given table it applies the filter then the requested query to the SQL query.

People often forget about the complexity that EF is solving; stating that something is a "micro-ORM" instead of a "full ORM" is seemingly just like declaring "this solves half your issue, now go find something else that solves the other half, but it's fast ok".

I've not yet found anything that can match this type of EF solved scenario that didn't require a crap ton of "work arounds" or "patching stuff together" and it's the one thing that keeps my solutions sat on it ... which is frustrating because I both hate it and have to use it at the same time as there is seemingly no alternative.

Other things of note ...

  • For some reason Code generation is seen as bad
  • Designer tooling focus is seen as bad

... all features that I often use to solve dynamic scenarios that would otherwise be unsolvable, or for situations where I don't want to sit around writing boilerplate like the SQL statement for building a table when I have a class that exactly matches its structure.

Also, the current version of EF core will never return "some proxy sub class with injected decorated behavior", as that functionality was ripped out as part of the rebuild when EF6 became EF Core 1.0.

My current entity model, has a DbContext as you might expect with entity sets, none of which I pull from the Db in such a manner that I use things like Lazy loading, or proxies, or in any way require the resulting entities to be "attached" to the context, this is basically the same setup as is explained here.

My ideal ORM would allow me to do something like ...

using (var db = factory.GetConnection("db name"))
{
    IQueryable<T> results = new Query<T>().Where(...).Select(...).Include(...).OrderBy(...);
}

... in this situation I would be constructing a simple ADO.Net connection to the Db and then telling the framework "build me a SQL query of Type T", large complex "models" seem to feel like overkill for me since the type metadata for the query you're building should tell you all you need to know.

I continue my composition on that to construct the full query, including notifying it (as per my previous comment) of what "sub sets" I want in the results, then performing a .ToArray() .ToList() or simply iterating over it would actually execute it.

The results would be disconnected from the DB (simple POCO's) and be appropriately secured unless I specifically asked the framework to track changes for me to make saving easier later.
If a calling user asks for "select * from Table" configuration of a secure where clause should be a key feature of any ORM "micro" or not.

My issue tends to boil down to the fact that all these ORM's claiming to be better than EF appear to be so on face value, but as soon as you start drilling in to the complex scenarios they fall short of features resulting in me having to extend or build tons of framework around the ORM.

So the article makes the comparison of OrmLite's 89k cloc to EF's 514k cloc in the chart at the beginning, but it's forgetting that there's a bunch of stuff in that extra 400k cloc that OrmLite can't do.

Unless i'm missing something?

I recently asked the Dapper guys about how implementing a Dapper-based back end for OData might look, and I simply got a one-liner: "you'd have to write your own version of LINQ to SQL as Dapper doesn't do that" ...

So Dapper can execute a query, and map the results to an Object graph, but it can't build the query in the first place. That's half the work is it not?

In short

I'd love to see a "micro-ORM" implementation of an OData or GraphQL or similar API model as those sorts of API's really push the limits of what ORM's can do.

See this: Why not OData?

Also this: AutoQuery

TehWardy commented Mar 13, 2020  (edited)

I don't agree with the bulk of points in that OData article. For example, the supposed "tight coupling of internals" problem highlighted is not present in my stack, but that's a far more complex and different discussion. I was using OData as an example mainly because of the complexity of the scenarios it exposes us to. Even Microsoft, who pushes OData heavily, states that best practice is to have N-Tier separation and promotes the use of both an API model and a DB model; the mapping for that is an entirely different issue that's not worth discussing here.

The article seems to suggest that AutoQuery is an alternative to OData which it just isn't.
The fact is, as the article points out, "the OData query-space can reference any table and any column that was exposed." Not quite true, but the premise is: you have a model (that model doesn't have to be the same as your data model), and you can build any question on any part of the API model.

Essentially all i'm asking is that an ORM should be able to handle exactly that, and when I ask it a question I should be able to tell it "I want my question asked in this business context" which for the bulk of queries in enterprise applications boils down to "based on what the user making the call has access to", which in my case is a small cut of every table in the DB.

If the answer, it seems, is to just avoid asking the question, then it's not really an answer, is it?
This comes back to my Dapper point above: can Dapper really claim it's faster than EF when it only solves half the problem?

mikependon commented Mar 14, 2020  (edited)

Interesting topics. I'd really like to chip in my ideas, but only on the ORM side. Hope to share more here soon.

As of writing this, I do not have any idea of AutoQuery, and I am not as deep into OData.

key things to note here that EF solves that I can't seem to find a solution to in other ORM's the include, and then the sub include are both filtered by the relationship but also on filter conditions applied to the table regardless of context in which that table is questioned.

@TehWardy - I think a micro-ORM can solve all these things, whereas EF can't. EF has abstracted most of the things for you, and that limits your access to the benefits of the underlying storage. It means you have less control with EF than with what we call micro-ORMs.

Migration tool - People often forget about the complexity that EF is solving.

True, and not true. It is based on preference. I can say EF did not solve any complex problem :) - that's why there are micro-ORMs, which allow you to have more control. Of course, you have to write more.

Specifically on the Migration Tool: we have developed a DevOps tool, and had not used EF code-first, as our preference is to not be bound to EF at all. In the end, it is much easier to maintain and solves the complexities of our releases :)

I've not yet found anything that can match this type of EF solved scenario that didn't require a crap ton of "work arounds" or "patching stuff together"

Do not generalize this; EF does not even have Batch and Bulk operations, 2nd-layer cache, Trace, etc. And if you tend to do that, it requires a lot of work just to make it work with EF. Also, you are bound to the models, and you can't do anything with your model but use it on a specific table.

My ideal ORM would allow me to do something like ...

I am interested to collaborate and share. I am also an author of one micro-ORM named RepoDb. Can you have a look?

It is a hybrid-ORM which will allow you to do the things of micro-ORMs and macro-ORMs. You will have a lot of benefits while using it, I can even explain and support on this.

You may experience a different taste, as it has been baked differently.

It's refreshing to have people to talk to about this stuff ... most devs want to avoid this type of problem as it's a minefield of pain whatever path you go down it seems, the trick is picking which mines you step on with some level of "good guess work of the future".

@mythz

I can see you clearly have some big issues with the M$ stack, mostly they are valid too if the stack is used as documented.
The way they document stuff is the "this should get you going" method, not the "this is probably how you should be using it" way ... the main reasoning there I guess is that they want as many people using the technology as possible which is true of any stack vendor.

You seem to be under the impression that OData is ...

  1. Not type safe / lacking contract definition.
  2. Tightly bound to a db model.
  3. Specifically an M$ technology.
  4. Over-complicated for complexity's sake.

To that I would say ...

  1. The OData standard specifically requires an API model definition.
  2. It doesn't have to be, that's a documentation issue.
  3. It's open source, managed and owned by OASIS rather than M$ ... but M$ does have thousands of endpoints that use it (their entire cloud platform, which they spent billions on, uses it for its entire API model).
  4. That complexity caters for scenarios that, from what I can tell, are impossible in ServiceStack without a lot of code or the loss of that type safety (as per our discussion on Stack Overflow).

I've noticed that OData is referenced with regard to the v3 spec and WCF; that version is basically dead. The v4 spec runs entirely on .NET Core 3.1 and is actually more complete than its partner version of EF (the source of my frustration right now).

Also what's this ...

it was found wanting with the industry quickly moving to simpler RESTful JSON APIs built with CoC frameworks and libraries.

... OData isn't dead, it IS RESTful, and it returns JSON by default. My entire point to you, centred on my testing efforts on ServiceStack, is literally that CoC point (I have that already and don't want to give it up); ServiceStack doesn't seem to do anything by convention, it requires explicit definition literally everywhere.

When I say "I should be able to tell it I want my question asked in this business context" ... I'm not joking. I run a transactional platform, and the context of a question is important ... I see a billion euros a week worth of invoice data through the system, and users getting back the wrong rowset is not an option. That is by design a complex question, not because M$ said so, but simply because it has to be.

As for the "mis-appropriation of features" ... I don't use any of the extended features that didn't get ported to .NET Core, for that exact reason: I saw the headache coming and avoided it.
The one feature I'm struggling with is expansions when pulling entities from the DB with their children; other ORMs seem to be able to do this, but it's a PITA by comparison and lacks flexibility.

I've had DBAs hand-crank queries to answer some of the simpler questions, and the way EF handles some of those scenarios actually beats that (it's rare, but it happens).
The most complex of the queries I've hit runs a 1 MB SELECT, and it comes back in incredibly short times; that's how complex the questions are.

That's a requirement imposed on me by the nature of our DB, not by the framework; replacing EF with OrmLite or Dapper will not change that, and it's been tested extensively.

@mikependon

Thanks for the feedback. I can totally see why people say what they say about EF; hell, the grief I give M$ and the EF team on occasion is somewhat ridiculous at times, because I'm trying to solve problems that simply shouldn't exist.

I'll happily take a look at your ORM :)

As of writing this, I do not have any knowledge of AutoQuery, and I am not that deep into OData.

Good place to start. I know that here on the ServiceStack side OData is seen as some sort of anti-pattern, due to the way that M$ documents and recommends using it; I definitely don't use OData as documented, so I usually don't hit the downsides (like tight binding to the DB structure).
I actually get a lot of flak from the M$ dev teams about my abuse of their tech, but my implementations are cleaner, faster, and more secure than their documented examples.

When discussing it here, though, it pays to appreciate the complexity of the questions you can answer with the OData + EF stack WITHOUT having to write any code at all, beyond building the class that matches the table by default.

With other stacks, like ServiceStack shown here, I've had some interesting conversations with @mythz in this area (sorry mate, I do like to ask the complex questions), and the choice to jump boils down to a few key points for me ...

  1. I should be able to implement what I want as a "base" and extend for specific cases where needed (and only where needed).
  2. I shouldn't have to handle every possible use case of my API explicitly (because my API is consumed in situations I don't have any visibility of).
  3. My clients build a solution using my framework, so I can't "build the complete API they want" (meaning it needs to answer questions I haven't thought of).
  4. I can't hand-crank blocks of SQL (nor do I have any interest in / time for that).
  5. It would be nice if it were free (I can't test my code stack's conversion to ServiceStack without buying a licence, which is frustrating).

@mythz I was going to ask you about point 5, actually ... I've converted a few thousand lines of code over to ServiceStack, but obviously, because my model is more than 10 tables (or whatever the limit is), I can't test it. I'm actually seriously interested in at least spinning it up to see if that performance gain is really there (although I do have a query-generation problem to solve), as OrmLite only solves some of the scenarios I have.

Some points of discussion had on stackoverflow ...

The key thing here is that, as the technical lead on my own stack, I should be able to pick the pieces that work for me (EF admittedly doesn't give me that; it's all of that half-million lines or nothing, ish) ... but then, having picked my pieces, I should be able to build solutions around them as needed.

When I talk about my ideal ORM, I currently don't think it exists, but then I'm very picky.

Key features I would like to see in an ORM which would make building my own API easy are ...

  • CRUD without a model (much like OrmLite or Dapper here: the ability to just grab a connection and do stuff with T's on it).
  • Query generation, from either some form of string source or an expression tree.
  • T-based filters on the DB (which is why I don't want to get involved with the SQL construction).
  • The ability to point the ORM at a context class which defines the model and generates migrations for me (EF does this bit incredibly well).
  • No forced patterns or architecture design.

That last point, alongside the lack of SQL generation from LINQ, is where I feel both Dapper and OrmLite fall short; but this is why I think they are sold as "micro-ORMs", at least in part: they deal ONLY with the problem of talking SQL to SQL servers.
That's not a bad thing, as it keeps the framework light, but maybe what's missing here is a LINQ-to-SQL abstraction (like the one in EF) not tied to any ORM, one that's pluggable so consumers can override certain behaviour (probably on a type-by-type basis).

Consider this sort of query example ...

// build the query
var query = new Query<T>()
    .Where(...)
    .Where(...)
    .Select(...)
       .Expand(...)
             .ThenExpand(...)
    .GroupBy(...)
    .OrderBy(...)
    .ToSql();

// then with Dapper I could do ...
var results = connection.Query<ResultType>(query).ToList();

... the key thing to note about this example is that I'm mapping questions presented as OData parameters into this framework, basically allowing the user to build the query they want the API to run. Not only that: the base set is filtered by a preconfigured filter for T, based on the user's access to rows in the DB, and when they expand into the subsets, those also have a filter applied to them. All of this is automatically injected into the query.
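
That filter-injection idea can be sketched in plain C# with expression trees. This is a minimal in-memory sketch; `RowFilters` and `ApplyRowFilter` are hypothetical names for illustration, not part of EF, OData, or any real library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical per-type row-filter registry: the platform registers a rule
// once, and every query for that type gets it injected automatically.
static class RowFilters
{
    static readonly Dictionary<Type, LambdaExpression> Filters =
        new Dictionary<Type, LambdaExpression>();

    public static void Register<T>(Expression<Func<T, bool>> filter)
        => Filters[typeof(T)] = filter;

    // Prepend the registered filter for T (if any) to the user's query.
    public static IQueryable<T> ApplyRowFilter<T>(this IQueryable<T> query)
        => Filters.TryGetValue(typeof(T), out var f)
            ? query.Where((Expression<Func<T, bool>>)f)
            : query;
}

class Invoice
{
    public int TenantId { get; set; }
    public decimal Amount { get; set; }
}

class Demo
{
    static void Main()
    {
        // The multi-tenancy rule is registered once, centrally ...
        RowFilters.Register<Invoice>(i => i.TenantId == 42);

        var table = new[]
        {
            new Invoice { TenantId = 42, Amount = 10m },
            new Invoice { TenantId = 7,  Amount = 99m },
        }.AsQueryable();

        // ... and the user-built part of the query never sees the other tenant.
        var visible = table.ApplyRowFilter().Where(i => i.Amount > 0).ToList();
        Console.WriteLine(visible.Count); // prints 1
    }
}
```

Against a real `IQueryable` provider (EF, or a LINQ-capable micro-ORM) the same composition would end up in the generated SQL's WHERE clause rather than being evaluated in memory.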

Now, I can see the response here ... "yeah, you can do all that with X ORM" ... you're right, I can ... but I don't want to hand-crank all that functionality. EF already handles it for me; the only issue is that I have to take all of EF to get it.

If I could take the query building as a feature and plug that into any ORM, then I'd be free to use OData on top of it if I so please.
If I take ServiceStack, I can't do this; I have to pre-think all the possible queries the user might want to ask and build them into my API layer, providing a pairing of at least one DTO + a service method for each possible question.

For CRUD on a single OData endpoint, I only require a single generic controller. With a filter on the DB table (a one-off LINQ expression, a one-liner), I can filter the table for any user that logs in by applying my own "app role logic or whatever" to the table, and I'm done.

Assuming I follow a convention, I would then have one controller/service, one context class representing my DB, and the simple POCO that represents the table (all things I have with ServiceStack) ... the key difference with the OData + EF stack is that if I want a new endpoint, I simply add a new POCO and I'm done: full CRUD implemented "by convention".

Is this slower than a hand-cranked query for each CRUD operation on every possible endpoint query? Yup. Do I care that it costs a few extra CPU cycles? Nope. Servers are cheap to rent; cloud solutions architects and the dev teams to maintain complex codebases aren't.
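
The "one generic controller, new POCO = new endpoint" convention can be sketched roughly like this. `CrudController` here is a hypothetical stand-in working over an in-memory list, not the actual OData or ASP.NET base classes:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// One generic controller covers CRUD for every exposed POCO; the only
// per-table work is declaring the POCO and (optionally) an access filter.
class CrudController<T> where T : class
{
    readonly List<T> table;                        // stand-in for a DbSet<T>
    readonly Expression<Func<T, bool>> rowFilter;  // "app role logic or whatever"

    public CrudController(List<T> table, Expression<Func<T, bool>> rowFilter)
    {
        this.table = table;
        this.rowFilter = rowFilter;
    }

    public List<T> Get() => table.AsQueryable().Where(rowFilter).ToList();
    public void Post(T entity) => table.Add(entity);
}

// Adding a new endpoint "by convention" is just adding a new POCO ...
class Product
{
    public int TenantId { get; set; }
    public string Name { get; set; }
}

class Demo
{
    static void Main()
    {
        var store = new List<Product>();
        // ... plus the one-liner filter for the logged-in user's tenant.
        var products = new CrudController<Product>(store, p => p.TenantId == 1);

        products.Post(new Product { TenantId = 1, Name = "Widget" });
        products.Post(new Product { TenantId = 2, Name = "Hidden" });

        Console.WriteLine(products.Get().Count); // prints 1
    }
}
```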

@mythz I'm not ranting; sorry if you feel that way ... I thought this was a friendly discussion.

I'd also like to point out that I'm about 80% of the way through making the code work with ServiceStack, but I'm currently stuck on a few things (understanding this AutoQuery behaviour is actually one of them, so thanks for that). In addition, I'm constantly sharing the information you give me internally to get further feedback from my team.

I'm not averse to dropping the M$ stack entirely (again, that's why I'm here); I just need to prove to the business that I can deliver the problem domain on the new stack without too much fallout, in a timely AND cost-effective fashion (OData + EF is free, after all, which means something to a small business like us).

The fact that I'm using OData here is not out of some misguided loyalty to it or to M$; it's more that it presents certain complex challenges to ORM solutions, so it acts as a good example of the worst cases that an ORM may need to face.
OData and GraphQL have a lot of common features in their design ethos, and yet you don't seem as averse to GraphQL.

Sure, the M$ implementation of OData is horrendously bad in places, but to conflate the implementation with the standard (which is what OData is) is outright wrong.
You say I'm just throwing assertions around and not backing them up, so let's take @pauldotknopf's blog post from earlier: in the first part of the article, he straight up begins with a bunch of incorrect assertions without backing them up ...

There is virtually no contract. A service consumer has no idea how to use the service (for example, what are valid Command arguments, encoding expectations, and so on).

... it's wrong because they actively push the use of metadata-based descriptions for the entire model that you're exposing, and even go as far as defining that metadata's schema ...
http://docs.oasis-open.org/odata/odata/v4.0/odata-v4.0-part3-csdl.html

The interface errs on the side of being too liberal in what it will accept.

... that's just an assertion, the very thing you're accusing me of.

The contract does not provide enough information to consumers on how to use the service. If a consumer must read something other than the service’s signature to understand how to use the service, the factoring of the service should be reviewed.

OData is repeatedly compared to WCF SOAP services for some reason by both you and @pauldotknopf in his article (which I find odd, as they have literally nothing in common) EXCEPT ... SOAP had a WSDL description; you could use the tools to generate your client code.
OData doesn't require tools; it has an XML-based description, and as stated above, the schema for that is well documented, so anything that can read XML can understand OData schemas and thus consume the service with strong typing.

I go a step further and expose the metadata relevant to each endpoint on the endpoint itself, to save the caller from having to rely on a large blob of metadata of which they want only a small portion (IMO this should be the standard).

I could go on but my "opinions" (however documented and fact based they may be) regarding OData aren't wanted here.

Again ... the reason I pointed at OData was that it generates the complex "real world questions" I have to build an API to answer.

Your examples are interesting and do solve the problem in the event that I handle the question, or use AutoQuery to do it for me ... but can AutoQuery do this with my own business logic in the middle? Something like this (taking the OrmLite example) ...

var q = db.From<Customer>()
        .Where(...)
    .Join<Customer, CustomerAddress>()
        .Where(...)
    .Join<Customer, Order>()
         .Where(...)
    .Where(x => x.CreatedDate >= new DateTime(2016,01,01))
    .And<CustomerAddress>(x => x.Country == "Australia");

var results = db.SelectMulti<Customer, CustomerAddress, Order>(q);

foreach (var tuple in results)
{
    Customer customer = tuple.Item1;
    CustomerAddress custAddress = tuple.Item2;
    Order custOrder = tuple.Item3;
}

... if I understand this correctly, this is the equivalent of a query that returns an expanded subset of properties too.

  • Does this support filtered joins "in this manner"?
  • Can I configure AutoQuery / OrmLite to apply the nested Where clauses ANY time a table of that type is used anywhere in any query?

Essentially, the reasoning here is that, from the user's information in the request (an auth token, for example), I have to filter the DB down to the rows they can see in every table, then execute my question on what's left (a standard multi-tenancy issue, basically).

From there, the logic is only as complex as the user's question, which it sounds like AutoQuery might be able to limit to a problem domain that's already coded for, which is perfect!

@TehWardy quick question, since you are considering AutoQuery, I take it that the OData solution isn't deployed yet? You are still in the research phase? Is this a new solution?

@pauldotknopf I have an existing solution implemented with an OData-based API layer. My issue generally isn't with OData; I think @mythz here has issues with "ugly URLs" in OData (not unreasonable, to be honest: if you look at them encoded, they can appear pretty ugly).

The background for my "problem domain"

The reason we use OData is that it allows the client to specify the question they want to ask, instead of me stating to the client "these are the questions you can ask", which is key here.
In @mythz's example from the last comment, for instance, in order to achieve the join result in AutoQuery I have to provide the endpoint with a method that implements that particular question, joined in that particular way.

My issue is that I don't know that that particular join is something the client wants at the time I'm writing the code, and I don't want a support call to implement a new API method every time they have a new question to ask the API.

I know you guys are strongly against OData, but the key thing it offers is that, within the confines of the type safety defined by the contract metadata (which defines the typed sets that can be queried), the user can build a question in a URL to conform to even the most complex of business scenarios. Yes, the nature of the question they ask CAN get complex, but it's on them to decide that, not me, and forcing them to only ask "pre-built questions" won't cut it.

The issue is that our clients are Fortune 500 companies with big, complex, "poorly designed" systems like SAP implementations, and they are often constrained by having to work to a standard that such a system implements; OData is one of those standards.
This is where OData excels: the provider (in this case SAP / IBM / Siemens) that delivers the platform to our client will provide functionality to allow them to interact with other systems through expensive (like half a million dollars) "connectors" which are specifically designed to a given spec. Whilst AutoQuery looks great, I can't tell a client "sorry, this is how we work because it's better for us"; I have to conform to what their system can handle.

With the Netflix example, Netflix can decide how people communicate with them; with our platform, we offer business services that connect between such systems and are forced to interact with those systems in the ways that they support, so I'm not dealing with an "in an ideal world" scenario.

With that in mind
Given that the client's system works a given way, the bulk of my questions are around this area and around making my API layer fit my clients' requirements.
This is arguably not the "normal" API delivery scenario most companies have, where they can essentially dictate to partners how their systems work.

Allowing clients to be able to construct adhoc joins to System tables is even worse tight-coupling, which you have even less hope of being able to make changes without breaking existing clients. Might as well give out an RDBMS connection string and give them maximum flexibility.

Or in my case ...
Allowing clients to construct ad hoc joins across different sets in an API model which isn't the DB model.
My service layer deals with the translation of questions, but much like the layer above it, it can generate some "interesting" questions.

Believe it or not, I carry much the same ethos as you, but I'm often not in a position to make the "ideal choice", due to external concerns (as described above). I offset a ton of the concerns you have about "over-complexity" and "over-engineering" by putting all my business logic behind interfaces and using IoC, so whether I have OData controllers, Web API controllers, or ServiceStack services, I'm always insulated from that complexity; but it does make it tricky to answer some of the problems that such implementations introduce.

I've also deliberately put EF behind an interface. Having migrated the stack onto ServiceStack, I'm seeing that I did lapse in a couple of places, where I exposed IQueryables when I should have exposed IEnumerables (that's on me to fix, and trivial to do).

That said, having taken your advice on board, my plan is to update the code until those "leaks" are plugged, then re-migrate the code; as it only took me a couple of days this time round, it shouldn't be too bad next time.

I have another major demand on my time this week, but hopefully I can get to looking at that stuff next week. The upside is that my business logic architecture is entirely interface/IoC driven, so it's mostly a lift-and-shift operation (I have said I don't use that stuff like most people do).

I do appreciate your advice, @mythz, and you do make some great points, points that I intend to raise with Microsoft too in places, because ultimately you're right: it's on them to provide good advice for the technology stacks that so many use.
I also feel that in places you over-generalise the problem of bad platform design and imply that I, as a result, have a bad stack that will ultimately need to be rebuilt because of my "assumption that OData is a necessary complexity" or "misguided impressions". That's not how I work, and for that reason my OData API deliberately doesn't support the full spec, and in places actively ignores it.

I would still be interested in looking at ServiceStack but the current feedback from our board of directors is the following ...

  1. Who are you / how big is the company? Support is a must-have for the work we do.
  2. Is there a way to test our complete migrated stack without the cost of licence fees, until we know it works for us?
  3. Is it compatible with the demands our existing clients have?

That last point is, of course, the one I've been trying to address here for the most part.
My current understanding is that it works the way it works and clients should fall in line with that, which might not be something our clients are willing to accept; but it's extremely flexible, so with a bit of work I should be able to make ServiceStack fit their needs in some cases.

I do happen to have that freedom, and as the technical lead here for everything we do, the board leans on my guidance to make its technology spending calls. Generally speaking, the calls made are "because it's right", not out of some misguided impression that M$ puts out about how things should be done (hopefully I've shown you that much at least).

Again, many thanks for the feedback. When I get back to it, I'll definitely ping them an email; I'm actually curious to see how the two solutions work side by side, because as they say, "the proof is in the pudding" ... right!

kakins commented Mar 22, 2020  (edited)

Just a comment here about the general attitude and tone I've seen in this discussion. Point to consider: if you happen to think a specific technology is more suitable, or disagree with someone on key points, try not to be arrogant and condescending about it.

I'd say that's especially true if you're trying to promote a particular service or product, yet you resort to engaging other developers by basically insulting them:

Your long rant is basically a list of opinions presented as assertions without substance or concrete examples backing them up.

Or this...

I'm assuming your beliefs could only have been formed by having no experience in different languages or OSS platforms & communities where OData has no dev mindshare and zero consideration in modern technology stacks, but sure feel free to keep believing it has a thriving future and your future-proofed systems are just ahead of the curve.

I completely agree, @kakins. I'm trying to be polite and promote a strong technical discussion about the differences and why they exist; instead I'm just being told "accept it, drop some opinions you have, and move away from what years of experience has shown you, because this is better". That's not how a responsible architect delivers good architecture to a business at scale.

@mythz and Paul are clearly very attached to the ServiceStack design and feel very strongly about it, which is commendable (standing by your creation), but there really is no need for the aggressive stance here; I'm not here to insult or knock anything.

As it happens, I've got a wide variety of experience using different approaches to the "API layer problem" and the "N-tier stack problem". Having worked in the industry for 20 years, I've seen a lot of stuff claimed as the "best answer, and anything else is simply broken due to ". Frameworks always die eventually and something better always comes along; no doubt at some point that will be the case with ServiceStack too.

In order to shield myself from a particular stack's flaws, I've essentially built a business layer that sits behind, and depends entirely on, interfaces, so the specifics of a particular stack/framework design don't really matter much to me; I just need an IoC/DI implementation to wrap it all up.

That said, what I've learnt from this discussion is that both @mythz and Paul make some valid points, and hold some misconceptions that they will defend to the very end without accepting any information to the contrary, which makes it hard to get advice on how to fit complex scenarios into ServiceStack. Given the price tag on ServiceStack, it really doesn't matter how good or bad OData / EF are, only how well ServiceStack solves the problems I have already solved with those competing technologies. If you're going to charge for something, then as a potential buyer I'm going to be damn sure I'm putting my money into value-add for the business before I pull out the company credit card.

For example ...
Valid point:
OData's way of handling and serving up metadata requires heavy models and the parsing of large blocks of metadata.

Misconception:
That the WCF model, which requires "special tools" to generate the client proxies that clients should interact with, is the only way to retain any form of contract or type safety.

I have overcome this pitfall, oddly enough, by working not that differently from how AutoQuery works.
I've also discussed above some of my reasons for not agreeing with many of the points that Paul makes in his article (both are above).

That doesn't mean I disagree with the approach taken by ServiceStack; it just means I don't agree with the particular use case / implementation detail they have chosen to pick fault with, or with the faulty reasoning. I'm just looking for the parallels in order to make informed choices.

Why I use OData + EF + .Net Core

I am in a unique position in that, unlike API providers such as Netflix (as this seems to be the example used above), I have to provide an API layer that allows users to construct a UI to support a business process that they design, on top of a connected web of systems, with mine as the middleware platform that ties them all together.
I don't have my own business process design I can force on others, nor do I have control over how my clients may choose to interact with my system in order to facilitate their business.

This means I have to consider problems like ...

  1. A client needs to show a grid of data with columns from multiple tables in my DB, and to allow paging, sorting, filtering, and grouping (and in some cases aggregation) on that grid, regardless of my underlying data structure.
  2. The exposed sets can be mapped (using DTOs) to multiple entity sets, to cover complex edge cases, complex joins, or specific "common" scenarios I choose to optimise as known common paths.
  3. The data model and the API model are separated, allowing me to inject my own business rules in the middle that apply to the platform (e.g. a multi-tenancy rule, so that no instance of T is returned to a user without them having access to it, no matter how they ask a given question).
  4. I don't have to expose all of / any of my DATA MODEL in my API MODEL, as doing so is often considered bad practice.
  5. My clients are Fortune 500 companies with massive, complex systems that talk to standards-based endpoints; some won't even use HTTP calls, because they refuse to spend £1 million on a "plugin" or module for a massive ERP solution.
  6. The nature of API-provided questions is "complicated" in some scenarios beyond my control; typical things like aggregation, joins, and projections kill most APIs ... not ours.
  7. It must be secure, because it's financial data I'm dealing with.
  8. It must be reasonably maintainable (by-convention implementations, type safety, etc.).
  9. Support is important to us, as our contracts often come with heavy fines if we mess up / have downtime.

What this means

If a user wishes to design a query that pulls data from 10 tables and filters on 3 of them, as a flat-set "source" for a data grid in their UI, I can support that with zero coding, zero deploys, zero changes to the system at all. That's the key point here: I can't roll out a deploy every time a client finds some new question they wish to ask the API, as I would be forever deploying.
I've been here asking exactly that sort of context-specific question because it's the sort of question I've not really seen answered in other, so-called "better" frameworks; but these are the sorts of questions that, if a framework can handle them, mean it can literally handle anything in API circles.

The bottom line

There's a key point I'm trying to get understood here, which boils down to this ...
Can I deliver "OData-like" features without OData (as it's deemed such bad design), or is my complex API situation, which depends on those features (for good reasons), just bad design because "it's not simple enough"?
If it's the latter, that suggests that ServiceStack isn't ready for real-world situations as complex as the ones I face every day, but works in situations where there's a known, finite set of questions that can be asked of discrete, unrelated DTO sets.

Regardless of people's opinions, there's a technical fact underlying all this discussion which determines viability, and I would be negligent not to ask these awkward questions.

I apologise if anything I have said has come across as offensive here.

kakins commented Mar 24, 2020  (edited)

@TehWardy I don't think you've been offensive at all. You've laid out your case, and although I don't know the details of the problem you're trying to solve, you've described it well enough that I can understand the essence.

In the flurry of words contained in this thread, I've seen two viewpoints presented:

  1. My queries can be dynamically composed, constructed by the user. The exact shape of the resulting model is unknown; it could be based on joins from any number of tables -- it's all up to the user. There's flexibility here, but complexity comes with it.
  2. Data models should be "known" beforehand, because most APIs are essentially answering a set of known questions.

That could be a simplistic over-generalization of the two views. However, I can at least envision how EF can help solve item 1, while micro-ORMs may have difficulty supporting it.

On the other hand, for simpler scenarios like item 2, you could argue that EF is overkill. When I say "simpler" here, it is relative. I'm by no means suggesting that micro-ORMs are only for simple solutions.

However, I'll admit I haven't spent time with ServiceStack or done much work with micro-ORMs. But I have asked myself the same question as @TehWardy as I've looked at Dapper. I started writing a dynamic query builder using EF that, at least from what I could tell, would be far more difficult to build using Dapper.

Exactly, @kakins, it sounds like you understand the nature of my problem ...
It boils down to a need to map from a URL to an expression tree that then gets manipulated in my business logic layer and translated into the final SQL in the data layer.

Putting all these pieces together is essentially a standard web stack, but some stacks use stricter / more restricted API-layer capabilities under the guise of "complexity is bad". I don't have that option, due to the operational requirements of the problems I'm solving.

We could make a case for skipping expression trees and just manipulating strings sourced from the API layer's "query", then translating those directly to SQL; but having the expression tree in the middle gives us type safety in the business logic, plus an interception point that doesn't require things like reflection, which can be slow.
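
To make the URL-to-expression-tree idea concrete, here is a toy sketch that turns an OData-ish filter fragment into a typed expression tree. `FilterParser` is hypothetical, and a real implementation would need a proper parser rather than `Split(' ')`:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Toy translation of a "$filter=Amount gt 100"-style fragment into an
// Expression<Func<T, bool>> that any LINQ provider can consume.
static class FilterParser
{
    public static Expression<Func<T, bool>> Parse<T>(string filter)
    {
        var parts = filter.Split(' ');               // e.g. ["Amount","gt","100"]
        var param = Expression.Parameter(typeof(T), "x");
        var prop  = Expression.Property(param, parts[0]);
        var value = Expression.Constant(Convert.ChangeType(parts[2], prop.Type));
        Expression body = parts[1] switch
        {
            "gt" => Expression.GreaterThan(prop, value),
            "lt" => Expression.LessThan(prop, value),
            "eq" => Expression.Equal(prop, value),
            _    => throw new NotSupportedException(parts[1]),
        };
        return Expression.Lambda<Func<T, bool>>(body, param);
    }
}

class Invoice { public decimal Amount { get; set; } }

class Demo
{
    static void Main()
    {
        // The URL fragment becomes a typed predicate: the business layer can
        // inspect or rewrite the tree before it ever reaches the data layer.
        var filter = FilterParser.Parse<Invoice>("Amount gt 100");

        var rows = new[] { new Invoice { Amount = 50m }, new Invoice { Amount = 150m } }
            .AsQueryable()
            .Where(filter)
            .ToList();
        Console.WriteLine(rows.Count); // prints 1
    }
}
```

The same `Expression<Func<T, bool>>` could be handed to EF (which translates it to SQL) or to a LINQ-capable micro-ORM, which is exactly the "interception point in the middle" being described.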

I'd be really interested in a demo ServiceStack project that replicated some of the more complex capabilities of OData "by convention", avoiding the need to write out a lot of DTOs specific to each use case; but my gut feeling is that the point @mythz keeps coming back to about the key design elements of ServiceStack forces this, as the DTOs explicitly define the contract information.

Sure, OData may be bad for some valid reasons, but it's a great way to point at complex API-layer functionality and ask "can your API layer do this?" as a point of discussion. It's certainly not the holy grail, though.

Things like aggregation or sub-selection don't appear to be possible as user-defined scenarios without me having to pre-define them in ServiceStack, which contradicts the "over-posting" and "over-responding" best practices I've come to like for both security and perf reasons.

Background for this

If I have a business object on my back end (a simple POCO) with 20 properties, and I need a result set with 10 of them, filtered on some child tables' values, with a system-derived business rule injected into the query for each table hit, I'm not losing any type safety by asking for only those 10.
Nor should I be forced to declare a second POCO with only those 10 for that scenario; otherwise I have to consider (in my situation, at least) all the possible combinations of those 20 that may be needed, and implement a POCO for each, to avoid returning excessive data loads to the client.

This is a common problem in API layers. Netflix, for example, assumes I want all of the fields for a movie, and I have no choice but to request them all; I can't just ask for a subset, say a key and a name, if I'm building a drop-down list of them. In my API layer, 1 MB responses could turn into 10 MB responses. OData, whilst ugly, does at least give me the flexibility to define exactly what I want it to do on the back end and exactly what I want it to pull from the DB for me.
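
The field-subset idea (a `$select`-style projection) can be sketched by compiling one getter per requested property, so only the asked-for fields ever reach the wire. `SelectBuilder` is a hypothetical illustration, not a real API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Toy "$select": compile a getter per requested property name and project
// each entity into a dictionary holding only those fields.
static class SelectBuilder
{
    public static Func<T, Dictionary<string, object>> Build<T>(params string[] names)
    {
        var getters = names.ToDictionary(
            n => n,
            n =>
            {
                var p = Expression.Parameter(typeof(T), "x");
                var body = Expression.Convert(Expression.Property(p, n), typeof(object));
                return Expression.Lambda<Func<T, object>>(body, p).Compile();
            });
        return entity => getters.ToDictionary(g => g.Key, g => g.Value(entity));
    }
}

class Movie
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Synopsis { get; set; }  // large field the caller didn't ask for
}

class Demo
{
    static void Main()
    {
        // Caller asked for Id and Name only: Synopsis never leaves the server.
        var select = SelectBuilder.Build<Movie>("Id", "Name");
        var dto = select(new Movie { Id = 1, Name = "Up", Synopsis = "..." });
        Console.WriteLine(dto.Count); // prints 2
    }
}
```

In a real stack the projection would be pushed into the query itself so the DB also skips the unwanted columns, but the shape of the idea is the same.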
