posted under category: General on October 30, 2021 at 7:06 pm by Nathan
A blog series in which I confess to accidentally having written my own poor version of a solved problem
It was my first .NET Core project ever. I knew a little ADO and ADO.NET from many years before, and I quickly learned enough about the built-in ORM, Entity Framework Core, to know there isn't a way to map SQL Server's stored procedures to EFCore entities. It's a shame.
This project had a team of data scientists writing a bunch of SQL that came out as record sets from stored procedures. Data scientists aren't necessarily great at SQL, but they managed to get it done. Questioning their methods wasn't something that was in the cards for me. Believe me, I tried. Nevertheless, they produced data, and I needed to consume it.
So the problem remained: there is no simple way in .NET Core to pull rows of data from a stored procedure into the application. So I started exploring. I have a pretty good memory, and I remembered the names of some classes in the
System.Data namespace which, to my surprise, were still there. This is the fully rebuilt .NET Core and ADO.NET Core, but a bunch of the old things I knew from over a decade ago were still in there. Astonishing!
Thanks to google and stackoverflow, I managed to piece together something that queried the database. Moving forward, I could eventually read results! This was really working. I got a single stored procedure mapped to a simple class I built. Commit. Push. Deploy. Happy.
Mission accomplished, right? Hardly.
At this point, I was like 300 lines of code into something that I used to be able to do in 10. There were two problems:
Building and sending the query takes multiple calls to awkward ADO APIs, and it's just too many lines of code. Sure maybe there's no better way to do it, and I appreciate all the control I get, but I'm just trying to make a little web app here. Why can't they just open and close the connections for me?
Getting the data back is completely manual. I have to manually loop over the iterator that's streaming all the records from the server, set the columns by their ordinal position, and then cast types into my record object. This is a lot of lines of code.
In summary of my two problems, it's (1) the input, and (2) the output. Hah! Yeah literally the whole thing basically. sigh
So it was an obvious thing to quickly evolve it. I should be able to remove a lot of the boilerplate junk by adding some convenience features. All of a sudden, I could call the procedure in a single line of code! I added my connection to the dependency injection system so that any object in the data access layer can request it.
With a serotonin high off that success, I started to tackle the data retrieval problem. I poked around enough until I found how to get the return columns by their column name instead of their ordinal position in the recordset. But how do I know how the columns in the query compare to the properties in my data record class? The only way without a bunch of configuration is to use Reflection. I modified it to read the columns in the recordset and search through the properties in the target class to match it up. Works perfectly!
Now the API is something like this:
List<MyDataRow> data = await dataService.QueryAsync<MyDataRow>("exec usp_GetData");
Getting fancier, sometimes our front-end application needs different names for columns in the database. Also our data science team sometimes uses illegal names like columns with spaces and whatnot, so that's a mess. I looked around and found that ADO.NET Core has column alias hints you can give to properties for EFCore entities, why not use them here?. Adding those means my data-to-record mapper needs to build a dictionary between the incoming column names and the outgoing properties, with the possibility of multiple aliases or just with the property name. I suppose that's simple enough.
Then there's the problem of input parameters. Of course we need the ability to pass in parameters. ADO has this covered typically, with a SqlCommand class. Naturally, I wanted to have a way to simplify even that, but that one looks like it's nearly as simple as possible, so very little work to do here. It's actually inconvenient to create a SqlCommand, so I made a convenience method on my DataService from a fake SqlConnection and connect it to the actual one once the request was made.
Now queries looked like this:
var cmd = dataService.CreateCommand(); cmd.CommandText = "exec usp_GetData @myParam"; cmd.Parameters.Add("@myParam", value) var data = await dataService.QueryAsync<MyDataRow>(command);
Is it the simplest I could make it, and it felt pretty good, so that's all I need to go to production! (I'm kidding! [kinda]). All the rest of the database connection opening and closing, data-model binding, and anything else is all handled by this DataService of mine. It could take any query data, populate any matching data record class, and spit out a
List<T>. And it could do it really, really fast!
But I promised this was a duct-tape-and-shoestrings thing, didn't I?
Yes, in line with the rest of my accidental frameworks that have already been invented, I had two final problems - the first, as always, being portability. Taking this tool to a second app actually worked really well, which is surprising. Well, that is, until they needed me to connect to an Oracle server, and a Teradata server. I realized quickly that the SqlConnection is really just for SQL Server.
Taking it to a second app of course is where it goes from part of a product to a product of its own. Now instead of Sql*, it's Db* classes, like DbCommand and DbConnection, but configuring it to work with the differences in data types between servers ended in some unfortunate compromises. I'm sure that would have gotten ironed out eventually, but before long I was off to another project.
The second way that my framework fell short was in data mapping. There was always a single line of code that was prone to break, right where database types were mapped and cast into C# data types. The Flux Capacitor that did it was a single line of code that I commented very well. There are just some fields that don't map correctly, and every time I ran into a new one, or tried to cast it to the wrong type, I would get runtime errors. I think the answer to this would have been a utility called Automapper, but by now, why even?
What did I learn?
This was genuinely a fun project. In addition to the underlying ADO roots still inside the framework, I also got to really play with the implementation side of generics for the first time. Plus the chance to make my own ORM is just something that seems like fun to me. I'm still surprised that something like this isn't just included directly into the framework. Why is EFCore so limited?
I'd seen mention of the Dapper framework, and AutoMapper, a few times on stackoverflow answers, but didn't investigate them any further. I should really follow up on those kinds of leads more often. While it sounds like there is a problem with me -- and very likely there is in this case -- there is also the fact that my company doesn't like external software. If we didn't write it, how can we trust it? More than just a not-invented-here mentality, external software always has to go through a waiver process, and even new versions always have to be re-verified. It's often more trouble than just playing with some code and writing something custom.
On the next project, I finally followed the stackoverflow advice and looked into Dapper, which was initially written for StackOverflow.com. Surprisingly, it wasn't much different than my little framework, with similarly named methods. Plus, as usual, switching to Dapper would mean that I don't have to write my own unit tests for it. Dapper was surprisingly easy to plug in, and thanks to its use of extension methods and anonymous types, it solves a lot of the usability problems I created for myself.
I do wish Dapper had better support for cancellation tokens, especially since I know it's built into the .NET Core framework's data access methods, and is relatively easy to pass an aborted HTTP call through in order to cancel a data call.
I also struggle a little with the fact that I can't write unit tests on code that uses Dapper to talk to my database because extension methods can't be overridden. Oh well, I guess. That's the price we pay for some things.
Overall, I'm much happier using Dapper for raw data access.