Good morning, and welcome to my talk, "Contributing to Django … OR How I learned to stop worrying and just try to fix an ORM Bug"
But before we get started, I wanted to see how many people are attending their first DjangoCon
Awesome! I’m so glad that you’re all here. And welcome
My name is Ryan Cheley.
I’ve been using Python for about 7 years now, and Django for about 5 years
I work for a Medical Services Organization in Southern California called Desert Oasis Healthcare
Part of my duties include managing a team of web developers and one of our newer projects is an internal Django Application
But today I'm going to talk about the fantastic experience I had in trying to fix a bug in the Django ORM
But first, a side quest
When I was 15 my Boy Scout troop did a hiking trip to Mt Whitney. For those of you that don’t know,
Mt Whitney is the highest peak in the contiguous United Stantes with an elevation of 14,505 ft (4421 meters).
There I was at the Mt Whitney Trail Crest, 13,600 ft above sea level … that’s 4159 meters for our metric friends
We had spent the 7 days hiking to get there.
I was only 3 miles from the summit of Mt Whitney ... but I wasn't sure I could go any further.
You see, my hiking companions told me I didn't look so good.
Honestly, I didn't feel so good either.
Did I make it to the summit?
I’ll say that getting to DjangoCon US at times felt an awful lot like trying to get to the summit of Mt Whitney
I've been trying to make it to DjangoCon since 2018
2018: But, I had a work conference I had to attend.
2019: It happened again in 2019
2020: I was determined to go in ... but then we locked down the world and none of got to go
2021: I was able to attend the online version and had a great time
2022: And then I finally made it in person 2022 … And it was amazing.
This is a shot of all of the participants at DjangoCon US last year in San Diego
And yes, that is actually me circled there
https://www.flickr.com/photos/djangocon/52487681790/in/album-72177720303543873/
And here it is a little closer
https://www.flickr.com/photos/djangocon/52487681790/in/album-72177720303543873/
Here I am … I had slightly longer hair then and had maybe a little less gray in my beard
I was very excited about attending the sprints and working on something ORM related
My strategy for finding a ticket was to look for something old, that seemed straightforward(ish) and then just go for it.
I spent some time during the conference
trying to find something that met that criteria,
and by the day of the sprints
I had been able to narrow it down
to a single ticket
The critiria
Old?
Straightforward
Ticket 10070
I went into Trac and
assigned the ticket to myself
I added a comment:
I’m at DjangoCon US and I’m looking at this ticket
Reviewing the Ticket
I was fortunate enough to have Simon Charette at the sprints last year
For those of you that don’t know who Simon Charrette is,
He’s a Django contributor that’s spent the last 10+ years working on the Django ORM and the migration framework
I spent time reviewing the ticket,
and after talk with Simon a bit about it
It seemed as though it wasn't a problem anymore
And so, I added a comment to the ticket
In looking at the tests in raw_query\tests
I see a few tests that already exist
These are already available
and appear to be testing the very thing that
the ticket is asking for
With that,
I marked the Ticket as Done!
Later that day a comment was added to the issue thanking me for working on the ticket … from the original reporter for the ticket Matias Surdi
I was so excited that I let the world know about it on Twitter / X
Here we see a screenshot of my tweet, which says:
I closed a ticket while at DjangoConUS2022 sprints with Simon Charette
I spent the rest of my time at the sprints
working on other
non-ORM Django related tickets (a quick documentation update)
in other Django adjacent projects (an update to Django Packages to add a View More button to the home page)
All in all, I had an amazing time at the conference and the sprints last year,
and met some truly amazing people
as I’m sure you will this year.
When it was all over
I headed home
having been able to close a bug
in the ORM!!!!
And I felt really, really good about this
This is a scene from the James Cameron movie Titanic, released December 19, 1997, where Leonardo DiCaprio's character Jack Dawson says, “I’m the king of the world!”
And that’s kind of how I felt
Shai Berger is a member of the Django Security team
Sorry -- this is still broken on Sqlite.
The additions and fixes mentioned are mostly for Oracle and other backends.
The Sqlite backend, at the time I write this,
still has supports_paramstyle_pyformat = False
and still borks with a syntax error if you try to replicate the session in the ticket description.
What is supports_paramstyle_pyformat?
It’s a flag
That indicates if the backend (SQLite, Postgres, Oracle, etc) support 'pyformat' style
Which looks like this
("... %(name)s ...", {'name': value})
For SQLite this was set to False
Oh No!
I was faced with two potential choices:
Choice 1
Ignore the bug.
It had been a bug for YEARS … as you’ll recall, it was opened on January 19, 2009
and didn’t seem to have caused any MAJOR issues
Choice 2
Or
I could try to Fix it
… Try to work it on my “own”
I also had lots of feelings surrounding the discovery of the bug still existing
Embarrassed
I declared it fixed on Twitter
I had people at the Sprints ASK ME HOW I DID IT!
Disappointed
I felt like I let people down
I felt like I let myself down
Fear
Imposter Syndrome – started to sink in
that voice
in the back of my head asking,
“Can YOU really work on this on your own?”
“Do you really have any idea what you’re doing?”
ACTION: Take a sip of water
But then I remembered a talk that Carlton Gibson gave
at DjangoCon US in 2018
Titled
“Your Web Framework Needs You”.
It's a great talk and you should totally watch it,
but here are the highlights that I got out of it
And to drive that point home I’m going to talk about a comment on another issue.
In June of 2021 Sarah Boyce, who is now a member of the Triage & Review team, posted a comment on ticket 32359:
I was wondering if this is
being worked on or if I could try and pick it up, happy to work on it in a group as well
To which former fellow Carlton Gibson replied
Hi Sarah - you're very welcome to pick it up. :)
I'd suggest an initial time-boxed investigation to work out what the state of play is.
At that point you'll be the world expert TM on this issue, so either you'll have an idea or you'll want input. If you open a PR at that point (even just adding comments in the second case) to explain what's going on, it's easier for others to get up to speed and provide input
And so with that
I went to work on the ticket
again
I did what most of us probably do
when we’re assigned a ticket for work,
or on an open source or personal project
(and honestly I should have done a better job of when I initially worked the ticket at the sprints last year)
Write down what you learned ... as your learning it Replicate the Bug Read some docs … which can include Doc strings! Write some code Test the code
But before we can write down what we’ve learned, we need to learn some stuff!
Replicate the Bug
First, I set up the connection to point to a Postgres Database
Here we see a standard connection using the postgres Django Database Engine
Taking the code from the original ticket and putting it into the REPL
When we run this
This runs without error … as expected
Now, we swap out the Postgres connection and put in a connection to a SQLite Database
Here we see a standard connection using the SQLite Django Database Engine
And try to reproduce the bug
Again, taking the code from the original ticket and putting it into the REPL
This will generate a stack track error
And we get this stack trace as reported in the bug
A comment posted by Baptiste Mispelon in November 2019 provided a workaround. Their comment was
I don't know how recent it is, but sqlite does support named parameters,
albeit with a different syntax
Both of these work on my machine (and Baptiste gave two commands that work) c.execute("select name from inventory_host where id=:id", {'id': '1’}) Host.objects.raw("select * from inventory_host where id=:id", {'id': '1’})
Baptiste then asks the question, Do we want to try and make the syntax consistent across all backends or would it be enough to document the syntax for each one?
When I applied that work around to the code that borked, to use Shia’s term above, above it does run as intended
OK, we’ve replicated the bug! Now, we’re going to read some docs … which can include Doc strings!
Read some docs … which includes Doc Strings!
The second to last line of the Stack Trace says
django/db/backends/sqlite3/base.py", line 357, in execute
So let’s start at Django.db.backends.sqlite3.base.py on line 357
When I look at that line in that file I see that there is
a class called SQLiteCursorWrapper
which has three methods:
Here we see that both execute
And execute many
CALL
upon convert_query
So let’s go take a look at the code for convert_query
Here we see the entire method for convert_query
is a single line that uses FORMAT_QMARK_REGEX
BUT
FORMAT_QMARK_REGEX = _lazy_re_compile(r"(?<!%)%s")
Which is a method that lazily compiles regular expression
In looking at the helper method _lazy_re_compile() we see that it “lazily compiles a regex with flags” and has two parameters
Regex: The Regular Expression to evaluate Flags: Flags that modify the behavior of the regex compilation
Some examples of flags are
re.I: ignore case sensitvity
re.S: makes the dot character match any character
In our case, we don’t need to worry about the flags though. I included for completeness here
OK … so what have we learned?
The issue is with the execute method But that is driven by the convert_query() method will also likely impact the executemany() method as well
What we've done and what we still have left to do
Write some code
We know from previous slides that this select statement will fail,
while the second one will succeed
SO ... One solution could be to somehow transform the first statement into the second,
accounting for all of the ways that someone could put in a parameter set.
Use a regular expression to convert the statement!
We want to go from this
To this
We want to replace )s with an empty string And we want to replace %( with a :, To get here
This transformation can be done with this regular expression
And with that, I posted my initial idea to the ticket in Trac
But as the (paraphrased) saying goes (originally attributed to Jamie Zawinski), “I had a problem so I used a Regular Expression and now I have two problems”
ACTION: Take a sip of water
This looks quite fragile
consider, for example a query which, for whatever reason, includes )s
in a string constant
A more robust approach is to create a naming dictionary from all the parameter names, and use that to format the original text:
naming_dict = { param: f":{param}" for param in param_names}
query = query % naming_dict
Where param_names can be collected either by parsing the query, or from the parameters given.
If this would have been all that was put into the comment I might have been a bit discouraged because honestly, I was a really confused, but …
Shia also left this in the comment
take a look at the Oracle backend -- I wrote that piece too long ago to remember which it takes,
but at the time I knew what I was doing there
Similarly Simon Charette said:
I agree with Shai that we should avoid using regex here. Simon stated in the ticket:
The most straightforward solution I can think of,
I haven't looked at the Oracle implementation in details,
would be to simply implement getitem
This will ensure the backend
reports a meaningful low level message regarding missing_param
in cases of a parameter is missing
instead of a KeyError
during the Django translation phase
Which should make for a (hopefully) better developer experience in debugging
The original code for the convert_query method is given by
Again, It’s returning the FORMAT_QMARK_REGEX (which is a lazily compiled regular expression)
I need to incorporate the feedback from Shia
Recall the hint given by Shia regarding using the Oracle as a starting point using a naming dictionary
So applied the feedback to my use case
Taking Shia’s hint and applying it to my use case
We see this implemented here.
The if statement checks to make sure that params
is a dictionary like object, so we check the params to ensure “keys” exists using hasattr
I used the same idea here for executemany.
Again checking for a dictionary like object
and using a list comprehension
to get information
from param_list, which is potentially a generator
What we've done and what we still have left to do
Test the code that you wrote
When going through something like this we want to
make sure the current tests pass
Write new tests if necessary
OK, as I work through testing the issue I've found a few things:
There is a features file django.db.backends.sqlite3.features which has a DatabaseFeatures class with supports_paramstyle_pyformat value set to False; this was also something that Shia called out in their initial comment
I see a class called RawQueryTests that has two methods test_pyformat_params
test_query_representation
BUT, in looking at test_pyformat_params
I see this decorator
@skipUnlessDBFeature("supports_paramstyle_pyformat")
Since supports_paramstyle_pyformat = False for SQLite
This will cause the test to be skipped
So, in order for the tests to check what is happening it looks like I have to:
Add my proposed code Change the flag for supports_paramstyle_pyformat from False to True
Based on the tests discovered above, it doesn’t look like new tests are needed
Before changing the code, let’s see the outcome of running the two tests above
So we’ll run just the two tests using this command
Here we’re using the Django test runner, and using the k flag to specify the tests to run
Since supports_paramstyle_pyformat = False
The test test_pyformat_params should be skipped
While the test test_query_representation should pass
And as expected, here are the test results
One test skipped denoted with the s
And as expected, here are the test results
One test passed denoted with the .
When working on testing in a sitation like this, the strategy I use is something I’m calling a Possible States Testing Matrix. I’m not sure where I got that idea or term from, but I’m sure I didn’t come up with it on my own
What are our possible states?
The original state, that is
Flag is False
With the original code
Here 1 test should pass and 1 test should be skipped
A broken state, that is, when the Flag is set to true with the original code where 1 test should pass and 1 test should fail. This is why the bug exists in the first place!
A ‘check’ state where the flag is set to False, and the code is updated. Here, as in the original state, we should have 1 test pass and 1 test be skipped
And finally, the desired state, where the flag is set to True with the updated code. Here we should have 2 passing tests!
While I’m only focusing on two tests during development,
We also need to make sure that the entire test suite passes!
So, after those two tests had passed I did run the entire test suite, and it did continue to pass
What we've done and what we still have left to do
Write down what you’re learning as you’re learning … because we all do this, right?
I didn’t until working on this issue.
I was inspired by a talk given by Simon Willison at DjangoCon US 2022 titled,
“Massively increase your productivity on personal projects with comprehensive documentation and automated tests”
This talk included an idea called Simon calls ‘public notes’.
This is another talk I highly recommend watching and implementing the public notes idea
As I said, this was the first time I used the idea,
but I've used it a few more times since then
as I try to learn about a thing.
I’ve even started doing it at my day-job
OK, great, but what ARE Public Notes?
Essentially you use an issue
(in my case a GitHub issue)
as a spot to work through
a problem
and leave yourself tips,
hints,
breadcrumbs
… whatever you want call it
for what you’ve tried and learned,
basically doing what you would do
in the ‘scientific method’
In my notes there’s a lot of back and forth, a feeling going two steps backward, and then 1 step forward kind of thing.
But even looking at it now, I can see my thought process for how the bug ended up getting fixed
And honestly, were it not for the Public Notes I’m not sure I would have been able to give this talk
Here is a short video of my Public Notes on GitHub
I have 40 comments
as a conversation to myself,
back and forth
from Oct 30, 2022
to Nov 8, 2022
That’s just 10 days …
We’ve spent a bit of time getting here, but what was ‘The Fix’?
Recall the feedback given by Shia about the use of a naming dictionary
Which I adapted for my use case
And applied to execute method
And applied to executemany method
While leaving convert_query unchanged
As a reminder, here is the the original code for the convert_query method
Nick Pope added comments to the conversation on the issue in GitHub:
There is duplication
which should be pushed down into .convert_query()
.
Converting %(value)s
to :value
in this way will
also convert %%s
to %s
which will then be incorrectly converted to ?
in .convert_query()
.
Converting param_list
to a list
in .executemany()
will saddle us with poor performance.
We many have been passed a generator and the current approach will force all of the parameters to be materialized.
The approach needed requires that we can peek at the first item in the generator without consuming the lot.
I'll provide a PR to your branch so that you can see what I mean.
True to their word, Nick submitted a PR against my code
Where the duplicated code was pushed into convert_query
And updated the execute() method
And the executemany() method
For those of you that may want to see what the Full Diff looked like here we see the convert_query
The full diff of the execute method from GitHub
The full diff of the executemany method from GitHub
As a BONUS to all of this the SQLite Docs in the Python Standard Library got updated
Here’s the documentation pre change
With a Note:
The sqlite3 module supports both "qmark"
and ”numeric" DB-API parameter styles,
because that is what the underlying SQLite library supports.
After the change to the documentation made by Nick,
We see “named” was added
Since that update other updates have been made
and is now less verbose,
BUT the named parameter style is still a part of the note
From this I hope it’s clear that this fix wasn't just implement or made by me
I had a lot of help in getting the code to the right spot.
Shia Berger Simon Charette Nick Pope Mariusz Felisiak
Shia Berger
Feedback that the approach had some fragility
that regex should be avoided (with Simon Charette agreeing)
And gave a place to get started for investigation using a Oracle as a starting point
Simon Charette
He gave a great keynote at DjangoCon US 2022
And an awesome introduction on the ORM in general at the sprints
And reminded me to ‘Update the docs!’
Nick Pope
As we saw above, the PR that Nick did against my code helped improve it dramatically!
It removed code duplication and pushed down that duplicative code to the convert_query() method
Mariusz Felisiak
Simplified some of the comments in the code
And ultimately was the one that merged the code into Django
I hope I’ve been able to convey what a positive experience this was for me
I also hope I’ve been able to convey how much I’ve learned
This process taught me more about
SQLite – it does ACTUALLY support pyformat style The Django ORM – specifically how it’s structured in the code and how to better reason about it Python – generators … up to this point I didn’t really understand them (and maybe I don’t still 100%) but I’m closer now because of the work I did on this ticket
Started using Public notes to learn about a thing … to write down what I learned. Public notes since then have included:
Upgrading my Digital Ocean servers OS Installing Python 3.11 on a Raspberry Pi Figuring out how SSH keys work and how I should implement
This was a great experience, and you may think that because I’m telling you about it, that I’ve made tons more contributions since then
BUT
I’ve not had an opportunity to contribute again to Django since then … sometimes life gets in the way.
I have contributed to a couple of other open source projects, most notably
Django Packages
Implemented Django at DOHC
Lessons
The ORM can seem big and scary
The code for Django can seem big and scary
But remember
The Django ORM ...
Is Python
In fact, all of Django ...
Is Python
So when you’re ready to go look for a ticket to work on
Whether it’s
ORM related or NOT,
remember
that there is an amazing community here
that wants to see you succeed …
and at the end of the day
the code you’re going to
look at,
The code that you’re going to read,
and The code that you’re going to write
Is Python
So just keep that in mind
when you’re ready to dive in
to one of the currently <> open tickets
And because this is Django,
there is great documentation
on how to triage tickets
and work through them
Back to Mt Whitney
As I said before, I was only three miles from the summit, but I just wasn’t feeling it.
My fellow hikers didn’t discourage me from continuing, but they didn’t really encourage me either.
I didn’t make it to the summit that day and it’s been a regret ever since.
I’ve not attempted hike quite that big since then
But why am I telling this story? What does this have to do with fixing a bug in the Django ORM.
When the bug wasn’t resolved I had an option to ignore it … to give up like I did that day at the Trail Crest on Mt Whitney
I could have NOT gone to the “summit” of fixing the BUG
But the community helped me to work through it.
The solution that was ultimately submitted
and merged wasn’t just mine, it was through the hard work of several people.
My experience has been that this community wants you to succeed. They want you to learn.
If this community had been up there with me that day,
I’m 100% sure
that I would have made it to the summit
because of the encouragement that is provided
and the determination to want you to succeed
So remember ...
Your Framework needs YOU
And there’s a great community to support you and wants to see you succeed and learn
This year, there will be both
Development Sprints and
Contribution Sprints.
If you’re looking to get involved, this is a great opportunity to do so and I highly encourage you all to go.
I really hope to see you there
My presentation wasn’t made in a vacuum and I’d like to acknowledge the help that I received in getting this presentation created
Thank you
Reference Links
Your Web Framework Needs You! (Slide 33) https://www.youtube.com/watch?v=8eYM1uPKg7c
Increase your productivity on personal projects with comprehensive docs and automated tests - DCUS (Slide 88) https://www.youtube.com/watch?v=GLkRK2r/GBO
My Public Notes (Slide 93) https://github.com/ryancheley/public-notes/issues/1
Keynote: State of the Object-Relational Mapping (ORM) with Simon Charette (Slide 116) https://www.youtube.com/watch?v=HNIGFriBI8o
Django Triaging Tickets (Slide 137) https://docs.diangoproject.com/en/dev/internals/contributing/triaging-tickets/