Co-authored-by for Copied Code

Maybe you're like me and you have a closed-source repo that you occasionally need to copy shared library code out of and into the open-source version. You can't just set the closed repo as an upstream and merge its commits into the open-source repo, as this might include changes not meant for the open repo. However, you didn't write the code, so you'd like to rightfully credit your coworkers. Luckily, GitHub will show multiple authors for a commit if you use the Co-authored-by git commit trailer.

Now, you probably don't want to sit there reading every single commit to see which folders it touches just to get the right list of people who changed the dependencies you needed to update. Fortunately, we can generate that list ourselves with the git log command. Here's what I used to do this:

git log 427daa8..7565dc0 --pretty=format:"Co-authored-by: %an <%ae>" -- $(ls ~/influxdata/influxdb/) | sort | uniq

Since influxdb has the crate folders in the same place, I was able to just use ls here, but you could instead list every directory that you copied over.

Let's break this command down. By default, git log will just show the log we all know and love. You can also pass it a revision range. So in this case I said "show me the commits in the range 427daa8..7565dc0", which starts after the commit we last pulled deps in from and runs up to the one we were actually copying them from.

What is nice about git log is that you can specify how you want it to output data. There are a lot of placeholders you can use to set the output, but you can also add your own strings in there. So in this case, with the --pretty flag, I said "for each item in the log, output 'Co-authored-by:' followed by the author's name (%an) and email (%ae) for the commit". This command outputs one line per commit in the log. Finally, we scope it to just the directories we copied and changed in that revision range by passing in -- <directories>.
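If your directories don't line up neatly enough for ls, the same command can be wrapped in a small function that takes the paths explicitly. This is only a sketch: coauthors_for_paths is a name I made up for illustration, and it assumes you run it from inside the repo whose history you want to read.

```shell
# Hypothetical helper (not a git built-in): print a deduplicated list of
# Co-authored-by lines for the commits in a revision range, limited to the
# given paths.
# Usage: coauthors_for_paths <range> <path>...
coauthors_for_paths() {
  range="$1"
  shift
  # %an is the author name and %ae the author email.
  # Everything after -- is treated as a path filter, not a revision.
  git log "$range" --pretty=format:"Co-authored-by: %an <%ae>" -- "$@" | sort | uniq
}
```

You would call it with something like coauthors_for_paths 427daa8..7565dc0 some_crate another_crate, where the two directory names are placeholders for whatever you actually copied.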

So what's with the piping to sort | uniq? Well, multiple commits can be made by the same person, and piping the output through that gets rid of any exact duplicates. However, it will not deduplicate someone who committed with different emails. For example, you might see something like this:

Co-authored-by: Michael Gattozzi <michael@ductile.systems>
Co-authored-by: Michael Gattozzi <self@mgattozzi.dev>
Co-authored-by: Michael Gattozzi <mgattozzi@gmail.com>
Co-authored-by: Michael Gattozzi <00000000+mgattozzi@users.noreply.github.com>
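A list like the one above can be roughly collapsed to one line per name with a little awk. Treat this as a sketch with real limits rather than a fix: first_email_per_name is a hypothetical helper, it keeps whichever email happens to come first (which may not be the one the person prefers), and it would wrongly merge two different people who share a name. You still want to eyeball the result.

```shell
# Hypothetical helper: keep only the first Co-authored-by line seen per name.
# Reads trailer lines on stdin and writes the survivors to stdout.
first_email_per_name() {
  # Split each line at " <", so $1 is "Co-authored-by: Some Name".
  # seen[$1]++ is 0 (false) the first time a name appears, so !seen[$1]++
  # prints only that first occurrence.
  awk -F' <' '!seen[$1]++'
}
```

Piping the sorted list through first_email_per_name leaves a single line for each distinct name.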

Sometimes people commit from different computers with different emails set for git, or they commit from GitHub, which uses their noreply GitHub email. The list might also include commits made by bots. There's not much you can do here except prune the list down to one email per person like I did, but hey, at least now you don't need to put the whole list together by hand! All you need to do is remove a few essentially duplicate lines, add what's left to the end of the message of the commit that will go into main, and you're good to go. GitHub will take care of the rest and your coworkers get the credit they deserve.
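Adding the lines to the end of the commit message can be done from the command line as well. Here's a minimal sketch, assuming the pruned list has been saved to a file; the function name and the file are both hypothetical. It leans on the fact that git commit joins multiple -m values as separate paragraphs, so the trailers land in their own block at the end of the message.

```shell
# Hypothetical helper: commit the staged changes with a subject line and a
# block of Co-authored-by trailers read from a file.
# Usage: commit_with_coauthors <subject> <trailers-file>
commit_with_coauthors() {
  subject="$1"
  trailers_file="$2"
  # Each -m becomes its own paragraph, so a blank line separates the
  # subject from the trailer block.
  git commit -m "$subject" -m "$(cat "$trailers_file")"
}
```

For example, commit_with_coauthors "Update shared crates" coauthors.txt would produce a commit whose message ends with the lines from coauthors.txt.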