31 Get upstream changes for a fork

This workflow is relevant if you have already forked and cloned a repo and now need to pull subsequent changes from the original repo into your copy.

Sometimes you set this up right away, when you fork and clone, even though you don’t need it yet. Congratulations, you are planning for the future!

It’s also very typical to do this step a few days or months later. Maybe you’re taking an interest in someone else’s work for the second time and you want to make another pull request. Or you just want your copy to reflect their recent work. It is also totally normal to set this up upon first need.

Vocabulary: OWNER/REPO refers to the original GitHub repo, owned by OWNER, who is not you. YOU/REPO refers to your copy on GitHub, i.e. your fork.

31.1 No, you can’t do this via GitHub

You might hope that GitHub could automatically keep your fork YOU/REPO synced up with the original OWNER/REPO. Or that you could do this in the browser interface. Then you could pull those upstream changes into your local repo.

But you can’t.

There are some tantalizing, janky ways to sort of do parts of this. But they have fatal flaws that make them unsustainable. I believe you really do need to add OWNER/REPO as a second remote on your repo and pull from there.

31.2 Initial conditions

Get into the repo of interest, i.e. your local copy. For many of you, this means launching it as an RStudio Project. You’ll probably also want to open a terminal (Appendix A) within RStudio for some Git work via Tools > Terminal > New Terminal.

Make sure you are on the master branch and your “working tree is clean”. git status should show something like:

On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
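If you would rather have the shell confirm both conditions than read the git status output yourself, here is a minimal sketch. The function name ready_to_pull is made up for illustration, and it assumes your default branch is called master, as elsewhere in this chapter.

```shell
# Hypothetical helper: succeeds only if we are on master with a clean working tree.
ready_to_pull() {
  # Name of the current branch (this fails on a detached HEAD)
  current=$(git symbolic-ref --short HEAD) || return 1
  if [ "$current" != "master" ]; then
    echo "on branch '$current', not master" >&2
    return 1
  fi
  # --porcelain prints one line per modified or untracked file; no output means clean
  if [ -n "$(git status --porcelain)" ]; then
    echo "working tree is not clean" >&2
    return 1
  fi
  echo "OK: on master, working tree clean"
}
```

Run ready_to_pull from inside your local repo. A non-zero exit status means one of the two conditions is not met yet.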

BTW I recommend that you never make your own commits to the master branch of a fork. However, if you have already done so, we are going to address your sorry situation below.

31.3 List your remotes

Let’s inspect the current remotes for your local repo. In the shell (Appendix A):

git remote -v

Most of you will see output along these lines (let’s call this BEFORE):

origin  https://github.com/YOU/REPO.git (fetch)
origin  https://github.com/YOU/REPO.git (push)

There is only one remote, named origin, corresponding to your fork on GitHub. This figure depicts a BEFORE scenario:

This is sad, because there is no direct connection between OWNER/REPO and your local copy of the repo.

The state we want to see is like this (let’s call this AFTER):

origin    https://github.com/YOU/REPO.git (fetch)
origin    https://github.com/YOU/REPO.git (push)
upstream  https://github.com/OWNER/REPO.git (fetch)
upstream  https://github.com/OWNER/REPO.git (push)

Notice the second remote, named upstream, corresponding to the original repo on GitHub. This figure depicts AFTER, the scenario we want to achieve:

Sidebar: If you used usethis::create_from_github("OWNER/REPO") for your original “fork and clone”, the upstream should already be set up. In that case, you can skip to the part where we pull from upstream.
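Not sure whether upstream is already configured? A small check along these lines can tell you. This is only a sketch: check_upstream is a hypothetical name, and it relies on git remote get-url, which exits with an error when the named remote does not exist.

```shell
# Hypothetical helper: report whether a remote named upstream is configured.
check_upstream() {
  if url=$(git remote get-url upstream 2>/dev/null); then
    echo "upstream is already set: $url"
  else
    echo "no upstream remote yet"
  fi
}
```

If it reports a URL that points at OWNER/REPO, you are already in the AFTER state.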

31.4 Add the upstream remote

Let us add OWNER/REPO as the upstream remote.

On GitHub, make sure you are signed in and navigate to the original repo, OWNER/REPO. It is easy to get to from your fork, YOU/REPO, via “forked from” links near the top.

Use the big green “Clone or download” button to get the URL for OWNER/REPO on your clipboard. Be intentional about whether you copy the HTTPS or SSH URL.

31.4.1 Command line Git

Use a command like this, but make an intentional choice about using an HTTPS vs SSH URL.

git remote add upstream https://github.com/OWNER/REPO.git

The nickname upstream can technically be whatever you want. There is a strong tradition of using upstream in this context and, even though I have better ideas, I believe it is best to conform. Every book, blog post, and Stack Overflow thread that you read will use upstream here. Save your psychic energy for other things.
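Running git remote add a second time fails, because the remote already exists. If you think you might repeat this step, or you pasted the wrong URL the first time, a sketch like the following adds the remote when it is missing and corrects its URL otherwise. The name set_upstream is hypothetical; the HTTPS and SSH URLs in the comments use the same OWNER/REPO placeholders as the rest of this chapter.

```shell
# Hypothetical helper: add upstream if it is missing, otherwise update its URL.
set_upstream() {
  new_url=$1
  if git remote get-url upstream >/dev/null 2>&1; then
    git remote set-url upstream "$new_url"   # remote exists: just fix the URL
  else
    git remote add upstream "$new_url"       # first time: create the remote
  fi
  git remote -v                              # show the result
}

# Pick ONE, matching the protocol you already use for origin:
# set_upstream https://github.com/OWNER/REPO.git   # HTTPS
# set_upstream git@github.com:OWNER/REPO.git       # SSH
```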

31.4.2 RStudio

This feels a bit odd, but humor me. Click on “New Branch” in the Git pane.

This will reveal a button to “Add Remote”. Click it. Enter upstream as the remote name and paste the URL for OWNER/REPO that you got from GitHub. Click “Add”. Decline the opportunity to add a new branch by clicking “Cancel”.

31.5 Verify your upstream remote

Let’s inspect the current remotes for your local repo AGAIN. In the shell:

git remote -v

Now you should see something like

origin    https://github.com/YOU/REPO.git (fetch)
origin    https://github.com/YOU/REPO.git (push)
upstream  https://github.com/OWNER/REPO.git (fetch)
upstream  https://github.com/OWNER/REPO.git (push)

Notice the second remote, named upstream, corresponding to the original repo on GitHub. We have gotten to this:

31.6 Pull changes from upstream

Now we can pull the changes that we don’t have from OWNER/REPO into our local copy.

git pull upstream master --ff-only

This says: “pull the changes from the master branch of the remote known as upstream into the current branch of my local repo”, which is master if you followed the initial conditions. We are being explicit about the remote and the branch in this case, because upstream/master is not the default tracking branch for local master. As the git status output above shows, local master tracks origin/master.
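It can help to see the pull as two separate steps, a fetch followed by a fast-forward-only merge. The sketch below is roughly equivalent, not identical: git fetch upstream downloads all of upstream's branches, whereas the pull above asks only for master. The function name pull_upstream_ff is made up for illustration.

```shell
# Hypothetical helper: roughly what `git pull upstream master --ff-only` does.
pull_upstream_ff() {
  # Step 1: download new commits and update the remote-tracking branch upstream/master
  git fetch upstream || return 1
  # Step 2: move the current branch forward, refusing if a true merge would be needed
  git merge --ff-only upstream/master
}
```

Splitting the steps also lets you pause after the fetch and inspect upstream/master before anything in your working tree changes.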

I highly recommend using the --ff-only flag in this case, so that you also say “if I have made my own commits to master, please force me to confront this problem NOW”. Here’s what it looks like if a fast-forward merge isn’t possible:

$ git pull upstream master --ff-only
From github.com:OWNER/REPO
 * branch              master     -> FETCH_HEAD
fatal: Not possible to fast-forward, aborting.

See section 31.8, “Um, what if I did touch master?”, to get yourself back on the happy path.
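Before you start fixing things, it can be reassuring to see how far the two histories have actually drifted apart. Here is a minimal sketch; divergence is a hypothetical name, and it assumes the upstream remote is already set up.

```shell
# Hypothetical helper: count the commits unique to each side.
divergence() {
  git fetch -q upstream || return 1
  # Prints two numbers separated by a tab:
  #   first  = commits only on local master (your own commits)
  #   second = commits only on upstream/master (what you are missing)
  git rev-list --left-right --count master...upstream/master
}

# To list your own commits by name instead of counting them:
# git log --oneline upstream/master..master
```

A fast-forward is possible only when the first number is 0. Anything else means you have commits on master that OWNER/REPO does not know about.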

31.7 Push these changes to origin/master

This is, frankly, totally optional and many people who are facile with Git do not bother.

If you take my advice to never work in master of a fork, then the state of the master branch in your fork YOU/REPO does not matter. You will never make a pull request from master.

If, however, your grasp of all these Git concepts is tenuous at best, it can be helpful to try to keep things simple and orderly and synced up.

Feel free to push the newly updated state of local master to your fork YOU/REPO and enjoy the satisfaction of being “caught up” with OWNER/REPO.

In the shell:

git push

Or use the green “Push” button in RStudio.
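If you like proof that everything is synced up, here is a sketch that pushes with the remote and branch spelled out and then prints the tip of master in all three places. The function name catch_up_fork is hypothetical, and it assumes you have already pulled or fetched from upstream, so that upstream/master exists locally.

```shell
# Hypothetical helper: push local master to your fork, then compare the three tips.
catch_up_fork() {
  git push origin master || return 1
  for ref in master origin/master upstream/master; do
    # Print each ref next to its abbreviated commit SHA
    printf '%-16s %s\n' "$ref" "$(git rev-parse --short "$ref")"
  done
}
```

When all three lines show the same SHA, your local copy, your fork, and OWNER/REPO agree.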

31.8 Um, what if I did touch master?

I told you not to!

But OK here we are.

Let’s imagine this is the state of the original repo OWNER/REPO:

... -- A -- B -- C -- D -- E -- F

and this is the state of the master branch in your local copy:

... -- A -- B -- C -- X -- Y -- Z

The two histories agree, up to commit or state C, then they diverge.

If you want to preserve the work in commits X, Y, and Z, create a new branch right now, with tip at Z, via git checkout -b my-great-innovations (pick your own branch name!). Then check out master again via git checkout master.

I now assume you have either preserved the work in X, Y, and Z (with a branch) or have decided to let it go.

Do a hard reset of the master branch to C.

git reset --hard C

You will have to figure out how to convey C in Git-speak. Specify it relative to HEAD or provide the SHA. See future link about resets for more support.
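One way to find C without counting commits is git merge-base, which reports the most recent commit that two histories share. Because a hard reset is unforgiving, the sketch below rehearses the whole rescue in a throwaway sandbox rather than in your real repo. Everything in it is made up for the rehearsal: the sandbox directory, the local stand-in for OWNER/REPO, the commit helper, and the empty commits named after the letters in the diagrams.

```shell
# Rehearsal in a throwaway sandbox; nothing here touches your real repos.
sandbox=$(mktemp -d)
# Hypothetical helper: make an empty commit with the given message
commit() { git -c user.name=you -c user.email=you@example.com commit -q --allow-empty -m "$1"; }

# Stand-in for OWNER/REPO with history A -- B -- C
git init -q "$sandbox/owner"
cd "$sandbox/owner"
git symbolic-ref HEAD refs/heads/master
for m in A B C; do commit "$m"; done

# Your local copy is cloned at C; afterwards OWNER moves on with D, E, F
git clone -q "$sandbox/owner" "$sandbox/local"
for m in D E F; do commit "$m"; done

# Meanwhile you commit X, Y, Z directly to master (oops)
cd "$sandbox/local"
git remote add upstream "$sandbox/owner"
for m in X Y Z; do commit "$m"; done

git branch my-great-innovations             # preserve X, Y, Z on a branch
git fetch -q upstream                       # so that upstream/master exists locally
C=$(git merge-base master upstream/master)  # last commit both histories share
git reset -q --hard "$C"                    # master is back at C
git pull -q upstream master --ff-only       # the fast-forward now succeeds
git --no-pager log --oneline --graph --all  # inspect the resulting picture
```

In your real repo, only the last six commands apply, and only after you have decided what to do with X, Y, and Z.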

The instructions above for pulling changes from upstream should now work. Your master branch should reflect the history of OWNER/REPO:

... -- A -- B -- C -- D -- E -- F

If you chose to create a branch with your work, you will also have that locally:

... -- A -- B -- C -- D -- E -- F (master)
                   \
                    -- X -- Y -- Z (my-great-innovations)

If you pushed your alternative history (with commits X, Y, and Z) to your fork YOU/REPO and you like keeping everything synced up, you will also need to force push master via git push --force. We really, really don’t like discussing force pushes in Happy Git. We only do so here because we are talking about a fork, which is fairly easy to replace if things go sideways.