Yet Another Git Primer

What is Git and why should I use it?

Motivation: Checkpoints

You have been writing code for hours. You’ve got into a good flow. What started off as a few functions in a single file has expanded into dozens of functions and classes in half a dozen files each consisting of several hundred lines of code. At least. You save the latest line you wrote. After The Great Cherry Coke Debacle in freshman year where you lost two pages of your final paper hours before the deadline, you’re very diligent about saving. But it has started to occur to you — not all saves are created equal. Saving code might have one of several purposes:

  • To compile and run your code as a sanity check.
  • To mark a checkpoint in your progress: a version of your code to back up and come back to if things break.
https://imgs.xkcd.com/comics/git.png
All comics have been borrowed from https://xkcd.com/

Copying a repo — clone

In order to use an existing repo, you must first clone it to your local machine. The clone command takes the location of a repo and copies it to your local directory along with all the repo's history. The location can be either a URL for a remote repository or a file path, if you want to make a copy of a repository already on your computer. The cloned repo automatically records the source repo as a remote named origin, so you can send your changes back to it.
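As a rough sketch of what a clone looks like in practice, the commands below build a tiny throwaway repo and then clone it by file path. The scratch directory, the file name, and the user name and email are all placeholders invented for this illustration.

```shell
set -e
work=$(mktemp -d)                 # scratch directory; nothing outside it is touched

# Build a small source repo to copy (it stands in for a repo someone else made).
git init -q "$work/source"
cd "$work/source"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "hello" > README.txt
git add README.txt
git commit -q -m "Add README"

# Clone by file path; a URL for a remote repository is passed the same way.
git clone -q "$work/source" "$work/copy"
cd "$work/copy"

git log --oneline                 # the history came along with the files
git remote -v                     # the source is recorded as the remote named origin
```

With a hosted repo you would pass its URL in place of the file path.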

Creating a repo — init

To create a repo, you would perform an init. This creates a .git folder inside the current directory, making it the base directory of your new repo. At this stage, the repo considers itself to be empty. Even if there are files already in your directory, they won't be added to the repository until you explicitly start tracking them. To start tracking a file, you'll have to add and commit it to the repo. A repo created like this has no remote target to push changes to. Information about remotes and about which files are being tracked is all stored in the .git folder.
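Here is a minimal sketch of an init in a directory that already contains a file. The directory and the file app.py are made up for the example.

```shell
set -e
work=$(mktemp -d)                 # scratch directory for the demo
mkdir "$work/project"
cd "$work/project"
echo "print('hi')" > app.py       # a file that exists before the repo does

git init -q                       # creates the .git folder here
ls -a                             # .git now sits next to app.py
git status --short                # "?? app.py": on disk, but not tracked yet
git remote                        # prints nothing: there is no remote target
```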

Staging — add

Once you’ve created or cloned your repo and edited some files, you will want to add your changes to the repo’s history. This is a two-step process. The first step is to stage your changes with add, which lets you choose which files go into the next commit. During this process, you can stage or un-stage changes as you wish; nothing gets saved permanently until the next step.
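Staging and un-staging might look something like the sketch below. It assumes a brand-new repo with no commits yet, which is why un-staging uses git rm --cached (the command Git itself suggests in that situation). The file names are placeholders.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
echo "one" > a.txt
echo "two" > b.txt

git add a.txt b.txt               # stage both files
git status --short                # both are listed with "A " (staged, not committed)

git rm -q --cached b.txt          # changed my mind: un-stage b.txt, keep it on disk
git status --short                # a.txt is still "A ", b.txt is back to "??"
```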

Commit — commit

To make your changes (semi) permanent, you make a commit. In addition to staged changes, a commit requires a commit message describing its contents. It is important to leave descriptive commit messages so you and your collaborators can more easily go back and track what was changed, when, and by whom if something goes wrong. Since the commit is the basic ‘unit’ of a Git repo, you may want to split a set of your latest code changes into multiple commits. For example, grouping your commits by feature would allow you to better trace issues or remove features later on. Just as with saving, it is good practice to commit often. Each commit gets assigned a unique reference (a hash) which can be used in other commands.
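To make the idea of grouping commits by feature concrete, here is a small sketch that splits two unrelated changes into two commits. The file names, messages, and identity are invented for the example.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "def parse(): pass" > parser.py
echo "def render(): pass" > renderer.py

# Two features, two commits, each with its own descriptive message.
git add parser.py
git commit -q -m "Add parser skeleton"
git add renderer.py
git commit -q -m "Add renderer skeleton"

git log --oneline                 # one line per commit: short hash plus message
git rev-parse HEAD                # the full unique reference of the latest commit
```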

Receive — fetch/pull

Performing a fetch checks the remote repository for any updates that have been sent by others (or by you from other machines) that aren't present in your copy of the repo. It downloads those commits but leaves your current branch and working directory untouched. In order to actually apply the changes, you would perform a pull. A pull fetches all updates made to the remote repo since you last synchronized and merges them into your current branch, ensuring that your view of the repo is up to date.
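The difference between fetch and pull is easier to see with two clones of the same remote. The sketch below fakes a shared server with a bare repo in a scratch directory. The clone names "me" and "teammate", the file notes.txt, and the identities are all assumptions made for the demo.

```shell
set -e
work=$(mktemp -d)

# Setup: a seed repo, a bare copy acting as the shared remote, and two clones.
git init -q "$work/seed"
cd "$work/seed"
git config user.name "Example User"; git config user.email "user@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/me"
git clone -q "$work/remote.git" "$work/teammate"

# The teammate commits a change and pushes it to the remote.
cd "$work/teammate"
git config user.name "Teammate"; git config user.email "mate@example.com"
echo "v2" > notes.txt
git commit -q -am "Update notes"
git push -q

# In my clone, fetch downloads the new commit but leaves my files alone.
cd "$work/me"
git fetch -q
cat notes.txt                        # still "v1"
git log --oneline 'HEAD..@{u}'       # the commit I have fetched but not applied

# pull brings my branch and working directory up to date.
git pull -q
cat notes.txt                        # now "v2"
```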

Conflicts

In some cases, the commits on the remote server may overlap with your own changes. If this is the case when you pull, Git will attempt a merge and, where it cannot combine the changes automatically, ask you to resolve the conflict. Merges can be stressful and we'll talk about them in more detail a little later. We will also discuss some strategies to avoid unnecessary merging.

Send — push

Once you’ve made one or more commits, you’ll want to synchronize them with your remote repo. In order to do this, you’ll need to perform a push. Pushing sends your commits to the remote repository and makes them available for other users to pull.
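A push might look like the sketch below. The "remote" is just a bare repo in a scratch directory standing in for a hosting service, and the names and files are made up for the demo.

```shell
set -e
work=$(mktemp -d)

# Setup: a seed repo, a bare copy acting as the shared remote, and my clone.
git init -q "$work/seed"
cd "$work/seed"
git config user.name "Example User"; git config user.email "user@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/me"

# Commit locally, then push.
cd "$work/me"
git config user.name "Example User"; git config user.email "user@example.com"
echo "v2" > notes.txt
git commit -q -am "Update notes"
git status -sb                    # first line reports the branch as "ahead 1"
git push -q
git status -sb                    # no longer ahead: the remote has the commit

# Anyone who clones (or pulls) now sees the change.
git clone -q "$work/remote.git" "$work/teammate"
cat "$work/teammate/notes.txt"    # "v2"
```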

Conflicts

Sometimes as you try to push, your remote repository will notify you that it has received a push from someone else since you last pulled from the repo. In situations like this, the remote repository will ask you to perform a pull first. You might see an error like this:

! [rejected] master -> master (fetch first) 
error: failed to push some refs to 'https://github.com/mrmechko/repo.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
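One way to reproduce this rejection and recover from it is sketched below, again using a bare repo in a scratch directory as a stand-in remote. The clone names and files are hypothetical. The --no-rebase flag simply tells pull to merge rather than rebase.

```shell
set -e
work=$(mktemp -d)

# Setup: a seed repo, a bare copy acting as the shared remote, and two clones.
git init -q "$work/seed"
cd "$work/seed"
git config user.name "Example User"; git config user.email "user@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/me"
git clone -q "$work/remote.git" "$work/teammate"

# The teammate pushes first.
cd "$work/teammate"
git config user.name "Teammate"; git config user.email "mate@example.com"
echo "from teammate" > theirs.txt
git add theirs.txt
git commit -q -m "Add theirs"
git push -q

# My push is rejected because the remote has a commit I do not have.
cd "$work/me"
git config user.name "Example User"; git config user.email "user@example.com"
echo "from me" > mine.txt
git add mine.txt
git commit -q -m "Add mine"
git push -q || echo "push rejected: pull first"

# Pull (merging the teammate's commit), then push again.
git pull -q --no-rebase --no-edit
git push -q
git log --oneline --graph         # both commits plus the merge that joins them
```

Because the two commits touch different files, the merge here completes without any conflict.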

A sample branching paradigm

Most repositories have a single branch to start with, called main (or master in older repositories). This branch should contain the most recent ‘released’ version of your software. In the early stages of development, of course, it might not be a finished product. Next, you might create a branch called develop to track less stable improvements to the software. Individual features will spawn their own branches off of develop and be merged back into develop as they are completed. When the team is sufficiently sure that the code on develop is ready for release, it can be merged into main.
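The paradigm above could play out roughly as follows. The branch name feature/login, the files, and the commit messages are invented for this sketch, and --no-ff is used only so that each merge shows up as its own commit in the log.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial release"
git branch -M main                         # make sure the first branch is called main

git checkout -q -b develop                 # less stable work happens here
git checkout -q -b feature/login           # hypothetical feature branch off develop
echo "login" > login.txt
git add login.txt
git commit -q -m "Add login feature"

git checkout -q develop                    # feature finished: merge it back
git merge -q --no-ff -m "Merge feature/login into develop" feature/login

git checkout -q main                       # develop is judged ready for release
git merge -q --no-ff -m "Release: merge develop into main" develop

git log --oneline --graph
```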

Creating branches — branch

branch allows you to create a new branch from the commit you are currently working on. This does not create any new commits. Instead, it just adds a new pointer to the existing commit, and your HEAD pointer stays where it was. Once you've created the new branch, you will need to switch to it by performing a checkout [branch], which moves your HEAD pointer to the tip of the branch and makes sure any future commits get added there.
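A small sketch of creating a branch and then switching to it. The branch name experiment and the file are placeholders.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

before=$(git rev-parse --abbrev-ref HEAD)  # the branch I am on right now
git branch experiment                      # new pointer to the current commit
git rev-parse --abbrev-ref HEAD            # unchanged: creating a branch does not switch
git rev-parse HEAD experiment              # both names resolve to the same commit

git checkout -q experiment                 # now HEAD follows experiment
echo "v2" > notes.txt
git commit -q -am "Try something"          # this commit lands on experiment only
git log --oneline --all --decorate
```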

Exploratory — checkout

A checkout takes a commit reference and sets your working directory state to that commit. checkout [branch] can be used to switch to a different branch.
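As an illustration, the sketch below visits an older commit and then returns to the branch tip. The file name and contents are made up.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First version"
first=$(git rev-parse HEAD)                # remember the first commit's reference

echo "v2" > notes.txt
git commit -q -am "Second version"
branch=$(git rev-parse --abbrev-ref HEAD)  # name of the branch I am on

git checkout -q "$first"                   # working directory now matches the first commit
cat notes.txt                              # "v1"

git checkout -q "$branch"                  # back to the tip of the branch
cat notes.txt                              # "v2"
```

Checking out a commit directly, rather than a branch, leaves you on no branch at all (Git calls this a detached HEAD), so switch back to a branch before committing new work.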

Conflicts!

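When a merge hits a conflict, Git marks the overlapping region inside the affected file:

This line does not have a conflict.
<<<<<<< main
this line has a conflict.
=======
This line is a conflict.
>>>>>>> other-branch

To resolve it, edit the file so that it contains the text you want and delete the marker lines (the ones starting with <<<<<<<, =======, and >>>>>>>). Keeping the version from main gives:

This line does not have a conflict.
this line has a conflict.

Keeping the version from other-branch gives:

This line does not have a conflict.
This line is a conflict.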
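To watch a conflict happen from start to finish, you could try something like the sketch below. It edits the same line on two branches, merges, and resolves by keeping main's wording. The file name and identity are placeholders, and note that a plain git merge labels your side of the conflict as HEAD rather than with the branch name.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'This line does not have a conflict.\noriginal line.\n' > file.txt
git add file.txt
git commit -q -m "Add file"
git branch -M main

# Change the second line differently on each branch.
git checkout -q -b other-branch
printf 'This line does not have a conflict.\nThis line is a conflict.\n' > file.txt
git commit -q -am "Reword on other-branch"
git checkout -q main
printf 'This line does not have a conflict.\nthis line has a conflict.\n' > file.txt
git commit -q -am "Reword on main"

git merge -q other-branch || echo "merge stopped on a conflict"
cat file.txt                               # shows the <<<<<<<, =======, >>>>>>> markers

# Resolve: write the text we want (here, main's version), then add and commit.
printf 'This line does not have a conflict.\nthis line has a conflict.\n' > file.txt
git add file.txt
git commit -q -m "Merge other-branch, keeping main's wording"
```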

Pull Requests

Pull requests are a special feature provided by some Git-hosting services (e.g. GitHub, GitLab) which help tame the potential chaos surrounding merges. Teams will often write-protect shared branches like main or develop because introducing errors to those branches can be disastrous to the whole team. In such situations, a pull request might require the approval of multiple team members or of product managers. A pull request allows the team to discuss commits that are requested to be merged into a branch, test the resulting merged code, and add additional commits to fix any issues, all before the merge occurs.

Temporary — stash

stash is a very useful command which takes any uncommitted changes you've made to the working directory and stores them safely, leaving the working directory clean. After you've performed a stash, you can pull, add, commit, and merge as necessary, then use stash pop to return your changes to the working directory. Note that if the changes you make in the meantime conflict with the stashed ones, you will have to resolve the conflicts when you pop.
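A quick sketch of the stash round trip. The files and commit messages are made up for the example.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "work in progress" >> notes.txt       # an uncommitted change
git stash -q                               # shelve it; the working directory is clean again
git status --short                         # prints nothing

echo "urgent fix" > fix.txt                # do some unrelated work in the meantime
git add fix.txt
git commit -q -m "Add urgent fix"

git stash pop -q                           # bring the shelved change back
git status --short                         # " M notes.txt": the edit has returned
```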

UNDO! UNDO!! — revert

Since each commit in Git is a ‘change’, the natural way to undo things is to make the opposite change. revert will construct the inverse of a specific commit and commit it. This is great for two reasons. First, you haven't altered the commit history; you have just added to it. Second, you can target a specific commit from several days ago without undoing all the work you've done since!
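For instance, the sketch below makes three commits and then reverts only the middle one. The file names and messages are placeholders.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "a" > a.txt; git add a.txt; git commit -q -m "Add a"
echo "b" > b.txt; git add b.txt; git commit -q -m "Add b"
bad=$(git rev-parse HEAD)                  # the commit we will later regret
echo "c" > c.txt; git add c.txt; git commit -q -m "Add c"

git revert --no-edit "$bad"                # new commit that undoes "Add b" only
ls                                         # a.txt and c.txt remain; b.txt is gone
git log --oneline                          # four commits: the history only grew
```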

Some strategies to avoid panic.

Commit messages

Work on your own branch

Merging your code with someone else’s code is one of the more stressful processes when it comes to working in a team. While it is important to make sure your changes remain compatible with your teammates’ changes, you might want to avoid integrating partially completed features. By working on separate branches, you avoid issues surrounding merging code until actually necessary. Many teams will perform integration tests on a regular basis to ensure that teammates don’t diverge too far. You will still have to perform the merges from time to time, but at least you can minimize the frequency and complexity of unexpected merge conflicts and bugs that manifest from combining different bodies of code.

Test before you merge

Having a number of commits between working versions of code is not usually a problem. However, it is a good idea to test your code thoroughly before pushing or merging on a shared branch.

Avoiding Conflicts

The best way to deal with conflicts is to have as few of them as possible. Working on separate branches reduces the frequency with which you’ll have to deal with conflicts, but that might open the door to much larger conflicts cropping up down the line. Conflicts will usually only occur when two people are editing the same segment of code. If this is happening, it probably means that two people are working on very closely related features! The best way to avoid conflicts in this case is to communicate with your team beforehand, let everyone know what you’re working on, and collaborate proactively instead of reactively.

Help, I need to remove a file from the repository!

git rm will untrack a file that you have committed to the repository and delete the file from your disk. If you don't want to delete the file altogether, you can use git rm --cached instead. The file will be removed from the repo in your next commit but a copy will remain in your working directory. This option is great if you accidentally commit a file containing, say, private notes that don't belong in the team's repository. Keep in mind that earlier commits still contain the file, so it remains recoverable from the repo's history.
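The two variants side by side, as a sketch. The files obsolete.txt and notes.txt are invented for the example.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "code" > app.txt
echo "private" > notes.txt
echo "old" > obsolete.txt
git add .
git commit -q -m "Add everything (oops)"

git rm -q obsolete.txt                     # untrack AND delete from disk
git rm -q --cached notes.txt               # untrack only; the file stays on disk
git commit -q -m "Remove obsolete file and stop tracking notes"

git ls-files                               # only app.txt is tracked now
ls                                         # app.txt and notes.txt are still on disk
```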

Help, I staged a file for commit but I don’t want to commit it!

reset HEAD [filename] will un-stage the file by resetting its entry in the index to match the last commit. Your local changes will still be on disk, but the file won't be included in the next commit.
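In practice that might look like the sketch below, where notes.txt is a placeholder file name.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "v2" > notes.txt
git add notes.txt
git status --short                         # "M  notes.txt": staged for commit

git reset -q HEAD notes.txt                # un-stage it
git status --short                         # " M notes.txt": modified but not staged
cat notes.txt                              # "v2": the edit is still on disk
```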

CS Professor, AI/NLP Researcher