What is Git and why should I use it?
When you’re working on a file or a document, it is important to save frequently in order to make sure you don’t lose your work. But code is complicated. The development of code is not linear — often edits are exploratory and end up being reverted. You find quickly that what you want to be able to save and explore is not the actual state of your files, but the sequence of edits you’ve made.
git saves your project as a sequence of edits to your code base rather than a set of file contents.
You have been writing code for hours. You’ve got into a good flow. What started off as a few functions in a single file has expanded into dozens of functions and classes in half a dozen files each consisting of several hundred lines of code. At least. You save the latest line you wrote. After The Great Cherry Coke Debacle in freshman year where you lost two pages of your final paper hours before the deadline, you’re very diligent about saving. But it has started to occur to you — not all saves are created equal. Saving code might have one of several purposes:
- To ensure that you don’t lose what you’ve done.
- To compile and run your code as a sanity check.
- To mark a checkpoint in your progress — a version of your code to backup and come back to if things break.
In a smaller, simpler project, you might have just copied the code into a different folder now and then. But what if you want to go back several checkpoints? or what if you forget? What if you want to copy something from the checkpoint you made three hours ago? What if you forgot to copy the checkpoint? Preparing for all of these eventualities leads to more and more complicated practices of saving and backing up your code. Eventually, you might find that you are annotating your checkpoints with little messages describing what exactly you’ve done since the last one.
git is the underlying software that performs version control, Github is a website where users can host and collaborate on their repositories. Github provides many additional features such as issues for repositories where users can file bug reports or make feature requests, code reviews where users can provide feedback on each others code, and project boards which can help teams of developers to coordinate their work. Github also provides a framework to give developers easier access to various tools such as continuous integration servers and app hosting.
A git repository, or repo for short, is just a normal folder with a subfolder named
.git/ containing metadata. The
.git folder will contain information about previous commits, branches, and where the remote repos are located. Most repos will also contain one or more files called
.gitignore file tells git which files to skip adding to the repository. You might add things like
*.class files in Java or
*.o files for C projects to your
.gitignore file. Repos hosted on Github might also have a
.github folder to store configuration of Github specific features.
If you open a repo folder in a file browser, you might not see some of these files or folders because the prefix
. on a filename is an instruction to many operating systems to keep a file hidden. Many developer tools use "dotfiles" to store user settings invisibly. Most file explorers have options to make all files, including dotfiles, visible.
Conceptually, there are three parts to the git repo. The working directory are the current state of the files in your repository. The staging index stores the the current uncommitted changes to the repo. Finally, the commit history stores all the incremental updates to the repository.
The set of all the commits that have been added to the repository is called the commit history. When people
clone the repository or
pull to update their local copy with changes from the remote, they get a copy of the commit history. Each commit will have a unique reference which is shared across everyone's copy of the repository, allowing your teammates to all jump to a specific commit across different machines. Of course, you'll have to
push your changes to the remote before your teammates will be able see them! Below, we'll discuss how to do that.
Commands in this primer are referenced by name. If you want to use these commands in the terminal, you will have to preface them with
git push in the terminal. If you are using a git client, like Github Desktop or GitKraken, this primer tells you which command to look for.
In this section, we will outline the basic operations for interacting with git repos
Copying a repo —
In order to use an existing repo, you must first clone it to your local machine. The
clone command takes the location of a repo and copies it to your local directory along with all the repo's history. The location provided to the clone command can either be a url for a remote repository or a file path, if you want to make a copy of a repository already on your computer. The cloned repo will automatically store the source repo as the origin, so you can send back your changes.
You can manually set a remote target using
remote add [name] [url]. You can also have multiple remote locations, for example, a Github repo for collaboration and a private server where you deploy your code.
Creating a repo —
To create a repo, you would perform a
init. This creates a
.git folder inside the current directory, making it the base directory of your new repo. At this stage, the repo considers itself to be empty. Even if there are files already in your directory, they won't be added to the repository until you explicitly start tracking them. In order to start tracking a file, you'll have to
commit it to the repo. A repo created like this will have no remote target to push changes to. Information about remotes and what files are being tracked are all stored in the
Once you’ve created or cloned your repo and edited some files, you will want to add your changes to the repo’s history. This is a two step process. The first step is to stage your changes. You can choose which files you want to add to your repo. During this process, you can stage or un-stage changes as you wish — nothing gets saved permanently until the next step.
To make your changes (semi) permanent, you make a commit. In addition to staged changes, a commit requires a commit message describing its contents. It is important to leave descriptive commit messages so you and your collaborators can more easily go back and track what was changed, when, and by whom if something goes wrong. Since commits are a `unit’ in git repo, you may want to split a set of your latest code changes into multiple commits For example, grouping your commits by feature would allow you to better trace issues or remove features later on. Just as with saving, it is good practice to commit often. The commit gets assigned a unique reference which can be used in other commands.
Sometimes it might feel like committing too often will make the history look messy. There are advanced commands such as
rebase --interactivewhich allow you to rewrite the history. This is a dangerous operation and can break things for other people. There are very few ways to completely destroy a git repo and misuse of history rewriting is basically it.
fetch checks the remote repository for any updates that have been sent by others (or by you from other machines) that aren't present in your copy of the repo. In order to actually get the changes, you would perform a
pull. This gathers all updates to the remote repo since your last
pull and sends them to you, ensuring that your view of the repo is up to date.
In some cases, the commits on the remote server may overlap with your own changes. If this is the case when you
pull, you'll be prompted to perform a
merge. Merges can be stresfull and we'll talk about them in more detail a little later. We will also discuss some strategies to avoid unnecessary merging.
fetch only checks for changes, performing a
fetch before you
pull can tell you whether you need to prepare for a
Once you’ve made one or more commits, you’ll want to synchronize them with your remote repo. In order to do this, you’ll need to perform a
push. Pushing sends your revisions to the remote repository and makes them available for other users to add.
In general, synchronizing with the remote regularly (
pull) will reduce the likelihood of requiring a
merge. There might be an occassional reason to refrain from a
push - perhaps other members of your team are in the midst of resolving some issue on the branch you've been working on and you don't want to add to the chaos just yet. Some advanced git users will clean up their local commit history and by combining commits or rewriting the commit messages before pushing them to the remote.
Sometimes as you try to push, your remote repository will notify you that it has received a push from someone else since you last pulled from the repo. In situations like this, the remote repository will ask you to perform a
pull first. You might see an error like this:
! [rejected] master -> master (fetch first)
error: failed to push some refs to 'https://github.com/mrmechko/repo.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
This is a safety measure. This way, if there are any issues surrounding integrating your changes with your collaborator’s changes, you can resolve them yourself. We will discuss how to handle conflicts below.
A key features of git repos is the ability to create a branch of your code. Branches allow you switch between and develop multiple versions of your code simultaneously. If we decide that we want to combine two branches, we can perform a
merge. Merging includes all the commits from one branch into another and then caps them off with a special merge commit to indicate that the two branches have been combined. This special merge commit can be used to resolve any conflicts that might have arisen. We will talk about merge conflicts a little later.
The most common use for branches would be developing a new feature for a piece of software with an existing stable version. You don’t want to add unstable code to the version that other people might be using, so you can make a branch to work on your feature until it is ready. As mentioned previously, a git repo stores the history as a sequence of changes to repo, not as files. That means in many cases changes made to code by different people can be automatically combined by git. This does NOT mean that the combined files will work! Make sure you test your code after every
merge (and before every
Your git repo has a
HEAD pointer which refers to the revision/commit you're currently working on. When you switch to a branch, your
HEAD pointer automatically gets moved to the latest commit on that branch.
A sample branching paradigm
Most repositories have a single branch to start with, called
main. This branch should contain the most recent `released' version of your software. In early the early stages of development, of course, it might not be a finished product. Next, you might create a branch called
develop to track less stable improvements to the software. Individual features will spawn their own branches off of
develop and be merged back into
develop as they are completed. When the team is sufficiently sure that the code on
develop is ready for release, it can be pushed to
The default branch on many repos you see will be called
master. In a more recent attempt to avoid slave/master metaphors in computer science, Github and many others have encouraged people to use the name
In order to make the most of this paradigm for branching, it is a good idea to set milestones which lists a set of features and fixes that are required before the next release. Features that do not contribute to the next milestone are left un-merged
When you perform a
merge, you are taking commits from another branch and adding them to the branch you are currently on. For example, if you want the branch
mainto include changes made in
my-cool-feature-branch, you would have to first checkout
mainand then merge
Creating branches —
branch allows you to create a new branch from the commit you are currently working on. This does not create any new commits. Instead, you just move your
HEAD pointer to the existing commit. Once you've created the new branch, you will need to switch to the branch by performing a
checkout --branch to switch your
HEAD pointer to the tip of the branch and make sure any future commits get added there.
checkout takes a commit reference and sets your working directory state to that commit.
checkout --branch can be used to switch to a different branch.
If you checkout a specific commit instead of a branch, you might get a scary looking message informing you that you are in a
detached HEAD state. This is a good guide explaining how to handle this.
When pulling or merging changes from other branches, from time to time you will come across conflicts. These occur when two commits change the same line in a file. Merge conflicts are one of the most panic-inducing events associated with git. It is important to remember that all your work is safe as long as you’ve been committing regularly.
In the event that a segment of code has a conflict during a merge, that segment will be replaced with both versions of the code with in-text annotations to indicate the conflict. Below is an example of a conflict when merging
main. Note the line of
='s separating the two alternatives.
This line does not have a conflict.
this line has a conflict.
This line is a conflict.
If we were to look at the state of
main before the merge began, we would see this:
This line does not have a conflict.
this line has a conflict.
Similarly, if we were to look at the state of
other-branch before the merge began, we would see this:
This line does not have a conflict.
This line is a conflict.
In order to resolve the conflict, you need to edit each file with a conflict in it and remove the conflict markers. Each conflict in delimited by three lines:
The text between lines
b are the contents of the
main branch and the text between
c are the contents
other-branch. Most times, you'll just choose either the text in
main or the text that was in
other-branch. Occasionally, you might have to choose lines from both or even add additional lines. In some cases you might want to keep some or all of both alternatives.
Once you’re done with the edits, make sure to test your code, commit the changes, and you’re all done! Remember, there might be more than one conflict in a merge.
You can usually find all the conflicts by searching for
<<<<< in your code base - it is a fairly uncommon string in code!
Pull requests are a special feature provided by some git-hosting services (e.g. github, gitlab) which help with the latent potential chaos surrounding
merge. Teams will often write-protect shared branches like
development because introducing errors to those branches can be disastrous to the whole team. In such situations, a pull-request might require the approval of multiple team members or of product managers. A pull request allows the team to discuss commits that are requested to be merged into a branch, test the resulting merged code, and to add additional commits to fix any issues, all before the merge occurs.
Occasionally, you’ll discover that a commit introduces errors or the approach you’ve been taking to solving a problem is far more trouble than it is worth. In situations like this, you might want to undo or discard some changes you’ve made.
stash is a very useful command which takes any uncommitted changes you've made to the working directory and stores them safely. You can later
stash pop to return the changes themselves to the working directory. Note that if you move some changes to your
stash and then make conflicting changes, then you will have to resolve the conflicts. After you've performed a
stash to store some changes, you can
merge as necessary before using
stash pop to return your changes to the working directory.
stash is a versatile command that allows you to save multiple uncommitted chunks as
WIP or work-in-progress. You can navigate the list of stashed modifications using
stash list and select a specific one to restore using
stash apply. You can even give you stashed modifications descriptive commit messages.
UNDO! UNDO!! —
Since each commit in git is a ‘change’, the natural way to undo things is to make the opposite change.
revert will construct the inverse of a specific revision and commit it. This is great for two reasons. First, you haven't altered the commit history - just added to it. Second, you can target a specific commit from several days ago without undoing all the work you've done since!
Some strategies to avoid panic.
Take the extra few minutes to write a complete commit message. Your teammates and future self will thank you. Writing good commit messages is a practiced art. An ideal commit message has a short one line header and then a body that describes the the changes in more detail. Consider going over all the files you’ve changed in the commit and leaving a short summary of what was changed and why.
Work on your own branch
Merging your code with someone else’s code is one of the more stressful processes when it comes to working in a team. While it is important to make sure your changes remain compatible with your teammates’ changes, you might want to avoid integrating partially completed features. By working on separate branches, you avoid issues surrounding merging code until actually necessary. Many teams will perform integration tests on a regular basis to ensure that teammates don’t diverge too far. You will still have to perform the merges from time to time, but at least you can minimize the frequency and complexity of unexpected merge conflicts and bugs that manifest from combining different bodies of code.
Test before you merge
Having a number of commits between working versions of code is not usually a problem, however, it is a good idea to test your code thoroughly before pushing or merging on a shared branch.
The best way to deal with conflicts is to have as few of them as possible. Working on separate branches reduces the frequency with which you’ll have to deal with conflicts, but that might open the door to much larger conflicts cropping up down the line. Conflicts will usually only occur when two people are editing the same segment of code. If this is happening, it probably means that two people are working on very closely related features! The best way to avoid conflicts in this case is to communicate with your team before hand and let everyone know what you’re working on and collaborate proactively instead of reactively.
Help, I need to remove a file from the repository!
git rm will untrack a file that you have committed to the repository and delete the file from your disk. If you don't want to delete the file altogether, you can use the
git rm --cached option. The file will be removed from the repo in your next commit but a copy will remain in your working directory. This option is great if you accidentally commit a file containing, say, private notes that don't belong in the team's repository.
Help, I staged a file for commit but I don’t want to commit it!
reset HEAD [filename] will reset the index. Your local changes will still be on disk, but the file won't be committed.