Thursday, November 25, 2010

How to track your Joomla! project with Git

If you’ve ever worked on an existing website, chances are you’ve run into a directory listing like the following:

Copy of index.html
about.html
contact.html
favicon.ico
index.html
index.html.bak
index.html.bak2
index.html.old
pricing.html

It’s also quite possible you are responsible for having created a mess like this. We’re always told to make backups of our files, and so we make them, often right next to the files of a live site. While it’s a good idea to make a backup of your code before changing something that already works, .bak, .old, and .other files can accumulate very quickly.

Made-up extensions like .bak tell us very little about the significance of each change, and they tell anyone else even less. It would be nice if there were some way of keeping a history of every change made to a file. Better still, tracking who made each change would be useful. And a way of combining changes from two different copies of the same file would be fantastic.

Fortunately, such systems already exist.

Introducing Version Control

A Version Control System (VCS) is designed to keep a complete history of your project’s code. There’s no need to copy the same file over and over again, because the VCS keeps every version for you. When you’ve made a significant change to a file, you tell the VCS to take a snapshot of it. This snapshot is called a commit, and you can compare against it later. When you commit a file or set of files, the changes since the last commit are stored in a repository managed by the VCS. Once a commit is recorded in the repository, you can make any changes you want to the file. If the changes aren’t what you wanted, you can tell the VCS to pull out a copy of your code the way it was at the last commit.
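Git is the distributed VCS this article uses, and in Git, pulling out a copy of your code the way it was at the last commit takes a single command. The script below is a minimal sketch meant for a throwaway directory. The file name notes.txt and the name and email it configures are placeholders, not anything Git requires.

```shell
#!/bin/sh
set -e
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)"
git init -q
# Placeholder identity; Git records this as the author of each commit.
git config user.name "Example User"
git config user.email "user@example.com"

# Take a snapshot (a commit) of a working version.
echo "Version that works" > notes.txt
git add notes.txt
git commit -q -m "Snapshot of a working version"

# Make a change we end up not wanting...
echo "Broken experiment" > notes.txt

# ...and ask Git for the file exactly as it was in the last commit (HEAD).
git checkout HEAD -- notes.txt
cat notes.txt
```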

Nearly all major software projects (including Joomla!) use VCS software to manage the code being written between releases. These systems are crucial for larger projects because they make it possible for dozens or even hundreds of people to work on the same code at the same time. The VCS helps programmers merge their changes so that each file reflects everyone’s work.

Although a VCS is designed with teams in mind, you can still benefit from the history it keeps even if you’re working by yourself. Any time you are writing code, a VCS can help you keep track of your changes.
Distributed vs. Centralized

VCS software can be broadly categorized as either distributed or centralized. With a centralized VCS, every commit requires a connection back to the central server where the repository is hosted. This guarantees every commit is on the server. The downside is that you must have a connection to that server to make a commit. If you lose your Internet connection, you typically lose quite a bit of the functionality of a centralized VCS.

In contrast, distributed VCS software keeps a full repository on your computer. You can make commits whenever you want, but the commits stay on your computer until you push them to another repository. This gives you the flexibility of making as many commits as you want, regardless of whether you have an Internet connection. You can push your changes to another repository when you want to share them, or you can keep them completely private.
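The following sketch shows that committing and sharing are separate steps. It makes two commits with no remote configured at all, and only then pushes them. A bare repository in the same scratch directory stands in for the "other repository", which in practice would usually live on a server. The names shared.git and work, and the identity values, are placeholders.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# A bare repository (no working files) plays the role of the shared copy.
git init -q --bare shared.git

# Your own full repository, where commits need no network connection.
git init -q work
cd work
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Commit made offline"
echo "second draft" >> notes.txt
git commit -q -a -m "Another offline commit"

# Until now nothing has left this repository. Sharing is an explicit step:
git remote add origin ../shared.git
# Push the current branch to a branch of the same name in shared.git.
git push -q origin HEAD
```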

Joomla and WordPress currently use a centralized VCS called Subversion (often abbreviated SVN) to manage their core code, but many other open source projects have switched from centralized to distributed systems. Ruby on Rails, for example, now uses Git, and Drupal is moving to Git from a centralized VCS called Concurrent Versions System (CVS). Distributed VCS software is gaining in popularity, but centralized systems have been around longer, so more tools are currently available for them. However, many tools for distributed systems have been released over the past year.
Setting up a Git repository

So if you’re starting on a new client project today, how can a VCS be most useful while getting in your way as little as possible? If you’re the only person writing code for the project, distributed VCS software will most likely serve you best: it is easy to install and set up, and it doesn’t require a server. If you haven’t already, go download a copy of Git. Once you’ve installed it, you’ll be ready to start creating new repositories.

To get a feel for working with Git, let’s start with an example project. For this example, we’re tracking the changes to a single text file. Before we can make any commits, we need a Git repository to hold them, and for that we first need a directory for the project. This can be done from the command line:


$ mkdir test
$ cd test

Once the directory is ready to go, type in the following command:
$ git init

After typing this command, a hidden .git directory will be present in the folder. This is the repository where the commits will be stored. Don’t touch this folder or anything inside of it; let Git manage the contents.

If we already had files in this directory, we could add and commit them right away. Since we’re working through an example with one file, let’s create a file named test containing some text to track:
This is a test file with text being tracked by Git

After saving the test file, go back to the command line and type git status:
$ git status
# On branch master
#
# Initial commit
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# test
nothing added to commit but untracked files present (use "git add" to track)

At this point, Git sees our new file, but tells us that it’s not being tracked yet. To track it, type git add test to add the file or git add . to add everything in the folder. Once this is done, another call to git status will look like this:
$ git add test
$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: test
#

Now that the file has been added to Git, we can record our changes to the file, along with a note describing them. Go to the command line again and type git commit -m "First commit, added test message". If -m is left off, Git opens a text editor where you can enter the message. The output will look similar to this:
$ git commit -m "First commit, added test message"
[master (root-commit) 2a96325] First commit, added test message
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 test

Special note: the hexadecimal strings in these examples will be different on your computer. They are SHA-1 hashes computed from each commit’s contents, author, date, and parent commits, so they uniquely identify each commit. This is how Git tells commits apart when you share them with others.

We just created a commit. A commit consists of a set of changes to one or more files, a message describing what the changes were, and a unique hash used to identify the commit. If we type git log, we see a record of all the commits in our Git repository:
$ git log
commit 2a96325aed31d954d91e764702578cbe307c9c74
Author: Joe LeBlanc <contact@jlleblanc.com>
Date: Wed Apr 28 14:34:26 2010 -0400

First commit, added test message
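The Author line comes from two configuration settings, user.name and user.email. Git records whatever identity these settings hold, so it’s worth setting them before your first commit. The sketch below sets them for one repository only. Running git config --global with the same keys applies them to every repository on your machine. The name and address are placeholders.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q

# Per-repository identity; "git config --global ..." sets it for all repositories.
git config user.name "Example User"
git config user.email "user@example.com"

echo "This is a test file with text being tracked by Git" > test
git add test
git commit -q -m "First commit, added test message"

# Print only the author recorded on the newest commit.
git log -1 --format='%an <%ae>'
```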

Let’s make some changes to the test file. Add the following lines just after the first one:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.


After saving this file, we can go back to the command line, type git status, and see that Git noticed the changes we made:

$ git status
# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: test
#

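To record these changes, we need to make another commit. First, add the file again. Git only commits changes that have been added, so git add is needed for each new round of edits, not just when a file is first tracked: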
$ git add test

Then record the commit with a message:
$ git commit -m "Added Lorem ipsum text"
[master f9e2d30] Added Lorem ipsum text
1 files changed, 5 insertions(+), 1 deletions(-)
rewrite test (100%)
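Because test is already tracked, the two steps can be combined. git commit -a stages every modified tracked file and commits it. It does not pick up new, untracked files, which still need git add. The sketch below shows this in a scratch repository; scratch.txt and the identity values are placeholders.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "This is a test file with text being tracked by Git" > test
git add test
git commit -q -m "First commit, added test message"

echo "Lorem ipsum dolor sit amet" >> test   # change to a tracked file
echo "not tracked yet" > scratch.txt        # brand-new, untracked file

# -a stages the modified test file, but leaves scratch.txt untracked.
git commit -q -a -m "Added Lorem ipsum text"
git status --short
```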

Now that there are two commits in the Git repository, you can look through the log of the commits to see the history of the project. Typing git log will bring up this history:
$ git log
commit f9e2d30ab60d8ba8b7316217bc25416e8c74b2bd
Author: Joe LeBlanc <contact@jlleblanc.com>
Date: Fri May 7 10:43:59 2010 -0400

Added Lorem ipsum text

commit 2a96325aed31d954d91e764702578cbe307c9c74
Author: Joe LeBlanc <contact@jlleblanc.com>
Date: Wed Apr 28 14:34:26 2010 -0400

First commit, added test message
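As a history grows, the full log gets long. git log --oneline prints one line per commit, newest first, showing an abbreviated hash and the first line of the message. The sketch below recreates the two commits above in a scratch repository, so its hashes will differ from the ones shown here.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "This is a test file with text being tracked by Git" > test
git add test
git commit -q -m "First commit, added test message"
echo "Lorem ipsum dolor sit amet" >> test
git commit -q -a -m "Added Lorem ipsum text"   # -a stages the modified file

# One line per commit: abbreviated hash, then the commit message.
git log --oneline
```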

Right now, the test file should look like this:

$ more test
This is a test file with text being tracked by Git
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
 

To go back to the version of the file that only has the first line, use git checkout, followed by the hash of the commit to be retrieved:

$ git checkout 2a96325aed31d954d91e764702578cbe307c9c74
Note: moving to "2a96325aed31d954d91e764702578cbe307c9c74" which isn't a local branch
If you want to create a new branch from this checkout, you may do so
(now or later) by using -b with the checkout command again. Example:
git checkout -b <new_branch_name>
HEAD is now at 2a96325... First commit, added test message

$ more test
This is a test file with text being tracked by Git

When we’re done looking at the old version, we can get everything back to the latest copy by checking out master:
$ git checkout master
$ more test
This is a test file with text being tracked by Git
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
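If you only want to look at an old version, you don’t need to leave master at all. git show commit:path prints a file as it was in that commit. You can name a commit by a unique prefix of its hash, or relative to the current commit, as in HEAD~1. The sketch below recreates the two commits above in a scratch repository.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "This is a test file with text being tracked by Git" > test
git add test
git commit -q -m "First commit, added test message"
first=$(git rev-parse --short HEAD)   # abbreviated hash, like 2a96325

echo "Lorem ipsum dolor sit amet" >> test
git commit -q -a -m "Added Lorem ipsum text"

# Print the first version without touching the working copy.
git show "$first:test"

# HEAD~1 ("one commit before the current one") names the same commit.
git show HEAD~1:test

# The working copy still holds the latest version.
cat test
```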
Coding project vs. site management

Getting a Git repository set up for a single folder on a brand new project is quick. However, when you’re working with Joomla, there are some things you’ll want to consider before calling git init. The biggest issue with a Joomla-based project is that Joomla itself is often much larger than the code you’re adding.

There are two general strategies you can employ for using Git with Joomla: either track the entire Joomla installation or track a specific extension. The major deterrent to tracking Joomla itself with Git is the sheer size of the Joomla codebase. While Git is reasonably fast, it can still be overkill to track an entire Joomla installation if you’re adding a single template or module.

On the other hand, putting everything into Git makes it possible to determine when patches were applied to the Joomla site. This can be helpful when you’re trying to trace an issue to a specific patch. Also, if you’re creating a number of extensions that are all designed to work together, you may have no choice but to place the entire site under version control.

If you’re working on a single extension and you know it’s the only one that will be part of a project, it may be more advantageous to track a single directory. A single frontend-only component, backend-only component, module, or template is a good candidate for being tracked separately from a Joomla installation. Plugins are difficult to track this way, because plugin .php files sit side by side with other plugins’ files in shared folders. Tracking complete components this way is also problematic. A complete component is split between components/com_yourcomponent and administrator/components/com_yourcomponent, and changes in the backend can affect behavior in the frontend.

To use Git to track a single extension, simply run git init inside the extension’s folder and start using Git as you would on any other project.
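Here is a minimal sketch of tracking one extension on its own, using dummy files in a scratch directory. my_template is a hypothetical template folder. Because the repository is created inside templates/my_template, Git never sees the rest of the Joomla installation.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
# Stand-in for a Joomla site; "my_template" is a hypothetical template name.
mkdir -p joomla/templates/my_template joomla/libraries
echo "<?php // template code" > joomla/templates/my_template/index.php
echo "<?php // Joomla core" > joomla/libraries/loader.php

# Create the repository inside the extension folder only.
cd joomla/templates/my_template
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git add .
git commit -q -m "Initial import of my_template"

# Only the template's own files are tracked.
git ls-files
```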
Nothing to see here, move along

If you need to track multiple extensions at a time, it will be more advantageous to track the entire Joomla installation. However, when setting up your Git repository, there are a few folders you don’t want to track. For instance, the contents of the tmp and cache folders will contain files that are subject to change as you use the site, rather than as a result of deliberate code changes. Tracking the changes in these folders is counterproductive as they contain temporary files. You can create a .gitignore file containing a list of all the files and folders you don’t want to place under version control. This file is placed next to the .git folder in your project’s root directory. A minimal .gitignore file tailored for Joomla might look like this:
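/administrator/cache
/cache
/logs
/tmp

The leading slash anchors each pattern to the project root. Without it, a pattern such as cache matches a folder named cache at any depth, including core folders such as libraries/joomla/cache. Notice that by specifying /administrator/cache, you can ignore the cache folder for the admin portion of your Joomla site without ignoring the entire administrator folder. Depending on your needs, you may want to add other files and folders to this list. Regardless of the contents, it’s best to get .gitignore in place before putting other files into the project, since .gitignore has no effect on files Git is already tracking.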
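To check which files a .gitignore actually excludes, you can ask Git to list the untracked files it would still offer to add. Anything ignored is simply missing from that list. The sketch below builds a few dummy files in a scratch directory and uses root-anchored patterns. The file names are made up and loosely follow a Joomla 1.5 layout.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q

# Dummy files standing in for a Joomla installation.
mkdir -p administrator/cache cache logs tmp libraries/joomla/cache
touch index.php administrator/index.php administrator/cache/page.html \
      cache/page.html logs/error.php tmp/upload.zip \
      libraries/joomla/cache/cache.php

# Root-anchored patterns: /cache does not match libraries/joomla/cache.
cat > .gitignore <<'EOF'
/administrator/cache
/cache
/logs
/tmp
EOF

# Untracked files Git would still add, with ignore rules applied.
git ls-files --others --exclude-standard
```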

You can also craft your .gitignore file to ignore all of the standard files and folders that come with Joomla, which is useful if you only want to track the extensions you’re creating. If you don’t want to take the time to create this file, you can download it here.
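One way to ignore Joomla’s standard files while keeping your own is to ignore everything and then re-include your extension with ! patterns. Git cannot re-include a file whose parent folder is excluded, so each folder level leading to your extension has to be re-included explicitly. This is a sketch of the technique on dummy files. my_template is a hypothetical template of yours, and beez stands in for a template that ships with Joomla.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q

# Dummy Joomla tree.
mkdir -p templates/my_template templates/beez libraries
touch index.php libraries/loader.php \
      templates/beez/index.php templates/my_template/index.php

# Ignore everything at the root, then re-include one path, level by level.
cat > .gitignore <<'EOF'
/*
!/.gitignore
!/templates/
/templates/*
!/templates/my_template/
EOF

# Only .gitignore and the extension's files remain candidates for tracking.
git ls-files --others --exclude-standard
```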
Considerations

One of the big drawbacks of Git and other version control systems is that they only track files on the filesystem. Your Joomla site depends on the database to store articles, menu items, modules, and other configuration, and none of this is stored in Git automatically. The only way to put this data under version control is to export it as SQL into a file within your project. There are a couple of approaches you can take.

If the specific configuration of the site is important, you can periodically use mysqldump or phpMyAdmin to export a snapshot of your database to a .sql file. You can then place this file in your project and create a commit for each snapshot. This makes it possible to restore your entire site to any state you have committed. You might be tempted to use this as your backup method, but it’s best to have a dedicated process take care of your backups. If you use this technique, treat it as a supplement to regular database backups, not a replacement for them.
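The snapshot-and-commit routine can be wrapped in a small shell function so that it takes one command. This is a sketch under several assumptions. The function name snapshot_db, the sql/ folder, and the default database and user names are placeholders. It expects to be run from the root of your repository with mysqldump on your PATH. --skip-extended-insert writes one INSERT per row, and --skip-dump-date leaves out the timestamp, which keeps the diffs between snapshots readable. If nothing has changed, git commit reports there is nothing to commit and the function returns a non-zero status.

```shell
#!/bin/sh
# snapshot_db: dump the site's database into the repository and commit it.
# Database name, user, and the sql/ folder are placeholders for your setup.
snapshot_db() {
    db_name=${1:-joomla}
    db_user=${2:-joomla_user}
    mkdir -p sql || return 1
    # -p on its own makes mysqldump prompt for the password.
    mysqldump --skip-extended-insert --skip-dump-date \
        --user="$db_user" -p "$db_name" > sql/snapshot.sql || return 1
    git add sql/snapshot.sql || return 1
    git commit -q -m "Database snapshot $(date '+%Y-%m-%d %H:%M')"
}

# Usage, from the root of the repository:
#   snapshot_db joomla joomla_user
```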

If you are creating a reusable component and you’re not worried about the site configuration, another option is to keep track of your component’s schema and data. Start with your initial schema in administrator/components/com_yourcomponent/install.mysql.sql. As you change a table, append the necessary ALTER TABLE statements to the end of install.mysql.sql. You can do the same with your INSERT statements and any additional CREATE TABLE statements. A common practice is to add a comment line with the date (beginning with #) before each set of database changes. That way, if you’re sharing your repository with others, they can run the new statements to bring their database up to date rather than throwing away what they have and starting over. This is particularly useful when your schema is changing and you don’t really care about sharing changes to the data in the tables.
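Here is a sketch of that workflow in a scratch repository. The table, columns, and dates are invented for illustration, and #__ is Joomla’s table-prefix placeholder. Each change is appended under its own dated comment, so git diff shows collaborators exactly which statements they still need to run.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# com_yourcomponent and #__yourcomponent_items are placeholders.
sql=administrator/components/com_yourcomponent/install.mysql.sql
mkdir -p "$(dirname "$sql")"
cat > "$sql" <<'EOF'
# 2010-11-01: initial schema
CREATE TABLE IF NOT EXISTS `#__yourcomponent_items` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
);
EOF
git add "$sql"
git commit -q -m "Initial schema"

# Later: append the change under a dated comment instead of editing CREATE TABLE.
cat >> "$sql" <<'EOF'

# 2010-11-20: items can be unpublished
ALTER TABLE `#__yourcomponent_items` ADD `published` tinyint(1) NOT NULL DEFAULT 1;
EOF
git commit -q -a -m "Add published column to items"

# Shows only the statements added since the previous commit.
git diff HEAD~1 -- "$sql"
```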

Finally, if your Joomla installation and Git repository are hosted on a live site, make certain that your web server is not serving the .git folder. Start by going to http://www.yoursite.com/.git. A 404 or 403 HTTP code is a good sign, but a 403 on the folder may only mean that directory listings are turned off. Also try a file inside it, such as http://www.yoursite.com/.git/HEAD. If you see the directory structure of the .git folder, or the server returns that file, get your server reconfigured so that it refuses to serve anything under .git.
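This check is easy to script with curl, which lets you repeat it after every server change. The sketch below assumes curl is installed, and the function name check_git_exposure is made up. It only looks at status codes, so a server that answers every URL with a 200 error page will still need a manual look.

```shell
#!/bin/sh
# check_git_exposure: report whether a site serves its .git folder or files in it.
check_git_exposure() {
    base_url=$1
    for path in .git/ .git/HEAD .git/config; do
        # -s hides progress, -o discards the body, -w prints only the status code.
        code=$(curl -s -o /dev/null -w '%{http_code}' "$base_url/$path")
        case "$code" in
            200) echo "EXPOSED: $base_url/$path returned 200"; return 1 ;;
            000) echo "ERROR: could not reach $base_url/$path"; return 2 ;;
            *)   echo "ok: $base_url/$path returned $code" ;;
        esac
    done
    return 0
}

# Example (placeholder URL):
#   check_git_exposure http://www.yoursite.com
```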
Conclusion

We’ve only scratched the surface of using Git software. If you want to learn more about branching, merging, sharing code with others, and making the most of Git, there are tutorials and reference guides available:

The Git Parable

Git Community Book

Git - SVN Crash Course

The considerations of using Joomla with Git also apply to systems like SVN and CVS: you need to consider ahead of time what you’ll want to track and what you’ll want to ignore. Despite the need to preplan, using a version control system like Git can help you save a lot of time later when you need to look up the history of your project.

