Linux (and OSX) commands for working with FASTA files

02 Sep 2009

When working on the genomics or molecular marker side of bioinformatics, bioinformaticians very often face DNA (or RNA) sequence files in FASTA format. FASTA files can store a single sequence or multiple sequences. To access individual sequences or measure some property of the sequence data, such as length, some manipulation of the files is usually required.

This post gives a set of Unix commands to perform some common manipulations of FASTA files. These commands should work in the Bash shell under most popular distributions of Linux (I use Ubuntu and CentOS). They will also work in the OS X terminal and *SHOULD* work in Windows using software such as Cygwin.

The manipulations and metric measurements I cover in this article deal with:

  1. Splitting a FASTA file of multiple sequences into FASTA files of individual sequences
  2. Joining multiple FASTA files into a single, multi-sequence FASTA file
  3. Listing the sequence headers in a FASTA file
  4. Counting the number of sequence entities in a FASTA file
  5. Determining the length of the sequence in a FASTA file

BEFORE WE BEGIN
Where you see <inputFile> or <outputFile> (or a similar name in angled brackets), replace this with your input file of choice or the name of the output file you wish to create, respectively.

The FASTA Format

sourced from http://www.nmpdr.org/

To give a super brief description, FASTA format was originally the ASCII file format used for sequence information by the application of the same name. Some time has passed in the bioinformatics world since then, and FASTA-formatted files are now used by a variety of bioinformatics packages and are the de facto standard for storing sequence information in text files.

The FASTA format itself is very simple: A file can consist of one or more sequence elements, each headed by a free text header starting with the chevron ‘>’ character and ending with a newline ‘\n’ character.

e.g. for DNA sequence:

>sequence 1
ACCGTACGATACGATCGCATCGCTGACTCG
ACTTACGACGACGCANNNNACATCGATCGA
ACACTCAGCA
>sequence 2
CACGCATTATCATCGATCCTCAGCTCATCGA
ATACGTACCACAACTCGCATCTCAGTCAGAC
ACTCGTACGCTACGTACGCATGCATCAGATC
ATCCTATGCATGCATCGTACGCTAGACTCGA
ATCGATCGCATGCATACGTACGCAT

NOTE: The sequence itself may have newline characters throughout the sequence – these should be
stripped when using the sequence data.
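As a minimal sketch of the note above, the following shell function joins the wrapped lines of each sequence so that every record becomes one header line followed by one sequence line. The function name fasta_linearize is made up for this example, and the sketch assumes a plain-text FASTA file whose headers start with '>'.

```shell
# Hypothetical helper (name is an assumption, not a standard tool):
# print each record as its header line followed by the whole sequence
# on a single line.
fasta_linearize() {
  awk '
    # Header line: flush the previous sequence, then print the header.
    /^>/ { if (seq != "") print seq; seq = ""; print; next }
    # Sequence line: drop spaces, tabs and carriage returns, then append.
         { gsub(/[ \t\r]/, ""); seq = seq $0 }
    # End of file: flush the last sequence.
    END  { if (seq != "") print seq }
  ' "$1"
}
```

It would be called as: fasta_linearize <inputFile>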

Splitting a FASTA file of multiple sequences into FASTA files of individual sequences

This command will create as many files as there are member sequences, incrementally numbered with a .fasta extension. The files are written to the current working directory, which is the same directory as the source file only if you run the command from there. (e.g. for an input file with 5 member sequences, such as the Arabidopsis genome, it will output files 1.fasta to 5.fasta.)

awk '/^>/{if (f) close(f); f=++d".fasta"} f{print > f}' <inputFile>
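If you would rather not fill the current directory with numbered files, the same idea can be wrapped in a small function that writes into a directory of your choice. This is only a sketch: the name split_fasta and its second argument are assumptions made for this example. It closes each output file before opening the next one, which helps avoid running out of open file handles on inputs with many sequences.

```shell
# Hypothetical helper: split_fasta INPUT [OUTPUT_DIR]
# Writes 1.fasta, 2.fasta, ... into OUTPUT_DIR (default: current directory).
split_fasta() {
  in=$1
  outdir=${2:-.}
  mkdir -p "$outdir" || return 1
  awk -v dir="$outdir" '
    # New header: close the previous file and pick the next file name.
    /^>/ { if (f) close(f); f = sprintf("%s/%d.fasta", dir, ++n) }
    # Write the line only once a header has been seen.
    f    { print > f }
  ' "$in"
}
```

For example: split_fasta <inputFile> split_output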

Joining multiple FASTA files into a single, multi-sequence FASTA file

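This is the reverse of the above and we will assume a few things. Firstly, you want to combine all FASTA files in the current directory and, secondly, they all have the same extension (.fasta). Adapt to your needs if this is not the case! Give the output file a different extension or location, so that an existing output file is not picked up by the *.fasta pattern.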

cat *.fasta > <outputFile>
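One thing to watch for: cat joins the files byte for byte, so if a file does not end with a newline, the header of the next file ends up glued to the last sequence line. A possible workaround, sketched below, is to let awk copy the lines, since it writes a newline after every line it prints. The function name join_fasta is an assumption made for this example.

```shell
# Hypothetical helper: join_fasta OUTPUT INPUT...
# Copies every input line to OUTPUT, ending each line with a newline
# even when an input file lacks a trailing one.
join_fasta() {
  out=$1
  shift
  awk 1 "$@" > "$out"
}
```

For example: join_fasta combined.fa *.fasta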

Listing the sequence headers in a FASTA file

grep "^>" <inputFile>

Counting the number of sequence entities in a FASTA file

grep -c "^>" <inputFile>

Determining the length of the sequence in a FASTA file

This method will give the TOTAL sequence length of a FASTA file. This means that if your FASTA file has a number of sequence entries, it will return the sum of the lengths of all sequence entries. To get the length of individual entries you would first need to split the file into individual entries, or do it programmatically: either using a homegrown method or a bioinformatics library such as BioPerl.

grep -v "^>" <inputFile> | tr -d '[:space:]' | wc -c
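As one possible homegrown method for per-entry lengths, the awk sketch below prints each header (without the leading '>') followed by a tab and the length of that entry. The function name fasta_lengths is an assumption made for this example; spaces, tabs and carriage returns inside sequence lines are not counted.

```shell
# Hypothetical helper: print "<header><TAB><length>" for every entry.
fasta_lengths() {
  awk '
    # Header line: report the previous entry, then start a new one.
    /^>/ { if (name != "") print name "\t" len; name = substr($0, 2); len = 0; next }
    # Sequence line: ignore whitespace and add the remaining characters.
         { gsub(/[ \t\r]/, ""); len += length($0) }
    # End of file: report the last entry.
    END  { if (name != "") print name "\t" len }
  ' "$1"
}
```

For example: fasta_lengths <inputFile>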

These are a few useful commands for performing some common and simple FASTA file manipulations without needing to resort to programmatic methods. It may be worthwhile defining an alias or simple Bash script wrapper for the above commands, allowing you to type something like: fastaLength fastafile.fasta at the command line.
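A minimal sketch of such a wrapper, written as a shell function that could be placed in your ~/.bashrc. The name fastaLength is taken from the example above; the body is simply the total-length pipeline with the file name passed as the first argument.

```shell
# Total sequence length of a FASTA file: drop header lines, strip all
# whitespace, and count the remaining characters.
fastaLength() {
  grep -v '^>' "$1" | tr -d '[:space:]' | wc -c
}
```

After opening a new terminal (or sourcing ~/.bashrc), it can be run as: fastaLength fastafile.fasta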

Make Firefox look and feel like Safari on OS X

03 Feb 2009

Firefox is a great browser and is becoming the browsing standard. It is cross-platform, standards-compliant, as extendable as Katee Sackhoff is gorgeous and as of version 3, pretty fast. I used to use Safari on the Mac back in the Firefox 2 days, as Firefox 2 was slow, prone to crashing, and ugly. Once Firefox 3 came out, I jumped at the opportunity to switch back to the Mozilla camp (in the interest of supporting cross-platform, open-source projects!).

The Firefox 3 interface for OSX is pretty good, but I’ve become accustomed to some of the features of Safari 3, such as the minimal screen real estate, private browsing and resizable text fields. I’ve compiled a list of plugins and themes I used to get my Firefox working like the Safari I knew!

For each addon I’ve linked to the homepage of the developer where possible, which is only fair with regards to true credit. If you would prefer the links for the official Firefox Addon Repository, they can be found at the end of the post in a table.

The Theme

The theme I found that best replicates the Safari interface is Arronax’s GrApple Yummy (graphite) theme. There are four themes on the site and you may like one of the others better, but I felt this one was the truest recreation.
[GrApple Yummy (graphite)  -  http://www.takebacktheweb.org/]

Combined Progress/Address bar

The Fission addon moves the progress bar from the bottom status bar and combines it with the address bar. It is quite customisable and has a feature that I feel allows you to do away with the status bar altogether. I always kept the status bar on in Safari as I want to know where the link I click is heading, and the URL when mousing over a link is displayed in the status bar by default. With Fission, you can have the URL of the link under the mouse show in the address bar instead.
[Fission  -  http://mozilla.zeniko.ch/fission.html]

Private Browsing

This is by far one of the most important features of Safari. I think it is a must-have when visiting sites linked in any way to your financial details, such as internet banking, PayPal, eBay, etc. The Distrust addon gives Firefox a similar ability. It takes note of when it is turned on and when it is next turned off, deleting any private data recorded in the meantime, including passwords, history, cache, etc.
[Distrust  -  http://www.gness.com/distrust/]

Resizable Textarea

Another handy feature in Safari, this is great for resizing comment fields on blogs or online email contact forms to a size that is actually useable! The addon here is called Resizeable Textarea by Raik Jürgens. The resizing can be slightly finicky when trying to find the anchor for diagonal resizing, but it is still very useable.
[Resizeable Textarea - https://addons.mozilla.org/en-US/firefox/addon/3818]

Combined Stop/Reload button, and hide the main throbber.

I have combined these two features, as they require the same plugin! In Safari, when a page finishes loading, the Stop button gets replaced by the Reload button, and vice versa. Also, each individual tab has a throbber icon that represents loading activity. Firefox also has a master throbber to the right of the address bar, which many feel is redundant (and which is indeed missing from Safari). Stylish is an addon that allows you to add ‘modules’ that modify the CSS layout of the Firefox interface.
[Stylish - http://userstyles.org/stylish/]

Once Stylish is installed (and Firefox restarted) you can visit Stylish module pages to adjust the appearance and behaviour of Firefox.

Combined Stop/Reload button module:
http://userstyles.org/styles/10 – IMPORTANT, make sure you follow the instructions provided for this addon to work properly!

Hide Throbber module:
http://userstyles.org/styles/13762

Official Firefox Addon Links

GrApple Yummy (graphite): https://addons.mozilla.org/en-US/firefox/addon/7525
Fission: https://addons.mozilla.org/en-US/firefox/addon/1951
Distrust: https://addons.mozilla.org/en-US/firefox/addon/1559
Resizeable Textarea: https://addons.mozilla.org/en-US/firefox/addon/3818
Stylish: https://addons.mozilla.org/en-US/firefox/addon/2108
[Stylish modules: http://userstyles.org/styles/10 , http://userstyles.org/styles/13762]

Firefox made to look and feel like Safari (for OS X)

DONE!

This has given me an experience that is pretty similar to using Safari, but with some of the extra benefits of Firefox: I like the better support for tabbed browsing in Firefox, and I really like the del.icio.us plugin. Let me know if you know of any other addons that have helped make Firefox feel more OSX-like.

How to overburn a CD/DVD in Mac OS X

30 Jan 2009

The Mac OS X Finder does not allow overburning of optical discs, nor does the Disk Utility application.

Overburning is the process of recording data past the normal size limit. Generally, an optical disc has a lead-out of approximately 10% of the stated disc capacity. Having recording software that supports overburning will allow a user to exploit this extra space.

Unfortunately, Finder and Disk Utility have built-in checks to ensure users don’t try to burn over the stated disc capacity. However, OS X ships with a command-line program called hdiutil that does not perform such a check and will allow users to overburn a disc.

Step #1, create an ISO image

Use hdiutil to make a temporary ISO image. The easiest way is to put all the files to burn in a directory; let’s call ours overburn. Once you have done this, open the Terminal [Applications/Utilities/Terminal]. Navigate to the parent directory of the directory you just created. For example, if the absolute location of overburn is /Users/cduran/overburn, navigate to its parent directory by typing:

cd /Users/cduran

Then, in this directory run the hdiutil program to create the ISO image.

hdiutil makehybrid -o tempimage overburn/

This will make an ISO image called tempimage.iso in the parent directory.

Step #2, burn the ISO image to disc

This step will use hdiutil to burn the ISO image file you just created to your disc. To do this, type the following (remember to put a disc in the drive!):

hdiutil burn tempimage.iso
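If you overburn discs regularly, the two steps above could be wrapped in a small shell function. This is only a sketch: the function name overburn and its optional second argument are assumptions made for this example, and it relies on makehybrid writing the image with an .iso extension, as described in step #1. The burn only starts if the image was created successfully.

```shell
# Hypothetical wrapper (OS X only): overburn SOURCE_DIR [IMAGE_NAME]
# Builds IMAGE_NAME.iso from SOURCE_DIR, then burns it to the disc.
overburn() {
  src_dir=${1%/}            # drop a trailing slash, if any
  image=${2:-tempimage}     # image name without the .iso extension
  hdiutil makehybrid -o "$image" "$src_dir" && hdiutil burn "$image.iso"
}
```

From the parent directory of overburn, it would be run as: overburn overburn/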

DONE!

That’s it, you’ve just overburnt your disc! Now that you have burnt the disc, you can get rid of the tempimage.iso file and the contents of the overburn directory. The thing I like most about this method is that it doesn’t require the installation of any third-party software: less rubbish to bloat your hard drive with!