Next Generation Sequencing (NGS)/Ray
A basic knowledge of the UNIX command line is assumed.
In this tutorial, the Ray source code is downloaded to $HOME/sources and Ray is installed under $HOME/software. A dataset is downloaded to $HOME/datasets and then assembled de novo with Ray in $HOME/projects.
Installing Ray
The first thing to do is to download the Ray tarball that contains its source code.
mkdir -p $HOME/sources
cd $HOME/sources
wget http://downloads.sourceforge.net/project/denovoassembler/Ray-v2.1.0.tar.bz2
tar -xjf Ray-v2.1.0.tar.bz2
An MPI library, make, and a C++ compiler are required to build Ray. On Ubuntu or Debian, the package names are: openmpi-bin, libopenmpi-dev, make, g++.
Optionally, native support for compressed files can be included in Ray. This requires zlib and/or libbz2. On Ubuntu or Debian, the package names are: zlib1g-dev libbz2-dev.
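Before building, it can help to confirm that the required tools are on the PATH. The sketch below is only an illustration: missing_tools is a hypothetical helper, and mpicxx and mpiexec are the wrapper names that Open MPI normally installs (other MPI distributions may name them differently).

```shell
# On Ubuntu or Debian, the packages named above can be installed with:
#   sudo apt-get install openmpi-bin libopenmpi-dev make g++ zlib1g-dev libbz2-dev

# Hypothetical helper: print each given command that is not found on PATH.
# Prints nothing if every command is present.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumption: Open MPI wrapper names (mpicxx, mpiexec).
missing=$(missing_tools mpicxx mpiexec make g++)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All build tools found."
fi
```

If anything is reported as missing, installing the corresponding package first avoids a failed build later.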
With MPI installed, Ray can now be compiled and installed:
mkdir -p $HOME/software/ray
cd $HOME/sources/Ray-v2.1.0
make HAVE_LIBZ=y HAVE_LIBBZ2=y PREFIX=$HOME/software/ray/2.1.0
make install
Obtaining data
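The commands below fetch the E. coli reads used in this tutorial: two paired-end runs (SRR001665 and SRR001666), each stored as a pair of bzip2-compressed FASTQ files.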
mkdir -p $HOME/datasets/SRA001125
cd $HOME/datasets/SRA001125
wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA001/SRA001125/SRX000429/SRR001665_1.fastq.bz2
wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA001/SRA001125/SRX000429/SRR001665_2.fastq.bz2
wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA001/SRA001125/SRX000430/SRR001666_1.fastq.bz2
wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA001/SRA001125/SRX000430/SRR001666_2.fastq.bz2
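Large downloads are sometimes truncated, so a quick sanity check of the files can be worthwhile. The following is a minimal sketch, assuming the usual FASTQ layout of exactly four lines per record; count_fastq_records is a hypothetical helper name, not part of Ray.

```shell
# Hypothetical helper: count FASTQ records read from standard input.
# Assumes four lines per record (header, sequence, '+', qualities).
count_fastq_records() {
    awk 'END { print NR / 4 }'
}

# For each downloaded file that is present, test the bzip2 stream
# and report how many reads it contains.
for f in SRR001665_1.fastq.bz2 SRR001665_2.fastq.bz2 \
         SRR001666_1.fastq.bz2 SRR001666_2.fastq.bz2; do
    if [ -f "$f" ] && bzip2 -t "$f"; then
        echo "$f: $(bzip2 -dc "$f" | count_fastq_records) reads"
    fi
done
```

The two files of a paired-end run should report the same number of reads; a mismatch or a fractional count suggests an incomplete download.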
Running Ray
It is a good habit to create a directory for each project. A directory will therefore be created for this tutorial.
mkdir -p $HOME/projects/Ray-tutorial
cd $HOME/projects/Ray-tutorial
Next, we create symbolic links to the data files so that long paths are not required.
ln -s $HOME/datasets/SRA001125/SRR001665_1.fastq.bz2
ln -s $HOME/datasets/SRA001125/SRR001665_2.fastq.bz2
ln -s $HOME/datasets/SRA001125/SRR001666_1.fastq.bz2
ln -s $HOME/datasets/SRA001125/SRR001666_2.fastq.bz2
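It may be worth confirming that every link resolves and that each _1 file has its _2 mate before starting a long run. The sketch below assumes the "<run>_1.fastq.bz2" / "<run>_2.fastq.bz2" naming used in this tutorial; mate_of and check_pair are hypothetical helpers.

```shell
# Hypothetical helper: derive the mate's file name from a "_1" file name.
mate_of() {
    echo "$1" | sed 's/_1\.fastq\.bz2$/_2.fastq.bz2/'
}

# Hypothetical helper: succeed only if the two names differ and both
# files exist ([ -e ] follows symbolic links, so dangling links fail).
check_pair() {
    [ "$1" != "$2" ] && [ -e "$1" ] && [ -e "$2" ]
}

for left in SRR001665_1.fastq.bz2 SRR001666_1.fastq.bz2; do
    right=$(mate_of "$left")
    if check_pair "$left" "$right"; then
        echo "OK: $left $right"
    else
        echo "Problem with pair: $left $right"
    fi
done
```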
An arbitrary number of Ray processes can be launched. In this example, 4 Ray processes are launched. These processes can be on several computers or on a single computer.
mpiexec -n 4 $HOME/software/ray/2.1.0/Ray \
    -k 21 -o EcoliAssembly \
    -p SRR001665_1.fastq.bz2 SRR001665_2.fastq.bz2 \
    -p SRR001666_1.fastq.bz2 SRR001666_2.fastq.bz2
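The -k parameter sets the length of k-mers (21 here), -o names the output directory, and each -p gives the two files of one paired-end library.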
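To make the notion of a k-mer concrete: a k-mer is a substring of length k, and a read of length L contains L - k + 1 of them. The small sketch below only illustrates the definition and says nothing about how Ray stores k-mers internally; kmers is a hypothetical helper.

```shell
# Hypothetical helper: print every k-mer of a sequence, one per line.
# Usage: kmers SEQUENCE K
kmers() {
    awk -v s="$1" -v k="$2" 'BEGIN {
        for (i = 1; i + k - 1 <= length(s); i++)
            print substr(s, i, k)
    }'
}

# An 8-nucleotide sequence has 8 - 5 + 1 = 4 k-mers of length 5.
kmers ACGTACGT 5
```

The call above prints ACGTA, CGTAC, GTACG and TACGT.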
Assessing the assembly
Ray writes all of its output to a single directory (EcoliAssembly in this example) and performs several automated quality-control tests.
You can list the produced files with:
ls EcoliAssembly
The important files are these:
less EcoliAssembly/OutputNumbers.txt
less EcoliAssembly/Contigs.fasta
less EcoliAssembly/Scaffolds.fasta
less EcoliAssembly/CoverageDistribution.txt
less EcoliAssembly/LibraryStatistics.txt
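As an independent cross-check of the numbers Ray reports, basic statistics can be computed directly from a FASTA file. The sketch below is one possible approach, not Ray's own method; fasta_stats is a hypothetical helper, and N50 is taken here as the length at which the running sum of the lengths, sorted from longest to shortest, first reaches half of the total.

```shell
# Hypothetical helper: print sequence count, total length, and N50.
# Usage: fasta_stats FILE
# Step 1 emits one length per sequence (sequences may span several lines).
# Step 2 sorts the lengths from longest to shortest.
# Step 3 accumulates lengths until half of the total is reached.
fasta_stats() {
    awk '/^>/ { if (seen) print len; seen = 1; len = 0; next }
         { len += length($0) }
         END { if (seen) print len }' "$1" |
    sort -rn |
    awk '{ lens[NR] = $1; total += $1 }
         END {
             for (i = 1; i <= NR; i++) {
                 sum += lens[i]
                 if (sum * 2 >= total) { n50 = lens[i]; break }
             }
             print "sequences=" NR, "total=" total + 0, "N50=" n50 + 0
         }'
}

if [ -f EcoliAssembly/Contigs.fasta ]; then
    fasta_stats EcoliAssembly/Contigs.fasta
fi
```

The same function can be pointed at EcoliAssembly/Scaffolds.fasta to compare contig and scaffold statistics.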