telnet: pinehost.sci.ccny.cuny.edu
We are all going to use the same account and password. This is NOT the way to do things in the real world.
login: rise
Password: ris3111
Once you are logged in, the first thing to do is create a folder (directory) for yourself
mkdir yourname
Move into your new directory and look around
cd yourname (change to directory yourname)
ls (list files)
You will get nothing, because your new directory is empty.
Just to get a feel for UNIX, make a dummy file like this:
echo abc > test1 (writes abc into a file called test1)
ls (list files)
more test1 (write the contents of test1 onto the screen)
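The > symbol used above sends the output of a command into a file instead of onto the screen. As a small illustration (the file name test2 is made up for this example), a single > replaces whatever was already in the file, while >> adds to the end of it. Lines starting with # are notes for the reader and do not need to be typed.

```shell
# create test2 containing abc
echo abc > test2
# a single > overwrites: test2 now contains only def
echo def > test2
# a double >> appends: test2 now contains def and then ghi
echo ghi >> test2
# show the result on the screen
cat test2
```

The difference matters later on, when the output of several commands has to be collected into one file.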
Now, we are going to analyze a set of DNA sequences that I loaded into a folder called Seqs
Move over there and have a look around
cd .. (this moves you up one level)
ls (list the files and folders here)
cd Seqs (move into directory Seqs)
ls (list the files in Seqs)
and look at one of the files
more dros1.seq (write dros1.seq to the screen)
OK, now it is time to learn something about a program called fasta. We have a copy of this program in the directory Fasta. fasta compares sequences one to another, or one to a whole set.
But first, use cd to move back to your own directory. If you get lost, type pwd, which shows where you are in the file system.
Then you can try fasta out like this (you have to type the whole path to the program):
../Fasta/fasta
Fasta will ask you for the name of a test sequence - use ../Seqs/dros1.seq
more dros1.fasta
It would be useful if we could automate the fasta program so that it could compare a whole bunch of sequences to a database or to each other. In order to do this, you will have to learn something about writing UNIX programs, shell scripts to be specific.
First, you need to understand the concept of a loop: a program repeats some set of actions a specific number of times or until some goal is reached. In this case, we want to repeat the fasta search for every sequence in the Seqs directory.
Also, we need to make a special file that contains all of the sequences that we want to search against. I have already done this; it is called seq.lib
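Here is how to make a loop using the foreach command (the ? is the prompt that csh prints while you are inside the loop; do not type it):
1) csh
2) foreach x (../Seqs/*.seq)
3) ? ../Fasta/fasta -q -n $x seq.lib > ${x:t}.fasta (${x:t} is the file name without the ../Seqs/ path, so the results are written into your current directory)
4) ? end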
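What this does is:
1) set x to the name of the first .seq file in the Seqs directory
2) run fasta with that file as the test sequence, searching against seq.lib, and write the results to a file whose name ends in .fasta
3) go back and repeat the loop, with x set to the next file, until all the files have been run through the fasta program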
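The foreach command exists only in csh and tcsh. If you ever find yourself in sh or bash instead, the same kind of loop is written with for, do and done. The sketch below is a harmless practice run in that syntax: it does not call fasta at all, it only records which files the loop would visit. The directory looptest, the two dummy files and plan.txt are all made up for this example.

```shell
# Practice loop in Bourne shell (sh/bash) syntax; csh would use foreach ... end.
# scratch directory so nothing real is touched
mkdir -p looptest
# two dummy files, not real sequences
echo aaa > looptest/one.seq
echo ccc > looptest/two.seq

# x takes the name of each .seq file in turn
for x in looptest/*.seq
do
    echo "would search with $x"
done > looptest/plan.txt

# one line per file that the loop visited
cat looptest/plan.txt
```

Checking a loop this way before putting the real command inside it is a cheap way to catch a mistyped path or pattern.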
One more point. We have been typing in commands on the command line. Once you are done, your commands are lost. If you make a mistake, you have to start all over. You can write these commands into a text file - which is then called a shell script. Then you can modify and re-use your program whenever you want to.
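As a minimal sketch of what a shell script looks like, the example below saves two commands in a file and then runs the file. The name hello.sh is made up for this example, and the script is written for the Bourne shell (sh) rather than csh. Normally you would type the script into a text editor; here cat is used to create it so that the whole example fits in one place.

```shell
# Everything between the two EOF markers is written into hello.sh.
cat > hello.sh << 'EOF'
#!/bin/sh
# hello.sh: a tiny shell script
echo "files in this directory:"
ls
EOF

# Run the script; the commands inside it are executed in order.
sh hello.sh
```

Because the commands now live in a file, you can fix a mistake by editing that file and running it again, rather than retyping everything.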
Now, we have to figure out how to get the results out of all of these .fasta files. There is a UNIX utility called "grep" that can find a word in a file and display the lines containing that word. We can make another loop that uses grep to scan all of the fasta result files and write a summary of the matches to one new summary file.
foreach x (*.fasta)
? grep '>>' $x >> fasta.log
? end
That should give you a file called fasta.log that contains all of the matches for each sequence. Now you can easily see what matches what.
This approach could be really useful if you were starting with thousands of sequences rather than just 16.
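To see how the summary is built up without needing real search results, here is a small practice version in Bourne shell (sh/bash) syntax. The directory greptest and the two dummy files are made up for this example, and their contents are NOT real fasta output; they only contain a few lines that start with >>. The extra echo line is an optional addition that labels each group of matches with the file it came from.

```shell
# scratch directory with two dummy result files
mkdir -p greptest
printf 'header line\n>>hit one\nother text\n' > greptest/a.fasta
printf '>>hit two\n>>hit three\n' > greptest/b.fasta
# start with no log file, so old results are not mixed in
rm -f greptest/fasta.log

for x in greptest/*.fasta
do
    # label the matches with the name of the file they came from
    echo "== $x" >> greptest/fasta.log
    # append every line containing >> to the log
    grep '>>' "$x" >> greptest/fasta.log
done

cat greptest/fasta.log
```

Note that the log file name ends in .log, not .fasta, so the loop never tries to read its own summary.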