Bioinformatics 生物信息学理论和实践唐继军 13928761660.

Bioinformatics 生物信息学理论和实践唐继军 jtang@cse.sc.edu 13928761660

#!/usr/bin/perl $DNA = 'ACGT'; # Next, we print the DNA onto the screen print $DNA, "\n"; print '$DNA\n'; print "$DNA\n"; exit;

Do the Math (your 2nd Perl program) #!/usr/bin/perl print " 4+5\n " ; print 4+5, " \n " ; print " 4+5= ", 4+5, " \n " ; [Note: use commas to separate multiple items in a print statement, whitespace is ignored]

Strings (text) in variables can be used for some math-like operations Concatenate (join) use the dot. operator $seq1= " ACTG " ; $seq2= " GGCTA " ; $seq3= $seq1. $seq2; print $seq3; ACTGGGCTA String Operations

#!/usr/bin/perl –w $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC'; print "Here is the starting DNA:\n\n"; print "$DNA\n\n"; # Transcribe the DNA to RNA by substituting all T's with U's. $RNA = $DNA; $RNA =~ s/T/U/g; # Print the RNA onto the screen print "Here is the result of transcribing the DNA to RNA:\n\n"; print "$RNA\n"; # Exit the program. exit;

#!/usr/bin/perl -w $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC'; print "$DNA\n\n"; $revcom = reverse $DNA; # See the text for a discussion of tr/// $revcom =~ tr/ACGTacgt/TGCAtgca/; # Print the reverse complement DNA onto the screen print "Here is the reverse complement DNA:\n\n"; print "$revcom\n"; exit;

#!/usr/bin/perl -w $proteinfilename = 'NM_021964fragment.pep'; open(PROTEINFILE, $proteinfilename); # First line $protein = ; print “\nHere is the first line of the protein file:\n\n”; print $protein; # Second line $protein = ; print “\nHere is the second line of the protein file:\n\n”; print $protein; # Third line $protein = ; print “\nHere is the third line of the protein file:\n\n”; print $protein; close PROTEINFILE; exit;

#!/usr/bin/perl -w # The filename of the file containing the protein sequence data $proteinfilename = 'NM_021964fragment.pep'; # First we have to "open" the file open(PROTEINFILE, $proteinfilename); # Read the protein sequence data from the file, and store it # into the array variable @protein @protein = ; # Print the protein onto the screen print @protein; # Close the file. close PROTEINFILE; exit;

#!/usr/bin/perl -w # "scalar context" and "list context" @bases = ('A', 'C', 'G', 'T'); print "@bases\n"; $a = @bases; print $a, "\n"; ($a) = @bases; print $a, "\n"; exit;

#!/usr/bin/perl -w # array indexing @bases = ('A', 'C', 'G', 'T'); print "@bases\n"; print $bases[0], "\n"; print $bases[1], "\n"; print $bases[2], "\n"; print $bases[3], "\n"; exit;

String functions Chomp Length of a string Substring

#!/usr/bin/perl -w $proteinfilename = 'NM_021964fragment.pep'; unless ( open(PROTEINFILE, $proteinfilename) ) { print "Could not open file $proteinfilename!\n"; exit; } while( $protein = ) { print " #####Here is the next line of the file:\n"; print $protein; } # Close the file. close PROTEINFILE; exit;

Comparison String comparison (are they the same, > or <) eq (equal ) ne (not equal ) ge (greater or equal ) gt (greater than ) lt (less than ) le (less or equal )

Conditions if () {} elsif() {} else {}

#!/usr/bin/perl –w $word = 'MNIDDKL'; if($word eq 'QSTVSGE') { print "QSTVSGE\n"; } elsif($word eq 'MRQQDMISHDEL') { print "MRQQDMISHDEL\n"; } elsif ( $word eq 'MNIDDKL' ) { print "MNIDDKL-the magic word!\n"; } else { print "Is \”$word\“ a peptide?\n"; } exit;

More Conditions

$x = 10; $y = -20; if ($x <= 10) { print "1st true\n";} if ($x > 10) {print "2nd true\n";} if ($x -21) {print "3rd true\n";} if ($x > 5 && $y < 0) {print "4th true\n";} if (($x > 5 && $y 5) {print "5th true\n";}

But Use ==,, >=, !=, ||, && for numeric numbers Use eq, lt, le, gt, ge, ne, or, and for string comparisons

$x = 10; $y = -20; if ($x le 10) { print "1st true\n";} if ($x gt 5) {print "2nd true\n";} if ($x le 10 || $y gt -21) {print "3rd true\n";} if ($x gt 5 && $y lt 0) {print "4th true\n";} if (($x gt 5 && $y lt 0) || $y gt 5) {print "5th true\n";}

#!/usr/bin/perl -w $num = 1234; $str = '1234'; print $num, " ", $str, "\n"; $num_or_str = $num + $str; print $num_or_str, "\n"; $num_or_str = $num. $str; print $num_or_str, "\n"; exit;

More Arithmatics +, -, *, **, /, % +=, -=, *=, **=, /=, %= ++, --

$x = 10; $x = $x*1.5; print $x*=3, "\n"; print $x++, "\n"; print $x, "\n"; print ++$x, "\n"; print $x, "\n"; print $x % 3, "\n"; print $x**2, "\n";

Motif finding Read a sequence from a fasta file Ask the user to input a motif Check if the sequence has the motif

#!/usr/bin/perl –w print "Please type the filename: "; $fname = ; chomp $fname; open(PROTEINFILE, $fname); $name = ; @protein = ; close PROTEINFILE; $protein = join( '', @protein); $protein =~ s/\s//g; print "Enter a motif to search for: "; $motif = ; chomp $motif; if ( $protein =~ /$motif/ ) { print "I found it!\n\n"; } else { print "I couldn\‘t find it.\n\n"; }

#!/usr/bin/perl -w print "Please type the filename of the DNA sequence data: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ; @DNA = ; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_CG = 0; $position = 0; while ( $position < length $DNA) { $base = substr($DNA, $position, 1); if ( $base eq 'C' or $base eq 'G') { ++$count_of_CG; } $position++; } print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

#!/usr/bin/perl –w print "Please type the filename of the DNA sequence data: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ; @DNA = ; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_CG = 0; for ( $position = 0 ; $position < length $DNA ; ++$position ) { $base = substr($DNA, $position, 1); if ( $base eq 'C' or $base eq 'G') { ++$count_of_CG; } print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

#!/usr/bin/perl –w print "Please type the filename of the DNA sequence data: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ; @DNA = ; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_CG = 0; while($DNA =~ /c/ig) {$count_of_CG++;} while($DNA =~ /g/ig) {$count_of_CG++;} print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

#!/usr/bin/perl –w print "Please type the filename of the DNA sequence data: "; $dna_filename = ; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = ; @DNA = ; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_CG = 0; while($DNA =~ /c/i) {$count_of_CG++;} while($DNA =~ /g/ig) {$count_of_CG++;} print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

Exercise 1 Ask for a protein file in fasta format Ask for an amino acid Count the frequency of that amino acid TKFHSNAHFYDCWRMLQYQLDMRCMRAISTF SPHCGMEHMPDQTHNQGEMCKPRMWQVS MNQSCNHTPPFRKTYVEWDYMAKALIAPYTL GWLASTCFIW

Exercise 2 Ask for a DNA file in fasta format Ask for a codon Count the frequency of that codon TCGTACTTAGAAATGAGGGTCCGCTTTTGCCC ACGCACCTGATCGCTCCTCGTTTGCTTTTAAG AACCGGACGAACCACAGAGCATAAGGAGAA CCTCTAGCTGCTTTACAAAGTACTGGTTCCCT TTCCAGCGGGATGCTTTATCTAAACGCAATGA GAGAGGTATTCCTCAGGCCACATCGCTTCCTA GTTCCGCTGGGATCCATCGTTGGCGGCCGAA GCCGCCATTCCATAGTGAGTTCTTCGTCTGTG TCATTCTGTGCCAGATCGTCTGGCAAATAGCC GATCCAGTTTATCTCTCGAAACTATAGTCGTA CAGATCGAAATCTTAAGTCAAATCACGCGACT AGACTCAGCTCTATTTTAGTGGTCATGGGTTT TGGTCCCCCCGAGCGGTGCAACCGATTAGGA CCATGTAGAACATTAGTTATAAGTCTTCTTTTA AACACAATCTTCCTGCTCAGTGGTACATGGTT ATCGTTATTGCTAGCCAGCCTGATAAGTAACA CCACCACTGCGACCCTAATGCGCCCTTTCCAC GAACACAGGGCTGTCCGATCCTATATTACGA CTCCGGGAAGGGGTTCGCAAGTCGCACCCTA AACGATGTTGAAGGCTCAGGATGTACACGCA CTAGTACAATACATACGTGTTCCGGCTCTTAT CCTGCATCGGAAGCTCAATCATGCATCGCACC AGCGTGTTCGTGTCATCTAGGAGGGGCGCGT AGGATAAATAATTCAATTAAGATATCGTTATG CTAGTATACGCCTACCCGTCACCGGCCAACAG TGTGCAGATGGCGCCACGAGTTACTGGCCCT GATTTCTCCGCTTCTAATACCGCACACTGGGC AATACGAGCTCAAGCCAGTCTCGCAGTAACG CTCATCAGCTAACGAAAGAGTTAGAGGCTCG CTAAATCGCACTGTCGGGGTCCCTTGGGTATT TTACACTAGCGTCAGGTAGGCTAGCATGTGT CTTTCCTTCCAGGGGTATG

Bioinformatics 生物信息学理论和实践唐继军 13928761660.

Similar presentations

Presentation on theme: "Bioinformatics 生物信息学理论和实践唐继军 13928761660."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Bioinformatics 生物信息学理论和实践 唐继军 13928761660.

Similar presentations

Presentation on theme: "Bioinformatics 生物信息学理论和实践 唐继军 13928761660."— Presentation transcript:

Similar presentations

About project

Feedback

Bioinformatics 生物信息学理论和实践唐继军 13928761660.

Presentation on theme: "Bioinformatics 生物信息学理论和实践唐继军 13928761660."— Presentation transcript: