Beginning BioPerl for Biologists
MPI Ploen
Jun Wang
Perl modules
Collections of “Object” definitions and functions
Usage:
– Put the module in a directory on Perl's module search path (@INC)
– Or add its directory with “use lib”
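The second option can be sketched as follows. The directory name ./lib is only a placeholder for wherever your modules are installed; it does not come from the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# './lib' is a hypothetical directory holding locally installed modules.
# "use lib" puts it at the front of @INC at compile time.
use lib './lib';

# @INC is the list of directories Perl searches when it sees "use Some::Module".
print "Module search path:\n";
print "  $_\n" for @INC;
```

Setting the PERL5LIB environment variable to the same directory has a similar effect without editing the script.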
1, BioPerl and Bio::SeqIO
Begin!
– Use “Data::Dumper”
– Open a sequence file with Bio::SeqIO
– Check the data with Dumper
– Write another sequence file using Bio::SeqIO
Bio::Seq object
– Functions related to Bio::Seq
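A minimal sketch of these steps, assuming BioPerl is installed. The file names and the two toy sequences are made up for illustration; the script writes its own small FASTA file so that it can be run as is.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Bio::SeqIO;

# Create a tiny demo FASTA file (hypothetical name and toy sequences).
my $fasta = 'demo_input.fasta';
open my $fh, '>', $fasta or die "Cannot write $fasta: $!";
print {$fh} ">demo1 first toy sequence\nATGGCCATTGTAATGGGCCGC\n",
            ">demo2 second toy sequence\nATGAAAGGGTGCCCGTAG\n";
close $fh;

# Open the FASTA file for reading and a GenBank file for writing.
my $in  = Bio::SeqIO->new(-file => $fasta,            -format => 'fasta');
my $out = Bio::SeqIO->new(-file => '>demo_output.gb', -format => 'genbank');

# Limit the dump depth so the object structure stays readable.
$Data::Dumper::Maxdepth = 2;

my $count = 0;
while (my $seq = $in->next_seq) {
    # Look inside the first Bio::Seq object only.
    print Dumper($seq) if $count == 0;

    # A few commonly used Bio::Seq methods.
    printf "%s\t%s\tlength %d\t%s\n",
        $seq->display_id, $seq->desc, $seq->length, $seq->alphabet;
    print "  revcom:      ", $seq->revcom->seq,    "\n";
    print "  translation: ", $seq->translate->seq, "\n";

    # Write the same record out in GenBank format.
    $out->write_seq($seq);
    $count++;
}
$out->close;
```

Changing the -format arguments is all it takes to convert between the formats Bio::SeqIO supports.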
2, Do something (BLAST)
Use Bio::Tools::Run::RemoteBlast
– Adapt the usage example from the Perl module's documentation
– Get a hit list and print scores, etc.
Get the sequences by sequence accession numbers
– use Bio::DB::GenBank;
– Get sequences
– Save them
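The two steps can be combined roughly as below. This is a sketch adapted from the usage example in the Bio::Tools::Run::RemoteBlast documentation, not a tested pipeline: it needs network access to NCBI, and the file names, the blastn/nt settings, the E-value cutoff and the five-hit limit are all assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;
use Bio::DB::GenBank;

# Hypothetical file names and search settings; adjust to your data.
my $query_file = 'query.fasta';
my $hits_file  = 'blast_hits.gb';
my $max_hits   = 5;

my $factory = Bio::Tools::Run::RemoteBlast->new(
    -prog       => 'blastn',
    -data       => 'nt',
    -expect     => '1e-10',
    -readmethod => 'SearchIO',
);
my $gb  = Bio::DB::GenBank->new;
my $out = Bio::SeqIO->new(-file => ">$hits_file", -format => 'genbank');
my $in  = Bio::SeqIO->new(-file => $query_file,   -format => 'fasta');

while (my $query = $in->next_seq) {
    $factory->submit_blast($query);

    # Poll NCBI until every submitted request has been answered.
    while (my @rids = $factory->each_rid) {
        for my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if (!ref $rc) {
                # Negative code: the request failed; otherwise still running.
                $factory->remove_rid($rid) if $rc < 0;
                sleep 5;
                next;
            }
            $factory->remove_rid($rid);

            my $result = $rc->next_result;
            print "Query: ", $result->query_name, "\n";

            my $n = 0;
            while (my $hit = $result->next_hit) {
                last if ++$n > $max_hits;
                printf "%s\tE-value %s\n", $hit->name, $hit->significance;
                while (my $hsp = $hit->next_hsp) {
                    printf "\tscore %s\tbits %s\n", $hsp->score, $hsp->bits;
                }

                # Fetch the full GenBank record for this hit and save it.
                my $seq = eval { $gb->get_Seq_by_acc($hit->accession) };
                if ($seq) { $out->write_seq($seq) }
                else      { warn "Could not fetch ", $hit->accession, "\n" }
            }
        }
    }
}
```

Saving the hits in GenBank format keeps their annotations, which is useful when looking at sequence features later. For protein searches, Bio::DB::GenPept would take the place of Bio::DB::GenBank.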
2, Do something more (ClustalW)
use Bio::Tools::Run::Alignment::Clustalw
– Read all the sequences you got from BLAST
– Do an alignment and save the alignment
– Make a guide tree
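A sketch of the alignment step, assuming the clustalw program is installed where BioPerl can find it (on the PATH or through the CLUSTALDIR environment variable). The input file name is hypothetical and stands for a file of BLAST hits saved in GenBank format; the output names and the ktuple setting are likewise illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::AlignIO;
use Bio::TreeIO;
use Bio::Tools::Run::Alignment::Clustalw;

# Hypothetical input: the sequences saved from the BLAST step.
my $in = Bio::SeqIO->new(-file => 'blast_hits.gb', -format => 'genbank');
my @seqs;
while (my $seq = $in->next_seq) {
    push @seqs, $seq;
}
die "Need at least two sequences to align\n" if @seqs < 2;

# Parameters are passed as plain name/value pairs, as in the module's docs.
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(
    'ktuple' => 2,
    'quiet'  => 1,
);

# Run the alignment and save it in ClustalW format.
my $aln = $factory->align(\@seqs);
Bio::AlignIO->new(-file => '>blast_hits.aln', -format => 'clustalw')
            ->write_aln($aln);

# Build a tree from the alignment; this sketch assumes tree() returns
# a tree object that Bio::TreeIO can write.
my $tree = $factory->tree($aln);
Bio::TreeIO->new(-file => '>blast_hits.nwk', -format => 'newick')
           ->write_tree($tree);
```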
3, Sequence annotations (features)
Also save a GenBank file for the BLAST results
– Get information for the sequences: Bio::SeqFeatureI
– Also try to make a feature (add annotations to a raw sequence): Bio::SeqFeature::Generic
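Both directions, making a feature and reading features back, can be sketched on a toy sequence. The sequence, its ID, the tag values and the output file name are invented for this example; for a real GenBank record the loop over get_SeqFeatures works the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

# A raw sequence without any annotation (toy data).
my $seq = Bio::Seq->new(
    -id       => 'demo_seq',
    -seq      => 'ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG',
    -alphabet => 'dna',
);

# Make a feature covering bases 1..9 on the forward strand.
my $feat = Bio::SeqFeature::Generic->new(
    -start       => 1,
    -end         => 9,
    -strand      => 1,
    -primary_tag => 'misc_feature',
    -tag         => { note => 'hypothetical example feature' },
);
$seq->add_SeqFeature($feat);

# Read the features back through the Bio::SeqFeatureI interface.
for my $f ($seq->get_SeqFeatures) {
    printf "%s\t%d..%d\tstrand %d\n",
        $f->primary_tag, $f->start, $f->end, $f->strand;
    for my $tag ($f->get_all_tags) {
        print "  $tag = ", join(', ', $f->get_tag_values($tag)), "\n";
    }
}

# Save the annotated sequence as a GenBank file.
Bio::SeqIO->new(-file => '>demo_seq.gb', -format => 'genbank')
          ->write_seq($seq);
```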
7, Primary Bio::Fastq
Get a FASTQ file (e.g. through ftp.ebi.ac.uk/)
– Get standard sequence information
– Understand quality scores and do filtering
– Check the MID and assign groups (maybe on raw data)
Some recent updates in processing NGS data
– Call programs using either a module or system
– Parse the results from each one and connect them to form a pipeline
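To make the quality scores concrete, here is a small sketch in plain Perl (no BioPerl needed). It assumes Phred+33 encoding and strict four-line FASTQ records; the mean-quality cutoff of 20 and the two demo reads are arbitrary choices for illustration. Each quality character encodes a score of ord(character) minus 33.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Decode a quality string into Phred scores (offset 33 assumed by default).
sub phred_scores {
    my ($qual, $offset) = @_;
    $offset //= 33;
    return map { ord($_) - $offset } split //, $qual;
}

# Mean Phred score of one read.
sub mean_quality {
    my @scores = phred_scores(@_);
    return @scores ? sum(@scores) / @scores : 0;
}

# Copy reads whose mean quality is at least $min_mean from $in_fh to $out_fh.
# Assumes exactly four lines per record (no wrapped sequences).
sub filter_fastq {
    my ($in_fh, $out_fh, $min_mean) = @_;
    my ($kept, $total) = (0, 0);
    while (defined(my $head = <$in_fh>)) {
        my $seq  = <$in_fh>;
        my $plus = <$in_fh>;
        my $qual = <$in_fh>;
        last unless defined $qual;    # incomplete record at end of file
        chomp(my $q = $qual);
        $total++;
        next if mean_quality($q) < $min_mean;
        print {$out_fh} $head, $seq, $plus, $qual;
        $kept++;
    }
    return ($kept, $total);
}

# Demo on two in-memory reads: 'I' decodes to 40, '!' decodes to 0.
my $demo = join "\n",
    '@read1', 'ACGT', '+', 'IIII',
    '@read2', 'ACGT', '+', '!!!!', '';
open my $in, '<', \$demo or die "Cannot read demo data: $!";
my $filtered = '';
open my $out, '>', \$filtered or die "Cannot open output buffer: $!";

my ($kept, $total) = filter_fastq($in, $out, 20);
close $out;
print "kept $kept of $total reads\n";    # prints "kept 1 of 2 reads"
```

For real data, replace the in-memory handles with file handles opened on the downloaded FASTQ file and an output file.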