- Designed For
- Graduate students, undergraduates (with permission), internal and external faculty, staff, postdocs, and visiting scholars. Graduate-level standing is a prerequisite for this course.
- Session 1: September 20 – October 9, 2017
Session 2: October 16 – November 3, 2017
Course meets Monday, Wednesday, and Friday at 2:00 PM in the Agriculture and Life Sciences Building (ALS) 3005
- Courses can be taken separately under exceptional circumstances. Please contact the program manager for more information.
- Onsite in Corvallis, OR
- OSU Faculty, Staff or Students: $150
- Six weeks
- 2.0 Units │ 18 hours
This two-part offering covers effective use of the Unix/Linux command-line environment:
Introduction to Unix/Linux
This part introduces the natural environment of bioinformatics: the Linux command line. Material will cover logging into remote machines, filesystem organization and file manipulation, and installing and using software (including examples such as HMMER, BLAST, and MUSCLE). Finally, we introduce the CGRB research infrastructure (including submitting batch jobs) and concepts for data analysis on the command line with tools such as grep and wc.
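As a rough illustration of the kind of command-line analysis with grep and wc mentioned above, the sketch below counts records in a small FASTA file. The file name, the sequences, and the "sample A" labels are made up for this example and are not course material.

```shell
# Create a tiny FASTA file to work with (made-up sequences for illustration).
workdir=$(mktemp -d)
fasta="$workdir/example.fasta"
printf '>seq1 sample A\nATGCGT\n>seq2 sample B\nGGCATT\nTTAG\n>seq3 sample A\nATGAAA\n' > "$fasta"

# Count sequences: every FASTA record starts with a ">" header line.
seq_count=$(grep -c '^>' "$fasta")

# Count the total number of lines in the file.
line_count=$(wc -l < "$fasta")

# Count headers that mention "sample A" by piping grep output into wc.
sample_a_count=$(grep '^>' "$fasta" | grep 'sample A' | wc -l)

echo "sequences: $seq_count"
echo "lines: $line_count"
echo "sample A records: $sample_a_count"
```

Counting header lines rather than all lines matters here because a single sequence may wrap across several lines, as the second record does.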
Command-Line Data Analysis
The Linux command-line environment has long been used for analyzing text-based and scientific data, and many data analysis tools come pre-installed. These can be chained together to form powerful pipelines. Material in this part will cover these and related tools (including grep, sort, awk, and sed), driven by examples of biological data in a problem-solving context that introduces programmatic thinking. This part also covers regular expressions, a useful syntax for matching and substituting string and sequence data.
Individuals who complete both parts will receive a Certificate of Completion and a Digital Badge detailing the course information.
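To give a sense of what chaining tools into a pipeline looks like in the Command-Line Data Analysis part, here is a minimal sketch that ranks queries by their number of hits. The tab-separated table, its column layout, and the gene and subject names are assumptions invented for this example.

```shell
# Build a small tab-separated table of hits (hypothetical data):
# column 1 = query, column 2 = subject, column 3 = percent identity.
workdir=$(mktemp -d)
hits="$workdir/hits.tsv"
printf '# query\tsubject\tidentity\ngeneA\tsubj1\t98.5\ngeneB\tsubj2\t75.0\ngeneA\tsubj3\t91.2\ngeneC\tsubj1\t88.8\ngeneA\tsubj4\t60.1\ngeneB\tsubj5\t99.0\n' > "$hits"

# Pipeline, one small tool per step:
#   grep -v  drops the comment line
#   cut      keeps only the query column
#   sort     groups identical queries together (uniq needs adjacent duplicates)
#   uniq -c  counts each group
#   sort -rn ranks the groups by count, largest first
ranking=$(grep -v '^#' "$hits" | cut -f1 | sort | uniq -c | sort -rn)
echo "$ranking"

# Pull the name of the query with the most hits from the first ranked line.
top_query=$(echo "$ranking" | head -n 1 | awk '{print $2}')
echo "top query: $top_query"
```

Each stage reads the previous stage's output, so a step can be added or removed without rewriting the rest of the pipeline.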
What You'll Learn
- Leave with the ability to navigate and operate a Linux computational infrastructure via the command line.
- Understand the installation, functioning, and use of common bioinformatics analysis software packages on a Linux infrastructure.
- Navigate and use the Unix/Linux file system, including understanding directory structure/permissions, and creating/editing/removing files and directories.
- Locate and download bioinformatics data sets, and install and use bioinformatics utilities such as HMMER, BLAST, and MUSCLE.
- Use `sort` and `uniq` to build filtering pipelines for bioinformatics data.
- Use the utilities `sed` and `awk` along with POSIX-compliant regular expressions (regex) to perform complex pattern matching and extraction on bioinformatics data.
- Submit batch jobs to a computational infrastructure to run (non-interactively) on cluster nodes.
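The `sed` and `awk` outcome in the list above can be illustrated with a short sketch that uses regular expressions to extract and filter FASTA headers. The header layout (`>ID|organism`), the IDs, and the file name are hypothetical and chosen only for this example.

```shell
# Create a small FASTA file with ">ID|organism" headers (made-up data).
workdir=$(mktemp -d)
fasta="$workdir/example.fasta"
printf '>gene_001|Homo sapiens\nATGCGT\n>gene_002|Mus musculus\nGGCATT\n>gene_003|Homo sapiens\nATGAAA\n' > "$fasta"

# sed: on header lines, capture the text between ">" and "|" and print only that.
# -n suppresses default output; the trailing "p" prints lines where the substitution matched.
ids=$(sed -n 's/^>\([^|]*\)|.*$/\1/p' "$fasta")
echo "$ids"

# awk: split each line on "|", then count headers whose ID matches a regex
# and whose organism field is exactly "Homo sapiens".
human_count=$(awk -F'|' '$1 ~ /^>gene_[0-9]+$/ && $2 ~ /^Homo sapiens$/ { n++ } END { print n + 0 }' "$fasta")
echo "Homo sapiens records: $human_count"
```

The `sed` command uses a basic regular expression with a capture group, while the `awk` patterns use extended syntax such as `+`; both styles are covered by the POSIX regex family named in the list.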
Matthew Peterson is a Faculty Research Assistant at the Oregon State University Center for Genome Research and Biocomputing. Matthew develops bioinformatics processing pipelines for the Center's Illumina, PacBio, and Genotyping by Sequencing services, as well as data management strategies. As a CGRB Bioinformatics Trainer, he helps researchers learn the basics of bioinformatics through workshops, user groups, and one-on-one training.