The scary 2D space: bioinformatics for microbiologists

By David Hourigan, PhD Student

 

 

My first introduction to computer coding was at the tender age of 12, when our school class got a brief course in website design through HTML using Notepad, a barebones text editor. You’re probably thinking ‘People use that?!’. Each website was meticulously and patiently crafted on a cathode ray tube monitor displaying no more than thirty lines of text. Not until a decade later would I understand the true benefit of that initial encounter with the ‘coding kind’: because of it, I was never daunted by the thought of programming. Instead, I looked at it with intrigue (it truly reminded me of The Matrix movies or video games, but let’s not dwell on that). I count myself lucky to have had that opportunity, and it benefitted me when I returned to learning bioinformatics during the ‘COVID-stage’ of my PhD. But now, deep into the genomic era, how do we encourage every biologist, microbiologist, and scientist to learn?

There is no doubt that the future of biology will increasingly blend computing with experimental work, including computer-guided experimental design. Some of this can be seen today: most publications in the field of microbiology use bioinformatics to back up their data. Similarly, it’s become fashionable for early-career microbiologists to have both laboratory and computing expertise. But what does this mean for wet-lab science? Why are some bioinformatic terms so scary, and what can be done to encourage others to jump aboard the informatics train? Deep neural networks, machine learning and other complex algorithms allow informatically adept researchers to probe the troves of genomic data for signals within the microbiome. These tools can be used to predict disease relapse, identify biomarkers, annotate the functions of genes and proteins, and predict genomes of bacteria, phages or plasmids in metagenomic datasets, to name but a few functions. So why does learning to use these tools, and everything else that falls under the umbrella terms bioinformatics and computational biology, seem so complex and scary?

 

The true essence of computing is open source, with an ‘anyone can do it’ philosophy. After all, you only need a computer and an internet connection. So why do scientists see the command line interface as a ‘scary 2D place’ (I’m quoting my colleague on that one)? Intrigued, I asked my colleagues for deeper insight and found that a preference for kinaesthetic and visual learning styles led lab members to prefer the wet lab: it is visual in nature, and you can see and ‘feel’ the results (physically, not emotionally). This leads me to think that the gap in trust between the computer and the wet lab is a mere artefact of a lack of exposure, of knowledge of the tools available, and of self-confidence in one’s own computing ability. I truly believe that not all microbiologists need to be bioinformatics experts, but they should at least attain a particular set of skills that will allow them to navigate the hybrid space of the future. Setting aside the bespoke tools for annotating genomes and deploying complex algorithms, what other tools exist to give potential users confidence and trust in their own abilities? To begin, a lot of resources on the web use interactive quizzes and games to keep the user interested. On the other hand, it can be challenging to build a chess game in Python when it feels like you’re wasting time on something that won’t benefit your work. Building chess games, counting fruit in a supermarket’s stock, and grading school children’s exams were only a fraction of the tasks I was challenged with when beginning to learn. These tasks often took days to complete, and the longer courses can take months. But the fundamentals of data handling, manoeuvring around data frames, and calculating results are core skills I now use on a daily basis. It’s just about getting over the hump. So how can we encourage our peers to challenge themselves and get over that hump?

 Here are some of my tips to help, encourage and hopefully inspire some of you to conquer the internal fear of the dreaded 2D space.

1.    Pick a language and stick with it. This might seem obvious to some, but it wasn’t for me. I wanted to learn whilst simultaneously completing work, so that it felt like I was getting data for publication and thesis purposes. This led me to learn some R and some Python at the same time, because I geared my learning toward the two tools I wanted to use. This didn’t help! (Obviously. I know, right?) Eventually, I crumbled with self-inflicted frustration. I then restarted from scratch solely with R, building confidence step by step, day by day. What inspired me to choose R was the tools available in Bioconductor (essentially a filing cabinet of R tools to be used by you and me for scientific data purposes, each with a manual explaining how to use it). If the tools are already available, why wouldn’t we use them? I also suggest using an integrated development environment (IDE) that you feel comfortable with. What is an IDE, I hear you say? It’s essentially software with a user interface that lets us interact with the computer in a more familiar “point and click” way. You may have heard of some of them, such as RStudio, Visual Studio, and Jupyter Notebook; each comes with its own perks. I suggest this because if you plan on looking at a screen on a daily basis to code, you may as well do it in a comfortable setting. Visual Studio and RStudio also have interactive auto-complete functions. What does this mean? It means that if you’re lazy like I am, the software will predict the code you need to type, much like a text message where you partially type a word and just tap the correct suggestion. A cheat code, if you will. IDEs are capable of highlighting your errors too! So use them. They are your friend.

 

2.    Use resources. Ask questions. The internet is an endless bucket of information. Period. Udemy, Codecademy, Coursera, R Programming by Johns Hopkins University, books such as R for Data Science: the list of courses and resources is endless. All of the resources mentioned cover introductory and intermediate levels of computer programming - the point here isn’t to go and learn everything. The point is to find a course that is tailored to your skill level, your needs, your learning style and your language. If you are using a Linux server or an R package, the majority of tools have built-in manuals. For example, the simple Linux command ‘ls’, which lists the files and folders in your current location, has a manual which can be read using the command ‘man ls’. If you come to a point where the manuals for tools and packages aren’t sufficient, there are numerous forums where you can ask your questions, such as stackoverflow.com and biostars.org. Ask questions and don’t be afraid to be wrong. If you’re lucky enough to be surrounded by bioinformaticians in your lab, or up the corridor, approach them and ask them questions. Bribe them if necessary! Everyone began learning at the beginning - with nothing. They can point you to resources they know are useful for whatever specific purpose you need. When I say this, I don’t mean ask them for answers; I mean ask for guidance on good resources, on how to design your in silico experiment, or on how they would approach the problem you wish to solve. They have been in the space, so they know how to tackle projects appropriately and include adequate controls.

3.    Take your time. “Rome wasn’t built in a day”. When we were children, nobody expected us to walk, to read and write our native language, or to play sports without first learning how. The key emphasis here is that learning takes time. I feel that as we become adults, we put ourselves under pressure to get jobs completed ASAP. But as kids, we would try and fail, and try and fail again, undeterred each time! Learning to code is no different. Once you have settled on a language and begun to practise, you will inevitably hit hurdles along your journey. You will forget commas, brackets, letters, and full stops, and each mistake can break the code you’re writing. Don’t give up. Fail and fail again! Over time, looking for errors in syntax (the structure and punctuation of the programming language) will become second nature. Challenge yourself, but if you need a mental break, take one. A fresh and willing mind helps you learn. Each day you complete something, make notes and keep them, as they can be copied and pasted to be used again and again!

4.    Make it fun. The final piece of the puzzle, I found, was to incorporate a side project into the learning experience that takes you away from the biological work setting. Sometimes we have enough going wrong in the lab that we don’t want to spend further time on the computer learning through trial and error. As I mentioned previously, some courses incorporate games, both building them and playing them. If that isn’t for you, perhaps you would prefer to look at football data, are curious about a squirrel census, or want to plot UFO sightings or even your favourite TV characters! Find a second interest that separates some of your learning from the work you’re doing. It will benefit you, I promise. Doing these seemingly trivial tasks builds skills which transfer to biological data, and in no time you’ll be in a position to confidently annotate that genome.
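To make tips 2 and 3 a little more concrete, here is a minimal sketch. It is written in Python purely for illustration (R has a similar help() function), and the names broken_line, fixed_line and check_syntax are made up for this example. It shows two things: reading the built-in manual for a function, much like ‘man ls’ on Linux, and how a single forgotten bracket is reported as a syntax error that the computer can point you to.

```python
# Tip 2: built-in manuals. Typing help(len) in an interactive session shows
# the manual for the built-in function len; the short description it is
# based on is stored in len.__doc__.
manual_text = len.__doc__
print(manual_text)

# Tip 3: forgotten brackets. Both lines are made-up examples; the first one
# is missing its closing square bracket.
broken_line = "total = sum([3, 5, 2)"
fixed_line = "total = sum([3, 5, 2])"


def check_syntax(source):
    """Hypothetical helper: return None if the code is valid, else the error text."""
    try:
        # compile() checks the code for syntax errors without running it.
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


for line in (broken_line, fixed_line):
    problem = check_syntax(line)
    print(line, "->", problem if problem else "no syntax errors")
```

The exact wording of the error message depends on your Python version, but it always names the line where the problem was found, which is usually the best place to start looking.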

If the ‘scary 2D’ space is something that resonates with you, I hope some of these suggestions will encourage you to fail with me in learning! It’s not about getting it perfectly right the first time; it’s a process of learning, building confidence and taking steps in the right direction. So, I’ll leave you with a quote from the great philosopher Homer* - “Weaseling out of things is important to learn. It’s what separates us from the animals… except the weasel.”

*Homer Simpson

 

David Hourigan, Microbiology PhD Student at APC Microbiome Ireland

 


 
 