Biomedical Data Preprocessing

Although most elements of data science have long been present in the informatics disciplines, a particular set of skills has become especially relevant to current applications. Today's challenges include very large data sets that must be managed carefully because, for example, they do not fit in the working memory of a typical computer. Such data sets also need large-scale annotation and metadata that explain how the data were generated and what the sources of noise are. Interest in applying machine learning to very large data sets is growing, and the high-profile success of deep learning systems based on neural networks has created an active market for people who know how to build and deploy them.

Data science can therefore be viewed as a specific skill set, a subset of informatics, that addresses these pressing needs. This workforce demand has led to the creation of domain-independent data science training programs, in which trainees learn the key skills that are currently in strong demand. These trainees become experts at managing data but may have no specific knowledge of the application area; they depend on collaborators to supply the domain knowledge that ensures the questions they ask and answer are relevant and well formed.
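To make the memory constraint mentioned above concrete, the following is a minimal sketch of one common way to summarize a file that is too large to load at once: read it row by row and keep only running statistics. The function name `streaming_mean_std`, the column name `expression`, and the tiny inline sample are all hypothetical, chosen for illustration; they do not come from any particular data set or pipeline.

```python
import csv
import io
import math


def streaming_mean_std(rows, column):
    """Return (count, mean, sample std) of one column in a single pass.

    Uses Welford's online algorithm, so only a few numbers are held in
    memory no matter how many rows the source contains.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for row in rows:
        value = row.get(column)
        if value is None or value.strip() == "":
            continue  # skip missing measurements
        x = float(value)
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    std = math.sqrt(m2 / (count - 1)) if count > 1 else float("nan")
    return count, mean, std


# Hypothetical stand-in for a large file on disk; with a real file one
# would pass csv.DictReader(open(path, newline="")) instead.
sample = io.StringIO("sample_id,expression\ns1,2.0\ns2,4.0\ns3,\ns4,6.0\n")
count, mean, std = streaming_mean_std(csv.DictReader(sample), "expression")
print(count, mean, std)
```

Because `csv.DictReader` yields one row at a time, memory use stays roughly constant as the file grows. The same pattern extends to other summaries that can be updated incrementally, though statistics such as exact medians generally need a different approach.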


Last Updated on: Nov 26, 2024
