Machine learning discriminates a movement disorder in a zebrafish model of Parkinson's disease

ABSTRACT Animal models of human disease provide an in vivo system that can reveal the molecular mechanisms by which mutations cause pathology and have the potential to provide a valuable tool for drug development. Here, we have developed a zebrafish model of Parkinson's disease (PD) together with a novel method to screen for movement disorders in adult fish, pioneering a more efficient drug-testing route. Mutation of the PARK7 gene (which encodes DJ-1) is known to cause monogenic autosomal recessive PD in humans, and, using CRISPR/Cas9 gene editing, we generated a Dj-1 loss-of-function zebrafish with molecular hallmarks of PD. To establish whether there is a human-relevant parkinsonian phenotype in our model, we adapted proven tools used to diagnose PD in clinics and developed a novel and unbiased computational method to classify movement disorders in adult zebrafish. Using high-resolution video capture and machine learning, we extracted novel features of movement from continuous data streams and used an evolutionary algorithm to classify parkinsonian fish. This method will be widely applicable for assessing zebrafish models of human motor diseases and will provide a valuable asset for the therapeutics pipeline. In addition, interrogation of RNA-seq data indicates metabolic reprogramming of brains in the absence of Dj-1, adding to growing evidence that disruption of bioenergetics is a key feature of neurodegeneration. This article has an associated First Person interview with the first author of the paper.

Genes significantly up-regulated in dj-1 -/- zebrafish brains
Genes that were up-regulated in the brains of dj-1 -/- mutants (n = 3) when compared to their wild-type siblings (n = 3) at 16 wpf. Differentially expressed transcripts from the RNA-Seq analysis with a Q-value of <0.05 were considered significant. tpm = transcripts per million.

Table S2b
Genes significantly down-regulated in dj-1 -/- zebrafish brains
Genes that were down-regulated in the brains of dj-1 -/- mutants (n = 3) when compared to their wild-type siblings (n = 3) at 16 wpf. Differentially expressed transcripts from the RNA-Seq analysis with a Q-value of <0.05 were considered significant. tpm = transcripts per million.

Hallmark gene sets enriched in the dj-1 -/- zebrafish brain
A list of Hallmark gene sets found significantly enriched in the dj-1 -/- zebrafish brain. The thresholds for significance were NES > 1.5, p-value < 0.05 and FDR-adjusted q-value < 0.05. NES = normalized enrichment score.
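The three enrichment thresholds are applied jointly, which can be sketched as a simple filter. This is a minimal illustration, not the authors' analysis code: the record layout, the field names (`nes`, `p_value`, `fdr_q`) and the gene set names and values are all hypothetical placeholders.

```python
def is_significant(gene_set, nes_min=1.5, alpha=0.05):
    """Return True only if all three thresholds from the caption are met."""
    return (
        gene_set["nes"] > nes_min
        and gene_set["p_value"] < alpha
        and gene_set["fdr_q"] < alpha
    )

# Hypothetical enrichment results (placeholder names and values, not real data).
results = [
    {"name": "SET_A", "nes": 2.1, "p_value": 0.001, "fdr_q": 0.010},
    {"name": "SET_B", "nes": 1.7, "p_value": 0.030, "fdr_q": 0.120},  # fails the q-value threshold
    {"name": "SET_C", "nes": 1.2, "p_value": 0.004, "fdr_q": 0.020},  # fails the NES threshold
]

enriched = [g["name"] for g in results if is_significant(g)]
```

Only a gene set that passes every threshold is retained, so a strong enrichment score alone is not sufficient.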

Table S4 Selection and calculations of extracted features of movement
The features of movement being extracted and the previous research, investigating movement phenotypes in zebrafish, that inspired them. The principles used in this work are described in addition to the principles used in previous works.

Table S5 Numbers used to evolve classifiers with features of movement
The number of mutant and age-matched control clips used to evolve classifiers with features of movement for each mutant line. Because accuracy was used as the measure of fitness, each class needed to contain an equal number of clips. Each clip also had to have a value present for every feature calculated (e.g. the low/medium/high speed features).
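The reason accuracy requires balanced classes can be shown with a short sketch. It assumes clips are stored as dictionaries with a `features` mapping and that the larger class is reduced by random subsampling; both are illustrative assumptions, since the caption does not state how the equal class sizes were obtained. The clip identifiers and the feature name `tbf_high` are hypothetical.

```python
import random

def has_all_features(clip):
    """A clip is usable only if every calculated feature has a value."""
    return all(value is not None for value in clip["features"].values())

def balance_classes(mutant_clips, control_clips, seed=0):
    """Drop clips with missing features, then subsample to equal class sizes."""
    rng = random.Random(seed)
    mutant_ok = [c for c in mutant_clips if has_all_features(c)]
    control_ok = [c for c in control_clips if has_all_features(c)]
    n = min(len(mutant_ok), len(control_ok))
    return rng.sample(mutant_ok, n), rng.sample(control_ok, n)

def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

# Hypothetical clips: 30 mutant (2 lacking a feature value) and 90 control.
mutant = [{"id": f"mut_{i}", "features": {"tbf_high": None if i < 2 else 1.0}}
          for i in range(30)]
control = [{"id": f"wt_{i}", "features": {"tbf_high": 1.0}} for i in range(90)]

# A useless classifier that labels every clip as control (0).
raw_labels = [1] * len(mutant) + [0] * len(control)
trivial_acc_imbalanced = accuracy(raw_labels, [0] * len(raw_labels))

bal_mutant, bal_control = balance_classes(mutant, control)
bal_labels = [1] * len(bal_mutant) + [0] * len(bal_control)
trivial_acc_balanced = accuracy(bal_labels, [0] * len(bal_labels))
```

On the imbalanced data the trivial classifier is rewarded with a high accuracy, whereas on the balanced data it scores at chance, so an evolutionary search driven by accuracy cannot exploit class frequencies.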

Table S6 Numbers used to evolve sliding window classifiers
The number of mutant and age-matched control clips used to evolve sliding window classifiers for each mutant line. The sliding window classifiers used the area under the curve (AUC) as a measure of fitness, which is insensitive to class imbalance. This allowed each class to have a different number of clips.
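The insensitivity of the AUC to class imbalance follows from its interpretation as the probability that a randomly chosen mutant clip receives a higher classifier score than a randomly chosen control clip. The sketch below computes the AUC from that pairwise definition; the scores are made-up illustrative values, and the function is a generic implementation rather than the one used in this work.

```python
def auc(scores_pos, scores_neg):
    """Probability that a positive outscores a negative (ties count as 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical classifier outputs for mutant (positive) and control (negative) clips.
mutant_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2, 0.1]

auc_original = auc(mutant_scores, control_scores)

# Making the control class five times larger, with the same score distribution,
# leaves the AUC unchanged.
auc_more_controls = auc(mutant_scores, control_scores * 5)
```

Because the statistic is normalised by the number of positive-negative pairs, changing the relative size of a class without changing its score distribution does not alter the result.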

Analysis of extracted features of movement in dj-1 -/- zebrafish at 8 wpf
The features of movement compared between dj-1 -/- and wild-type (WT) at 8 wpf including (A) distance travelled, (B) velocity, (C) percentage of time spent moving, (D) mean duration of a swimming episode, (E) tail beat frequency at low, medium and high swimming speeds, (F) tail bend amplitude at low, medium and high swimming speeds. * = P < 0.05, ** = P < 0.01, *** = P < 0.001, ns = not significant.

Principal component analysis
The 5 principal components (PCs), generated from linear combinations of the 5 angles along the zebrafish spine, plotted against time. The first PC (PC1) explains the most variation in the data, followed by PC2, with each subsequent PC explaining less of the variation. Together, the PCs explain 100% of the variation within the tail bend angle data. PC1 captured only broad movements, whilst PC2 retained some high-frequency movements.
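The properties described in this caption can be reproduced with a generic PCA on a frames-by-angles matrix. The sketch below uses NumPy's singular value decomposition on synthetic data (a slow body bend plus a faster oscillation that grows toward the tail); the signal frequencies, amplitudes and noise level are assumptions chosen for illustration, not measurements from the fish.

```python
import numpy as np

def pca(angles):
    """PCA of an (n_frames, n_angles) array via SVD of the mean-centred data.

    Returns the PC time series, the component loadings (one row per PC)
    and the fraction of variance explained by each PC.
    """
    centered = angles - angles.mean(axis=0)
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ components.T  # each column is one PC plotted against time
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, components, explained

# Synthetic tail-bend angles at 5 points along the spine (illustrative only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0, 400)
slow = np.sin(2 * np.pi * 1.0 * t)    # broad, low-frequency bend
fast = np.sin(2 * np.pi * 15.0 * t)   # high-frequency oscillation
angles = np.column_stack([
    slow + 0.1 * k * fast + 0.01 * rng.standard_normal(t.size)
    for k in range(1, 6)
])

scores, components, explained = pca(angles)
```

Each PC is a weighted sum of the 5 angles, the explained-variance fractions are returned in decreasing order, and they sum to 1 because 5 components fully describe 5 input variables.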

Scores of the dmd ta222a/+ sliding window classifiers
Training and test scores (AUCs) from the 20 sliding window classifiers evolved using the PC2 time series data from the 20 folds of the data set.