-
Tychsen Mortensen posted an update 4 months, 1 week ago
In eukaryotes, 5′-3′ co-translation degradation machinery follows the last translating ribosome providing an in vivo footprint of its position. Thus, 5′ monophosphorylated (5’P) degradome sequencing, in addition to informing about RNA decay, also provides information regarding ribosome dynamics. Multiple experimental methods have been developed to investigate the mRNA degradome; however, computational tools for their reproducible analysis are lacking. Here, we present fivepseq an easy-to-use application for analysis and interactive visualization of 5’P degradome data. This tool performs both metagene- and gene-specific analysis, and enables easy investigation of codon-specific ribosome pauses. To demonstrate its ability to provide new biological information, we investigate gene-specific ribosome pauses in Saccharomyces cerevisiae after eIF5A depletion. In addition to identifying pauses at expected codon motifs, we identify multiple genes with strain-specific degradation frameshifts. To show its wide applicability, we investigate 5’P degradome from Arabidopsis thaliana and discover both motif-specific ribosome protection associated with particular developmental stages and generally increased ribosome protection at termination level associated with age. Our work shows how the use of improved analysis tools for the study of 5’P degradome can significantly increase the biological information that can be derived from such datasets and facilitate its reproducible analysis.Fungal secondary metabolites (SMs) are an important source of numerous bioactive compounds largely applied in the pharmaceutical industry, as in the production of antibiotics and anticancer medications. The discovery of novel fungal SMs can potentially benefit human health. Identifying biosynthetic gene clusters (BGCs) involved in the biosynthesis of SMs can be a costly and complex task, especially due to the genomic diversity of fungal BGCs. Previous studies on fungal BGC discovery present limited scope and can restrict the discovery of new BGCs. In this work, we introduce TOUCAN, a supervised learning framework for fungal BGC discovery. Unlike previous methods, TOUCAN is capable of predicting BGCs on amino acid sequences, facilitating its use on newly sequenced and not yet curated data. It relies on three main pillars rigorous selection of datasets by BGC experts; combination of functional, evolutionary and compositional features coupled with outperforming classifiers; and robust post-processing methods. TOUCAN best-performing model yields 0.982 F-measure on BGC regions in the Aspergillus niger genome. Overall results show that TOUCAN outperforms previous approaches. check details TOUCAN focuses on fungal BGCs but can be easily adapted to expand its scope to process other species or include new features.Pancreatic islet β-cell failure is key to the onset and progression of type 2 diabetes (T2D). The advent of single-cell RNA sequencing (scRNA-seq) has opened the possibility to determine transcriptional signatures specifically relevant for T2D at the β-cell level. Yet, applications of this technique have been underwhelming, as three independent studies failed to show shared differentially expressed genes in T2D β-cells. We performed an integrative analysis of the available datasets from these studies to overcome confounding sources of variability and better highlight common T2D β-cell transcriptomic signatures. After removing low-quality transcriptomes, we retained 3046 single cells expressing 27 931 genes. Cells were integrated to attenuate dataset-specific biases, and clustered into cell type groups. In T2D β-cells (n = 801), we found 210 upregulated and 16 downregulated genes, identifying key pathways for T2D pathogenesis, including defective insulin secretion, SREBP signaling and oxidative stress. We also compared these results with previous data of human T2D β-cells from laser capture microdissection and diabetic rat islets, revealing shared β-cell genes. Overall, the present study encourages the pursuit of single β-cell RNA-seq analysis, preventing presently identified sources of variability, to identify transcriptomic changes associated with human T2D and underscores specific traits of dysfunctional β-cells across different models and techniques.DNA methylation is a stable epigenetic modification, extremely polymorphic and driven by stochastic and deterministic events. Most of the current techniques used to analyse methylated sequences identify methylated cytosines (mCpGs) at a single-nucleotide level and compute the average methylation of CpGs in the population of molecules. Stable epialleles, i.e. CpG strings with the same DNA sequence containing a discrete linear succession of phased methylated/non-methylated CpGs in the same DNA molecule, cannot be identified due to the heterogeneity of the 5′-3′ ends of the molecules. Moreover, these are diluted by random unstable methylated CpGs and escape detection. We present here MethCoresProfiler, an R-based tool that provides a simple method to extract and identify combinations of methylated phased CpGs shared by all components of epiallele families in complex DNA populations. The methylated cores are stable over time, evolve by acquiring or losing new methyl sites and, ultimately, display high information content and low stochasticity. We have validated this method by identifying and tracing rare epialleles and their families in synthetic or in vivo complex cell populations derived from mouse brain areas and cells during postnatal differentiation. MethCoresProfiler is written in R language. The software is freely available at https//github.com/84AP/MethCoresProfiler/.Influenza A viruses (IAVs) use diverse mechanisms to interfere with cellular gene expression. Although many RNA-seq studies have documented IAV-induced changes in host mRNA abundance, few were designed to allow an accurate quantification of changes in host mRNA splicing. Here, we show that IAV infection of human lung cells induces widespread alterations of cellular splicing, with an overall increase in exon inclusion and decrease in intron retention. Over half of the mRNAs that show differential splicing undergo no significant changes in abundance or in their 3′ end termination site, suggesting that IAVs can specifically manipulate cellular splicing. Among a randomly selected subset of 21 IAV-sensitive alternative splicing events, most are specific to IAV infection as they are not observed upon infection with VSV, induction of interferon expression or induction of an osmotic stress. Finally, the analysis of splicing changes in RED-depleted cells reveals a limited but significant overlap with the splicing changes in IAV-infected cells. This observation suggests that hijacking of RED by IAVs to promote splicing of the abundant viral NS1 mRNAs could partially divert RED from its target mRNAs. All our RNA-seq datasets and analyses are made accessible for browsing through a user-friendly Shiny interface (http//virhostnet.prabi.fr3838/shinyapps/flu-splicing or https//github.com/cbenoitp/flu-splicing).Measurements in sequencing studies are mostly based on counts. There is a lack of theoretical developments for the analysis and modelling of this type of data. Some thoughts in this direction are presented, which might serve as a seed. The main issues addressed are the compositional character of multinomial probabilities and the corresponding representation in orthogonal (isometric) coordinates, and modelling distributions for sequencing data taking into account possible effects of amplification techniques.RNA-seq studies are growing in size and popularity. We provide evidence that the most commonly used methods for differential expression analysis (DEA) may yield too many false positive results in some situations. We present dearseq, a new method for DEA that controls the false discovery rate (FDR) without making any assumption about the true distribution of RNA-seq data. We show that dearseq controls the FDR while maintaining strong statistical power compared to the most popular methods. We demonstrate this behavior with mathematical proofs, simulations and a real data set from a study of tuberculosis, where our method produces fewer apparent false positives.Recently we presented a frequentist dynamic programming (DP) approach for multiple sequence alignment based on the explicit model of indel evolution Poisson Indel Process (PIP). This phylogeny-aware approach produces evolutionary meaningful gap patterns and is robust to the ‘over-alignment’ bias. Despite linear time complexity for the computation of marginal likelihoods, the overall method’s complexity is cubic in sequence length. Inspired by the popular aligner MAFFT, we propose a new technique to accelerate the evolutionary indel based alignment. Amino acid sequences are converted to sequences representing their physicochemical properties, and homologous blocks are identified by multi-scale short-time Fourier transform. Three three-dimensional DP matrices are then created under PIP, with homologous blocks defining sparse structures where most cells are excluded from the calculations. The homologous blocks are connected through intermediate ‘linking blocks’. The homologous and linking blocks are aligned under PIP as independent DP sub-matrices and their tracebacks merged to yield the final alignment. The new algorithm can largely profit from parallel computing, yielding a theoretical speed-up estimated to be proportional to the cubic power of the number of sub-blocks in the DP matrices. We compare the new method to the original PIP approach and demonstrate it on real data.The advent of single-cell open-chromatin profiling technology has facilitated the analysis of heterogeneity of activity of regulatory regions at single-cell resolution. However, stochasticity and availability of low amount of relevant DNA, cause high drop-out rate and noise in single-cell open-chromatin profiles. We introduce here a robust method called as forest of imputation trees (FITs) to recover original signals from highly sparse and noisy single-cell open-chromatin profiles. FITs makes multiple imputation trees to avoid bias during the restoration of read-count matrices. It resolves the challenging issue of recovering open chromatin signals without blurring out information at genomic sites with cell-type-specific activity. Besides visualization and classification, FITs-based imputation also improved accuracy in the detection of enhancers, calculating pathway enrichment score and prediction of chromatin-interactions. FITs is generalized for wider applicability, especially for highly sparse read-count matrices. The superiority of FITs in recovering signals of minority cells also makes it highly useful for single-cell open-chromatin profile from in vivo samples. The software is freely available at https//reggenlab.github.io/FITs/.RNA function crucially depends on its structure. Thermodynamic models currently used for secondary structure prediction rely on computing the partition function of folding ensembles, and can thus estimate minimum free-energy structures and ensemble populations. These models sometimes fail in identifying native structures unless complemented by auxiliary experimental data. Here, we build a set of models that combine thermodynamic parameters, chemical probing data (DMS and SHAPE) and co-evolutionary data (direct coupling analysis) through a network that outputs perturbations to the ensemble free energy. Perturbations are trained to increase the ensemble populations of a representative set of known native RNA structures. In the chemical probing nodes of the network, a convolutional window combines neighboring reactivities, enlightening their structural information content and the contribution of local conformational ensembles. Regularization is used to limit overfitting and improve transferability. The most transferable model is selected through a cross-validation strategy that estimates the performance of models on systems on which they are not trained.