Evaluation of a new neuronal induction protocol using Single-Cell RNA-Sequencing and machine learning

dc.contributor.advisor: Costa, Marcos Romualdo
dc.contributor.advisorLattes: http://lattes.cnpq.br/6118493598074445
dc.contributor.author: Carvalho, Lukas Iohan da Cruz
dc.contributor.authorLattes: http://lattes.cnpq.br/2797997375475881
dc.contributor.referees1: Hedin-Pereira, Cecília
dc.contributor.referees2: Lourenço, Mychael Vinícius da Costa
dc.contributor.referees3: Dalmolin, Rodrigo Juliani Siqueira
dc.contributor.referees3ID: https://orcid.org/0000-0002-1688-6155
dc.contributor.referees3Lattes: http://lattes.cnpq.br/4065178015615979
dc.contributor.referees4: Velho, Tarciso André Ferreira
dc.date.accessioned: 2024-05-17T13:24:27Z
dc.date.available: 2024-05-17T13:24:27Z
dc.date.issued: 2024-02-26
dc.description.resumo: Cell type identification is a critical step in the computational analysis of scRNA-Seq experiments, involving the unsupervised grouping of cells based on gene expression profiles. Traditional methods that rely on canonical gene markers have limitations, such as sensitivity to variation and the absence of characteristic genes for certain cell types. To address these challenges, we propose a novel approach combining machine learning algorithms with feature selection. Our methodology begins with selecting a training dataset suitable for building a model that generalizes to new data. We chose a comprehensive dataset encompassing the central and peripheral nervous systems of mice at different developmental stages. Feature selection was then applied using the DUBStepR algorithm, which considers gene-gene correlations to identify optimal features for cell classification. The resulting dataset, composed of 28,795 cells and 16,960 genes, was used to train and evaluate models based on the k-Nearest Neighbors (kNN), Decision Tree (DT), Naive Bayes (NB), Support Vector Machine (SVM) and Multilayer Perceptron (MLP) algorithms. All models except NB achieved F1-scores exceeding 90%. Testing on a human brain scRNA-Seq dataset confirmed the robustness of the algorithms, with area under the curve (AUC) values indicating accurate cell classification. SVM and MLP were selected for further analysis because of their lower false positive and false negative rates. Comparisons with existing tools such as scAnnotatR and ACTINN highlight the versatility of our approach, particularly when dealing with diverse cell types. Next, we applied the SVM and MLP models to classify human-induced neurons (hiNs) generated in vitro using distinct protocols, achieving consistent results in identifying glutamatergic and GABAergic neurons.
We also attempted to classify hiNs according to cells of different brain regions, which revealed challenges in classifying GABAergic neurons by region, possibly due to a limited number of optimal features. Gene expression analysis and Gene Set Enrichment Analysis (GSEA) helped identify gene sets associated with the electrophysiological maturation of glutamatergic hiNs generated through an alternative protocol using ASCL1, compared with other protocols. Regulatory network analysis identified master transcription factors with higher activity specifically in this protocol. In conclusion, our integrated approach of feature selection and machine learning algorithms offers an alternative way of identifying cell groups based on gene expression profiles, refining single-cell analysis in the context of differential gene expression, GSEA, and regulatory gene networks.
dc.description.sponsorship: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
dc.identifier.citation: CARVALHO, Lukas Iohan da Cruz. Evaluation of a new neuronal induction protocol using Single-Cell RNA-Sequencing and machine learning. Advisor: Dr. Marcos Romualdo Costa. 2024. 128 leaves. Thesis (Doctorate in Bioinformatics) - Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2024.
dc.identifier.uri: https://repositorio.ufrn.br/handle/123456789/58360
dc.language: pt_BR
dc.publisher: Universidade Federal do Rio Grande do Norte
dc.publisher.country: Brazil
dc.publisher.initials: UFRN
dc.publisher.program: PROGRAMA DE PÓS-GRADUAÇÃO EM BIOINFORMÁTICA
dc.rights: Open Access
dc.subject: Single-Cell RNA-Seq
dc.subject: Machine learning
dc.subject: iPSC-derived neurons
dc.subject.cnpq: CNPQ::CIENCIAS BIOLOGICAS
dc.title: Evaluation of a new neuronal induction protocol using Single-Cell RNA-Sequencing and machine learning
dc.type: doctoralThesis
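
As an illustration of the evaluation described in the abstract (a classifier trained on labeled cells and scored with the F1-score), the following is a minimal sketch only and not the thesis pipeline. It uses the Python standard library, a hand-written k-nearest-neighbors vote, and a macro-averaged F1. The two-gene expression vectors, the class labels assigned to them, the choice of k = 3, and the function names `knn_predict` and `macro_f1` are all assumptions made for this example; the thesis itself reports using 16,960 genes and five algorithms.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query cell by majority vote among its k nearest training cells."""
    # Euclidean distance from the query to every training cell, nearest first.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Average the per-class F1-scores, weighting every class equally."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy expression values for two hypothetical marker genes (not thesis data).
train_x = [(5.0, 0.2), (4.5, 0.1), (4.8, 0.4), (0.3, 5.1), (0.1, 4.7), (0.5, 4.9)]
train_y = ["glutamatergic"] * 3 + ["GABAergic"] * 3
test_x = [(4.9, 0.3), (0.2, 5.0), (4.2, 0.6), (0.4, 4.4)]
test_y = ["glutamatergic", "GABAergic", "glutamatergic", "GABAergic"]

pred_y = [knn_predict(train_x, train_y, cell) for cell in test_x]
score = macro_f1(test_y, pred_y)
print(pred_y, score)
```

Macro averaging is used here so that a rare cell type counts as much as a common one; whether the thesis used macro, micro, or weighted F1 is not stated in this record.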

Files

Original Bundle

Name: Evaluationnewnauronal_Carvalho_2024.pdf
Size: 12.28 MB
Format: Adobe Portable Document Format