{ "cells": [ { "cell_type": "markdown", "id": "3960a901", "metadata": {}, "source": [ "# Centre Universitaire de Mila" ] },
{ "cell_type": "markdown", "id": "c4568d4a", "metadata": {}, "source": [ "## Master 1 (STIC & I2A), Course: Machine Learning" ] },
{ "cell_type": "markdown", "id": "a8bedc4a", "metadata": {}, "source": [ "## Lab 4: Parkinson's Disease Detection" ] },
{ "cell_type": "markdown", "id": "f57e8c63", "metadata": {}, "source": [ "## Problem Statement" ] },
{ "cell_type": "markdown", "id": "751221fd", "metadata": {}, "source": [ "The Parkinson's disease dataset was created by Max Little of the University of Oxford, in collaboration with the National Center for Voice and Speech, Denver, Colorado, which recorded the speech signals. The original study published the feature extraction methods for general voice disorders.\n" ] },
{ "cell_type": "markdown", "id": "e69cb11f", "metadata": {}, "source": [ "### Dataset information:\n", "\n", "This dataset consists of a range of biomedical voice measurements from 31 people, 23 of whom have Parkinson's disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of the 195 voice recordings from these individuals. The main aim of the data is to discriminate healthy people from those with Parkinson's disease, according to the \"status\" column, which is set to 0 for healthy subjects and 1 for those with PD.\n" ] },
{ "cell_type": "markdown", "id": "bca5e82a", "metadata": {}, "source": [ "### Attribute information:\n", "\n", "Matrix column entries (attributes):\n", "name - ASCII subject name and recording number\n", "MDVP:Fo(Hz) - Average vocal fundamental frequency\n", "MDVP:Fhi(Hz) - Maximum vocal fundamental frequency\n", "MDVP:Flo(Hz) - Minimum vocal fundamental frequency\n", "MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP - Several measures of variation in fundamental frequency\n", "MDVP:Shimmer,MDVP:Shimmer(dB),Shimmer:APQ3,Shimmer:APQ5,MDVP:APQ,Shimmer:DDA - Several measures of variation in amplitude\n", "NHR,HNR - Two measures of ratio of noise to tonal components in the voice\n", "RPDE,D2 - Two nonlinear dynamical complexity measures\n", "DFA - Signal fractal scaling exponent\n", "spread1,spread2,PPE - Three nonlinear measures of fundamental frequency variation\n", "\n", "status - Health status of the subject (one) - Parkinson's, (zero) - healthy\n" ] },
{ "cell_type": "markdown", "id": "0cbfedb0", "metadata": {}, "source": [ "### Exercise:\n", "\n", "1-\tStudy the naive_bayes.GaussianNB class from the Python library scikit-learn.\n", "\n", "https://scikit-learn.org/stable/modules/naive_bayes.html\n", "https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html\n", "\n", "2-\tUse cross-validation to estimate the classification error (e.g., the average over 10 validation rounds, holding out 10% of the data for validation each time).\n", "\n" ] },
{ "cell_type": "markdown", "id": "11ae8da9", "metadata": {}, "source": [ "## Solution" ] },
{ "cell_type": "markdown", "id": "aa55f351", "metadata": {}, "source": [ "### 1) Load libraries:\n", "Let's first load the required libraries." ] },
{ "cell_type": "code", "execution_count": 1, "id": "31af453b", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] },
{ "cell_type": "markdown", "id": "2e700e39", "metadata": {}, "source": [ "### 2) Loading Data:\n", "Next, let's load the Parkinson's dataset into a pandas DataFrame." ] },
{ "cell_type": "code", "execution_count": 2, "id": "6b364573", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"| | name | MDVP:Fo(Hz) | MDVP:Fhi(Hz) | MDVP:Flo(Hz) | MDVP:Jitter(%) | MDVP:Jitter(Abs) | MDVP:RAP | MDVP:PPQ | Jitter:DDP | MDVP:Shimmer | ... | Shimmer:DDA | NHR | HNR | RPDE | DFA | spread1 | spread2 | D2 | PPE | status |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | phon_R01_S01_1 | 119.992 | 157.302 | 74.997 | 0.00784 | 0.00007 | 0.00370 | 0.00554 | 0.01109 | 0.04374 | ... | 0.06545 | 0.02211 | 21.033 | 0.414783 | 0.815285 | -4.813031 | 0.266482 | 2.301442 | 0.284654 | 1 |\n",
"| 1 | phon_R01_S01_2 | 122.400 | 148.650 | 113.819 | 0.00968 | 0.00008 | 0.00465 | 0.00696 | 0.01394 | 0.06134 | ... | 0.09403 | 0.01929 | 19.085 | 0.458359 | 0.819521 | -4.075192 | 0.335590 | 2.486855 | 0.368674 | 1 |\n",
"| 2 | phon_R01_S01_3 | 116.682 | 131.111 | 111.555 | 0.01050 | 0.00009 | 0.00544 | 0.00781 | 0.01633 | 0.05233 | ... | 0.08270 | 0.01309 | 20.651 | 0.429895 | 0.825288 | -4.443179 | 0.311173 | 2.342259 | 0.332634 | 1 |\n",
"| 3 | phon_R01_S01_4 | 116.676 | 137.871 | 111.366 | 0.00997 | 0.00009 | 0.00502 | 0.00698 | 0.01505 | 0.05492 | ... | 0.08771 | 0.01353 | 20.644 | 0.434969 | 0.819235 | -4.117501 | 0.334147 | 2.405554 | 0.368975 | 1 |\n",
"| 4 | phon_R01_S01_5 | 116.014 | 141.781 | 110.655 | 0.01284 | 0.00011 | 0.00655 | 0.00908 | 0.01966 | 0.06425 | ... | 0.10470 | 0.01767 | 19.649 | 0.417356 | 0.823484 | -3.747787 | 0.234513 | 2.332180 | 0.410335 | 1 |\n",
"\n",
"5 rows × 24 columns\n", "