Ranzcr

Ranzcr EDA Tutorial

Lecture Promotion

Competition

Intro

import os

import pandas as pd

from matplotlib import pyplot as plt
import seaborn as sns

Check File Size

  • Check the size of each dataset folder in this competition (rounded from the output below, where 1 MB = 2**20 bytes):
    • train_tfrecords ≈ 4.5 GB
    • test_tfrecords ≈ 0.5 GB
    • train (image data) ≈ 6.5 GB
    • test (image data) ≈ 0.8 GB
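import os

def get_folder_size(file_directory):
    """Print the total size of every folder under file_directory (1 MB = 2**20 bytes)."""
    dir_sizes = {}
    # topdown=False walks bottom-up, so every subfolder's total is already
    # known by the time its parent folder is processed.
    for root, dirs, files in os.walk(file_directory, topdown=False):
        # Files directly inside this folder
        size = sum(os.path.getsize(os.path.join(root, name)) for name in files)
        # Plus the totals already computed for its subfolders
        size += sum(dir_sizes.get(os.path.join(root, name), 0) for name in dirs)
        dir_sizes[root] = size
        print("{} is {} MB".format(root, round(size / 2**20)))

base_dir = '../input/ranzcr-clip-catheter-line-classification'
get_folder_size(base_dir)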
../input/ranzcr-clip-catheter-line-classification/test is 805 MB
../input/ranzcr-clip-catheter-line-classification/test_tfrecords is 555 MB
../input/ranzcr-clip-catheter-line-classification/train_tfrecords is 4563 MB
../input/ranzcr-clip-catheter-line-classification/train is 6592 MB
../input/ranzcr-clip-catheter-line-classification is 12524 MB

Check train file

  • Let’s look at the first rows of train.csv and of sample_submission.csv (loaded here as test)
train = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/train.csv', index_col = 0)
test = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/sample_submission.csv', index_col = 0)
display(train.head())
display(test.head())
train.head():

| StudyInstanceUID | ETT - Abnormal | ETT - Borderline | ETT - Normal | NGT - Abnormal | NGT - Borderline | NGT - Incompletely Imaged | NGT - Normal | CVC - Abnormal | CVC - Borderline | CVC - Normal | Swan Ganz Catheter Present | PatientID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.2.826.0.1.3680043.8.498.26697628953273228189375557799582420561 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ec89415d1 |
| 1.2.826.0.1.3680043.8.498.46302891597398758759818628675365157729 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | bf4c6da3c |
| 1.2.826.0.1.3680043.8.498.23819260719748494858948050424870692577 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3fc1c97e5 |
| 1.2.826.0.1.3680043.8.498.68286643202323212801283518367144358744 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | c31019814 |
| 1.2.826.0.1.3680043.8.498.10050203009225938259119000528814762175 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 207685cd1 |
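test.head() (sample_submission.csv):

| StudyInstanceUID | ETT - Abnormal | ETT - Borderline | ETT - Normal | NGT - Abnormal | NGT - Borderline | NGT - Incompletely Imaged | NGT - Normal | CVC - Abnormal | CVC - Borderline | CVC - Normal | Swan Ganz Catheter Present |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.2.826.0.1.3680043.8.498.46923145579096002617106567297135160932 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1.2.826.0.1.3680043.8.498.84006870182611080091824109767561564887 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1.2.826.0.1.3680043.8.498.12219033294413119947515494720687541672 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1.2.826.0.1.3680043.8.498.84994474380235968109906845540706092671 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1.2.826.0.1.3680043.8.498.35798987793805669662572108881745201372 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |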
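The train.head() rows already show that this is a multi-label problem: one image (PatientID bf4c6da3c) is positive for ETT - Normal, NGT - Incompletely Imaged and CVC - Normal at the same time. The following is a minimal sketch, not part of the original notebook, that rebuilds only those five label vectors by hand and counts the positive labels per image. The names LABEL_COLS, rows, head and n_pos are illustrative.

```python
import pandas as pd

# The 11 target columns, in the order they appear in train.csv
LABEL_COLS = [
    "ETT - Abnormal", "ETT - Borderline", "ETT - Normal",
    "NGT - Abnormal", "NGT - Borderline", "NGT - Incompletely Imaged", "NGT - Normal",
    "CVC - Abnormal", "CVC - Borderline", "CVC - Normal",
    "Swan Ganz Catheter Present",
]

# Label vectors (one character per column) and PatientIDs copied from train.head()
rows = [
    ("00000010000", "ec89415d1"),
    ("00100100010", "bf4c6da3c"),
    ("00000000100", "3fc1c97e5"),
    ("00000001000", "c31019814"),
    ("00000000010", "207685cd1"),
]
head = pd.DataFrame(
    [[int(bit) for bit in bits] + [pid] for bits, pid in rows],
    columns=LABEL_COLS + ["PatientID"],
)

# Positive labels per image; a value above 1 means several findings in one X-ray
n_pos = head[LABEL_COLS].sum(axis=1)
print(n_pos.tolist())  # [1, 3, 1, 1, 1]
```

On the real data, `train[list(test.columns)].sum(axis=1).value_counts()` should give the same count for every training image. Selecting `list(test.columns)` from `train` works because sample_submission.csv carries the same 11 label columns as train.csv, without PatientID, and all of its values are 0 placeholders.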

Definitions of Variables

  • What’s inside the data?
    • StudyInstanceUID - unique ID for each image
    • ETT - Abnormal - endotracheal tube placement abnormal
    • ETT - Borderline - endotracheal tube placement borderline abnormal
    • ETT - Normal - endotracheal tube placement normal
    • NGT - Abnormal - nasogastric tube placement abnormal
    • NGT - Borderline - nasogastric tube placement borderline abnormal
    • NGT - Incompletely Imaged - nasogastric tube placement inconclusive due to imaging
    • NGT - Normal - nasogastric tube placement normal
    • CVC - Abnormal - central venous catheter placement abnormal
    • CVC - Borderline - central venous catheter placement borderline abnormal
    • CVC - Normal - central venous catheter placement normal
    • Swan Ganz Catheter Present - a Swan-Ganz (pulmonary artery) catheter is present
    • PatientID - unique ID for each patient in the dataset
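Most label names above follow a "DEVICE - status" pattern, so the 11 targets fall into four device groups: ETT, NGT, CVC and Swan Ganz. This sketch derives those groups from the column names by splitting on " - ". The helper device_of and the dict groups are hypothetical names, not part of the original notebook.

```python
# The 11 label columns described above, in train.csv order
LABEL_COLS = [
    "ETT - Abnormal", "ETT - Borderline", "ETT - Normal",
    "NGT - Abnormal", "NGT - Borderline", "NGT - Incompletely Imaged", "NGT - Normal",
    "CVC - Abnormal", "CVC - Borderline", "CVC - Normal",
    "Swan Ganz Catheter Present",
]

def device_of(col):
    """Return the device prefix of a label column (the text before ' - ')."""
    # 'Swan Ganz Catheter Present' has no ' - ' suffix, so it forms its own group
    return col.split(" - ", 1)[0]

# Device -> its label columns; dicts keep insertion order, so file order is preserved
groups = {}
for col in LABEL_COLS:
    groups.setdefault(device_of(col), []).append(col)

for device, cols in groups.items():
    print(f"{device}: {cols}")
```

With the full `train` DataFrame loaded, `train[groups["CVC"]].sum(axis=1)` would count the CVC findings per image. Comparing `train["PatientID"].nunique()` with `len(train)` shows whether some patients contribute more than one image.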