Data Wash - A professional data cleaning and analysis tool with Python Flask backend and React frontend.
- File Upload: Support for CSV and Excel files (up to 16MB)
- Data Preview: View first 5 rows or scroll through entire dataset
- Dataset Information: Shape, missing values analysis, and statistical summary
- Data Types: Comprehensive column data type analysis and categorization
- Dataset Description: Detailed column analysis with statistics and sample values
- Column Management: Interactive column dropping with multi-selection
- Missing Value Imputation: Fill missing values using several methods (mean, median, mode, forward fill, backward fill, custom values)
- Data Visualization: Interactive plots and charts based on data types
- Smart Plot Selection: Plot types automatically adapt based on selected axes and data types
- Supported Plots:
  - Scatter plots (numeric vs numeric)
  - Line plots (numeric vs numeric)
  - Bar charts (categorical data)
  - Histograms (numeric distributions)
  - Box plots (outlier detection)
- Correlation Analysis: Heatmap and correlation matrix for numeric columns
- Professional Design: Modern gradient backgrounds with glassmorphism effects
- Responsive Layout: Works on desktop and mobile devices with horizontal scrolling for tabs
- Interactive Navigation: 8-tab interface with disabled states and smart navigation
- Real-time Feedback: Loading states, error handling, and success messages
- Data Pagination: Efficient handling of large datasets
- Dynamic Forms: Add/remove functionality for column operations
- Smart Validation: Prevents duplicate selections and validates inputs
AISA/
├── backend/
│   ├── app.py              # Flask API server
│   ├── requirements.txt    # Python dependencies
│   └── uploads/            # File upload directory
├── frontend/
│   ├── public/
│   │   ├── index.html
│   │   └── manifest.json
│   ├── src/
│   │   ├── components/
│   │   │   ├── FileUpload.js
│   │   │   ├── DataPreview.js
│   │   │   ├── DataInfo.js
│   │   │   ├── DataTypeInfo.js
│   │   │   ├── DatasetDescription.js
│   │   │   ├── ColumnDropper.js
│   │   │   ├── MissingValueImputation.js
│   │   │   └── DataVisualization.js
│   │   ├── App.js
│   │   ├── App.css
│   │   ├── index.js
│   │   └── index.css
│   └── package.json
└── README.md
- Navigate to the backend directory:
    cd backend
- Create a virtual environment (recommended):
    python -m venv venv
    venv\Scripts\activate       # Windows
    source venv/bin/activate    # macOS/Linux
- Install Python dependencies:
    pip install -r requirements.txt
- Run the Flask server:
    python app.py
  The server will start on http://localhost:5000
- Navigate to the frontend directory:
    cd frontend
- Install Node.js dependencies:
    npm install
- Start the React development server:
    npm start
  The application will open on http://localhost:3000
- POST /api/upload - Upload CSV/Excel file
- GET /api/preview - Get first 5 rows of data
- GET /api/data - Get complete dataset
- GET /api/info - Get dataset information and statistics
- POST /api/plot - Generate custom plots
- GET /api/correlation - Get correlation matrix and heatmap
- POST /api/plot-options - Get available plot types for selected axes
- POST /api/column-analysis - Get detailed analysis for a specific column
- POST /api/drop-columns - Remove selected columns from dataset
- POST /api/impute-missing - Apply missing value imputation rules
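
For scripting against the API outside the React frontend, the endpoint list above can be captured in a small helper. This is a minimal sketch using only the Python standard library. The method table mirrors the list above, but the JSON request body and the `columns` key in the usage note are assumptions, since this README does not document payload shapes. File upload is not covered because it presumably needs a multipart form body.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # Flask address from the setup steps

# HTTP method for each endpoint, taken from the endpoint list.
ENDPOINTS = {
    "upload": "POST",
    "preview": "GET",
    "data": "GET",
    "info": "GET",
    "plot": "POST",
    "correlation": "GET",
    "plot-options": "POST",
    "column-analysis": "POST",
    "drop-columns": "POST",
    "impute-missing": "POST",
}


def build_request(name, payload=None):
    """Return an unsent urllib Request for the named endpoint."""
    method = ENDPOINTS[name]  # raises KeyError for unknown endpoints
    data = None
    headers = {}
    if method == "POST" and payload is not None:
        # ASSUMPTION: POST endpoints accept a JSON body; not confirmed by the README.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(f"{BASE_URL}/api/{name}", data=data, headers=headers, method=method)
```

With the backend running, `urllib.request.urlopen(build_request("info"))` would send the request. A call such as `build_request("drop-columns", {"columns": ["id"]})` uses a hypothetical payload for illustration only.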
- Drag and drop or click to browse for CSV/Excel files
- Maximum file size: 16MB
- Supported formats: .csv, .xlsx, .xls
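
The upload limits above amount to an extension check plus a size check. The sketch below shows one way to express them. The function and constant names are hypothetical and are not taken from `app.py`.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".csv", ".xlsx", ".xls"}
MAX_UPLOAD_BYTES = 16 * 1024 * 1024  # 16MB limit stated above


def is_allowed_upload(filename, size_bytes):
    """Return True if the file has a supported extension and fits the size limit."""
    suffix = Path(filename).suffix.lower()  # case-insensitive extension match
    return suffix in ALLOWED_EXTENSIONS and 0 < size_bytes <= MAX_UPLOAD_BYTES
```

In a Flask app the size limit is usually also enforced server-side through the `MAX_CONTENT_LENGTH` config value, so oversized requests are rejected before the handler runs.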
- View first 5 rows for quick preview
- Switch to full dataset view with pagination
- Scroll through large datasets efficiently
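
Pagination of the full dataset view can be pictured as slicing the rows by page number. This is an illustrative sketch only. The default page size and the shape of the returned dictionary are assumptions, not the project's actual response format.

```python
import math


def paginate(rows, page=1, page_size=50):
    """Return one page of rows plus the metadata a pager needs."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    total_pages = max(1, math.ceil(len(rows) / page_size))
    start = (page - 1) * page_size
    return {
        "page": page,
        "total_pages": total_pages,
        "total_rows": len(rows),
        "rows": rows[start:start + page_size],  # empty list past the last page
    }
```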
- Dataset Info: View shape, missing values, and statistical summary
- Missing Values: Highlighted columns with missing data and severity levels
- Statistics: Descriptive statistics for all numeric columns
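
A missing-value report with severity levels can be derived from a DataFrame in a few lines of Pandas. The sketch below is one possible approach. The severity thresholds (5% and 30%) are hypothetical, since this README does not say which cut-offs the tool uses.

```python
import pandas as pd


def missing_value_report(df):
    """Summarise missing values per column with a coarse severity label."""
    fraction = df.isna().mean()  # share of missing cells per column

    def severity(frac):
        # ASSUMPTION: illustrative thresholds, not the tool's actual ones.
        if frac == 0:
            return "none"
        if frac < 0.05:
            return "low"
        if frac < 0.30:
            return "medium"
        return "high"

    return pd.DataFrame({
        "missing": df.isna().sum(),
        "percent": (fraction * 100).round(1),
        "severity": fraction.map(severity),
    })
```

The descriptive statistics mentioned above correspond to what `df.describe()` returns for numeric columns.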
- View all column data types with descriptions
- Filter by type category (numeric, text, datetime, boolean)
- Sort by column name, data type, or category
- Color-coded type indicators
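
The four type categories above map naturally onto the dtype checks in `pandas.api.types`. The following is a sketch of such a mapping, not necessarily how the backend categorizes columns. The function name is hypothetical.

```python
import pandas as pd
from pandas.api import types as ptypes


def categorize_column(series):
    """Map a column's dtype to numeric, text, datetime, or boolean."""
    # Check boolean first: pandas also treats bool as a numeric dtype.
    if ptypes.is_bool_dtype(series):
        return "boolean"
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    if ptypes.is_datetime64_any_dtype(series):
        return "datetime"
    return "text"  # object/string columns and anything else
```

Applying it to every column, for example `{name: categorize_column(df[name]) for name in df.columns}`, gives the data needed to filter or sort by category.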
- Select any column for detailed analysis
- View statistics, sample values, and frequency distribution
- Numeric columns show mean, median, std deviation, etc.
- Text columns show most frequent values
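
The per-column analysis described above branches on whether the column is numeric. A minimal sketch, assuming a Pandas Series as input, follows. The function name, the returned keys, and the `top_n` parameter are illustrative choices, not the project's actual response schema.

```python
import pandas as pd


def analyze_column(series, top_n=3):
    """Return basic statistics for a numeric column or top values otherwise."""
    summary = {
        "count": int(series.count()),        # non-missing values
        "missing": int(series.isna().sum()),
    }
    is_numeric = (pd.api.types.is_numeric_dtype(series)
                  and not pd.api.types.is_bool_dtype(series))
    if is_numeric:
        summary.update(
            mean=float(series.mean()),
            median=float(series.median()),
            std=float(series.std()),          # sample standard deviation
            min=float(series.min()),
            max=float(series.max()),
        )
    else:
        # Most frequent values; missing entries are excluded by default.
        summary["top_values"] = series.value_counts().head(top_n).to_dict()
    return summary
```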
- Add Multiple Columns: Use + button to add columns to drop list
- Smart Selection: Prevents selecting the same column twice
- Visual Feedback: See selected columns highlighted
- Batch Operations: Drop multiple columns at once
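
On the Pandas side, a batch drop with duplicate protection might look like the sketch below. The function name is hypothetical, and raising on unknown columns is a design choice made here, not documented behavior of the tool.

```python
import pandas as pd


def drop_columns(df, selected):
    """Return a copy of df without the selected columns."""
    unique = list(dict.fromkeys(selected))  # keep order, ignore repeated picks
    unknown = [name for name in unique if name not in df.columns]
    if unknown:
        raise KeyError(f"Unknown columns: {unknown}")
    return df.drop(columns=unique)  # the original DataFrame is left untouched
```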
- Multiple Imputation Methods:
  - Mean: For numeric columns (average value)
  - Median: For numeric columns (middle value)
  - Mode: Most frequent value (works for all types)
  - Forward Fill: Use previous valid value
  - Backward Fill: Use next valid value
  - Custom Value: Specify your own replacement value
- Smart Method Selection: Available methods adapt to data type
- Rule-based System: Add multiple imputation rules
- Preview: See which rules will be applied before execution
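
The rule-based imputation above can be sketched with Pandas as a loop over rules, each naming a column and a method. The rule format (`column`, `method`, `value` keys) and the method identifiers are assumptions for illustration. The payload expected by `/api/impute-missing` is not documented here.

```python
import pandas as pd


def apply_imputation_rules(df, rules):
    """Apply imputation rules in order and return a new DataFrame."""
    result = df.copy()
    for rule in rules:
        column, method = rule["column"], rule["method"]
        series = result[column]
        if method in ("mean", "median"):
            if not pd.api.types.is_numeric_dtype(series):
                raise ValueError(f"{method} requires a numeric column: {column}")
            fill = series.mean() if method == "mean" else series.median()
            result[column] = series.fillna(fill)
        elif method == "mode":
            modes = series.mode()  # sorted; the first entry wins on ties
            if not modes.empty:
                result[column] = series.fillna(modes.iloc[0])
        elif method == "ffill":
            result[column] = series.ffill()  # previous valid value
        elif method == "bfill":
            result[column] = series.bfill()  # next valid value
        elif method == "custom":
            result[column] = series.fillna(rule["value"])
        else:
            raise ValueError(f"Unknown method: {method}")
    return result
```

Note that forward fill leaves a leading missing value in place and backward fill leaves a trailing one, because there is no neighbor to copy from.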
- Custom Plots: Select X/Y axes and plot type
- Smart Suggestions: Plot types adapt to your data selection
- Correlation Analysis: View correlation matrix and heatmap
- Interactive Controls: Reset and regenerate plots easily
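
The way plot types adapt to the selected axes can be illustrated as a lookup on the kind of each axis. This sketch follows the pairings listed under Supported Plots. The handling of a categorical X axis with a numeric Y axis is an assumption, and the function name is hypothetical.

```python
def available_plots(x_kind, y_kind=None):
    """Return plot types suited to the chosen axes.

    Each kind is either "numeric" or "categorical"; y_kind is None
    when only one column is selected.
    """
    if y_kind is None:
        # Single column: distribution plots for numbers, counts for categories.
        return ["histogram", "box"] if x_kind == "numeric" else ["bar"]
    if x_kind == "numeric" and y_kind == "numeric":
        return ["scatter", "line"]
    if x_kind == "categorical" and y_kind == "numeric":
        # ASSUMPTION: grouped bar and box plots are offered for this pairing.
        return ["bar", "box"]
    return ["bar"]
```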
- Flask: Web framework for API endpoints
- Pandas: Data manipulation and analysis
- Matplotlib & Seaborn: Data visualization
- NumPy: Numerical computing
- OpenPyXL: Excel file support
- React: User interface library
- Axios: HTTP client for API calls
- React Dropzone: File upload component
- CSS3: Modern styling with gradients and animations
- File Processing: Secure file upload with type validation
- Data Streaming: Efficient handling of large datasets
- Error Handling: Comprehensive error management
- Responsive Design: Mobile-first approach
- Performance: Optimized rendering and API calls
- Chrome (recommended)
- Firefox
- Safari
- Edge
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is for educational and professional use.
Made with ❤️ for professional data analysis