Gtsummary

Intro

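In a PLS-SEM analysis it is essential to establish the reliability and validity of the scales (measurement variables and latent variables), because without them the model estimates are meaningless.
In other words, before the structural model is estimated, the measurement model must first be evaluated to confirm reliability and validity.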

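I. Key Concepts

(1) Reliability

Reliability is the degree of consistency with which a latent variable is measured.
- It refers to the consistency of the measurement instrument, i.e., the extent to which measuring the latent variable repeatedly yields the same result each time.
- It is assessed through internal consistency reliability.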
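One common internal consistency statistic is Cronbach's alpha. The source does not say which statistic it uses, so the following is only a minimal base-R sketch of the textbook formula; `cronbach_alpha` is a hypothetical helper name, and the six rows are the EI_1 to EI_3 values from the raw data preview in section II, used purely for illustration.

```r
# Six illustrative respondents x 3 Likert items (EI_1..EI_3 from the data preview).
items <- data.frame(
  EI_1 = c(2, 5, 1, 3, 5, 1),
  EI_2 = c(3, 5, 2, 3, 3, 3),
  EI_3 = c(4, 2, 2, 2, 5, 3)
)

# Hypothetical helper: Cronbach's alpha =
#   k / (k - 1) * (1 - sum of item variances / variance of the summed score)
cronbach_alpha <- function(df) {
  k <- ncol(df)                    # number of items
  item_var <- sapply(df, var)      # variance of each item
  total_var <- var(rowSums(df))    # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

alpha <- cronbach_alpha(items)
print(round(alpha, 3))
```

With the full sample, the same function would be applied to all items of one construct at a time. A packaged routine such as `cronbach()` from the `psy` package loaded later in this post is the more likely choice in practice.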

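(2) Validity

Validity is, at its most basic, the degree to which a scale accurately measures the latent variable it is intended to measure.
- PLS-SEM uses convergent validity and discriminant validity.
- Convergent validity holds when the items that make up the scale for a single latent variable are highly correlated with one another; discriminant validity is higher the lower the correlation between one latent variable and the others.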
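The two validity ideas above are often put into numbers with the average variance extracted (AVE) and the Fornell-Larcker criterion. The source does not name these criteria, so treat the following as a sketch of one widely used convention, where $\lambda_i$ denotes the standardized outer loading of item $i$ on its latent variable and $r_{jk}$ the correlation between latent variables $j$ and $k$:

$$
\mathrm{AVE}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} \lambda_i^2,
\qquad
\sqrt{\mathrm{AVE}_j} > \max_{k \neq j} \lvert r_{jk} \rvert
$$

An AVE of at least 0.5 is a commonly cited threshold for convergent validity, and the inequality on the right expresses discriminant validity: a latent variable should share more variance with its own items than with any other latent variable.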

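(3) Systematic Evaluation Stages for PLS-SEM Results

Reflective measurement model: internal consistency reliability, convergent validity, discriminant validity
Formative measurement model: convergent validity, multicollinearity, significance and relevance of the outer weights and outer loadings
Structural model evaluation criteria: multicollinearity, coefficient of determination $R^2$, effect size $f^2$, predictive relevance $Q^2$, significance and relevance of the path coefficients
PLS-SEM evaluation stages: Stage 1 evaluates the measurement model (outer model), and Stage 2 evaluates the structural model (inner model).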
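For reference, the effect size and predictive relevance listed among the structural model criteria are usually defined as follows. The source gives no formulas, so this is the standard textbook form rather than the author's own notation: $R^2_{\text{incl}}$ and $R^2_{\text{excl}}$ are the $R^2$ values of an endogenous latent variable with and without a given predictor, and $SSE$ and $SSO$ are the sums of squared prediction errors and squared observations from the blindfolding procedure.

$$
f^2 = \frac{R^2_{\text{incl}} - R^2_{\text{excl}}}{1 - R^2_{\text{incl}}},
\qquad
Q^2 = 1 - \frac{SSE}{SSO}
$$

A $Q^2$ above zero is generally read as evidence that the model has predictive relevance for that endogenous variable.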

II. Survey Data Analysis

Now let's analyze the survey.
First, load the required packages.

library(readr)
library(dplyr)
library(kableExtra)
library(psy) # reliability
library(corrplot) # correlation plots
library(psychometric) # validity

(1) Data Collection

First, inspect the collected survey data.

data <- read_csv('data/thesis_mater.csv') %>% 
  distinct() %>% # remove duplicate rows
  rename(Position = founder_employee, # tidy variable names for output
         Age = age_of_respondent, 
         Education = Education_Level) %>% 
  slice(-c(1:10)) %>% # drop the first 10 rows
  dplyr::select(-c(Firm_Age:Business_Area)) # drop the text columns so only the scale items remain

data %>% 
  head() %>% 
  kable() %>% 
  kable_styling("striped") %>% 
  scroll_box(width = "100%")

EI_1	EI_2	EI_3	EP_1	EP_2	EP_3	ER_1	ER_2	ER_3	SS_1	SS_2	SS_3	SC_1	SC_2	SC_3	SR_1	SR_2	SR_3	F1	F2	F3	NF1	NF2	NF3	Firm_Age	Firm_Size	WE1	WE2	WE3	gender	founder_employee	age_of_respondent	Education_Level	Business_Area
2	3	4	3	3	4	3	2	4	1	1	3	3	3	3	2	2	1	2	2	3	3	1	3	5 years above	Above 15 members	No, I don't have experience	Yes	Yes	Female	Employee	30-39	Undergraduate School	Others
5	5	2	3	5	3	4	4	4	2	2	2	2	2	2	2	2	2	2	2	2	3	2	2	Less than 2 years	Less than 5 members	No, I don't have experience	No	Yes	Male	Employee	Younger than 30	Undergraduate School	Media and Entertainment
1	2	2	1	1	2	1	2	1	2	2	1	1	2	2	1	2	1	2	1	1	1	1	1	5 years above	Less than 5 members	As founder or employee, I have startup experiences more than 3 times	No	Yes	Female	Founder of Company	Younger than 30	Undergraduate School	Others
3	3	2	1	2	1	2	1	3	2	1	3	1	1	1	2	3	3	3	3	2	3	2	2	Less than 2 years	Less than 5 members	No, I don't have experience	Yes	Yes	Male	Employee	Younger than 30	Undergraduate School	Others
5	3	5	2	5	4	4	4	4	4	5	4	5	5	5	5	5	5	4	5	4	4	5	5	3-4 years	Less than 5 members	As founder or employee, I have startup experiences more than 3 times	No	Yes	Male	Founder of Company	30-39	Undergraduate School	Others
1	3	3	1	3	3	2	3	1	4	1	2	3	3	1	2	2	1	1	2	3	1	3	1	5 years above	5-9 members	As founder or employee, I have startup experience, one time	No	No	Female	Employee	Younger than 30	Undergraduate School	Others

(2) Checking Correlations

Check the correlations among the scale items.

M <- cor(data)

corrplot(M, type="upper", order="hclust", 
         col=RColorBrewer::brewer.pal(n=8, name="RdBu"))
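`cor()` only accepts numeric input, which is why the text columns were dropped before this step. The following minimal base-R sketch shows the same idea on a small data frame; `df`, `num_items`, and `M_toy` are hypothetical names, and the six rows are taken from the raw data preview above for illustration only.

```r
# Six illustrative rows: three numeric items plus one text column.
df <- data.frame(
  EI_1 = c(2, 5, 1, 3, 5, 1),
  EI_2 = c(3, 5, 2, 3, 3, 3),
  EI_3 = c(4, 2, 2, 2, 5, 3),
  gender = c("Female", "Male", "Female", "Male", "Male", "Female")
)

# Keep only the numeric columns; cor() would stop with an error on text columns.
num_items <- df[sapply(df, is.numeric)]

# Pearson correlation matrix (the default method of cor()).
M_toy <- cor(num_items)
print(round(M_toy, 2))
```

The result is a symmetric matrix with ones on the diagonal, which is the kind of object `corrplot()` expects.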

Intro

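The previous post covered preprocessing the survey data. This post writes a short program that summarizes the sample characteristics, which papers in management and the social sciences are required to report.

(1) Key Packages

Starting with this post, the gt package is used.
- gt: a package that lets you control tables through a grammar, much as ggplot2 does for plots.
- kableExtra: a package that helps render tables as HTML.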

library(readr)
library(dplyr)
library(gt)
library(gtsummary)

I. Loading the Data

First, load the data.

data <- read_csv('data/thesis_mater.csv') %>% 
  distinct() %>% # remove duplicate rows
  rename(Position = founder_employee, # tidy variable names for output
         Age = age_of_respondent, 
         Education = Education_Level)
glimpse(data %>% select(Firm_Age:Business_Area))

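Of the 34 variables in total, only the string (character) columns are extracted here.
Which of them should serve as sample characteristics?
- The 10 columns shown in the output include control variables.
- Control variables are not characteristics of the sample, so the remaining variables, excluding the controls, are extracted.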

## Rows: 103
## Columns: 10
## $ Firm_Age      <chr> "5 years above", "Less than 2 years", "5 years above", …
## $ Firm_Size     <chr> "Above 15 members", "Less than 5 members", "Less than 5…
## $ WE1           <chr> "No, I don't have experience", "No, I don't have experi…
## $ WE2           <chr> "Yes", "No", "No", "Yes", "No", "No", "No", "No", "No",…
## $ WE3           <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "No", "…
## $ gender        <chr> "Female", "Male", "Female", "Male", "Male", "Female", "…
## $ Position      <chr> "Employee", "Employee", "Founder of Company", "Employee…
## $ Age           <chr> "30-39", "Younger than 30", "Younger than 30", "Younger…
## $ Education     <chr> "Undergraduate School", "Undergraduate School", "Underg…
## $ Business_Area <chr> "Others", "Media and Entertainment", "Others", "Others"…

The columns that describe the sample characteristics are extracted as follows.
- gender, founder_employee (renamed Position), age_of_respondent (renamed Age), Education_Level (renamed Education), Business_Area

data2 <- data %>% 
  select(gender, Position, Age, Education, Business_Area)

glimpse(data2)

## Rows: 103
## Columns: 5
## $ gender        <chr> "Female", "Male", "Female", "Male", "Male", "Female", "…
## $ Position      <chr> "Employee", "Employee", "Founder of Company", "Employee…
## $ Age           <chr> "30-39", "Younger than 30", "Younger than 30", "Younger…
## $ Education     <chr> "Undergraduate School", "Undergraduate School", "Underg…
## $ Business_Area <chr> "Others", "Media and Entertainment", "Others", "Others"…

II. Printing the Sample Characteristics Table

The sample characteristics reported in a paper usually need only Category, Frequency, and Percentage (%).
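To make those three columns concrete, here is a minimal base-R sketch that builds them by hand for a single variable. The vector holds the Position values of six respondents from the raw data preview earlier on this page, for illustration only; `position`, `freq`, `pct`, and `summary_tbl` are hypothetical names.

```r
# Six illustrative Position values (not the full sample of 103).
position <- c("Employee", "Employee", "Founder of Company",
              "Employee", "Founder of Company", "Employee")

freq <- table(position)                    # frequency per category
pct  <- round(100 * prop.table(freq), 1)   # percentage per category

summary_tbl <- data.frame(
  Category   = names(freq),
  Frequency  = as.vector(freq),
  Percentage = as.vector(pct)
)
print(summary_tbl)
```

Doing this for every variable and then formatting the result for publication gets tedious, which is where a table package helps.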
For this, we use the gtsummary package, which handles the table formatting.

set_gtsummary_theme(theme_gtsummary_journal(journal = "jama"))

data2 %>% 
  tbl_summary(by = gender) %>% 
  add_overall() %>% 
  add_n() %>% 
  modify_header(label = "**Variable**") %>% # update the column header
  bold_labels()

Variable	N	Overall, N = 103	Female, N = 62¹	Male, N = 41¹
Position	103
Employee		68 (66)	35 (56)	33 (80)
Founder of Company		35 (34)	27 (44)	8 (20)
Age	103
30-39		37 (36)	19 (31)	18 (44)
40-49		8 (7.8)	4 (6.5)	4 (9.8)
50 or above		2 (1.9)	2 (3.2)	0 (0)
Younger than 30		56 (54)	37 (60)	19 (46)
Education	103
Graduate School		25 (24)	15 (24)	10 (24)
High School		7 (6.8)	6 (9.7)	1 (2.4)
Undergraduate School		71 (69)	41 (66)	30 (73)
Business_Area	103
E-Commerce		16 (16)	11 (18)	5 (12)
Education		4 (3.9)	2 (3.2)	2 (4.9)
Energy		1 (1.0)	0 (0)	1 (2.4)
Enterprise Services		4 (3.9)	2 (3.2)	2 (4.9)
Fintech		9 (8.7)	6 (9.7)	3 (7.3)
Logistics		5 (4.9)	1 (1.6)	4 (9.8)
Manufacturing		3 (2.9)	2 (3.2)	1 (2.4)
Media and Entertainment		7 (6.8)	4 (6.5)	3 (7.3)
Medical and Healthcare		1 (1.0)	1 (1.6)	0 (0)
Online to Offline Commerce		2 (1.9)	1 (1.6)	1 (2.4)
Others		45 (44)	31 (50)	14 (34)
Real Estate and Household		1 (1.0)	0 (0)	1 (2.4)
Transportation/Automotive		4 (3.9)	0 (0)	4 (9.8)
Travel		1 (1.0)	1 (1.6)	0 (0)
¹Statistics presented: n (%)

© 2026 DSChloe. Generated with Hugo and Mainroad theme.

Gtsummary

ch 13 - Reliability

Intro

I. Key Concepts

(1) Reliability

(2) Validity

(3) Systematic Evaluation Stages for PLS-SEM Results

II. Survey Data Analysis

(1) Data Collection

(2) Checking Correlations

ch 12 - Demographic of Respondent in R

Intro

(1) Key Packages

I. Loading the Data

II. Printing the Sample Characteristics Table