Regression Model In R Programming

Sachin Sharma
10/7/2021

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(ggplot2)
library(naniar)
library(dplyr)
library(datasets)
library(tinytex)
library(DT)

data("mtcars")
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Transform certain variables into factors

mtcars$cyl <- factor(mtcars$cyl)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
mtcars$vs <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)

boxplot(mpg ~ am, data = mtcars, col = c("purple", "red"),
        ylab = "Miles Per Gallon", xlab = "Type of Transmission",
        main = "MPG Vs AM")

aggregate(mpg ~ am, data = mtcars, mean)
##          am      mpg
## 1 Automatic 17.14737
## 2    Manual 24.39231

Difference in mean MPG between Manual and Automatic

24.39231 - 17.14737
## [1] 7.24494

Manual cars therefore average about 7.245 mpg more than automatic cars. We can now use a t-test to check whether this difference is statistically significant.

automatic_car <- mtcars[mtcars$am == "Automatic", ]
manual_car <- mtcars[mtcars$am == "Manual", ]
t.test(automatic_car$mpg, manual_car$mpg)
##
##  Welch Two Sample t-test
##
## data:  automatic_car$mpg and manual_car$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean of x mean of y
##  17.14737  24.39231

The p-value is 0.001374, well below 0.05, so the difference is statistically significant. To quantify it, we fit a linear model:

model_1 <- lm(mpg ~ am, data = mtcars)
summary(model_1)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.3923 -3.0923 -0.2974  3.2439  9.5077
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

The amManual coefficient (7.245) is the same difference in means found above, and the R-squared of 0.3598 shows that transmission type alone explains only about 36% of the variance in mpg.

Let's use corrplot to check how the other variables correlate with mpg. Before plotting, we check the structure of the data:

df_1 <- subset(mtcars, select = c(mpg, cyl, disp, hp, drat, wt, qsec, vs))
head(df_1)
##                    mpg cyl disp  hp drat    wt  qsec vs
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1

str(df_1)
## 'data.frame':    32 obs. of  8 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...

Here we can see that the cyl and vs columns are factors. cor() needs numeric input, so we convert them to numeric before plotting. Going through as.character first keeps the original values (4, 6, 8) rather than the internal level codes (1, 2, 3).

df_1$cyl <- as.character(df_1$cyl)
df_1$cyl <- as.numeric(df_1$cyl)
df_1$vs <- as.character(df_1$vs)
df_1$vs <- as.numeric(df_1$vs)

# Now we can check the structure of the data again
str(df_1)
## 'data.frame':    32 obs. of  8 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...

All the columns are now numeric, so we can plot the correlations with ggcorrplot and corrplot:

library(ggcorrplot)
r <- cor(df_1)
ggcorrplot(r, method = "circle", type = c("upper"), legend.title = "Corrplot MTCARS")
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

library(corrplot)
## corrplot 0.90 loaded
r <- cor(df_1)
corrplot(r, method = "circle")

Next we fit a model with additional predictors and compare it with model_1 using anova:

model_2 <- lm(mpg ~ am + cyl + disp + hp + wt, data = mtcars)
anova(model_1, model_2)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + cyl + disp + hp + wt
##   Res.Df    RSS Df Sum of Sq F Pr(>F)
## 1     30 720.90
## 2
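As a cross-check on the single-predictor model, the sketch below recomputes the difference in group means and compares it with the slope of lm(mpg ~ am). It uses only base R and the built-in mtcars data; the names cars, group_means, mean_diff, fit and slope are illustrative and do not come from the original post.

```r
# Minimal sketch: with one two-level factor as the only predictor,
# the lm() slope is the difference between the two group means.
data("mtcars")

# Work on a copy (hypothetical name) so the original data stays untouched.
cars <- mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Mean mpg per transmission type, then Manual minus Automatic.
group_means <- tapply(cars$mpg, cars$am, mean)
mean_diff <- as.numeric(group_means["Manual"] - group_means["Automatic"])

# Slope for the Manual level from the simple regression.
fit <- lm(mpg ~ am, data = cars)
slope <- as.numeric(coef(fit)["amManual"])

# The two numbers agree up to floating-point error.
print(c(mean_diff = mean_diff, slope = slope))
print(isTRUE(all.equal(mean_diff, slope)))
```

The intercept of the same fit is the mean mpg of the reference level (Automatic), which is why the two coefficients together reproduce the aggregate() table.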
DPLYR Tutorial – Data Manipulation using DPLYR Package in R Programming
DPLYR Tutorial: Data Manipulation with DPLYR in R

Sachin Sharma
August 31, 2021

Why use dplyr?

- It is really useful for data exploration and transformation
- It is fast when working with data frames

Functionality of dplyr

- It provides five basic verbs: select, filter, mutate, arrange and summarise
- It also handles joins: inner join, left join, semi-join and anti-join

# loading packages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union

library(datasets)
#install.packages("hflights")
library(hflights)

# Let's explore the data
data("hflights")
head(hflights)
##      Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
## 5424 2011     1          1         6    1400    1500            AA       428
## 5425 2011     1          2         7    1401    1501            AA       428
## 5426 2011     1          3         1    1352    1502            AA       428
## 5427 2011     1          4         2    1403    1513            AA       428
## 5428 2011     1          5         3    1405    1507            AA       428
## 5429 2011     1          6         4    1359    1503            AA       428
##      TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance
## 5424  N576AA                60      40      -10        0    IAH  DFW      224
## 5425  N557AA                60      45       -9        1    IAH  DFW      224
## 5426  N541AA                70      48       -8       -8    IAH  DFW      224
## 5427  N403AA                70      39        3        3    IAH  DFW      224
## 5428  N492AA                62      44       -3        5    IAH  DFW      224
## 5429  N262AA                64      45       -7       -1    IAH  DFW      224
##      TaxiIn TaxiOut Cancelled CancellationCode Diverted
## 5424      7      13         0                         0
## 5425      6       9         0                         0
## 5426      5      17         0                         0
## 5427      9      22         0                         0
## 5428      9       9         0                         0
## 5429      6      13         0                         0

as_tibble converts the data frame to a tibble (a "local data frame"). A tibble prints only the first ten rows and as many columns as fit on the screen, which is much easier to read than the raw data frame.

# Convert to tibble
flights <- as_tibble(hflights)
flights
## # A tibble: 227,496 x 21
##     Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
##    <int> <int>      <int>     <int>   <int>   <int> <chr>             <int>
##  1  2011     1          1         6    1400    1500 AA                  428
##  2  2011     1          2         7    1401    1501 AA                  428
##  3  2011     1          3         1    1352    1502 AA                  428
##  4  2011     1          4         2    1403    1513 AA                  428
##  5  2011     1          5         3    1405    1507 AA                  428
##  6  2011     1          6         4    1359    1503 AA                  428
##  7  2011     1          7         5    1359    1509 AA                  428
##  8  2011     1          8         6    1355    1454 AA                  428
##  9  2011     1          9         7    1443    1554 AA                  428
## 10  2011     1         10         1    1443    1553 AA                  428
## # … with 227,486 more rows, and 13 more variables: TailNum <chr>,
## #   ActualElapsedTime <int>, AirTime <int>, ArrDelay <int>, DepDelay <int>,
## #   Origin <chr>, Dest <chr>, Distance <int>, TaxiIn <int>, TaxiOut <int>,
## #   Cancelled <int>, CancellationCode <chr>, Diverted <int>

Let's use filter to understand it. To view all flights on February 1, we can use either of the following two methods.

METHOD – I Using base R indexing

flights[flights$Month == 2 & flights$DayofMonth == 1, ]
## # A tibble: 577 x 21
##     Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
##    <int> <int>      <int>     <int>   <int>   <int> <chr>             <int>
##  1  2011     2          1         2    1401    1539 AA                  428
##  2  2011     2          1         2      NA      NA AA                  460
##  3  2011     2          1         2      NA      NA AA                  533
##  4  2011     2          1         2      NA      NA AA                 1121
##  5  2011     2          1         2    1746    2109 AA                 1294
##  6  2011     2          1         2      NA      NA AA                 1436
##  7  2011     2          1         2    1032    1358 AA                 1700
##  8  2011     2          1         2      NA      NA AA                 1820
##  9  2011     2          1         2     558     912 AA                 1994
## 10  2011     2          1         2    1820    2112 AS                  731
## # … with 567 more rows, and 13 more variables: TailNum <chr>,
## #   ActualElapsedTime <int>, AirTime <int>, ArrDelay <int>, DepDelay <int>,
## #   Origin <chr>, Dest <chr>, Distance <int>, TaxiIn <int>, TaxiOut <int>,
## #   Cancelled <int>, CancellationCode <chr>, Diverted <int>

METHOD – II Using filter

Inside filter we can refer to columns by name, without the flights$ prefix. Conditions separated by commas are combined with AND.

filter(flights, Month == 2, DayofMonth == 1)
## # A tibble: 577 x 21
##     Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
##    <int> <int>      <int>     <int>   <int>   <int> <chr>             <int>
##  1  2011     2          1         2    1401    1539 AA                  428
##  2  2011     2          1         2      NA      NA AA                  460
##  3  2011     2          1         2      NA      NA AA                  533
##  4  2011     2          1         2      NA      NA AA                 1121
##  5  2011     2          1         2    1746    2109 AA                 1294
##  6  2011     2          1         2      NA      NA AA                 1436
##  7  2011     2          1         2    1032    1358 AA                 1700
##  8  2011     2          1         2      NA      NA AA                 1820
##  9  2011     2          1         2     558     912 AA                 1994
## 10  2011     2          1         2    1820    2112 AS                  731
## # … with 567 more rows, and 13 more variables: TailNum <chr>,
## #   ActualElapsedTime <int>, AirTime <int>, ArrDelay <int>, DepDelay <int>,
## #   Origin <chr>, Dest <chr>, Distance <int>, TaxiIn <int>, TaxiOut <int>,
## #   Cancelled <int>, CancellationCode <chr>, Diverted <int>

If we want to use OR, that is, keep rows that match either of two values, we use the | operator:

filter(flights, UniqueCarrier == "AA" | UniqueCarrier == "UA")
## # A tibble: 5,316 x 21
##     Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
##    <int> <int>      <int>     <int>   <int>   <int> <chr>             <int>
##  1  2011     1          1         6    1400    1500 AA                  428
##  2  2011     1          2         7    1401    1501 AA                  428
##  3  2011     1          3         1    1352    1502 AA                  428
##  4  2011     1          4         2    1403    1513 AA                  428
##  5  2011     1          5         3    1405    1507 AA                  428
##  6  2011     1          6         4    1359    1503 AA                  428
##  7  2011     1          7         5    1359    1509 AA                  428
##  8  2011     1          8         6    1355
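The filter calls above combine conditions in two ways: a comma between conditions means AND, while | means OR. The sketch below shows both on a small made-up tibble, so it runs without the hflights package. The toy data and the names toy, feb_first, aa_or_ua and aa_or_ua_in are hypothetical; only the column names are borrowed from hflights. It also shows %in%, a common shorthand for matching any of several values.

```r
library(dplyr)

# Hypothetical toy data reusing three hflights column names.
toy <- tibble(
  Month         = c(1, 2, 2, 2),
  DayofMonth    = c(1, 1, 1, 5),
  UniqueCarrier = c("AA", "UA", "AS", "AA")
)

# Comma-separated conditions are combined with AND:
# only rows where Month is 2 and DayofMonth is 1 are kept.
feb_first <- filter(toy, Month == 2, DayofMonth == 1)

# | is OR: rows where the carrier is either AA or UA are kept.
aa_or_ua <- filter(toy, UniqueCarrier == "AA" | UniqueCarrier == "UA")

# %in% expresses the same OR condition more compactly.
aa_or_ua_in <- filter(toy, UniqueCarrier %in% c("AA", "UA"))

print(feb_first)
print(aa_or_ua)
print(isTRUE(all.equal(aa_or_ua, aa_or_ua_in)))
```

The %in% form scales better when the list of values grows, since each extra value needs only one more element in the vector rather than another | clause.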
MID TERM ICSE CLASS X
CLASS XI – TERM 1 – TEST -2 (CBSE)
CLASS XI – MID TERM – MOCK TEST – I (CBSE)
Class X – Arithmetic Progression
CLASS XI – ARITHMETIC PROGRESSION -CBSE LEVEL
CLASS XI – RELATIONS AND FUNCTIONS
mock test sat math level 2
calculus practice questions sat math level 2