Take Specific Sheet in Csv File R Read.csv
Data can come in various formats. For each format, R has a specific function and arguments. This tutorial explains how to import data into R.
In this tutorial, you will learn
- Read CSV
- Read Excel files
- readxl_example()
- read_excel()
- excel_sheets()
- Import data from other Statistical software
- Read sas
- Read STATA
- Read SPSS
- Best practices for Data Import
Read CSV
One of the most widely used data storage formats is the .csv (comma-separated values) file. R loads an array of libraries during start-up, including the utils package. This package makes it convenient to open csv files with the read.csv() function. Here is the syntax for read.csv:
read.csv(file, header = TRUE, sep = ",")
Arguments:
- file: PATH where the file is stored
- header: indicates whether the file has a header or not; by default, header is set to TRUE
- sep: the symbol used to split the variables. By default, `,`.
We will read the data file named mtcars. The csv file is stored online. If your .csv file is stored locally, you can replace the PATH inside the code snippet. Don't forget to wrap it inside ' '. The PATH needs to be a string value.
For Mac users, the path for the download folder is:
"/Users/USERNAME/Downloads/FILENAME.csv"
For Windows users (inside an R string, double the backslashes or use forward slashes instead):
"C:\\Users\\USERNAME\\Downloads\\FILENAME.csv"
Note that we should always specify the extension of the file name.
- .csv
- .xlsx
- .txt
- …
PATH <- 'https://raw.githubusercontent.com/guru99-edu/R-Programming/master/mtcars.csv'
df <- read.csv(PATH, header = TRUE, sep = ',')
# Number of columns in the data frame
length(df)
Output:
## [1] 12
class(df$X)
Output:
## [1] "factor"
In versions of R before 4.0.0, read.csv() returns character values as factors by default (from R 4.0.0 onward, the default is stringsAsFactors = FALSE, so the output above would already be "character"). We can turn off the conversion explicitly by adding stringsAsFactors = FALSE.
PATH <- 'https://raw.githubusercontent.com/guru99-edu/R-Programming/master/mtcars.csv'
df <- read.csv(PATH, header = TRUE, sep = ',', stringsAsFactors = FALSE)
class(df$X)
Output:
## [1] "character"
The class of the variable X is now character.
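The same behaviour can be checked without a network connection. The sketch below is only an illustration: the file contents, the semicolon separator, and the names df_factor and df_char are made up for this example and are not part of the original dataset.

```r
# Write a small semicolon-separated file to a temporary location (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("name;mpg;cyl",
             "Mazda RX4;21.0;6",
             "Datsun 710;22.8;4"), tmp)

# Read the same file twice, changing only stringsAsFactors.
df_factor <- read.csv(tmp, header = TRUE, sep = ";", stringsAsFactors = TRUE)
df_char   <- read.csv(tmp, header = TRUE, sep = ";", stringsAsFactors = FALSE)

class(df_factor$name)  # "factor"
class(df_char$name)    # "character"
dim(df_char)           # 2 rows, 3 columns
```

Setting stringsAsFactors explicitly, as done here, keeps the result the same regardless of the R version in use.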
Read Excel files
Excel files are very popular among data analysts. Spreadsheets are easy to work with and flexible. R is equipped with the readxl library to import Excel spreadsheets.
Use this code
require(readxl)
to check if readxl is installed on your machine. If you installed R with r-conda-essential, the library is already installed. You should see in the command window:
Output:
Loading required package: readxl.
If the package does not exist, you can install it with the conda library or, in the terminal, use conda install -c mittner r-readxl.
Use the following command to load the library to import excel files.
library(readxl)
readxl_example()
We use the examples included in the package readxl during this tutorial.
Use the code
readxl_example()
to see all the available spreadsheets in the library.
To check the location of the spreadsheet named geometry.xls, simply use
readxl_example("geometry.xls")
If you install R with conda, the spreadsheets are located in Anaconda3/lib/R/library/readxl/extdata/filename.xls
read_excel()
The function read_excel() is of great use when it comes to opening files with the xls and xlsx extensions.
The syntax is:
read_excel(PATH, sheet = NULL, range = NULL, col_names = TRUE)
Arguments:
- PATH: path where the Excel file is located
- sheet: the sheet to import, given by name or by position. By default, the first sheet
- range: the range to import. By default, all non-null cells
- col_names: TRUE to use the first row as column names, FALSE to let R generate the names. By default, TRUE
We can import the spreadsheets from the readxl library and count the number of columns in the first sheet.
# Store the path of `datasets.xlsx`
example <- readxl_example("datasets.xlsx")
# Import the spreadsheet
df <- read_excel(example)
# Count the number of columns
length(df)
Output:
## [1] 5
excel_sheets()
The file datasets.xlsx is composed of 4 sheets. We can find out which sheets are available in the workbook by using the excel_sheets() function.
example <- readxl_example("datasets.xlsx")
excel_sheets(example)
Output:
[1] "iris" "mtcars" "chickwts" "quakes"
If a workbook includes many sheets, it is easy to select a particular sheet by using the sheet argument. We can specify the name of the sheet or the sheet index. We can verify that both calls return the same output with identical().
example <- readxl_example("datasets.xlsx")
quake <- read_excel(example, sheet = "quakes")
quake_1 <- read_excel(example, sheet = 4)
identical(quake, quake_1)
Output:
## [1] TRUE
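When several sheets are needed at once, the two functions can be combined. The following is a minimal sketch, assuming the readxl package is installed and using the same bundled datasets.xlsx workbook; the names sheet_names, all_sheets and quakes are chosen only for this illustration.

```r
library(readxl)

example <- readxl_example("datasets.xlsx")

# List the sheets, then read each one into a named list of tibbles.
sheet_names <- excel_sheets(example)
all_sheets <- lapply(sheet_names, function(s) read_excel(example, sheet = s))
names(all_sheets) <- sheet_names

# Pull out one specific sheet by its name.
quakes <- all_sheets[["quakes"]]
dim(quakes)
```

Keeping the sheets in a named list means a specific sheet can be selected later by name, without reading the file again.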
We can control which cells to read in 2 ways
- Use the n_max argument to return n rows
- Use the range argument combined with cell_rows or cell_cols
For example, we set n_max equal to 5 to import the first five rows.
# Read the first five rows, with header
iris <- read_excel(example, n_max = 5, col_names = TRUE)
If we change col_names to FALSE, R creates the headers automatically.
# Read the first five rows, without header
iris_no_header <- read_excel(example, n_max = 5, col_names = FALSE)
iris_no_header
In the data frame iris_no_header, R created five new variables named X__1, X__2, X__3, X__4 and X__5 (recent releases of readxl name them ...1 to ...5 instead).
We can also use the argument range to select rows and columns in the spreadsheet. In the code below, we use the Excel style to select the range A1 to B5.
# Read the range A1 to B5
example_1 <- read_excel(example, range = "A1:B5", col_names = TRUE)
dim(example_1)
Output:
## [1] 4 2
We can see that example_1 returns 4 rows with 2 columns. The dataset has a header, which is the reason the dimension is 4×2.
In the second example, we use the function cell_rows(), which controls the range of rows to return. If we want to import rows 1 to 5, we can set cell_rows(1:5). Note that cell_rows(1:5) returns the same output as cell_rows(5:1).
# Read rows 1 to 5
example_2 <- read_excel(example, range = cell_rows(1:5), col_names = TRUE)
dim(example_2)
Output:
## [1] 4 5
The example_2 object, however, has dimension 4×5. The iris dataset has 5 columns with a header. We return the first four rows, with the header, for all columns.
In case we want to import rows which do not begin at the first row, we have to include col_names = FALSE. If we use range = cell_rows(2:5), it becomes obvious that our data frame does not have a header anymore.
iris_row_with_header <- read_excel(example, range = cell_rows(2:3), col_names = TRUE)
iris_row_no_header <- read_excel(example, range = cell_rows(2:3), col_names = FALSE)
We can select the columns by letter, like in Excel.
# Select columns A and B
col <- read_excel(example, range = cell_cols("A:B"))
dim(col)
Output:
## [1] 150 2
Note: range = cell_cols("A:B") returns all cells with a non-null value. The dataset contains 150 rows; therefore, read_excel() returns rows up to 150. This is verified with the dim() function.
read_excel() returns NA for empty cells and for any string listed in the na argument. We can count the number of missing values with the combination of two functions
- sum
- is.na
Here is the code
iris_na <- read_excel(example, na = "setosa")
sum(is.na(iris_na))
Output:
## [1] 50
We have 50 values missing, which are the rows belonging to the setosa species.
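The total above does not say which column holds the missing values. As a small sketch, assuming readxl is installed and that the species column in the iris sheet is named Species, colSums() gives the count per column; na_per_column and complete_rows are illustrative names.

```r
library(readxl)

example <- readxl_example("datasets.xlsx")
iris_na <- read_excel(example, na = "setosa")

# Count missing values column by column instead of over the whole table.
na_per_column <- colSums(is.na(iris_na))
na_per_column

# Keep only the rows where the species is known.
complete_rows <- iris_na[!is.na(iris_na$Species), ]
nrow(complete_rows)
```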
Import data from other Statistical software
We will import different file formats with the haven package. This package supports SAS, STATA and SPSS software. We can use the following functions to open different types of dataset, according to the extension of the file:
- SAS: read_sas()
- STATA: read_dta() (or read_stata(), which are identical)
- SPSS: read_sav() or read_por(). We need to check the extension
Only one argument is required inside these functions. We need to know the PATH where the file is stored. That's it, we are ready to open all the files from SAS, STATA and SPSS. These three functions accept a URL as well.
library(haven)
haven comes with conda r-essential; otherwise, go to the link or, in the terminal, run conda install -c conda-forge r-haven
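Choosing the reader from the file extension can be wrapped in a small function. This is only a sketch, assuming haven is installed: read_stat_file is a hypothetical helper, not part of haven, and it is meant for local paths (a URL ending in ?raw=true has no extension that tools::file_ext() can detect).

```r
library(haven)

# Hypothetical helper: pick the haven reader that matches the file extension.
read_stat_file <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    sas7bdat = read_sas(path),
    dta      = read_dta(path),
    sav      = read_sav(path),
    por      = read_por(path),
    stop("Unsupported extension: ", ext)
  )
}

# Round trip with a temporary Stata file holding made-up values.
tmp <- tempfile(fileext = ".dta")
write_dta(data.frame(admit = c(0, 1), gre = c(380, 660)), tmp)
df <- read_stat_file(tmp)
head(df)
```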
Read sas
For our example, we are going to use the admission dataset from IDRE.
PATH_sas <- 'https://github.com/guru99-edu/R-Programming/blob/master/binary.sas7bdat?raw=true'
df <- read_sas(PATH_sas)
head(df)
Output:
## # A tibble: 6 x 4
##   ADMIT   GRE   GPA  RANK
##   <dbl> <dbl> <dbl> <dbl>
## 1     0   380  3.61     3
## 2     1   660  3.67     3
## 3     1   800  4.00     1
## 4     1   640  3.19     4
## 5     0   520  2.93     4
## 6     1   760  3.00     2
Read STATA
For STATA data files you can use read_dta(). We use exactly the same dataset, but stored in a .dta file.
PATH_stata <- 'https://github.com/guru99-edu/R-Programming/blob/master/binary.dta?raw=true'
df <- read_dta(PATH_stata)
head(df)
Output:
## # A tibble: 6 x 4
##   admit   gre   gpa  rank
##   <dbl> <dbl> <dbl> <dbl>
## 1     0   380  3.61     3
## 2     1   660  3.67     3
## 3     1   800  4.00     1
## 4     1   640  3.19     4
## 5     0   520  2.93     4
## 6     1   760  3.00     2
Read SPSS
We use the read_sav() function to open an SPSS file. The file extension is ".sav".
PATH_spss <- 'https://github.com/guru99-edu/R-Programming/blob/master/binary.sav?raw=true'
df <- read_sav(PATH_spss)
head(df)
Output:
## # A tibble: 6 x 4
##   admit   gre   gpa  rank
##   <dbl> <dbl> <dbl> <dbl>
## 1     0   380  3.61     3
## 2     1   660  3.67     3
## 3     1   800  4.00     1
## 4     1   640  3.19     4
## 5     0   520  2.93     4
## 6     1   760  3.00     2
Best practices for Data Import
When we want to import data into R, it is useful to follow this checklist. It will make it easy to import data correctly into R:
- The typical format for a spreadsheet is to use the first row as the header (usually the variable names).
- Avoid naming a dataset with blank spaces; the parts can be interpreted as separate variables. Prefer to use '_' or '-' instead.
- Short names are preferred
- Do not include symbols in the name, e.g. exchange_rate_$_€ is not correct. Prefer to name it exchange_rate_dollar_euro
- Use NA for missing values; otherwise, we need to clean the format later.
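The naming rules in this checklist can also be applied after import. Below is a minimal base R sketch; clean_names is a hypothetical helper written for this illustration (it is not a function from any package used in this tutorial), and the sample headers are invented.

```r
# Hypothetical helper: turn messy spreadsheet headers into safe, short R names.
clean_names <- function(x) {
  x <- trimws(x)
  x <- gsub("$", "dollar", x, fixed = TRUE)      # spell out the dollar sign
  x <- gsub("\u20ac", "euro", x, fixed = TRUE)   # spell out the euro sign
  x <- gsub("[^A-Za-z0-9]+", "_", x)             # blanks and other symbols -> "_"
  x <- gsub("^_+|_+$", "", x)                    # drop leading/trailing "_"
  tolower(x)
}

headers <- c("Exchange Rate $ \u20ac", " GDP per capita ")
clean_names(headers)
# [1] "exchange_rate_dollar_euro" "gdp_per_capita"
```

After reading a file, the helper could be applied with names(df) <- clean_names(names(df)).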
Summary
The following table summarizes the functions to use in order to import different types of files in R. Column one states the library related to the function. The last column refers to the default arguments.
Library | Objective | Function | Default Arguments |
---|---|---|---|
utils | Read CSV file | read.csv() | file, header = TRUE, sep = "," |
readxl | Read EXCEL file | read_excel() | path, range = NULL, col_names = TRUE |
haven | Read SAS file | read_sas() | path |
haven | Read STATA file | read_stata() | path |
haven | Read SPSS file | read_sav() | path |
The following table shows the different ways to import a selection with the read_excel() function.
Function | Objective | Arguments |
---|---|---|
read_excel() | Read n number of rows | n_max = 10 |
read_excel() | Select rows and columns like in Excel | range = "A1:D10" |
read_excel() | Select rows with indexes | range = cell_rows(1:3) |
read_excel() | Select columns with letters | range = cell_cols("A:C") |
Source: https://www.guru99.com/r-import-data.html