Data Science Practical

Python programming in Anaconda -> Jupyter Download Link


1. Using the dataset below, perform the following operations:

Dataset name: mtcars_DataDescription

a. Read the dataset.

b. Find the head of the dataset.

c. Find the data type of each column in the dataset.

d. From the given dataset ‘mtcars.csv’, plot a histogram to check the frequency distribution of the variable ‘mpg’ (Miles per gallon) and find the interval with the highest frequency.

e. What can be inferred from the scatter plot of ‘mpg’ (Miles per gallon) vs ‘wt’ (Weight of car) from the dataset mtcars.csv?


2. Using the dataset below, perform the following operations:

Dataset name: Churn_DataDescription

a. Find the number of duplicate records in the churn dataframe based on the customerID column.

b. In the churn dataframe, what is the total number of missing values for the variable TotalCharges?

c. From the churn dataframe, what is the average monthly charge paid by a customer for the services they have signed up for?

d. In the churn dataframe, under the variable Dependents, how many records have “1@#”?

e. Find the data type of the variable tenure from the churn dataframe.
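The churn questions above can be sketched with pandas as follows. The small dataframe here is a hypothetical stand-in built by hand, since the real file is not reproduced in this sheet; only the column names come from the questions (MonthlyCharges is an assumed name for the monthly charge column), and the real answers will differ.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the churn data; values are made up for illustration.
churn = pd.DataFrame({
    "customerID": ["A1", "A2", "A2", "A3", "A4"],
    "tenure": [1, 12, 12, 5, 30],
    "Dependents": ["Yes", "No", "No", "1@#", "Yes"],
    "MonthlyCharges": [20.0, 50.0, 50.0, 80.0, 30.0],  # assumed column name
    "TotalCharges": [20.0, np.nan, np.nan, 400.0, 900.0],
})

# a. Duplicate records based on customerID (every repeat after the first)
n_duplicates = churn.duplicated(subset="customerID").sum()

# b. Missing values in TotalCharges
n_missing = churn["TotalCharges"].isnull().sum()

# c. Average monthly charge
avg_monthly = churn["MonthlyCharges"].mean()

# d. Records where Dependents holds the junk value "1@#"
n_junk = (churn["Dependents"] == "1@#").sum()

# e. Data type of tenure
tenure_dtype = churn["tenure"].dtype

print(n_duplicates, n_missing, avg_monthly, n_junk, tenure_dtype)
```

With the real file, the dataframe would instead be loaded with pd.read_csv, and tenure may turn out to be of object type if it contains non-numeric entries.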


3. Using the dataset below, perform the following operations:

Dataset name: Diamond_DataDescription

a. Plot a boxplot for “price” vs “cut” from the dataset “diamond.csv”. Which of the categories under “cut” has the highest median price?

b. Create a frequency table (one-way table) for the variable “cut” from the dataset “diamond.csv”. What is the frequency for the cut type “Ideal”?

c. Show the subplot of the diamond carat weight distribution.

d. Show the subplot of the diamond depth distribution.

e. Build a model using linear regression and find its accuracy.


4. Use the dataset named “People Charm case.csv”, which deals with HR analytics, to answer the following questions:

a. Which of the variables have missing values?

b. What is the third quartile value for the variable “lastEvaluation”?

c. Construct a cross table for the variables “dept” and “salary” and find out which department has the highest frequency in the low salary category.

d. Generate a boxplot for the variable “numberOfProjects” and get the median number of projects the employees have worked on.

e. Plot a histogram using the variable “avgMonthlyHours” and find the range for the number of employees who worked 150 hours per month.

f. Generate a boxplot for the variables “lastEvaluation” and “numberOfProjects”.


5. Use the dataset named “People Charm case.csv”, which deals with HR analytics, to answer the following questions:

1. Build a Logistic Regression model using all the variables. Use 75% of the data as the training set and fix the random state as 2. What is the accuracy score of the model?

2. Build a Logistic Regression model using all the variables. Use 75% of the data as the training set and fix the random state as 2. How many samples are misclassified?

3. Build a k-Nearest Neighbors model using all the variables. Use 75% of the data as the training set, fix the random state as 0 and the k value as 2. What is the accuracy score of the model?


6. Problem Description:

Data from an online microlending platform has been collected. This data contains details of the purpose for which the loans would be used and how the loan is funded. Additional information on the country of the loan recipient and the poverty levels of the country is also given.

The goal is to determine whether a loan would be funded or not based on the available data.

Read the given data “lendingdata.csv”, save it as a dataframe called data, and answer the questions below:

1. How many columns are of ‘object’ data type?

2. Find the total number of missing values in the data set.

3. Identify which of the columns contain redundant information and can be dropped from the dataframe.

4. What is the third quartile value of the variable “loan_amount”?

5. What is the percentage split of the different categories in the column “repayment_interval” after dropping the missing values?

6. What is the minimum loan amount disbursed in the agriculture sector?

Reference video


7. Identify what the web page is about using NLTK in Python. Reference link


8. Detect whether a message is spam or ham using NLP in Python. Reference link


9. Simple application of sentiment analysis using natural language processing techniques. Reference link



Numpy List

1. Create an array using Numpy.

2. Create an array with more than one dimension using Numpy.

3. Create an array with a minimum number of dimensions using Numpy.

4. Check the data type of the following arrays using Numpy.

type1 = np.array([1, 2, 3, 4, 5, 6])
type2 = np.array([1.5, 2.5, 0.5, 6])
type3 = np.array(['a', 'b', 'c'])
type4 = np.array(["Canada", "Australia"], dtype='U5')
type5 = np.array([555, 666], dtype=float)
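A minimal sketch of the data type check, using the dtype attribute on the arrays listed above. The exact integer width of type1 depends on the platform and NumPy version, so it is printed rather than assumed.

```python
import numpy as np

type1 = np.array([1, 2, 3, 4, 5, 6])
type2 = np.array([1.5, 2.5, 0.5, 6])
type3 = np.array(['a', 'b', 'c'])
type4 = np.array(["Canada", "Australia"], dtype='U5')
type5 = np.array([555, 666], dtype=float)

# dtype is an attribute of every ndarray
for name, arr in [("type1", type1), ("type2", type2), ("type3", type3),
                  ("type4", type4), ("type5", type5)]:
    print(name, arr.dtype)

# dtype='U5' keeps at most 5 characters, so longer strings are truncated
print(type4)
```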

5. Check the shape of the following arrays using Numpy.

array1d = np.array([1, 2, 3, 4, 5, 6])

array2d = np.array([[1, 2, 3], [4, 5, 6]])

array3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
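One way to inspect these arrays is through the shape attribute, shown here together with ndim for comparison:

```python
import numpy as np

array1d = np.array([1, 2, 3, 4, 5, 6])
array2d = np.array([[1, 2, 3], [4, 5, 6]])
array3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# shape is a tuple with one entry per axis; ndim is the number of axes
print(array1d.shape, array1d.ndim)  # (6,) 1
print(array2d.shape, array2d.ndim)  # (2, 3) 2
print(array3d.shape, array3d.ndim)  # (2, 2, 3) 3
```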

6. Use the ndim attribute to determine the number of dimensions of a NumPy array.

7. Use the resize and reshape methods on a Numpy array.

8. Create a program to transform a list or tuple into a NumPy array.

9. Perform the following Indexing Operations using Numpy array.

array1d = np.array([1, 2, 3, 4, 5, 6])

1. Get first value

2. Get last value

3. Get 4th value from first

4. Get 5th value from last

5. Get multiple values
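The five indexing operations above might look like this. The positions chosen for "multiple values" are an arbitrary illustration.

```python
import numpy as np

array1d = np.array([1, 2, 3, 4, 5, 6])

first = array1d[0]             # first value
last = array1d[-1]             # last value
fourth = array1d[3]            # 4th value from the start (index 3)
fifth_from_last = array1d[-5]  # 5th value counting back from the end
several = array1d[[0, 2, 4]]   # multiple values via a list of indices

print(first, last, fourth, fifth_from_last, several)
```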

10. Perform the following Indexing Operations using Numpy array.

array2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

1. Get first row first col

2. Get first row second col

3. Get first row second col

4. Get second row second col

11. Perform the following Indexing Operations using Numpy array.

array3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

12. Perform the following Single Dimensional Slicing Operations using Numpy array.

array1d = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

1. From index 4 to last index

2. From index 0 to index 4

3. From index 4(included) up to index 7(excluded)

4. Excluded last element

5. Up to second last index(negative index)

6. From last to first in reverse order(negative step)

7. All odd numbers in reversed order

8. All even numbers in reversed order

9. All elements
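A possible set of slices for the nine cases above. Two readings are assumptions: "from index 0 to index 4" is taken to include index 4, and "up to second last index" is taken to stop before the second-last element.

```python
import numpy as np

array1d = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

s1 = array1d[4:]       # from index 4 to the last index
s2 = array1d[:5]       # index 0 to 4 (assumed inclusive of index 4)
s3 = array1d[4:7]      # index 4 (included) up to index 7 (excluded)
s4 = array1d[:-1]      # everything except the last element
s5 = array1d[:-2]      # stops before the second-last element (assumed reading)
s6 = array1d[::-1]     # last to first, negative step
s7 = array1d[::-2]     # odd numbers in reversed order: 9, 7, 5, 3, 1
s8 = array1d[-2::-2]   # even numbers in reversed order: 8, 6, 4, 2, 0
s9 = array1d[:]        # all elements

for s in (s1, s2, s3, s4, s5, s6, s7, s8, s9):
    print(s)
```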

13. Perform the following Multidimensional Slicing Operations using Numpy array.

array2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

1. 2nd and 3rd col

2. 2nd and 3rd row

3. Reverse an array
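For the 2D case, each axis gets its own slice, separated by a comma. In this sketch "reverse an array" is interpreted as reversing both rows and columns, which is one of several reasonable readings.

```python
import numpy as np

array2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

cols_2_3 = array2d[:, 1:]          # all rows, 2nd and 3rd columns
rows_2_3 = array2d[1:, :]          # 2nd and 3rd rows, all columns
reversed_2d = array2d[::-1, ::-1]  # reverse along both axes (assumed meaning)

print(cols_2_3)
print(rows_2_3)
print(reversed_2d)
```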

14. Perform the following operations to manipulate the dimensions and shape of arrays (flipping the order of the axes).

array2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

1. Permute the dimensions of an array

2. Flip array in the left/right direction

3. Flip array in the up/down direction

4. Rotate an array by 90 degrees in the plane specified by axes
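These four operations correspond to np.transpose, np.fliplr, np.flipud and np.rot90. A short sketch, using the default rotation plane (axes 0 and 1):

```python
import numpy as np

array2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

permuted = np.transpose(array2d)          # permute (swap) the axes
flipped_lr = np.fliplr(array2d)           # reverse the column order
flipped_ud = np.flipud(array2d)           # reverse the row order
rotated = np.rot90(array2d, axes=(0, 1))  # 90 degrees counterclockwise

print(permuted)
print(flipped_lr)
print(flipped_ud)
print(rotated)
```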

15. Perform the following operations to manipulate the dimensions and shape of arrays (joining and stacking).

array1 = np.array([[1, 2, 3], [4, 5, 6]])

array2 = np.array([[7, 8, 9], [10, 11, 12]])

1. Stack arrays in sequence horizontally (column wise).

2. Stack arrays in sequence vertically (row wise)

3. Stack arrays in sequence depth wise (along third axis)

4. Appending arrays after each other, along a given axis

5. Append values to the end of an array
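The joining and stacking operations above map onto np.hstack, np.vstack, np.dstack, np.concatenate and np.append. The values appended in the last step are an arbitrary choice for illustration.

```python
import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])

h = np.hstack((array1, array2))               # column wise, shape (2, 6)
v = np.vstack((array1, array2))               # row wise, shape (4, 3)
d = np.dstack((array1, array2))               # along a third axis, shape (2, 3, 2)
c = np.concatenate((array1, array2), axis=0)  # join along a chosen axis
a = np.append(array1, [7, 8, 9])              # no axis given: result is flattened

print(h.shape, v.shape, d.shape, c.shape, a.shape)
```

Note that np.append without an axis argument flattens its inputs, so the result here is one-dimensional.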

16. Perform the following Arithmetic Operations using Numpy Array.

array1 = np.array([[1, 2, 3], [4, 5, 6]])

array2 = np.array([[7, 8, 9], [10, 11, 12]])

1. array1 + array2

2. array1 - array2

3. array1 * array2

4. array2 / array1

5. array1 ** array2

17. Perform the following Scalar Arithmetic Operations using Numpy Array.

array1 = np.array([[10, 20, 30], [40, 50, 60]])

1. array1 + 2

2. array1 - 5

3. array1 * 2

4. array1 / 5

5. array1 ** 2


18. Perform the following Elementary Mathematical Functions using Numpy Array.

array1 = np.array([[10, 20, 30], [40, 50, 60]])

1. sin(array1)

2. cos(array1)

3. tan(array1)

4. sqrt(array1)

5. exp(array1)

6. log10(array1)

19. Perform the following Element-wise Mathematical Operations using Numpy Array.

array1 = np.array([[10, 20, 30], [40, 50, 60]])

array2 = np.array([[2, 3, 4], [4, 6, 8]])

array3 = np.array([[-2, 3.5, -4], [4.05, -6, 8]])

1. Addition of array1 and array2

2. Multiplication of array1 and array2

3. Power of array1 and array2

20. Perform the following Aggregate and Statistical Functions using Numpy Array.

array1 = np.array([[10, 20, 30], [40, 50, 60]])

1. Mean

2. Standard deviation

3. Variance

4. Sum of array elements

5. Product of array elements
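A brief sketch of the five aggregates, computed over all elements of the array. Passing an axis argument would give per-row or per-column results instead.

```python
import numpy as np

array1 = np.array([[10, 20, 30], [40, 50, 60]])

mean = np.mean(array1)     # arithmetic mean of all six elements
std = np.std(array1)       # population standard deviation (ddof=0)
var = np.var(array1)       # population variance (ddof=0)
total = np.sum(array1)     # sum of all elements
product = np.prod(array1)  # product of all elements

print(mean, std, var, total, product)
```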

21. Use the where(), select() and choose() functions on the array below: if an element is less than 4, multiply it by 2; otherwise multiply it by 3.

np.array([[1, 2, 3], [4, 5, 6]])
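The same rule can be expressed with each of the three functions. This is one possible formulation; the variable name arr is introduced here for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# where: pick from the second argument where the condition holds, else the third
with_where = np.where(arr < 4, arr * 2, arr * 3)

# select: a list of conditions paired with a list of choices
with_select = np.select([arr < 4, arr >= 4], [arr * 2, arr * 3])

# choose: an integer index array selects between the candidate arrays
# (index 0 -> arr * 2, index 1 -> arr * 3)
index = (arr >= 4).astype(int)
with_choose = np.choose(index, [arr * 2, arr * 3])

print(with_where)
print(with_select)
print(with_choose)
```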

22. Perform the following Logical Operations using Numpy Array.

thearray = np.array([[10, 20, 30], [14, 24, 36]])

1. logical_or(Condition array<10, array>15)

2. logical_and(Condition array<10, array>15)

3. logical_not(Condition array<20)
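The three logical operations combine boolean masks element by element, as in this sketch:

```python
import numpy as np

thearray = np.array([[10, 20, 30], [14, 24, 36]])

# True where either condition holds
either = np.logical_or(thearray < 10, thearray > 15)

# True where both hold; no value is both < 10 and > 15, so all False
both = np.logical_and(thearray < 10, thearray > 15)

# Inverts the mask: True where the value is NOT < 20
not_small = np.logical_not(thearray < 20)

print(either)
print(both)
print(not_small)
```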

23. Perform the following Standard Set Operations using Numpy Array.

array1 = np.array([[10, 20, 30], [14, 24, 36]])

array2 = np.array([[20, 40, 50], [24, 34, 46]])

1. Find the union of two arrays

2. Find the intersection of two arrays

3. Find the set difference of two arrays
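NumPy's set routines flatten their inputs and return sorted, unique 1D results. A minimal sketch, taking the set difference as the elements of array1 that are not in array2:

```python
import numpy as np

array1 = np.array([[10, 20, 30], [14, 24, 36]])
array2 = np.array([[20, 40, 50], [24, 34, 46]])

union = np.union1d(array1, array2)          # values present in either array
common = np.intersect1d(array1, array2)     # values present in both arrays
only_in_1 = np.setdiff1d(array1, array2)    # values in array1 but not in array2

print(union)
print(common)
print(only_in_1)
```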