Compiling reproducible research is an essential step in the data science process. The aim is to make your analytic data and code available so that others can reproduce the same findings, which gives your research outcomes verifiable scientific support.

Here are some of the research projects in R that Mohammed Barakat produced during his study in the Data Science track. You can download PDF copies of the papers or visit his RPubs page, where you can read and reproduce them.

Visit the web version of the research in RPubs

House Prices: Ensembled Prediction Models

Mohammed K. Barakat
March 24, 2018

Ask a home buyer to describe their dream house, and they probably won’t begin with the height of the basement ceiling or the proximity to an east-west railroad. This Kaggle competition’s dataset proves that there are many more house features that influence price negotiations than the number of bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this research tries to predict the final price of each home.

The Ames Housing dataset was compiled by Dean De Cock for use in data science education. It’s an incredible alternative for data scientists looking for a modernized and expanded version of the often-cited Boston Housing dataset.

Titanic Tragedy: Survival Prediction with Machine Learning

Mohammed K. Barakat
February 9, 2018

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.

One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.

This research tries to answer the question “what sorts of people were likely to survive?” Using machine learning tools and techniques, the research predicts which passengers survived the tragedy.
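The group differences described above can be sketched with the `Titanic` table that ships with base R. This is an assumption made for illustration: that table is an aggregated count of 2201 people, not the passenger-level Kaggle data used in the research, and the weighted logistic regression below is a simple stand-in rather than the models the research actually fits.

```r
# Built-in 4-way contingency table: Class x Sex x Age x Survived
# (NOT the Kaggle dataset used in the research)
data(Titanic)

# Survival proportions within each sex and within each class
by_sex   <- prop.table(margin.table(Titanic, c(2, 4)), margin = 1)
by_class <- prop.table(margin.table(Titanic, c(1, 4)), margin = 1)
print(round(by_sex, 3))
print(round(by_class, 3))

# Illustrative model only: logistic regression on the aggregated counts,
# using the cell frequencies as weights
titanic_df <- as.data.frame(Titanic)
fit <- glm(Survived ~ Class + Sex + Age,
           family = binomial, weights = Freq, data = titanic_df)
print(round(coef(fit), 3))
```

A positive coefficient for `SexFemale` in this sketch corresponds to higher odds of survival for women than for men, holding class and age group fixed.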

More about the RMS Titanic tragedy is available on Wikipedia

Visit the web version of the research in RPubs

Predicting the Usefulness of a Yelp Review Using Machine Learning

Author: Mohammed K. Barakat
November 8, 2015

For many purchased products or offered services, there is usually a way to reflect the customer’s experience through online reviews. Yelp’s website collects such reviews of various businesses.

This research tries to answer the question *“Can we predict how useful a user’s review of a business will be, both by predicting the number of useful votes the review will receive and by predicting a ‘usefulness’ category for the review?”* The prediction algorithm is based on the user’s profile and on quantitative features of the review they write. This analysis is expected to be of interest to yelp.com, Yelpers, and businesses, as it helps surface potentially useful reviews as soon as they are posted, which can improve businesses and provide quicker recommendations to potential customers.

Download the research paper or visit the web version in RPubs

Using Machine Learning to Recognize the Quality of Performing Weightlifting Exercise

Mohammed K. Barakat
August 18, 2015

This Human Activity Recognition analysis focuses on recognizing how well weightlifting exercises are performed. The approach investigates “how (well)” an activity is performed by the participant.

Six young participants were asked to perform one set of 10 repetitions of the Unilateral Dumbbell Biceps Curl in five different fashions: according to specification (Class A), throwing the elbows to the front (Class B), lifting the dumbbell only halfway (Class C), lowering the dumbbell only halfway (Class D), and throwing the hips to the front (Class E). Class A corresponds to the specified (ideal) execution of the exercise, while the other 4 classes correspond to common mistakes.

With the classe variable as the outcome and other variables in the training dataset as predictors, this machine learning analysis aims to predict the manner in which the participants did the exercise.

Download the research paper or visit the web version in RPubs

Statistical Evidence on Manual versus Automatic Cars for better Fuel Consumption

Mohammed K. Barakat
July 31, 2015

By analyzing a dataset of cars (mtcars), this study explores the relationship between miles per gallon (MPG) and a set of other car features. We are particularly interested in finding out whether automatic or manual transmission is better for MPG. The study identifies which transmission type performs better and quantifies the difference.

The study uses the mtcars dataset and employs several statistical techniques to reach a robust conclusion. In summary, the study concludes that manual-transmission cars achieve better MPG than automatic ones. In addition, MPG has a statistically significant relationship with car weight and quarter-mile time (acceleration).
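A minimal sketch of this kind of comparison, using the `mtcars` data that ships with base R. The specific techniques here (a Welch t-test and a linear model with weight and quarter-mile time as covariates) are assumptions chosen for illustration; the paper itself may use a different model specification.

```r
# mtcars is in the base datasets package; am: 0 = automatic, 1 = manual
data(mtcars)

# Welch two-sample t-test: mean MPG by transmission type
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$estimate)   # group means (group 0 = automatic, group 1 = manual)
print(tt$p.value)

# Linear model adjusting for weight (wt) and quarter-mile time (qsec)
fit <- lm(mpg ~ factor(am) + wt + qsec, data = mtcars)
print(summary(fit)$coefficients)
```

The `factor(am)1` coefficient in this sketch estimates the MPG difference of manual over automatic cars once weight and quarter-mile time are held fixed.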

Download the research paper or visit the web version in RPubs

Using the Length of Odontoblasts (Teeth Cells) of Guinea Pigs as Indicator of Vitamin C Intake

by: Mohammed K. Barakat
July 1, 2015

This report presents a study of the effect of Vitamin C intake on the length of odontoblasts (teeth cells) in 60 guinea pigs. The response (cell length) is measured at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid).
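The design described above matches the `ToothGrowth` dataset bundled with base R, so a rough sketch of the comparison could look like the following. The use of Welch t-tests is an assumption for illustration and may not be the exact procedure in the report.

```r
# ToothGrowth: len (cell length), supp (OJ = orange juice, VC = ascorbic acid),
# dose (0.5, 1, 2)
data(ToothGrowth)

# Mean length for each combination of delivery method and dose
means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(means)

# Welch t-test: delivery method, pooling all doses
by_supp <- t.test(len ~ supp, data = ToothGrowth)
print(by_supp$p.value)

# Welch t-test: lowest dose versus highest dose
low_high <- subset(ToothGrowth, dose %in% c(0.5, 2))
by_dose <- t.test(len ~ dose, data = low_high)
print(by_dose$p.value)
```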

Download the research paper or visit the web version in RPubs

Exponential Distribution and the Central Limit Theorem: A Simulation in R

by: Mohammed K. Barakat
July 1, 2015

This analysis investigates the exponential distribution and how it relates to the Central Limit Theorem. The analysis is performed in R, where a simulation illustrates the properties of the distribution of the means of 40 exponentials.
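A simulation of this kind can be sketched as follows. The rate `lambda = 0.2`, the number of simulations, and the seed are assumptions picked for illustration; only the sample size of 40 comes from the description above.

```r
set.seed(1)        # fixed seed so the run is repeatable
lambda <- 0.2      # assumed rate parameter (not stated in the abstract)
n      <- 40       # exponentials averaged per simulation
n_sims <- 1000     # assumed number of simulated means

# Each replicate draws 40 exponentials and keeps their mean
sample_means <- replicate(n_sims, mean(rexp(n, rate = lambda)))

# Theory: an exponential has mean 1/lambda and sd 1/lambda,
# so the mean of n draws has sd (1/lambda) / sqrt(n)
theory_mean <- 1 / lambda
theory_sd   <- (1 / lambda) / sqrt(n)

cat("mean of sample means:", mean(sample_means), "vs theory:", theory_mean, "\n")
cat("sd of sample means:  ", sd(sample_means), "vs theory:", theory_sd, "\n")

# Histogram of the simulated means with the normal approximation overlaid
hist(sample_means, breaks = 30, freq = FALSE,
     main = "Means of 40 exponentials", xlab = "Sample mean")
curve(dnorm(x, mean = theory_mean, sd = theory_sd), add = TRUE)
```

With enough simulations, the histogram should look roughly bell-shaped and centered near `1 / lambda`, even though the individual exponential draws are strongly skewed.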

Download the research paper or visit the web version in RPubs

Damage to Public Health and Economy in the USA was Highly Affected by Only a Few Storm Types between 1950 and 2011

by: Mohammed K. Barakat

Storm events can cause both public health and economic problems for communities. Many severe events can result in fatalities, injuries, and property damage.

Our hypothesis is that only a few storm types account for most of the overall health and economic damage recorded in the United States between 1950 and 2011. We therefore explore the US National Oceanic and Atmospheric Administration’s (NOAA) storm database and use the Pareto principle to test our hypothesis.

From this database, we found that one storm type was indisputably the most harmful to public health, whereas economic consequences were driven largely by three specific types.

Visit the web version of the research in RPubs