Analysis of Forest Fires Dataset

Thu, Feb 1, 2018

In this mini project, I use the Forest Fires data set, available from the UCI Machine Learning Repository, to perform a model and feature selection task.

Response Variable and Predictors:

Response Variable: area, the burned area of the forest.

  • The original paper modeled this variable after a ln(x+1) transformation, since it is heavily skewed towards 0.0. After fitting the models, the outputs were post-processed with the inverse of the ln(x+1) transform.
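As a minimal sketch of the transform and its inverse, the snippet below applies ln(x+1) to a handful of made-up area values and then maps them back. The numbers are hypothetical and only illustrate the round trip; they are not taken from the analysis.

```python
import math

# Hypothetical burned areas; most values sit at or near zero,
# with an occasional very large fire.
areas = [0.0, 0.0, 0.36, 2.14, 10.73, 1090.84]

# Forward transform: ln(x + 1) is defined at zero and compresses the long right tail.
log_areas = [math.log1p(a) for a in areas]

# Inverse transform: exp(y) - 1 maps values on the log scale back to the original scale.
recovered = [math.expm1(y) for y in log_areas]

for a, y, r in zip(areas, log_areas, recovered):
    print(f"area={a:8.2f}  log1p={y:6.3f}  back={r:8.2f}")
```

In a real workflow the model would be trained on the transformed target, and the inverse would be applied to its predictions before reporting errors on the original scale.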

Predictors: The categorical variables month and day need to be encoded as dummy variables.
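One possible way to build the dummy variables is pandas' get_dummies, sketched below on a tiny made-up sample. The rows are hypothetical, and dropping the first level of each variable (drop_first=True) is an assumption of this sketch, not necessarily what the analysis code does.

```python
import pandas as pd

# Tiny hypothetical sample with the two categorical columns and one numeric column.
df = pd.DataFrame({
    "month": ["mar", "oct", "aug", "aug"],
    "day": ["fri", "tue", "sat", "sun"],
    "temp": [8.2, 18.0, 14.6, 22.1],
})

# One indicator column per category level. drop_first removes one level per
# variable so the indicators are not perfectly collinear in a linear model.
encoded = pd.get_dummies(df, columns=["month", "day"], drop_first=True)
print(encoded.columns.tolist())
```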

The area variable before log(area+1) transformation:

[Figure: histogram of area before the log(area+1) transformation]

The area variable after log(area+1) transformation:

[Figure: histogram of area after the log(area+1) transformation]

As the histograms show, the log transformation spreads out the area variable, which is otherwise concentrated near zero.

Model and Feature Selection Process:

I will also try to predict the area variable with regression models.

  • First, I fit Random Forest Regression with depth-limiting (pruning) hyperparameters on all features.
  • Then I use Lasso (L1 regularization) Regression and ElasticNet (L1+L2 regularization) Regression to select features. I do not use Ridge (L2 regularization), since it does not shrink any feature weights exactly to zero.
  • As the last step, I fit Random Forest Regression with depth-limiting hyperparameters on the features selected by Lasso and on those selected by ElasticNet.
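The selection-then-refit steps above can be sketched with scikit-learn roughly as follows. This is an illustration on synthetic data, not the project's actual code: the data, the alpha and l1_ratio values, the max_depth setting, and the helper selected_features are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 samples, 8 features, only features 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=200)

# Scale first so the penalty treats every feature on the same footing.
X_scaled = StandardScaler().fit_transform(X)


def selected_features(model):
    """Fit a penalised linear model and return the indices of non-zero weights."""
    model.fit(X_scaled, y)
    return np.flatnonzero(model.coef_)


lasso_idx = selected_features(Lasso(alpha=0.1))
enet_idx = selected_features(ElasticNet(alpha=0.1, l1_ratio=0.5))

# Refit a depth-limited random forest on each selected subset.
for name, idx in [("lasso", lasso_idx), ("elasticnet", enet_idx)]:
    forest = RandomForestRegressor(n_estimators=100, max_depth=4, random_state=0)
    forest.fit(X[:, idx], y)
    print(name, idx.tolist(), round(forest.score(X[:, idx], y), 3))
```

The L1 part of the penalty is what drives some weights exactly to zero, so the non-zero entries of coef_ can serve as the selected feature set. In practice the penalty strength would be tuned, for example by cross-validation, rather than fixed by hand as it is here.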

How to Run the Analysis

Clone the repo and follow the instructions in the Forest Fire Analysis Project Repository.

The analysis code can be found here.

Please click here for the Project Report

Data Information

The Forest Fires data is available at UCI; to reach it, please click here.

The citation to this data set:

[Cortez and Morais, 2007] P. Cortez and A. Morais. A Data Mining Approach to Predict Forest Fires using Meteorological Data. In J. Neves, M. F. Santos and J. Machado Eds., New Trends in Artificial Intelligence, Proceedings of the 13th EPIA 2007 - Portuguese Conference on Artificial Intelligence, December, Guimarães, Portugal, pp. 512-523, 2007. APPIA, ISBN-13 978-989-95618-0-9. Available at: http://www.dsi.uminho.pt/~pcortez/fires.pdf