Statistical Analysis for Dataset

Closed - This job posting has been filled and work has been completed.
Data Science & Analytics Quantitative Analysis Posted 3 years ago

Fixed Price

Delivery by April 4, 2013




I am looking for an experienced statistician to help analyze a loan
portfolio.  The dataset has close to 75K loan records with 42
attributes per loan.  My interest is in a subset of the full dataset:
the loans that are 31 or more days past due or in default and were
originated since 2011 (5K records).  So that this subset can be
analyzed properly, I will provide the full dataset.  In essence, I am
trying to understand the characteristics of the loans that are
currently problematic in order to avoid similar loans in the future.
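
For illustration, the subset described above could be selected along the following lines. This is only a sketch: the field names (issue_date, days_past_due, status) and the sample records are assumptions, not the actual layout of the dataset.

```python
from datetime import date

# Hypothetical records; the field names are assumptions, not the real schema.
loans = [
    {"id": 1, "issue_date": date(2010, 6, 1), "days_past_due": 45, "status": "late"},
    {"id": 2, "issue_date": date(2011, 3, 15), "days_past_due": 0, "status": "current"},
    {"id": 3, "issue_date": date(2011, 9, 1), "days_past_due": 31, "status": "late"},
    {"id": 4, "issue_date": date(2012, 1, 20), "days_past_due": 0, "status": "default"},
]


def is_problem_loan(loan, since=date(2011, 1, 1), min_days_late=31):
    """True if the loan was originated on or after `since` and is
    at least `min_days_late` days past due or in default."""
    recent = loan["issue_date"] >= since
    troubled = loan["days_past_due"] >= min_days_late or loan["status"] == "default"
    return recent and troubled


# Keep the full list as well: the remaining loans serve as the comparison group.
problem_loans = [loan for loan in loans if is_problem_loan(loan)]
problem_ids = [loan["id"] for loan in problem_loans]
print(problem_ids)
```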

In order to accomplish this goal I am trying to identify the
combination of variables (loan attributes) that are statistically
valid in identifying future loan defaults.  Ideally, I also would
like to understand the weight of such factors in determining the

A couple of comments:
-I am not interested at this point in entertaining analysis regarding
profitability or return; therefore, staying away from risk-reward
analytics is preferable.
-I am particularly interested in identifying loans that default within
the first 6 months, 12 months and 18 months.
-Finally, some loan attributes are discrete (state of residency, loan
purpose), while others are continuous variables
(debt-to-income ratio, revolving line utilization).  If it
simplifies the analysis, I am fine with creating ranges for the
continuous variables.
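
As a sketch of the last two comments, the snippet below assigns a defaulted loan to the 6, 12 or 18 month window and groups a continuous variable into ranges. The function names, the treatment of window boundaries and the range edges are all assumptions made for illustration.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months elapsed from `start` to `end`."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months


def default_window(issue_date, default_date, windows=(6, 12, 18)):
    """Smallest window (in months) in which the default occurred, else None.

    Assumption: "within the first 6 months" means fewer than 6 whole
    months elapsed between origination and default.
    """
    if default_date is None:
        return None
    elapsed = months_between(issue_date, default_date)
    for window in windows:
        if elapsed < window:
            return window
    return None


def bin_label(value, edges):
    """Label the half-open range [lo, hi) that contains `value`."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi:
            return f"{lo}-{hi}"
    return f"{edges[-1]}+" if value >= edges[-1] else f"<{edges[0]}"


# Hypothetical debt-to-income edges, in percent.
dti_edges = [0, 10, 20, 30, 40]
print(default_window(date(2011, 1, 15), date(2011, 5, 20)))
print(bin_label(23.5, dti_edges))
```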

I welcome your input on the outcome of your analysis, but at
a minimum, here is what I want to receive:
-a descriptive analysis of the loans that i) were issued starting in
2011 and ii) are currently past due by 31 days or more, or
in default
-a list of the top 5 to 10 variables that are most relevant in
determining whether a loan will default (for example, State, % Credit
Line utilization, # of inquiries, FICO)
-a combination of factors that, after hypothesis testing,
could be used to infer default with a high degree of probability.  I
suspect there would be various combinations (e.g., Texas resident with
FICO of less than 700, or % Credit utilization xx with loan values
over $20K).
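
One possible way to test a single combination of factors is a chi-square test of independence on a 2x2 table (rule matched or not, defaulted or not). The sketch below is only an illustration of that technique, not a prescribed method, and the counts in it are made up.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table.

    a: rule matches, defaulted       b: rule matches, did not default
    c: rule fails, defaulted         d: rule fails, did not default
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one loan")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts for a hypothetical rule such as "Texas resident with FICO < 700".
a, b, c, d = 30, 70, 50, 850
chi2, p_value = chi_square_2x2(a, b, c, d)
rate_in_rule = a / (a + b)
rate_outside = c / (c + d)
print(f"default rate {rate_in_rule:.1%} vs {rate_outside:.1%}, "
      f"chi2={chi2:.2f}, p={p_value:.3g}")
```

When many combinations are screened this way, some will look significant by chance alone, so the significance threshold would need a multiple-comparison adjustment.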

I would be happy to discuss in more detail.
Please include an estimate of your price for this project.

Skills: analysis
