Straightforward R code for contingency analysis

Completed Posted Apr 10, 2014 Paid on delivery

The project involves contingency analysis of data that follow the binomial distribution. Imagine a system composed of n points that we observed over a certain time (observation times are in years, and they differ among locations). At each point we can observe no events, one event, or several events during the observation period, which is specific to that point. During one year, a point may have either no event or exactly one event. It follows that during one year the system may exhibit anywhere between 0 and n events, where n is the number of points that are active (recording) in that year.

We need to know the probability of observing k simultaneous events across our system during a year, given

1 - number of sites covering a particular period

2 - total number of years with such events during that period, and

3 - length of that period, in years

The code should evaluate the theoretically expected frequencies of years in which events are recorded at 0, 1, ..., n points, where n is the maximum number of events observed at different points during one year. In other words, we need to calculate joint probabilities of event occurrence. To calculate expected frequencies we assume a binomial distribution of the events:

SEE FORMULA IN ATTACHED FILE

where N is the total number of recording points in the analysis of a specific period; X is the number of events in a single year; p is the probability of a site exhibiting an event in any year; and q is the complement of this probability (q = 1 - p). The differences between expected and observed frequencies are to be assessed with the Chi-square test.
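To illustrate the calculation described above, here is a minimal R sketch. The attached formula is not reproduced in this posting, so the sketch assumes it is the standard binomial probability, P(X) = choose(N, X) * p^X * q^(N - X), which R provides as `dbinom`. It also assumes that N is constant over the period and that p is estimated as the share of site-years with an event. The function names and the example numbers are hypothetical.

```r
# Expected number of years with 0..N simultaneous events under a binomial model.
# Assumptions (not from the attached file): N is fixed over the period and
# p is estimated as total_events / (n_sites * n_years).
expected_frequencies <- function(n_sites, n_years, total_events) {
  p <- total_events / (n_sites * n_years)
  k <- 0:n_sites
  prob <- dbinom(k, size = n_sites, prob = p)  # P(X = k)
  data.frame(events = k, prob = prob, expected = n_years * prob)
}

# Pearson chi-square statistic; classes with zero expected count are skipped.
chi_square_stat <- function(observed, expected) {
  keep <- expected > 0
  sum((observed[keep] - expected[keep])^2 / expected[keep])
}

# Hypothetical example: 5 sites, 40 years, 30 site-years with an event.
ef <- expected_frequencies(n_sites = 5, n_years = 40, total_events = 30)
observed <- c(18, 15, 6, 1, 0, 0)  # years with 0, 1, ..., 5 events
stat <- chi_square_stat(observed, ef$expected)

# Degrees of freedom: classes - 1 - one estimated parameter (p).
p_value <- pchisq(stat, df = length(observed) - 2, lower.tail = FALSE)
```

With few years, the expected counts in the upper classes become very small, so in practice those classes would probably need to be pooled before the test is interpreted.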

Finally, the results should be bootstrapped. The bootstrapping part should be arranged as follows:

1. bootstrapping operates on a moving time frame of a length specified by the user.

2. frame moves from the start of the specified period to its end, shifting by one year at a time.

3. at each frame position, the code calculates the required statistics.
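The three steps above can be sketched in R as follows. This is only an outline under stated assumptions: the input is a vector with the number of events per year, the number of sites is the same in every year, and the function name `window_stats` is hypothetical.

```r
# Slide a frame of frame_len years over the series, one year at a time,
# and compute observed/expected frequencies and chi-square at each position.
# Assumption: n_sites is constant across all years.
window_stats <- function(n_events, n_sites, frame_len) {
  stopifnot(frame_len <= length(n_events))
  starts <- seq_len(length(n_events) - frame_len + 1)
  lapply(starts, function(s) {
    x <- n_events[s:(s + frame_len - 1)]
    k <- 0:n_sites
    # Observed number of years with 0, 1, ..., n_sites events in this frame.
    observed <- as.vector(table(factor(x, levels = k)))
    # Assumed estimate of p: share of site-years with an event in the frame.
    p <- sum(x) / (n_sites * frame_len)
    expected <- frame_len * dbinom(k, size = n_sites, prob = p)
    keep <- expected > 0
    chi2 <- sum((observed[keep] - expected[keep])^2 / expected[keep])
    list(start = s, observed = observed, expected = expected, chi2 = chi2)
  })
}

# Hypothetical series of yearly event counts for a system of 3 sites.
n_events <- c(0, 1, 0, 2, 1, 0, 0, 3, 1, 0)
res <- window_stats(n_events, n_sites = 3, frame_len = 5)
```

Each element of `res` corresponds to one frame position, so the results can be collected into a table indexed by the first year of the frame.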

The code will read data files containing (a) the record of events at each point and (b) the time period covered by each point (i.e. the period when that point was "recording" events, so to say).
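As a rough illustration of how the two inputs could be combined, the R sketch below derives the number of active points and the number of events for each year. The file layout is not specified in this posting, so the column names (`site`, `year`, `start`, `end`) and the function name `yearly_counts` are assumptions made for the example only.

```r
# Combine (a) event records and (b) coverage periods into yearly counts.
# Assumed layout: events has columns site, year; coverage has site, start, end.
yearly_counts <- function(events, coverage, first_year, last_year) {
  years <- first_year:last_year
  # active[i, j] is TRUE when site i was recording in year j.
  active <- outer(seq_len(nrow(coverage)), years,
                  function(i, y) coverage$start[i] <= y & y <= coverage$end[i])
  # A site contributes at most one event per year, so drop duplicate rows.
  ev <- unique(events[events$year %in% years, c("site", "year")])
  n_events <- as.vector(table(factor(ev$year, levels = years)))
  data.frame(year = years, n_active = colSums(active), n_events = n_events)
}

# Hypothetical input data.
coverage <- data.frame(site = c("A", "B", "C"),
                       start = c(1900, 1905, 1910),
                       end = c(1920, 1920, 1915))
events <- data.frame(site = c("A", "B", "A", "C"),
                     year = c(1906, 1906, 1912, 1912))
yc <- yearly_counts(events, coverage, first_year = 1904, last_year = 1913)
```

The sketch does not check that each event falls inside the coverage period of its site; real input would probably need that validation.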

As user input, the code should take the length of the period to analyse (the data in the original input file will cover a longer period than the one we are interested in) and the length of the time frame for bootstrapping.

As output, I would need observed and expected frequencies of years with 0, 1 ... n events, and respective Chi-square statistics, for each position of the time frame. All these variables should come with bootstrap-generated 90% and 95% confidence envelopes.
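One possible way to obtain such envelopes for a single frame position is sketched below in R. It assumes a plain percentile bootstrap in which the years inside the frame are resampled with replacement, and a constant number of sites; the resampling scheme and the function name `bootstrap_frame` are assumptions, not part of the specification.

```r
# Percentile bootstrap envelopes for one frame of yearly event counts.
# Assumptions: years within the frame are resampled with replacement,
# and n_sites is constant across the frame.
bootstrap_frame <- function(x, n_sites, n_boot = 1000) {
  k <- 0:n_sites
  frame_stats <- function(v) {
    observed <- as.vector(table(factor(v, levels = k)))
    p <- sum(v) / (n_sites * length(v))
    expected <- length(v) * dbinom(k, size = n_sites, prob = p)
    keep <- expected > 0
    chi2 <- sum((observed[keep] - expected[keep])^2 / expected[keep])
    c(setNames(observed, paste0("obs_", k)),
      setNames(expected, paste0("exp_", k)),
      chi2 = chi2)
  }
  # Rows are statistics, columns are bootstrap replicates.
  reps <- replicate(n_boot, frame_stats(sample(x, replace = TRUE)))
  # Bounds of the 95% envelope (2.5%, 97.5%) and the 90% envelope (5%, 95%).
  t(apply(reps, 1, quantile, probs = c(0.025, 0.05, 0.95, 0.975)))
}

set.seed(1)  # for reproducible resampling in this example
env <- bootstrap_frame(c(0, 1, 0, 2, 1, 0, 0, 3, 1, 0), n_sites = 3, n_boot = 200)
```

The result has one row per statistic (observed and expected frequency of each class, plus the Chi-square value) and one column per envelope bound; repeating the call at every frame position gives the full output.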

I need the code to be written in R and well commented. Please don't bid on this project if you cannot write the code in R.

An Excel file with in-cell formulas to calculate expected frequencies will be provided. This file will also contain examples.

Algorithm Mathematics Odd Jobs Statistics

Project ID: #5785533

About the project

5 proposals Remote project Active Apr 10, 2014

Awarded to:

extinct

Hi, Many thanks for taking your time to write up an elaborate description, that was really helpful. I have read through and I can deliver the code in R. It happens that I am also a masters level statistician which I t...

$130 CAD in 3 days
(12 Reviews)
5.2

5 freelancers are bidding on average $137 for this job

kmittal

The project looks simpler than the ones I have worked on till now. It will take 1-2 days but I have kept a margin of 2 more days because of the weekend.

$160 CAD in 4 days
(17 Reviews)
4.8
mshoaib123

Hi Brother, I am Data Scientist working in Multinational Company. My work is to see the hidden pattern in the large and complex data sets and predictive analytics, Data mining, Machine Learning and also uses the stati...

$166 CAD in 3 days
(4 Reviews)
2.9
lytvynenko

A proposal has not yet been provided

$130 CAD in 3 days
(0 Reviews)
1.3
nikhilgupta84

Hi, I am Nikhil from India. I can do this project quite easily. You will get 100% accuracy and satisfaction. Thanks, Nikhil Gupta

$111 CAD in 3 days
(0 Reviews)
0.0