The file "tckn_est_obj_unit.pas" contains the complete core of the code used to fit the CKMR-based stock assessment described in the accompanying "NC" paper (Bravington/Grewe/Davies, 2016: "Absolute abundance of southern bluefin tuna estimated by close-kin mark-recapture"; Nature Communications; unless otherwise stated, all cross-references here are to that paper). Specifically, the code shows how the log-likelihood  (here "lglk") was calculated according to CKMR principles, also incorporating auxiliary data such as length- and age-compositions. The input data include the list of identified kin-pairs; the actual finding of those pairs is done separately, as described thoroughly in the NC paper.

The code is in Pascal, but there are no special tricks in the lglk section, which reads very much like C. It forms part of an R package which handles the entire data organization and estimation, as summarized in "Methods: Parameter estimation" subsection. Because of non-back-compatible changes since 2013 in R itself and supporting software, the package no longer runs on current R versions, unfortunately. I am in the process of updating the package to address that, and will upload the full package to this address when complete. However, the key insights are all in the Pascal code provided here; running the package simply produces the results in the NC paper.

Because the code has to deal with real data and "corner cases", it is unavoidably complex, with details well beyond what can be described in the NC paper or Supplement. Although inspecting the code may prove useful to anyone doing their own CKMR analysis, the complexity may also obscure the (actually rather simple) principles. For general understanding, it is better to start with the NC paper and Supplement. Further statistical aspects specific to this SBT study may be found in the "FRDC report" where this work was first described, especially section 15.  The citation is

"Fishery-independent estimate of spawning biomass of Southern Bluefin Tuna through identification of close-kin using genetic markers"
Mark V. Bravington, Peter M. Grewe and Campbell R. Davies: 
CSIRO, Hobart, Australia
2014
Final report 2007/034 for Fisheries Research and Development Corporation, Australia
2014
http://frdc.com.au/research/Final_Reports/2007-034-DLD.pdf

For statistical and theoretical background on CKMR in general, and beyond our SBT study, see also this recent paper:

"Close-Kin Mark-Recapture"
Mark V. Bravington, Hans J. Skaug, and Eric C. Anderson
2016
Statistical Science


WHERE TO LOOK IN THE CODE

For understanding CKMR, the key is just lines 858--859, which calculate Expected Relative Reproductive Output (ERRO) as per eqn (2). These feature in Binomial log-likelihood terms in lines 860, 894, and 918, as explained below. Much of the rest of the code calculates the raw materials required for the ERRO calculations. A considerable portion is devoted to important but profoundly dull book-keeping, e.g. handling "oversize" and "undersize" fish in a logically consistent fashion. 

The code is exactly as used in 2013 to produce the results in the NC paper--- it has not been "cleaned up for inspection". Unfortunately, this does mean that comments are laconic, and that a substantial portion of the code is duplicated and never actually used (the duplication was a device to get round some technical limitations).


LINE-BY-LINE BREAKDOWN

This mainly covers the lglk calculation, which is all done in one long routine. A couple of other sections are briefly mentioned afterwards.

310--955:  Notwithstanding the routine name 'lata_lglk', this routine actually computes all 4 components of the lglk in eqn (3). Its argument is a vector of "raw" parameters passed from R (eg by 'nlminb'), each on domain (-Inf, +Inf).

361--672: Unpack/untransform the raw parameters into meaningful biological/demographic variables, eg all numbers-at-age-and-year, length-at-age, fecundity-at-size, selectivities, mortality rates.

	638--672: Relative fecundity by sex, length, and/or age, plus quantities required to interpolate fecundity of an adult in the years prior to its capture (see lines 1046--1076, described below). These fecundities are required later to compute ERRO as per eqn (2).

676--679: Apply tiny penalty to all parameters except random effects. This provides numerical stability for parameters that are necessary for demographic book-keeping, but basically unimportant, and hard to estimate precisely. 

694--755: Lglk for age-subsample data [ Lambda 'a|ls' in eqn (3) ], and length/sex composition data [ Lambda 'ls' in eqn (3) ].

  706: 'a|ls' multinomial lglk
  
  714: 'ls' multinomial lglk. Note that these length/sex composition data show some overdispersion, the extent of which is pre-estimated externally. The "data" actually fitted here have been pre-adjusted to compensate before being passed to 'lata_lglk', so that they can be treated approximately as true multinomials. 

	722--733: Add lglk for the (few) fish where length was measured but (by accident) not sex.

	735--755: Prepare to compute total RO from fish too small to appear in the samples (a minor contribution); qv section 15.3.3 of FRDC report. Needed later in CKMR part of lglk, and computed here for efficiency.

760--786: Total RO by year and sex; will constitute the denominator of ERRO.

802--929: True CKMR part of lglk, ie Lambda 'g' in eqn (3). Also handles age data for identified parents (see note on lines 838 and 922).

  834--835: "Retrojected" individual growth based on current length and age. Individual growth follows a von Bertalanffy curves with individual L-infinity, assumed to be a random variable with a t-distribution around the sex-specific mean.
  
  838: Conditional distribution of age for identified parents.

	858--859: The key lines in the whole program: compute ERRO for a specific type of adult-juvenile comparison.

  861: Multiply by the number of such comparisons, to get expected POPs for that type of comparison. (Not used directly in lglk calculation, but important for diagnostics.)
  
  869--891: Tedious book-keeping to deal with adults for which some covariates were not measured.
  
  894: CKMR lglk *as if* there were no recaptures (kin-pairs). The probability distribution is Binomial, with "size" equal to the number of pairwise comparisons with those particular covariates, and probability being the corresponding ERRO.
  
  898--921: Adjust CKMR lglk by looping over each individual recapture that *did* happen (only a small number, 45 in this case); see Line 918. This is computationally efficient and numerically stable.
  
  922: Add lglk component for measured age in this parent, given their other covariates and the fact that they are an identified parent. The rationale for this breakdown is fairly complicated; see section 15.2 of FRDC report.
  
  924--929: Book-keeping adjustments for those parents where some covariates were not measured. (As it turned out, there weren't any.)
  
  931--943: Random-effect lglk contribution for recruitment deviations.
  
  945--955: Allowance for uncertainty in the parameters of daily fecundity-at-size, which were estimated in a completely separate study (Farley et al.; ref 22 in NC paper). The uncertainty is encapsulated via random effects with known variance.
  
1046--1076: retrospective individual fecundity at length-and-sex. Measured length is an integer (in centimetres), but projected length in the past is not, so cubic spline interpolation is used. In rare cases, mild extrapolation is required, but it is well-behaved. 

129--249: Data setup, passed in from R. Quantities computed here do not depend on parameters, and are unchanged during estimation.

1078--end: Unused duplicates of parts of 'lata_lglk'.

The remainder of the code is mostly variable declarations, and accessor functions for retrieving computed quantities to be called from R.
