The application of diffuse pollution models included in EUROHARP encompassed varying levels of parameterisation and approaches to the preparation of input data, depending on the model and modelling team involved. Modellers consistently faced important decisions in relation to data interpretation, especially in those catchments with unfamiliar physical or climatic characteristics, where catchment conditions were beyond the range for which a particular model was originally developed, or where only limited input data were available. In addition to a broad discussion of data issues, this paper compares the performance of the four sub-annual output models tested in EUROHARP (EveNFlow, NL-CAT, SWAT and TRK) in three test catchments, applied without the modelling teams having sight of measured flow and nitrate concentration data. Model performance in this "blind test" indicates that the range of predictions generated by any individual model pre- and post-calibration exceeds the differences between the estimates yielded by all four models. Comparison of Analysis of Variance (ANOVA) statistics for simulated and observed flow, concentrations and loads underscores the benefits of calibration for these intermediate and complex model formulations. Interpretation of input data (e.g. the rainfall interpolation method and pedotransfer functions selected) appeared to be as important as, or more important than, process representation. In the absence of calibration data, modeller unfamiliarity with a particular catchment and its environmental processes sometimes resulted in questionable assumptions and input errors, which highlight the problems facing modellers charged with implementing policies under the Water Framework Directive (2000/60/EC) in poorly monitored catchments. Catchment data owners and modellers must therefore work more closely together, given that the output from diffuse pollution models is clearly modeller-limited as well as model-limited.