Papers by Kevin Hammond (and others)

This page contains details of publications by Kevin Hammond and other members of the Functional Programming Groups at Glasgow, Heriot-Watt and St Andrews with whom I have been collaborating. The list is ordered by date of publication (and should probably be reordered by topic!). Papers relating to the Parade project can be found in the Glasgow archive.


Author: H-W. Loidl
Title: "Granularity in Large-Scale Parallel Functional Programming"

Pubn: PhD thesis, Department of Computing Science, University of Glasgow.
Date: March 1998.
Format: GZipped PostScript

Abstract: This thesis demonstrates how to reduce the runtime of large non-strict functional programs using parallel evaluation. The parallelisation of several programs shows the importance of granularity, i.e. the computation costs of program expressions. The aspect of granularity is studied both on a practical level, by presenting and measuring runtime granularity improvement mechanisms, and at a more formal level, by devising a static granularity analysis.

By parallelising several large functional programs this thesis demonstrates for the first time the advantages of combining lazy and parallel evaluation on a large scale: laziness aids modularity, while parallelism reduces runtime. One of the parallel programs is the Lolita system which, with more than 47,000 lines of code, is the largest existing parallel non-strict functional program. A new mechanism for parallel programming, evaluation strategies, to which this thesis contributes, is shown to be useful in this parallelisation. Evaluation strategies simplify parallel programming by separating algorithmic code from code specifying dynamic behaviour. For large programs the abstraction provided by functions is maintained by using a data-oriented style of parallelism, which defines parallelism over intermediate data structures rather than inside the functions.

A highly parameterised simulator, GranSim, has been constructed collaboratively and is discussed in detail in this thesis. GranSim is a tool for architecture-independent parallelisation and a testbed for implementing runtime-system features of the parallel graph reduction model. By providing an idealised as well as an accurate model of the underlying parallel machine, GranSim has proven to be an essential part of an integrated parallel software engineering environment. Several parallel runtime-system features, such as granularity improvement mechanisms, have been tested via GranSim. It is publicly available and in active use at several universities worldwide.

In order to provide granularity information this thesis presents an inference-based static granularity analysis. This analysis combines two existing analyses, one for cost and one for size information. It determines an upper bound for the computation costs of evaluating an expression in a simple strict higher-order language. By exposing recurrences during cost reconstruction and using a library of recurrences and their closed forms, it is possible to infer the costs for some recursive functions. The possible performance improvements are assessed by measuring the parallel performance of a hand-analysed and annotated program.


Author: P.W. Trinder, K. Hammond, H.-W. Loidl, S.L. Peyton Jones
Title: "Algorithm + Strategy = Parallelism" (HTML)
Pubn: In Journal of Functional Programming, 8(1)
Date: Jan 1998
Format: GZipped PostScript  and HTML

Abstract: The process of writing large parallel programs is complicated by the need to specify both the parallel behaviour of the program and the algorithm that is to be used to compute its result. This paper introduces evaluation strategies, lazy higher-order functions that control the parallel evaluation of non-strict functional languages. Using evaluation strategies, it is possible to achieve a clean separation between algorithmic and behavioural code. The result is enhanced clarity and shorter parallel programs.

Evaluation strategies are a very general concept: this paper shows how they can be used to model a wide range of commonly used programming paradigms, including divide-and-conquer, pipeline parallelism, producer/consumer parallelism, and data-oriented parallelism. Because they are based on unrestricted higher-order functions, they can also capture irregular parallel structures. Evaluation strategies are not just of theoretical interest: they have evolved out of our experience in parallelising several large-scale parallel applications, where they have proved invaluable in helping to manage the complexities of parallel behaviour. Some of these applications are described in detail here. The largest application we have studied to date, Lolita, is a 60,000 line natural language engineering system. Initial results show that for these programs we can achieve acceptable parallel performance, for relatively little programming effort.
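
For readers unfamiliar with the idea, here is a minimal sketch of the core strategy combinators in the style of the paper (the exact library has since evolved; par is assumed to come from the GpH parallel package):

  -- A minimal sketch, assuming the GpH primitives 'par' and 'seq'
  -- (par is available today via the 'parallel' package).
  import Control.Parallel (par)

  type Strategy a = a -> ()

  using :: a -> Strategy a -> a
  using x s = s x `seq` x

  r0 :: Strategy a            -- evaluate nothing
  r0 _ = ()

  rwhnf :: Strategy a         -- evaluate to weak head normal form
  rwhnf x = x `seq` ()

  parList :: Strategy a -> Strategy [a]
  parList _     []     = ()
  parList strat (x:xs) = strat x `par` parList strat xs

  -- Algorithmic code (map f) and behavioural code (parList rwhnf) stay separate:
  parMap :: (a -> b) -> [a] -> [b]
  parMap f xs = map f xs `using` parList rwhnf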


Author: P. Trinder, K. Hammond, H-W. Loidl, S. Peyton Jones, J. Wu
Title: "Go-faster Haskell; Or: Data-intensive Programming in Parallel Haskell"
Pubn: Submitted to ICFP'98 --- International Conference on Functional Programming.


Author: H-W. Loidl, P. Trinder
Title: "Engineering Large Parallel Functional Programs"
Pubn: In IFL'97 --- International Workshop on the Implementation of Functional Languages 1997, LNCS, St. Andrews, Scotland, UK.
Date: Sep 10-12, 1997.
Format: GZipped PostScript

Abstract: The design and implementation of useful programming languages, whether sequential or parallel, should be driven by large, realistic applications. In constructing several medium- and large-scale programs in Glasgow Parallel Haskell, GPH, a parallel extension of Haskell, the group at Glasgow has investigated several important engineering issues:


Author: H-W. Loidl, R. Morgan, P. Trinder, S. Poria, C. Cooper, S. Peyton Jones, R. Garigliano
Title: "Parallelising a Large Functional Program; Or: Keeping LOLITA Busy"
Pubn: In IFL'97 --- International Workshop on the Implementation of Functional Languages 1997, LNCS, St. Andrews, Scotland, UK.
Date: Sep 10-12, 1997.
Format: GZipped PostScript

Abstract: In this paper we report on the ongoing parallelisation of Lolita, a natural language engineering system. Although Lolita currently exhibits only modest parallelism, we believe that it is the largest parallel functional program ever, comprising more than 47,000 lines of Haskell. Lolita has the following interesting features common to real world applications of lazy languages:

Our expectations in parallelising the program were to achieve moderate speedups with small changes in the code. To date, speedups of up to 2.4 have been achieved for Lolita running under a realistic simulation of our 4-processor shared-memory target machine. Most notably, the parallelism is achieved with a very small number of changes to, and without requiring an understanding of, most of the application. On the Sun SPARCserver target machine, wall-clock speedup is currently limited by physical memory availability.


Author: H-W. Loidl, K. Hammond
Title: "Making a Packet: Cost-Effective Communication for a Parallel Graph Reducer "
Pubn: In IFL'96 --- Intl Workshop on the Implementation of Functional Languages 1996, Bonn/Bad-Godesberg, Germany, Sep 16 -- 18, 1996.
LNCS 1268, pp. 184--199, Springer-Verlag.
Date: 1996
Format: GZipped PostScript

Abstract: This paper studies critical runtime-system issues encountered when packing data for transmission in a lazy, parallel graph reduction system. In particular, we aim to answer two questions:

In order to answer the first question, we compare various packing schemes, of which one extreme packs just the node that is demanded (``incremental fetching''), and the other packs all the graph that is reachable from that node (``bulk fetching''). The second question is addressed by considering various mechanisms for latency hiding during communication, ranging from fully synchronous communication with no attempt to mask latency, to full thread migration during asynchronous communication.

In order to make our results as general as possible, we have used the GranSim simulator to study a wide variety of parallel machine configurations. Based on these measurements we propose concrete improvements for parallel graph reducers such as the GUM implementation of Glasgow Parallel Haskell.
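
To make the two extremes concrete, here is a small, purely illustrative Haskell model of packing (the real code lives in the GUM/GranSim runtime system and is not written in Haskell; the types and the packGraph function below are invented for exposition):

  -- Illustrative model only: a heap graph as a map from node identifiers to
  -- nodes; packing collects nodes breadth-first up to a packet-size limit.
  import qualified Data.Map as M

  type NodeId = Int
  data Node   = Node { payload :: String, children :: [NodeId] }
  type Heap   = M.Map NodeId Node

  -- Pack the graph reachable from 'root', stopping after 'limit' nodes.
  -- limit = 1        approximates incremental fetching (just the demanded node);
  -- limit = maxBound approximates bulk fetching (everything reachable).
  packGraph :: Int -> Heap -> NodeId -> [(NodeId, Node)]
  packGraph limit heap root = go [root] [] 0
    where
      go []       acc _ = reverse acc
      go (i:rest) acc n
        | n >= limit             = reverse acc
        | any ((== i) . fst) acc = go rest acc n          -- already packed
        | otherwise = case M.lookup i heap of
            Nothing   -> go rest acc n                    -- remote or missing node
            Just node -> go (rest ++ children node) ((i, node) : acc) (n + 1)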


Author: P.W. Trinder, K. Hammond, H-W. Loidl, S. Peyton Jones, J. Wu
Title: "A Case Study of Data-intensive Programs in Parallel Haskell"
Pubn: In Glasgow Workshop on Functional Programming 1996, Ullapool, Scotland, Jul. 8--10, 1996.
Date: 1996
Format: GZipped PostScript

Abstract: "An invisible car came out of nowhere, struck my vehicle and vanished." "I pulled away from the side of the road, glanced at my mother-in-law, and headed for the embankment." "As I approached the intersection a sign suddenly appeared in a place where no stop sign had ever appeared before."

Luckily, we don't normally have to deal with problems as bizarre as these. One interesting application that does arise at the Centre for Transport Studies consists of matching police reports of several accidents so as to locate accident blackspots. The application provides an interesting, data-intensive, test-bed for the persistent functional language PFL. We report here on an approach aimed at improving the performance of this application using Glasgow Parallel Haskell.

The accident application is one of several large parallel Haskell programs under development at Glasgow. Our objective is to achieve wall-clock speedups over the best sequential implementations, and we report modest wall-clock speedups for a demonstration program. From experience with these and other programs the group is developing a methodology for parallelising large functional programs. We have also developed strategies, a mechanism to separately specify a function's algorithm and its dynamic behaviour.


Author: H-W. Loidl, K. Hammond.
Title: "A Sized Time System for a Parallel Functional Language"
Pubn: In Glasgow Workshop on Functional Programming 1996, Ullapool, Scotland, Jul. 8--10, 1996.
Date: 1996
Format: GZipped PostScript

Abstract: This paper describes an inference system, whose purpose is to determine the cost of evaluating expressions in a strict purely functional language. Upper bounds can be derived for both computation cost and the size of data structures. We outline a static analysis based on this inference system for inferring size and cost information. The analysis is a synthesis of the sized types of Hughes et al., and the polymorphic time system of Dornic et al., which was extended to static dependent costs by Reistad and Gifford.

Our main interest in cost information is for scheduling tasks in the parallel execution of functional languages. Using the GranSim parallel simulator, we show that the information provided by our analysis is sufficient to characterise relative task granularities for a simple functional program. This information can be used in the runtime-system of the Glasgow Parallel Haskell compiler to improve dynamic program performance.
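
To give a flavour of the information inferred, the short sketch below annotates two list functions with the sort of size and cost bounds such an analysis produces. The notation in the comments (constants c0, c1) is informal and illustrative, not the paper's:

  -- Illustrative only: the bounds in the comments are written informally and
  -- do not reproduce the paper's inference rules or notation.
  length' :: [a] -> Int
  -- argument size n  =>  result bounded by n;  cost bounded by c0 + c1*n
  length' []     = 0
  length' (_:xs) = 1 + length' xs

  append :: [a] -> [a] -> [a]
  -- argument sizes n, m  =>  result size n + m;  cost bounded by c0 + c1*n
  append []     ys = ys
  append (x:xs) ys = x : append xs ys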


Author: P.W. Trinder, K. Hammond, J.S. Mattson Jr, A. Partridge, and S.L. Peyton Jones
Title: "GUM: a Portable Parallel Implementation of Haskell"
Pubn: PLDI'96 --- Proceedings of Programming Language Design and Implementation, Philadelphia, USA, 1996 (an earlier, more detailed version of this paper has been published in IFL'95 --- Intl. Workshop on the Implementation of Functional Languages, Bastad, Sweden, September 1995)
Date: 1996/1995
Format: GZipped PostScript

Abstract: GUM is a portable, parallel implementation of the Haskell functional language. Despite sustained research interest in parallel functional programming, GUM is one of the first such systems to be made publicly available. GUM is message-based, and portability is facilitated by using the PVM communications harness that is available on many multi-processors. As a result, GUM is available for both shared-memory (Sun SPARCserver multiprocessors) and distributed-memory (networks of workstations) architectures. The high message-latency of distributed machines is ameliorated by sending messages asynchronously, and by sending large packets of related data in each message. Initial performance figures demonstrate absolute speedups relative to the best sequential compiler technology. To improve the performance of a parallel Haskell program GUM provides tools for monitoring and visualising the behaviour of threads and of processors during execution.


Author: Tony Davie and Kevin Hammond
Title: "Functional Hypersheets" (HTML)
Pubn: 1996 Glasgow Workshop on Functional Programming
Date: 1996
Format: HTML

Abstract: We propose to integrate lazy functional programming, hyperprogramming and persistence into a program development environment that permits either single-user or distributed value and program construction. The views of the system that users will see will be via generalised spreadsheets containing formulae for the development of values of any type. These hypersheets will constitute a local or distributed network of hyperlinks. Hyperlinks to objects and the values they refer to will be able to evolve with time; the values obtained when a user evaluates an object defined in a hypersheet will persist as long as any version of the object is accessible. We advocate an environment in which hyperlinks and the values they refer to undergo controlled evolution. This provides an incremental progression of values towards their normal form as users' solutions to problems develop.


Author: Pieter Hartel, Marc Feeley et al.
Title: "Benchmarking Implementations of Functional Languages with ``Pseudoknot'', a Float-Intensive Benchmark" (HTML)
Pubn: Journal of Functional Programming
Date: 1996
Format: HTML

Abstract

Over 25 implementations of different functional languages are benchmarked using the same program, a floating-point intensive application taken from molecular biology. The principal aspects studied are compile time and execution time for the various implementations. An important consideration is how the program can be modified and tuned to obtain maximal performance on each language implementation.

With few exceptions, the compilers take a significant amount of time to compile this program, though most compilers were faster than the current GNU C compiler (GCC). Compilers that generate C or Lisp are often slower than those that generate native code directly; the cost of compiling the intermediate form is normally a large fraction of the total compilation time.

There is no clear distinction between the runtime performance of eager and lazy implementations when appropriate annotations are used: lazy implementations have clearly come of age when it comes to implementing largely strict applications, such as the Pseudoknot program. The speed of C can be approached by some implementations, but to achieve this performance, special measures such as strictness annotations are required by non-strict implementations.

The benchmark results have to be interpreted with care. Firstly, a benchmark based on a single program cannot cover a wide spectrum of ``typical'' applications. Secondly, the many compilers used are varied in the kind and level of optimisations offered, so the efforts to obtain optimal versions of the program are similarly varied.


Author: Andrew Gordon and Kevin Hammond
Title: "Monadic I/O in Haskell 1.3" (HTML)
Pubn: Workshop on Haskell, La Jolla, CA
Date: June 1995
Format: HTML

Abstract

We describe the design and use of monadic I/O in Haskell 1.3, the latest revision of the lazy functional programming language Haskell. Haskell 1.3 standardises the monadic I/O mechanisms now available in many Haskell systems. The new facilities allow fairly sophisticated text-based application programs to be written portably in Haskell. The standard provides implementors with a flexible framework for extending Haskell to incorporate new language features. Apart from the use of monads, the main advances over the previous standard are: character I/O based on handles (analogous to ANSI C file pointers), an error handling mechanism, terminal interrupt handling and a POSIX interface. Apart from a tutorial description of the new facilities we include a worked example: a derived monad for combinator parsing.
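
As a brief illustration of the handle-based style, the sketch below copies a file while upper-casing it. It uses present-day module names (System.IO, Data.Char); Haskell 1.3 arranged its libraries slightly differently, but the operations shown (openFile, hGetContents, hPutStr, hClose) are those the standard introduced:

  import System.IO
  import Data.Char (toUpper)

  -- Copy one file to another, converting the text to upper case.
  copyUpperCase :: FilePath -> FilePath -> IO ()
  copyUpperCase inFile outFile = do
    hIn  <- openFile inFile  ReadMode
    hOut <- openFile outFile WriteMode
    contents <- hGetContents hIn          -- lazy read via the input handle
    hPutStr hOut (map toUpper contents)   -- write the transformed text
    hClose hOut
    hClose hIn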


Author: H-W. Loidl and K. Hammond.
Title: "On the Granularity of Divide-and-Conquer Parallelism"
Pubn: In Glasgow Workshop on Functional Programming 1995, Ullapool, Scotland, Jul. 8--10, 1995. Springer Verlag.
Date: 1995
Format: GZipped PostScript

Abstract: This paper studies the runtime behaviour of various parallel divide-and-conquer algorithms written in a non-strict functional language when three common granularity control mechanisms are used: a simple cut-off, priority thread creation, and priority scheduling. These mechanisms use granularity information, currently provided via annotations, to improve the performance of the parallel programs.

The programs we examine are several variants of a generic divide-and-conquer program, an unbalanced divide-and-conquer algorithm and a parallel determinant computation. Our results indicate that for balanced computation trees a simple, low-overhead mechanism performs well whereas the more complex mechanisms offer further improvements for unbalanced computation trees.
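
For concreteness, here is a minimal sketch of the simplest of these mechanisms, a depth cut-off, written as a divide-and-conquer skeleton in GpH style. The function names and the use of par/pseq from the parallel package are assumptions for illustration, not the paper's code:

  import Control.Parallel (par, pseq)

  -- Below the cut-off depth the skeleton stops sparking new threads and
  -- evaluates sequentially, so only reasonably large subproblems become tasks.
  divConq :: Int                  -- cut-off depth
          -> (a -> Bool)          -- is the problem trivial?
          -> (a -> b)             -- solve a trivial problem
          -> (a -> (a, a))        -- divide
          -> (b -> b -> b)        -- combine
          -> a -> b
  divConq depth trivial solve divide combine = go depth
    where
      go d x
        | trivial x = solve x
        | d <= 0    = combine (go 0 l) (go 0 r)      -- sequential below cut-off
        | otherwise = right `par` (left `pseq` combine left right)
        where
          (l, r) = divide x
          left   = go (d - 1) l
          right  = go (d - 1) r

  -- e.g. a parallel Fibonacci: divConq 10 (< 2) id (\n -> (n - 1, n - 2)) (+)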


Author: K. Hammond, H-W. Loidl, and A.S. Partridge.
Title: "Visualising Granularity in Parallel Programs: A Graphical Winnowing System for Haskell" (HTML)
Pubn: In HPFC'95 --- Conf. on High Performance Functional Computing, pp. 208--221, Denver, Colorado, Apr. 10--12, 1995.
Date: 1995
Format: GZipped PostScript  and HTML

Abstract: To take advantage of distributed-memory parallel machines it is essential to have good control of task granularity. This paper describes a fairly accurate parallel simulator for Haskell, based on the Glasgow compiler, and complementary tools for visualising task granularities. Together these tools allow us to study the effects of various annotations on task granularity on a variety of simulated parallel architectures. They also provide a more precise tool for the study of parallel execution than has previously been available for Haskell programs. These tools have already confirmed that thread migration is essential in parallel systems, demonstrated a close correlation between thread execution times and total heap allocations, and shown that fetching data synchronously normally gives better overall performance than asynchronous fetching, if data is fetched on demand.


Author: K. Hammond, P.W. Trinder
Title: "Database Manipulation in Haskell 1.3 "
Pubn: In Proceedings of the 1995 Glasgow Workshop on Functional Programming, Ullapool, Scotland, July 1995. Springer-Verlag.
Date: 1995


Author: K. Hammond, H-W. Loidl, J.S. Mattson Jr, A. Partridge, S.L. Peyton Jones, and P.W. Trinder
Title: "GRAPHing the Future"
Pubn: In IFL'94 --- Intl. Workshop on the Implementation of Functional Languages, Norwich, England, Sep. 7--9, 1994.
Date: 1994
Format: GZipped PostScript

Abstract: At Glasgow our research into parallel functional programming has been moving away from our novel architecture, GRIP, towards the provision of a general parallel runtime environment. We call this GRAPH (Graph Reduction for an Assortment of Parallel Hardware).

This paper describes the design of a new memory and load management model for GRAPH, which is intended to match shared- and distributed-memory machines better and to provide a framework for our research into parallel functional databases and granularity control. This model is currently under implementation. No performance results can therefore be provided at the present time.


Author: H-W. Loidl, K. Hammond
Title: "GRAPH for PVM: Graph Reduction for Distributed Hardware"
Pubn: In IFL'94 --- Intl. Workshop on the Implementation of Functional Languages , Norwich, England, Sep. 7--9, 1994.
Date: 1994
Format: GZipped PostScript

Abstract: We describe a version of the GRAPH system (Graph Reduction for an Assortment of Parallel Hardware) designed to execute parallel functional programs on a range of distributed-memory MIMD machines running the PVM communications harness. GRAPH was developed from the runtime system for the novel GRIP multiprocessor. Although this system has proved highly successful, its novel architecture makes it hard to compare results with other architectures or implementations. The CPUs used in the GRIP design are also rather old.

The principal extensions from GRAPH for GRIP to GRAPH for PVM are intended to handle high latencies more efficiently, by exploiting asynchronous communication, multi-threaded scheduling, and by grouping packets into larger entities where possible. The main innovation is the development of new, sophisticated packet flushing and packet fetching algorithms whose purpose is to allow graphs to be exported to and fetched from global memory without inter-processor synchronisation. As communication and reduction can be interleaved, a new form of synchronising communication and reduction is necessary.

We also give a brief outlook on the design of a new portable distributed graph reduction system, GRAPH for UMM, which adapts the ideas and techniques originally developed for the GRIP system to a distributed-memory environment.

GRAPH for PVM has been implemented on a network of Sun workstations and is currently being tested and debugged. Although no extensive performance measurements have been performed yet, the available data shows a significant decrease in overall communication and thus an improvement in the overall performance of the system.


Author: K. Hammond
Title: "Parallel Functional Programming: An Introduction" (HTML)
Pubn: In PASCO'94 --- First Intl. Symposium on Parallel Symbolic Computation, Hagenberg/Linz, Austria, 26-28 September. World Scientific.
Date: 1994
Format: Compressed PostScript and HTML

Abstract: This paper introduces the general area of parallel functional programming, surveying the current state of research and suggesting areas which could profitably be explored in the future. No new results are presented. The paper contains 97 references selected from the 500 or so publications in this field.


Author: K. Hammond, J.S. Mattson Jr, and S.L. Peyton Jones.
Title: "Automatic Spark Strategies and Granularity for a Parallel Functional Language Reducer "
Pubn: In CONPAR'94 --- Conference on Algorithms and Hardware for Parallel Processing, Linz, Austria, September 6-8. LNCS 854, Springer Verlag
Date: 1994
Format: Compressed PostScript

Abstract: This paper considers the issue of dynamic task control in the context of a parallel Haskell implementation on the GRIP multiprocessor. For the first time, we report the effect of our task control strategies on task granularity, as measured in terms of dynamic heap allocations. This gives a concrete means of measuring the effectiveness of these strategies other than wall-clock timings, which are notoriously uninformative.


Author: Cordelia V Hall, Kevin Hammond, Simon L Peyton Jones and Philip L Wadler
Title: "Type Classes in Haskell" (HTML)
Pubn: ESOP '94 (Extended version in TOPLAS)
Date: January 1994
Format: Compressed PostScript and HTML

Abstract

This paper defines a set of type inference rules for resolving overloading introduced by type classes. Programs including type classes are transformed into ones which may be typed by the Hindley-Milner inference rules. In contrast to other work on type classes, the rules presented here relate directly to user programs. An innovative aspect of this work is the use of second-order lambda calculus to record type information in the program.
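
A small (illustrative, not the paper's) example of the transformation: the class becomes a record of methods (a "dictionary"), the instance becomes a value of that record type, and overloaded functions take the dictionary as an extra argument:

  class MyEq a where
    eq :: a -> a -> Bool

  instance MyEq Int where
    eq = (==)

  member :: MyEq a => a -> [a] -> Bool
  member x = any (eq x)

  -- The translation, written out by hand:
  data EqD a = EqD { eqD :: a -> a -> Bool }   -- dictionary of methods

  eqDInt :: EqD Int                            -- the Int instance as a value
  eqDInt = EqD (==)

  memberD :: EqD a -> a -> [a] -> Bool         -- dictionary passed explicitly
  memberD d x = any (eqD d x)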


Author: G. Akerholt, K. Hammond, S.L. Peyton Jones, P. Trinder
Title: "Processing Transactions on GRIP"
Pubn: In PARLE'93 --- Parallel Languages and Architectures Europe, Munich, Germany, June 1993
Date: 1993
Format: GZipped PostScript

Abstract: The GRIP architecture allows efficient execution of functional programs on a multi-processor built from standard hardware components. State-of-the-art compilation techniques are combined with sophisticated runtime resource-control to give good parallel performance. This paper reports the results of running GRIP on an application which is apparently unsuited to the basic functional model: a database transaction manager incorporating updates as well as lookup transactions. The results obtained show good relative speedups for GRIP, with real performance advantages over the same application executing on sequential machines.


Author: Simon L Peyton Jones, Cordy Hall, Kevin Hammond, Will Partain and Philip Wadler
Title: "The Glasgow Haskell compiler: a technical overview"
Pubn: Proc. UK Joint Framework for Information Technology (JFIT) Technical Conference, Keele, 1993.
Date: March 1993
Format: Compressed Postscript

Abstract

We give an overview of the Glasgow Haskell compiler, focusing especially on the way in which we have been able to exploit the rich theory of functional languages to give very practical improvements in the compiler.

The compiler is portable, modular, generates good code, and is freely available.


Author: K. Hammond
Title: "Getting a GRIP "
Pubn: In IFL'93 --- Intl. Workshop on the Parallel Implementation of Functional Languages, Nijmegen, The Netherlands, September 1993.
Date: 1993
Format: GZipped PostScript

Abstract: This paper describes a portable implementation of the Graph Reduction in Parallel (GRIP) software, built on the Parallel Virtual Machine (PVM) process control system. Using this system, it is easy for anyone to ``get a GRIP'' without needing to invest in special-purpose hardware. An important contribution of this paper is its detailed description of GRIP's Intelligent Memory Unit (IMU) software.


Author: K. Hammond and S.L. Peyton Jones
Title: "Profiling Scheduling Strategies on the GRIP Multiprocessor "
Pubn: In IFL'92 --- Fourth Intl. Workshop on the Parallel Implementation of Functional Languages, pp. 73-98, RWTH Aachen, Germany, September 1992.
Date: 1992
Format: GZipped PostScript

Abstract: It is widely claimed that functional languages are particularly suitable for programming parallel computers. A claimed advantage is that the programmer is not burdened with details of task creation, placement, scheduling, and synchronisation, these decisions being taken by the system instead. Leaving aside the question of whether a pure functional language is expressive enough to encompass all the parallel algorithms we might wish to program, there remains the question of how effectively the compiler and run-time system map the program onto a real parallel system, a task usually carried out mostly by the programmer. This is the question we address in this paper.

We first introduce the system architecture of GRIP, a shared-memory parallel machine supporting an implementation of the functional language Haskell. GRIP executes functional programs in parallel using compiled supercombinator graph reduction, a form of declarative rule system.

We then describe several strategies for run-time resource control which we have tried, presenting comprehensive measurements of their effectiveness. We are particularly concerned with strategies controlling task creation, in order to improve task granularity and minimise communication overheads. This is, so far as we know, one of the first attempts to make a systematic study of task-control strategies in a high-performance parallel functional-language system. GRIP's high absolute performance renders these results credible for real applications.


Author: G. Akerholt, K. Hammond, S.L. Peyton Jones, P.W. Trinder
Title: "A Parallel Functional Database for GRIP"
Pubn: In IFL'91 --- Intl Workshop on the Parallel Implementation of Functional Languages, pp. 7-30, Southampton, UK, June 5-7, 1991.
Date: 1991

Abstract: GRIP is a shared-memory multiprocessor designed for efficient parallel evaluation of functional languages using compiled graph reduction. In this paper, we consider the feasibility of implementing a database manager on GRIP, and present results obtained from a pilot implementation. A database implemented in a pure functional language must be modified non-destructively, i.e. the original database must be preserved and a new copy constructed. The naive implementation provides evidence for the feasibility of a pure functional database in the form of modest real-time speed-ups and acceptable real-time performance. The functional database is also used to investigate the GRIP architecture, compared with an idealised machine. The particular features investigated are the thread-creation costs and the caching of GRIP's distributed memory.
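
The non-destructive update discipline can be illustrated with a small sketch (not the paper's code): if the store is a binary search tree, an update builds a new spine while sharing every unchanged subtree with the old database, so the original remains intact:

  -- Illustrative persistent store: updating yields a new tree that shares
  -- all unchanged subtrees with the original.
  data Tree k v = Leaf | Node (Tree k v) k v (Tree k v)

  update :: Ord k => k -> v -> Tree k v -> Tree k v
  update k v Leaf = Node Leaf k v Leaf
  update k v (Node l k' v' r)
    | k < k'    = Node (update k v l) k' v' r    -- right subtree shared
    | k > k'    = Node l k' v' (update k v r)    -- left subtree shared
    | otherwise = Node l k v r                   -- both subtrees shared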


Kevin Hammond <kh@dcs.st-and.ac.uk>