Glasgow Haskell Compiler

Last updated

The Glasgow Haskell Compiler
Original author(s) Kevin Hammond
Developer(s) Simon Marlow, Simon Peyton Jones, The Glasgow Haskell Team [1]
Initial releaseDecember 1992;31 years ago (1992-12) [2]
Stable release
9.10.1  OOjs UI icon edit-ltr-progressive.svg / 10 May 2024;5 months ago (10 May 2024) [3]
Repository
Written in Haskell, C
Operating system Linux, macOS Catalina [4] and later, Windows 2000 and later, FreeBSD
Platform x86-64, [4] AArch64
Available inEnglish
Type Compiler
License BSD New
Website www.haskell.org/ghc

The Glasgow Haskell Compiler (GHC) is a native or machine code compiler for the functional programming language Haskell. [5] It provides a cross-platform software environment for writing and testing Haskell code and supports many extensions, libraries, and optimisations that streamline the process of generating and executing code. GHC is the most commonly used Haskell compiler. [6] It is free and open-source software released under a BSD license.

Contents

History

GHC originally begun in 1989 as a prototype, written in Lazy ML (LML) by Kevin Hammond at the University of Glasgow. Later that year, the prototype was completely rewritten in Haskell, except for its parser, by Cordelia Hall, Will Partain, and Simon Peyton Jones. Its first beta release was on 1 April 1991. Later releases added a strictness analyzer and language extensions such as monadic I/O, mutable arrays, unboxed data types, concurrent and parallel programming models (such as software transactional memory and data parallelism) and a profiler. [2]

Peyton Jones, and Marlow, later moved to Microsoft Research in Cambridge, where they continued to be primarily responsible for developing GHC. GHC also contains code from more than three hundred other contributors. [1] From 2009 to about 2014, third-party contributions to GHC were funded by the Industrial Haskell Group. [7]

GHC names

Since early releases the official website [8] has referred to GHC as The Glasgow Haskell Compiler, whereas in the executable version command it is identified as The Glorious Glasgow Haskell Compilation System. [9] This has been reflected in the documentation. [10] Initially, it had the internal name of The Glamorous Glasgow Haskell Compiler. [11]

Architecture

GHC is written in Haskell, [12] but the runtime system for Haskell, essential to run programs, is written in C and C--.

GHC's front end, incorporating the lexer, parser and typechecker, is designed to preserve as much information about the source language as possible until after type inference is complete, toward the goal of providing clear error messages to users. [2] After type checking, the Haskell code is desugared into a typed intermediate language known as "Core" (based on System F, extended with let and case expressions). Core has been extended to support generalized algebraic datatypes in its type system, and is now based on an extension to System F known as System FC. [13]

In the tradition of type-directed compiling, GHC's simplifier, or "middle end", where most of the optimizations implemented in GHC are performed, is structured as a series of source-to-source transformations on Core code. The analyses and transformations performed in this compiler stage include demand analysis (a generalization of strictness analysis), application of user-defined rewrite rules (including a set of rules included in GHC's standard libraries that performs foldr/build fusion), unfolding (called "inlining" in more traditional compilers), let-floating, an analysis that determines which function arguments can be unboxed, constructed product result analysis, specialization of overloaded functions, and a set of simpler local transformations such as constant folding and beta reduction. [14]

The back end of the compiler transforms Core code into an internal representation of C--, via an intermediate language STG (short for "Spineless Tagless G-machine"). [15] The C-- code can then take one of three routes: it is either printed as C code for compilation with GCC, converted directly into native machine code (the traditional "code generation" phase), or converted to LLVM IR for compilation with LLVM. In all three cases, the resultant native code is finally linked against the GHC runtime system to produce an executable.

Language

GHC complies with the language standards, both Haskell 98 [16] and Haskell 2010. [17] It also supports many optional extensions to the Haskell standard: for example, the software transactional memory (STM) library, which allows for Composable Memory Transactions.

Extensions to Haskell

Many extensions to Haskell have been proposed. These provide features not described in the language specification, or they redefine existing constructs. As such, each extension may not be supported by all Haskell implementations. There is an ongoing effort [18] to describe extensions and select those which will be included in future versions of the language specification.

The extensions [19] supported by the Glasgow Haskell Compiler include:

Type system extensions

An expressive static type system is one of the major defining features of Haskell. Accordingly, much of the work in extending the language has been directed towards data types and type classes.

The Glasgow Haskell Compiler supports an extended type system based on the theoretical System FC. [13] Major extensions to the type system include:

Extensions relating to type classes include:

Portability

Versions of GHC are available for several system or computing platform, including Windows and most varieties of Unix (such as Linux, FreeBSD, OpenBSD, and macOS). [22] GHC has also been ported to several different processor architectures. [22]

See also

Related Research Articles

ML is a general-purpose, high-level, functional programming language. It is known for its use of the polymorphic Hindley–Milner type system, which automatically assigns the data types of most expressions without requiring explicit type annotations, and ensures type safety; there is a formal proof that a well-typed ML program does not cause runtime type errors. ML provides pattern matching for function arguments, garbage collection, imperative programming, call-by-value and currying. While a general-purpose programming language, ML is used heavily in programming language research and is one of the few languages to be completely specified and verified using formal semantics. Its types and pattern matching make it well-suited and commonly used to operate on other formal languages, such as in compiler writing, automated theorem proving, and formal verification.

Generic programming is a style of computer programming in which algorithms are written in terms of data types to-be-specified-later that are then instantiated when needed for specific types provided as parameters. This approach, pioneered by the ML programming language in 1973, permits writing common functions or types that differ only in the set of types on which they operate when used, thus reducing duplicate code.

Template metaprogramming (TMP) is a metaprogramming technique in which templates are used by a compiler to generate temporary source code, which is merged by the compiler with the rest of the source code and then compiled. The output of these templates can include compile-time constants, data structures, and complete functions. The use of templates can be thought of as compile-time polymorphism. The technique is used by a number of languages, the best-known being C++, but also Curl, D, Nim, and XL.

Programming languages can be grouped by the number and types of paradigms supported.

<span class="mw-page-title-main">F Sharp (programming language)</span> Microsoft programming language

F# is a general-purpose, high-level, strongly typed, multi-paradigm programming language that encompasses functional, imperative, and object-oriented programming methods. It is most often used as a cross-platform Common Language Infrastructure (CLI) language on .NET, but can also generate JavaScript and graphics processing unit (GPU) code.

In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types.

Template Haskell is an experimental language extension to the Haskell programming language implemented in the Glasgow Haskell Compiler. In early incarnations it was also known as Template Meta-Haskell.

In functional programming, a monad is a structure that combines program fragments (functions) and wraps their return values in a type with additional computation. In addition to defining a wrapping monadic type, monads define two operators: one to wrap a value in the monad type, and another to compose together functions that output values of the monad type. General-purpose languages use monads to reduce boilerplate code needed for common operations. Functional languages use monads to turn complicated sequences of functions into succinct pipelines that abstract away control flow, and side-effects.

C-- is a C-like programming language, designed to be generated mainly by compilers for high-level languages rather than written by human programmers. It was created by functional programming researchers Simon Peyton Jones and Norman Ramsey. Unlike many other intermediate languages, it is represented in plain ASCII text, not bytecode or another binary format.

<span class="mw-page-title-main">C Sharp (programming language)</span> Programming language

C# is a general-purpose high-level programming language supporting multiple paradigms. C# encompasses static typing, strong typing, lexically scoped, imperative, declarative, functional, generic, object-oriented (class-based), and component-oriented programming disciplines.

In computer science, a type class is a type system construct that supports ad hoc polymorphism. This is achieved by adding constraints to type variables in parametrically polymorphic types. Such a constraint typically involves a type class T and a type variable a, and means that a can only be instantiated to a type whose members support the overloaded operations associated with T.

In functional programming, a generalized algebraic data type is a generalization of a parametric algebraic data type (ADT).

This article describes the features in the programming language Haskell.

Haskell is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell has pioneered several programming language features such as type classes, which enable type-safe operator overloading, and monadic input/output (IO). It is named after logician Haskell Curry. Haskell's main implementation is the Glasgow Haskell Compiler (GHC).

The following outline is provided as an overview of and topical guide to C++:

<span class="mw-page-title-main">Yesod (web framework)</span>

Yesod is a web framework based on the programming language Haskell for productive development of type-safe, representational state transfer (REST) model based, high performance web applications, developed by Michael Snoyman, et al. It is free and open-source software released under an MIT License.

<span class="mw-page-title-main">Elm (programming language)</span> Functional programming language

Elm is a domain-specific programming language for declaratively creating web browser-based graphical user interfaces. Elm is purely functional, and is developed with emphasis on usability, performance, and robustness. It advertises "no runtime exceptions in practice", made possible by the Elm compiler's static type checking.

Ur also called Ur/Web is a multi-paradigm, high-level, pure, strict, functional programming language. It is a dialect of the language ML, designed for web development, created by Adam Chlipala at the Massachusetts Institute of Technology that one program can emit code for a server, web browser client, and SQL specific to a given database backend. The full implementation is free and open-source software released under an MIT License.

<span class="mw-page-title-main">PureScript</span> Strongly-typed language that compiles to JavaScript

PureScript is a strongly-typed, purely-functional programming language that transpiles to JavaScript, C++11, Erlang, and Go. It can be used to develop web applications, server side apps, and also desktop applications with use of Electron or via C++11 and Go compilers with suitable libraries. Its syntax is mostly comparable to that of Haskell. In addition, it introduces row polymorphism and extensible records. Also, contrary to Haskell, the PureScript language is defined as having a strict evaluation strategy, although there are non-conforming back-ends which implement a lazy evaluation strategy.

References

  1. 1 2 "The GHC Team". Haskell.org. Retrieved 1 September 2016.
  2. 1 2 3 Hudak, P.; Hughes, J.; Peyton Jones, S.; Wadler, P. (June 2007). "A History of Haskell: Being Lazy With Class" (PDF). Procedures of the Third ACM SIGPLAN History of Programming Languages Conference (HOPL-III). Retrieved 1 September 2016.
  3. "Download – The Glasgow Haskell Compiler". Haskell.org.
  4. 1 2 "Deprecation of 32-bit Darwin and Windows platforms". GHC Team.
  5. "The Glorious Glasgow Haskell Compilation System User's Guide". Haskell.org. Retrieved 27 July 2014.
  6. "2017 state of Haskell survey results". taylor.fausak.me. 15 November 2017. Retrieved 11 December 2017.
  7. "Industrial Haskell Group". Haskell.org. 2014. Retrieved 1 September 2016.
  8. "GHC The Glasgow Haskell Compiler". Haskell.org. Retrieved 14 January 2022.
  9. "Repository: configure.ac". gitlab.haskell.org. 12 January 2022. Retrieved 14 January 2022.
  10. "The Glorious Glasgow Haskell Compilation System User's Guide, Version 7.6.3". downloads.haskell.org. Retrieved 14 January 2022.
  11. "ghc-0.29-src.tar.gz" (tar gzip). downloads.haskell.org. File: ghc-0.29/ghc/PATCHLEVEL. Retrieved 14 January 2022.
  12. "GHC Commentary: The Compiler". Haskell.org. 23 March 2016. Archived from the original on 23 March 2016. Retrieved 26 May 2016.
  13. 1 2 Sulzmann, M.; Chakravarty, M. M. T.; Peyton Jones, S.; Donnelly, K. (January 2007). "System F with Type Equality Coercions". Procedures of the ACM Workshop on Types in Language Design and Implementation (TLDI).
  14. Peyton Jones, S. (April 1996). "Compiling Haskell by program transformation: a report from the trenches". Procedures of the European Symposium on Programming (ESOP).
  15. Peyton Jones, S. (April 1992). "Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine, Version 2.5". Journal of Functional Programming. 2 (2): 127–202. doi:10.1017/S0956796800000319.
  16. "Haskell 98 Language and Libraries: The Revised Report". Haskell.org. Retrieved 28 January 2007.
  17. "Haskell 2010 Language Report". Haskell.org. Retrieved 30 August 2012.
  18. "Welcome to Haskell' (Haskell Prime)". Haskell.org. Archived from the original on 20 February 2016. Retrieved 26 May 2016.
  19. "GHC Language Features". Haskell.org. Archived from the original on 29 June 2016. Retrieved 25 May 2016.
  20. Coutts, D.; Leshchinskiy, R.; Stewart, D. (April 2007). "Stream Fusion: From Lists to Streams to Nothing at All". Procedures of the ACM SIGPLAN International Conference on Functional Programming (ICFP). Archived from the original on 23 September 2007.
  21. Mitchell, Neil; Fletcher, Shayne (3 May 2020). "Record Dot Syntax". ghc-proposals. GitHub . Retrieved 30 June 2020.
  22. 1 2 Platforms at gitlab.haskell.org