Research article

A new attribute-linked residential property price dataset for England and Wales, 2011 to 2019

Authors
  • Bin Chi (Centre for Advanced Spatial Analysis (CASA), University College London, 90 Tottenham Court Road, London W1T 4TJ, UK)
  • Adam Dennett (Centre for Advanced Spatial Analysis (CASA), University College London, 90 Tottenham Court Road, London W1T 4TJ, UK)
  • Thomas Oléron-Evans (Centre for Advanced Spatial Analysis (CASA), University College London, 90 Tottenham Court Road, London W1T 4TJ, UK)
  • Robin Morphet (Centre for Advanced Spatial Analysis (CASA), University College London, 90 Tottenham Court Road, London W1T 4TJ, UK)

This is version 2 of this article; the published version can be found at: https://doi.org/10.14324/111.444/ucloe.000019

Abstract

Current research on residential house price variation in the UK is limited by the lack of an open and comprehensive house price database that contains both transaction prices and dwelling attributes such as size. This research outlines one approach that addresses this deficiency in England and Wales by combining transaction information from the official open Land Registry Price Paid Data (LR-PPD) with property size information from the official open Domestic Energy Performance Certificates (EPCs). A four-stage data linkage is used to generate a new linked dataset representing 79% of the full market sales in the LR-PPD. This new linked dataset offers greater flexibility for exploring house price (£/m²) variation in England and Wales between 2011 and 2019 at different spatial scales, from postcode units upwards. Openly available linkage code will allow for future updates beyond 2019.
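The central output described above, a price per square metre for each linked sale, can be illustrated with a minimal Python sketch. It assumes sales and EPC records already share a normalised address key (here a hypothetical `postcode|number|street` string), uses invented prices and floor areas, and joins on that key alone; it is not the authors' four-stage linkage.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    key: str      # hypothetical pre-normalised address key
    price: float  # transaction price in £


def link_price_per_m2(sales, floor_area_by_key):
    """Attach an EPC floor area (m²) to each sale by key; return match rate and £/m² values."""
    linked = []
    for sale in sales:
        area = floor_area_by_key.get(sale.key)
        if area:  # skip unmatched sales and missing or zero floor areas
            linked.append((sale.key, sale.price / area))
    match_rate = len(linked) / len(sales) if sales else 0.0
    return match_rate, linked


# Invented example records (not real transactions)
sales = [
    Sale("AB1 2CD|10|HIGH STREET", 250_000),
    Sale("AB1 2CD|12|HIGH STREET", 300_000),
    Sale("EF3 4GH|FLAT 2|5|MILL LANE", 180_000),
    Sale("EF3 4GH|7|MILL LANE", 220_000),
]
floor_area = {
    "AB1 2CD|10|HIGH STREET": 100.0,
    "AB1 2CD|12|HIGH STREET": 120.0,
    "EF3 4GH|FLAT 2|5|MILL LANE": 60.0,
}

rate, rows = link_price_per_m2(sales, floor_area)
print(f"matched {rate:.0%}")
for key, ppm2 in rows:
    print(key, round(ppm2))
```

With these invented records, three of the four sales find a floor area (a 75% match rate), at £2,500 to £3,000 per m².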

Keywords: Land Registry Price Paid Data, Domestic Energy Performance Certificates, data linkage, England and Wales

Rights: © 2021 The Authors.

Published on 27 May 2021. Peer reviewed.

Open peer review from James Gleeson

Review

Review information

DOI: 10.14293/S2199-1006.1.SOR-ECON.ATVQYC.v1.RPGYXZ
License:
This work has been published open access under Creative Commons Attribution License CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com.

Keywords: Land Registry Price Paid Data, data linkage, Domestic Energy Performance Certificates, Built environment, England and Wales, Energy, Urban studies, Sustainable and resilient cities

Review text

The lack of official or rigorously derived data on house prices per square metre in England and Wales constitutes an important gap in the evidence base for housing policy-making and market monitoring. This gap makes it more difficult to discern spatial patterns in the housing market, as measures of house prices either ignore spatial variation in housing types or account for them using complex and opaque weighting procedures.

Simple average price measures that do not take account of variation in housing types can be misleading, for example by making an area seem more expensive simply because it has larger properties. The lack of data on prices per square metre also makes it difficult to compare costs in England with those in other countries.
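To make this point concrete, the following Python sketch compares two hypothetical areas using invented prices and floor areas. The area with larger homes has the higher mean sale price but the lower mean price per square metre.

```python
from statistics import mean

# Hypothetical (price in £, floor area in m²) pairs for two areas; invented figures
areas = {
    "Area A (larger homes)": [(500_000, 150), (600_000, 200)],
    "Area B (smaller homes)": [(300_000, 75), (320_000, 80)],
}


def summarise(sales):
    """Return the mean sale price and the mean of per-sale prices per m²."""
    avg_price = mean(price for price, _ in sales)
    avg_ppm2 = mean(price / size for price, size in sales)
    return avg_price, avg_ppm2


for name, sales in areas.items():
    avg_price, avg_ppm2 = summarise(sales)
    print(f"{name}: mean price £{avg_price:,.0f}, mean £/m² {avg_ppm2:,.0f}")
```

Area A averages £550,000 per sale against £310,000 for Area B, yet its mean price per m² is about £3,167 against £4,000, so a price-only comparison reverses the ranking.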

Prices per square metre are also valuable in operational terms, as they are a key input into the analysis of viability in housing development, which in turn affects the amounts of infrastructure and affordable housing that planning authorities are able to secure from new development. Finally, linking data on property prices and energy efficiency could enable valuable new analysis of willingness to pay for higher energy standards.

The introduction of new open datasets on the sale prices and energy performance of dwellings in England and Wales has been very welcome, but the lack of unique property identifiers in these datasets and the often messy nature of residential addresses make linking them much more difficult.
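Without a shared property identifier, linkage has to fall back on comparing address text, and a common first step is to normalise both sides before comparing them. The sketch below is a minimal Python illustration under assumed rules (upper-casing, replacing punctuation with spaces, expanding a small hypothetical abbreviation table); it is not the rule set used by the authors.

```python
import re

# Hypothetical abbreviation table; real address data would need a much longer, context-aware list.
ABBREVIATIONS = {"RD": "ROAD", "ST": "STREET", "AVE": "AVENUE", "FLT": "FLAT"}


def normalise_address(raw: str) -> str:
    """Upper-case, replace punctuation with spaces, expand abbreviations, collapse whitespace."""
    text = re.sub(r"[^A-Z0-9 ]", " ", raw.upper())
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


ppd_style = "Flat 3, 12 High St."   # invented address as it might appear in one source
epc_style = "FLT 3 12 HIGH STREET"  # the same invented address as it might appear in the other
print(normalise_address(ppd_style))
print(normalise_address(epc_style))
print(normalise_address(ppd_style) == normalise_address(epc_style))
```

Blind expansion can also introduce errors, for example where "ST" means "SAINT" rather than "STREET", which is one reason rule-based linkage tends to need many context-specific rules.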

The authors of this paper have developed a sophisticated linking method to overcome these challenges and have achieved a high matching rate. The resulting data will be very valuable in itself for a wide range of purposes, but by clearly explaining the method followed, the researchers will hopefully also enable others to apply it to new data or to develop it further.



Note:
This review refers to round 1 of peer review.

Open peer review from

Review

Review information

DOI: 10.14293/S2199-1006.1.SOR-ECON.AKZ4VF.v1.RXBZDQ

Review text

This paper describes a sound and logical process that creates a valuable housing dataset. By linking LR-PPD and EPC data, countrywide housing transaction records are enriched with highly useful floor area attributes. Such a dataset, with a high matching rate and open-access code, is very welcome. The benefits of this process are evident from the statistics provided in the summary.

I found the paper well organized and presented. It is easy to follow even though it describes a very complex process. The diagrams illustrate the logic behind the steps well, which makes the matching results justifiable.

I have the following questions or suggestions which hopefully will help to improve it further:

  1. This paper, as its title indicates, is a data description summary. It would be great if it could be extended into a methods paper that includes more detail about the rules.
  2. Related to this, I find the paper describes the logic of data processing very well, but provides few examples. For instance, on P5, 95 new variables in EPC and 180 variables in LR-PPD are mentioned as being included; it would be better to see a couple of examples. Also, I appreciate that the 251 matching rules are detailed and complex, but it would be great to see some examples of them too. This would make the logic clearer; at the moment the paper is rather conceptual.
  3. In terms of validation, it would be great if a manual random check of the matching results could be included. This would supplement the data description with (a) examples of matching results and (b) a description of accuracy at the end.
  4. After aggregating to census units, would it be possible to compare the results with census housing figures, such as the distribution of house types or other commonly used attributes?
  5. As mentioned, the PPD data is updated regularly. It may be worth checking different versions of the data to see whether they result in different matching rates. This would provide a reliability test.
  6. It may also be worthwhile to discuss more limitations at the end.
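The manual random check suggested in point 3 could start from a reproducible random sample of linked pairs, with the share judged correct reported alongside an approximate confidence interval. The Python sketch below assumes hypothetical link IDs, a sample size of 400, a fixed seed and an invented count of correct links; the interval is the simple normal (Wald) approximation, which is adequate only when the sample is reasonably large and the share is not too close to 0 or 1.

```python
import math
import random


def sample_for_review(linked_ids, k, seed=2021):
    """Draw a reproducible simple random sample (without replacement) of link IDs to check by hand."""
    rng = random.Random(seed)
    return rng.sample(linked_ids, k)


def estimate_precision(n_correct, n_checked, z=1.96):
    """Share of sampled links judged correct, with a normal-approximation (Wald) 95% interval."""
    p = n_correct / n_checked
    half_width = z * math.sqrt(p * (1 - p) / n_checked)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


linked_ids = [f"link-{i}" for i in range(10_000)]  # hypothetical IDs of linked PPD-EPC pairs
to_check = sample_for_review(linked_ids, k=400)
print(len(to_check), to_check[:3])

# Suppose a checker judged 388 of the 400 sampled links correct (illustrative figure only).
p, low, high = estimate_precision(388, 400)
print(f"estimated precision {p:.3f} (95% CI {low:.3f}-{high:.3f})")
```

Fixing the seed means the same sample can be re-drawn later, for example when comparing the different PPD versions mentioned in point 5.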

Thanks to Bin and the other authors for sharing this interesting article and generating a useful dataset. I hope more details of the method and its wider applications will be made available soon.



Note:
This review refers to round 1 of peer review.

Open peer review from Xuxin Mao

Review

Review information

DOI: 10.14293/S2199-1006.1.SOR-ECON.ABI2UO.v1.RMGBOB

Review text

The lack of house price per square metre data has caused serious problems in comparing UK and international housing markets and in proposing suitable policies to tackle the domestic housing crisis.

The publication presents exciting academic research in this area and fills an important research gap, which is well documented through a review of the literature on developing and applying house price datasets.

Based on technically advanced linking and cleansing methods, the research illustrates how to generate a comprehensive house price dataset covering around 90% of properties, including flats within a house. It also takes account of various property attributes, such as energy efficiency. Meanwhile, it illustrates in great detail how to link, clean and update the data, which paves the way for wider academic use.

One highlight of this publication is its state-of-the-art data linkage method. It involves a four-stage matching method (with 251 matching rules) that links the various sources, together with an algorithm for testing matching efficiency.
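The 251 rules themselves are not reproduced in this review, but the general shape of a staged linkage can be sketched. In the minimal Python illustration below, which is an assumption rather than the authors' implementation, each stage applies a progressively looser key to the sales still unmatched, and a link is accepted only when the key identifies exactly one EPC record. The field names, stages and records are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
KeyFn = Callable[[Record], Tuple[str, ...]]


def cascade_link(sales: List[Record], epcs: List[Record], stages: List[Tuple[str, KeyFn]]):
    """Link sales to EPCs stage by stage, from the strictest rule to the loosest.

    Each stage only sees sales left unmatched by earlier stages, and a link is
    accepted only if the stage's key points to exactly one EPC record.
    """
    links: Dict[str, Tuple[str, str]] = {}  # sale id -> (epc id, stage name)
    unmatched = list(sales)
    for stage_name, key_fn in stages:
        index: Dict[Tuple[str, ...], List[str]] = {}
        for epc in epcs:
            index.setdefault(key_fn(epc), []).append(epc["id"])
        remaining = []
        for sale in unmatched:
            candidates = index.get(key_fn(sale), [])
            if len(candidates) == 1:
                links[sale["id"]] = (candidates[0], stage_name)
            else:  # no candidate, or ambiguous: defer to the next stage
                remaining.append(sale)
        unmatched = remaining
    return links, unmatched


# Hypothetical stages, strictest first
STAGES: List[Tuple[str, KeyFn]] = [
    ("exact", lambda r: (r["postcode"], r["flat"], r["number"], r["street"])),
    ("ignore street", lambda r: (r["postcode"], r["flat"], r["number"])),
    ("ignore flat and street", lambda r: (r["postcode"], r["number"])),
]


def rec(id_, postcode, flat, number, street):
    return {"id": id_, "postcode": postcode, "flat": flat, "number": number, "street": street}


sales = [
    rec("s1", "AB1 2CD", "", "10", "HIGH STREET"),
    rec("s2", "AB1 2CD", "", "12", "HIGH STRET"),  # misspelt street
    rec("s3", "EF3 4GH", "2", "5", "MILL LANE"),
    rec("s4", "EF3 4GH", "", "5", "MILL LANE"),    # flat number missing
]
epcs = [
    rec("e1", "AB1 2CD", "", "10", "HIGH STREET"),
    rec("e2", "AB1 2CD", "", "12", "HIGH STREET"),
    rec("e3", "EF3 4GH", "1", "5", "MILL LANE"),
    rec("e4", "EF3 4GH", "2", "5", "MILL LANE"),
]

links, unmatched = cascade_link(sales, epcs, STAGES)
for sale_id, (epc_id, stage) in sorted(links.items()):
    print(sale_id, "->", epc_id, f"({stage})")
print("unmatched:", [s["id"] for s in unmatched])
```

Sale s4 stays unmatched: without a flat number, two EPC records share its postcode and house number, so the sketch refuses the ambiguous link rather than guessing. Because several sales of the same dwelling may legitimately link to one EPC, the sketch does not prevent an EPC record from being reused.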

The publication is also well written, with properly displayed figures and logically organised chapters. The authors manage to illustrate the complicated process via a workflow figure, which greatly aids understanding for interested academics.

While it is quite difficult to pick out a weak point in the publication, some suggestions are offered in case the authors are interested in further related research. Instead of manual correction, it might be possible to adopt natural language processing or other text analytics tools to deal with the name mismatch issue mentioned in Section 5. Such automation may not only improve efficiency but also provide opportunities for text-based attribute analysis. Secondly, the authors could point to many areas in which this research can be used. For example, a detailed, up-to-date analysis of energy efficiency could be based on the updated dataset. The datasets also have great potential for international comparative studies.
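As a small step towards the automation suggested above, string similarity can propose candidate corrections for mismatched names. The Python sketch below uses the standard library's `difflib.SequenceMatcher`, a simple character-based similarity measure rather than natural language processing; the candidate street names and the 0.85 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the most similar candidate, or None if it scores below threshold."""
    scored = [(c, SequenceMatcher(None, name, c).ratio()) for c in candidates]
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < threshold:
        return None
    return best


epc_streets = ["HIGH STREET", "HIGHFIELD ROAD", "MILL LANE"]  # hypothetical street names within one postcode
for ppd_street in ["HIGH STRET", "MILL LN", "STATION ROAD"]:
    print(ppd_street, "->", best_match(ppd_street, epc_streets))
```

Here the misspelt "HIGH STRET" and the abbreviated "MILL LN" find their intended names, while "STATION ROAD" returns no match. A fixed threshold trades false matches against missed ones, so proposed corrections would still benefit from a manual check.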

In summary, the publication clearly presents an innovative house price dataset and a technically advanced methodology, which will have a large impact on related research.



Note:
This review refers to round 1 of peer review.