How to create a database for Charity Commission data
NCVO uses the data from the Charity Commission which is publicly available to produce the Almanac. We have put this guide together to help you create your own database using the Charity Commission register and make the data accessible to you and your organisation.
It will tell you:
- what data is available
- how to access and download the data
- how to get the data into a database.
What data is available
The NCVO research team receives a lot of requests on charities, including:
- queries on the number of charities in a local authority area
- which charities work with children and young people
- what are the top 100 charities by assets, and many more.
To be able to answer these questions and provide the most up-to-date information on charities, we created a database using data that is publicly available from the Charity Commission for England and Wales.
This data is provided as a data dump and updated on a monthly basis. As charities have different financial periods and are allowed to submit their accounts to the Charity Commission within ten months after the end of their financial year, these updates are crucial to us.
The data dump contains the following information.
- All registered charities in England and Wales
- Additional information on each charity
- Financial data for each charity
- Additional financial data for charities with an income of over £500k (‘part b’)
The Charity Commission has also set up an API that allows users to search and retrieve information from the database.
How to access and download the data
Each month, the Charity Commission provide two zip files to download from their site.
- Charity register extract
- Table build scripts
The ‘charity register extract’ contains all the charity data described above.
The charity register extract data is stored in a ‘.bcp’ format, which stands for Bulk Copy Program. This file format is used to import data into a Microsoft SQL Server database. However, this is an enterprise piece of software and as such organisations need to pay for it.
Microsoft provide a free 'express' version of the software but it does have size limitations (maximum of 10GB of data storage) that can have significant impact if data capacity needs are not properly anticipated.
Most likely, the Charity Commission use Microsoft SQL Server but at NCVO we use ‘MySQL’, an open-source database server, which provides an excellent free alternative. However, in order to import the data, the .bcp files first need to be translated into MySQL format like CSV.
If you are using Microsoft SQL Server, you have the option to import the data in the .bcp source files into a Microsoft SQL Server database. But as we do not use this method, we are unable to give you any guidance on this.
How to get the data into a database
David Kane, a former researcher at NCVO, has created a ‘GitHub repository’ – a digital storage space that includes all the necessary files, links, instructions and Python scripts.
All the files are free to download and the instructions will take you through the following steps:
- Download files from the Charity Commission
- Convert .bcp files to CSV files
- Create database for the data
- Import data in the CSV files to the database
The instructions rely on knowledge running Python programs and querying databases using SQL. If you are not too familiar with Python and SQL, you may need to do some additional research.
In terms of getting started with Python, you can install Anaconda, which provides you with most Python libraries you will need.