Allen Brain Atlas API
The Allen Mouse Brain Atlas provides genome-wide in situ hybridization (ISH) image data for approximately 20,000 genes in adult mice. Each data set is processed through an informatics analysis pipeline to obtain spatially mapped quantified expression information.
From the API, you can:
- Download quantified expression values by structure
- Download quantified expression values as 3-D grids
- Query the differential and correlative search services
- Query the image synchronization service
This document provides a brief overview of the data, the database organization and example queries. API database object names are in camel case. See the main API documentation for more information on data models and query syntax.
Experimental Overview and Metadata
Experimental data from this atlas is associated with the "Mouse Brain" Product.
Multiple genes were assayed using each Specimen. Typically, the sectioning scheme divided each brain into eight interleaved SectionDataSets, each with 200 µm sampling density (8 x 25 µm section thickness).
Each gene was assayed with at least one sagittal SectionDataSet. A subset of genes also has a coronal SectionDataSet and/or replicate experiments. A sagittal SectionDataSet spans the left hemisphere, from the most lateral section (located where the hippocampus starts to appear) to just past the midline, yielding ~20 SectionImages at 200 µm sampling density. A coronal SectionDataSet spans both hemispheres, from the most posterior section showing the cerebellum and hindbrain to the most anterior section showing the olfactory bulb, yielding ~60 SectionImages at 200 µm sampling density. The left side of a coronal SectionImage corresponds to the left hemisphere.
A manual QC protocol defines the criteria for failing experiments because of production issues, discarding damaged SectionImages, verifying and adjusting the tissue bounding boxes, and identifying "dark" artifacts such as bubbles and tears.
From the API, detailed information about Genes, Probes, SectionDataSets and SectionImages can be obtained using RMA queries.
For example, such a query shows that gene Pdyn has one sagittal SectionDataSet (id=69782969) and one coronal SectionDataSet (id=71717084). In the web application, images from the experiment are visualized in an experiment detail page. All displayed information, images and structural expression values are also available through the API.
Figure: Experiment detail page of a sagittal SectionDataSet (id=69782969) for gene Pdyn showing meta-information, images and computed structure expression graph.
Mouse Brain ISH Data: Informatics Data Processing Pipeline
The Allen Mouse Brain Atlas (the Atlas) provides genome-wide in situ hybridization (ISH) data for approximately 20,000 genes in male P56 C57BL/6J mice. The informatics data processing pipeline produces results that enable the navigation, analysis and visualization of this data. The informatics data processing pipeline consists of the following components:
- a 3-D reference model,
- an alignment module,
- an expression detection module,
- an expression gridding module, and
- a structure unionizer module.
The output of the pipeline is quantified expression values at a grid voxel level and at a structure level according to the integrated reference atlas ontology. The grid level data are used downstream to provide an on-the-fly differential and correlative gene search service and to support visualization of spatial relationships.
3-D Reference Models
The backbone of the automated pipeline is an annotated 3-D reference space based on the same Specimen used for the coronal plates of the integrated reference atlas. A brain volume was reconstructed from the SectionImages using a combination of high-frequency section-to-section histology registration with low-frequency registration of histology to (ex-cranio) MRI. This first-stage reconstructed volume was then aligned with a sagittally sectioned Specimen. Once a straight mid-sagittal plane was achieved, a synthetic symmetric space was created by reflecting one hemisphere to the other side of the volume.
The 3-D ReferenceSpace is in PIR orientation (+x = posterior, +y = inferior, +z = right). Over 800 Structures were extracted from the 2-D coronal reference atlas plates and interpolated to create symmetric 3-D annotations. Structures in the reference atlas are arranged in a hierarchy: each structure has one parent, and the link to that parent denotes a "part-of" relationship. Structures are assigned a color to visually emphasize their hierarchical positions in the brain.
Figure: The 3-D reference space is in PIR orientation where x axis = Anterior-to-Posterior, y axis = Superior-to-Inferior and z axis = Left-to-Right.
The following volumetric data files are available for download:
- atlasVolume: uchar (8bit) grayscale Nissl volume of the reconstructed brain at 25 µm resolution.
- annotation: ushort (16bit) structural annotation volume at 25 µm resolution. The value represents the ID of the finest-level structure annotated for the voxel. Note: the 3-D mask for any structure is composed of all voxels annotated for that structure and all of its descendants in the structure hierarchy.
- gridAnnotation - 200 µm: ushort (16bit) structural annotation volume at grid (200 µm) resolution for gene expression analysis.
- gridAnnotation - 100 µm: ushort (16bit) structural annotation volume at grid (100 µm) resolution for projection analysis.
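The note on the annotation volume implies a two-step recipe for a structure mask: collect the structure and all of its descendants from the hierarchy, then select every voxel whose annotation value is in that set. Below is a minimal Python/NumPy sketch, assuming the hierarchy has already been fetched into a hypothetical `parent_of` dictionary that maps each structure ID to its parent ID (None for the root). The function names are illustrative, not part of the API.

```python
from collections import defaultdict

import numpy as np


def build_children(parent_of):
    """Invert a {structure_id: parent_id} map into {parent_id: [child ids]}."""
    children = defaultdict(list)
    for sid, pid in parent_of.items():
        if pid is not None:
            children[pid].append(sid)
    return children


def descendants(root, children):
    """Return root plus every structure below it (iterative depth-first walk)."""
    found, stack = set(), [root]
    while stack:
        sid = stack.pop()
        if sid not in found:
            found.add(sid)
            stack.extend(children.get(sid, []))
    return found


def structure_mask(annotation, root, children):
    """Boolean mask of voxels annotated with root or any of its descendants."""
    return np.isin(annotation, list(descendants(root, children)))
```

For instance, with the made-up hierarchy `parent_of = {1: None, 2: 1, 3: 1, 4: 2, 5: 4}`, `descendants(2, build_children(parent_of))` returns `{2, 4, 5}`, so the mask for structure 2 also covers voxels labelled 4 or 5.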
Volumetric data is stored in an uncompressed format with a simple text header file in MetaImage format.
Example Matlab code snippet to read in the 25 µm atlas and annotation volumes:
Example Matlab code snippet to read in the 200 µm grid annotation volume:
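Because the volumes are plain MetaImage files, they can also be read outside Matlab. The following is a minimal Python/NumPy sketch, assuming the `.mhd` header names a separate raw file (`ElementDataFile`) in the same directory and uses one of the element types listed in the dictionary. Other MetaImage options (compressed data, `ElementDataFile = LOCAL`, header offsets) are not handled.

```python
import os

import numpy as np

# MetaImage element types covering the uchar, ushort and float volumes described here
_MET_TYPES = {"MET_UCHAR": np.uint8, "MET_USHORT": np.uint16,
              "MET_FLOAT": np.float32, "MET_DOUBLE": np.float64}


def read_mhd(header_path):
    """Read a MetaImage (.mhd) header and its raw data file into a numpy array."""
    header = {}
    with open(header_path) as fh:
        for line in fh:
            if "=" in line:
                key, value = line.split("=", 1)
                header[key.strip()] = value.strip()
    dims = [int(v) for v in header["DimSize"].split()]
    dtype = np.dtype(_MET_TYPES[header["ElementType"]])
    # Byte order flag; MetaImage defaults to little-endian when it is absent
    msb = header.get("BinaryDataByteOrderMSB", header.get("ElementByteOrderMSB", "False"))
    dtype = dtype.newbyteorder(">" if msb.lower() == "true" else "<")
    raw_path = os.path.join(os.path.dirname(header_path), header["ElementDataFile"])
    data = np.fromfile(raw_path, dtype=dtype)
    # The first DimSize entry varies fastest on disk, so reshape in Fortran order;
    # the result is indexed [x, y, z] in the order the header lists them
    return data.reshape(dims, order="F")
```

For a PIR volume this yields an array indexed as [posterior, inferior, right], so `vol[i, j, k]` lines up with the axes described for the 3-D reference space.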
Image Alignment
The aim of image alignment is to establish a mapping from each SectionImage to the 3-D reference space. The module reconstructs a 3-D Specimen volume from its constituent SectionImages and registers the volume to the 3-D reference model by maximizing image correlation.
Once registration is achieved, information from the 3-D reference model can be transferred to the reconstructed Specimen and vice versa. The resulting transform information is stored in the database. Each SectionImage has an Alignment2d object that represents the 2-D affine transform between an image pixel position and a location in the Specimen volume. Each SectionDataSet has an Alignment3d object that represents the 3-D affine transform between a location in the Specimen volume and a point in the 3-D reference model.
Spatial correspondence between any two SectionDataSets from different Specimens can be established by composing these transforms. For convenience, a set of "Image Sync" API methods is available to find corresponding positions between SectionDataSets, the 3-D reference model and structures. Note that all locations on SectionImages are reported in pixel coordinates and all locations in 3-D ReferenceSpaces are reported in microns. These methods are used by the Web application to provide the image synchronization feature in the multiple image viewer (see Figure). Their usage is also demonstrated in the "Image Sync" example application.
- Fetch alignment transform parameters for the sagittal Pdyn SectionDataSet
- Sync a location between the sagittal and coronal Pdyn SectionDataSets
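Composing the transforms can be illustrated with homogeneous matrices. This is a hedged sketch, not the Image Sync service: it represents an Alignment2d as a 3 x 3 matrix acting on pixel coordinates and an Alignment3d as a 4 x 4 matrix acting on Specimen-volume coordinates, and it assumes the out-of-plane Specimen coordinate of a section (`section_z`, a hypothetical parameter) is already known. How the stored transform parameters unpack into these matrices is not shown.

```python
import numpy as np


def to_homogeneous(linear, translation):
    """Pack an n x n linear part and an n-vector translation into an (n+1) x (n+1) matrix."""
    n = len(translation)
    m = np.eye(n + 1)
    m[:n, :n] = linear
    m[:n, n] = translation
    return m


def pixel_to_reference(pixel_xy, section_z, align2d, align3d):
    # 2-D step: image pixel -> in-plane position in the Specimen volume
    x, y, _ = align2d @ np.array([pixel_xy[0], pixel_xy[1], 1.0])
    # Assumption: the out-of-plane coordinate comes from the section's position in the stack
    specimen_pt = np.array([x, y, section_z, 1.0])
    # 3-D step: Specimen volume -> 3-D reference space (microns)
    return (align3d @ specimen_pt)[:3]


def reference_to_pixel(ref_xyz, align2d, align3d):
    # Invert the 3-D transform to return to this Specimen's volume
    specimen_pt = np.linalg.solve(align3d, np.append(ref_xyz, 1.0))
    # Invert the 2-D transform on the in-plane coordinates
    px, py, _ = np.linalg.solve(align2d, np.array([specimen_pt[0], specimen_pt[1], 1.0]))
    return np.array([px, py]), specimen_pt[2]


def sync_pixel(pixel_xy, section_z, src, dst):
    """Map a pixel from one SectionDataSet to another via the reference space.

    src and dst are (align2d, align3d) pairs of homogeneous matrices.
    """
    ref = pixel_to_reference(pixel_xy, section_z, *src)
    return reference_to_pixel(ref, *dst)
```

The path always runs through the reference space: forward through the source transforms, then backward through the inverted destination transforms. The returned out-of-plane coordinate would indicate which destination section lies closest to the synchronized point.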
Figure: Point-based image synchronization on the Web application.
Multiple SectionDataSets in the Zoom-and-Pan (Zap) viewer can be synchronized to the same approximate location in both sagittal and coronal planes. Screenshots taken before and after synchronization show genes Dpp6 and Myo16 and the relevant coronal and sagittal plates of the reference atlas. Gene Myo16 shows enriched expression in the medial habenula (MH).
Expression Detection
For every ISH SectionImage, a grayscale mask is generated that identifies pixels corresponding to gene expression. The detection algorithm is based on adaptive thresholding and mathematical morphology. The expression mask image has the same size and pixel resolution as the primary ISH image and can be downloaded.
Figure: Web application presentation of expression detection for gene Pde10a.
Screenshot of expression detection mask for Pde10a showing dense high expression in the striatum and low expression in the isocortex. The intensity is color-coded to range from blue (low expression intensity), through green (medium intensity) to red (high intensity).
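To make "adaptive thresholding and mathematical morphology" concrete, here is a toy Python/NumPy version. The block size, offset, 3 x 3 structuring element, and the assumption that expression appears as pixels darker than their local background in an 8-bit brightfield image are all illustrative choices, not the pipeline's actual parameters. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _neighbourhoods(img, size, pad_value):
    """All size x size windows around each pixel, padding the border with pad_value."""
    r = size // 2
    padded = np.pad(img, r, mode="constant", constant_values=pad_value)
    return sliding_window_view(padded, (size, size))


def detect_expression(img, block=15, offset=10.0, open_size=3):
    """Return a grayscale expression mask for an 8-bit brightfield ISH image."""
    img = np.asarray(img, dtype=float)
    # Adaptive threshold: compare each pixel with the mean of its local block
    r = block // 2
    windows = sliding_window_view(np.pad(img, r, mode="edge"), (block, block))
    local_mean = windows.mean(axis=(-2, -1))
    # Assumption: expressing pixels are darker than their surroundings by more than offset
    mask = img < local_mean - offset
    # Morphological opening (erosion, then dilation) removes isolated specks
    eroded = _neighbourhoods(mask, open_size, True).all(axis=(-2, -1))
    opened = _neighbourhoods(eroded, open_size, False).any(axis=(-2, -1))
    # Keep an intensity (darker = stronger signal) only where expression was detected
    return np.where(opened, 255.0 - img, 0.0)
```

A local rather than global threshold lets the same rule work across sections with uneven background staining, and the opening step discards single-pixel detections that are more likely noise than expressing cells.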
Expression Gridding
For each SectionDataSet, the gridding module creates a low-resolution 3-D summary of the gene expression and projects the data into the common coordinate space of the 3-D reference model. Casting all data into a canonical space allows easy cross-comparison of gene expression data from every Product. The expression data grids can also be viewed directly as 3-D volumes or used for analysis (e.g., differential and correlative searches).
Each image in a SectionDataSet is divided into a 200 x 200 µm grid. Pixel-based gene expression statistics are computed using information from the primary ISH and the expression mask:
- expression density = number of expressing pixels / number of all pixels in division
- expression intensity = sum of expressing pixel intensity / number of expressing pixels
- expression energy = expression intensity * expression density
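The three per-division statistics can be computed for one image with block reshaping. A minimal Python/NumPy sketch, assuming `cell_px` is the number of pixels spanning 200 µm (200 µm divided by a pixel size you supply), that `intensity` holds the per-pixel intensity values to be summed, and that partial divisions at the image edge are simply cropped; these are simplifications for illustration.

```python
import numpy as np


def grid_statistics(intensity, mask, cell_px):
    """Per-division expression density, intensity and energy for one SectionImage."""
    intensity = np.asarray(intensity, dtype=float)
    rows, cols = intensity.shape[0] // cell_px, intensity.shape[1] // cell_px
    # Assumption: partial divisions at the right/bottom edge are cropped away
    inten = intensity[: rows * cell_px, : cols * cell_px]
    expr = np.asarray(mask, dtype=bool)[: rows * cell_px, : cols * cell_px]
    # View the image as (rows, cell_px, cols, cell_px) blocks, one per grid division
    inten = inten.reshape(rows, cell_px, cols, cell_px)
    expr = expr.reshape(rows, cell_px, cols, cell_px)
    n_expr = expr.sum(axis=(1, 3))
    expr_intensity_sum = np.where(expr, inten, 0.0).sum(axis=(1, 3))
    density = n_expr / float(cell_px * cell_px)
    # Divisions with no expressing pixels get intensity 0 instead of 0/0
    mean_intensity = np.divide(expr_intensity_sum, n_expr,
                               out=np.zeros_like(density), where=n_expr > 0)
    energy = mean_intensity * density
    return density, mean_intensity, energy
```

Note that energy simplifies to (sum of expressing pixel intensity) / (number of all pixels in the division), so it rewards both how much of a division expresses and how strongly.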
Each per-image 2-D expression grid is smoothed and rotated to form a 3-D grid. Z-direction smoothing is applied to the 3-D grid, which is then transformed into the standard reference space. Grid data can be downloaded for each SectionDataSet using the 3-D Expression Grid Data Service. The service returns a zip file containing the volumetric data for expression density, intensity and/or energy in an uncompressed format with a simple text header file in MetaImage format. Structural annotation for each grid voxel can be obtained via the ReferenceSpace gridAnnotation volume file.
Note: while the reference space spans both hemispheres, sagittal SectionDataSets only span the left hemisphere. Voxels with no data are assigned a value of "-1".
- Download expression energy grid file for the coronal Pdyn SectionDataSet
- Download expression density and intensity grid files for the same SectionDataSet
The expression data grid can be viewed in the Brain Explorer® 2 desktop program. Each grid voxel is rendered as a colorized sphere, where the diameter represents expression energy and the color encodes expression intensity. In addition, a preview of the expression data grid is shown on the Web application as a series of maximum density projection images.
Example Matlab code snippet to read in the 200 µm energy grid volume:
Structure Unionization
Expression statistics can be computed for each structure delineated in the reference atlas by combining (unionizing) grid voxels with the same 3-D structural label. While the reference atlas is typically annotated at the lowest level of the ontology tree, statistics for higher-level structures can be obtained by combining the measurements of their hierarchical children. This process produces expression density, intensity and energy measurements for each experiment and each structure of interest.
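One simple way to unionize a grid is sketched below in Python/NumPy. It pools every voxel whose gridAnnotation label is in a given ID set (which should already include the structure's descendants), skips voxels without data (value -1), and averages what remains. Pooling voxels this way weights each child structure by its voxel count; whether the production unionizer combines child measurements exactly like this is an assumption, and the function name is hypothetical.

```python
import numpy as np


def unionize(grid, annotation, structure_ids):
    """Mean of a 3-D expression grid over voxels labelled with any of structure_ids."""
    in_structure = np.isin(annotation, list(structure_ids))
    has_data = grid >= 0  # voxels without data are stored as -1
    voxels = grid[in_structure & has_data]
    if voxels.size == 0:
        return float("nan")  # structure not covered by this SectionDataSet
    return float(voxels.mean())
```

Applied to an energy grid and the matching 200 µm gridAnnotation volume, this gives one energy value per structure. For sagittal SectionDataSets, right-hemisphere voxels are all -1 and therefore drop out automatically.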
StructureUnionize data is used in the web application to display expression summary bar graphs for a set of coarse structures. Its usage is also demonstrated in the "structure networks" example application.
Expression Grid Search Service
A novel on-the-fly expression grid search service has been implemented to allow users to instantly search over the ~25,000 SectionDataSets to find genes with specific expression patterns:
- The Differential Search function allows users to find genes which have higher expression in one structure (or set of structures) compared to another structure (or set of structures).
- The NeuroBlast function enables the user to find genes that have a similar spatial expression profile to a seed gene when compared over a user-specified domain.
The expression grid search service is available through both the Web application and API.
To perform a Differential Search, a user specifies a set of target structures and a set of contrast structures. In the service, the set of voxels belonging to any of the target structures forms the target voxel set, and voxels belonging to any of the contrast structures form the contrast voxel set. For each SectionDataSet a fold change is computed as the ratio of average expression energy in the target voxel set over the average expression energy in the contrast voxel set. The return list is sorted in descending order by fold-change.
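The fold-change ranking fits in a few lines of Python/NumPy. Assumptions: expression energy for all SectionDataSets is held as one (data sets x voxels) array, every voxel in the two masks has data, and a small floor on the contrast mean (a choice made here, not documented service behavior) avoids division by zero. The function name is hypothetical.

```python
import numpy as np


def differential_search(energy, target_mask, contrast_mask, ids):
    """Rank data sets by mean target energy / mean contrast energy, highest first.

    energy: (n_datasets, n_voxels) array; target_mask, contrast_mask: boolean (n_voxels,).
    """
    target = energy[:, target_mask].mean(axis=1)
    contrast = energy[:, contrast_mask].mean(axis=1)
    fold = target / np.maximum(contrast, 1e-12)  # guard against zero contrast
    order = np.argsort(-fold)  # descending fold change
    return [(ids[i], float(fold[i])) for i in order]
```

Because the masks are computed once and each data set needs only two means, the cost is a single pass over the in-memory voxel array, which is what makes an on-the-fly service feasible.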
Example: Differential search for genes with higher expression in the thalamus than the isocortex
- Pipe1: Set up the contrast structure list by finding the structure isocortex within the Mouse Brain ontology
- Pipe2: Set up the target structure list by finding the structure thalamus within the Mouse Brain ontology
- Connect the two pipes to service::mouse_differential to perform the differential search
- Visualize the same search result in the web application
Figure: Screenshot of top returns of a differential search for genes with higher expression in the thalamus than the isocortex. Mini-expression summary graphs show enrichment in the thalamus (red) compared to other brain regions.
To perform a NeuroBlast search, a user selects a seed SectionDataSet and a domain over which the similarity comparison is to be made. All voxels belonging to any of the domain structures form the domain voxel set. Pearson's correlation coefficient is computed between the domain voxel set from the seed SectionDataSet and every other SectionDataSet in the Product. The return list is sorted by descending correlation coefficient.
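A NeuroBlast-style ranking can be sketched in Python/NumPy by computing Pearson's r between the seed row and every other row over the domain voxels. Assumptions: energies are held as one (data sets x voxels) array with no missing values inside the domain, and no row is constant over the domain (a constant row would make r undefined). The function name is hypothetical.

```python
import numpy as np


def neuroblast(energy, seed_index, domain_mask, ids):
    """Rank data sets by Pearson correlation with the seed over the domain voxels."""
    x = energy[:, domain_mask]
    # Centre each row; r is then the cosine between centred rows
    x = x - x.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(x, axis=1)
    r = (x @ x[seed_index]) / (norms * norms[seed_index])
    order = np.argsort(-r)  # descending correlation
    return [(ids[i], float(r[i])) for i in order if i != seed_index]
```

Since correlation ignores overall scale and offset, a gene with the same spatial pattern but weaker staining still ranks near the top, which matches the goal of finding similar expression profiles rather than similar expression levels.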
Example: NeuroBlast search for genes with similar expression to the sagittal Pdyn SectionDataSet
- Pipe: Set up the seed SectionDataSet by finding the sagittal SectionDataSet for gene Pdyn
- Connect the pipe to service::mouse_correlation to perform the NeuroBlast search
- Visualize the same search result in the Web application
Figure: Screenshot of top returns of a NeuroBlast search for genes with similar expression as the sagittal Pdyn SectionDataSet.
To perform these computations quickly over the entire data set, a subset of voxels is loaded into memory. The full expression grid is 67 x 41 x 58 = 159,326 voxels, spans both hemispheres and includes background voxels. Loading all voxels for all image series into memory would require 14 GB of RAM. To reduce memory requirements and increase the efficiency of calculations, the voxels containing data in over 80% of all experiments were identified. Only these ~26,000 voxels, which partially span one hemisphere, are used in the "full" search service, which requires 4 GB of RAM.
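Both the voxel-selection criterion and the memory estimate can be checked numerically. Below is a sketch assuming one 4-byte float per voxel per SectionDataSet and the -1 no-data convention used by the expression grids; the function names are hypothetical.

```python
import numpy as np


def select_common_voxels(grids, min_fraction=0.8):
    """Indices of voxels that carry data (value >= 0) in more than min_fraction of experiments.

    grids: (n_datasets, n_voxels) array with -1 marking voxels that have no data.
    """
    coverage = (grids >= 0).mean(axis=0)
    return np.flatnonzero(coverage > min_fraction)


def bytes_needed(n_datasets, n_voxels, bytes_per_value=4):
    """Memory to hold one value per voxel per data set (assumes 4-byte floats)."""
    return n_datasets * n_voxels * bytes_per_value
```

For example, `bytes_needed(25_000, 67 * 41 * 58)` gives 15,932,600,000 bytes, about 14.8 GiB, in line with the ~14 GB quoted above.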
To take advantage of the data from both hemispheres in coronal SectionDataSets, a second "coronal only" search service is also available as an option. The coronal service spans both hemispheres, covering 58,387 voxels, and searches over the ~4,000 coronal image series.
It should be noted that this on-the-fly search service is derived from a fully automated processing pipeline. False positive and false negative results can occur due to artifacts on the tissue section or slide and/or algorithmic inaccuracies. Users should confirm results with visual inspection of the ISH images.