ALLEN BRAIN ATLAS API
The Ivy Glioblastoma Atlas Project (Ivy GAP) is a foundational resource for exploring the anatomic and genetic basis of glioblastoma at the cellular and molecular levels.
Six studies were designed to identify molecular signatures and measure heterogeneity. In situ hybridization (ISH) was used to screen for gene expression enriched in particular structures and cell clusters, and laser microdissection followed by RNA sequencing was used to generate transcriptomes and identify genetic markers.
Overall, the dataset spans 42 individual tumors. Each tumor was subdivided into sub-blocks, and each sub-block was processed in one of the six studies. See the whitepapers for more details about the tumor specimens and about tissue and informatics processing.
From the API, you can:
- Download quantified ISH expression values by tumor feature
- Query the ISH expression differential search services
- Download RNA-Seq expression values
- Query the RNA-Seq correlative and differential search services
This document provides a brief overview of the data, database organization and example queries. API database object names are in camel case. See the main API Documentation for more information on data models and query syntax.
ISH Experimental Overview and Metadata
Experimental data from the four ISH surveys is associated with the "Glioblastoma" Product.
Multiple genes were assayed using each sub-block Specimen. The Specimen is cryosectioned into 20 µm-thick sections. ISH sections are interleaved with H&E sections so that every ISH section is adjacent to an H&E section. This yields a single-image SectionDataSet for each gene and one H&E SectionDataSet with 11-15 images.
Each sub-block Specimen is associated with a SpecimenType identifying the study to which this sub-block belongs.
Each H&E section is processed through a semi-automated annotation application that labels anatomic features using a statistical machine learning algorithm. The algorithm assigns a tumor feature label to each 45x45 pixel neighborhood. A separate algorithm counts the nuclei and measures the nuclei fraction coverage within each neighborhood.
Tumor feature statistics for each sub-block Specimen are generated by summing up the 45x45 pixel neighborhoods over all H&E images. This process produces area, normalized area, nuclei count, and nuclei fraction coverage for each sub-block Specimen and feature of interest.
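The aggregation step can be sketched as follows. This is a minimal illustration, not the Allen Institute pipeline. It makes three assumptions. Each labeled neighborhood is a hypothetical tuple of (feature label, pixel count, nuclei count, nucleus-covered pixels). Normalized area is a feature's area divided by the total labeled area. Nuclei fraction coverage is the number of nucleus-covered pixels divided by the feature's area.

```python
from collections import defaultdict


def feature_statistics(neighborhoods):
    """Aggregate labeled 45x45 neighborhoods into per-feature statistics.

    `neighborhoods` is an iterable of hypothetical tuples
        (label, n_pixels, nuclei_count, nuclei_pixels)
    pooled over all H&E images of one sub-block Specimen.
    """
    area = defaultdict(int)
    nuclei = defaultdict(int)
    nuclei_pixels = defaultdict(float)
    for label, n_pixels, nuclei_count, covered in neighborhoods:
        area[label] += n_pixels
        nuclei[label] += nuclei_count
        nuclei_pixels[label] += covered

    total_area = sum(area.values())
    stats = {}
    for label, a in area.items():
        stats[label] = {
            "area": a,
            # Assumption: normalized area = share of all labeled pixels.
            "normalized_area": a / total_area if total_area else 0.0,
            "nuclei_count": nuclei[label],
            # Assumption: fraction of the feature's pixels covered by nuclei.
            "nuclei_fraction": nuclei_pixels[label] / a if a else 0.0,
        }
    return stats
```

For example, two "CT" neighborhoods and one "LE" neighborhood of 2025 pixels each give "CT" a normalized area of 2/3.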
Anatomic tumor features (also referred to as structures) are hierarchically organized into a tree in which a child structure is a “part of” its parent structure. Structures are assigned colors to visually emphasize their relationships in the hierarchy. See the structure ontology page for more information.
Example queries:
- All donors in the "Glioblastoma" Product
- All sub-block Specimens which are part of the "Anatomic Structures ISH Survey"
- All TumorFeatures in sub-block Specimen "W32-1-1-K.01"
- All ISH SectionDataSets in sub-block Specimen "W32-1-1-K.01"
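The query URLs for these examples were not preserved. The helper below is a hedged sketch of how RMA model queries of this kind can be assembled. The base URL and the `model::...,rma::criteria,...` pattern follow the general RMA query syntax. The association names (`products`, `specimen`) are assumptions, so check them against the data model documentation before use. `rma_model_query` is a hypothetical helper, not part of any Allen library.

```python
from urllib.parse import quote

# Assumption: JSON endpoint of the Allen Brain Atlas RMA query service.
API_BASE = "http://api.brain-map.org/api/v2/data/query.json"
# Keep RMA punctuation readable; everything else (e.g. spaces) is percent-encoded.
SAFE_CHARS = ":,[]$'"


def rma_model_query(model, filters=()):
    """Build an RMA model query URL (hypothetical helper).

    `filters` is a sequence of (association, attribute, value) tuples, each
    rendered as association[attribute$eq'value'] and joined by commas.
    """
    criteria = f"model::{model}"
    if filters:
        clauses = ",".join(
            f"{assoc}[{attr}$eq'{value}']" for assoc, attr, value in filters
        )
        criteria += f",rma::criteria,{clauses}"
    return f"{API_BASE}?criteria={quote(criteria, safe=SAFE_CHARS)}"


if __name__ == "__main__":
    # All donors in the "Glioblastoma" Product (association name assumed).
    print(rma_model_query("Donor", [("products", "name", "Glioblastoma")]))
    # All SectionDataSets in sub-block "W32-1-1-K.01" (association name assumed).
    print(rma_model_query("SectionDataSet", [("specimen", "name", "W32-1-1-K.01")]))
```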
In the query results above, specimen "W32-1-1-K.01" has 21 ISH SectionDataSets (id=278269453 is one of them). In the web application, the images from a specimen are displayed on a specimen page. All displayed information, images and structural expression values are also available through the API.
See the image download page to learn how to download images at different resolutions and regions of interest.
Figure: Specimen page of specimen "W32-1-1-K.01", showing all 21 ISH SectionDataSets in the left image panel and 10 H&E images in the right image panel. The color codes for the anatomic tumor features outlined in the H&E images are displayed on the left.
ISH Expression Quantification
For every ISH image, pixels with gene expression are detected and a grayscale mask is generated. The detection algorithm is based on adaptive thresholding and mathematical morphology. The ISH image is then registered to its closest H&E image using a multi-resolution elastic registration algorithm. Finally, the feature label and nuclei count information from each 45x45 pixel neighborhood in the H&E image is transferred onto the expression data.
Expression statistics for each tumor feature are computed by combining values from all neighborhoods with the same label. This process produces expression density, intensity and energy measurements for each experiment and anatomic feature. For an anatomic feature, expression energy is defined as the sum of expressing pixel intensities divided by the sum, over all neighborhoods carrying that feature's label, of each neighborhood's pixel count multiplied by its nuclei fraction coverage.
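The sketch below reads the energy definition as: energy = sum(expressing pixel intensity) / sum over the feature's neighborhoods of (pixel count * nuclei fraction coverage). Treating the numerator as summed intensity, rather than a pixel count, is an assumption based on how expression energy is defined for other Allen Brain Atlas data. The `Neighborhood` record is hypothetical and stands for one neighborhood after the H&E labels have been transferred.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Neighborhood:
    """One 45x45 neighborhood of an ISH image after registration to H&E."""
    label: str                   # tumor feature label transferred from H&E
    n_pixels: int                # pixels in the neighborhood
    nuclei_fraction: float       # nuclei fraction coverage transferred from H&E
    expressing_intensity: float  # summed intensity of expressing pixels (assumption)


def expression_energy(neighborhoods):
    """Expression energy per feature label:

        sum(expressing_intensity) / sum(n_pixels * nuclei_fraction)

    Features whose denominator is zero are left out of the result.
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for nb in neighborhoods:
        numerator[nb.label] += nb.expressing_intensity
        denominator[nb.label] += nb.n_pixels * nb.nuclei_fraction
    return {
        label: numerator[label] / denominator[label]
        for label in numerator
        if denominator[label] > 0
    }
```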
- Download tumor feature expression values for POSTN SectionDataSet (id=265857641) in sub-block "W8-1-1-E.1.03"
ISH Expression Search Service
The expression search service allows users to instantly search over the ~18000 SectionDataSets to find genes with specific expression patterns:
- The Expression Search function allows users to find genes with expression in one structure (or set of structures)
- The Differential Search function allows users to find genes which have higher expression in one structure (or set of structures) compared to another structure (or set of structures)
The expression search functionality is available through the Web application and the API.
To perform an Expression Search, a user specifies a set of target structures. For each SectionDataSet, the average expression energy over the target structures is computed. The returned results are sorted in descending order of average expression energy.
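The ranking rule can be illustrated offline with the minimal sketch below. It is not the service's code. The input maps each SectionDataSet id to per-structure expression energies. Skipping structures that are absent from a sub-block, instead of counting them as zero, is an assumption.

```python
from statistics import mean


def expression_search(energies, targets):
    """Rank SectionDataSets by average expression energy over target structures.

    `energies` maps a SectionDataSet id to {structure acronym: energy}.
    Assumption: missing structures are skipped, and datasets with none of
    the target structures are dropped.
    """
    scores = []
    for dataset_id, by_structure in energies.items():
        values = [by_structure[s] for s in targets if s in by_structure]
        if values:
            scores.append((dataset_id, mean(values)))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For example, the hyperplastic blood vessel search would use `expression_search(energies, ["LEhbv", "IThbv", "CThbv"])`.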
See the connected service page for definitions of service::gbm_ish_expression parameters
- Find genes (SectionDataSets) with expression in pseudopalisading cells around necrosis (CTpan)
- Find genes (SectionDataSets) with expression in hyperplastic blood vessels (LEhbv, IThbv, CThbv)
- Visualize expression search in the web application
To perform a Differential Search, a user specifies a set of target structures and a set of contrast structures. For each SectionDataSet, the sum of expression energy is computed for the target structures and for the contrast structures. The returned results are sorted in descending order of the ratio of the target sum to the contrast sum.
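Here is a minimal sketch of the target/contrast ratio ranking, using the same hypothetical energy mapping as an expression search. How the service handles a zero contrast sum is not documented. In this sketch, such datasets rank first (ratio = infinity) when the target sum is positive and are dropped when both sums are zero. Both choices are assumptions.

```python
import math


def differential_search(energies, targets, contrasts):
    """Rank SectionDataSets by sum(target energy) / sum(contrast energy).

    `energies` maps a SectionDataSet id to {structure acronym: energy};
    structures missing from a dataset contribute zero to its sums.
    """
    results = []
    for dataset_id, by_structure in energies.items():
        target_sum = sum(by_structure.get(s, 0.0) for s in targets)
        contrast_sum = sum(by_structure.get(s, 0.0) for s in contrasts)
        if contrast_sum > 0:
            ratio = target_sum / contrast_sum
        elif target_sum > 0:
            ratio = math.inf  # assumption: no contrast signal ranks first
        else:
            continue  # assumption: no signal at all is dropped
        results.append((dataset_id, ratio))
    return sorted(results, key=lambda item: item[1], reverse=True)
```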
See the connected service page for definitions of service::gbm_ish_differential parameters
- Find genes (SectionDataSets) with higher expression in pseudopalisading cells around necrosis (CTpan) than cellular tumor cells (CT)
- Find genes (SectionDataSets) with higher expression in infiltrating tumor (IT) than cellular tumor (CT) and leading edge (LE)
- Visualize differential search in the web application
It should be noted that the expression quantification values used by the search services are generated by a fully automated processing pipeline. False positive and false negative results can occur due to artifacts on the tissue section or slide and/or algorithmic inaccuracies. Users should confirm results by visually inspecting the ISH images.
RNA-Seq Experimental Overview and Metadata
Experimental data from the two RNA-Seq surveys is associated with the "Human Glioblastoma RNASeq" Product.
For the "Anatomic Structures RNA-Seq" study, sampling locations were manually identified after H&E staining. For the "Cancer Stem Cells RNA-Seq" study, 17 reference gene probes were used to identify 35 types of putative cancer stem cell clusters.
Each sampling site is associated with a Structure with the following naming scheme:
- XX-reference-histology: tumor feature XX sampled by reference histology
- XX-reference-genes: tumor feature XX sampled by reference gene(s)
- XX-reference-controls: tumor feature XX sampled by low expression of reference genes
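Because the sampling method is encoded in the Structure name, it can be recovered with simple suffix matching. The sketch below is a hypothetical helper. Its suffixes come from the naming scheme above, and the example names used with it ("CT-reference-histology" and similar) are illustrative, not confirmed structure names.

```python
# Suffixes from the naming scheme above; descriptions paraphrase it.
SAMPLING_METHODS = {
    "reference-histology": "sampled by reference histology",
    "reference-genes": "sampled by reference gene(s)",
    "reference-controls": "sampled by low expression of reference genes",
}


def parse_sampling_site(structure_name):
    """Split 'XX-reference-...' into (tumor feature XX, sampling method)."""
    for suffix, method in SAMPLING_METHODS.items():
        tail = "-" + suffix
        if structure_name.endswith(tail) and len(structure_name) > len(tail):
            return structure_name[: -len(tail)], method
    raise ValueError(f"not a recognized sampling-site name: {structure_name!r}")
```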
Structures are organized hierarchically into a tree in which child structures are “part of” their parent structure. Structures are assigned colors that visually emphasize the hierarchical relationships. See the structure ontology page for more information.
See the whitepapers for more details about RNA-Seq data generation and normalization. All gene and sampling site information can be accessed through the API.
Example queries:
- All donors in the "Human Glioblastoma RNASeq" Product
- All samples in the "Human Glioblastoma RNASeq" Product
RNA-Seq Expression Download Service
Normalized gene-level expression values can be downloaded in several ways:
- From the web application Download page
- From the connected data service in the API
Using the connected service, expression values can be obtained by specifying:
- a list of "probes" (genes),
- a list of "donors" (tumor specimens, optional), and
- a list of structures (optional)
See the connected service page for definitions of service::gbm_expression parameters.
Download expression values for gene ESM1 restricted to samples from tumor 'W1-1-2'
- Find Specimen ID for tumor 'W1-1-2' (id=703393)
- Find Gene ID for human gene ESM1 (id=10924)
- Use tumor and gene ID as parameters to service::gbm_expression
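These steps can be sketched as URL construction. The parameter names `probes`, `donors` and `structures` come from the list above. The `service::name[param$eqvalue]` form and the comma-separated list values follow the general connected-service syntax and are assumptions here, so confirm them on the connected service page. `gbm_expression_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Assumption: JSON endpoint of the Allen Brain Atlas query service.
API_BASE = "http://api.brain-map.org/api/v2/data/query.json"
SAFE_CHARS = ":,[]$'"


def gbm_expression_url(probes, donors=None, structures=None):
    """Build a service::gbm_expression query (hypothetical helper).

    Each argument is a list of ids; optional lists are omitted when empty.
    """
    if not probes:
        raise ValueError("at least one probe (gene id) is required")
    criteria = "service::gbm_expression"
    for name, values in (("probes", probes), ("donors", donors), ("structures", structures)):
        if values:
            criteria += f"[{name}$eq{','.join(str(v) for v in values)}]"
    return f"{API_BASE}?criteria={quote(criteria, safe=SAFE_CHARS)}"


if __name__ == "__main__":
    # Gene ESM1 (id=10924) restricted to tumor 'W1-1-2' (id=703393).
    print(gbm_expression_url([10924], donors=[703393]))
```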
The output of the service has two top-level ordered arrays, "probes" and "samples".
Each probe (Gene) contains information about:
- the Gene (id, name)
- the associated NCBI Gene (id, acronym, name, entrez-id), along with
- a vector of normalized expression values and z-scores in the same order as the "samples" array.
Each sample contains information about:
- the Tumor (id and name returned in the "donor" field)
- the associated Structure (id, name, acronym and color)
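Because each probe's value vector is ordered like the "samples" array, the two arrays can be zipped into a flat table. The sketch below assumes several key names that are not confirmed by this page: "name" on a probe, "donor"/"name" and "structure"/"acronym" on a sample, and "expression_level" for the normalized values. The input in the usage check is synthetic, not real Ivy GAP data.

```python
def expression_rows(response, value_key="expression_level"):
    """Flatten the probes x samples output into (gene, tumor, structure, value) rows.

    `value_key` is an assumption: the field holding each probe's vector of
    normalized values may be named differently; z-scores could be read the
    same way by passing their key instead.
    """
    samples = response["samples"]
    rows = []
    for probe in response["probes"]:
        values = probe[value_key]
        if len(values) != len(samples):
            raise ValueError(
                f"probe {probe['name']!r}: {len(values)} values for {len(samples)} samples"
            )
        # Values and samples share the same order, so zip pairs them up.
        for sample, value in zip(samples, values):
            rows.append((
                probe["name"],
                sample["donor"]["name"],
                sample["structure"]["acronym"],
                float(value),
            ))
    return rows
```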
RNA-Seq Expression Search Service
The Differential Search finds genes that show the greatest difference in expression values between two user-defined sets of structures (target and contrast). For each probe, a two-sample t-test is performed, followed by a Benjamini-Hochberg false discovery rate correction. The null hypothesis is that the average expression level of samples in the contrast set of structures is greater than or equal to the average expression level of samples in the target set of structures. A statistically significant result (a p-value below the user-defined threshold) allows us to reject the null hypothesis and conclude that the average expression level of samples in the target set of structures is greater than that of samples in the contrast set. Results are sorted by p-value in ascending order. They can also be sorted by fold change (log ratio of expression) in descending order.
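The false discovery rate step can be sketched in plain Python. The per-probe p-values are taken as given. One way to obtain one-sided p-values that match the stated null hypothesis is `scipy.stats.ttest_ind(target, contrast, alternative="greater")` (SciPy 1.6 or later). Whether the service uses a pooled-variance or Welch test is not stated here. The function names below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_probes(pvalues_by_probe):
    """Sort probes by adjusted p-value, most significant first."""
    probes = list(pvalues_by_probe)
    qvalues = benjamini_hochberg([pvalues_by_probe[p] for p in probes])
    return sorted(zip(probes, qvalues), key=lambda item: item[1])
```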
The differential search functionality is available through the Web application and the API.
See the connected service page for definitions of service::gbm_differential parameters.
- Differential search for genes with higher expression in microvascular proliferation (sampled by reference histology) than in cellular tumor, infiltrating tumor, leading edge and pseudopalisading cells around necrosis.
Figure: Screenshot of the top results of a differential search for genes with higher expression in microvascular proliferation.
The Correlative Search finds genes with expression profiles similar to that of a selected seed "probe" over all samples within a user-specified structure and for user-specified tumors/donors. Pearson's correlation coefficients are computed for all probes and the results ranked in descending order.
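The ranking can be sketched as follows, given expression profiles that share the same ordered samples. Leaving out the seed probe itself and any constant profile (for which Pearson's r is undefined) is an assumption of this sketch. The gene names in the usage check are placeholders.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; None if either profile is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def correlative_search(seed, profiles):
    """Rank probes by correlation with the seed probe, highest first.

    `profiles` maps a probe (gene) name to its expression values over the
    same ordered samples.
    """
    seed_values = profiles[seed]
    scores = []
    for probe, values in profiles.items():
        if probe == seed:
            continue  # assumption: the seed is not reported against itself
        r = pearson(seed_values, values)
        if r is not None:
            scores.append((probe, r))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```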
The correlative search functionality is available through the Web application and the API.
See the connected service page for definitions of service::gbm_correlation parameters.
- Correlative search for genes with expression similar to that of gene VEGFA (id=7379) over all samples