How to Run GEOquery with limma

GEOquery and limma together support differential expression analysis that takes the original submitter-supplied processed data tables as input: GEOquery retrieves and parses the tables, and limma fits the statistical models. Several plots help users explore and interpret the results.

One such plot shows the distribution of the moderated t-statistic values computed during a limma test and helps assess the quality of the dataset. On the data-import side, the GEOquery functions GDS2MA and GDS2eSet convert the GDS data structures returned by getGEO into limma MAList objects and Biobase ExpressionSet objects, respectively.

This includes populating the genes slot of the resulting MAList and the featureData slot of the resulting ExpressionSet.

limma-geoquery

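GSE series are available as SOFT format files and as series matrix files, and either can be downloaded with the getGEO function. By default (GSEMatrix = TRUE), getGEO parses the series matrix files and returns a list of Biobase ExpressionSet objects, whose featureData and phenoData slots hold the probe annotation and the sample metadata. GDS datasets can additionally be converted to limma MAList objects with GDS2MA.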

getGEO, GDS2MA, and GDS2eSet each take a Boolean getGPL argument that controls whether the GPL platform annotation is downloaded and included in the resulting object. Setting it to FALSE reduces download time.

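The conversion functions GDS2MA and GDS2eSet can also log2-transform the data through their do.log2 argument. The transformation is off by default, so enable it only when the values are not already on a log scale; limma's linear models are designed for log-scale expression values.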

limma-geo2r

GEOquery and limma are widely used R/Bioconductor packages for differential expression (DE) analysis of microarray or RNA-seq data. GEOquery parses data tables from the GEO repository into R data structures, and limma fits a linear model to each gene and uses moderated t-statistics to identify differentially expressed genes.

limma handles a wide variety of experimental designs and data types, and it adjusts p-values for multiple testing to control the rate of false positives. limma also provides several plots to aid in the interpretation of results.
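The multiple-testing adjustment mentioned above can be illustrated with a short sketch. GEOquery and limma are R packages, so this Python version only shows the arithmetic of the Benjamini-Hochberg (false discovery rate) procedure, which is the default adjustment in limma's topTable; the function name benjamini_hochberg is hypothetical and not part of either package.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Visit the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        candidate = p_values[idx] * m / rank
        # Taking the running minimum keeps the adjusted values monotone.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

Genes are then usually ranked by the adjusted value, and those below a chosen threshold (0.05 is a common choice) are reported as differentially expressed.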

The first step in running limma-geoquery is to load the data using the getGEO function from the GEOquery package. Example scripts commonly use an accession such as GSE33126; replace it with the accession of your own dataset. Once the data has been loaded, you can start the analysis.

The limma-geoquery analysis will produce a table with the top differentially expressed genes along with their p-values after multiple-test correction. It will also produce a graphical display of the distribution of the p-values for all contrasts that are evaluated.

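A related quantile-quantile (Q-Q) plot compares the moderated t-statistics computed by limma with the theoretical quantiles of a Student’s t distribution, which helps assess whether the test statistics follow the distribution expected under the model.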

Depending on the distribution of the values, limma-geo2r may automatically log2-transform the data. This is done so that the data better fit the assumptions of the linear model used to estimate fold changes.

You can disable this feature, for example when the values are already on a log scale or when you want to avoid the overhead of transforming the data. If you leave it enabled, limma-geo2r uses heuristics to decide whether the data need to be transformed.
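A heuristic of this kind can be sketched as follows. This is an illustration in Python, not the packages' own R code: the thresholds (a 99th percentile above 100, or a range above 50 together with a positive first quartile) are assumptions modeled on the quantile check commonly seen in GEO2R-generated scripts, and the function names needs_log2 and log2_transform are hypothetical.

```python
import math


def quantile(sorted_values, prob):
    """Quantile by linear interpolation between order statistics."""
    position = (len(sorted_values) - 1) * prob
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    low_value = sorted_values[lower]
    return low_value + fraction * (sorted_values[upper] - low_value)


def needs_log2(values):
    """Guess whether expression values are still on the raw scale."""
    finite = sorted(v for v in values if not math.isnan(v))
    if not finite:
        return False
    q0, q25, q99, q100 = (quantile(finite, p) for p in (0.0, 0.25, 0.99, 1.0))
    # Assumed thresholds: log2-scale data rarely exceed 100 or span
    # more than 50 units, whereas raw intensities usually do.
    return q99 > 100 or (q100 - q0 > 50 and q25 > 0)


def log2_transform(values):
    """Apply log2; non-positive and missing values become NaN."""
    return [math.log2(v) if v > 0 else math.nan for v in values]


if __name__ == "__main__":
    raw_intensities = [12.0, 250.0, 1800.0, 40.0, 9000.0, 3.5]
    if needs_log2(raw_intensities):
        raw_intensities = log2_transform(raw_intensities)
    print(raw_intensities)
```

Values that already look log-scaled are left alone, so the check does not apply the transformation a second time to data the submitter has already logged.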