
How can we retrieve the values of the intervals in the legend in R



I have an elevation model plotted in R. Code:

library(raster)
data(volcano)
r <- raster(volcano)
plot(r, col = topo.colors(20))

Plot:

How can we retrieve the values of the intervals in the legend, that is, in the example: 100, 120, 140, 160, 180?


The legend is a summary of the raster values. Therefore, you will need to extract the pertinent raster values. This should do it:

library(raster)
data(volcano)
r <- raster(volcano)

# The legend labels are the multiples of 20 within the raster's value range
min_val <- minValue(r)
max_val <- maxValue(r)
l <- min_val:max_val
result <- l[l %% 20 == 0]
result
# [1] 100 120 140 160 180
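The min_val:max_val trick assumes integer cell values. A more general sketch (not from the original answer) that returns the same multiples of 20 inside the raster's range:

library(raster)
data(volcano)
r <- raster(volcano)

# Multiples of 20 that fall inside the raster's value range; this also
# works when cell values are not integers
step <- 20
brks <- seq(step * ceiling(minValue(r) / step),
            step * floor(maxValue(r) / step),
            by = step)
brks
# [1] 100 120 140 160 180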

spplot: Plot methods for spatial data with attributes

aspect ratio for spatial axes; defaults to "iso" (one unit on the x-axis equals one unit on the y-axis) but may be set to a more suitable value, e.g. if coordinates are latitude/longitude

depending on the class of obj, panel.polygonsplot (for polygons or lines), panel.gridplot (grids) or panel.pointsplot (points) is used; for further control, custom panel functions can be supplied that call one of these panel functions, but do read below how the argument sp.layout may help

NULL or list; see notes below

if not FALSE, identify plotted objects (currently only working for points plots). Labels for identification are the row.names of the attribute table, row.names(as.data.frame(obj)). If TRUE, identify on panel (1,1); for identifying on panel (i,j), pass the value c(i,j)

optional; may be useful to plot a transformed value. Defaults to z~x+y|name for multiple attributes; use e.g. exp(x)~x+y|name to plot the exponent of the z-variable

if FALSE, use symbol key; if TRUE, use continuous, levelplot-like colorkey; if list, follow syntax of argument colorkey in levelplot (see below for an example)

grob placement justification

logical; if TRUE, trellis.par.set is called, else a list is returned that can be passed to trellis.par.set()

height of scale bar; width is 1.0

logical; if TRUE, a check is done to see if empty rows or columns are present, and need to be taken care of. Setting to FALSE may improve speed.

vector with fill colours; in case the variable to be plotted is a factor, this vector should have length equal to the number of factor levels

vector with color values, default for col.regions
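As a minimal, hedged illustration of how a continuous colour key and a colour ramp are combined in spplot (a sketch using the meuse.grid dataset shipped with sp, not an excerpt from the help page):

library(sp)
data(meuse.grid)
coordinates(meuse.grid) <- ~x+y   # promote the data.frame to a Spatial object
gridded(meuse.grid) <- TRUE       # and then to a SpatialPixelsDataFrame

# Continuous, levelplot-like colour key instead of the default symbol key
spplot(meuse.grid, "dist",
       col.regions = terrain.colors(64),
       colorkey = TRUE)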


Autoscaling

All graphs and charts, except for the 3D graphs, support autoscaling, which means they adjust their horizontal and vertical scales to fit the data you wire to them. Autoscaling is enabled by default, but it can slow performance. Right-click the graph or chart and select X Scale»AutoScale X or Y Scale»AutoScale Y from the shortcut menu to turn autoscaling on or off.

Note For the Compass Plot, Error Bar Plot, Feather Plot, and XY Plot Matrix, select Autoscale X or Autoscale Y from the shortcut menu.

Use the Operating tool or the Labeling tool to change the horizontal or vertical scale directly.

Note LabVIEW does not include hidden plots when you autoscale the axes of a graph or chart. If you want to include the hidden plots when you autoscale, make the hidden plots transparent instead. Right-click the plot image in the plot legend and select Color from the shortcut menu to change the color of plots.

When a graph or chart scale resizes, other elements on the graph or chart move and resize. To disable this behavior so the plot area size stays fixed, right-click the graph or chart and select Advanced»Auto Adjust Scales from the shortcut menu. If you disable this behavior, the scales might clip or overlap each other.

Note The Auto Adjust Scales option does not apply to the Compass Plot, Error Bar Plot, Feather Plot, XY Plot Matrix, or the 3D graphs.


3.2 Construction of an Ordinary Life Table

Knowledge of ordinary life table construction is essential in the construction of a multiple-decrement life table. A number of methods are available to construct an ordinary life table using data on age-specific death rates. The most common are those of Reed-Merrell, Greville, Keyfitz, Frauenthal, and Chiang (for a discussion of these methods, see Namboodiri and Suchindran, 1987).

In this section we construct an ordinary life table from data on age-specific death rates using a simple method suggested by Fergany (1971, "On the Human Survivorship Function and Life Table Construction," Demography 8(3):331-334). In this method the age-specific death rate (nmx) is converted into the proportion dying in the age interval (nqx) using a simple formula:

nqx = 1 - e^(-n · nmx)     (1)

where e is the base of the natural logarithm (a constant approximately equal to 2.71828182...) and n is the length of the age interval. (Note: do not confuse the symbol e here with the ex0 used in "expectation of life" notation.)

Once nqx is calculated with age-specific death rates, the remaining columns of the life table are easily calculated using the following relationships:

ndx = lx · nqx (As in Table 3.1.2, multiply Column 3 by Column 2.)
lx+n = lx - ndx (As in Table 3.1.2, subtract Column 4 from Column 3.)
nLx = ndx / nmx (Divide Column 4 in Table 3.1.2 by the corresponding age-specific death rate.
Note: Table 3.1.2 did not use the Fergany method.)
Tx = sum of nLx from age x to the end of the table (Obtain cumulative sums of Column 5 in Table 3.1.2.)
ex = Tx / lx (In Table 3.1.2, divide Column 6 by Column 3.)

Example Converting the Age-Specific Death Rate into the Proportion Dying in the Age Interval

Table 2.5.2 of Lesson 2.5 shows that the age-specific death rate for age group 1-4 ( 4m1 ) for Costa Rican males in 1960 is .00701 per person. (Keep in mind that tables presenting age-specific death rates will usually present the rate as "number of deaths per 1000 people," but in the calculations used in constructing an ordinary life table, the age-specific death rate is "number of deaths per person.")

Using formula (1) from above, 4q1 = 1 - e^(-4 · 0.00701) = 1 - e^(-0.02804) = 0.027651.

Fergany Method, Step by Step

In this example we use the age-specific death rates from Table 2.5.2 of Lesson 2.5 to complete the construction of a life table for 1960 Costa Rican males. We will follow the Fergany Method.

Step 1: Obtain age-specific death rates. Note that age-specific death rates are per person (Column 2 of Table 2.5.2).

Step 2: Convert the age-specific death rates (nMx) to proportions dying in the age interval (nqx) using formula (1) from above:
nqx = 1 - e^(-n · nMx), where n is the length of the age interval

Step 2 Examples

For age interval 0-1:
n = 1
Age-specific death rate (1m0) = 0.07505
1q0 = 1 - e^(-1 · 0.07505) = 0.072303

For age interval 1-4:
n = 4
Age-specific death rate (4m1) = 0.00701
4q1 = 1 - e^(-4 · 0.00701) = 0.027651

For age interval 5-9:
n = 5
Age-specific death rate (5m5) = 0.00171
5q5 = 1 - e^(-5 · 0.00171) = 0.008514

For age interval 85+:
Because everybody in the population eventually dies, the nqx value of this age interval is set to 1. ( nqx value for an open-ended class is always set to 1.)

Step 3: Use nqx to compute the lx values in Column 3, starting from the radix l0 = 100,000 and applying lx+n = lx · (1 - nqx).

(Note: This computational formula is easy to implement on a spreadsheet. First compute l1 and copy it to the remaining cells of the column.)

Step 3 Examples

Step 4: Calculate the number of deaths in each age interval (ndx) in Column 4 as ndx = lx - lx+n (equivalently, ndx = lx · nqx).

Note: Sometimes it is easy to implement Steps 3 and 4 simultaneously:
First write l0 = 100,000.
Then calculate ndx = lx · nqx and lx+n = lx - ndx for each successive age interval.

Step 5: In Column 5, compute the person-years of life in the indicated age interval (nLx) as:

nLx = ndx / nmx (Column 4 divided by the corresponding age-specific death rate)

Step 5 Example

Step 6: In Column 6, compute the cumulative person-years of life after a specified age (Tx):

Tx = sum of the nLx values in Column 5 from the specified age to the end of the table.

Step 6 Examples

Step 7: The final column of the life table (Column 7) is the expectation of life at specified ages. This column is computed as ex = Tx / lx.

The life table construction is complete with the implementation of Step 7.
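The arithmetic in Steps 1-7 is also easy to script. Below is a minimal R sketch of the Fergany method (the lesson itself assumes spreadsheet software; the function is my own illustration, and the final call mixes the three rates from the lesson with a made-up terminal rate and omits the intermediate intervals, so its output shows only the mechanics, not a real life table):

# Fergany life table: columns follow the lesson's layout
fergany_life_table <- function(age_start, n, nmx, radix = 100000) {
  k <- length(nmx)
  nqx <- 1 - exp(-n * nmx)           # Step 2: formula (1)
  nqx[k] <- 1                        # open-ended interval: everyone dies
  lx <- numeric(k)
  lx[1] <- radix                     # Step 3: radix l0
  for (i in seq_len(k - 1)) lx[i + 1] <- lx[i] * (1 - nqx[i])
  ndx <- lx * nqx                    # Step 4: deaths in the interval
  nLx <- ndx / nmx                   # Step 5: person-years lived
  Tx  <- rev(cumsum(rev(nLx)))       # Step 6: person-years after age x
  ex  <- Tx / lx                     # Step 7: expectation of life
  data.frame(age = age_start, n = n, nmx, nqx, lx, ndx, nLx, Tx, ex)
}

# Illustrative call: first three rates from the lesson, last rate made up
fergany_life_table(age_start = c(0, 1, 5, 85),
                   n         = c(1, 4, 5, NA),
                   nmx       = c(0.07505, 0.00701, 0.00171, 0.18))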

Selected Features of the Life Table

We will examine some features of the constructed life table that are relevant to the construction and interpretation of a multiple-decrement life table:

1. The sum of the values in Column 4 will be equal to 100,000 (= l0, the radix).

2. The sum of the values in Column 4 from a specified age onward will be equal to the lx value at that age as shown in Column 3.

For example, l65 equals the sum of the ndx values from age 65 to the end of the table.

Thus, one can interpret lx as the cumulative number of deaths after a specified age.

3. The age at which people in the life table cohort die is also important to our understanding of the age pattern of death. The ndx column (Column 4 of the life table) gives the frequency distribution of age at death in the population.

A graph of this frequency distribution will show the age pattern of death in the population. Unfortunately, this frequency distribution is given in age intervals of unequal length (and an open-ended interval at the end). Therefore a graph adjusting for the unequal age intervals is more appropriate for this life table.

Figure 3.2.1 shows the pattern of the age distribution of deaths from the life table above (Table 3.2.1). Note that in this example the open-ended age interval 85+ is closed at 85-100. The proportion of deaths in each age group is divided by the length of the age interval. The graph is drawn by connecting the values at the midpoint of each interval.

Figure 3.2.1: Age Distribution of Deaths for 1960 Costa Rican Males

The graph shows that a high proportion of the cohort dies in infancy. The deaths decrease until early adulthood, rise until age 80, and then begin to decrease again at the extreme ages. Note that the sharp decrease at the far right is due to the small number of extremely old survivors in this population.
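As a sketch of how such an adjusted graph can be drawn in R (assuming a completed life-table data frame lt with columns age, n, and ndx, and with the 85+ interval closed at 85-100 so that its n is 15, as in Figure 3.2.1):

# Proportion of deaths per year of age, plotted at interval midpoints
plot_death_distribution <- function(lt) {
  prop     <- lt$ndx / sum(lt$ndx)   # share of all deaths in each interval
  adjusted <- prop / lt$n            # adjust for unequal interval lengths
  midpoint <- lt$age + lt$n / 2
  plot(midpoint, adjusted, type = "b",
       xlab = "Age (interval midpoint)",
       ylab = "Proportion of deaths per year of age",
       main = "Age distribution of deaths")
}
# Usage: plot_death_distribution(lt)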

4. The cumulative number of deaths from the beginning of life can also be calculated by summing the appropriate numbers in Column 4. For example, the number of persons in the cohort dying before reaching age 15 is the sum of the ndx values for the age intervals below age 15 (0-1, 1-4, 5-9, and 10-14).

Note that this number also can be calculated as l0 - l15.

Thus, the proportion dying before reaching age 15 is (l0 - l15) / l0.

Exercise 7

Note to students: This longer exercise will require the use of spreadsheet software. Good luck!

Use the data on age-specific deaths of the 1960 Costa Rican females from Exercise 5 to construct a life table using Fergany's Method as described above. (You downloaded the data file you need here as part of Exercise 5.)

Then use your constructed life table to do the following:

  1. Draw graphs of the nqx and lx columns. Briefly describe these graphs.
  2. Draw a graph of the age distribution of deaths (adjusting for the unequal age intervals) using the ndx column in the life table. Comment on the age pattern of mortality depicted in this graph.
  3. Verify that l65 is the sum of the ndx column from age 65 to the end of the table.

Once you have finished your work, compare your results to the answer key below.

Answers To Exercises

Exercise 6

The radix of the life table is usually 100,000 but may be a different number. Where in an ordinary life table can you always look to find out what the radix is?

B. In the first row of Column 3. The radix is simply the starting number of newborns for the life table. Since Column 3 gives the starting number of people at each age interval, the first row gives the number of people starting at age 0. In this case, it is 100,000, as usual.

According to Column 7 of Table 3.1.2, a newborn in the US in 1997 may expect to reach age 76.5. Once that child gets to age 50, what age would he/she expect to reach?

C. 79.7. Column 7 tells, on average, how many more years of life are expected for people who made it to the start of the age interval. So a 50-year-old would expect another 29.7 years to live on average (50 + 29.7 = 79.7).

According to Table 3.1.2, of those born in the US in 1997 who make it to age 70, what percentage are expected to die before they reach age 75?

A. 14%. Column 2 gives the proportion of persons alive at the beginning of the age interval who die during the age interval. So a 70-year-old has a .14365 (rounded to 14%) chance of dying during the 70-75 age interval.

According to Table 3.1.2, what is the probability of a newborn in the US in 1997 surviving to age 20?

C. .986. Since Column 3 gives the number of people surviving to the beginning of the age interval (98,558 made it to age 20) and you know the number of people that started (100,000), the probability of making it to age 20 is 98,558/100,000 = .98558.

Exercise 7

Use the data on age-specific deaths of the 1960 Costa Rican females from Exercise 5 to construct a life table using Fergany's Method as described above. (You downloaded the data file you need here as part of Exercise 5.)

Then use your constructed life table to do the following:

  1. Draw graphs of the nqx and lx columns. Briefly describe these graphs.
  2. Draw a graph of the age distribution of deaths (adjusting for the unequal age intervals) using the ndx column in the life table. Comment on the age pattern of mortality depicted in this graph.
  3. Verify that l65 is the sum of the ndx column from age 65 to the end of the table.

1. Draw graphs of the nqx and lx columns. Briefly describe these graphs.

The proportion of people who die during the age interval is a little higher in the first two age intervals, low and flat until about age 45, and rises fairly steeply after that until it is 1.0 for the 85+ age group.

Naturally, the number of people alive at the start of each interval starts dropping more rapidly around age 45.

2. Draw a graph of the age distribution of deaths (adjusting for the unequal age intervals) using the ndx column in the life table. Comment on the age pattern of mortality depicted in this graph.

The greatest mortality rate is in the very first age interval. After the second age interval, mortality rates are low and flat before they start rising at around 47.5 (age interval midpoint), peaking at 82.5. The steep drop in the last age group is partly because of the small number of survivors and partly because it is an open-ended interval. If the table continued with five-year intervals, the drop would appear to be more gradual.

3. Verify that l65 is the sum of the ndx column from age 65 to the end of the table.


Content

County data

The atlas for county data (www.cdc.gov/diabetes/atlas/countydata/atlas.html) displays a map of the United States showing crude and age-adjusted estimates of the prevalence and incidence of diabetes and the prevalence of obesity and physical inactivity by county. It also presents data on the prevalence of diabetes, obesity, and physical inactivity by sex. In this atlas, the user can interact with maps and data tables. The user can select an indicator to be displayed in both the map and the table by clicking on the "Indicator" button and selecting from the drop-down list. The default display shows all US counties (Figure 1). To display county data by state, the user would click on the "Select State" button to select a state from the drop-down list of all states.

Figure 1. Screenshot of the default display of US county data on diabetes and its risk factors in the Diabetes Interactive Atlas (www.cdc.gov/diabetes/atlas/countydata/atlas.html). [A text description of this figure is also available.]

The data table can be sorted according to any column heading in the table, including county name, state name, indicator value, lower and upper confidence limits of indicator value, and total number of adults by indicator. Row or multiple rows selected on the data table will be highlighted on the map. Likewise, if the user clicks on a county or multiple counties or rolls over a county in the map, those counties are highlighted in the table. The "Legend Settings" button allows the user to choose different data classifications (ie, equal intervals, continuous, natural breaks, or quantiles) and different numbers of data classes (from 2 to 10 classes) to view an indicator. The time animation bar located near the top of the Web page allows the user to view trends over time for the United States and to select any 1 year for viewing. Other features of the atlas are functions for zooming in and out, printing, exporting, and downloading, and a tutorial, "How to Use the Atlas."

County rankings

The atlas for county rankings (www.cdc.gov/diabetes/atlas/countyrank/atlas.html) has all of the features of the atlas for county data. It shows a map of the United States by county and indicates whether the age-adjusted rates (of the chosen indicator) in the counties rank above or below or are no different from the US median rates. Ranks for county data on diagnosed diabetes, obesity, and physical inactivity are available; however, ranks are not available for county estimates of diabetes prevalence by sex or for incidence because most of these measures have a coefficient of variation greater than 0.3. The rank estimates have large confidence intervals and are highly variable (20). These confidence intervals need to be considered before reaching conclusions about counties based on ranks. For example, in 2010 Cook County, Illinois, ranked 1,508th in the prevalence of diagnosed diabetes. However, the lower and upper limits of the rank for Cook County were 1,224th (5th percentile) and 1,774th (95th percentile).

Maps and motion charts

Maps and motion charts: the "All States" option

The Web page "Maps and Motion Charts - All States" (www.cdc.gov/diabetes/atlas/obesityrisk/atlas.html) presents more information, and its default display is more complicated than the other displays of the atlas. The default display (Figure 2) shows 4 images depicting data on all 50 states: 1) a choropleth map of age-adjusted diabetes prevalence (which can be switched to a table view), 2) a bubble chart of age-adjusted diabetes and obesity prevalence, 3) a bar chart of age-adjusted diabetes prevalence, and 4) a chart of the US median age-adjusted prevalence of diabetes from 1994 to 2010. In addition, the page shows a time animation bar. The default display of the state data is the national view; however, the user can click on the "Select Region" button and view state data by US census region or division. The bar chart shows the indicator value with lower and upper confidence limits for each state for each year. The degree of uncertainty for each estimate is discerned by examining the error bars, which indicate lower and upper limits. For example, a precise estimate will have a narrow interval. The time-series chart displays an orange trend line that represents the US median prevalence for each year. When the user moves the mouse over a state in the map, bubble chart, or bar chart, the trend line for that state is shown and can be compared with the US median.

Figure 2. Screenshot of the default display of the maps and motion charts on diabetes and its risk factors for all states in the Diabetes Interactive Atlas (www.cdc.gov/diabetes/atlas/obesityrisk/atlas.html). [A text description of this figure is also available.]

By clicking the "Play" button on the time animation bar, the user can see changes in an indicator over time and across states in the map, bubble chart, and bar chart. The motion of the bubble chart allows investigation of the complex interplay of data between an indicator and a known risk factor, obesity. The parameters defined for the default motion chart data are the following: the x-axis is the age-adjusted percentage of obesity, the y-axis is the age-adjusted prevalence of diabetes, the bubble color indicates data class, and the bubble size is proportional to the number of adults with diabetes. Although the x-axis always indicates obesity, the user can select a different indicator (ie, diabetes incidence or physical inactivity) for the y-axis.

The user can select multiple states by holding down the "Control" key while clicking the mouse. Those states will be highlighted in all the data frames. Other features in the atlas include zooming in and out, printing, exporting, downloading, and an online tutorial.

Maps and motion charts: the "Select a State" option

By clicking on "Select a State" (www.cdc.gov/diabetes/atlas/obesityrisk/county_statelist.html), the user is taken to a page that displays the names of the 50 states, and by clicking on a state, the user is taken to county-level age-adjusted estimates for the selected state. The "Select a State" Web pages have all of the components and functionality found in "All States." The atlas includes a transparent map tool that helps users who may know a city name but not a county name. By moving the tool's slider bar to high transparency, the user can find the city on the background map, and then click on the city to highlight the county and county name.


3 Answers

You can do something like this with simultaneous-quantile regression with a set of dummies corresponding to the 4 groups. This allows you to test, and construct confidence intervals for, comparisons of the coefficients describing the different quantiles that you care about.

Here's a toy example where we cannot reject the joint null that the 25th, 50th, and 75th percentiles of car prices are all equal in all 4 MPG groups (the p-value is 0.374):

There do seem to be large differences between group 1 and groups 2-4 for the 3 quantiles in the graph. However, this is not a lot of data, so the failure to reject with the formal test is perhaps not that surprising because of the "micronumerosity".

Interestingly, the Kruskal-Wallis test of the hypothesis that 4 groups are from the same population rejects:

Assuming that your curves represent the empirical CDFs obtained from data, the usual way to test for a difference between more than two groups would be some kind of multi-sample non-parametric test akin to the Kolmogorov-Smirnov test, or a rank-based ANOVA test like the multi-sample Kruskal-Wallis test. There are a number of papers in the statistical literature looking at multi-sample non-parametric tests of this kind (see e.g., Kiefer 1959, Birnbaum and Hall 1960, Conover 1965, Sen 1973 for early literature). If you reduce down to a pairwise comparison of interest, you can of course use the traditional two-sample tests.

There is an R package called kSamples that implements the multi-sample Kruskal-Wallis test and some other multi-sample non-parametric tests. I am not aware of a package that does the multi-sample KS test, but others may be able to point you to additional resources.
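For reference, base R already covers the two-sample Kolmogorov-Smirnov test and the Kruskal-Wallis test; the sketch below uses mtcars with an arbitrary MPG grouping purely to show the syntax, not to reproduce the toy example above:

# Four groups defined by fuel economy, purely for illustration
data(mtcars)
mpg_group <- cut(mtcars$mpg, breaks = 4)

# Multi-group rank test: are the hp distributions the same in all groups?
kruskal.test(mtcars$hp ~ mpg_group)

# Reduced to a pairwise comparison: two-sample Kolmogorov-Smirnov test
g <- levels(mpg_group)
ks.test(mtcars$hp[mpg_group == g[1]], mtcars$hp[mpg_group == g[2]])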

For comparing 2 distributions at a time ("pairwise"), it's possible to find all the ranges of values for which the CDFs are statistically significantly different, while controlling the familywise error rate (FWER) at your desired level. This (new) approach is described in detail in this 2018 Journal of Econometrics paper, as well as in this 2019 Stata Journal article. R and Stata code (and open drafts of articles, and replication files) are at https://faculty.missouri.edu/kaplandm. Both articles include examples with real data. Everything is fully nonparametric, and the "strong control" of FWER is exact even in small samples.


6 Answers

This is partly a response to @Sashikanth Dareddy (since it will not fit in a comment) and partly a response to the original post.

Remember what a prediction interval is: it is an interval or set of values where we predict that future observations will lie. Generally the prediction interval has 2 main pieces that determine its width: a piece representing the uncertainty about the predicted mean (or other parameter), which is the confidence interval part, and a piece representing the variability of the individual observations around that mean. The confidence interval part is fairly robust due to the Central Limit Theorem, and in the case of a random forest the bootstrapping helps as well. But the prediction interval is completely dependent on the assumptions about how the data are distributed given the predictor variables; the CLT and bootstrapping have no effect on that part.

The prediction interval should be wider where the corresponding confidence interval would also be wider. Other things that would affect the width of the prediction interval are assumptions about equal variance or not, this has to come from the knowledge of the researcher, not the random forest model.

A prediction interval does not make sense for a categorical outcome (you could do a prediction set rather than an interval, but most of the time it would probably not be very informative).

We can see some of the issues around prediction intervals by simulating data where we know the exact truth. Consider the following data:
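(A minimal sketch of such simulated data, consistent with the description below but not the answer's own code; the coefficients 2 and -1 are placeholders:)

set.seed(1)
n  <- 1000
x1 <- runif(n, -2, 2)
x2 <- runif(n, -2, 2)
# True mean is 10 when both predictors are 0; residual sd is 1
y   <- 10 + 2 * x1 - 1 * x2 + rnorm(n, sd = 1)
dat <- data.frame(y, x1, x2)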

This particular data follows the assumptions for a linear regression and is fairly straightforward for a random forest fit. We know from the "true" model that when both predictors are 0 the mean is 10, and we also know that the individual points follow a normal distribution with a standard deviation of 1. This means that the 95% prediction interval based on perfect knowledge for these points would be from 8 to 12 (well, actually 8.04 to 11.96, but rounding keeps it simpler). Any estimated prediction interval should be wider than this (not having perfect information adds width to compensate) and include this range.

Let's look at the intervals from regression:
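(A sketch of the regression intervals, using the simulated data above rather than the answer's own code:)

fit <- lm(y ~ x1 + x2, data = dat)
nd  <- data.frame(x1 = 0, x2 = 0)
# Confidence interval for the mean versus prediction interval for a new point
predict(fit, newdata = nd, interval = "confidence")
predict(fit, newdata = nd, interval = "prediction")   # should bracket 8 to 12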

We can see there is some uncertainty in the estimated means (the confidence interval), and that gives us a prediction interval that is wider than, but includes, the 8 to 12 range.

Now let's look at the interval based on the individual predictions of individual trees (we should expect these to be wider since the random forest does not benefit from the assumptions (which we know to be true for this data) that the linear regression does):
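(A sketch of one way to get per-tree intervals with the randomForest package's predict.all option, continuing from the objects above; this is my reconstruction, not the answer's code:)

library(randomForest)
set.seed(2)
rf <- randomForest(y ~ x1 + x2, data = dat, ntree = 500)

# One prediction per tree for the new point: a 1 x 500 matrix
tree_preds <- predict(rf, newdata = nd, predict.all = TRUE)$individual
quantile(tree_preds, probs = c(0.025, 0.975))   # interval for the mean only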

The intervals are wider than the regression prediction intervals, but they don't cover the entire range. They do include the true values and therefore may be legitimate as confidence intervals, but they are only predicting where the mean (predicted value) is, not the added piece for the distribution around that mean. For the first case, where x1 and x2 are both 0, the intervals don't go below 9.7; this is very different from the true prediction interval, which goes down to 8. If we generate new data points then there will be several points (much more than 5%) that are in the true and regression intervals but don't fall in the random forest intervals.

To generate a prediction interval you will need to make some strong assumptions about the distribution of the individual points around the predicted means, then you could take the predictions from the individual trees (the bootstrapped confidence interval piece) then generate a random value from the assumed distribution with that center. The quantiles for those generated pieces may form the prediction interval (but I would still test it, you may need to repeat the process several more times and combine).

Here is an example of doing this by adding normal (since we know the original data used a normal) deviations to the predictions with the standard deviation based on the estimated MSE from that tree:
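(A sketch of that idea, continuing from the objects above and using the forest's out-of-bag MSE as a stand-in for the per-tree error variance described; the answer's original code is not shown here:)

# Add normal noise to each tree's prediction, with sd taken from the OOB MSE,
# then summarize the quantiles of the noisy draws
sd_hat <- sqrt(rf$mse[rf$ntree])
noisy  <- tree_preds + rnorm(length(tree_preds), mean = 0, sd = sd_hat)
quantile(noisy, probs = c(0.025, 0.975))   # approximate prediction interval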

These intervals contain those based on perfect knowledge, so they look reasonable. But they will depend greatly on the assumptions made (the assumptions are valid here because we used the knowledge of how the data was simulated; they may not be as valid in real data cases). I would still repeat the simulations several times, for data that looks more like your real data (but simulated so you know the truth), before fully trusting this method.


ACKNOWLEDGEMENTS

This paper was conceived as part of the “Filling in gaps in global understanding of ecological stability and coexistence” (FIGS) workshop, funded by a UFZ Program Synthesis Grant, UFZ IP-11 Project Integration Funds, and the “TULIP” French Laboratory of Excellence (ANR-10-LABX-41; ANR-11-IDEX-0002-02). The contributions of ATC, SH, JMC, and AM were also supported, in part, by the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, funded by the German Research Foundation (FZT 118). ML, CdM, and YRZ were supported by the BIOSTASES Advanced Grant, funded by the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 666971). GB was funded by the Swedish Research Council (grant VR 2017-05245). LGS was funded by an NSF EPSCoR Track 1 RII grant (NSF award #EPS-1655726). The authors also thank K. Thompson, the Harpole and Chase lab groups at iDiv and UFZ, and the Hillebrand lab group at HIFMB, for helpful feedback on earlier drafts of this manuscript. Open Access funding enabled and organized by Projekt DEAL.


How can we retrieve the values of the intervals in the legend in R - Geographic Information Systems

Raster analysis is similar in many ways to vector analysis. However, there are some key differences. The major differences between raster and vector modeling are dependent on the nature of the data models themselves. In both raster and vector analysis, all operations are possible because datasets are stored in a common coordinate framework. Every coordinate in the planar section falls within or in proximity to an existing object, whether that object is a point, line, polygon, or raster cell.

In vector analysis, all operations are possible because features in one layer are located by their position in explicit relation to existing features in other layers. Inherent in the arc-node vector data model is chirality, or the left- and right-handedness of arcs (as shown in the polygon data model image from Spatial Data Model). As a corollary to this, containment and overlap are inherent relationships between layers. For example, a point on one layer is on one side of an arc in another layer, or inside or outside of a polygon in yet another layer. The complexity of the vector data model makes for quite complex and hardware-intensive operations.

Raster analysis, on the other hand, enforces its spatial relationships solely on the location of the cell. Raster operations performed on multiple input raster datasets generally output cell values that are the result of computations on a cell-by-cell basis. The value of the output for one cell is usually independent of the value or location of other input or output cells. In some cases, output cell values are influenced by neighboring cells or groups of cells, such as in focal functions.
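As a small illustration of local (cell-by-cell) versus focal (neighborhood) operations, here is a sketch using the R raster package from earlier in this document rather than ArcGIS; the rescaling factor is arbitrary:

library(raster)
data(volcano)
r <- raster(volcano)

# Local operation: each output cell depends only on the same input cell
r_scaled <- r * 0.3048

# Focal operation: each output cell is the mean of its 3 x 3 neighborhood
r_smooth <- focal(r, w = matrix(1 / 9, nrow = 3, ncol = 3))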

Raster data are especially suited to continuous data. Continuous data change smoothly across a landscape or surface. Phenomena such as chemical concentration, slope, elevation, and aspect are dealt with in raster data structures far better than in vector data structures. Because of this, many analyses are better suited or only possible with raster data. This section and the next section will explain the fundamentals of raster data processing, as well as some of the more common analytical tools.

ArcGIS can deal with several formats of raster data. Although ArcGIS can load all supported raster data types as images, and analysis can be performed on any supported raster dataset, the output of raster analytical functions is always an ArcInfo-format grid. Because the native raster dataset in ArcGIS is the ArcInfo-format grid, from this point on the term grid will mean the analytically enabled raster dataset.

ArcGIS's interface to raster analysis is through the Spatial Analyst Extension. The Spatial Analyst, when loaded, provides additions to the ArcGIS GUI, including new menus, buttons, and tools. The features added to ArcGIS with the Spatial Analyst are listed here.

Grid layers are graphical representations of the ArcGIS and ArcInfo implementation of the raster data model. Grid layers are stored with a numeric value for each cell. The numeric cell values are either integer or floating-point. Integer grids have integer values for the cells, whereas floating-point grids have value attributes containing decimal places.

Cell values may be stored in summary tables known as Value Attribute Tables (VATs) within the info subdirectory of the working directory. Because the possible number of unique values in floating-point grids is high, VATs are not built or available for floating-point grids.

VATs do not always exist for integer grids. VATs will exist for integer grids that have:

  • a range of values (maximum minus minimum) less than 100,000 and
  • a number of unique values less than 500

It is possible to convert floating-point grids to integer grids, and vice versa, but this frequently leads to a loss of information. For example, if your data have very precise measurements representing soil pH, and the values are converted from decimal to integer, zones which were formerly distinct from each other may become indistinguishable.

Grid zones are groups of either contiguous or noncontiguous cells having the same value.

Grid regions are groups of contiguous cells having the same value. Therefore, a grid zone can be composed of 1 or more grid regions.

Although Raster Calculations (which will be discussed shortly) can be performed on both integer and floating-point grids, normal tabular selections are only possible on integer grids that have VATs. This is because a tabular selection is dependent on the existence of an attribute table. Grids without VATs have no attribute tables and are therefore unavailable for tabular selections.

Grid layer properties

Grid layer properties can be determined by viewing Properties.

The General tab shows the Layer Name as it appears in the Table of Contents

The Source tab shows the Data Source file location and a number of other pieces of information, such as the Cell Size, the number of Rows and Columns, the grid Type (Float or Integer), and the Status (Temporary or Permanent).

The Extent tab shows the lower-left and upper-right coordinates.

The Display and Symbology tabs are used to alter the display of the layer.

Adding grid layers to data frames

Grid layers are added to data frames in the same manner as feature or image layers, by using the File > Add Data menu control, the Add Layer button, or by dragging from ArcCatalog. Grid data sources can be added to any ArcMap document. However, in order to load grid data sources for analysis into a data frame within the map document, the Spatial Analyst Extension must be loaded.

Also, in order to access many Spatial Analyst functions, it is necessary to add the Spatial Analyst toolbar.

If the Spatial Analyst Extension is not loaded, it is still possible to add grid data sources to a data frame, but only as simple images. Image layers cannot be queried or analyzed in any way. Image layers are usually not associated with any meaningful attribute values, other than a simple numeric value used for color mapping.

Displaying grid layers

Grid layer displays are altered in almost exactly the same manner as feature layers. Changes to the display of grid layers are done using the Legend Editor. Like polygon feature layers, shading of fills can be changed by altering the symbols of individual classes, by changing the Color Ramp, legend labels, and classification properties. One exception is that grids cannot be displayed with anything other than a solid fill symbol.

Here, the Pack Forest floating-point elevation grid is displayed in 5 classes using natural breaks, with a gray monochromatic color scheme. Note that the No Data class is not included in the 5 classes.

Here the legend has been changed to a Stretched Color Ramp (an option not available for vector data).

Examining cell values in grid layer

As with vector data, to see the spread of values for a grid, view the layer properties. The histogram displays cell values on the X-axis and cell counts on the Y-axis.

For all grid layers, individual cell values can be queried using the Identify tool. Clicking on a cell for the active grid layer will display the attribute values for the layer. The Identify Results dialog will display the name of the grid layer, the X and Y coordinates of the cell, and the cell's value.

For integer layers with VATs, it is possible to perform tabular selections. Here are all cells with an elevation between 1000 and 1500 ft. In order to make the selection, it is necessary to open the VAT and use Select By Attributes from the table's Options menu.

As with normal feature layer selections, cells meeting the query criteria are displayed in the default selection color.

Managing grid layer files

When the Spatial Analyst performs operations that create new grids on the fly, these new grids are by default stored temporarily in the working directory. If the layer is deleted from the data frame, the grid will also be deleted from the disk. Frequently, grid queries and analyses are not formulated correctly on the first try and do not produce the desired result. The incorrect grid can be deleted from the map document, and it will also be removed from the file system (unlike shapefiles, which need to be manually deleted). After the correct result is obtained, the new temporary grid can be saved permanently. In order to make sure that newly created grids are saved, right-click the layer and select Make Permanent. When you do save grid layers, you can choose the file system directory and the name of the layer, rather than accepting the default name and location assigned by ArcGIS.

If there are permanently stored grids in a map document, and these are deleted from the map document, they will not be automatically deleted from the disk. If you want to delete the data source you will need to manually delete in the same manner that you manually delete shapefiles or other data sources (that is, with ArcCatalog). Be aware of this, because grid dataset files are very large in size, and can easily fill up a drive, especially a puny 128 MB removable drive.

In order to be able to copy, rename, or delete a layer, all references to the layer must be removed from the map document. Sometimes, even if the layer is removed from the data frame and the attribute table is deleted, ArcGIS "holds on" to a layer. In these cases, it becomes necessary to close ArcGIS entirely before the data source can be deleted.

If you need to delete a grid data source, never use the operating system, use only ArcCatalog. Otherwise you will end up corrupting the file system by leaving "junk" data in the info directory. Cleaning up after this requires the use of ArcInfo's command-line interface.

There are limitations on storing grid data sources that you should be aware of:

No spaces in directory or file names! This is a requirement of the complete pathname to a grid data source. Here is an unacceptable pathname:

C:\projects\data\grid data\soil_loss

and an acceptable pathname:

C:\projects\data\grid_data\soil_loss

13 character limitation on grid names. Here is an unacceptable grid name:


Research involving human participants: All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Rui Li is an associate professor in the Department of Geography and Planning, State University of New York at Albany. He received his doctoral degree in Geography from The Pennsylvania State University. His main research interest is geographic information science, with a special focus on the interaction among map representations, environments, and human spatial behavior. He investigates spatial cognition and user experience with geospatial technologies and uses the findings to inform the design and implementation of cognitively efficient technologies, including navigation systems.