Understanding Shapefile (.shp) File Format

Shapefile is a vector data format for storing geographic features and their associated attribute information. It was developed and is regulated by Esri as an open specification for data interoperability among Esri and other GIS software products. A shapefile can hold point, line, or polygon features, such as:

Point Features: Well, Post Office, Temple, Hospital, Mosque, School

Linear Features: Road, River, Highways, Rail track, Street, coastlines

Areal Features: Pond, Soil Type, Lake, Reserved Forest, political boundaries, state or county boundaries, climate zones.

Point features are zero-dimensional and have neither length nor area. Line (linear) features are one-dimensional: they have length but no area. Area features are two-dimensional and enclose an area.

A few resources for freely available shapefiles are:

Shapefiles can be created by exporting any data source to a shapefile, by digitizing shapes directly, or by writing them programmatically.

In a shapefile, each record represents a single geographic feature and its attributes. Attributes are held in a dBASE-format file, and each attribute record has a one-to-one relationship with its associated feature.

A shapefile consists of several files that share a base name. Three of them are mandatory: a main file that contains the feature geometry (.shp), an index file that stores the index of the feature geometry (.shx), and a dBASE table (.dbf) that stores the attribute information of features. The mandatory and optional files are as follows:

Mandatory files:

  1. .shp — shape format; the feature geometry itself. Variable-record-length file in which each record describes a shape with a list of its vertices.
  2. .shx — shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly. The index file (.shx) contains a 100-byte header followed by 8-byte, fixed-length records.
  3. .dbf — attribute format; one-to-one relationship between geometry and attributes is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file.
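
The fixed-length layout of the index file makes it easy to inspect by hand. The following is a minimal sketch, not a full reader. Beyond the 100-byte header and 8-byte records described above, it assumes details taken from the Esri whitepaper listed under Reference: a big-endian file code of 9994 at byte 0, a little-endian shape type at byte 32, and index records made of two big-endian 32-bit integers (offset and content length), both counted in 16-bit words. The function name read_shx and the sample data are hypothetical.

```python
import struct

HEADER_SIZE = 100  # bytes; header length stated above
RECORD_SIZE = 8    # bytes; one fixed-length index record


def read_shx(data: bytes):
    """Parse raw .shx bytes into (shape_type, [(byte_offset, byte_length), ...])."""
    if len(data) < HEADER_SIZE or (len(data) - HEADER_SIZE) % RECORD_SIZE:
        raise ValueError("expected a 100-byte header plus whole 8-byte records")
    # Assumption from the Esri spec: big-endian file code 9994 at byte 0.
    (file_code,) = struct.unpack_from(">i", data, 0)
    if file_code != 9994:
        raise ValueError("unexpected file code: %d" % file_code)
    # Assumption from the Esri spec: little-endian shape type at byte 32.
    (shape_type,) = struct.unpack_from("<i", data, 32)
    records = []
    for pos in range(HEADER_SIZE, len(data), RECORD_SIZE):
        # Each record holds an offset and a content length in 16-bit words.
        offset_words, length_words = struct.unpack_from(">ii", data, pos)
        records.append((offset_words * 2, length_words * 2))
    return shape_type, records


# Hypothetical in-memory index describing two point records.
header = bytearray(HEADER_SIZE)
struct.pack_into(">i", header, 0, 9994)
struct.pack_into("<i", header, 32, 1)  # 1 = Point in the Esri spec
sample = bytes(header) + struct.pack(">ii", 50, 10) + struct.pack(">ii", 64, 10)
shape_type, records = read_shx(sample)
print(shape_type, records)
```

Because the index records are fixed-length, the number of features is simply (file size - 100) / 8, and the n-th geometry record can be located in the main file without scanning it. The same record number also identifies the matching row in the .dbf file.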

Optional files:

  1. .prj — provides the coordinate system and projection information; a plain text file describing the projection in well-known text (WKT) format
  2. .sbn and .sbx — a spatial index of the features
  3. .fbn and .fbx — a spatial index of the features for shapefiles that are read-only
  4. .ain and .aih — an attribute index of the active fields in a table
  5. .ixs — a geocoding index for read-write shapefiles
  6. .mxs — a geocoding index for read-write shapefiles (ODB format)
  7. .atx — an attribute index for the .dbf file in the form of shapefile.columnname.atx (ArcGIS 8 and later)
  8. .shp.xml — contains geospatial metadata in XML format
  9. .cpg — used to specify the code page (only for .dbf) for identifying the character encoding to be used
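
Since a shapefile is really a set of sibling files, a quick completeness check before processing can save trouble. The sketch below is only an illustration: the function name sidecar_report is hypothetical, it assumes lowercase extensions, and it skips .atx files because their names include a column name.

```python
from pathlib import Path

MANDATORY = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".sbn", ".sbx", ".fbn", ".fbx", ".ain", ".aih",
            ".ixs", ".mxs", ".shp.xml", ".cpg")


def sidecar_report(shp_path):
    """Return (missing_mandatory, present_optional) for the given .shp path.

    Assumes all component files sit in the same folder and share the
    base name of the .shp file, with lowercase extensions.
    """
    base = Path(shp_path).with_suffix("")  # e.g. data/roads.shp -> data/roads

    def exists(ext):
        return base.with_name(base.name + ext).is_file()

    missing = [ext for ext in MANDATORY if not exists(ext)]
    present = [ext for ext in OPTIONAL if exists(ext)]
    return missing, present
```

For a hypothetical folder holding only roads.shp, roads.shx and roads.prj, calling sidecar_report on roads.shp would report .dbf as missing and .prj as the only optional file present.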

Merits of Shapefile

  • Shapefiles are simple because they store only the primitive geometric data types: points, lines, and polygons. A table of records stores the properties/attributes of each primitive shape. Together, the shapes (points/lines/polygons) and their attributes can represent a wide variety of geographic data.
  • Many non-Esri applications can view, use, and output instances of the Shapefile format, although the instances they output can easily be corrupted and may not be properly formatted.
  • They have advantages over other data sources, such as faster drawing speed and editability.
  • Shapefiles handle single features that overlap or that are noncontiguous.
  • They usually require less disk space and are easier to read and write.

Limitations of Shapefile

  • Shapefiles cannot store topological information. ArcInfo coverages and personal/file/enterprise geodatabases can store feature topology.
  • The edges of a polyline or polygon are composed of points. The spacing of the points implicitly determines the scale at which the feature is useful visually. Exceeding that scale results in jagged representation. Additional points would be required to achieve smooth shapes at greater scales. For features better represented by smooth curves, the polygon representation requires much more data storage than, for example, splines, which can capture smoothly varying shapes efficiently. None of the shapefile types supports splines.
  • Neither the .shp nor the .dbf component file can exceed 2 GB (2^31 bytes), which allows around 70 million point features at best.
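  • A shapefile is physically capable of storing a mixture of different shape types, because each record carries its own shape type. The specification, however, requires all non-null shapes in a shapefile to be of the same type.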
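
The 2 GB ceiling can be turned into a rough feature count with simple arithmetic. The sketch below is a back-of-the-envelope estimate. It assumes the record layout from the Esri whitepaper: a 100-byte file header, an 8-byte header per record, and 20 bytes of content for a 2D point (a 4-byte shape type plus two 8-byte doubles). The function name max_point_records is hypothetical.

```python
MAX_FILE_BYTES = 2**31        # the 2 GB limit quoted above
FILE_HEADER_BYTES = 100       # assumed: fixed .shp file header
RECORD_HEADER_BYTES = 8       # assumed: record number + content length
POINT_CONTENT_BYTES = 4 + 8 + 8  # assumed: shape type + X + Y


def max_point_records(limit=MAX_FILE_BYTES):
    """Upper bound on 2D point records that fit in one .shp file."""
    per_record = RECORD_HEADER_BYTES + POINT_CONTENT_BYTES
    return (limit - FILE_HEADER_BYTES) // per_record


print(max_point_records())
```

Under these assumptions the bound comes out at about 76.7 million points, the same order of magnitude as the figure quoted in the list. Lines and polygons store a variable number of vertices per record, so their limits are lower and depend on the data.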

Reference:

http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf

http://www.digitalpreservation.gov/formats/fdd/fdd000280.shtml

 Note: The above content was collected from various resources, with the single objective of providing comprehensive knowledge in a single article.

About Author

GIS Resources

GIS Resources is an initiative of Spatial Media and Services Enterprises with the purpose that everyone can enrich their knowledge and develop competitiveness. GIS Resources is a global platform, for latest and high-quality information source for the geospatial industry, brings you the latest insights into the developments in geospatial science and technology.
