General feature format
In bioinformatics, the general feature format is a file format used for describing genes and other features of DNA, RNA and protein sequences.
GFF Versions
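The following versions of GFF exist:
- General Feature Format Version 2 (GFF2), generally deprecated
  - Gene Transfer Format (GTF), a derivative used by Ensembl
- Generic Feature Format Version 3 (GFF3)
  - Genome Variation Format (GVF), with additional pragmas and attributes for sequence_alteration features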
GFF general structure
All GFF formats are tab-delimited with 9 fields per line. They all share the same structure for the first 7 fields, while differing in the content and format of the ninth field. The general structure is as follows:
Position index | Position name | Description |
1 | sequence | The name of the sequence where the feature is located. |
2 | source | Keyword identifying the source of the feature, like a program or an organization. |
3 | feature | The feature type name, like "gene" or "exon". In a well-structured GFF file, all child features always follow their parents in a single block. In GFF3, all features and their relationships should be compatible with the standards released by the Sequence Ontology Project. |
4 | start | Genomic start of the feature, with a 1-base offset (the first base of the sequence is position 1). This is in contrast with 0-offset half-open sequence formats, like BED. |
5 | end | Genomic end of the feature, with a 1-base offset; the end position is included in the feature. It therefore has the same numeric value as the end coordinate in 0-offset half-open sequence formats, like BED. |
6 | score | Numeric value that generally indicates the confidence of the source in the annotated feature. A value of "." is used to define a null value. |
7 | strand | Single character that indicates the strand of the feature; it can assume the values "+", "-" or ".". |
8 | phase | Phase of CDS features; it can be one of 0, 1, 2 or ".". See the section below for a detailed explanation. |
9 | attributes | All the other information pertaining to this feature. The format, structure and content of this field vary the most among the three competing file formats. |
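As a minimal sketch of the structure in the table above, the following Python splits one feature line into its nine fields and converts its coordinates to the 0-offset half-open convention used by BED. The names `GffFeature`, `parse_gff_line` and `to_bed_interval`, as well as the sample line, are illustrative assumptions and do not come from any GFF library; the attributes field is left unparsed because its syntax differs between versions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GffFeature:
    """One GFF feature line; field names follow the table above."""
    sequence: str
    source: str
    feature: str
    start: int              # 1-based, inclusive
    end: int                # 1-based, inclusive
    score: Optional[float]  # None when the file contains "."
    strand: str
    phase: Optional[int]    # None when the file contains "."
    attributes: str         # kept as raw text (version-specific syntax)


def parse_gff_line(line: str) -> GffFeature:
    """Split one tab-delimited feature line into its 9 fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seq, source, feature, start, end, score, strand, phase, attrs = fields
    return GffFeature(
        sequence=seq,
        source=source,
        feature=feature,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        phase=None if phase == "." else int(phase),
        attributes=attrs,
    )


def to_bed_interval(f: GffFeature) -> Tuple[int, int]:
    """Convert 1-based inclusive coordinates to a 0-based half-open interval."""
    return f.start - 1, f.end


# Hypothetical sample line, made up for this illustration.
line = "chr1\texample\texon\t101\t200\t.\t+\t.\tID=exon1"
feat = parse_gff_line(line)
print(feat.feature, feat.start, feat.end)
print(to_bed_interval(feat))
```

Only the start coordinate changes in the conversion: the inclusive 1-based end and the exclusive 0-based end are the same number.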
The 8th field: phase of CDS features
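Simply put, CDS means "CoDing Sequence". The exact meaning of the term is defined by Sequence Ontology. According to the GFF3 specification, the phase of a CDS feature indicates where the next codon begins relative to the 5' end of that feature: it is the number of bases (0, 1 or 2) that must be removed from the beginning of the feature to reach the first base of the next codon.
Meta Directives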
In GFF files, additional meta information can be included on lines that begin with the ## directive marker. This meta information can specify the GFF version, a sequence region, or the species.
GFF software
Servers
Servers that generate this format:
Clients
Clients that use this format:
Name | Description | Links |
GBrowse | GMOD genome viewer | |
IGB | Integrated Genome Browser | Integrated Genome Browser |
Jalview | A multiple sequence alignment editor & viewer | Jalview |
STRAP | Underlines sequence features in multiple alignments. | |
JBrowse | JBrowse is a fast, embeddable genome browser built completely with JavaScript and HTML5 | |
ZENBU | A collaborative, omics data integration and interactive visualization system | |
Validation
The modENCODE project hosts an online GFF3 validation tool with generous limits of 286.10 MB and 15 million lines.
The Genome Tools software collection contains a gff3validator tool that can be used offline to validate and possibly tidy GFF3 files. An online validation service is also available.
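To give a rough idea of what structural validation involves, the sketch below checks only the rules stated in this article: nine tab-delimited fields, a 1-based start no greater than the end, and strand and phase values from the allowed sets. It is not a substitute for the validators mentioned above and makes no attempt to reproduce their behaviour; the function name `check_gff_lines` and the sample data are assumptions made for illustration.

```python
import io

VALID_STRANDS = {"+", "-", "."}      # values listed in the field table
VALID_PHASES = {"0", "1", "2", "."}  # values listed in the field table


def check_gff_lines(lines):
    """Return a list of (line_number, message) for structural problems."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Blank lines and lines starting with "#" (including "##"
        # directives) carry no feature, so they are skipped.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append((number, f"expected 9 fields, found {len(fields)}"))
            continue
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            problems.append((number, "start and end must be integers"))
        else:
            if start < 1 or start > end:
                problems.append((number, "start must be >= 1 and <= end"))
        if fields[6] not in VALID_STRANDS:
            problems.append((number, f"unexpected strand {fields[6]!r}"))
        if fields[7] not in VALID_PHASES:
            problems.append((number, f"unexpected phase {fields[7]!r}"))
    return problems


# Hypothetical input: one directive, one valid feature, two broken lines.
sample = io.StringIO(
    "##gff-version 3\n"
    "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "chr1\texample\tCDS\t500\t400\t.\t+\t0\tID=cds1\n"
    "chr1\texample\texon\t100\t200\t.\t+\n"
)
for number, message in check_gff_lines(sample):
    print(f"line {number}: {message}")
```

Checks that depend on the format version, such as the syntax of the attributes field or parent and child relationships, are deliberately left out of this sketch.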