AMA Manual of Style - Stacy L. Christiansen, Cheryl Iverson 2020
Genetics
Nomenclature
14.6.1 Nucleic Acids and Amino Acids.
Standards for molecular nomenclature are set jointly by the International Union of Biochemistry and Molecular Biology (IUBMB) and the International Union of Pure and Applied Chemistry (IUPAC).1 The recommendations in this section are based on conventions put forth by the IUBMB-IUPAC Joint Commission on Biochemical Nomenclature and the Nomenclature Committee of the IUBMB.2
14.6.1.1 DNA.
The nucleic acids DNA and RNA are nucleotide polymers. DNA is the molecule forming the substrate for the genetic code and is contained in the chromosomes of higher organisms. It is made up of (1) molecules called bases, (2) the sugar 2-deoxyribose, and (3) phosphate groups. The DNA bases fall into 2 classes: pyrimidines (including cytosine and thymine) and purines (including adenine and guanine).
Structurally, DNA in the nucleus of a living cell is a double-stranded, antiparallel helical polymer of deoxyribose linked by phosphate groups; 1 of 4 bases projects from each sugar molecule of the sugar-phosphate chain. A base-sugar unit is a nucleoside. A base-sugar-phosphate unit is a nucleotide (Figure 14.6-1).
Figure 14.6-1. Nucleosides and Nucleotides: General Structure
The carbons in the sugar moiety are numbered with prime symbols, not apostrophes (eg, 3′ carbon, 5′ carbon). Sometimes chemical moieties are specified in connection with the 3′ and 5′ ends:
3′ hydroxyl end (3′ OH end)
5′ phosphate (5′ P) end
5′ OH end
(See 13.13, Elements and Chemicals.)
The phosphates that join the DNA nucleotides link the 3′ carbon of one deoxyribose to the 5′ carbon of the next deoxyribose. The end of the DNA strand with an unattached 5′ carbon is known as the 5′ end (or terminal) and the end with an unattached 3′ carbon as the 3′ end (or terminal) (Figure 14.6-2) (see 13.13, Elements and Chemicals).
Figure 14.6-2. DNA Double Helix
The carbons and nitrogens of the bases are numbered 1 through 6 (pyrimidines) or 1 through 9 (purines), and the carbons of deoxyribose are designated by numbers with prime symbols 1′ through 5′ (Figure 14.6-3).
This section presents nomenclature for nucleotides of DNA, especially nomenclature used for DNA sequences (ie, nucleotide polymers). For nomenclature of nucleotides as DNA precursors and energy molecules, see 14.6.1.3, Nucleotides as Precursors and Energy Molecules.
A 1-letter designation represents each base, nucleoside, or nucleotide (Table 14.6-1). The letters are commonly used without expansion.
Table 14.6-1. Abbreviations for DNA Nucleotides
Abbreviation | Base | Nucleoside; nucleotideᵃ residue in DNA | Molecular class
A | adenine | deoxyadenosine | purine
C | cytosine | deoxycytidine | pyrimidine
G | guanine | deoxyguanosine | purine
T | thymine | deoxythymidine | pyrimidine
a The technical name for nucleotides is nucleoside phosphates.
The chemical structure of bases is illustrated in Figure 14.6-3.
Figure 14.6-3. DNA Bases: Chemical Structure
When a base (or nucleoside or nucleotide) is described that cannot be firmly identified as A, C, G, or T, it is most commonly reported as N (uncertain). Other single-letter designators that reflect biochemical properties may be used, but because these designations are not as well known as A, C, G, T, and N, it is best to define them (Table 14.6-2).
Table 14.6-2. Examples of Other Single-Letter Designators for Basesᵃ
Symbol | Stands for | Derivation
R | G or A | purine
Y | T or C | pyrimidine
M | A or C | amino
K | G or T | keto
S | G or C | strong interaction (3 hydrogen bonds)
W | A or T | weak interaction (2 hydrogen bonds)
H | A or C or T | not G (H follows G in the alphabet)
B | G or T or C | not A (B follows A in the alphabet)
V | G or C or A | not T (V follows T in the alphabet; U is not used because it stands for uracil in RNA [see 14.6.1.2, RNA])
D | G or A or T | not C (D follows C in the alphabet)
N | G or A or T or C | any
a Adapted with permission from Moss.2 Copyright IUBMB.
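The designators in Table 14.6-2 can be treated as a lookup from each symbol to the bases it stands for. The following is a minimal sketch in Python; the names IUPAC_CODES and matches are illustrative assumptions and are not part of the IUBMB recommendations.

```python
# Single-letter designators from Table 14.6-2, mapped to the bases each stands for.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT",
    "S": "GC", "W": "AT", "H": "ACT", "B": "GTC",
    "V": "GCA", "D": "GAT", "N": "GATC",
}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if each base of `sequence` is allowed by the designator
    at the same position in `pattern`."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC_CODES[code]
        for code, base in zip(pattern.upper(), sequence.upper())
    )


print(matches("GNCGANNG", "GACGATTG"))  # True
print(matches("CGWCG", "CGGCG"))        # False: W stands for A or T only
```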
Various forms of DNA are commonly abbreviated as follows; expand at first use:
bDNA | branched DNA
cDNA | complementary DNA, coding DNA
dsDNA | double-stranded DNA
gDNA | genomic DNA
hn-cDNA | heteronuclear cDNA (heterogeneous nuclear cDNA)
mtDNA | mitochondrial DNA
nDNA | nuclear DNA
rDNA | ribosomal DNA
scDNA | single-copy DNA
ssDNA | single-stranded DNA
There are several classes of DNA helixes, which differ in the direction of rotation and the tightness of the spiral (number of base pairs per turn):
A-DNA (alternate DNA)
B-DNA (balanced DNA)
C-DNA (9 base pairs [bp] per turn of spiral)
D-DNA (8 base pairs [bp] per turn of spiral)
Z-DNA (zigzag)
In eukaryotic cells, DNA is bound with special proteins associated with chromosomes (see 14.6.4, Human Chromosomes). This DNA-protein complex is known as chromatin. DNA in chromatin is organized into structures called nucleosomes by proteins known as histones. The 5 classes of histones are as follows:
H1 (linker histone)
H2A (core histone)
H2B (core histone)
H3 (core histone)
H4 (core histone)
Almost all native DNA exists in the form of a double helix, in which 2 DNA polymers are paired, linked by hydrogen bonds between individual bases on each chain. Because of the biochemical structure of the nucleotides, A always pairs with T and C with G (Figure 14.6-2). Such pairs may be indicated as follows:
A • T, A = T
C • G, C ≡ G
Mispairings (which may occur as a consequence of a variant or sequence variation) may be shown in the same way:
C • T
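The pairing notation above can be generated from the number of hydrogen bonds in each pair (2 for A and T, 3 for C and G). This is a sketch only; the function name pair_notation and the "(mispairing)" label are assumptions made for illustration.

```python
# Watson-Crick pairs and the number of hydrogen bonds in each.
HYDROGEN_BONDS = {frozenset("AT"): 2, frozenset("CG"): 3}


def pair_notation(base1: str, base2: str) -> str:
    """Write a base pair with = (2 bonds) or ≡ (3 bonds); any other
    combination is written with a raised dot and flagged as a mispairing."""
    base1, base2 = base1.upper(), base2.upper()
    bonds = HYDROGEN_BONDS.get(frozenset((base1, base2)))
    if bonds == 2:
        return f"{base1} = {base2}"
    if bonds == 3:
        return f"{base1} ≡ {base2}"
    return f"{base1} • {base2} (mispairing)"


print(pair_notation("A", "T"))  # A = T
print(pair_notation("C", "G"))  # C ≡ G
print(pair_notation("C", "T"))  # C • T (mispairing)
```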
Unpaired DNA sequences are quantified by means of the terms base (a single base), kilobase (kb, a thousand bases), and megabase (Mb, a million bases) (see 13.12, Units of Measure). Paired DNA sequences use the terms base pairs (bp), kilobase pairs (kb), megabase pairs (Mb), and gigabase pairs (Gb). (Do not use “kbp” or “Mbp.”) For example:
a 20-base fragment
a 235-bp repeat sequence
a 27-bp region
a 47-kb vector genome
1 Mb of DNA
The size of the human haploid genome is approximately 3 × 10⁹ bp.
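The unit conventions above (bp, kb, Mb, and Gb, never "kbp" or "Mbp") can be applied programmatically. The sketch below assumes that a length is promoted to the largest unit it reaches; that threshold rule and the function name format_length are assumptions, not part of this style.

```python
def format_length(n: int, paired: bool = True) -> str:
    """Format a sequence length. Lengths of 1000 or more use kb, Mb, or Gb
    for both paired and unpaired sequences; "kbp" and "Mbp" are never used."""
    for factor, unit in ((10**9, "Gb"), (10**6, "Mb"), (10**3, "kb")):
        if n >= factor:
            return f"{n / factor:g} {unit}"
    if paired:
        return f"{n} bp"
    return f"{n} base" if n == 1 else f"{n} bases"


print(format_length(235))                # 235 bp
print(format_length(20, paired=False))   # 20 bases
print(format_length(47_000))             # 47 kb
print(format_length(3 * 10**9))          # 3 Gb
```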
Sometimes the number of nucleotides in a DNA molecule is indicated using the suffix “mer”:
20mer (20 nucleotides)
24mer (24 nucleotides)
(This formation is based on the terms dimer, trimer, tetramer, etc.) It is sometimes referred to as kmer or k-mer (eg, a kmer of length 20 rather than 20mer).
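As an illustration of the k-mer concept, the following sketch lists every overlapping subsequence of length k in a sequence. The function name kmers is an assumption chosen for this example.

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Return all overlapping subsequences of length k, in 5′-to-3′ order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(kmers("GGATCC", 3))  # ['GGA', 'GAT', 'ATC', 'TCC']
```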
A DNA sequence might be depicted as follows, with standard notation of DNA sequences from 5′ to 3′:
GGATCC means 5′ GGATCC 3′
Unknown bases may be depicted by using N (see Table 14.6-2):
GNCGANNG
Instead of N, a lowercase n or a hyphen may be used for visual clarity:
GnCGAnnG
or
G-CGA--G
A double-stranded sequence that consists of a strand of DNA and its complement would be as follows:
To show correct pairing between the bases in the 2 strands, sequences need to be aligned properly. In the sequence above, the first base pair is G • C, the next is T • A, etc. Note how the first G is directly above the first C, the first T above the first A, etc.
By convention in printed sequences, for single strands, the 5′ end is at the left and the 3′ end at the right; thus, a sequence such as the following
CCCATCTCACTTAGCTCCAATG
would be assumed to have this directionality:
5′-CCCATCTCACTTAGCTCCAATG-3′
The complementary strands of DNA have opposite directionality; by convention, the top strand reads from the 5′ end to the 3′ end, whereas its complementary strand appears below it with the 3′ end on the left. The 5′ strand is the sense strand or coding strand or positive strand. The 3′ strand is the antisense strand or noncoding strand. (Note that it is the noncoding strand that actually gets transcribed.) In the example
this directionality is implied:
Text should specify which strand, sense or antisense, is displayed. The sense strand “is the strand generally reported in the scientific literature or in databases.”3(p25)
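The strand conventions described here can be sketched in code: the complement is written base for base beneath the displayed strand, and reading that complement from its own 5′ end gives the reverse complement. The function names are assumptions made for illustration, and the sketch handles only A, C, G, and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Base-by-base complement, in the same left-to-right order."""
    return strand.translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """The complementary strand read from its own 5′ end."""
    return complement(strand)[::-1]


def show_duplex(top_strand: str) -> str:
    """Display a strand 5′ to 3′ with its complement aligned below, 3′ to 5′."""
    return f"5′-{top_strand}-3′\n3′-{complement(top_strand)}-5′"


print(show_duplex("GGATCC"))
# 5′-GGATCC-3′
# 3′-CCTAGG-5′
```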
A codon is a sequence of 3 nucleotides in a DNA molecule that (ultimately) codes for an amino acid (see below), biosynthetic message, or signal (eg, start transcription, stop transcription). Codons are also referred to as codon triplets. Examples are as follows:
CAT ATC ATT
The genetic code—typically a list or table of all the codons and the amino acids they each encode—is widely reproduced (eg, in medical dictionaries and textbooks and on the internet).
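A compact way to hold the standard genetic code is to pair the 64 codons, generated in T, C, A, G order, with a 64-letter string of single-letter amino acid codes (an asterisk marks a stop codon). The sketch below assumes the standard code and sense-strand DNA codons; the names CODON_TABLE, codons, and translate are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Single-letter amino acid codes for the standard genetic code, in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(sequence: str) -> list[str]:
    """Split a DNA sequence into codon triplets, ignoring spaces and any
    incomplete triplet at the 3′ end."""
    compact = sequence.replace(" ", "").upper()
    usable = len(compact) - len(compact) % 3
    return [compact[i:i + 3] for i in range(0, usable, 3)]


def translate(sequence: str) -> str:
    """Translate sense-strand DNA codons to single-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in codons(sequence))


print(codons("CAT ATC ATT"))     # ['CAT', 'ATC', 'ATT']
print(translate("CAT ATC ATT"))  # HII
```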
Promoter sequences are DNA sequences that define where the transcription of a gene by RNA polymerase begins. They include the following:
CAT box (CCAAT)
CG island, CpG island (CG-rich sequence)
GC box (GGGCGGG consensus sequence)
5′ UTR (5′ untranslated region) (5′ is defined below)
TATA box
Enhancers are short (50- to 1500-bp) regions of DNA that can be bound by proteins (activators) to increase the likelihood that transcription of a particular gene will occur. The κ light chain enhancer (κ enhancer), for example, contains the sequence GGGACTTTCC.
Sequences of repeating single nucleotides are named as follows:
polyA
polyC
polyG
polyT
Example: polyA tail
or, optionally, with lowercase d (within parentheses) for deoxyribose:
poly(dT)
Repeating single-nucleotide pairs (in double-stranded DNA) are similarly named:
poly(dA-dT)
poly(dG-dC)
The phosphate groups linking the nucleotides are sometimes indicated with a lowercase p:
pGpApApTpTpC
CpG island
Methylated bases may be shown with a superscript lowercase m, which refers to the nucleotide residue to the right:
GATᵐCC
Sequences of repeating nucleotides, also known as tandem repeats, are indicated as follows (n stands for number of repeats):
(TTAGGG)n
(GT)n
(CGG)n
Within a long sequence, the first repeat may be designated n, the next p, the next q, etc:
(TAGA)nATGGATAGATTA(GATG)pAA(TAGA)q
The number of repeats may be specified:
(GATG)2
(TAGA)12
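When the number of repeats is specified, the notation can be expanded mechanically into the full sequence. The following sketch uses a regular expression; the function name expand_repeats is an assumption, and repeats with a symbolic count (eg, n) are deliberately left unchanged.

```python
import re

# A parenthesized repeat unit followed by a numeric repeat count, eg, (GATG)2.
REPEAT = re.compile(r"\(([ACGT]+)\)(\d+)")


def expand_repeats(notation: str) -> str:
    """Expand every repeat with a numeric count into the full sequence."""
    return REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), notation)


print(expand_repeats("(GATG)2"))           # GATGGATG
print(expand_repeats("(GATG)2AA(TAGA)3"))  # GATGGATGAATAGATAGATAGA
print(expand_repeats("(GT)n"))             # (GT)n
```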
Long sequences pose special typesetting problems. Such sequences should be depicted as separate figures, rather than within text or tables, whenever possible.
For DNA, it must be made clear whether the sequence is single-stranded or double-stranded. A double-stranded sequence such as that of the following example
might be mistaken for a single-stranded sequence and set as such:
Conversely, mistaking a single-stranded sequence for a double-stranded sequence and typesetting accordingly should also be avoided.
Always maintain alignment in 2-stranded sequences—take care that the following
does not become this:
Numbering and spacing may be used as visual aids in presenting sequences. A space every 3 bases indicates the codon triplets:
. . . GCA GAG GAC CTG CAG GTG GGG . . .
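Spacing of this kind can be produced with a short helper. In this sketch, the function name group_bases and its default group size of 3 are assumptions; a size of 5 or 10 gives the spacing customary in longer sequences.

```python
def group_bases(sequence: str, size: int = 3) -> str:
    """Insert a space after every `size` bases, discarding existing spaces."""
    compact = "".join(sequence.split())
    return " ".join(compact[i:i + size] for i in range(0, len(compact), size))


print(group_bases("GCAGAGGACCTGCAGGTGGGG"))     # GCA GAG GAC CTG CAG GTG GGG
print(group_bases("GCAGAGGACCTGCAGGTGGGG", 5))  # GCAGA GGACC TGCAG GTGGG G
```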
DNA sequences for protein-coding regions in most eukaryotic cells contain both exons (coding sequences of triplets) and introns (intervening noncoding sequences). An intron occurs within the sequence (examples from Cooper4[p273]):
intron: GTGAG . . . GGCAG
sequence in preceding example with intron included:
. . . GCA GAG GAC CTG CAG G GTGAG . . . GGCAG TG GGG . . .
Another way to display introns amid exons is to use lowercase letters for introns and uppercase letters for exons. There is a space on either side of the intron, and the next exon continues in the same frame or phase as before, to resume the correct codon sequence:
In longer DNA sequences, spaces every 5 or 10 bases are customary visual aids:
Several types of numbering are further aids. In the following example (from Cooper,4(p133); “lowercase letters indicate uncertainty in the base call”), numbers on the left specify the number of the first base on that line:
Alternatively, numbers may appear above bases of special interest:
When the base number is large, the right-most digit should be directly over the base being designated:
When a long sequence is run within text, use a hyphen at the right-hand end of the line to indicate the bond linking successive nucleotides:
A hyphen is not necessary if spacing is used, as long as the break between groups occurs at the end of the line. The DNA sequence may be displayed as follows:
Recognition sequences are sections of a sequence recognized by proteins such as restriction enzymes, which cleave DNA in specific locations (see 14.6.1.4, Nucleic Acid Technology). To indicate sites of cleavage, virgules or carets may be used:
single-stranded:
C^TCGTG
C/TCGTG
double-stranded:
CGWCG^
^GCWGC
CGWCG/
/GCWGC
Other conventions should be defined, in parentheses for text or in legends for tables and figures.5 For example:
CACNN↓NNGTG (↓ indicates cleavage at identical position in both strands)
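A single-stranded recognition sequence written with a caret or virgule can be split into the sequence itself and the position of the cut. The sketch below is illustrative only: the function names are assumptions, it treats the recognition sequence as literal text (ambiguity designators such as W or N are not interpreted), and it does not model double-stranded cleavage.

```python
def parse_cleavage(site: str) -> tuple[str, int]:
    """Split a site such as 'C^TCGTG' or 'C/TCGTG' into the recognition
    sequence and the number of bases to the left of the cut."""
    for marker in "^/":
        if marker in site:
            left, right = site.split(marker, 1)
            return left + right, len(left)
    raise ValueError(f"no cleavage marker in {site!r}")


def fragments(dna: str, site: str) -> list[str]:
    """Cut `dna` at each literal occurrence of the recognition sequence."""
    recognition, offset = parse_cleavage(site)
    pieces, start = [], 0
    position = dna.find(recognition)
    while position != -1:
        pieces.append(dna[start:position + offset])
        start = position + offset
        position = dna.find(recognition, position + 1)
    pieces.append(dna[start:])
    return pieces


print(parse_cleavage("C^TCGTG"))           # ('CTCGTG', 1)
print(fragments("AACTCGTGAA", "C/TCGTG"))  # ['AAC', 'TCGTGAA']
```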
14.6.1.1.1 Sequence Variations, Nucleotides.
Recommendations for sequence variation (mutation) nomenclature have been one of the major activities of the HUGO Mutation Database Initiative, now the Human Genome Variation Society (HGVS).6 Members devised the nomenclature after extensive community discussion.7,8,9,10,11,12 Authors should consult the Recommendations page of the HGVS website for the latest recommendations,6 the HGVS Simple section of the HGVS website,13 and the 2016 update.14 Basic style points are as follows (see 14.6.1.5.1, Sequence Variations, Amino Acids):
■ For sequence variations described at the nucleotide level, the nucleotide number precedes the capital-letter nucleotide abbreviation.
12345A>T
■ Numbers at the end of the term, if any, do not stand for the nucleotide number but rather indicate numbers of nucleotides involved in the change or, in the case of repeated sequences, numbers of repeats.
c.54GCA[21] [an allele of 21 GCA repeats]
■ The nucleotide number should be preceded by g plus dot (g.) for gDNA (genomic), c plus dot (c.) for cDNA (complementary or coding), n plus dot (n.) for noncoding, r plus dot (r.) for RNA, m plus dot (m.) for mitochondrial, or p plus dot (p.) for protein.
■ The symbol > is used for substitutions. The following abbreviations are used: ins, insertion; del, deletion; indel, insertion and deletion; dup, duplication; inv, inversion; con, conversion; and t, translocation. An underscore character separates a range of affected nucleotide residues.
c.4375C>T [C nucleotide at position 4375 changed to a T]
c.4375_4379del [nucleotides from positions 4375 to 4379 (CGATT) are missing (deleted)]
■ One set of brackets is used for 2 variations in a single allele, and 2 sets with a semicolon are used for 2 variations in paired alleles.
[76C>T;283G>C] [2 variants on 1 molecule]
[76C>T];[283G>C] [the same 2 variants on 2 different molecules]
■ Nucleotide numbers may be positive or negative.
■ The HGVS recommends avoiding the terms mutation and polymorphism, preferring instead the terms sequence variant, sequence variation, alteration, or allelic variant. In view of this recommendation, single-nucleotide variation (SNV) is now more frequently being used instead of SNP (single-nucleotide polymorphism) and may become standard in the future. To aid readers’ understanding during this transition, at first mention SNV may be used, with SNP in parentheses:
“SNV (formerly SNP)”
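Several of the basic style points above (optional prefix, nucleotide number before the nucleotide letter, > for substitutions, underscore for ranges) can be checked with a regular expression. This is a deliberately small sketch, not an implementation of the HGVS recommendations: it accepts only simple substitutions and unqualified del, dup, and inv terms, and the names VARIANT and parse_variant are assumptions.

```python
import re

VARIANT = re.compile(
    r"^(?:(?P<prefix>[gcnm])\.)?"              # optional reference-type prefix
    r"(?P<start>-?\d+)(?:_(?P<end>-?\d+))?"    # position, or range joined by _
    r"(?:(?P<ref>[ACGT])>(?P<alt>[ACGT])"      # substitution, eg, C>T
    r"|(?P<kind>del|dup|inv))$"                # or a simple del, dup, or inv
)


def parse_variant(term: str) -> dict:
    """Break a simple nucleotide-level variant term into its parts."""
    match = VARIANT.match(term)
    if match is None:
        raise ValueError(f"not a term this sketch understands: {term!r}")
    parts = match.groupdict()
    return {
        "prefix": parts["prefix"],
        "start": int(parts["start"]),
        "end": int(parts["end"]) if parts["end"] else None,
        "kind": parts["kind"] or "substitution",
        "ref": parts["ref"],
        "alt": parts["alt"],
    }


print(parse_variant("c.4375C>T"))
print(parse_variant("c.4375_4379del"))
```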
Note the examples in Table 14.6-3. In general medical publications, textual explanations should accompany the shorthand terms at first mention.
Table 14.6-3. Examples of Sequence Variation Nomenclature
Term | Explanation
1691G>A | G-to-A substitution at nucleotide 1691
253Y>N | pyrimidine at position 253 replaced by another base
[76A>C;83G>C] | 2 substitutions in single allele (Note: Variations in same allele are indicated by brackets.)
[76A>C] + [87delG] | substitution and deletion in paired alleles
[76A>C (+) 83G>C] | 2 sequence changes in 1 individual, alleles unknown
977_978insA | A inserted between nucleotides 977 and 978
186_187insC | C inserted between nucleotides 186 and 187
926_927insAAAAAAAAAAA | insertion of 11 A’s between nucleotides 926 and 927
185_186delAG | deletion of A and G between nucleotides 185 and 186
617_618delT | deletion of T between nucleotides 617 and 618
188_199del11 | 11-bp deletion between nucleotides 188 and 199
1294_1334del40 | 40-bp deletion between nucleotides 1294 and 1334
c.5delA | A deleted at position 5 (cDNA)
c.5_7delAGG | AGG deleted at positions 5 through 7 (cDNA)
g.5_123del | nucleotides deleted from positions 5 through 123 (gDNA)
g.7dup | duplication of a T at position g.7 in the sequence ACTTACTGCC to ACTTACTTGCC
1007fs | frameshift mutation at codon 1007
112_117delinsTG or 112_117delAGGTCAinsTG or 112_117>TG | deletion from nucleotide 112 through 117 and insertion of TG
203_506inv or 203_506inv304 | 304 nucleotides inverted from positions 203 through 506
167(GT)6-22 | 6 to 22 GT repeats starting at position 167
g.167(GT)8 | 8 GT repeats starting at position 167 (gDNA)
c.827_XYZ:233del | Examples7 with hypothetical gene symbol XYZ incorporated (but not italicized) (see 14.6.2, Human Gene Nomenclature)
c.827_oXYZ:233del | o: opposite (antisenseᵃ) strand
Abbreviations: bp, base pair; cDNA, complementary or coding DNA; gDNA, genomic DNA.
a A DNA molecule consists of 2 strands; one is the sense strand and one is the antisense strand. The sense strand (also called coding strand, plus strand, or nontemplate strand) contains codons and is the same as mRNA except that thymine in DNA is replaced by uracil in RNA. The antisense strand (also called noncoding strand, minus strand, or template strand) contains noncodons and acts as a template for the synthesis of mRNA. Therefore, the antisense strand is complementary to the sense strand and mRNA.15
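For the rows of Table 14.6-3 that give a range counting both end positions (eg, 203_506inv, c.5_7delAGG, 112_117delAGGTCAinsTG), the number of nucleotides involved is the end position minus the start position plus 1. The sketch below shows that arithmetic; the function name range_length is an assumption. Rows such as 188_199del11 and 1294_1334del40, which the table describes as deletions between the 2 positions, do not follow this count.

```python
import re


def range_length(term: str) -> int:
    """Number of nucleotides in an underscore-separated range, counting
    both end positions."""
    match = re.search(r"(\d+)_(\d+)", term)
    if match is None:
        raise ValueError(f"no range found in {term!r}")
    start, end = map(int, match.groups())
    return end - start + 1


print(range_length("203_506inv"))   # 304
print(range_length("c.5_7delAGG"))  # 3
```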
When a gene symbol is used with a sequence variation term, only the gene symbol is italicized (see 14.6.2, Human Gene Nomenclature).
ADRB1 1165C>G (only the gene symbol ADRB1 is italicized; do not italicize the entire term)
Note: Sequence variants are often indicated by using virgules, but this is not recommended.12
Avoid: 1721G/A | Preferred: 1721G, 1721A
Avoid: 2417A/G | Preferred: 2417A>G
In practice, means other than the symbol > are commonly used to indicate substitutions. Of the following, the JAMA Network journals prefer the arrow:
1691G→A
1691G-A
1691GtoA
1691G-to-A
Any symbol for substitution is better than no symbol; otherwise the expression may be misinterpreted as indicating a dinucleotide at the site. For instance, 1691GA would imply a change involving the dinucleotide GA (1691G and 1692A).
When genotype is being expressed in terms of nucleotides (eg, sequence variants), italics and other punctuation for the nucleotides are not needed (see 14.6.2, Human Gene Nomenclature):
MTHFR 677 CC and TT genotypes
For nucleotide numbering of a cDNA reference sequence, nucleotide +1 is the A of the ATG initiator codon. The first nucleotide immediately 5′ (upstream) of the ATG initiator codon is −1. So for the sequence 5′ AGC CTG ATG GAC CTC 3′, the G immediately 5′ of the ATG is −1, and A is +1. The nucleotide 3′ of the translation stop codon is *1. For nucleotides in introns, those at the 5′ end of the intron are numbered with a “plus” relative to the last base of the immediately preceding exon, whereas those at the 3′ end are numbered with a “minus” relative to the first base of the immediate downstream exon. For example:
c.77+2T | cDNA, nucleotide 77 of preceding exon, position 2 in intron, T residue
c.78-1G | cDNA, nucleotide 78 of next exon, position 1 in intron, G residue
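The cDNA numbering rules can be sketched as follows: there is no position 0, so the A of the ATG is +1 and the base immediately upstream is −1, and intronic positions carry a signed offset from the nearest exonic nucleotide. The function names and the restriction to terms of the form c.77+2T are assumptions made for illustration.

```python
import re


def cdna_position(index: int, atg_index: int) -> int:
    """Convert a 0-based string index to cDNA numbering, in which the A of
    the ATG initiator codon is +1 and there is no position 0."""
    offset = index - atg_index
    return offset + 1 if offset >= 0 else offset


def parse_intronic(term: str) -> tuple[int, int, str]:
    """Split a term such as 'c.77+2T' into exonic nucleotide number,
    signed intronic offset, and residue."""
    match = re.fullmatch(r"c\.(\d+)([+-]\d+)([ACGT])", term)
    if match is None:
        raise ValueError(f"not an intronic cDNA term: {term!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3)


sequence = "AGCCTGATGGACCTC"
atg = sequence.find("ATG")
print(cdna_position(atg - 1, atg))  # -1 (the G immediately 5′ of the ATG)
print(cdna_position(atg, atg))      # 1 (the A of the ATG)
print(parse_intronic("c.77+2T"))    # (77, 2, 'T')
print(parse_intronic("c.78-1G"))    # (78, -1, 'G')
```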
Nucleotide numbering of a gDNA reference sequence is arbitrary (ie, there is no defined starting point as in cDNA). Therefore, authors should describe their numbering scheme. No plus signs or minus signs are used with gDNA reference sequences.
Listing both the official and the traditional names next to each other in the variant summary will help authors and readers become more familiar with the official (preferred) terms.
Preferred (Official): c.88+2T>G | Replaces (Traditional): IVS#+2T>G
Promoter variants (promoter sequence variants) have been commonly expressed with terms such as
−765G>A
which implies nucleotide numbering in terms of a cDNA reference sequence. However, authors are advised to instead (or additionally) describe the variant in relation to a gDNA reference sequence (see 14.6.1.1.2, Unique Identifiers).14
L01531.1:g.1561C>T
Terms with a capital delta have been used to indicate exonic deletions. For example:
∆ ex 1a-15
∆ ex 1a-12
∆ ex 3
14.6.1.1.2 Unique Identifiers.
Official recommendations include mentioning a sequence variant’s unique identifier, for instance, a number assigned by a locus-specific curator or the OMIM number.16 Allelic variants are designated by the 6-digit OMIM number, followed by a decimal point and a unique 4-digit variant number. The asterisk that precedes the number indicates that it is a gene (see 14.6.2.1, OMIM, for an explanation of OMIM numbering system and symbols). For a list of locus-specific database curators, see the HGVS website under Nomenclature for the Description of Sequence Variants.6 For example:
1311C>T (OMIM *305900.0018)
880C>T (OMIM *600681.0002)
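The identifier format described above (a 6-digit OMIM number, a decimal point, and a 4-digit variant number, optionally preceded by an asterisk) can be validated with a simple pattern. This sketch checks the format only, not whether the entry exists in OMIM; the names are assumptions.

```python
import re

# Optional asterisk, 6-digit OMIM number, decimal point, 4-digit variant number.
OMIM_VARIANT = re.compile(r"^\*?(\d{6})\.(\d{4})$")


def parse_omim_variant(identifier: str) -> tuple[str, str]:
    """Return the OMIM number and the allelic variant number as strings,
    preserving leading zeros."""
    match = OMIM_VARIANT.match(identifier)
    if match is None:
        raise ValueError(f"not an OMIM allelic variant number: {identifier!r}")
    return match.group(1), match.group(2)


print(parse_omim_variant("*305900.0018"))  # ('305900', '0018')
```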
14.6.1.1.3 Database Identifiers for Genomic Sequences.
Several databases record genomic sequence information:
Nucleotides:
GenBank (https://www.ncbi.nlm.nih.gov/genbank)
RefSeq (https://www.ncbi.nlm.nih.gov/refseq/)
EMBL (European Molecular Biology Laboratory) (https://www.embl.de)
DDBJ (DNA Data Bank of Japan) (https://www.ddbj.nig.ac.jp)
International HapMap Project (https://www.genome.gov/10001688/international-hapmap-project)
Proteins:
RCSB Protein Data Bank (https://www.rcsb.org/)
Protein database (https://www.ncbi.nlm.nih.gov/protein)
UniProt Knowledgebase (https://www.uniprot.org)
UniProtKB/Swiss-Prot (web.expasy.org/docs/swiss-prot_guideline.html)
PIR-PSD (Protein Information Resource: Protein Sequence Database) (https://proteininformationresource.org/pirwww/dbinfo/pir_psd.shtml)
For a review of databases in molecular biology, including several of the foregoing, see the 2018 Database Issue of the journal Nucleic Acids Research.17
Accession numbers are assigned when researchers submit unique sequences to any one of the databases. In published articles, accession numbers are useful in indicating specific sequences:
Founder effects were investigated using 2 previously undescribed, highly polymorphic microsatellite markers that flank presenilin 1. The first is a GT repeat at position 33117 (GenBank AF109907). The second is a CA repeat at position 23 000 of this same sequence.18
Accession numbers should include the version (eg, .1, .2) if possible6:
NM_000130.1
NM_000130.2
L01538.1
The following example shows variation expressed with the accession number7:
NM_004006.1:c.3G>T
For unambiguous identification, both version number and accession number should be used.6 Common formatting for nucleotide data was determined in 1988 by representatives of GenBank, EMBL (European Molecular Biology Laboratory), and DDBJ (DNA Data Bank of Japan), forming the International Nucleotide Sequence Database Collaboration.19
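A term such as NM_004006.1:c.3G>T combines an accession number, a version, and a variant description. The sketch below separates the 3 parts using only the punctuation shown in the examples above; the function name split_reference is an assumption, and no database lookup is performed.

```python
from typing import Optional


def split_reference(term: str) -> tuple[str, Optional[int], str]:
    """Split 'accession.version:variant' into its 3 parts. The version is
    None when absent; the variant is an empty string when absent."""
    reference, _, variant = term.partition(":")
    accession, dot, version = reference.rpartition(".")
    if not dot or not version.isdigit():
        return reference, None, variant
    return accession, int(version), variant


print(split_reference("NM_004006.1:c.3G>T"))  # ('NM_004006', 1, 'c.3G>T')
print(split_reference("L01538.1"))            # ('L01538', 1, '')
```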
14.6.1.2 RNA.
Functionally associated with DNA is RNA. It contains the 3 bases adenine (A), cytosine (C), and guanine (G) but differs from DNA in having the base uracil (U) instead of thymine (T) and the sugar ribose rather than deoxyribose. The corresponding nucleosides are adenosine, cytidine, guanosine, and uridine.
An example of an RNA sequence is as follows:
5′-UUAGCACGUGCUAA-3′
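Because RNA has uracil where DNA has thymine, the RNA form of a sense-strand DNA sequence is obtained by replacing each T with U. A minimal sketch (the function name dna_to_rna is an assumption):

```python
def dna_to_rna(sense_strand: str) -> str:
    """Write a sense-strand DNA sequence as RNA by replacing T with U."""
    return sense_strand.upper().replace("T", "U")


print(dna_to_rna("TTAGCACGTGCTAA"))  # UUAGCACGUGCUAA
```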
Examples of RNA codons are as follows:
CAU UUG AUU
Expand these common abbreviations at first use:
cRNA | complementary RNA
dsRNA | double-stranded RNA
gRNA | genomic RNA
hnRNA | heteronuclear RNA (heterogeneous RNA)
mRNA | messenger RNA
miRNA | microRNA
mtRNA | mitochondrial RNA
nRNA | nuclear RNA
RNAi | RNA interference
rRNA | ribosomal RNA
siRNA | short interfering RNA
snRNA | small nuclear RNA
tRNA | transfer RNA
Types of tRNA may be further specified; follow typographic style closely (these need not be expanded after the initial expansion of tRNA):
tRNAMet | tRNA specific for methionine
Met-tRNAMet | methionyl-tRNA
tRNAfMet | tRNA specific for formylmethionine
fMet-tRNAfMet or fMet-tRNAf | N-formylmethionyl-tRNA
tRNAAla | tRNA specific for alanine
tRNAVal | tRNA specific for valine
The 3-dimensional structure of tRNA has several different arms, which allow it to recognize a codon on mRNA and deliver the appropriate amino acid during protein synthesis:
AA (amino acid) arm
DHU (dihydrouridine) arm
anticodon arm
TψC arm (ψ for the unusual base pseudouridine)
14.6.1.2.1 RNA Sequence Variations.
Style for abbreviated sequence variation terms described at the RNA level is essentially the same as for DNA (see 14.6.1.1.1, Sequence Variations, Nucleotides). The main exception is that the RNA nucleotide abbreviations are lowercase. The prefix r. is used to signify RNA12 but is not required.
78a>u
r.76a>c
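The typographic difference between DNA-level and RNA-level substitution terms (lowercase letters, u in place of T, optional r. prefix) can be sketched as a simple conversion. This is a style illustration only: it assumes the nucleotide number is the same in both reference sequences, which authors must verify, and the function name to_rna_style is an assumption.

```python
import re


def to_rna_style(dna_term: str, with_prefix: bool = True) -> str:
    """Rewrite a simple DNA substitution term in RNA style (lowercase
    nucleotides, u instead of T)."""
    match = re.fullmatch(r"(?:c\.)?(\d+)([ACGT])>([ACGT])", dna_term)
    if match is None:
        raise ValueError(f"not a simple substitution term: {dna_term!r}")
    position, ref, alt = match.groups()

    def rna(base: str) -> str:
        return base.lower().replace("t", "u")

    body = f"{position}{rna(ref)}>{rna(alt)}"
    return f"r.{body}" if with_prefix else body


print(to_rna_style("c.76A>C"))                   # r.76a>c
print(to_rna_style("78A>T", with_prefix=False))  # 78a>u
```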
RNA sequences are quantified by use of the same units as for DNA (ie, base, bp, kb, and Mb) (see 13.12, Units of Measure):
240-bp dsRNA
10-25 RNA bases
a 7.5-kb RNA probe
14.6.1.3 Nucleotides as Precursors and Energy Molecules.
The nucleotides of DNA and RNA are also important individually as the precursors of DNA and RNA and as energy molecules. They may bind 1, 2, or 3 phosphate molecules, giving rise to compounds with the following abbreviations (see 13.11, Clinical, Technical, and Other Common Terms) or alternative shorthand.
14.6.1.3.1 Ribonucleotides.
See Table 14.6-4 for examples of terms and their abbreviations.
Table 14.6-4. Examples of Terms and Abbreviations for Ribonucleotides
Terms | Abbreviation | Alternative shorthand
adenosine monophosphate, adenylic acid | AMP | pA
adenosine diphosphate | ADP | ppA
adenosine triphosphate | ATP | pppA
cytidine monophosphate, cytidylic acid | CMP | pC
cytidine diphosphate | CDP | ppC
cytidine triphosphate | CTP | pppC
guanosine monophosphate, guanylic acid | GMP | pG
guanosine diphosphate | GDP | ppG
guanosine triphosphate | GTP | pppG
uridine monophosphate, uridylic acid | UMP | pU
uridine diphosphate | UDP | ppU
uridine triphosphate | UTP | pppU
14.6.1.3.2 Deoxyribonucleotides.
See Table 14.6-5 for examples of terms and abbreviations for deoxyribonucleotides.
Table 14.6-5. Examples of Terms and Abbreviations for Deoxyribonucleotides
Term | Abbreviation | Alternative shorthandᵃ
deoxyadenosine monophosphate, deoxyadenylic acid | dAMP | pdA
deoxyadenosine diphosphate | dADP |
deoxyadenosine triphosphate | dATP |
deoxycytidine monophosphate, deoxycytidylic acid | dCMP | pdC
deoxycytidine diphosphate | dCDP |
deoxycytidine triphosphate | dCTP |
deoxyguanosine monophosphate, deoxyguanylic acid | dGMP | pdG
deoxyguanosine diphosphate | dGDP |
deoxyguanosine triphosphate | dGTP |
deoxythymidine monophosphate, deoxythymidylic acid | dTMP | pdT
deoxythymidine diphosphate | dTDP |
deoxythymidine triphosphate | dTTP |
a Terms such as ppdA and pppdA are, by analogy with ribonucleotide shorthand, feasible but not commonly found.
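The terms, abbreviations, and shorthand in Tables 14.6-4 and 14.6-5 follow a regular pattern that can be generated from the nucleoside letter and the number of phosphates. The sketch below is illustrative; the function name nucleotide_terms is an assumption, and deoxy shorthand with 2 or 3 phosphates (eg, ppdA) is produced by analogy even though, as noted, it is not commonly found.

```python
NUCLEOSIDES = {
    "A": "adenosine", "C": "cytidine", "G": "guanosine",
    "U": "uridine", "T": "thymidine",
}
PHOSPHATE_COUNT = {1: ("mono", "M"), 2: ("di", "D"), 3: ("tri", "T")}


def nucleotide_terms(letter: str, phosphates: int, deoxy: bool = False):
    """Return (term, abbreviation, alternative shorthand) for a nucleotide."""
    prefix, code = PHOSPHATE_COUNT[phosphates]
    d = "d" if deoxy else ""
    term = f"{'deoxy' if deoxy else ''}{NUCLEOSIDES[letter]} {prefix}phosphate"
    abbreviation = f"{d}{letter}{code}P"
    shorthand = "p" * phosphates + d + letter
    return term, abbreviation, shorthand


print(nucleotide_terms("A", 3))              # ('adenosine triphosphate', 'ATP', 'pppA')
print(nucleotide_terms("T", 1, deoxy=True))  # ('deoxythymidine monophosphate', 'dTMP', 'pdT')
```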
In the foregoing examples, monophosphates are assumed to be phosphorylated at the 5′ position, and the more specific term may be used:
5′-AMP
The additional phosphate groups of diphosphates and triphosphates are linked sequentially to the first phosphate group. Other phosphate positions and variations may be specified as follows:
2′-UMP |
3′-UMP | Up
3′,5′-ADP | pAp
3′,5′-AMP | cAMP (cyclic AMP)
Note that the p follows the capital letter when 3′-phosphate is indicated.
14.6.1.4 Nucleic Acid Technology.
Laboratory methods of analyzing DNA make use of special DNA sequences, which include the following:
RFLPs | restriction fragment length polymorphisms
SNPs | single-nucleotide polymorphisms (pronounced “snips”) (note that SNVs is now preferred; see 14.6.1.1.1, Sequence Variations, Nucleotides)
SNVs | single-nucleotide variants
STRs | short tandem repeats
STRPs | STR polymorphisms
STSs | sequence tagged sites
VNTRs | variable number of tandem repeats
Note: Satellite DNA repeats, microsatellite (repeating sequences of 1-9 bp) repeats (or markers), and minisatellite (repeating sequences of 10-100 bp) repeats20 (or markers) are distinct types of tandem repeat sequences.
An SNV sequence may be preceded by rs (for reference SNV ID) or ss (for submitted SNV ID), used for accession numbers assigned by the National Center for Biotechnology Information:
rs1002138(-)
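An identifier of this kind can be split into its prefix and number. The sketch below reads only the leading rs or ss and the digits that follow, ignoring any trailing annotation; the function name parse_snv_id is an assumption.

```python
import re

SNV_ID = re.compile(r"^(rs|ss)(\d+)")


def parse_snv_id(text: str) -> tuple[str, int]:
    """Return the prefix (rs or ss) and the numeric part of an identifier."""
    match = SNV_ID.match(text)
    if match is None:
        raise ValueError(f"not an rs or ss identifier: {text!r}")
    return match.group(1), int(match.group(2))


print(parse_snv_id("rs1002138(-)"))  # ('rs', 1002138)
```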
14.6.1.4.1 The Reference Genome.
The publication of the draft human genome sequence in 2001 heralded the beginning of the current era of genomic medicine.21 Since that time, rapid advances in technology have facilitated increasingly accurate and inexpensive methods for interrogating the genomes of humans and model organisms for research and clinical care.
Current sequencing technologies do not sequence chromosomes from end to end. Rather, in a massively parallel process, genomic DNA is fragmented, sequenced, and reassembled for purposes of representation of a nearly complete genome.22 In some applications only the protein coding regions of the genome are sequenced (exome sequencing), but increasingly it is feasible to sequence the entire genome for research or clinical purposes.
Of importance, a genome assembly and a genome are not the same thing. “A genome is the physical entity that defines an organism. An assembly is not a physical object; it is the collection of all sequences used to represent the genome of an organism.”22 Assemblies can be of varying degrees of completeness; for example, some regions of the human genome remain refractory to sequencing or assembly with current technologies. Informaticians and geneticists are continually striving to refine the accuracy of these assemblies known as reference genomes. As sequencing and assembly technologies continue to evolve, so does the notion of a reference assembly. The sequences in the human reference genome assembly do not represent the genome of a single individual but are mosaics constructed from the DNA of many anonymous individuals. Contributions from one individual comprise approximately 70% of the assembly sequence, although more than 50 individuals are represented in GRCh38.23 The human reference genome assembly or build23 (currently GRCh38) acts as the coordinate system for the human genome and the features annotated on it and is often the representation used for comparisons with other human genomes for diagnosis or research. Initially produced by the Human Genome Project, it is now maintained by the Genome Reference Consortium. However, other genomes may also be used as a reference in comparative analyses (eg, a parent’s genome vs an offspring’s genome). In publication, the most reliable means to define a genome assembly is by its unique GenBank (INSDC: http://insdc.org/)24 accession number (eg, GRCh38 = GCA_000001405.15). If an assembly has not been deposited in GenBank, typically, at a minimum, an identifier for the sample from which the assembly is derived is provided (eg, INSDC BioSample accession [eg, SAMN06710886] or other identifier [eg, Coriell: NA10874]). 
Publications may include names for genome assemblies along with accession numbers or sample identifiers (eg, HuRef = GCA_000002125.2) (see 14.6.1.1.3, Database Identifiers for Genomic Sequences). Note: Use care with the term reference. Although the Human Genome Project produced the first notion of a human reference genome assembly, there are several ongoing efforts to create high-quality genome assemblies that could serve as population-specific reference assemblies.25 None of these have yet been formally recognized by the global research community as a reference, but “it may be possible that the future human ‘reference’ genome is a panel of assemblies, rather than a single assembly” (Valerie Schneider, PhD, staff scientist, National Center for Biotechnology Information, written communication, May 2, 2017). There are also multiple efforts under way to use graph formats (rather than the traditional linear sequence format) to create references that represent population-level variation.26
For patients with disorders that have a primarily genetic origin, massively parallel sequencing of the total complement of an individual’s DNA (genome sequencing) has proven to be a powerful diagnostic approach. Genome-scale sequencing can be performed on DNA from white blood cells or on buccal cells from saliva. In sequence analysis, each individual’s genome contains millions of sites where his or her DNA differs from a reference sequence. Clinical interpretation requires assessing whether any of these variants are associated with disease.27 See Figure 14.6-4 for the analysis processing sequence.
Figure 14.6-4. Informatic and Human Analysis Required for Finding Rare Pathogenic Variants in a Human Genome
Genetic variants are informatically filtered to remove those with very low likelihood of pathogenicity (eg, variants known to be benign or present at very high frequency in the general population). This informatic processing incorporates annotations of individual variants (eg, population allele frequencies, prior literature reports, computational predictions of functional effect) for use in manual analysis. Reproduced from Evans et al.27
14.6.1.4.2 Methods of Analysis.
Methods of analysis include the following:
ASO | allele-specific oligonucleotide probes
DGGE | denaturing gradient gel electrophoresis
EMSA | electrophoretic mobility shift assay
FISH | fluorescence in situ hybridization
OSH | oligonucleotide-specific hybridization
PCR | polymerase chain reaction
PTT | protein truncation test
RT-PCR | reverse-transcriptase polymerase chain reaction
SKY | spectral karyotyping, a type of fluorescence in situ hybridization
SSCP | single-stranded conformational polymorphism
14.6.1.4.3 Blotting.
The first blotting technique, used for identifying specific DNA sequences in gDNA isolated in vitro by means of nucleic acid probes, was named Southern blotting for its originator, Edwin Southern. Similar techniques have since been named (with droll intent) for compass directions and include Northern blotting (RNA identified; nucleic acid probe), Western blotting (protein identified; antibody probe), Southwestern blotting (DNA-binding protein identified; DNA probe), and Far Western blotting (protein-protein interaction identified; protein probe).28,29
Recombinant DNA is DNA created by combining isolated DNA sequences of interest. Among the tools used in this process are cloning vectors, such as plasmids, phages (see 14.14.3, Virus Nomenclature, and 14.4.4, Prions), and hybrids of these (cosmids and phagemids). Additional tools are bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs).
Basic explanations of these entities are available in medical dictionaries and textbooks. A few that present special nomenclature problems are described here.
14.6.1.4.4 Cloning Vectors.
Plasmids are typically named with a lowercase p followed by a letter or alphanumeric designation; spacing may vary:
pBR322
pJS97
pUC
pUC18
pSPORT
pSPORT 2
Phage cloning vectors are named for the phages. For example:
phage λ: λgt10, λgt11, λgt22A
M13 phage: M13KO7, M13mp
14.6.1.4.5 Restriction Enzymes.
Restriction enzymes (or restriction endonucleases) are special enzymes that cleave DNA at specific sites. They are named for the organism from which they are isolated, usually a bacterial species or strain. An authoritative source of information is REBASE.5 As originally proposed,30 their names consist of a 3-letter term, italicized and beginning with a capital letter, taken from the organism of origin, for example:
Hpa for Haemophilus parainfluenzae
followed by a roman numeral, which is a series number, for example:
HpaI
HpaII
In some cases, the series number is preceded by a capital or lowercase letter (roman, not italic), an arabic numeral, or a number and letter combination, which refers to the strain of bacterium; there are no spaces between any of these elements of the term:
EcoRI
HinfI
Sau96I
Sau3AI
Many variations in the form of the names of these enzymes have appeared (eg, Hin d III, Hin dIII, Hind III, Hind III). It is currently recommended that italics and spacing be given as noted in the preceding paragraph to differentiate the species name, strain designation, and enzyme series number. Table 14.6-6 gives examples of commonly used restriction enzymes.
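The naming pattern described above (3-letter organism term, optional strain designation, roman-numeral series number, no spaces) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the numeral converter is meant only for small series numbers, and plain text cannot show the italics that the convention requires for the 3-letter term.

```python
def to_roman(number):
    """Convert a small positive integer to a roman numeral (for series numbers)."""
    if number < 1:
        raise ValueError("series number must be positive")
    numerals = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    result = ""
    for value, symbol in numerals:
        while number >= value:
            result += symbol
            number -= value
    return result


def enzyme_name(genus, species, series, strain=""):
    """Join the 3-letter organism term, strain designation, and series number."""
    # First letter of the genus (capital) plus first 2 letters of the species.
    abbreviation = genus[0].upper() + species[:2].lower()
    return f"{abbreviation}{strain}{to_roman(series)}"


print(enzyme_name("Haemophilus", "parainfluenzae", 2))   # HpaII
print(enzyme_name("Escherichia", "coli", 1, strain="R"))  # EcoRI
```

Going in the opposite direction (splitting a name into its parts) is not always unambiguous from the string alone, because a strain letter such as X can also be read as part of a roman numeral; a lookup in a curated source such as REBASE is safer for that purpose.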
Table 14.6-6. Examples of Commonly Used Restriction Enzymes and the Organism of Origin
Enzyme name | Organism of origin
AccI | Acinetobacter calcoaceticus
AluI | Arthrobacter luteus
AlwNI | Acinetobacter lwoffii N
BamHI | Bacillus amyloliquefaciens H
BclI | Bacillus caldolyticus
BstEII | Bacillus stearothermophilus ET
BstXI | Bacillus stearothermophilus X
I-CeuI | Chlamydomonas eugametos
DpnI | Streptococcus (Diplococcus) pneumoniae M
EcoRI | Escherichia coli RY13
EcoRII | Escherichia coli R245
HaeII | Haemophilus aegyptius
HincII | Haemophilus influenzae Rc
HindIII | Haemophilus influenzae Rd
HinfI | Haemophilus influenzae Rf
MseI | Micrococcus species
MspI | Moraxella species
PleI | Pseudomonas lemoignei
PmlI | Pseudomonas maltophilia
PstI | Providencia stuartii
Sau3AI | Staphylococcus aureus 3A
Sau96I | Staphylococcus aureus PS96
SmaI | Serratia marcescens
SstI | Streptomyces stanford
TaqI | Thermus aquaticus YT-1
XbaI | Xanthomonas badrii
XhoI | Xanthomonas holcicola
Prefixes may further specify type of enzyme action, for example:
I-CeuI | I: intron-coded endonuclease | Chlamydomonas eugametos
M.MlyI | M: methylase | Micrococcus lylae
N.MlyI | N: nicking enzyme
Restriction enzyme names are often seen as modifiers, for example:
a BamHI fragment
an EcoRI site
For information on recognition sequences, see 14.6.1.1, DNA.
14.6.1.4.6 Modifying Enzymes.
Enzymes exist that synthesize DNA and RNA (polymerases), cleave DNA (nucleases), join nucleic acid fragments (ligases), methylate nucleotides (methylases), and synthesize DNA from RNA (reverse transcriptases) (see 14.10.3, Enzyme Nomenclature). Those in laboratory use come from living systems, often from the same organisms that furnish restriction enzymes. Because the names may be similar, it is essential to specify the type of enzyme so that there is no confusion, for example:
AluI methylase
Pfu DNA polymerase (Pyrococcus furiosus)
TaqI methylase
Taq DNA ligase
Modifying enzyme names are often seen as modifiers, for example:
a TaqI RFLP
In the following enzyme terms, T plus numeral refers to the related phage (see 14.14.3, Virus Nomenclature, and 14.4.4, Prions):
T7 DNA polymerase
T4 DNA polymerase
T4 polynucleotide kinase
T4 RNA ligase
14.6.1.4.7 DNA Families.
Some sequences belonging to non–protein-coding regions of the genome can also be classified by their base content. Non–protein-coding DNA includes that which is transcribed into functional noncoding RNA molecules (eg, transfer RNA, ribosomal RNA, and regulatory RNA, such as microRNA), as well as families of repetitive sequence, some of which include transposons and retrotransposons. Families include the following:
Collective term: SINEs (short interspersed nuclear elements) | Example: Alu family (named for AluI; see 14.6.1.4.5, Restriction Enzymes) | Category: Interspersed
Collective term: LINEs (long interspersed nuclear elements) | Example: L1 family (from LINE 1 family) | Category: Interspersed
14.6.1.5 Amino Acids.
Twenty amino acids are encoded by triplet base codons in DNA and are constituents of proteins. Each has 1 or more distinct codons (eg, in messenger RNA, GCU, GCC, GCA, and GCG code for alanine; the corresponding DNA codons are GCT, GCC, GCA, and GCG).
Table 14.6-7 gives the amino acids of proteins and their preferred 3- and single-letter symbols. Although these amino acids have systematic names (eg, alanine is 2-aminopropanoic acid), the trivial names are the most widely recognized and used. The single-letter symbols are usually used for longer sequences; otherwise, the 3-letter symbols are preferred. Do not mix single-letter and 3-letter amino acid symbols. In publications for a general audience, it may be helpful to define the single-letter symbols (eg, in a key) and to expand the 3-letter symbols at first mention as well.
Table 14.6-7. Amino Acids of Proteins and Their 3- and Single-Letter Symbols
Amino acid | 3-Letter symbol | Single-letter symbol
alanine | Ala | A
arginine | Arg | R
asparagine | Asn | N
aspartic acid | Asp | D
asparagine or aspartic acid | Asx | B
cysteine | Cys | C
glutamic acid | Glu | E
glutamic acid or glutamine | Glx | Z
glutamine | Gln | Q
glycine | Gly | G
histidine | His | H
isoleucine | Ile | I
leucine | Leu | L
lysine | Lys | K
methionine | Met | M
phenylalanine | Phe | F
proline | Pro | P
serine | Ser | S
threonine | Thr | T
tryptophan | Trp | W
tyrosine | Tyr | Y
valine | Val | V
unspecified amino acid | Xaa | X
The symbols Asp and Glu apply equally to the anions aspartate and glutamate, respectively, the forms that exist under most physiological conditions.
Other amino acids are also well known by their trivial names and have 3-letter codes. These, however, should always be expanded at first mention, as the example of cystine, whose 3-letter code is the same as that of cysteine, bears out:
citrulline | Cit
cystine | Cys
homocysteine | Hcy
homoserine | Hse
hydroxyproline | Hyp
ornithine | Orn
thyroxine | Thx
The side chains of amino acids are known as R groups, and the letter R is used in molecular formulas when indicating a nonspecified side chain, as in this general formula for an amino acid:
H2N-CH(R)-COOH
Do not confuse the R with the single-letter abbreviation for arginine (see Table 14.6-7).
Peptide bonds are bonds between the α-carboxyl group of one amino acid and the α-amino group of the next. Long peptide sequences are the backbones of proteins. A peptide sequence might be indicated as follows, with hyphens representing peptide bonds:
Gly-Ile-Val-Glu-Gln-Cys-Cys-Ala-Ser-Val-Cys-Ser-Leu-Tyr
By convention in such a sequence, the amino end of the peptide (the end of the peptide whose amino acid has a free amino group, also known as the N terminal) is on the left and the carboxyl end (the end of the peptide whose amino acid has a free carboxyl group, also known as the C terminal) is on the right. The symbols NH2 and COOH may be included in the representation of the peptide sequence, as follows:
NH2-Gly-Ile-Val-Glu-Gln-Cys-Cys-Ala-Ser-Val-Cys-Ser-Leu-Tyr-COOH
The same left-to-right convention applies to sequences using single letters. The above sequence using single letters would be as follows:
GIVEQCCASVCSLY
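The conversion from 3-letter to single-letter symbols is mechanical and can be sketched as follows. This is a minimal sketch, assuming hyphen-separated input with an optional NH2 at the left end and COOH at the right end; the function name is hypothetical. An NH2 on the right is deliberately not stripped, because there it denotes an amide rather than the amino end.

```python
# 3-letter to single-letter symbols for the amino acids of proteins,
# including the ambiguity symbols Asx, Glx, and Xaa.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Asx": "B",
    "Cys": "C", "Glu": "E", "Glx": "Z", "Gln": "Q", "Gly": "G",
    "His": "H", "Ile": "I", "Leu": "L", "Lys": "K", "Met": "M",
    "Phe": "F", "Pro": "P", "Ser": "S", "Thr": "T", "Trp": "W",
    "Tyr": "Y", "Val": "V", "Xaa": "X",
}


def to_single_letter(peptide):
    """Convert a hyphenated 3-letter peptide sequence to single letters."""
    parts = peptide.split("-")
    # Drop terminal markers only where they mark the free ends.
    if parts and parts[0] == "NH2":
        parts = parts[1:]
    if parts and parts[-1] == "COOH":
        parts = parts[:-1]
    # Any other unknown element (eg, a right-hand NH2) raises KeyError.
    return "".join(THREE_TO_ONE[part] for part in parts)


print(to_single_letter("NH2-Gly-Ile-Val-Glu-Gln-Cys-Cys-Ala-Ser-Val-Cys-Ser-Leu-Tyr-COOH"))
```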
When the NH2 group appears on the right of a sequence, it has a meaning other than amino end. For instance, in the following sequence, Val-NH2 indicates the amide derivative of valine:
His-Phe-Arg-Lys-Pro-Val-NH2
To indicate bonds other than the peptide bonds described above, lines, rather than hyphens, are used:
(Adapted with permission from Moss.2 Copyright IUPAC and IUBMB.)
For a multiline peptide sequence in running text, use a hyphen at the right end of one line to indicate a break and at the start of the next line to indicate the peptide bond:
Ala-Ser-Tyr-Phe-Ser-
-Gly-Pro-Gly-Trp-Arg
or, in figures, use a line:
(Adapted with permission from Moss.2 Copyright IUPAC and IUBMB.)
In special cases, such as cyclic compounds (illustrated here by gramicidin S), the bond from C-2 to N-2 can be shown with arrows, as follows:
(Adapted with permission from Moss.2 Copyright IUPAC and IUBMB.)
As with nucleic acid sequences, alignment is important in protein sequences. In the following examples, the amino acid residues must remain aligned with the nucleic acid triplets:
(Adapted with permission from Moss.2 Copyright IUPAC and IUBMB.)
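To illustrate the alignment requirement, the following Python sketch prints a short coding sequence as triplets with the 3-letter residue symbols directly beneath them. The sequence is invented for illustration, the function name is hypothetical, and the codon table is deliberately partial (only the DNA codons used here), so this is not a general translation tool.

```python
# Partial codon table (DNA coding-strand triplets), for illustration only.
CODON_TO_RESIDUE = {
    "ATG": "Met",
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "TGG": "Trp",
    "AAA": "Lys", "AAG": "Lys",
}


def align_codons(coding_sequence):
    """Return two strings: spaced triplets and the residues aligned under them."""
    if len(coding_sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [coding_sequence[i:i + 3] for i in range(0, len(coding_sequence), 3)]
    residues = [CODON_TO_RESIDUE[codon] for codon in codons]
    # Triplets and 3-letter symbols have the same width, so a single
    # space between items keeps the two lines aligned in monospaced type.
    return " ".join(codons), " ".join(residues)


triplet_line, residue_line = align_codons("ATGGCTTGGAAA")
print(triplet_line)
print(residue_line)
```

The alignment holds only in a monospaced font; in typeset copy, the same effect requires tabular setting or a figure.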
An amino acid term plus number refers to the amino acid by codon number (when known) or by protein residue. For example:
Arg506
14.6.1.5.1 Sequence Variations, Amino Acids.
HGVS has expressed a preference for the 3-letter amino acid abbreviation to be used in shorthand descriptions of sequence variations in proteins because several amino acids start with the same initial letter (eg, Ala, Arg, Asn, Asp). The use of only 1 letter could lead to ambiguity or confusion. The 1-letter style still may be seen but is not recommended. For sequence variations described at the protein level, recommended style for abbreviated terms is similar to that for nucleotides (see 14.6.1.1.1, Sequence Variations, Nucleotides, and 14.6.2, Human Gene Nomenclature). Note, as indicated in Table 14.6-8, that the amino acid abbreviation begins the term, preceding the position number (in contrast to nucleotide sequence variant terms, in which the residue number precedes the residue abbreviation). Explanation of such terms at first mention is recommended. Use of the prefix p. (protein) is another recent recommendation.
Table 14.6-8. Sequence Variations in Proteins and Their 3- and Single-Letter Descriptions
3-Letter style | Single-letter style | Explanation
Arg506Gln | R506Q | arginine at residue 506 replaced by glutamine
Leu10ins | L10ins | leucine inserted at position 10
Leu141del | L141del | leucine deleted at position 141
Gln318X or Gln318ter | Q318X | glutamine at 318 changed to stop codon (X or ter)
p.Trp26Cys | p.W26C | tryptophan at residue 26 replaced by cysteine
X is officially recommended as the symbol for the stop codon, but it can also be the single-letter abbreviation for unspecified or unknown amino acid. Therefore, when an amino acid sequence expressed with single letters that includes X is used, the X should be explained in the text.
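Because the 3-letter and single-letter styles differ only in the residue symbols, a substitution term can be converted from one to the other mechanically. The Python sketch below is an illustration under stated assumptions, not an HGVS tool: it handles only simple substitutions (including a change to a stop codon written as X or ter), the pattern and function name are hypothetical, and insertions and deletions are rejected.

```python
import re

# 3-letter to single-letter symbols for the 20 amino acids of proteins.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Optional p. prefix, original residue, position, then the new residue
# or a stop codon written as X or ter.
SUBSTITUTION = re.compile(r"^(p\.)?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|X|ter)$")


def to_single_letter_variant(term):
    """Convert a 3-letter substitution term (eg, Arg506Gln) to single letters."""
    match = SUBSTITUTION.match(term)
    if match is None:
        raise ValueError(f"not a simple substitution term: {term!r}")
    prefix, original, position, replacement = match.groups()
    new_symbol = "X" if replacement in ("X", "ter") else THREE_TO_ONE[replacement]
    return f"{prefix or ''}{THREE_TO_ONE[original]}{position}{new_symbol}"


print(to_single_letter_variant("Arg506Gln"))
print(to_single_letter_variant("p.Trp26Cys"))
```

The amino acid symbol precedes the position number in both styles, so the position is carried over unchanged.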
When an amino acid sequence variation is used with a gene symbol, italicize only the gene symbol:
ADRB1 Arg389Gly (not: ADRB1 Arg389Gly )
(See 14.6.2, Human Gene Nomenclature.)
Note: Residue numbering begins at the translation initiator methionine, +1.
For further details on expressing sequence variations in proteins, consult the HGVS recommendations.6
Principal Author: Cheryl Iverson, MA
Acknowledgment
Thanks to the following for reviewing and providing comments: W. Gregory Feero, MD, PhD, JAMA, and Maine-Dartmouth Family Medicine Residency, Augusta; Valerie Schneider, PhD, National Center for Biotechnology Information, Bethesda, Maryland; Trevor Lane, MA, DPhil, Edanz Group, Fukuoka, Japan; and John J. McFadden, MA, JAMA Network. Thanks also to David Song, JAMA Network, for obtaining permissions.
References
1.Cammack R. The biochemical nomenclature committees. IUBMB Life. 2000;50(3):159-161. doi:10.1080/152165400300001453
2.Moss GP. International Union of Biochemistry and Molecular Biology recommendations on biochemical & organic nomenclature, symbols & terminology, etc. Updated May 21, 2018. Accessed June 25, 2018. https://www.qmul.ac.uk/sbcs/iubmb/
3.Nussbaum RL, McInnes RR, Willard HF. Thompson & Thompson Genetics in Medicine. 8th ed. Elsevier; 2016.
4.Cooper NG. The Human Genome Project: Deciphering the Blueprint of Heredity. University Science Books; 1994.
5.Roberts RJ, Vincze T, Posfai J, Macelis D. REBASE: a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 2010;38(database issue):D234-D236. Accessed July 31, 2019. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808884
6.Human Genome Variation Society website. Updated May 17, 2018. Accessed July 31, 2019. http://www.hgvs.org
7.den Dunnen JT, Antonarakis SE. Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat. 2000;15(1):7-12. doi:10.1002/(SICI)1098-1004(200001)15:1<7::AID-HUMU4>3.0.CO;2-N
8.Antonarakis SE; Nomenclature Working Group. Recommendations for a nomenclature system for human gene mutations. Hum Mutat. 1998;11(1):1-3. doi:10.1002/(SICI)1098-1004(1998)11:1<1::AID-HUMU1>3.0.CO;2-O
9.Beutler E, McKusick VA, Motulsky AG, Scriver CR, Hutchinson F. Mutation nomenclature: nicknames, systematic names, and unique identifiers. Hum Mutat. 1996;8(3):203-206. doi:10.1002/(SICI)1098-1004(1996)8:3<203::AID-HUMU1>3.0.CO;2-A
10.Ad Hoc Committee on Mutation Nomenclature. Update on nomenclature for human gene mutations. Hum Mutat. 1996;8(3):197-202. doi:10.1002/humu.1380080302
11.Beaudet AL, Tsui L-C. A suggested nomenclature for designing mutations. Hum Mutat. 1993;2(4):245-248. doi:10.1002/humu.1380020402
12.den Dunnen JT, Antonarakis SE. Nomenclature for the description of human sequence variations. Hum Genet. 2001;109(1):121-124. doi:10.1007/s004390100505
13.Sequence variant nomenclature. HGVS Simple. Accessed June 25, 2018. http://varnomen.hgvs.org/bg-material/simple/
14.den Dunnen JT, Dalgleish R, Maglott DR, et al; Human Genome Variation Society (HGVS) and Human Genome Organization (HUGO). HGVS recommendations for the description of sequence variants: 2016 update. Hum Mutat. 2016;37(6):564-569. doi:10.1002/humu.22981
15.Major Differences. Accessed March 17, 2019. http://www.majordifferences.com/2015/01/difference-between-sense-and-antisense.html
16.Online Mendelian Inheritance in Man (OMIM). National Center for Biotechnology Information website. Updated daily. Accessed July 31, 2019. https://www.ncbi.nlm.nih.gov/omim
17.Rigden DJ, Fernández XM. The 2018 Nucleic Acids Research database issue and the online molecular biology database collection. Nucleic Acids Res. doi:10.1093/nar/gkx1235
18.Athan ES, Williamson J, Ciappa A, et al. A founder mutation in presenilin 1 causing early-onset Alzheimer disease in unrelated Caribbean Hispanic families. JAMA. 2001;286(18):2257-2263. doi:10.1001/jama.286.18.2257
19.About INSDC. International Nucleotide Sequence Database Collaboration website. Accessed July 31, 2019. www.insdc.org/about
20.Difference between minisatellite and microsatellite. July 14, 2017. Accessed July 31, 2019. https://www.differencebetween.com/difference-between-minisatellite-and-vs-microsatellite
21.Pasche B. Whole-genome sequencing: a step closer to personalized medicine. JAMA. 2011;305(15):1596-1597. doi:10.1001/JAMA.2011.484
22.Schneider V, Church D. Genome Reference Consortium. In: The NCBI Handbook. 2nd ed. National Center for Biotechnology Information; 2013.
23.Schneider VA, Graves-Lindsay T, Howe K, et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 2017;27(5):849-864. doi:10.1101/gr.213611.116
24.International Nucleotide Sequence Database Collaboration (INSDC). Accessed July 31, 2019. http://www.insdc.org
25.McDonnell Genome Institute. Reference genome improvement. Accessed July 24, 2017. https://www.genome.wustl.edu/items/reference-genome-improvement/
26.Novak AM, Hickey G, Garrison E, et al. Genome graphs. Accessed July 24, 2017. doi:10.1101/101378
27.Evans JP, Powell BC, Berg JS. Finding the rare pathogenic variants in a human genome. JAMA. 2017;317(18):1904-1905. doi:10.1001/jama.2017.0432
28.Nicholas MW, Nelson K. North, South, or East? blotting techniques. J Invest Dermatol. 2013;133(7):e10. doi:10.1038/jid.2013.216
29.Wu Y, Li Q, Chen X-Z. Detecting protein-protein interaction by Far Western blotting. Nat Protoc. 2007;2(12):3278-3284. doi:10.1038/nprot.2007.459
30.Smith HO, Nathans D. A suggested nomenclature for bacterial host modification and restriction systems and their enzymes. J Mol Biol. 1973;81(3):419-423. doi:10.1016/0022-2836(73)90152-6
14.6.2 Human Gene Nomenclature.
The International System for Human Gene Nomenclature (ISGN), a system for gene symbols, was inaugurated in 19791,2 and has been continually updated. The history of naming genes and proteins is littered with redundancy because investigators often make discoveries separately and choose a name without following any sort of naming convention. Hence, the literature, especially literature more than a decade old, can be confusing because the same gene may have multiple names. Standardization helps both research and clinical care. The Human Gene Mapping Nomenclature Committee (HGNC), which developed the ISGN, put forth a “one human genome—one gene language” principle:
Certainly there exists a genetic and molecular basis for a single human gene language without dialects. All human nuclear genes as we know them follow the same genetic, molecular, and evolutionary principles. . . .Thus it is reasonable and logical to develop a standard and consolidated gene nomenclature system rather than have a human gene language based on different gene systems.3(p12)
The HGNC is 1 of 7 committees of the Human Genome Organisation (HUGO) and is “responsible for gene name validation.”4(p115) Gene names and symbols are assigned by the HGNC.5,6 To date, the HGNC has assigned more than 42 000 gene names.
■Gene Symbols: A gene symbol is a short term, typically 3 to 7 characters long, that conveys in abbreviated form the name or other attribute of a gene. Human gene symbols usually consist of uppercase letters and may also contain (but never begin with) arabic numerals. Approved gene symbols do not contain Greek letters, roman numerals, superscripts, or subscripts and, usually, contain no punctuation. Gene symbols should be italicized, per official recommendations.7 Italicizing is a useful way to make clear that a gene, and not a similarly named entity such as a condition or product of the gene, is being discussed. Italics are not necessary in published catalogs of gene symbols.7 For style rules for gene symbols, see Table 14.6-9.
Approved symbols may represent other entities, such as chromosomal regions, certain syndromes, genes whose existence is inferred (supported by linkage analysis or association with known markers), cloned DNA segments, pseudogenes, and DNA fragments.
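The basic shape of a human gene symbol described above (uppercase letters, optionally with arabic numerals, never beginning with a numeral, no Greek letters or punctuation) can be screened with a simple pattern. The following Python sketch is a rough heuristic under those assumptions, with a hypothetical function name. It cannot confirm that a symbol is approved, it cannot detect roman numerals (which look like ordinary letters), and it rejects legitimate exceptions such as hyphenated HLA symbols, so a failed check is a prompt to verify the symbol rather than proof of an error.

```python
import re

# One uppercase Latin letter, then any number of uppercase letters or digits.
BASIC_SYMBOL = re.compile(r"^[A-Z][A-Z0-9]*$")


def follows_basic_symbol_rules(symbol):
    """Return True if the symbol has the usual shape of a human gene symbol."""
    return bool(BASIC_SYMBOL.match(symbol))


for candidate in ["ADRB2", "F8", "5S", "β2M", "HLA-A"]:
    print(candidate, follows_basic_symbol_rules(candidate))
```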
Within larger terms, only the gene symbol is italicized:
ADRB2 46G>A (not: ADRB2 46G>A)
ADRB2 Gly16Arg (not: ADRB2 Gly16Arg)
(For an explanation of 46G>A and Gly16Arg, see 14.6.1, Nucleic Acids and Amino Acids.)
Authors are encouraged to use the most up-to-date gene symbol, which may be verified at the HGNC database (www.genenames.org)5 or at NCBI Gene, previously known as Entrez Gene.8 One area of growth in the HGNC database has been the increase in the number of gene families: to date, the database includes more than 1100 families, “with 51% of the protein coding genes within [the] database associated to at least one family.”6 The HGNC symbols and names are seen as a standard and are used in all the major databases that concentrate on human genes and proteins, for example, UniProt and NCBI Gene, as well as disease and phenotype resources, including Online Locus Reference Genomic (LRG),9 a manually curated record that contains stable, and thus unversioned, reference sequences designed specifically for reporting sequence variants with clinical implications,6 and Online Mendelian Inheritance in Man (OMIM).
14.6.2.1 OMIM.
Online Mendelian Inheritance in Man (OMIM) is a continually updated catalog of human genes and genetic disorders and traits, with focus on the molecular relationship between genetic variation and phenotypic expression.10,11
When a specific syndrome is mentioned, it is helpful to include the OMIM number (see 14.6.1.1.2, Unique Identifiers):
bronchomalacia (OMIM 211450)
DiGeorge syndrome (OMIM #188400)
Each entry is given a unique 6-digit number. Allelic variants are designated by the OMIM number of the entry, followed by a decimal point and a unique 4-digit variant number. For example:
Allelic variants in the factor IX gene (OMIM 300746) are numbers 300746.0001-300746.0101.
Symbols precede many OMIM numbers. These are explained in the OMIM frequently asked questions (FAQ) site,12 as follows:
■An asterisk (*) before an entry number indicates a gene.
■A number symbol (#) indicates that the entry is descriptive, usually of a phenotype, and does not represent a unique locus.
■A plus sign (+) indicates that the entry contains the description of a gene of known sequence and a phenotype.
■A percent sign (%) indicates that the entry describes a confirmed mendelian phenotype or phenotype locus for which the underlying molecular basis is not known.
■No symbol indicates a description of a phenotype for which the mendelian basis, although suspected, has not been clearly established or the separateness of the phenotype from that in another entry is unclear.
■A caret (^) indicates that the entry no longer exists because it was removed from the database or moved to another entry.
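The structure of an OMIM number (optional symbol, 6-digit entry number, optional decimal point and 4-digit allelic variant number) lends itself to a simple check. The Python sketch below is an illustration only: the pattern, the function name, and the shortened descriptions are assumptions based on the rules listed above, not an OMIM-provided tool.

```python
import re

# Shortened paraphrases of the symbol meanings listed above.
OMIM_SYMBOLS = {
    "*": "gene",
    "#": "descriptive entry, usually of a phenotype",
    "+": "gene of known sequence and a phenotype",
    "%": "confirmed mendelian phenotype, molecular basis not known",
    "^": "entry removed or moved",
    # Citations often omit the symbol, so an empty symbol is not conclusive.
    "": "no symbol given",
}

# Optional symbol, 6-digit entry number, optional .NNNN variant number.
OMIM_NUMBER = re.compile(r"^([*#+%^]?)(\d{6})(?:\.(\d{4}))?$")


def parse_omim(text):
    """Split an OMIM number into symbol meaning, entry, and variant number."""
    match = OMIM_NUMBER.match(text.strip())
    if match is None:
        raise ValueError(f"not an OMIM number: {text!r}")
    symbol, entry, variant = match.groups()
    return {"symbol": OMIM_SYMBOLS[symbol], "entry": entry, "variant": variant}


print(parse_omim("#188400"))
print(parse_omim("300746.0001"))
```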
Consistent use of the approved gene symbol provides advantages when searching for information in multiple databases.13
■Gene Names: Genes are usually named for the molecular product of the gene, the function of the gene, or the condition associated with the gene, if known. Gene names are not italicized. As shown in Table 14.6-9, the approved gene names, available in the above-mentioned databases, expand Greek letters and do not use subscripts (so that, for instance, in searching for a term with α online, one would type “alpha”). Descriptions based on the approved gene names but styled according to the journal in question (eg, using Greek letters and subscripts) or omitting some terms from the full name are permissible in general medical journals.
approved gene name: the alpha-fetoprotein gene
description: the α-fetoprotein gene
approved gene name: the gene for beta-2-microglobulin
description: the gene for β2-microglobulin
Table 14.6-9. Examples of Style Rules for Gene Symbols
Approved gene name | Approved gene symbol | Rule illustrated
α-fetoprotein | AFP | Greek letter changed to Latin letter (but not moved to end of symbol; exception to recommendation)
α-galactosidase | GLA | Greek letter changed to Latin letter and moved to end of symbol
β1-galactosidase | GLB1 | Greek letter changed to Latin letter and moved with numeral to end of term; no subscripts or punctuation
β2-microglobulin | B2M | Greek letter changed to Latin letter; no subscripts or punctuation
coagulation factor VIII | F8 | roman numeral changed to arabic numeral
heterogeneous nuclear ribonucleoprotein A2/B1 | HNRNPA2B1 | no punctuation marks or spaces
MCF.2 cell line–derived transforming sequence | MCF2 | no punctuation marks
5′-nucleotidase, cytosolic | NT5C | number moved from the start of symbol; no punctuation
5S RNA, cluster 1 | RN5S1@ | first character is always a letter, not a number; @ sign indicates gene cluster in chromosomal region
thromboxane A2 receptor | TBXA2R | no superscripts or subscripts
A number of conventions are followed when gene symbols and names are officially designated. Related genes are often assigned symbols by sequentially numbering a stem, the root symbol for the gene family:
ABC: root symbol
genes: ABCA1, ABCG4, etc
TNF: root symbol
genes: TNF, TNFAIP1, TNFAIP2, TNFAIP3, etc
Other conventions involve stereotypic abbreviations; for example, CR will usually signify a chromosome region. (However, a given letter or letter combination does not always signify conventional usage. For instance, L at or near the end of a symbol often, but not always, indicates “like.”) In Table 14.6-10, the conventions in column 1 reflect HGNC recommendations.5 (Note: DNA sequences are available from GenBank.)
Gene symbols can be used without expansion, with the identifying OMIM (see 14.6.2.1, OMIM) or GenBank (see 14.6.2.2, GenBank) number given parenthetically, as in the following examples:
Most of these trials included patients with metastatic colorectal cancer or assessed only KRAS (OMIM 190070) exon 2 variants.
The HSD3B1 gene (OMIM 109715) encodes the enzyme 3β-hydroxysteroid dehydrogenase-1 (3βHSD1), which catalyzes the conversion of adrenal androgen precursors into dihydrotestosterone (DHT), the most potent androgen.
Sequencing the APTX gene (OMIM 606350) was performed on request for cases of cerebellar ataxia with hypoalbuminemia and/or early-onset cerebellar ataxia combined with peripheral neuropathy and/or cerebellar atrophy using brain magnetic resonance imaging.
Autosomal dominant cerebellar ataxias are most often caused by CAG repeat expansions in ATXN1 (OMIM 601556), ATXN2 (OMIM 601517), ATXN3 (OMIM 607047), CACNA1A (OMIM 601011), ATXN7 (OMIM 607640), TBP (OMIM 600075), or ATN1 (OMIM 607462).
Patients with stage IV melanoma and established BRAF (GenBank NM_004333.5) or NRAS (GenBank NM_002524.4) variants treated with pembrolizumab or nivolumab alone or in combination between July 3, 2014, and May 24, 2016, were included.
14.6.2.2 GenBank.
GenBank14 is the National Institutes of Health genetic sequence database, an annotated collection of all publicly available DNA sequences. It is part of the International Nucleotide Sequence Database Collaboration, which includes 3 organizations: the DNA DataBank of Japan, the European Nucleotide Archive, and GenBank at the National Center for Biotechnology Information. These organizations exchange data daily, and a new release is issued every 2 months.
Table 14.6-10. Examples of Conventions for Gene Names and Gene Symbols
Convention illustrated |
Gene symbol |
Gene description |
@: gene family or cluster; RN, RNA |
RN5S1@ |
RNA, 5S ribosomal 1q42 cluster |
AP: associated protein |
BRAP |
BRCA1-associated protein |
AS: antisense |
IGF2-AS |
IGF2 antisense RNA (no longer used: insulinlike growth factor 2, antisense) |
BP: binding protein |
IL18BP |
interleukin 18 binding protein |
C: catalytic |
G6PC |
glucose 6-phosphatase, catalytic (glycogen storage disease type I, von Gierke disease) |
CASP (stem), sequentially numbered |
CASP1, CASP2, CASP3, etc |
caspase 1, 2, 3, etc, apoptosis-related cysteine protease |
CF (formerly); name modified after discovery of gene product |
CFTR |
cystic fibrosis transmembrane conductance regulator |
CR: chromosome region |
ANCR |
Angelman syndrome chromosome region |
CR: chromosome region |
DCR |
Down syndrome chromosome region |
D: DNA; 19, chromosome 19; S: (unique DNA) segment; E: expressed |
TOMM40 (D19S1177E is an alias; the official term should be preferred) |
translocase of outer mitochondrial membrane 40 homolog (yeast) (no longer used; DNA: segment sequence) |
D: domain-containing |
BRD1 |
bromodomain containing 1 |
F: series letter; X, X chromosome |
F81A (no longer used: DXS522E) |
coagulation factor VIII—associated 1 (no longer used: DNA segment sequence) |
F: series letter, X, X chromosome |
FRAXF |
fragile site, folic acid type, rare, fra(X)(q28) F |
FAM: family with sequence similarity |
ULK4P1 (no longer used: FAM7A1) |
ULK4 pseudogene (no longer used; family with sequence similarity 7, member A1) |
FRA: fragile site; 10, chromosome 10; G: series letter |
FRA10G |
fragile site, aphidicolin type, common, fra(10)(q11.2) (see 14.6.4, Human Chromosomes) |
6GPD: glucose-6-phosphate dehydrogenase (named for gene product) |
6GPD |
glucose-6-phosphate dehydrogenase |
HBA: hemoglobin subunit alpha (named for gene product) |
HBA1 |
hemoglobin subunit alpha 1 |
HCL: hair color (named for characteristic) |
HCL1 |
hair color 1 (brown) |
HLA (punctuation exception for HLA genes) |
HLA-A |
major histocompatibility complex, class 1, A |
HOX: “homeobox” gene family |
HOXA7 |
homeobox A7 |
IL: interleukin |
IL2RA (no longer used: IDDM10) |
interleukin 2 receptor subunit alpha (no longer used: insulin-dependent diabetes mellitus 10) |
INS: insulin (named for gene product) |
INS |
Insulin |
IP: interacting protein |
SCHIP1 |
schwannomin interacting protein 1 |
L: “like” sequence |
G6PDL |
glucose-6-phosphate dehydrogenase—like |
L (in this case, L at the end does not signify “like”); named for condition |
CDL1 |
Cornelia de Lange syndrome 1 |
LG: ligand |
CAMLG |
calcium modulating ligand |
LOH: loss of heterozygosity |
LINC00312 (no longer used: LOH3CR2A) |
long intergenic non—protein coding RNA 312 (no longer used: loss of heterozygosity 3, chromosomal region 2, gene A) |
M: mitochondrial; RP, ribosomal protein |
MRPL57 (previously MRP63) |
mitochondrial ribosomal protein L57 |
MAG: melanoma antigen (named for condition and gene product) |
MAGEA2 |
melanoma antigen, family member A2 |
MT: mitochondrial |
MT7SDNA |
mitochondrially encoded 7S DNA |
MT: mitochondrial, used with hyphen (punctuation exception) |
MT-RNR1 |
mitochondrially encoded 12S RNA |
MY: myosin |
MYH14 (no longer used: DFNA4) |
myosin, heavy chain 14, nonmuscle (no longer used: deafness, autosomal dominant 4) |
N: inhibitor |
CDKN1B |
cyclin-dependent kinase inhibitor 1B |
orf (lowercase exception for open reading frame) |
TMEM258 (no longer used: C11orf10) |
transmembrane protein 258 (no longer used: chromosome 11 open reading frame 10) |
P: “pseudogene” |
HBAP1 |
hemoglobin subunit alpha pseudogene 1 |
P: does not always signify “pseudogene” |
HIVEP2 |
human immunodeficiency virus 1 enhancer binding protein 2 |
PD: programmed cell death (named for function) |
PD-1 |
programmed cell death 1 protein |
PD-L: programmed cell death ligand (named for function) |
PD-L1 |
programmed cell death 1 ligand 1 |
PD-L: programmed cell death ligand (named for function) |
PD-L2 |
programmed cell death 1 ligand 2 |
R: receptor |
INSR |
insulin receptor |
R: receptor; L: like |
INSRL |
insulin receptor–like |
REN: renin (named for gene product) |
REN |
renin |
REN: renin (named for gene product); BP, binding protein |
RENBP |
renin binding protein |
RG: regulator |
TCIRG1 |
T-cell, immune regulator 1, ATPase, H+ transporting, lysosomal V0 subunit A3 |
TTR |
TTR (transthyretin) (no longer used: CTS1) |
transthyretin (no longer used: carpal tunnel syndrome 1) |
TUB: tubulin (named for gene product) |
TUBA3C |
tubulin alpha 3C, α2-tubulin |
ZNF: zinc finger protein |
ZNF160 |
zinc finger protein 160 |
When a gene name or symbol has been changed, both the new and former names (the latter known as the previous name) are available in gene databases.5,6,8 Authors should use the most up-to-date name. The previous symbol may be included parenthetically at first mention:
CYP2A6 (previously CYP2A3)
SOD1 (previously ALS and ALS1)
ERBB2 (previously HER2/neu)
14.6.2.3 Glossary of Genomic Terms.
To help clinicians understand the latest developments in genetics so that they can make the most informed decisions for their patients, in 2017 JAMA began a series entitled Genomics and Precision Health. Associated with this ongoing series is a glossary of genomics terms. This may be accessed at https://sites.jamanetwork.com/genetics/#glossary.15
14.6.2.4 Writing About Genes: Italicizing Gene Symbols.
Observing the rule of italicizing gene symbols makes clear whether the writer is referring to a gene or to another entity that might be confused with a gene.
In any discussion of a gene, it is recommended that the approved gene symbol be mentioned at some point, preferably in the title and abstract if relevant. However, the gene symbol need not be mentioned every time the writer refers to the gene. Authors may refer to genes (or gene loci) by their official gene names or other descriptive expression. Any of these is acceptable, depending on context and syntax. Of names, descriptions, and symbols, only the gene symbol is italicized. Examples are given in Table 14.6-11.
Table 14.6-11. Examples of Expressions of Gene Symbols
Gene symbol |
Gene description |
Acceptable expression |
BRCA1 |
breast cancer 1, early-onset gene |
the breast and ovarian cancer susceptibility gene |
CFTR |
cystic fibrosis transmembrane conductance regulator gene |
the cystic fibrosis locus |
F8 |
coagulation factor VIII, procoagulant component (hemophilia A) gene |
the factor VIII locus |
F8 |
coagulation factor VIII, procoagulant component (hemophilia A) gene |
the hemophilia A locus |
SYN1 |
synapsin I gene |
the gene for synapsin I |
TP53 |
tumor protein p53 (Li-Fraumeni syndrome) gene |
the TP53 gene (p53 is the alias term; the official term should be preferred to the alias) |
In the foregoing examples, the gene names and descriptions are readily distinguishable from the gene symbols. Sometimes, however, the gene symbol may be easily confused with the abbreviation for the product or condition associated with the gene unless the gene symbol is italicized. See, for instance, Table 14.6-12.
Table 14.6-12. Examples of Potentially Confusing Nongene Terms
Gene |
Potentially confusing nongene term |
ABO |
ABO blood group system (see 14.1, Blood Groups, Platelet Antigens, and Granulocyte Antigens) |
APOE |
apoE (apolipoprotein E) |
EPO |
erythropoietin (Epo) |
GRIFIN |
GRIFIN protein (galectin-related interfiber protein) |
HLA-A, HLA-B, etc |
HLA-A, HLA-B, etc (see 14.8.5, HLA/Major Histocompatibility Complex) |
MS |
multiple sclerosis (MS) |
many hormone genes (eg, CRH, GHRH, GNRHR, PTH, TRH) |
hormone name abbreviations (eg, CRH, GHRH, GNRH receptor, PTH, TRH) |
In other expressions, italics distinguish different meanings:
HD |
gene for huntingtin (protein), Huntington disease gene |
HD |
Huntington disease |
Person with HD |
person with Huntington disease |
TH variant |
variant of the TH gene |
TH deficiency |
deficiency of the enzyme TH |
Therefore, it is best to make clear by italicizing gene symbols and through context whether the gene or another entity is being discussed.
A gene symbol should not immediately follow the term in the gene name that it might seem to abbreviate; rather, it should relate to the word gene, usually following it:
the guanylate cyclase 2D gene, GUCY2D (Not: the guanylate cyclase 2D [GUCY2D] gene)
the Huntington disease gene, HD
the tyrosine hydroxylase gene, TH
The cystic fibrosis transmembrane conductance regulator gene, CFTR, is implicated in cystic fibrosis.
In the following examples, both gene aliases and approved symbols are used; however, authors are encouraged to use the approved name (see 13.11, Clinical, Technical, and Other Common Terms):
the retinal guanylate cyclase 2D (GUCY2D) gene, GUCY2D
the retinal guanylate cyclase 2D (RetGC1) gene, GUCY2D (Not: the guanylate cyclase 2D [GUCY2D] gene)
In discussions of variants, the gene symbol remains italicized; specific variants, however, are not italicized (see 14.6.1, Nucleic Acids and Amino Acids):
ADRB2 46G>A
variant of the GUCY2D gene
variant of GUCY2D
GUCY2D variant
The objective of this study was to describe the phenotype in 4 families with dominantly inherited cone-rod dystrophy, 1 with an R838C variant and 1 with an R838H variant in the guanylate cyclase 2D gene (GUCY2D) encoding retinal guanylate cyclase 1.
LRP5v171: valine substitution at codon 171 of the LRP5 gene
In gene mapping, when the order of genes along the chromosome is known, the genes are listed from short-arm end (pter) to the centromere (cen) or long-arm end (qter) (see 14.6.4, Human Chromosomes).
pter-ENO1-PGM1-AMY1-cen
In gene mapping, when the order of genes along the chromosome is not known, the genes are listed alphabetically and parentheses are used:
pter-PGD-AK2-(ACTA,APOA2,REN)-qter
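The two gene-mapping conventions above can be sketched in a few lines of Python. This is only an illustration, not part of any nomenclature standard: the function name `format_gene_map` and its parameters are hypothetical, and italics for the gene symbols are not modeled.

```python
def format_gene_map(segments, start="pter", end="qter"):
    """Join map segments with hyphens, from `start` to `end`.

    Each segment is either a gene symbol (position in the order known) or a
    list of symbols whose relative order is unknown; such a group is sorted
    alphabetically and wrapped in parentheses.
    """
    parts = [start]
    for seg in segments:
        if isinstance(seg, str):
            parts.append(seg)  # order known: keep the symbol as given
        else:
            parts.append("(" + ",".join(sorted(seg)) + ")")  # order unknown
    parts.append(end)
    return "-".join(parts)


known = format_gene_map(["ENO1", "PGM1", "AMY1"], end="cen")
mixed = format_gene_map(["PGD", "AK2", ["REN", "ACTA", "APOA2"]])
print(known)  # pter-ENO1-PGM1-AMY1-cen
print(mixed)  # pter-PGD-AK2-(ACTA,APOA2,REN)-qter
```

The caller remains responsible for supplying the known-order genes in their true chromosomal order; only the unknown-order group is sorted.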
Table 14.6-13 presents some examples of gene names and symbols from fields covered elsewhere in this chapter.
Table 14.6-13. Gene Names and Symbols From Fields Covered Elsewhere in This Chapter
Approved gene symbol |
Gene description |
14.1, Blood Groups, Platelet Antigens, and Granulocyte Antigens |
|
A4GALT |
α-1,4-galactosyltransferase (P blood group) |
ABO |
ABO blood group (transferase A, α-1-3-N-acetylgalactosaminyltransferase; transferase B, α-1-3-galactosyltransferase) |
ACHE |
acetylcholinesterase (Cartwright blood group) |
ACKR1 (was DARC) |
atypical chemokine receptor 1 (Duffy blood group) |
AQP1 (was CO) |
aquaporin 1 (Colton blood group) |
ART4 (was DO) |
ADP-ribosyltransferase 4 (Dombrock blood group) |
BCAM (was LU) |
basal cell adhesion molecule (Lutheran blood group) |
BSG |
basigin (OK blood group) |
C4A |
complement 4A (Rodgers blood group) |
C4B |
complement 4B (Chido blood group) |
CD44 |
CD44 molecule (Indian blood group) |
CD151 (was MER2) |
CD151 molecule (Raph blood group) |
CR1 |
complement C3b/C4b receptor 1 (Knops blood group) |
CD55 (was DAF) |
CD55 molecule (Cromer blood group) |
ERMAP (was SC) |
erythroblast membrane-associated protein (Scianna blood group) |
FUT1 |
fucosyltransferase 1 (H blood group) |
FUT3 |
fucosyltransferase 3 (Lewis blood group) |
GYPA |
glycophorin A (MNS blood group) |
GYPB |
glycophorin B (MNS blood group) |
GYPC |
glycophorin C (Gerbich blood group) |
GYPE |
glycophorin E |
ICAM4 |
intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) |
KEL |
Kell blood group |
P1 |
P blood group (P1 antigen) |
RHCE |
Rh blood group, CcEe antigens |
RHD |
Rh blood group, D antigen |
SLC4A1 |
solute carrier family 4, member 1 (Diego blood group) |
SLC14A1 |
solute carrier family 14, member 1 (Kidd blood group) |
XG |
Xg blood group |
XK |
Kell blood group precursor (McLeod phenotype) |
14.2, Cancer (See 14.6.3, Oncogenes and Tumor Suppressor Genes) |
|
ACTN1 |
α1-actinin, actinin alpha 1 |
ACTN2 |
α2-actinin, actinin alpha 2 |
BCL2 |
B-cell CLL/lymphoma 2 |
BCL7A |
BCL tumor suppressor 7A |
CCND1 (formerly BCL1) |
cyclin D1 |
CDC2 |
cell division cycle 2, G1 to S and G2 to M |
CDK2 |
cyclin-dependent kinase 2 |
CDKN1A |
cyclin-dependent kinase inhibitor 1A |
CTNNB1 |
catenin beta 1 |
MEN1 |
menin 1 |
RB1 |
RB transcriptional corepressor 1 |
RET (formerly MEN2A, MEN2B) |
ret proto-oncogene |
TGFA |
transforming growth factor alpha |
TGFB1 |
transforming growth factor beta 1 |
TNF |
tumor necrosis factor |
TNFRSF1A |
TNF receptor superfamily member 1A |
TP53 |
tumor protein p53 |
14.3, Cardiology |
|
ANK2 (formerly LQT4) |
ankyrin 2 |
APOA1 |
apolipoprotein AI |
APOB |
apolipoprotein B |
APOC2 |
apolipoprotein C2 |
APOD |
apolipoprotein D |
APOE |
apolipoprotein E |
GPR1 |
G protein–coupled receptor 1 |
HDLBP |
high-density lipoprotein-binding protein |
KCNH2 (formerly LQT2) |
potassium voltage-gated channel, subfamily H, member 2 |
KCNQ1 (formerly LQT) |
potassium voltage-gated channel subfamily Q member 1 |
LDLR |
low-density lipoprotein receptor |
LPL |
lipoprotein lipase |
NOS1 |
nitric oxide synthase 1 |
NOS2 |
nitric oxide synthase 2 |
NOS2P2 |
nitric oxide synthase 2 pseudogene 2 |
NOS2P1 |
nitric oxide synthase 2 pseudogene 1 |
NOS3 |
nitric oxide synthase 3 |
PLAT |
plasminogen activator, tissue type |
SCN5A (formerly LQT3) |
sodium voltage-gated channel alpha subunit 5 |
TNNC1 |
troponin C1, slow skeletal and cardiac type |
TNNC2 |
troponin C2, fast skeletal type |
TNNI1 |
troponin I1, slow skeletal type |
TNNI2 |
troponin I2, fast skeletal type |
TNNI3 |
troponin I3, cardiac type |
TNNT1 |
troponin T1, slow skeletal type |
TNNT2 |
troponin T2, cardiac type |
TNNT3 |
troponin T3, fast skeletal type |
VLDLR |
very-low-density lipoprotein receptor |
14.7, Hemostasis |
|
A2M |
α2-macroglobulin |
CALM1 |
calmodulin 1 |
CCL5 |
chemokine (C-C motif), ligand 5 |
CLEC3B (was TNA) |
C-type lectin domain family 3, member B |
F2 |
coagulation factor II (thrombin) |
F2R |
coagulation factor II thrombin receptor |
F2RL1 |
F2R-like trypsin receptor 1 |
F3 |
coagulation factor III, tissue factor |
F5 |
coagulation factor V |
F7 |
coagulation factor VII |
F7R |
coagulation factor VII regulator |
F8 |
coagulation factor VIII |
F8A1 |
coagulation factor VIII associated 1 |
F9 |
coagulation factor IX |
F10 |
coagulation factor X |
F11 |
coagulation factor XI |
F12 |
coagulation factor XII |
F13A1 |
coagulation factor XIII, A chain |
F13A2 |
coagulation factor XIII, A2 polypeptide |
F13B |
coagulation factor XIII, B chain |
FGA |
fibrinogen, α chain |
FGB |
fibrinogen, β chain |
FGG |
fibrinogen, γ chain |
FGL1 |
fibrinogenlike 1 |
FGL2 |
fibrinogenlike 2 |
GP5 |
glycoprotein V (platelet) |
GP6 |
glycoprotein VI (platelet) |
GP9 |
glycoprotein IX (platelet) |
GP1BA |
glycoprotein Ib (platelet), alpha subunit |
ICAM1 |
intercellular adhesion molecule 1 |
ICAM2 |
intercellular adhesion molecule 2 |
ITGA1 |
α1-integrin, integrin subunit alpha 1 |
ITGA2 |
α2-integrin, integrin subunit alpha 2 |
ITGA2B |
integrin subunit alpha 2B |
ITGA3 |
α3-integrin, integrin subunit alpha 3 |
ITGA6 |
α6-integrin, integrin subunit alpha 6 |
ITGAV |
vitronectin, α polypeptide, antigen V |
ITGB1 |
integrin subunit beta 1 |
ITGB3 |
integrin subunit beta 3 |
ITPKA |
inositol-triphosphate 3-kinase A |
KLKB1 |
kallikrein B1 |
KNG1 |
kininogen 1 |
NOS3 |
nitric oxide synthase 3 |
PDGFA |
platelet-derived growth factor subunit A |
PDGFC |
platelet-derived growth factor C |
PDGFRA |
platelet-derived growth factor receptor alpha |
PDGFRL |
platelet-derived growth factor receptor-like |
PECAM1 |
platelet and endothelial cell adhesion molecule 1 |
PLAT |
plasminogen activator, tissue type |
PLAU |
plasminogen activator, urokinase |
PLAUR |
plasminogen activator, urokinase receptor |
PLG |
plasminogen |
PLGLA1 |
plasminogenlike A |
PLGLB1 |
plasminogenlike B1 |
PPBP |
proplatelet basic protein |
PROC |
protein C |
PROS1 |
protein S |
PROSP |
protein S pseudogene |
PROZ |
protein Z, vitamin K–dependent plasma glycoprotein |
PTGDR |
prostaglandin D2 receptor |
PTGDS |
prostaglandin D2 synthase |
PTGFR |
prostaglandin F receptor |
PTGFRN |
prostaglandin F2 receptor inhibitor |
PTGIR |
prostaglandin I2 (prostacyclin) receptor |
PTGIS |
prostaglandin I2 synthase |
PTGS1 |
prostaglandin-endoperoxide synthase 1 |
SELE |
selectin E |
SELP |
selectin P |
SERPINA1 |
serpin family A, member 1 |
SERPINC1 |
serpin family C, member 1 |
SERPINE1 |
serpin family E, member 1 |
SERPINF2 |
serpin family F, member 2 |
TBXA2R |
thromboxane A2 receptor |
TBXAS1 |
thromboxane A synthase 1 |
TFPI |
tissue factor pathway inhibitor |
TFPI2 |
tissue factor pathway inhibitor 2 |
THBD |
thrombomodulin |
VCAM1 |
vascular cell adhesion molecule 1 |
VWF |
von Willebrand factor |
VWFP1 |
von Willebrand factor pseudogene 1 |
14.8, Immunology |
|
14.8.1, Chemokines |
|
CCL1 |
C-C motif chemokine ligand 1 |
CX3CL1 |
C-X3-C motif chemokine ligand 1 |
CXCL1 |
C-X-C motif chemokine ligand 1 |
PF4 |
platelet factor 4 |
XCL1 |
X-C motif chemokine ligand 1 |
14.8.2, CD Cell Markers |
|
CD14 |
CD14 molecule |
CD19 |
CD19 molecule |
CD1A |
CD1a molecule |
CD3D |
CD3D molecule |
CD46 |
CD46 molecule |
CD55 |
CD55 molecule (Cromer blood group) |
CD6 |
CD6 molecule |
CD79A |
CD79A molecule |
CD97 |
CD97 molecule |
CR1 |
complement C3b/C4b receptor type 1 (Knops blood group) |
FCGR3A |
Fc fragment of IgG receptor IIIa |
ICAM3 |
intercellular adhesion molecule 3 |
MME |
membrane metalloendopeptidase |
14.8.3, Complement |
|
C1QA |
complement C1q A chain |
C1QB |
complement C1q B chain |
C1QBP |
complement C1q binding protein |
C1R |
complement C1r |
C1S |
complement C1s |
C2 |
complement C2 |
C3 |
complement C3 |
C4A |
complement C4a (Rodgers blood group) |
C4B |
complement C4b (Chido blood group) |
C4BPA |
complement component 4, binding protein alpha |
C5 |
complement component C5 |
C5AR1 |
complement C5a receptor 1 |
C6 |
complement C6 |
C7 |
complement C7 |
C8A |
complement C8, alpha chain |
C8B |
complement C8, beta chain |
C9 |
complement C9 |
CD55 (was DAF) |
CD55 molecule (Cromer blood group) |
CFH |
complement factor H |
CFP |
complement factor properdin |
14.8.4, Cytokines |
|
CRLF1 |
cytokine receptorlike factor 1 |
CRLF2 |
cytokine receptorlike factor 2 |
CSF1 |
colony-stimulating factor 1 |
CSF2 |
colony-stimulating factor 2 |
CSF3 |
colony-stimulating factor 3 |
CSF3R |
colony-stimulating factor 3 receptor |
EPO |
erythropoietin |
EPOR |
erythropoietin receptor |
GH1 |
growth hormone 1 |
GH2 |
growth hormone 2 |
GHR |
growth hormone receptor |
IFNA1 |
interferon alpha 1 |
IFNA2 |
interferon alpha 2 |
IFNB1 |
interferon beta 1 |
IFNG |
interferon gamma |
IFNW1 |
interferon omega 1 |
IL1A |
interleukin 1 alpha |
IL1B |
interleukin 1 beta |
IL1R1 |
interleukin 1 receptor type 1 |
IL1R2 |
interleukin 1 receptor type 2 |
IL1RAP |
interleukin 1 receptor accessory protein |
IL1RN |
interleukin 1 receptor antagonist |
IL2 |
interleukin 2 |
LEP |
leptin |
LEPR |
leptin receptor |
PRL |
prolactin |
SOCS1 |
suppressor of cytokine signaling 1 |
TGFA |
transforming growth factor alpha |
TGFB1 |
transforming growth factor beta 1 |
THPO |
thrombopoietin |
TNF |
tumor necrosis factor |
14.8.5, HLA/Major Histocompatibility Complex |
|
HLA-A |
HLA-A, major histocompatibility complex, class I, A |
HLA-B |
HLA-B, major histocompatibility complex, class I, B |
HLA-C |
HLA-C, major histocompatibility complex, class I, C |
HLA-DMA |
major histocompatibility complex, class II, DM alpha |
HLA-DMB |
major histocompatibility complex, class II, DM beta |
HLA-DOA |
major histocompatibility complex, class II, DO alpha |
HLA-DOB |
major histocompatibility complex, class II, DO beta |
HLA-DPA1 |
major histocompatibility complex, class II, DP alpha |
HLA-DQA1 |
major histocompatibility complex, class II, DQ alpha |
HLA-DQB1 |
major histocompatibility complex, class II, DQ beta |
HLA-DRA |
major histocompatibility complex, class II, DR alpha |
HLA-DRB1 |
major histocompatibility complex, class II, DR beta 1 |
HLA-E |
major histocompatibility complex, class I, E |
HLA-F |
major histocompatibility complex, class I, F |
HLA-G |
major histocompatibility complex, class I, G |
HLA-H |
major histocompatibility complex, class I, H |
HLA-J |
major histocompatibility complex, class I, J |
14.8.6, Immunoglobulins |
|
IGHA1 |
immunoglobulin heavy constant alpha 1 |
IGHA2 |
immunoglobulin heavy constant alpha 2 |
IGHD |
immunoglobulin heavy constant delta |
IGHD1-1 |
immunoglobulin heavy diversity 1-1 |
IGHE |
immunoglobulin heavy constant epsilon |
IGHG1 |
immunoglobulin heavy constant gamma 1 |
IGHG2 |
immunoglobulin heavy constant gamma 2 |
IGHG3 |
immunoglobulin heavy constant gamma 3 |
IGHG4 |
immunoglobulin heavy constant gamma 4 |
IGHJ1 |
immunoglobulin heavy joining 1 |
IGHM |
immunoglobulin heavy constant mu |
IGHV1-2 |
immunoglobulin heavy variable 1-2 |
IGHV1-18 |
immunoglobulin heavy variable 1-18 |
IGKC |
immunoglobulin kappa constant |
IGKJ2 |
immunoglobulin kappa joining 2 |
IGKV1-5 |
immunoglobulin kappa variable 1-5 |
IGLC1 |
immunoglobulin lambda constant 1 |
IGLJ1 |
immunoglobulin lambda joining 1 |
IGLV10-54 |
immunoglobulin lambda variable 10-54 |
14.8.7, Lymphocytes |
|
TRAC |
T-cell receptor alpha constant |
TRBC1 |
T-cell receptor beta constant 1 |
TRBC2 |
T-cell receptor beta constant 2 |
TRBV10-3 |
T-cell receptor beta variable 10-3 |
TRGC1 |
T-cell receptor gamma constant 1 |
TRGJ1 |
T-cell receptor gamma joining 1 |
TRGJ2 |
T-cell receptor gamma joining 2 |
TRDC |
T-cell receptor delta constant |
14.10, Molecular Medicine |
|
APBA1 |
amyloid-β precursor protein binding family A, member 1 |
ADIPOQ |
adiponectin, C1Q, and collagen domain containing |
ADIPOR1 |
adiponectin receptor 1 |
ADIPOR2 |
adiponectin receptor 2 |
ACSL1 |
acyl-CoA synthetase long-chain family member 1 |
ADAMTS1 |
ADAM metallopeptidase with thrombospondin type 1 motif 1 |
AHCY |
adenosylhomocysteinase |
AMD1 |
adenosylmethionine decarboxylase 1 |
AKT1 |
AKT serine/threonine kinase 1 |
ATP1A1 |
ATPase, Na+/K+ transporting subunit, alpha 1 polypeptide |
BPGM |
bisphosphoglycerate mutase |
CALM1 |
calmodulin 1 |
CCAR1 |
cell division cycle and apoptosis regulator 1 |
CCPG1 |
cell cycle progression 1 |
CDK20 |
cyclin-dependent kinase 20 |
CDC2 |
cyclin-dependent kinase 1 |
CDK2 |
cyclin-dependent kinase 2 |
CDK7 |
cyclin-dependent kinase 7 |
CDKN1A |
cyclin-dependent kinase inhibitor 1A |
CDKN1C |
cyclin-dependent kinase inhibitor 1C |
CDKN2A |
cyclin-dependent kinase inhibitor 2A |
COASY |
coenzyme A (CoA) synthetase |
COX4I1 |
cytochrome c oxidase subunit 4I1 |
COX5B |
cytochrome c oxidase subunit 5b |
CRP |
C-reactive protein |
CYP1A2 |
cytochrome P450 family 1, subfamily A, member 2 |
DHFR |
dihydrofolate reductase |
DKK1 |
dickkopf WNT signaling pathway, inhibitor 1 |
ERBB2 |
erb-b2 receptor tyrosine kinase 2 |
FBP1 |
fructose bisphosphatase 1 |
FDX1 |
ferredoxin 1 |
FDX2 |
ferredoxin 2 |
FHIT |
fragile histidine triad |
GNA12 |
G protein subunit alpha 12 |
GNG2 |
G protein subunit gamma 2 |
GALNT1 |
polypeptide N-acetylgalactosaminyltransferase 1 |
G6PD |
glucose-6-phosphate dehydrogenase |
B3GALT1 |
beta-1,3-galactosyltransferase |
CDKN2A |
cyclin-dependent kinase inhibitor 2A |
GFI1 |
growth factor independent 1 transcriptional repressor |
GRB2 |
growth factor receptor-bound protein 2 |
GRIN1 |
glutamate ionotropic receptor, N-methyl-D-aspartate (NMDA) type, subunit 1 |
HBA1 |
hemoglobin subunit alpha 1 |
HBB |
hemoglobin subunit beta |
HMGCS1 |
3-hydroxy-3-methylglutaryl CoA synthase 1 |
IGF1 |
insulinlike growth factor 1 |
IGF1R |
insulinlike growth factor 1 receptor (IGF-R1) |
IKBKB |
inhibitor of nuclear factor kappa B kinase, subunit beta |
ITPKA |
inositol-triphosphate 3-kinase A |
MNAT1 |
CDK activating kinase assembly factor |
MB |
myoglobin |
MCM2 |
minichromosome maintenance complex, component 2 |
NMNAT1 |
nicotinamide nucleotide adenylyltransferase 1 |
NPY |
neuropeptide Y |
NPPA |
natriuretic peptide A |
OGDH |
oxoglutarate dehydrogenase |
INPP5J |
inositol polyphosphate-5-phosphatase J |
PYY |
peptide YY |
RBBP4 |
RB binding protein 4 |
RNASE1 |
ribonuclease A family member 1 pancreatic |
SFPQ |
splicing factor proline and glutamine rich |
SNCA |
synuclein alpha |
TAF1 |
TATA-box binding protein associated factor 1 |
TBP |
TATA-box binding protein |
THPO |
thrombopoietin |
TNFSF11 |
TNF superfamily member 11 |
TP53 |
tumor protein p53 |
UCP1 |
uncoupling protein 1 |
WNT1 |
Wnt family member 1 |
14.11, Neurology |
|
ASIC2 |
acid sensing ion channel subunit 2 |
ACHE |
acetylcholinesterase (Cartwright blood group) |
ADORA1 |
adenosine A1 receptor |
ADRA1A |
adrenoreceptor alpha 1A |
ADRB1 |
adrenoreceptor beta 1 |
BDNF |
brain-derived neurotrophic factor |
CACNA1A |
calcium voltage-gated channel subunit alpha 1A |
CHRM1 |
cholinergic receptor, muscarinic 1 |
CHRNA1 |
cholinergic receptor, nicotinic, alpha 1 subunit |
CNTF |
ciliary neurotrophic factor |
COMT |
catechol-O-methyltransferase |
DRD1 |
dopamine receptor D1 |
EGF |
epidermal growth factor |
GABBR1 |
gamma-aminobutyric acid type B receptor subunit 1 |
GDNF |
glial cell line–derived neurotrophic factor |
GRIA1 |
glutamate ionotropic receptor AMPA type, subunit 1 |
GRIN1 |
glutamate ionotropic receptor, NMDA type, subunit 1 |
HRH1 |
histamine receptor H1 |
HTR1A |
5-hydroxytryptamine receptor 1A |
ITPKA |
inositol triphosphate 3-kinase A |
KCNJ3 |
potassium voltage-gated channel, subfamily J, member 3 |
MAOA |
monoamine oxidase A |
NGF |
nerve growth factor |
NGFR |
nerve growth factor receptor |
NMB |
neuromedin B |
NOS1 |
nitric oxide synthase 1 |
NPY |
neuropeptide Y |
NPY1R |
neuropeptide Y receptor Y1 |
NRTN |
neurturin |
NTF3 |
neurotrophin 3 |
NTS |
neurotensin |
NTSR1 |
neurotensin receptor 1 |
OPRD1 |
opioid receptor delta 1 |
OPRK1 |
opioid receptor kappa 1 |
OPRM1 |
opioid receptor mu 1 |
SIGMAR1 |
sigma nonopioid intracellular receptor 1 |
PCP2 |
Purkinje cell protein 2 |
SLC1A1 |
solute carrier family 1, member 1 |
SLC18A1 |
solute carrier family 18, member A1 |
SNAP25 |
synaptosomal-associated protein, 25 kDa |
SNCA |
synuclein alpha |
TAC1 |
tachykinin, precursor 1 |
TAC3 |
tachykinin 3 |
TRPA1 |
transient receptor potential cation channel, subfamily A, member 1 |
TSNARE1 |
t-SNARE domain containing 1 (see 14.11, Neurology, for expansion) |
VAMP1 |
vesicle-associated membrane protein 1 |
14.14.3 and 14.14.4, Virus and Prion Nomenclature |
|
AAVS1 |
adeno-associated virus integration site 1 |
BNIP1 |
BCL2 interacting protein 1 |
CR2 |
complement component C3d receptor 2 |
CXADR |
CXADR, Ig-like cell adhesion molecule |
CXB3S |
coxsackie virus B3 sensitivity |
E11S |
ECHO virus (serotypes 4, 6, 11, 19) sensitivity |
GPR183 |
G protein–coupled receptor 183 |
EBVM1 |
Epstein-Barr virus modification site 1 |
EBVS1 |
Epstein-Barr virus integration site 1 |
HAVCR1 |
hepatitis A virus cellular receptor 1 |
RSF1 |
remodeling and spacing factor 1 |
LAMTOR5 |
late endosomal/lysosomal adaptor, MAPK and MTOR activator 5 |
HCVS |
human coronavirus sensitivity |
CCNT1 |
cyclin T1 |
HPV6AI1 |
human papillomavirus (type 6a) integration site 1 |
FOXN2 |
forkhead box N2 |
HV1S |
herpes simplex virus type 1 sensitivity |
ICAM1 |
intercellular adhesion molecule 1 |
MX1 |
MX dynamin-like GTPase 1 |
PVR |
poliovirus receptor |
PRND |
prion-like protein doppel |
PRNP |
prion protein |
PRNPIP |
prion protein interacting protein |
PRNT |
prion locus lncRNA, testis expressed |
14.6.2.5 Alleles.
Alleles denote alternative forms of a gene. Alleles are often characterized by particular variant sequences (mutations). For variant sequence nomenclature see 14.6.1, Sequence Variations, Nucleotides.
Because alleles are alternative forms of a particular gene, they are expressed by means of both the gene name or symbol and an appendage that indicates the specific allele.
Classically, allele symbols consist of the gene symbol plus an asterisk plus the italicized allele designation.7 For example:
HBB*S: S allele of the HBB gene
As with gene terms, Greek letters are changed to Latin letters in allele terms:
APOE*E4: allele producing the ε4 type of apolipoprotein E
See HGNC guidelines for Greek to Latin alphabet conversion.16 If clear in context, the allele symbol may be used in a shorthand form that omits the gene symbol and includes only the asterisk and the allele designation that follows. For example:
*S
*E4
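The structure of a classical allele symbol (gene symbol, asterisk, allele designation) and its shorthand form can be illustrated with a minimal Python sketch. The function names `split_allele_symbol` and `shorthand` are hypothetical, and the sketch deals only with the characters of the symbol, not with italicization.

```python
def split_allele_symbol(symbol):
    """Split a full allele symbol such as 'HBB*S' into (gene, allele)."""
    gene, sep, allele = symbol.partition("*")
    if not sep or not gene or not allele:
        raise ValueError(f"not a full allele symbol: {symbol!r}")
    return gene, allele


def shorthand(symbol):
    """Return the shorthand form: the asterisk plus the allele designation."""
    _, allele = split_allele_symbol(symbol)
    return "*" + allele


print(split_allele_symbol("HBB*S"))  # ('HBB', 'S')
print(shorthand("APOE*E4"))          # *E4
```

The shorthand is appropriate only when the gene is clear in context, so a caller would normally keep the full symbol at first mention.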
In the case of alleles of the major histocompatibility locus, which are not italicized (see 14.8.5, HLA/Major Histocompatibility Complex), each HLA allele name has a unique number, set off from the gene name by an asterisk, corresponding to up to 4 sets of digits separated by colons.17 The digits before the first colon describe the type (this often corresponds to the serologic antigen carried by an allotype). The next set of digits lists the subtype (numbers are assigned in the order in which DNA sequences have been determined). A portion of the gene name is usually included in the shortened form:
Full name: HLA-DRB1*03:01
Shortened form: DRB1*03:01
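As a rough illustration of how an HLA allele name breaks down, the following Python sketch separates the gene from the colon-delimited sets of digits. It assumes that an asterisk separates the gene name from the first set of digits, as in the HLA nomenclature cited above; the function names and the dictionary keys are hypothetical.

```python
def parse_hla_allele(name):
    """Parse a name such as 'HLA-DRB1*03:01' into gene and digit fields."""
    locus, sep, digits = name.partition("*")
    if not sep:
        raise ValueError(f"missing '*' separator: {name!r}")
    fields = digits.split(":")
    if not 1 <= len(fields) <= 4 or not all(fields):
        raise ValueError(f"expected 1 to 4 non-empty sets of digits: {name!r}")
    gene = locus[4:] if locus.startswith("HLA-") else locus
    return {
        "gene": gene,
        "type": fields[0],                                   # before the first colon
        "subtype": fields[1] if len(fields) > 1 else None,   # next set of digits
        "fields": fields,
    }


def shortened_form(name):
    """Drop the 'HLA-' prefix, keeping the portion of the gene name."""
    return name[4:] if name.startswith("HLA-") else name


parsed = parse_hla_allele("HLA-DRB1*03:01")
print(parsed["gene"], parsed["type"], parsed["subtype"])  # DRB1 03 01
print(shortened_form("HLA-DRB1*03:01"))                   # DRB1*03:01
```

The digit sets are kept as strings so that leading zeros, which are part of the name, are preserved.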
In practice, common or trivial names for alleles, which take various forms, are used. The same allele is often expressed in different ways that diverge from the recommended nomenclature. For example:
s: short allele of serotonin transporter gene (SLC6A4)
l: long allele of SLC6A4
As another example of common allele names, the following expressions are all used for APOE*E4; follow author preference:
ε4 allele
epsilon 4 allele
E4 allele
APOE*4
apo e4
APOEE4
14.6.2.5.1 Genotype and Phenotype Terminology.
The genotype comprises the set of alleles in an individual. Because individuals almost always have 2 of each autosome (nonsex chromosome) (see 14.6.4, Human Chromosomes), individuals have 2 alleles (which may be the same alleles or 2 different alleles) for each autosomal gene.
The simplest genotype term for an individual would describe 1 gene and consist of the names of 2 alleles. Larger genotypes would contain 2 or more allele symbol pairs.
As originally formulated in ISGN, allele groupings may be indicated by placement above and below a horizontal line or on the line. As seen in the following examples (from Shows et al2,3), such placement, as well as order, spacing, and punctuation marks (virgules [/], semicolons, spaces, and commas), has specific meanings.
Alleles of the same gene are indicated by placement above and below a horizontal line or with a virgule:
ADA*1
-----
ADA*2
or
ADA*1/ADA*2
In theoretical discussions when a single letter is substituted for the allele symbol, the line or virgule may be dispensed with:
AA
Aa
aa
ss
ll
sl
Semicolons separate pairs of alleles at unlinked loci:
ADA*1   ADH1*1   AMY1*A
----- ; ------ ; ------
ADA*2   ADH1*1   AMY1*B
or
ADA*1/ADA*2; ADH1*1/ADH1*1; AMY1*A/AMY1*B
or
ADA*1/*2; ADH1*1/*1; AMY1*A/*B
A single space separates alleles together on the same chromosome from alleles together on another chromosome (phase [assignment of alleles of genes on the same or different chromosomal copy] known):
AMY1*A PGM1*2
-------------
AMY1*B PGM1*1
or
AMY1*A PGM1*2/AMY1*B PGM1*1
Commas indicate that alleles above and below the line (or on either side of the virgule) are on the same chromosome pair but not on which chromosome of the pair specifically (phase unknown):
PGM1*1   AMY1*A
------ , ------
PGM1*2   AMY1*B
or
PGM1*1/PGM1*2, AMY1*A/AMY1*B
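The punctuation rules for the virgule forms of genotype terms can be summarized in a short Python sketch. It is an illustration under stated assumptions rather than an implementation of ISGN: all function names are hypothetical, only the on-the-line (virgule) forms are produced, and italics are not modeled.

```python
def locus_pair(gene, allele1, allele2, short=False):
    """Two alleles of one gene joined by a virgule, e.g. ADA*1/ADA*2."""
    second = f"*{allele2}" if short else f"{gene}*{allele2}"
    return f"{gene}*{allele1}/{second}"


def unlinked(loci, short=False):
    """Pairs of alleles at unlinked loci: separated by semicolons."""
    return "; ".join(locus_pair(g, a, b, short) for g, a, b in loci)


def phase_unknown(loci):
    """Loci on the same chromosome pair, phase unknown: separated by commas."""
    return ", ".join(locus_pair(g, a, b) for g, a, b in loci)


def phase_known(chromosome1, chromosome2):
    """Phase known: a space joins alleles carried on the same chromosome,
    and a virgule separates the two chromosomes."""
    def side(chromosome):
        return " ".join(f"{gene}*{allele}" for gene, allele in chromosome)
    return f"{side(chromosome1)}/{side(chromosome2)}"


loci = [("ADA", "1", "2"), ("ADH1", "1", "1"), ("AMY1", "A", "B")]
print(unlinked(loci))              # ADA*1/ADA*2; ADH1*1/ADH1*1; AMY1*A/AMY1*B
print(unlinked(loci, short=True))  # ADA*1/*2; ADH1*1/*1; AMY1*A/*B
print(phase_known([("AMY1", "A"), ("PGM1", "2")],
                  [("AMY1", "B"), ("PGM1", "1")]))
# AMY1*A PGM1*2/AMY1*B PGM1*1
print(phase_unknown([("PGM1", "1", "2"), ("AMY1", "A", "B")]))
# PGM1*1/PGM1*2, AMY1*A/AMY1*B
```

Keeping one function per separator makes the meaning of each punctuation mark explicit: the choice of function, not the data, records what is known about linkage and phase.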
A special form for hemizygous males is
G6PD*A/Y
When genotype is being expressed in terms of nucleotides (eg, a polymorphism), italics and other punctuation are not needed (see 14.6.1, Nucleic Acids and Amino Acids):
MTHFR 677 TT genotype
CC genotype
the “long/short” (5HTTLPR) polymorphism in SLC6A4
(LPR: length polymorphism region)
When the subject is being described in terms of the 2 possible amino acids at 1 position in the protein owing to a single-nucleotide variation (formerly single-nucleotide polymorphism) (nonsynonymous mutation), the corresponding amino acids are separated by a virgule (see 14.6.1, Nucleic Acids and Amino Acids):
Val/Val |
(homozygous) |
Met/Val |
(heterozygous) |
Met/Met |
(homozygous) |
Such terms should be explained at first mention with the amino acid terms expanded:
the common methionine/valine (Met/Val) polymorphism at codon 129
The virgule is not needed in expressions such as the following:
α1-antitrypsin MZ heterozygotes
individuals with the ZZ phenotype
The phenotype is the collection of traits in an individual that result from his or her genotype. Genotypes usually contain pairs of symbols, whereas phenotypes contain single symbols. When phenotypes are expressed in terms of the specific alleles, the phenotype term derives from the genotype term, but no italics are used, and, instead of asterisks, spaces are used.18
Genotype: ADA*1/ADA*1
Phenotype: ADA 1
Genotype: ADA*1/ADA*2
Phenotype: ADA 1, 2
Genotype: C2*C/C2*QO
Phenotype: C2 C, QO
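The derivation of a phenotype term from a genotype term, as in the three examples above, can be sketched as follows. The function name `phenotype_term` is hypothetical, and the sketch makes two simplifying assumptions: it accepts only the full virgule form for a single gene, and it lists every distinct allele without modeling dominance or recessiveness.

```python
def phenotype_term(genotype):
    """Derive 'GENE a, b' from 'GENE*a/GENE*b' (one gene, full virgule form)."""
    genes, alleles = [], []
    for part in genotype.split("/"):
        gene, sep, allele = part.partition("*")
        if not sep or not gene or not allele:
            raise ValueError(f"not a full allele symbol: {part!r}")
        genes.append(gene)
        if allele not in alleles:  # a homozygote yields a single symbol
            alleles.append(allele)
    if len(set(genes)) != 1:
        raise ValueError("expected alleles of a single gene")
    # The asterisk is replaced by a space; alleles are separated by commas.
    return f"{genes[0]} {', '.join(alleles)}"


print(phenotype_term("ADA*1/ADA*1"))  # ADA 1
print(phenotype_term("ADA*1/ADA*2"))  # ADA 1, 2
print(phenotype_term("C2*C/C2*QO"))   # C2 C, QO
```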
The normal allele of a gene is identified by adding *N. Adding *D or *R to a gene symbol designates a dominant or recessive allele, respectively.
Genotype: CFTR*N/CFTR*R
Phenotype: CFTR N
Principal Author: Cheryl Iverson, MA
Acknowledgment
Thanks to the following for reviewing and providing comments: W. Gregory Feero, MD, PhD, JAMA, and Maine-Dartmouth Family Medicine Residency, Augusta; Trevor Lane, MA, DPhil, Edanz Group, Fukuoka, Japan; and John J. McFadden, MA, JAMA Network.
References
1.Klinger HP. Progress in nomenclature and symbols for cytogenetics and somatic-cell genetics. Ann Intern Med. 1979;91(3):487-488. doi:10.7326/0003-4819-91-3-487
2.Shows TB, Alper CA, Bootsma D, et al. International system for human gene nomenclature (1979). Cytogenet Cell Genet. 1979;25(1-4):96-116. doi:10.1159/000131404
3.Shows TB, McAlpine PJ, Boucheix C, et al. Guidelines for human gene nomenclature: an international system for human gene nomenclature (ISGN, HGM9). Cytogenet Cell Genet. 1987;46(1-4):11-28. doi:10.1159/000132471
4.Rangel P, Giovannetti J. Genomes and Databases on the Internet: A Practical Guide to Functions and Applications. Horizon Scientific Press; 2002.
5.HUGO Gene Nomenclature Committee. Accessed July 31, 2019. https://www.genenames.org/
6.Gray KA, Yates B, Seal RL, Wright MW, Bruford EA. Genenames.org: the HGNC resources in 2015. Nucleic Acids Res. 2015;43(database issue):D1079-D1085. doi:10.1093/nar/gku1071
7.Wain HM, Bruford EA, Lovering RC, Lush MJ, Wright MW, Povey S. Guidelines for human gene nomenclature (2002). Genomics. 2002;79(4):464-470. doi:10.1006/geno.2002.6748
8.Entrez Gene. Accessed January 9, 2018. https://www.ncbi.nlm.nih.gov/gene
9.Locus Reference Genomic (LRG). Accessed July 23, 2019. https://www.lrg-sequence.org
10.Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledge base of human genes and genetic disorders. Nucleic Acids Res. 2005;33(database issue):D514-D517. doi:10.1093/nar/gki033
11.Online Mendelian Inheritance in Man (OMIM). Updated July 22, 2019. Accessed July 23, 2019. https://omim.org
12.OMIM Frequently Asked Questions. Accessed July 23, 2019. https://omim.org/help/faq
13.HGNC. FAQs about gene nomenclatures. Accessed January 9, 2018. https://www.genenames.org/help/faq/
14.GenBank. Updated November 2017. Accessed July 23, 2019. https://www.ncbi.nlm.nih.gov/genbank/
15.Glossary of genetic terms. Accessed July 31, 2019. https://sites.jamanetwork.com/genetics/#glossary
16.HGNC guidelines. Table 1: Greek to Latin alphabet conversion. Accessed July 23, 2019. https://www.genenames.org/about/guidelines
17.Nomenclature for factors of the HLA system. Updated June 7, 2018. Accessed July 23, 2019. https://www.hla.alleles.org/nomenclature/naming.html
18.Pasternak JJ. An Introduction to Human Molecular Genetics: Mechanisms of Inherited Disease. 2nd ed. Published January 27, 2005. Accessed June 13, 2018. http://www.wiley.com/WileyCDA/WileyTitle/product_Cd0471474266.html
14.6.3 Oncogenes and Tumor Suppressor Genes.
Oncogenes and tumor suppressor genes are 2 of the main types of genes that play a central role in cancer. “An important difference between oncogenes and tumor suppressor genes is that oncogenes result from the activation (turning on) of proto-oncogenes, but tumor suppressor genes cause cancer when they are inactivated (turned off).”1
14.6.3.1 Oncogenes.
An oncogene is a “mutated gene that contributes to the development of a cancer. In their normal, unmutated state, oncogenes are called proto-oncogenes, and they play a role in the regulation of cell division.”2 Oncogenes were discovered and characterized in viruses and animal experimental systems. These genes exist widely outside the systems in which they were discovered, and their normal cellular homologues are important in cell division and differentiation.
Human oncogenes should be expressed according to the style for human gene symbols (see 14.6.2, Human Gene Nomenclature). Mouse oncogenes (and other nonhuman oncogenes) should be expressed according to style for mouse gene symbols (see 14.6.5, Nonhuman Genetic Terms). Retroviral oncogenes are expressed in a style typical of microbial genes (see 14.6.5, Nonhuman Genetic Terms), namely, 3 letters, italicized, lowercase. The protein products of the oncogenes (oncoproteins) typically use the same abbreviation as the oncogene but in roman type. In humans, the protein is all capitals; in mice, the protein has an initial capital. Some examples of human, mouse, and retroviral oncogenes appear in Table 14.6-14.
Table 14.6-14. Human, Mouse, and Retroviral Oncogenes
Retroviral oncogenes |
Human gene homologue(s); mouse gene homologue(s) |
Human protein product(s); mouse protein product(s); retroviral oncoprotein |
Viral origin |
abl |
Human: ABL1, ABL2 Mouse: Abl1, Abl2 |
Human: ABL1, ABL2 Mouse: Abl1, Abl2 Retroviral: abl |
Abelson murine leukemia |
bcl-2 |
Human: BCL2 Mouse: Bcl2 |
Human: BCL2 Mouse: Bcl2 Retroviral: bcl |
B-cell CLL/lymphoma 2 |
erbᵃ |
Human: ERBB2, ERBB3, ERBB4 Mouse: Erbb2, Erbb3, Erbb4 |
Human: ERBB2, ERBB3, ERBB4 Mouse: Erbb2, Erbb3, Erbb4 Retroviral: erb |
avian erythroblastic leukemia |
ets |
Human: ETS1, ETS2 Mouse: Ets1, Ets2 |
Human: ETS1, ETS2 Mouse: Ets1, Ets2 Retroviral: ets |
avian erythroblastosis |
fes |
Human: FES Mouse: Fes |
Human: FES Mouse: Fes Retroviral: fes |
Gardner-Arnstein feline sarcoma |
fms |
Human: CSF1R (formerly FMS) Mouse: Csf1r (formerly Fms) |
colony stimulating factor 1 receptor (CSF1R) |
McDonough feline sarcoma |
fos |
Human: FOS, FOSB Mouse: Fos, Fosb |
Human: FOS, FOSB Mouse: Fos, Fosb Retroviral: fos |
FBJ murine osteogenic sarcoma |
jun |
Human: JUN, JUNB, JUND Mouse: Jun, Junb, Jund |
Human: JUN, JUNB, JUND Mouse: Jun, Junb, Jund Retroviral: jun |
avian sarcoma 17 |
kit |
Human: KIT Mouse: Kit |
Human: KIT Mouse: Kit Retroviral: kit |
Hardy-Zuckerman feline sarcoma |
mos |
Human: MOS Mouse: Mos |
Human: MOS Mouse: Mos Retroviral: mos |
Moloney sarcoma |
myb |
Human: MYB Mouse: Myb |
Human: MYB Mouse: Myb Retroviral: myb |
avian myeloblastosis |
myc |
Human: MYC Mouse: Myc |
Human: MYC Mouse: Myc Retroviral: myc |
avian myelocytomatosis |
raf |
Human: RAF1, ARAF, BRAF Mouse: Raf1, Araf, Braf |
Human: RAF1, ARAF1, BRAF Mouse: Raf1, Araf, Braf Retroviral: raf |
3611 murine leukemia |
ras |
Human: family with many human homologues, eg, HRAS, NRAS, RAB9A, RRAS, RRAS2 Mouse: Hras1, Nras, Rab9, Rras, Rras2 |
Human: HRAS1, NRAS, RAB9A, RRAS, RRAS2 Mouse: Rab9a, Rras, Rras2, Hras, Nras, Rab9 Retroviral: ras |
retrovirus-associated DNA sequence |
sis |
Human: PDGFB Mouse: Pdgfb |
Human: PDGFB (platelet-derived growth factor, B chain) Mouse: Pdgfb Retroviral: sis |
simian sarcoma |
src |
Human: SRC Mouse: Src |
Human: SRC Mouse: Src Retroviral: src |
Rous sarcoma |
a See 14.6.3.1.1, ERBB2 and HER2/neu.
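The capitalization conventions for oncoprotein terms described before Table 14.6-14 can be sketched in a few lines of Python. This is only an illustration: the function name and the source labels ("human", "mouse", "retroviral") are assumptions made for the sketch, and the distinction between italic (gene) and roman (protein) type cannot be represented in a plain string.

```python
def oncoprotein_symbol(abbreviation: str, source: str) -> str:
    """Return the protein-product form of an oncogene abbreviation.

    source is a hypothetical label chosen for this sketch:
    "human" (all capitals), "mouse" (initial capital only), or
    "retroviral" (all lowercase, as in the third column of the table).
    """
    base = abbreviation.strip()
    if source == "human":
        return base.upper()
    if source == "mouse":
        return base[:1].upper() + base[1:].lower()
    if source == "retroviral":
        return base.lower()
    raise ValueError(f"unknown source: {source!r}")
```

For instance, under these assumptions the abbreviation myc would be rendered MYC for the human protein and Myc for the mouse protein.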
Examples of use are as follows:
ras activation and inactivation
protein derived from the ras gene, ras, functions as a signaling molecule
Commonly, the oncogene term contains a prefix that indicates the source or location of the gene: v- for virus or c- for the oncogene’s cellular or chromosomal counterpart. The c- form is also known as a proto-oncogene and in standard gene nomenclature (see 14.6.2, Human Gene Nomenclature) is given in all capitals, as in the Human Gene Homologues column of Table 14.6-14 and the following examples. Note that the v and the c are set roman.
c-abl (ABL1) |
c-mos (MOS) |
v-abl |
v-mos |
The protein product may be similarly prefixed:
c-abl |
c-mos |
v-abl |
v-mos |
Additional prefixes may further identify oncogenes. Note that these prefixes are set roman and are hyphenated. Examples of expansions of some prefixes are given below, but it should not be inferred that the gene in question is associated only with the tumor for which it is named:
B-lym |
B-cell lymphoma |
L-myc |
small cell lung carcinoma |
N-myc |
neuroblastoma |
H-ras |
Harvey rat sarcoma |
K-ras |
Kirsten rat sarcoma |
N-ras |
neuroblastoma |
For example:
The K-ras mutation assay is more sensitive than the conventional histologic diagnosis in detecting minute cancer invasion around the superior mesenteric artery.
Numbers or letters designate genes in a series. For example:
K-ras-2
H-ras-1
erb-b2
14.6.3.1.1 ERBB2 and HER2/neu.
The oncogene known as HER2/neu, which stimulates the growth of breast cancer, is actually ERBB2. HER2 (from human epidermal growth factor receptor 2) and neu are the same as ERBB2 and are current aliases for ERBB2.3 Because the term HER2/neu is widely used and recognized, it may be included in parentheses after the first mention of ERBB2.
ERBB2 (formerly HER2 or HER2/neu)
14.6.3.1.2 Fusion Oncogenes and Oncoproteins.
The result of fusion of an oncogene and another gene is known as a fusion oncogene. The product of a fusion oncogene is a fusion oncoprotein. Terms for fusion oncogenes and their products may use traditional oncogene format or standard human gene format, as in the examples in Table 14.6-15.
Table 14.6-15. Examples of Terms for Fusion Oncogenes and Their Products
Fusion oncogene |
Fusion oncoprotein |
Expansion4 |
bcr-abl |
BCR-ABL |
fusion of the BCR and ABL genes |
c-fos/c-jun |
C-FOS/C-JUN |
protein product of FOS and JUN proto-oncogenes |
gag-onc |
GAG-ONC |
general term for fusion proteins of viral gag (group-specific antigen) gene and oncogene |
gag-jun |
GAG-JUN |
general term for fusion proteins of viral gag (group-specific antigen) gene and oncogene, with JUN representing a specific oncogene |
PML-RARA |
PML-RARα |
promyelocytic leukemia—retinoic acid receptor α |
Example of use in text:
The BCR-ABL fusion oncoprotein is the key driver of pathogenesis in most cases of chronic myelogenous leukemia.
14.6.3.2 Tumor Suppressor Genes.
Tumor suppressor genes are “normal genes that slow down cell division, repair DNA mistakes, or tell cells when to die. . . . When tumor suppressor genes don’t work properly, cells can grow out of control, which can lead to cancer.”1 Examples are given in Table 14.6-16.
Table 14.6-16. Examples of Tumor Suppressor Genes and Their Products
Gene |
Gene product (aliasᵃ) |
Expansion |
CDKN1A |
CDKN1A (p21ᵃ) |
cyclin-dependent kinase (CDK) inhibitor 1A |
CDKN1B |
CDKN1B (p27ᵃ) |
CDK inhibitor 1B |
CDKN1C |
CDKN1C (p57ᵃ) |
CDK inhibitor 1C |
DCC |
DCC, a transmembrane receptor protein |
deleted in colorectal carcinoma |
GLTSCR1 |
glioma tumor suppressor candidate region gene 1 |
|
NF1 |
neurofibromin 1 |
|
RB1 |
Rb protein |
retinoblastoma 1 |
TP53 |
TP53 (p53ᵃ) |
a 53-kd protein |
WT1 |
a zinc finger protein |
Wilms tumor 1 (also called Wilms tumor protein) |
a Although these gene symbol aliases or nicknames may still be used by some, use of the approved gene symbol, not the alias, is strongly preferred. Such use will minimize confusion and make it possible to provide links to genome databases for online versions of the article and to facilitate data retrieval in a number of databases. If an author insists on using an alias, provide the alias parenthetically after the approved gene symbol at first mention in text and abstract. This practice will link the two and provide a learning experience for those not yet familiar with the approved gene symbol.
Principal Author: Cheryl Iverson, MA
Acknowledgment
Thanks to the following for reviewing and providing comments: W. Gregory Feero, MD, PhD, JAMA, and Maine-Dartmouth Family Medicine Residency, Augusta, Maine; Trevor Lane, MA, DPhil, Edanz Group, Fukuoka, Japan; and John J. McFadden, MA, JAMA Network.
References
1.American Cancer Society. Oncogenes and tumor suppressor genes. Last revised June 25, 2014. Accessed July 31, 2019. https://www.cancer.org/cancer/cancercauses/geneticsandcancer/genesandcancer/genes-and-cancer-oncogenes-tumor-suppressor-genes.html
2.National Human Genome Research Institute Talking Glossary of Genetic Terms. Accessed June 6, 2018. https://genome.gov/glossary
3.V-ERB-B2 avian erythroblastic leukemia viral oncogene homolog 2; ERBB2. OMIM. Updated September 27, 2016. Accessed July 31, 2019. https://omim.org/entry/164870
4.NCI Dictionary of Cancer Terms. Accessed June 6, 2018. https://www.cancer.gov/publications/dictionaries/cancer-terms?cdrid=561237
14.6.4 Human Chromosomes.
Chromosomes are structures in the cell nucleus that contain short and long arms, joined at the centromere. They are composed of chromatin (chromatin is made up of DNA, RNA, and proteins) that carries genetic information (definition after Nussbaum et al1 and Turnpenny and Ellard2). Structural variation of chromosomes has traditionally been studied from the perspective of direct visualization of bands, using staining techniques. However, sophisticated fluorescent technologies, such as FISH (fluorescence in situ hybridization),3 are now widely in use to probe for structural variations (eg, deletions, duplications, and large-scale copy number variants, as well as insertions, inversions, and translocations)4 (see 14.6.4.4, In Situ Hybridization), leading to important gains in medical diagnosis and research, as well as gene ordering and mapping. Microarray technologies are increasingly being used to detect microdeletions, inversions, deletions, and so on. Sequencing technologies are making gains as well in being able to detect structural variation. Regardless of the development of these technologies, the essential purpose of cytogenetics remains the same: to study genomic organization and the structure, function, and evolution of chromosomes.
Translocations involve a segment of one chromosome being transferred to a nonhomologous chromosome or to a new site on the same chromosome. They are often associated with negative consequences, such as cancer.5
Structural variation in cancer is different from that seen in germline variation and is clearly related to pathogenesis in some cancers (eg, Philadelphia chromosome; see 14.6.4.5, Marker Chromosomes, Derivative Chromosomes, and the Philadelphia Chromosome).
Formalized standard nomenclature for human chromosomes dates from 1960 and, since 1978, has been known as the International System for Human Cytogenetic Nomenclature (ISCN).
Material in this section is based on recommendations in ISCN 2016.6
Human chromosomes are numbered from largest to smallest from 1 to 22. There are 2 additional chromosomes, X and Y. The numbered chromosomes are known as autosomes, and X and Y as the sex chromosomes. Chromosomes can also be grouped based on similar size and centromere position, as follows6(p8):
Group A |
chromosomes 1-3 |
Group B |
chromosomes 4, 5 |
Group C |
chromosomes 6-12, X |
Group D |
chromosomes 13-15 |
Group E |
chromosomes 16-18 |
Group F |
chromosomes 19, 20 |
Group G |
chromosomes 21, 22, Y |
A chromosome may be referred to by number or by group:
chromosome 14
a group D chromosome
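The grouping listed above lends itself to a simple lookup. The following Python sketch is illustrative only; the names GROUPS and chromosome_group are assumptions introduced here, not part of ISCN.

```python
# Chromosome groups A through G, transcribed from the list above.
GROUPS = {
    "A": ["1", "2", "3"],
    "B": ["4", "5"],
    "C": [str(n) for n in range(6, 13)] + ["X"],
    "D": ["13", "14", "15"],
    "E": ["16", "17", "18"],
    "F": ["19", "20"],
    "G": ["21", "22", "Y"],
}


def chromosome_group(chromosome) -> str:
    """Return the group letter for a chromosome given as 1-22, 'X', or 'Y'."""
    key = str(chromosome).strip().upper()
    for group, members in GROUPS.items():
        if key in members:
            return group
    raise ValueError(f"not a human chromosome: {chromosome!r}")
```

With this table, chromosome 14 resolves to group D, matching the example "a group D chromosome."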
14.6.4.1 Chromosome Bands.
Chromosome bands are elicited by multiple staining methods; a band is “a part of a chromosome clearly distinguishable from adjacent parts by virtue of its lighter or darker staining intensity.”6(p9-10) Banding pattern terms in the left-hand column of the following list need not be expanded. Their technique or purpose is shown to the right of the banding pattern.
Q-banding, Q-bands |
quinacrine |
G-banding, G-bands |
Giemsa |
R-banding, R-bands |
reverse Giemsa |
C-banding, C-bands |
constitutive heterochromatin |
T-banding, T-bands |
telomeric |
NORs |
nucleolus organizing regions |
Banding technique codes of several letters provide more information about the banding method. These abbreviations must be expanded, but the letters in the list above (Q, G, R, C, T, NOR) within those terms need not be expanded:
QF |
Q bands by fluorescence |
QFQ |
Q bands by fluorescence using quinacrine |
CBG |
C bands by barium hydroxide using Giemsa stain |
Ag-NOR |
NOR staining, silver nitrate technique |
Figure 14.6-5 shows a chromosome illustrating bands and subbands at different levels of resolution.
Figure 14.6-5. Frequently Altered Chromosome Territories With Significant Associations to Other Territories in the Discovery Set (37 Associations)ᵃ
a From Bredel et al.7
The short arm is designated by p, for petit, and the long arm by the next letter of the alphabet, q.6(p11) Arm designations follow the chromosome number:
17p |
short arm of chromosome 17 |
3q |
long arm of chromosome 3 |
Xq |
long arm of the X chromosome |
Expressions such as those on the left need not be expanded. It is incorrect to refer to chromosome arms as chromosomes:
Acceptable: |
chromosome arm 17p |
short arm of 17 |
17p |
Not Acceptable: |
chromosome 17p |
Regions are determined by major chromosome band landmarks. Chromosome arms contain 1 to 4 regions, numbered outward from the centromere. The region number follows the p or the q:
4q3 region 3 of long arm of chromosome 4
The regions are divided into bands, also numbered outward from the centromere. Bands have subdivisions or subbands (these are seen only when the chromosomes are extended). The band number follows the region number, and the subband number follows a period after the band number. When a subband is further subdivided, the sub-subband number follows the subband number without a period or other intervening punctuation. A generic formula for the order is chromosome, arm, region, and band, written with no intervening punctuation, followed by a period and then the subband and sub-subband, again with no punctuation between them. Some examples illustrate this:
11q23 |
chromosome 11, long arm, band 23 (region 2, band 3) |
11q23.3 |
band in above subdivided, resulting in subband 23.3 |
20p11.23 |
chromosome 20, short arm, sub-subband 11.23 (region 1, band 1, subband 2, sub-subband 3) |
It is correct usage to refer to the previous expressions as “band 11q23,” “band 11q23.3,” and “band 20p11.23.”
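The order of elements in a band designation can be made concrete with a small parser. The Python sketch below is a minimal illustration under stated assumptions: the names _BAND and parse_band are hypothetical, each of region, band, subband, and sub-subband is treated as a single digit, and the region is limited to 1 through 4 as described above. It is not an ISCN validator.

```python
import re

_BAND = re.compile(
    r"(?P<chromosome>1[0-9]|2[0-2]|[1-9]|X|Y)"     # chromosome number, X, or Y
    r"(?P<arm>[pq])"                                # p = short arm, q = long arm
    r"(?:(?P<region>[1-4])"                         # region, numbered from the centromere
    r"(?:(?P<band>\d)"                              # band within the region
    r"(?:\.(?P<subband>\d)(?P<sub_subband>\d)?)?"   # period, subband, optional sub-subband
    r")?)?"
)


def parse_band(designation: str) -> dict:
    """Split a designation such as '20p11.23' into its components."""
    match = _BAND.fullmatch(designation.strip())
    if match is None:
        raise ValueError(f"unrecognized band designation: {designation!r}")
    result = {}
    for name, value in match.groupdict().items():
        if name in ("chromosome", "arm") or value is None:
            result[name] = value          # keep text fields and missing parts as-is
        else:
            result[name] = int(value)     # numeric parts become integers
    return result
```

Applied to 20p11.23, the sketch yields chromosome 20, arm p, region 1, band 1, subband 2, and sub-subband 3, which agrees with the expansion given in the examples.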
The centromere is designated band 10, as in the following:
p10 |
(portion of centromere facing short arm) |
q10 |
(portion of centromere facing long arm) |
Visualization of genomic information by chromosome region in humans and other organisms is available at the National Center for Biotechnology Information Genome Data Viewer.8
14.6.4.2 Karyotype.
Karyotype is the chromosome complement of an individual, tissue, or cell line. Karyotype is expressed as the number of chromosomes in a cell, including the sex chromosomes, a description of the sex chromosome composition, and, whenever applicable, any chromosome abnormality.
The karyogram and the idiogram are graphic representations of karyotype. The karyogram is “a systemized array of the chromosomes”6(p7) that has been prepared using methods such as photomicrography. An idiogram is a “diagrammatic representation of a karyotype.”6(p7)
In karyotype expressions, the sex chromosomes, which should always be specified, are separated from the chromosome number by a comma, without an intervening space, as in the following examples:
46,XX |
46 chromosomes (2 each of chromosomes 1-22 and 2 X chromosomes in human female karyotype) |
46,XY |
46 chromosomes (2 each of chromosomes 1-22, 1 X and 1 Y in human male karyotype) |
45,X |
45 chromosomes (2 each of chromosomes 1-22 and 1 X chromosome) (Turner syndrome) |
47,XXY |
47 chromosomes (2 each of chromosomes 1-22, 2 X chromosomes, and 1 Y chromosome) (Klinefelter syndrome) |
47,XYY |
47 chromosomes (2 each of chromosomes 1-22, 1 X chromosome, and 2 Y chromosomes) |
69,XXX |
69 chromosomes (3 each of chromosomes 1-22 and 3 X chromosomes) |
A virgule (forward slash) is used to indicate more than 1 karyotype in an individual, tumor, cell line, and so on:
45,X/46,XX
Descriptions of autosomal chromosome abnormalities are presented after the sex chromosomes and listed in numerical order regardless of aberration type, separated from the sex chromosomes by a comma. For instance, the karyotype of a person with trisomy 21 (Down syndrome), that is, with an extra chromosome 21, is specified as follows:
47,XX,+21
or
47,XY,+21
A karyotype description may contain both constitutional and acquired elements. For instance, the karyotype of a tumor cell from a person with trisomy 21 could show both the constitutional anomaly and an acquired neoplastic anomaly (eg, an acquired extra chromosome 8) and would be expressed as follows:
48,XX,+8,+21c
The lowercase c specifies that the trisomy 21 is constitutional, as distinguished from the acquired trisomy 8.
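The comma-separated layout of a simple karyotype (chromosome number, sex chromosomes, then abnormalities) can be illustrated with a short Python sketch. The function and field names are assumptions made for this example, and the sketch handles only single-clone expressions such as those shown above; it does not cover meiotic karyotypes, multiple clones, or the full ISCN grammar.

```python
import re

# Whole-chromosome gain or loss flagged as constitutional, eg "+21c".
# Both the hyphen-minus and the typographic minus sign are accepted.
_CONSTITUTIONAL = re.compile(r"[-+−](?:\d{1,2}|[XY])c")


def parse_karyotype(expression: str) -> dict:
    """Split a simple single-clone karyotype such as '47,XX,+21'."""
    parts = expression.strip().split(",")
    if len(parts) < 2 or not parts[0].isdecimal():
        raise ValueError(f"not a simple karyotype: {expression!r}")
    sex = parts[1]
    if not re.fullmatch(r"[XY]+", sex):
        raise ValueError(f"unexpected sex chromosome field: {sex!r}")
    abnormalities = parts[2:]
    return {
        "count": int(parts[0]),
        "sex_chromosomes": sex,
        "abnormalities": abnormalities,
        "constitutional": [
            term for term in abnormalities if _CONSTITUTIONAL.fullmatch(term)
        ],
    }
```

For 48,XX,+8,+21c the sketch reports a count of 48, sex chromosomes XX, two abnormalities, and +21c as the only constitutional element.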
An individual with more than 1 karyotypic clone may have a mosaic (single-cell origin) karyotype or a chimera (multicell origin) karyotype, which should be specified with a 3-letter abbreviation at first mention of the karyotype. For example:
mos 45,X/46,XY
chi 46,XX/46,XY
Brackets indicate the number of cells observed in a clone:
chi 46,XX[25]/46,XY[10]
A double slash (double virgule), used in chimeras that result from bone marrow transplants, separates recipient and donor cell lines. Recipient karyotype precedes the double slash, donor karyotype follows the double slash, and either or both may be specified. For example:
46,XY[3]//
//46,XX[17]
46,XY[3]//46,XX[17]
Three cells from the male recipient were identified, along with 17 cells from the female donor.
For details on order in such expressions, consult ISCN 2016.6
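The roles of the slash, the double slash, and the bracketed cell counts can be sketched as follows. This Python example is an illustration under stated assumptions: the function names are hypothetical, a single slash is taken to separate clones only (its use between contiguous probes is ignored), and expressions containing a double slash are expected to be split with split_transplant first.

```python
import re

# A clone description optionally followed by a cell count in square brackets.
_CLONE = re.compile(r"(?P<karyotype>.+?)(?:\[(?P<cells>\d+)\])?")


def split_clones(expression: str):
    """Return (prefix, [(karyotype, cells), ...]) for eg 'chi 46,XX[25]/46,XY[10]'."""
    text = expression.strip()
    prefix = None
    for code in ("mos", "chi"):            # mosaic or chimera, set off by a space
        if text.startswith(code + " "):
            prefix, text = code, text[len(code) + 1:]
            break
    clones = []
    for part in text.split("/"):
        match = _CLONE.fullmatch(part)
        if match is None:
            raise ValueError(f"empty clone in {expression!r}")
        cells = match["cells"]
        clones.append((match["karyotype"], int(cells) if cells else None))
    return prefix, clones


def split_transplant(expression: str):
    """Return (recipient, donor) for a double-slash expression; either may be None."""
    recipient, separator, donor = expression.strip().partition("//")
    if not separator:
        raise ValueError(f"no double slash in {expression!r}")
    return recipient or None, donor or None
```

Under these assumptions, 46,XY[3]//46,XX[17] splits into a recipient clone with 3 cells and a donor clone with 17 cells, as in the example above.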
Meiotic karyotypes may begin with a term such as MI and contain a haploid or near-haploid number of chromosomes and may (if the sex chromosomes are associated) or may not (if the sex chromosomes are separate) have a comma between X and Y:
MI,23,XY
MI,24,X,Y
14.6.4.3 Chromosome Rearrangements.
The abbreviations and symbols in Table 14.6-17 are used in descriptions of chromosomes, including chromosome rearrangements. The symbols in the list, drawn from ISCN 2016, are part of an efficient shorthand that describes the exact changes in a karyotype that contains rearranged chromosomes. In publications that range beyond the field of cytogenetics, the symbols should always be defined.
Table 14.6-17. Chromosome Rearrangement Abbreviations and Symbolsᵃ
Abbreviation |
Explanation |
AI |
first meiotic anaphase |
AII |
second meiotic anaphase |
ace |
acentric fragment |
add |
additional material of unknown origin |
arr |
microarray |
b |
break |
c |
constitutional anomaly |
cen |
centromere |
cgh |
comparative genomic hybridization |
chi |
chimera |
chr |
chromosome |
cht |
chromatid |
cp |
composite karyotype |
cx |
complex rearrangements |
del |
deletion |
der |
derivative chromosome |
dia |
diakinesis |
dic |
dicentric |
dim |
diminished |
dip |
diplotene |
dis |
distal |
dit |
dictyotene |
dmin |
double minute |
dn (de novo) |
chromosome abnormality not inherited |
dup |
duplication |
e |
exchange |
end |
endoreduplication |
enh |
enhanced |
fem |
female |
fis |
centric fission |
fra |
fragile site |
g |
gap |
h |
heterochromatin, constitutive |
hsr |
homogeneously staining region |
i |
isochromosome |
idem |
stemline karyotype in a subclone |
ider |
isoderivative chromosome |
idic |
isodicentric chromosome |
inc |
incomplete karyotype |
ins |
insertion |
inv |
inversion or inverted |
ish |
in situ hybridization |
lep |
leptotene |
MI |
first meiotic metaphase |
MII |
second meiotic metaphase |
mal |
male |
mar |
marker chromosome |
mat |
maternal origin |
med |
medial |
min |
minute acentric fragment |
mos |
mosaic |
neo |
neocentromere |
nuc |
nuclear or interphase |
oom |
oogonial metaphase |
or |
alternative interpretation |
p |
short arm of chromosome |
PI |
first meiotic prophase |
pac |
pachytene |
pat |
paternal origin |
pcc |
premature chromosome condensation |
pcd |
premature centromere division |
prx |
proximal |
ps |
satellited short arm of chromosome |
psu |
pseudo- |
pvz |
pulverization |
q |
long arm of chromosome |
qdp |
quadruplication |
qr |
quadriradial |
qs |
satellited long arm of chromosome |
r |
ring chromosome |
rea |
rearrangement |
rec |
recombinant chromosome |
rev |
reverse, including comparative genomic |
rob |
robertsonian translocation |
roman numerals |
|
I |
univalent structure |
II |
bivalent structure |
III |
trivalent structure |
IV |
quadrivalent structure |
s |
satellite |
sce |
sister chromatid exchange |
sdl |
sideline |
sl |
stemline |
spm |
spermatogonial metaphase |
stk |
satellite stalk |
subtel |
subtelomeric region |
t |
translocation |
tas |
telomeric association |
ter |
terminal end of chromosome or telomere |
tr |
triradial |
trc |
tricentric chromosome |
trp |
triplication |
upd |
uniparental disomy |
var |
variant or variable region |
xma |
chiasma(ta) |
zyg |
zygotene |
: |
break, in detailed system |
:: |
break and reunion, in detailed system |
; |
separates altered chromosomes and break points in structural rearrangements involving 2 or more chromosomes; separates probes on different derivative chromosomes |
→ |
from-to, in detailed system |
+ |
additional normal or abnormal chromosomes; increase in length; locus present on a specific chromosome |
− |
loss; decrease in length; locus absent from a specific chromosome |
~ |
intervals and boundaries of a chromosome segment or number of chromosomes, fragments, or markers |
<> |
angle brackets for ploidy |
[] |
square brackets for number of cells or genome build |
= |
number of chiasmata |
× |
multiple copies of rearranged chromosomes |
? |
questionable identification of a chromosome or chromosome structure |
/ |
separates clones or contiguous probes |
// |
separates chimeric clones |
a Adapted from McGowan-Jordan et al,6 with permission of S Karger AG.
Single-letter abbreviations combined with other abbreviations are set closed up:
chte chromatid exchange
Three-letter symbols combined are set with a space:
cht del |
chromatid deletion |
psu dic |
pseudodicentric |
Chromosome rearrangement terms can be written using a short system or short form. Complex abnormalities are designated by the more specific detailed system or long form. The detailed form uses symbols such as arrows to describe individual derivative chromosomes that result from complex rearrangements (even the short system can result in a complex expression). For example:
Short: 46,XY,t(2;5)(q21;q31)
Long: 46,XY,t(2;5)(2pter→2q21::5q31→5qter;5pter→5q31::2q21→2qter)
The complete nomenclature, formulated for consistency in the description of chromosomal rearrangements, is detailed in ISCN 2016.6 The following sections contain terms that illustrate some of the basic principles of the ISCN. Terms such as these may stand alone or may be part of longer expressions such as those previously listed.
14.6.4.3.1 Order.
For aberrations that involve more than 1 chromosome, the sex chromosome appears first, then other chromosomes in numerical order (or, less commonly, in group order if only the group is specified).
t(X;13)(q27;q12) translocation involving bands Xq27 and 13q12
For 2 breaks in the same chromosome, the short arm precedes the long arm, and there is no internal punctuation:
inv(2)(p21q31) inversion in chromosome 2
Exceptions to numerical order convey special conditions; for example, when a piece of one chromosome is inserted into another (3-break rearrangement), the recipient chromosome precedes the donor:
ins(5;2)(p14;q21q31) insertion of portion of long arm of chromosome 2 into short arm of chromosome 5
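The short-system pattern used in these examples (a rearrangement symbol, the chromosomes in one set of parentheses, and the break points in a second set) can be sketched in Python. The names below are assumptions made for illustration; the sketch pairs each chromosome with its break points by position and does not handle the detailed system, "or" alternatives, or compound terms such as der(1)t(1;3).

```python
import re

_TERM = re.compile(
    r"(?P<symbol>[a-z]+)"               # rearrangement symbol, eg t, inv, ins
    r"\((?P<chromosomes>[^()]+)\)"      # chromosomes, separated by semicolons
    r"(?:\((?P<breaks>[^()]+)\))?"      # optional break points, in the same order
)
_BAND_TOKEN = re.compile(r"[pq]\d+(?:\.\d+)?")


def parse_rearrangement(term: str) -> dict:
    """Parse a short-system term such as 't(X;13)(q27;q12)'."""
    match = _TERM.fullmatch(term.strip())
    if match is None:
        raise ValueError(f"unrecognized term: {term!r}")
    chromosomes = match["chromosomes"].split(";")
    bands = []
    if match["breaks"] is not None:
        groups = match["breaks"].split(";")
        if len(groups) != len(chromosomes):
            raise ValueError(f"chromosomes and break points do not pair up: {term!r}")
        for chromosome, group in zip(chromosomes, groups):
            # Prefix each break point with its chromosome, eg 'q27' becomes 'Xq27'.
            bands.append([chromosome + token for token in _BAND_TOKEN.findall(group)])
    return {"symbol": match["symbol"], "chromosomes": chromosomes, "bands": bands}
```

For ins(5;2)(p14;q21q31), the sketch keeps the recipient chromosome 5 ahead of the donor chromosome 2 and reports band 5p14 for the recipient and bands 2q21 and 2q31 for the donor.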
14.6.4.3.2 Plus and Minus Signs.
A plus sign preceding a chromosome indicates addition of the entire chromosome:
+14 entire chromosome 14 gained
A plus sign following p or q and the chromosome number indicates an addition to that chromosome:
14p+ addition to 14p
Such a term is ambiguous; it might refer to one of many possible specific additions to 14p of an individual karyotype, to an unknown addition to 14p, or to additions to 14p in general. A term such as 14p+ may be used after context has been provided. In the case of karyotype descriptions, this means using more specific terms that incorporate symbols, such as add, der, and ins:
Shorter Term: 14p+ |
Karyotype term: add(14)(p13) |
Shorter Term: 14q+ |
Karyotype term: add(14)(q32) |
For example:
The 14q+ cytogenetic abnormality was found to be add(14)(q32).
A minus sign preceding a chromosome signifies loss of the entire chromosome:
−5 all of chromosome 5 missing
A minus sign following a chromosome arm signifies loss from that arm, but this should be reserved for text, whereas more specific notation is used in karyotype descriptions. For example:
Text |
Karyotype |
5q− |
del(5)(q13q31) |
A deletion of the entire long arm of a chromosome should not be expressed in text with a minus sign.
del(5q) (not 5q−)
Use more specific terms in karyotypes.
14.6.4.3.3 Punctuation.
■Parentheses: The number of the affected chromosome follows the rearrangement symbol in parentheses:
inv(2) inversion in chromosome 2
Details of the aberration follow in a second set of parentheses:
inv(2)(p13p24) inversion in chromosome 2 involving bands 13 and 24 of the short arm
■Semicolon: In structural rearrangements that involve 2 or more chromosomes, a semicolon is used:
t(2;5)(q21;q31) translocation involving breaks at 2q21 and 5q31
■Comma: Commas separate the chromosome number, sex chromosomes, and each term describing an abnormality:
46,XX,r(18)(p11q22) female karyotype with ring chromosome 18 with ends joined at bands p11 and q22
14.6.4.3.4 Underlining.
In different clones within the same karyotype, an underline (underscore) distinguishes homologous aberrations of the same chromosome (eg, 2 homologous chromosome 1s):
46,XX,der(1)t(1;3)(p34;q21)/46,XX,der(1)t(1;3)(p34;q21)
In manuscripts, authors should indicate that the underline is intended, so that it will not be set as italics, per typographic convention, in the published version.
14.6.4.3.5 Or.
The word or indicates “alternative interpretations of an aberration”6(p48) or alternative results (for instance, breaks that appear in consecutive bands using different techniques):
add(19)(p13 or q13)
add(10)(q22 or q23)
14.6.4.3.6 Spacing.
As seen in previous examples, there is no spacing between the elements of a karyotype description (except after mos and chi, between 2 or more 3-letter abbreviations [eg, cht del, rev ish enh], and before and after “or”).
14.6.4.3.7 Long Karyotypes.
Multiline karyotypes carry over from 1 line of text to the next with no punctuation other than that of the original expression (eg, no hyphen at the end of the first line), as in the following tumor karyotype:
46,XX,t(8;21)(q22;q22)[12]/45,idem,−X[19]/46,idem,
−X,+8[5]/47,idem,−X,+8,+9[8]
14.6.4.4 In Situ Hybridization.
Style for terms that describe karyotypes identified by means of this technique alone or along with cytogenetic analysis (traditional karyotyping techniques) is similar to that described above (see 14.6.1, Nucleic Acids and Amino Acids). Some symbol meanings may differ. Table 14.6-18 is adapted from ISCN 2016.6
Table 14.6-18. In Situ Hybridization Abbreviations and Symbolsᵃ
Term |
Explanation |
amp |
amplified signal |
arr |
microarray |
cgh |
comparative genomic hybridization |
con |
connected signals |
dim |
diminished |
enh |
enhanced |
fib ish |
extended chromatin/DNA fiber in situ hybridization |
ish |
in situ hybridization |
nuc ish |
nuclear or interphase in situ hybridization |
pcp |
partial chromosome paint |
rev ish |
reverse in situ hybridization |
sep |
separated signals |
subtel |
subtelomeric region |
wcp |
whole chromosome paint |
; |
separates altered chromosomes and break points in structural rearrangements that involve >1 chromosome; separates probes on different derivative chromosomes |
. |
[period] separates various techniques |
+ |
additional normal or abnormal chromosomes; increase in length; locus present on a specific chromosome |
++ |
2 hybridization signals or hybridization regions on a specific chromosome |
− |
loss; decrease in length; locus absent from a specific chromosome |
× |
multiple copies of rearranged chromosomes; aberrant polyploidy clones in neoplasias; precedes number of signals seen; multiple copies of a chromosome or chromosomal region |
a Adapted from McGowan-Jordan et al,6 with permission of S Karger AG.
Examples are as follows:
46,XY.ish del(22)(q11.2q11.2)(D22S75−)
47,XY,+mar.ish der(8)(D8Z1+)
(D22S75 refers to the probe for the DNA segment sequence D22S75; see 14.6.2, Human Gene Nomenclature.)
14.6.4.5 Marker Chromosomes, Derivative Chromosomes, and the Philadelphia Chromosome.
A marker chromosome “is a structurally abnormal chromosome that cannot be unambiguously identified or characterized by conventional banding cytogenetics”6(p70) and might be included in a karyotype as shown below:
47,XX,+mar
A structurally abnormal chromosome in which any part can be recognized is considered a derivative chromosome, defined as “a structurally rearranged chromosome generated either by a rearrangement involving two or more chromosomes or by multiple aberrations within a single chromosome.”6(p60)
A derivative chromosome is specified in parentheses, followed by the aberrations involved in the generation of the derivative chromosome. The aberrations are not separated by a comma. For instance,
der(1)t(1;3)(p32;q21)t(1;11)(q25;q13)
signifies a derivative chromosome 1 generated by 2 translocations, one involving the short arm with a break point in 1p32 and the other involving the long arm with a break point in 1q25.
For example, Philadelphia chromosome is the name given to a particular derivative chromosome found in chronic myelogenous leukemia and some types of acute leukemia. The Philadelphia chromosome can be abbreviated as Ph chromosome or, if clear in context, Ph. Appendages, as in Ph1, Ph¹, Ph₁, or Ph′, are not necessary, and Ph is the preferred form. The Ph chromosome is the derivative chromosome 22 that results from the translocation t(9;22)(q34;q11.2) and may be described as follows:
der(22)t(9;22)(q34;q11.2)
The Ph chromosome is the result of a rearrangement that juxtaposes the oncogene ABL with the breakpoint cluster region gene BCR (see 14.6.2, Human Gene Nomenclature, and 14.6.3, Oncogenes and Tumor Suppressor Genes).
Principal Author: Cheryl Iverson, MA
Acknowledgment
Thanks to the following for reviewing and providing comments: W. Gregory Feero, MD, PhD, JAMA, and Maine-Dartmouth Family Medicine Residency, Augusta, Maine; Trevor Lane, MA, DPhil, Edanz Group, Fukuoka, Japan; and John J. McFadden, MA, JAMA Network. Thanks also to David Song, JAMA Network, for obtaining permissions.
References
1.Nussbaum RL, McInnes RR, Willard HF. Thompson & Thompson Genetics in Medicine. 8th ed. Saunders; 2016.
2.Turnpenny PD, Ellard S. Emery’s Elements of Medical Genetics. 14th ed. Churchill Livingstone; 2012.
3.Riegel M. Human molecular cytogenetics: from cells to nucleotides. Genet Mol Biol. 2014;37(suppl 1):194-209.
4.Feuk L, Carson AR, Scherer SW. Structural variation in the human genome. Nat Rev Genet. 2006;7(2):85-97. doi:10.1038/nrg1767
5.O’Connor C. Human chromosome translocations and cancer. Nature Educ. 2008;1(1):56.
6.McGowan-Jordan J, Simons A, Schmid M, eds. ISCN 2016: An International System for Human Cytogenetic Nomenclature (2016). S Karger AG; 2016.
7.Bredel M, Scholtens DM, Harsh GR, et al. A network model of a cooperative genetic landscape in brain tumors. JAMA. 2009;302(3):261-275. doi:10.1001/jama.2009.99
8.Genome Data Viewer. Accessed July 31, 2019. https://www.ncbi.nlm.nih.gov/genome/gdv/
14.6.5 Nonhuman Genetic Terms.
Comparative genome analysis has shown that eukaryote species share genes to a great extent.1 Therefore, similar or identical names designate the same gene across species whenever possible. Italicization of gene symbols is uniformly observed.
14.6.5.1 Vertebrates.
Animal gene symbols resemble human gene symbols (see 14.6.2, Human Gene Nomenclature).2,3 However, unlike human gene symbols, animal gene symbols typically use or include lowercase letters and punctuation marks.
Gene terms for the laboratory mouse (Mus musculus domesticus) and laboratory rat (Rattus norvegicus), often seen in medical publications because of the common use of those species in investigating diseases that affect humans, are prototypic of such style.
14.6.5.1.1 Mouse and Rat Gene Nomenclature.
Mouse and rat gene nomenclature guidelines were unified in 2003 by the International Committee on Standardized Genetic Nomenclature for Mice and Rat Genome and Nomenclature Committee.4
Mouse and rat gene symbols resemble human symbols in several respects.4,5 They are descriptive, short (typically 3-5 characters), and italicized. Symbols begin with letters, not numbers. They contain roman letters in place of Greek letters and arabic numerals in place of roman numerals.
Mouse and rat gene symbols differ from human symbols in the use of lowercase letters. Symbols usually contain an initial capital. Capital letters within a mouse gene symbol may indicate the laboratory code (see 14.6.5.1.4, Laboratory Codes) or code for another species/vector. A symbol with all lowercase letters (ie, no initial capital) indicates a recessive trait. Mouse and rat gene symbols may contain hyphens and other punctuation.
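The case relationship between many mouse symbols and their human counterparts can be sketched in a few lines of Python. The function names here are hypothetical, and the sketch only mirrors the casing rule described above; it does not produce approved symbols (eg, the mouse counterpart of human G6PD is G6pdx), so any result would still need to be verified in the Mouse Genome Database.

```python
def mouse_style_case(human_symbol: str) -> str:
    """Return the symbol with an initial capital and the rest lowercase.

    Digits and punctuation are unaffected by the case change.
    """
    return human_symbol[:1].upper() + human_symbol[1:].lower()


def has_lowercase_initial(symbol: str) -> bool:
    """True when the first character is a lowercase letter.

    In mouse and rat symbols this often, but not always, marks a gene
    named for a recessive trait (assumption: no prefix such as mt-).
    """
    return symbol[:1].islower()
```

For instance, `mouse_style_case("BRCA1")` gives `Brca1`, which matches the mouse symbol only because that gene happens to follow the simple case rule.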
The central source for mouse gene terms is the Mouse Genome Database,6 and for rats, RATMAP: Rat Genome Database3 (Box 14.6-1). Gene names and symbols may be verified by means of the search features at those sites.
Box 14.6-1. Resources/Websites for Nonhuman Species
Website (reference) | URL | Description
ARKdb2 | Now closed. See Hu et al2 | General genomics and proteomics databases: resources for human, goat, mouse, deer, rat, and horse genomes
RATMAP: Rat Genome Database3 | https://rgd.mcw.edu/ | Genetic, genomic, phenotype, and disease data generated from rat research; also provides access to corresponding human and mouse data for cross-species comparisons
MGI: Mouse Genome Informatics6 | www.informatics.jax.org | Official names for mouse genes, alleles, and strains
FlyBase10 | http://flybase.org | Database of Drosophila genes and genomes
WormBase12 | https://www.wormbase.org/ | Genetics, genomics, and biology of Caenorhabditis elegans and related nematodes
OMIA13 | https://omia.org/ | Catalog/compendium of inherited disorders, other traits, and genes in animal species other than human, mouse, and rat
SGD15 | https://www.yeastgenome.org/nomenclature-conventions | Comprehensive integrated biological information for the budding yeast Saccharomyces cerevisiae
Entrez Genomes18 | https://www.ncbi.nlm.nih.gov/genome | More than 3000 completely sequenced organisms, including Archaea, bacteria, eukaryotes, viruses, viroids, and plasmids
Maize Genetics and Genomics Database21 | https://www.maizegdb.org/ | Federally funded informatics service to researchers focused on the crop plant and model organism Zea mays
Rice Genome Annotation Project22 | rice.plantbiology.msu.edu/ | National Science Foundation-sponsored database that provides sequence and annotations for the rice genome
SoyBase and the Soybean Breeder’s Toolbox23 | https://soybase.org/ | Repository for genetics, genomics, and related data sources for the soybean
Style rules and conventions for mouse and rat gene symbols are given in Tables 14.6-19 through 14.6-21. (Note: The gene descriptions in the tables that follow are based on but not identical to the approved gene names available in the Mouse Genome Informatics database,7 which are more complete and do not use Greek letters and other typographic variants. For instance, in searching for a term with α online, one would type “alpha.”) Note that a given letter or letter combination often but not always signifies conventional usage. For instance, l at or near the end of a symbol often, but not always, indicates “like.” Mammalian Orthology Markers (OrthoMaM),8 a database of orthologous mammalian markers, allows comparative searches of more than 40 vertebrate species. It can be queried to better understand the evolutionary dynamics of genes.
Table 14.6-19. Style Rules for Mouse Gene Symbols and Comparison With Human Gene Symbols (Examples)
Mouse gene symbol | Mouse gene description | Rule illustrated | Human gene symbol (when known)
a | nonagouti | lowercase initial letter because named for mutant recessive trait | ASIP
Afp | α-fetoprotein | initial capital, otherwise lowercase; Greek letter changed to roman | AFP
B2m | β2-microglobulin | no subscript | B2M
Gla | α-galactosidase | Greek letter changed to roman and moved to end of symbol | GLA
Gt(ROSA)26Sor | gene trap, ROSA 26, Philippe Soriano^a | parentheses may be used |
Rn4.5s | 4.5S RNA | period permissible |
Rn5s | 5S RNA | symbol does not begin with number | RN5S1@ (@ signifies gene family; see 14.6.2, Human Gene Nomenclature)
^a The eponymous naming of genes is not uncommon.
Table 14.6-20. Examples of Mouse Gene Symbols Compared With Human Gene Symbols
Mouse gene symbol | Mouse gene description | Convention illustrated | Human gene symbol (when available)
Brca1 | breast cancer 1 | same as human symbol except for case | BRCA1
Cafq1 | caffeine metabolism QTL 1 | q: quantitative trait locus |
C4bp-ps1 | complement component 4 binding protein, pseudogene 1 | -ps: pseudogene | C4BPB
D10Mit1 | DNA segment, Chr 10, Massachusetts Institute of Technology 1 | symbol for DNA segment identified only in the mouse; includes laboratory code (see 14.6.5.1.4, Laboratory Codes) |
D17H21S56 | DNA segment, Chr 17, human D21S56 | H21 indicates DNA segment resides on human chromosome 21 | D21S56
G6pdx | glucose-6-phosphate dehydrogenase X-linked | similar but not identical to human gene symbol | G6PD
Gna-rs1 | guanine nucleotide binding protein, related sequence 1 | -rs: related sequence | GNL1
Gtl10 | gene trap locus 10 | Gt: gene trap |
Gt(ROSA)26Sor | gene trap ROSA 26, Philippe Soriano | vector in parentheses; laboratory code indicated (see 14.6.5.1.4, Laboratory Codes) |
H2-Aa | histocompatibility 2, class II antigen A, α | | HLA-DQA1
Hbb | hemoglobin β-chain complex | same as human symbol except for case | HBB
Hc9 | heterochromatin, Chr 9 | Hc: heterochromatin |
Hras1 | Harvey rat sarcoma virus oncogene 1 | see 14.6.3, Oncogenes and Tumor Suppressor Genes | HRAS
Ighmbp2 (formerly nmd) | immunoglobulin heavy chain μ binding protein 2 (formerly neuromuscular degeneration) | name change with new information about gene | IGHMBP2
l17Wis9 | lethal, Chr 17, University of Wisconsin 9 | initial l: lethal |
Lamb1-1 | β1 laminin, subunit 1 | hyphen separates 2 adjacent numbers | LAMB1
Lzp-s | P lysozyme structural | s: structural |
mt-Rnr1 | 12S RNA, mitochondrial | mt: mitochondrial | MT-RNR1
Mcptl | mast cell protease-like | l: like |
Nidd1, Nidd2, Nidd3, Nidd4 | non-insulin-dependent diabetes mellitus 1, 2, 3, 4 | same stem (root) for gene families |
Nup160 | nucleoporin 160 | name change (formerly Gtl1-13) | NUP160
Rnr13 | rRNA, chromosome 13 cluster | |
Tcrb | T-cell receptor β-chain | | TRB@ (formerly TCRB; @ signifies gene family or cluster; see 14.6.2, Human Gene Nomenclature)
Tel10p | telomeric sequence, Chr 10, centromere end | Tel: telomere; 10: Chr 10; p: short arm |
Tg(APOE)1Vln | transgene insertion 1, Fred Van Leuven | Tg: transgene; parenthetic material: inserted gene, in this case the human gene APOE; Vln: founder or “laboratory of” designation |
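Several of the conventions in Table 14.6-20 are simple prefixes or suffixes, so they can be flagged mechanically. The following Python sketch is illustrative only: the hint table and the function name `symbol_hints` are hypothetical, the patterns paraphrase a handful of rows from the table, and, as the text notes, a letter combination often but not always carries its conventional meaning.

```python
import re

# Hypothetical hint table; each pattern paraphrases one convention from
# Table 14.6-20 and the list is not exhaustive.
HINTS = [
    (re.compile(r"-ps\d*$"), "pseudogene"),
    (re.compile(r"-rs\d*$"), "related sequence"),
    (re.compile(r"^mt-"), "mitochondrial"),
    (re.compile(r"^Gt\("), "gene trap (vector in parentheses)"),
    (re.compile(r"^Tg\("), "transgene (inserted gene in parentheses)"),
]


def symbol_hints(symbol: str) -> list:
    """Return the labels of every convention whose pattern matches."""
    return [label for pattern, label in HINTS if pattern.search(symbol)]
```

An empty list means only that none of these few patterns matched; it says nothing about whether the symbol is valid.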
Table 14.6-21. Conventions for Mouse Gene Symbols Identified in Collaborative Sequencing Efforts (Examples)^a
Mouse gene symbol | Mouse gene description | Convention illustrated | Human gene symbol (when available)
0610005C13Rik | RIKEN cDNA 0610005C13 gene | RIKEN symbol assigned to sequence that does not match known genes in other species; Rik: RIKEN Institute, Japan |
Cdc42ep3 | CDC42 effector protein (rho GTPase binding) 3; formerly 3200001F04Rik | RIKEN symbol changed when gene identified in another organism | CDC42EP3
BC023055 | cDNA sequence BC023055 | BC indicates sequence from Mammalian Gene Collection of the National Institutes of Health | C10orf83
Aldob | aldolase 2, B isoform, formerly BC016435 | Mammalian Gene Collection symbol changed when gene identified in another organism | ALDOB
AF179933 | cDNA sequence AF179933 | GenBank symbol for genes with no other information available in other organisms or sequencing efforts |
Ppt2 | palmitoyl-protein thioesterase 2, formerly AA672937 and 0610007M19Rik | GenBank sequence ID withdrawn when gene identified in other organism | PPT2
^a See Database Identifiers for Genomic Sequences in 14.6.1, Nucleic Acids and Amino Acids.
14.6.5.1.2 Mouse Alleles.
A mouse allele symbol consists of a mouse gene term often, but not always, with a superscript. As with mouse gene terms, mouse allele terms are italicized.
Allele symbols can be verified within the records of a mouse gene:
■Search for the gene symbol at http://www.informatics.jax.org/marker
■Select the link for the gene symbol that has been located
■Under Phenotypes, select Phenotypic Diseases
Conventions and rules for mouse allele symbols are shown in Table 14.6-22.
Table 14.6-22. Rules and Conventions for Mouse Allele Terms (Examples)
Allele symbol (a caret marks the start of a superscript) | Allele name | Convention or rule illustrated
abn | abnormal | recessive trait, thus begins with lowercase; because there is no superscript indicating an allelic term, use context to clarify
Dbf | doublefoot | dominant trait, thus begins with capital; because there is no superscript indicating an allelic term, use context to clarify
Dnahc11^iv | situs inversus viscerum allele of dynein, axonemal, heavy chain 11 gene | allele superscript designation is lowercase (recessive)
Ins2^Akita | Akita allele of insulin 2 gene | allele superscript designation has initial capital (dominant)
Lama2^dy-2J | dystrophia muscularis allele, Jackson 2, of α2-laminin gene (second allele discovered at the Jackson Laboratory) | laboratory code included in superscript (see 14.6.5.1.4, Laboratory Codes); hyphens used
Matp^Uw-dbr | underwhite dominant brown alleles of membrane-associated transporter protein gene | multiple alleles separated by hyphen in superscript
In a phenotype expression, a superscript plus sign indicates wild type, for example,
Nf1^tm1Fcr/Nf1^+
which indicates a phenotype with a mutant neurofibromatosis allele (targeted mutation 1, Frederick Cancer Research and Development Center) and the wild-type neurofibromatosis allele.
14.6.5.1.3 Mouse Chromosomes.
Chromosome nomenclature is similar for mice and humans (see 14.6.4, Human Chromosomes). However, in mice, rearrangement terms are capitalized. The following listing and subsequent examples are from the International Committee on Standardized Genetic Nomenclature for Mice4:
Cen | centromere
Del | deletion
Df | deficiency
Dp | duplication
Hc | pericentric heterochromatin
Hsr | homogeneous staining region
In | inversion
Is | insertion
MatDf | maternal deficiency
MatDi | maternal disomy
MatDp | maternal duplication
Ms | monosomy
Ns | nullisomy
PatDf | paternal deficiency
PatDi | paternal disomy
PatDp | paternal duplication
Rb | robertsonian translocation
T | translocation
Tc | transchromosomal
Tel | telomere
Tet | tetrasomy
Tg | transgenic insertion
Tp | transposition
Ts | trisomy
UpDf | uniparental deficiency
UpDi | uniparental disomy
UpDp | uniparental duplication
As with human chromosomes, lowercase p represents the short arm and lowercase q the long arm. When specific chromosomes are referred to, the word Chromosome is capitalized (and abbreviated Chr after first mention), for example:
Human chromosome 1 shows extensive homology to several mouse chromosomes, especially Chromosome (Chr) 4 and Chr 1.
Chromosome anomaly symbols usually include a unique laboratory code (see 14.6.5.1.4, Laboratory Codes) and a series number, for example:
In5Rk | fifth inversion found by Roderick
T37H | 37th translocation found at Harwell
Chromosome number appears in parentheses:
In(2)5Rk inversion in Chr 2
Semicolons separate numbers of chromosomes involved in translocations:
T(4;X)37H translocation involving Chr 4 and Chr X
Periods indicate the centromere in robertsonian translocations:
Rb(9.19)163H robertsonian translocation that involves Chr 9 and Chr 19
In insertions, the donor chromosome number comes first:
Is(7;1)40H insertion from Chr 7 to Chr 1
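The punctuation rules above (parentheses for chromosome numbers, semicolons in translocations and insertions, a period in robertsonian translocations) lend themselves to a small parser. The Python sketch below is a minimal illustration that covers only the 4 rearrangement types used in the examples; the names `Anomaly` and `parse_anomaly` are hypothetical, and the assumption that a laboratory code is 1 to 5 letters with an initial capital is taken from 14.6.5.1.4, Laboratory Codes.

```python
import re
from typing import NamedTuple, Optional

# Only the rearrangement prefixes that appear in the examples above.
PREFIXES = {
    "In": "inversion",
    "Is": "insertion",
    "Rb": "robertsonian translocation",
    "T": "translocation",
}


class Anomaly(NamedTuple):
    kind: str           # expanded rearrangement term
    chromosomes: tuple  # chromosome labels in the order written
    series: int         # series number
    lab_code: str       # laboratory code


# prefix, optional "(chromosomes)", series number, laboratory code
_PATTERN = re.compile(r"^(In|Is|Rb|T)(?:\(([^)]+)\))?(\d+)([A-Z][A-Za-z]{0,4})$")


def parse_anomaly(symbol: str) -> Optional[Anomaly]:
    """Split a chromosome anomaly symbol into its parts, or return None."""
    match = _PATTERN.match(symbol)
    if match is None:
        return None
    prefix, chrom_text, series, lab_code = match.groups()
    # Semicolons or a period separate the chromosome numbers.
    chromosomes = tuple(re.split(r"[;.]", chrom_text)) if chrom_text else ()
    return Anomaly(PREFIXES[prefix], chromosomes, int(series), lab_code)
```

For an insertion, the first element of `chromosomes` is the donor chromosome, following the order in which the symbol is written.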
For further rules and conventions for chromosomes, see the Chromosome Nomenclature section of the Mouse Genome Informatics website.4
14.6.5.1.4 Laboratory Codes.
Laboratory registration codes appear as 1- to 5-letter symbols in animal genetic terms, including chromosomal, DNA locus, and mouse strain nomenclature (see below). Such codes help identify specific colonies, useful in genetic studies that can extend over many generations. Laboratory codes are registered with the Institute for Laboratory Animal Research at the National Academy of Sciences in Washington, DC.9 These codes uniquely identify an investigator, laboratory, or institution that produces or maintains an animal strain. Laboratory codes have initial capitals and appear without expansion. Examples are as follows:
Arb | Arthritis and Rheumatism Branch, National Institute of Arthritis and Musculoskeletal and Skin Diseases
Ddd | University of Durham, Drug Dependence Group
J | The Jackson Laboratory
Jr | John Rapp
Kyo | Kyoto University
Maar | Silvère van der Maarel, Leiden University Medical Center
McW | Medical College of Wisconsin
N | National Institutes of Health
Ty | Benjamin A. Taylor, The Jackson Laboratory
Wil | Jean Wilson, University of Texas
14.6.5.1.5 Mouse Strains.
Mouse strain names6 are registered at the Mouse Genome Informatics website and are available from the International Committee on Standardized Genetic Nomenclature for Mice.4 (Rat strain names are registered at the Rat Genome Database.3)
Mouse strain names consist of capital letters or combinations of capital letters and numbers:
A
BXH
CBA
C57BL
FVB
HDA32
A few earlier strains have names that are entirely numeric, for example:
129
A substrain is indicated by a term following the strain name after a virgule, usually the laboratory registration code (see 14.6.5.1.4, Laboratory Codes), for example:
129/J
A/J
atherosclerosis in CBA/J mice
FVB/N mice used as controls
A serial number may precede the laboratory code, such as the 10 before the J in this example:
C57BL/610J
(Note: The 6 belongs to the substrain name.)
Exceptions to the initial capital after the virgule exist in the case of 2 well-known strains (not substrains) of mouse:
BALB/c
C57BR/cd
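The virgule rule and its 2 exceptions can be expressed compactly. This Python sketch is an assumption-laden illustration: `split_strain` and `WHOLE_STRAIN_NAMES` are hypothetical names, and the exception set contains only the 2 strains named above.

```python
from typing import Optional, Tuple

# Strain names that contain a virgule but are strains, not substrains
# (only the 2 exceptions named in the text).
WHOLE_STRAIN_NAMES = {"BALB/c", "C57BR/cd"}


def split_strain(name: str) -> Tuple[str, Optional[str]]:
    """Return (strain, substrain term); the second item is None if absent."""
    if name in WHOLE_STRAIN_NAMES or "/" not in name:
        return name, None
    strain, substrain = name.split("/", 1)
    return strain, substrain
```

Treating the exceptions as a lookup table rather than a pattern keeps the rule itself simple and makes the special cases explicit.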
Many standard laboratory mouse strains are derived from crosses dating back to the early 20th century or even older lines, and the names reflect abbreviations for characteristics:
A | albino
BALB | Bagg, albino
DBA | dilute, brown, nonagouti
However, mouse strain names are not expanded.
Strain names may be abbreviated using approved abbreviations, for example:
B | C57BL
C | BALB/c
Note that some abbreviations are the same as some names of different strains (eg, the strain C and the abbreviation C), so context must clarify. Additional abbreviations are available at the International Committee on Standardized Genetic Nomenclature for Mice and Rat Genome and Nomenclature Committee.4
Abbreviations and the letter X are used to indicate recombinant inbred strains (female parental strain first), for example:
CXB BALB/c x C57BL
Capital F followed by a number in parentheses may appear after a strain designation to indicate the number of inbred generations:
F(20) 20 inbred generations
For further guidelines on mouse strain nomenclature, see the Mouse Genome Informatics website.4
14.6.5.2 Invertebrates.
14.6.5.2.1 Drosophila melanogaster.
Gene symbols for the fruit fly Drosophila melanogaster are generally capital and lowercase and, for recessive phenotypes, all lowercase. This convention is also observed for gene names. Gene symbols may include punctuation.10 Nomenclature rules and symbol search are available at FlyBase10 (Box 14.6-1). Examples are as follows:
Ppi | Preproinsulinlike
SerT | Serotonin transporter
su(Hw) | Suppressor of Hairy wing
tRNA:S7:23Ea | Transfer RNA:ser7:23Ea (ser7: seventh isoform of serine; 23E: map position)
As with mouse alleles, Drosophila alleles are indicated with superscripts:
Hn^r, Hn^r2 (Henna gene, eye color-defective alleles)
14.6.5.2.2 Caenorhabditis elegans.
The gene symbols for this nematode (roundworm) (Box 14.6-1) consist of 3 lowercase letters, a hyphen, an arabic numeral (sometimes a decimal), and, sometimes, a roman numeral after a space11,12:
dpy-1
dpy-5 I
let-37 X
sir-2.1
Parentheses indicate mutation in the gene:
let-37(mn138)
Mutation symbols consist of 1- or 2-letter terms plus a number:
mn138
A characteristic of a mutation may be indicated by a 2-letter ending set in roman type:
hc17 ts (ts: temperature sensitive)
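The pattern described above for C elegans gene symbols can be written as a regular expression. The Python sketch below follows the description literally (3 lowercase letters, hyphen, number, optional mutation in parentheses, optional roman numeral after a space); the function name `parse_worm_gene` is hypothetical, and the list of roman numerals (I through V, and X) is an assumption based on the chromosomes of this species rather than something stated in this section.

```python
import re
from typing import Optional

_GENE = re.compile(
    r"^(?P<stem>[a-z]{3})-(?P<number>\d+(?:\.\d+)?)"  # eg, dpy-1, sir-2.1
    r"(?:\((?P<mutation>[a-z]{1,2}\d+)\))?"           # eg, (mn138)
    r"(?: (?P<chromosome>I|II|III|IV|V|X))?$"         # eg, " X"
)


def parse_worm_gene(text: str) -> Optional[dict]:
    """Return the named parts of a gene symbol, or None if it does not fit."""
    match = _GENE.match(text)
    return match.groupdict() if match else None
```

Parts that are absent from the symbol come back as `None`, so `parse_worm_gene("dpy-1")` reports no mutation and no chromosome.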
14.6.5.3 Online Mendelian Inheritance in Animals.
Online Mendelian Inheritance in Animals (OMIA) (Box 14.6-1) is the counterpart to Online Mendelian Inheritance in Man (OMIM; see 14.6.2, Human Gene Nomenclature)13,14 and includes a database of inherited disorders, other traits, and genes in animal species other than humans, mice, and rats.
14.6.5.4 Microorganism Gene Nomenclature.
14.6.5.4.1 Yeasts.
Gene symbols for the fungus Saccharomyces cerevisiae (Box 14.6-1) consist of 3 capital letters plus a number (or, occasionally, a number-letter) ending,15 for example:
ACT1 | actin
CDC25 | adenylate cyclase regulatory protein
COX5A | cytochrome c oxidase chain Va
This represents a change from earlier style in which all-lowercase symbols were used for loci named for recessive mutations and all-capital symbols for loci named for dominant mutations. Allele symbols still follow the case convention (ie, capital for dominant, lowercase for recessive).
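As a rough check of the current locus style (3 capital letters plus a number, occasionally with a trailing letter), a short Python sketch might look like the following. The name `is_standard_locus_symbol` is hypothetical, and the pattern reflects only the description in this section, not the full SGD conventions.

```python
import re

# 3 capital letters, a number, and an optional trailing capital letter.
_LOCUS = re.compile(r"^[A-Z]{3}\d+[A-Z]?$")


def is_standard_locus_symbol(symbol: str) -> bool:
    """True when the symbol fits the pattern described in this section."""
    return _LOCUS.match(symbol) is not None
```

A symbol that fails this check is not necessarily wrong; it may follow the earlier all-lowercase style or be an allele symbol.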
14.6.5.4.2 Bacterial Gene Nomenclature.
Gene terms typically consist of an italicized lowercase 3-letter abbreviation often with an uppercase locus designator. The phenotype or encoded entity (eg, enzyme) is in all roman letters with an initial capital.16,17 See examples below.
araA | AraA (L-arabinose isomerase)
asr | Asr (acid shock protein)
imp (formerly ostA) | OstA (organic solvent intolerance; imp: increased membrane permeability)
katE | KatE (catalase)
sodA | SodA (superoxide dismutase, manganese)
sodB | SodB (superoxide dismutase, iron)
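The relationship between a bacterial gene term and the name of its product is largely a matter of case, which the following Python sketch illustrates. The function name `product_name` is hypothetical, the pattern assumes the typical form described above (3 lowercase letters with an optional uppercase locus designator), and renamed genes such as imp (formerly ostA) will not map to their listed product.

```python
import re

# Typical gene term: 3 lowercase letters, optional uppercase locus designator.
_GENE = re.compile(r"^[a-z]{3}[A-Z]?$")


def product_name(gene: str) -> str:
    """Capitalize the first letter of a gene term to name its product."""
    if not _GENE.match(gene):
        raise ValueError(f"not a typical bacterial gene term: {gene!r}")
    return gene[0].upper() + gene[1:]
```

The conversion covers only the letters; in running text the gene term is italic and the product name is roman.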
The genetic nomenclature for bacteriophages is different from that for bacteria; there may be a separate convention for each phage.17
A number of bacterial genome databases are available on the internet. The National Center for Biotechnology Information sponsors Entrez Genomes18 (select Gene, then search for the gene in question) (Box 14.6-1).
Alleles are designated with a number after the uppercase letter or following a hyphen, when not assigned to a locus. Wild-type alleles are designated with a superscript plus sign, mutant phenotypes with a superscript minus sign:
ara^+
araA1
ara-23
sodA1
14.6.5.4.3 Retroviral Gene Nomenclature.
HIV and other retroviruses contain 3 main structural genes and a number of regulatory genes19 (see 14.6.3, Oncogenes and Tumor Suppressor Genes):
Structural:
env | envelope gene
gag | group-specific core antigen gene
pol | polymerase gene
Regulatory:
nef | negative factor
rev | regulator of viral protein expression
tat | transactivator of viral transcription
vif | viral infectivity
vpr | viral protein R
vpu | viral protein U
vpx | viral protein X
Compare typographic style (Table 14.6-23) of gene names and their products (p stands for protein, gp for glycoprotein).
Table 14.6-23. Some Examples of Typographic Style of Gene Names and Their Products
Gene | Gene product (protein or polypeptide) | Protein products (examples)^a
env | Env | gp41, gp120
gag | Gag | p6, p7, p17, p24
pol | Pol | p12, p32, p66/51
nef | Nef | p27
rev | Rev | p19
tat | Tat | p14
vif | Vif | p24
vpr | Vpr | p15
vpu | Vpu | p16
vpx | Vpx | p14
^a A helpful resource for protein nomenclature is UniProt,20 a central resource for functional information on proteins, including amino acid sequence, protein name or description, taxonomic data, and citation information.
14.6.5.4.4 Plant Genetics.
Plants are extremely important food sources, and genetic alteration of plants is increasingly used to confer disease and pest resistance as well as to enhance the nutritional value of food crops. Such genetically modified organisms in food sources have generated controversy and relate to biomedicine. Included below are a few guidelines for 3 common food sources for which complete genome sequence data are available: corn (maize), rice, and soybeans.
■Corn: The name and symbol of the gene should be lowercase and italic, eg, defective kernel12, dek12. Note: There is no hyphen between the gene name and the numerical suffix.21
■Rice: A transcription unit, equivalent to a gene or locus, uses the naming scheme x.tyyyy, where x refers to the pseudomolecule assembly identifier and yyyy to the distinct identifier of the transcription unit.22
■Soybeans: The full locus identifier can be used as part of each gene name, or a locus name can be provided separately to describe a set of genes, for example:
We studied Glyma.01g123450 in genotype, assembly, and annotation version Glyma.Wm82.a2.v1.
Thereafter, the shorter locus name, Glyma.01g123450, may be used.23
Principal Author: Cheryl Iverson, MA
Acknowledgment
Thanks to the following for reviewing and providing comments: W. Gregory Feero, MD, PhD, JAMA, and Maine-Dartmouth Family Medicine Residency, Augusta; Trevor Lane, MA, DPhil, Edanz Group, Fukuoka, Japan; and Garth D. Ehrlich, PhD, Center for Advanced Microbial Processing, Drexel University College of Medicine, Philadelphia, Pennsylvania.
References
1.Gene Ontology Consortium. Accessed June 7, 2018. http://www.geneontology.org/
2.Hu J, Mungall C, Law A, et al. The ARKdb: genome databases for farmed and other animals. Nucleic Acids Res. 2001;29(1):106-110. doi:10.1093/nar/29.1.106
3.RatMapGroup. RATMAP: Rat Genome Database. Accessed June 7, 2018. https://rgd.mcw.edu/
4.International Committee on Standardized Genetic Nomenclature for Mice and Rat Genome and Nomenclature Committee. Guidelines for nomenclature of mouse and rat strains. Revised January 2016. Accessed July 31, 2019. www.informatics.jax.org/mgihome/nomen/strains.shtml
5.Maltais LJ, Blake JA, Chu T, Lutz CM, Eppig JT, Jackson I. Rules and guidelines for mouse gene, allele, and mutation nomenclature: a condensed version. Genomics. 2002;79(4):471-474. doi:10.1006/geno.2002.6747
6.Jackson Laboratory. MGI: Mouse Genome Informatics. Updated May 29, 2018. Accessed July 31, 2019. www.informatics.jax.org
7.Mouse Genome Informatics. Mammalian Orthology Query Form. Accessed August 5, 2019. http://www.informatics.jax.org
8.The OrthoMaM (Orthologous Mammalian Markers) database. April 2015. Accessed August 5, 2019. http://www.orthomam.univ-montp2.fr/orthomam/html/index.php
9.ILAR: Institute for Laboratory Animal Research. International Laboratory Code Registry. Accessed August 5, 2019. http://dels.nas.edu/global/ilar/Lab-Codes
10.FlyBase: a database of Drosophila genes & genomes. Released May 3, 2018. Accessed July 31, 2019. http://flybase.org
11.C elegans genetic nomenclature basics. Last modified March 5, 2014. Accessed August 5, 2019. http://home.sandiego.edu/~cloerlab/nomenclature.html
12.WormBase. Last edited June 4, 2018. Accessed July 31, 2019. https://www.wormbase.org
13.Nicholas F. Online Mendelian Inheritance in Animals (OMIA). Updated May 31, 2018. Accessed July 31, 2019. http://omia.org/
14.Rangel P, Giovannetti J. Genomes and Databases on the Internet: A Practical Guide to Functions and Applications. Horizon Scientific Press; 2002.
15.SGD gene nomenclature conventions. Accessed July 31, 2019. https://www.yeastgenome.org/nomenclature-conventions
16.Demerec M, Adelberg EA, Clark AJ, Hartman PE. A proposal for a uniform nomenclature in bacterial genetics. Genetics. 1966;54(1):61-76.
17.Journal of Bacteriology instructions to authors. Updated January 2019. Accessed August 5, 2019. https://jb.asm.org/sites/additional-assets/JB-ITA.pdf
18.National Center for Biotechnology Information (NCBI). Entrez Genomes. Accessed July 31, 2019. https://www.ncbi.nlm.nih.gov/genome
19.Collins DR, Collins KL. HIV-1 accessory proteins adapt cellular adaptors to facilitate immune evasion. PLoS Pathogens. Published January 23, 2014. doi:10.1371/journal.ppat.1003851
20.UniProt. Updated 2018. Accessed June 7, 2018. www.uniprot.org
21.Maize Genetics and Genomics Database. Updated May 8, 2018. Accessed July 31, 2019. https://www.maizegdb.org/
22.Rice Genome Annotation Project. Accessed July 31, 2019. rice.plantbiology.msu.edu/
23.SoyBase and the Soybean Breeder’s Toolbox. Accessed July 31, 2019. https://soybase.org/
14.6.6 Pedigrees.
Pedigree format recommendations are established by the Pedigree Standardization Task Force (now called the Pedigree Standardization Work Group) of the National Society of Genetic Counselors1,2 (see 5.8.3, Rights in Published Reports of Genetic Studies). The 2008 update recommends including on the pedigree the reason for referral (eg, abnormal findings on ultrasonography, family history of cancer).
A square represents a male individual; a circle, a female individual; and a diamond, an individual whose sex is not specified, a person with a congenital disorder of sex development, or a person who is transgender (Figure 14.6-6).2
Figure 14.6-6. Shapes Used to Represent an Individual in a Pedigree
Square indicates male; circle, female; and diamond, individual whose sex is not specified, a person with a congenital disorder of sex development, or a person who is transgender.
Shading indicates an affected individual (Figure 14.6-7). Partitions with different shading should be used for individuals with more than one condition. Define all shading in a legend or key.
Figure 14.6-7. Use of Shading in a Pedigree
Multiple individuals are indicated by a number inside the shape (Figure 14.6-8). For unknown number, a roman “n” is preferred to a question mark.
Figure 14.6-8. Indication of Number of Individuals in a Pedigree
A slash mark (Figure 14.6-9) indicates a deceased individual.
Figure 14.6-9. Indication of a Deceased Individual in a Pedigree
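For readers who store pedigree data electronically, the 3 conventions described so far (shape for sex, shading for affected status, slash for deceased) can be modeled with a small record. The Python sketch below is purely illustrative: the class `Individual`, the mapping `SHAPES`, and the function `describe_symbol` are hypothetical names, and the text output stands in for the drawn symbol.

```python
from dataclasses import dataclass

# Shape for each sex category; "unspecified" stands for every case
# in which the text calls for a diamond.
SHAPES = {"male": "square", "female": "circle", "unspecified": "diamond"}


@dataclass
class Individual:
    sex: str = "unspecified"
    affected: bool = False
    deceased: bool = False


def describe_symbol(person: Individual) -> str:
    """Describe in words the symbol that would be drawn for this person."""
    shading = "shaded" if person.affected else "unshaded"
    text = f"{shading} {SHAPES[person.sex]}"
    if person.deceased:
        text += " with slash"
    return text
```

A fuller model would also need partitions for multiple conditions and the legend that defines each shading pattern.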
A pregnancy is indicated with a capital “P” inside the shape (Figure 14.6-10). Symbols would not be shaded unless the pregnancy was affected.
Figure 14.6-10. Indication of a Pregnancy in a Pedigree
The proband (the first affected family member who seeks medical attention) is indicated by a capital “P” with an arrow outside the shape (Figure 14.6-11).
Figure 14.6-11. Indication of the Proband in a Pedigree
The consultand (person seeking medical attention) is indicated with an arrow (Figure 14.6-12).
Figure 14.6-12. Indication of the Consultand (Person Seeking Medical Attention) in a Pedigree
Textual information appears below the individual symbol (Figure 14.6-13). Preferred order is age information, evaluation, and pedigree number.
Figure 14.6-13. Indication of Textual Information About an Individual in a Pedigree
An obligate carrier (ie, unaffected individual inferred by pedigree analysis to carry a trait) is indicated with a central dot (Figure 14.6-14).
Figure 14.6-14. Indication of an Obligate Carrier (Unaffected Individual Inferred by Pedigree Analysis to Carry a Trait) in a Pedigree
A small triangle indicates a pregnancy not carried to term (Figure 14.6-15). Sex, if known, is indicated with text. (Sex is often unknown, especially with miscarriages.) Shading is used as described above for affected individuals. The symbol should be shaded only if the cause of the abnormality is known, and the abnormality should be defined in the key or under the symbol.
Figure 14.6-15. Indication of a Pregnancy Not Carried to Term in a Pedigree
ECT indicates ectopic pregnancy. A slash indicates termination of pregnancy.
Stillborn individuals use full-sized shapes with SB in the caption (Figure 14.6-16).
Figure 14.6-16. Indication of Stillborn Individuals in a Pedigree
Partner relationships are indicated by a straight, horizontal line (Figure 14.6-17). It is preferred that the male partner be shown on the left.
Figure 14.6-17. Indication of Partner Relationships in a Pedigree
A vertical line (the line of descent) indicates the offspring (Figure 14.6-18).
Figure 14.6-18. Indication of the Line of Descent in a Pedigree
Siblings should appear in order of birth (oldest to the left), connected by lines as shown in Figure 14.6-19.
Figure 14.6-19. Indication of Siblings in a Pedigree
Offspring are indicated by vertical lines (Figure 14.6-20). Use of a shorter line to indicate a pregnancy not carried to term is no longer recommended because it is made redundant graphically by the use of a triangle for pregnancies not carried to term.
Figure 14.6-20. Indication of Offspring in a Pedigree
An ended relationship is indicated by a double slash (Figure 14.6-21).
Figure 14.6-21. Indication of an Ended Relationship in a Pedigree
Consanguinity (kinship because of common ancestry) is indicated by a double line (Figure 14.6-22), and the relationship should be noted (eg, first cousins, second cousins).
Figure 14.6-22. Indication of Consanguinity in a Pedigree
Two diagonal lines indicate twins; 3, triplets (Figure 14.6-23). A horizontal bar specifies monozygotic; no horizontal bar, dizygotic; and a question mark, unknown.
Figure 14.6-23. Indication of Twins or Triplets in a Pedigree
A horizontal bar specifies monozygotic; no horizontal bar, dizygotic; and a question mark, unknown.
No offspring is indicated by perpendicular lines; infertility, by perpendicular lines with a double horizontal line (Figure 14.6-24).
Figure 14.6-24. Indication of No Offspring or of Infertility in a Pedigree
Brackets indicate an adopted individual and dashed lines legal parentage, for example, adoptive parent (Figure 14.6-25).
Figure 14.6-25. Indication of an Adopted Individual and of Legal Parentage in a Pedigree
In pedigrees that show relationships defined by assisted reproductive technologies (Figure 14.6-26), D indicates donor (sperm or ovum) and S, surrogate carrier of the pregnancy.
Figure 14.6-26. Indication of Relationships That Are Defined by Assisted Reproductive Techniques in a Pedigree
Diagonal lines indicate other parental relationships (Figure 14.6-27).
Figure 14.6-27. Indication of Other Parental Relationships in a Pedigree
Haplotypes may be indicated with shaded rectangles below the individual (Figure 14.6-28). Meaning should be clarified by means of a key.
Figure 14.6-28. Indication of Haplotypes in a Pedigree
In a complete pedigree (Figure 14.6-29), generations are indicated on the left by roman numerals. See Bennett3 for more examples of complete pedigrees.
Figure 14.6-29. Example of a Complete Pedigree, With Generations Indicated on the Left by Roman Numerals
14.6.6.1 Deidentification of Pedigrees.
As noted in 5.8.3, Rights in Published Reports of Genetic Studies, the rules for ethical approval of studies, obtaining informed consent, and protecting patients’ rights to privacy in scientific publication also apply to genetic studies of family pedigrees. If appropriate consent cannot be obtained from those identified in pedigree charts, nonessential identifying information can be removed or not presented. However, data should not be altered or “scrambled” in an attempt to protect the identities of individuals or family members, although relevant information may be masked. As noted in 5.8.3, Rights in Published Reports of Genetic Studies, for example, in pedigree charts, diamonds or another sex-neutral symbol can be used instead of squares or circles if the sex of family members is not essential to the report (eg, if the disease or condition is known not to be sex linked), or sections of pedigrees may be excluded from pedigree charts or not described in detail if appropriate consent could not be obtained, as long as such omissions are noted.
Principal Author: Cheryl Iverson, MA
Acknowledgment
Thanks to the following for reviewing and providing comments: Robin L. Bennett, MS, CGC, Division of Medical Genetics, University of Washington, Seattle; Trevor Lane, MA, DPhil, Edanz Group, Fukuoka, Japan; and John J. McFadden, MA, JAMA Network. Thanks also to Karen Bucher, JAMA, for preparing the illustrations.
References
1.Bennett RL, Steinhaus KA, Uhrich SB, et al. Recommendations for standardized human pedigree nomenclature. Am J Hum Genet. 1995;56(3):745-752. Also published in J Genet Counseling. 1995;4(4):267-279.
2.Bennett RL, French KS, Resta RG, Doyle DL. Standardized human pedigree nomenclature: update and assessment of the recommendations of the National Society of Genetic Counselors. J Genet Counseling. 2008;17:424-433. doi:10.1007/s10897-008-9169-9
3.Bennett RL. Handy reference tables of pedigree nomenclature. In: The Practical Guide to the Genetic Family History. Wiley-Liss Inc; 1999.