Genetics - Nomenclature

AMA Manual of Style - Stacy L. Christiansen, Cheryl Iverson 2020

14.6.1 Nucleic Acids and Amino Acids.

Standards for molecular nomenclature are set jointly by the International Union of Biochemistry and Molecular Biology (IUBMB) and the International Union of Pure and Applied Chemistry (IUPAC).1 The recommendations in this section are based on conventions put forth by the IUBMB-IUPAC Joint Commission on Biochemical Nomenclature and the Nomenclature Committee of the IUBMB.2

14.6.1.1 DNA.

The nucleic acids DNA and RNA are nucleotide polymers. DNA is the molecule forming the substrate for the genetic code and is contained in the chromosomes of higher organisms. It is made up of (1) molecules called bases, (2) the sugar 2-deoxyribose, and (3) phosphate groups. The DNA bases fall into 2 classes: pyrimidines (including cytosine and thymine) and purines (including adenine and guanine).

Structurally, DNA in the nucleus of a living cell is a double-stranded, antiparallel helical polymer of deoxyribose linked by phosphate groups; 1 of 4 bases projects from each sugar molecule of the sugar-phosphate chain. A base-sugar unit is a nucleoside. A base-sugar-phosphate unit is a nucleotide (Figure 14.6-1).

Figure 14.6-1. Nucleosides and Nucleotides: General Structure

Image

The carbons in the sugar moiety are numbered with prime symbols, not apostrophes (eg, 3′ carbon, 5′ carbon). Sometimes chemical moieties are specified in connection with the 3′ and 5′ ends:

3′ hydroxyl end (3′ OH end)

5′ phosphate (5′ P) end

5′ OH end

(See 13.13, Elements and Chemicals.)

The phosphates that join the DNA nucleotides link the 3′ carbon of one deoxyribose to the 5′ carbon of the next deoxyribose. The end of the DNA strand with an unattached 5′ carbon is known as the 5′ end (or terminus) and the end with an unattached 3′ carbon as the 3′ end (or terminus) (Figure 14.6-2) (see 13.13, Elements and Chemicals).

Figure 14.6-2. DNA Double Helix

Image

The carbons and nitrogens of the bases are numbered 1 through 6 (pyrimidines) or 1 through 9 (purines), and the carbons of deoxyribose are designated by numbers with prime symbols 1′ through 5′ (Figure 14.6-3).

This section presents nomenclature for nucleotides of DNA, especially nomenclature used for DNA sequences (ie, nucleotide polymers). For nomenclature of nucleotides as DNA precursors and energy molecules, see 14.6.1.3, Nucleotides as Precursors and Energy Molecules.

A 1-letter designation represents each base, nucleoside, or nucleotide (Table 14.6-1). The letters are commonly used without expansion.

Table 14.6-1. Abbreviations for DNA Nucleotides

Abbreviation | Base | Nucleoside; nucleotideᵃ residue in DNA | Molecular class
A | adenine | deoxyadenosine | purine
C | cytosine | deoxycytidine | pyrimidine
G | guanine | deoxyguanosine | purine
T | thymine | deoxythymidine | pyrimidine

ᵃ The technical name for nucleotides is nucleoside phosphates.

The chemical structure of bases is illustrated in Figure 14.6-3.

Figure 14.6-3. DNA Bases: Chemical Structure

Image

When a base (or nucleoside or nucleotide) is described that cannot be firmly identified as A, C, G, or T, it is most commonly reported as N (uncertain). Other single-letter designators that reflect biochemical properties may be used, but because these designations are not as well known as A, C, G, T, and N, it is best to define them (Table 14.6-2).

Table 14.6-2. Examples of Other Single-Letter Designators for Basesᵃ

Symbol | Stands for | Derivation
R | G or A | purine
Y | T or C | pyrimidine
M | A or C | amino
K | G or T | keto
S | G or C | strong interaction (3 hydrogen bonds)
W | A or T | weak interaction (2 hydrogen bonds)
H | A or C or T | not G (H follows G in the alphabet)
B | G or T or C | not A (B follows A in the alphabet)
V | G or C or A | not T (V follows T in the alphabet; U is not used because it stands for uracil in RNA [see 14.6.1.2, RNA])
D | G or A or T | not C (D follows C in the alphabet)
N | G or A or T or C | any

ᵃ Adapted with permission from Moss.2 Copyright IUBMB.
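As an illustration only, the designators in Table 14.6-2 can be treated as a lookup from each symbol to the set of bases it stands for. The sketch below assumes Python; the names `IUPAC_CODES` and `matches` are hypothetical and are not part of any nomenclature recommendation.

```python
# Single-letter designators from Table 14.6-2, mapped to the bases each stands for.
# The names IUPAC_CODES and matches are illustrative, not a standard API.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT",
    "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT",
    "N": "GATC",
}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if each base in sequence is allowed by the designator at that position."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC_CODES[code]
        for code, base in zip(pattern.upper(), sequence.upper())
    )
```

For example, `matches("GNCGANNG", "GACGATTG")` is true because N stands for any base, whereas `matches("R", "C")` is false because R stands only for the purines G and A.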

Various forms of DNA are commonly abbreviated as follows; expand at first use:

bDNA: branched DNA
cDNA: complementary DNA, coding DNA
dsDNA: double-stranded DNA
gDNA: genomic DNA
hn-cDNA: heteronuclear cDNA (heterogeneous nuclear cDNA)
mtDNA: mitochondrial DNA
nDNA: nuclear DNA
rDNA: ribosomal DNA
scDNA: single-copy DNA
ssDNA: single-stranded DNA

There are several classes of DNA helixes, which differ in the direction of rotation and the tightness of the spiral (number of base pairs per turn):

A-DNA (alternate DNA)

B-DNA (balanced DNA)

C-DNA (9 base pairs [bp] per turn of spiral)

D-DNA (8 base pairs [bp] per turn of spiral)

Z-DNA (zigzag)

In eukaryotic cells, DNA is bound with special proteins associated with chromosomes (see 14.6.4, Human Chromosomes). This DNA-protein complex is known as chromatin. DNA in chromatin is organized into structures called nucleosomes by proteins known as histones. The 5 classes of histones are as follows:

H1 (linker histone)

H2A (core histone)

H2B (core histone)

H3 (core histone)

H4 (core histone)

Almost all native DNA exists in the form of a double helix, in which 2 DNA polymers are paired, linked by hydrogen bonds between individual bases on each chain. Because of the biochemical structure of the nucleotides, A always pairs with T and C with G (Figure 14.6-2). Such pairs may be indicated as follows:

A • T, A = T

C • G, C ≡ G

Mispairings (which may occur as a consequence of a variant or sequence variation) may be shown in the same way:

C • T

Unpaired DNA sequences are quantified by means of the terms base (a single base), kilobase (kb, a thousand bases), and megabase (Mb, a million bases) (see 13.12, Units of Measure). Paired DNA sequences use the terms base pairs (bp), kilobase pairs (kb), megabase pairs (Mb), and gigabase pairs (Gb). (Do not use “kbp” or “Mbp.”) For example:

a 20-base fragment

a 235-bp repeat sequence

a 27-bp region

a 47-kb vector genome

1 Mb of DNA

The size of the human haploid genome is approximately 3 × 10⁹ bp.
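The unit conventions above (bp, kb, Mb, and Gb for paired sequences; base, kb, and Mb for unpaired sequences; never "kbp" or "Mbp") might be applied programmatically as in the following Python sketch. The function name `format_length` and its rounding behavior are assumptions made for illustration.

```python
def format_length(count: int, paired: bool = True) -> str:
    """Format a sequence length using the units listed in the text."""
    for size, unit in ((10**9, "Gb"), (10**6, "Mb"), (10**3, "kb")):
        # The text lists Gb only for paired sequences, so skip it otherwise.
        if count >= size and (paired or unit != "Gb"):
            return f"{count / size:g} {unit}"
    if paired:
        return f"{count} bp"
    return f"{count} base" if count == 1 else f"{count} bases"
```

With this sketch, `format_length(47_000)` gives "47 kb" and `format_length(20, paired=False)` gives "20 bases". In running text the number and unit are hyphenated when used as a modifier (eg, a 47-kb vector genome), which this sketch does not attempt.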

Sometimes the number of nucleotides in a DNA molecule is indicated using the suffix “mer”:

20mer  (20 nucleotides)

24mer  (24 nucleotides)

(This formation is based on the terms dimer, trimer, tetramer, etc.) It is sometimes referred to as kmer or k-mer (eg, a kmer of length 20 rather than 20mer).

A DNA sequence might be depicted as follows, with standard notation of DNA sequences from 5′ to 3′:

GGATCC means 5′ GGATCC 3′

Unknown bases may be depicted by using N (see Table 14.6-2):

GNCGANNG

Instead of N, a lowercase n or a hyphen may be used for visual clarity:

GnCGAnnG

or

G-CGA--G

A double-stranded sequence that consists of a strand of DNA and its complement would be as follows:

Image

To show correct pairing between the bases in the 2 strands, sequences need to be aligned properly. In the sequence above, the first base pair is G • C, the next is T • A, etc. Note how the first G is directly above the first C, the first T above the first A, etc.

By convention in printed sequences, for single strands, the 5′ end is at the left and the 3′ end at the right; thus, a sequence such as the following

CCCATCTCACTTAGCTCCAATG

would be assumed to have this directionality:

5′-CCCATCTCACTTAGCTCCAATG-3′

The complementary strands of DNA have opposite directionality; by convention, the top strand reads from the 5′ end to the 3′ end, whereas its complementary strand appears below it with the 3′ end on the left. The 5′ strand is the sense strand or coding strand or positive strand. The 3′ strand is the antisense strand or noncoding strand. (Note that it is the noncoding strand that actually gets transcribed.) In the example

Image

this directionality is implied:

Image

Text should specify which strand, sense or antisense, is displayed. The sense strand “is the strand generally reported in the scientific literature or in databases.”3(p25)
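Because A pairs with T and C with G, and the 2 strands have opposite directionality, the complementary strand can be derived mechanically from the displayed strand. The following Python sketch illustrates this; the names `COMPLEMENT`, `complement`, and `reverse_complement` are hypothetical.

```python
# Pairing rules from the text: A pairs with T, C pairs with G.
# N (unknown base) is mapped to N as an assumption of this sketch.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def complement(strand: str) -> str:
    """Complement each base in place; the result reads 3' to 5' beneath the input."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand written in the conventional 5'-to-3' direction."""
    return complement(strand)[::-1]
```

Here `complement("GGATCC")` returns "CCTAGG", the bottom strand as it would be aligned under the top strand (3′ end at the left), and `reverse_complement("GATTC")` returns "GAATC", the same complementary strand rewritten from 5′ to 3′.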

A codon is a sequence of 3 nucleotides in a DNA molecule that (ultimately) codes for an amino acid (see below), biosynthetic message, or signal (eg, start transcription, stop transcription). Codons are also referred to as codon triplets. Examples are as follows:

CAT ATC ATT

The genetic code—typically a list or table of all the codons and the amino acids they each encode—is widely reproduced (eg, in medical dictionaries and textbooks and on the internet).

Promoter sequences are DNA sequences that define where the transcription of a gene by RNA polymerase begins. They include the following:

CAT box (CCAAT)

CG island, CpG island (CG-rich sequence)

GC box (GGGCGGG consensus sequence)

5′ UTR (5′ untranslated region) (5′ is defined below)

TATA box

Enhancers are short (50- to 1500-bp) regions of DNA that can be bound by proteins (activators) to increase the likelihood that transcription of a particular gene will occur. The κ light chain enhancer (κ enhancer), for example, contains the sequence GGGACTTTCC.

Sequences of repeating single nucleotides are named as follows:

polyA

polyC

polyG

polyT

Example: polyA tail

or, optionally, with lowercase d (within parentheses) for deoxyribose:

poly(dT)

Repeating single-nucleotide pairs (in double-stranded DNA) are similarly named:

poly(dA-dT)

poly(dG-dC)

The phosphate groups linking the nucleotides are sometimes indicated with a lowercase p:

pGpApApTpTpC

CpG island

Methylated bases may be shown with a superscript lowercase m, which refers to the nucleotide residue to the right:

GATᵐCC

Sequences of repeating nucleotides, also known as tandem repeats, are indicated as follows (n stands for number of repeats):

(TTAGGG)n

(GT)n

(CGG)n

Within a long sequence, the first repeat may be designated n, the next p, the next q, etc:

(TAGA)nATGGATAGATTA(GATG)pAA(TAGA)q

The number of repeats may be specified:

(GATG)2

(TAGA)12
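When the number of repeats is specified, the parenthetical notation can be expanded into the full sequence. The Python sketch below shows one way to do this; the pattern `REPEAT` and the function `expand_repeats` are illustrative names, and groups with an unspecified count (n, p, q) are deliberately left untouched.

```python
import re

# Matches one "(UNIT)count" group with a numeric count, eg "(TAGA)12".
REPEAT = re.compile(r"\(([ACGT]+)\)(\d+)")


def expand_repeats(notation: str) -> str:
    """Write out each (UNIT)count group as the unit repeated count times."""
    return REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), notation)
```

For example, `expand_repeats("(GATG)2")` returns "GATGGATG", and `expand_repeats("(GT)n")` is returned unchanged because n is not a number.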

Long sequences pose special typesetting problems. Such sequences should be depicted as separate figures, rather than within text or tables, whenever possible.

For DNA, it must be made clear whether the sequence is single-stranded or double-stranded. A double-stranded sequence such as that of the following example

Image

might be mistaken for a single-stranded sequence and set as such:

Image

Conversely, mistaking a single-stranded sequence for a double-stranded sequence and typesetting accordingly should also be avoided.

Always maintain alignment in 2-stranded sequences—take care that the following

Image

does not become this:

Image

Numbering and spacing may be used as visual aids in presenting sequences. A space every 3 bases indicates the codon triplets:

. . . GCA GAG GAC CTG CAG GTG GGG . . .
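Spacing of this kind is easy to generate from an unspaced sequence. The following Python sketch groups bases into blocks of a chosen size; the function name `group_bases` is hypothetical, and the sketch assumes the sequence starts in the correct reading frame.

```python
def group_bases(sequence: str, size: int = 3) -> str:
    """Insert a space after every `size` bases (size 3 gives codon triplets)."""
    return " ".join(sequence[i:i + size] for i in range(0, len(sequence), size))
```

Applied to the example above, `group_bases("GCAGAGGACCTGCAGGTGGGG")` returns "GCA GAG GAC CTG CAG GTG GGG".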

DNA sequences for protein-coding regions in most eukaryotic cells contain both exons (coding sequences of triplets) and introns (intervening noncoding sequences). An intron occurs within the sequence (examples from Cooper4[p273]):

intron: GTGAG . . . GGCAG

sequence in preceding example with intron included:

. . . GCA GAG GAC CTG CAG G GTGAG . . . GGCAG TG GGG . . .

Another way to display introns amid exons is to use lowercase letters for introns and uppercase letters for exons. There is a space on either side of the intron, and the next exon continues in the same frame or phase as before, to resume the correct codon sequence:

Image

In longer DNA sequences, spaces every 5 or 10 bases are customary visual aids:

Image

Image

Several types of numbering are further aids. In the following example (from Cooper4(p133); “lowercase letters indicate uncertainty in the base call”), numbers on the left specify the number of the first base on that line:

Image

Alternatively, numbers may appear above bases of special interest:

Image

When the base number is large, the right-most digit should be directly over the base being designated:

Image

When a long sequence is run within text, use a hyphen at the right-hand end of the line to indicate the bond linking successive nucleotides:

Image

A hyphen is not necessary if spacing is used, as long as the break between groups occurs at the end of the line. The DNA sequence may be displayed as follows:

Image

Recognition sequences are sections of a sequence recognized by proteins such as restriction enzymes, which cleave DNA in specific locations (see 14.6.1.4, Nucleic Acid Technology). To indicate sites of cleavage, virgules or carets may be used:

single-stranded:

C^TCGTG

C/TCGTG

double-stranded:

CGWCG^

^GCWGC

CGWCG/

/GCWGC

Other conventions should be defined, in parentheses for text or in legends for tables and figures.5 For example:

CACNN↓NNGTG (↓ indicates cleavage at identical position in both strands)
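For single-stranded notation with a caret or virgule, the position of the cleavage site can be read off by splitting the term at the marker. The Python sketch below is illustrative only; `split_at_cleavage` is a hypothetical name, and other markers (such as the arrow) would need to be added explicitly.

```python
def split_at_cleavage(site: str) -> tuple[str, str]:
    """Split a single-stranded recognition sequence at its caret or virgule."""
    for marker in ("^", "/"):
        if marker in site:
            left, right = site.split(marker, 1)
            return left, right
    raise ValueError(f"no cleavage marker found in {site!r}")
```

Both `split_at_cleavage("C^TCGTG")` and `split_at_cleavage("C/TCGTG")` return `("C", "TCGTG")`, indicating cleavage after the first base.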

14.6.1.1.1 Sequence Variations, Nucleotides.

Recommendations for sequence variation (mutation) nomenclature have been one of the major activities of the HUGO Mutation Database Initiative, now the Human Genome Variation Society (HGVS).6 Members devised the nomenclature after extensive community discussion.7,8,9,10,11,12 Authors should consult the Recommendations page of the HGVS website for the latest recommendations,6 the HGVS Simple section of the HGVS website,13 and the 2016 update.14 Basic style points are as follows (see 14.6.1.5.1, Sequence Variations, Amino Acids):

■ For sequence variations described at the nucleotide level, the nucleotide number precedes the capital-letter nucleotide abbreviation.

12345A>T

■ Numbers at the end of the term, if any, do not stand for the nucleotide number but rather indicate numbers of nucleotides involved in the change or, in the case of repeated sequences, numbers of repeats.

c.54GCA[21]  [an allele of 21 GCA repeats]

■ The nucleotide number should be preceded by g plus dot (g.) for gDNA (genomic), c plus dot (c.) for cDNA (complementary or coding), n plus dot (n.) for noncoding, r plus dot (r.) for RNA, m plus dot (m.) for mitochondrial, or p plus dot (p.) for protein.

■ The symbol > is used for substitutions. The following abbreviations are used: ins, insertion; del, deletion; indel, insertion and deletion; dup, duplication; inv, inversion; con, conversion; and t, translocation. An underscore character separates a range of affected nucleotide residues.

c.4375C>T [C nucleotide at position 4375 changed to a T]

c.4375_4379del [nucleotides from positions 4375 to 4379 (CGATT) are missing (deleted)]

■ One set of brackets is used for 2 variations in a single allele, and 2 sets with a semicolon are used for 2 variations in paired alleles.

[76C>T;283G>C] [2 variants on 1 molecule]

[76C>T];[283G>C] [the same 2 variants on 2 different molecules]

■ Nucleotide numbers may be positive or negative.

■ The HGVS recommends avoiding the terms mutation and polymorphism, preferring instead the terms sequence variant, sequence variation, alteration, or allelic variant. In view of this recommendation, single-nucleotide variation (SNV) is now more frequently being used instead of SNP (single-nucleotide polymorphism) and may become standard in the future. To aid readers’ understanding during this transition, at first mention SNV may be used, with SNP in parentheses:

“SNV (formerly SNP)”

Note the examples in Table 14.6-3. In general medical publications, textual explanations should accompany the shorthand terms at first mention.

Table 14.6-3. Examples of Sequence Variation Nomenclature

Term | Explanation
1691G>A | G-to-A substitution at nucleotide 1691
253Y>N | pyrimidine at position 253 replaced by another base
[76A>C;83G>C] | 2 substitutions in single allele (Note: Variations in same allele are indicated by brackets.)
[76A>C] + [87delG] | substitution and deletion in paired alleles
[76A>C (+) 83G>C] | 2 sequence changes in 1 individual, alleles unknown
977_978insA | A inserted between nucleotides 977 and 978
186_187insC | C inserted between nucleotides 186 and 187
926_927insAAAAAAAAAAA | insertion of 11 A’s between nucleotides 926 and 927
185_186delAG | deletion of A and G between nucleotides 185 and 186
617_618delT | deletion of T between nucleotides 617 and 618
188_199del11 | 11-bp deletion between nucleotides 188 and 199
1294_1334del40 | 40-bp deletion between nucleotides 1294 and 1334
c.5delA | A deleted at position 5 (cDNA)
c.5_7delAGG | AGG deleted at positions 5 through 7 (cDNA)
g.5_123del | nucleotides deleted from positions 5 through 123 (gDNA)
g.7dup | duplication of a T at position g.7 in the sequence ACTTACTGCC to ACTTACTTGCC
1007fs | frameshift mutation at codon 1007
112_117delinsTG, 112_117delAGGTCAinsTG, or 112_117>TG | deletion from nucleotide 112 through 117 and insertion of TG
203_506inv or 203_506inv304 | 304 nucleotides inverted from positions 203 through 506
167(GT)6-22 | 6 to 22 GT repeats starting at position 167
g.167(GT)8 | 8 GT repeats starting at position 167 (gDNA)
c.827_XYZ:233del | Examples7 with hypothetical gene symbol XYZ incorporated (but not italicized) (see 14.6.2, Human Gene Nomenclature)
c.827_oXYZ:233del | o: opposite (antisenseᵃ) strand

Abbreviations: bp, base pair; cDNA, complementary or coding DNA; gDNA, genomic DNA.

ᵃ A DNA molecule consists of 2 strands; one is the sense strand and one is the antisense strand. The sense strand (also called coding strand, plus strand, or nontemplate strand) contains codons and is the same as mRNA except that thymine in DNA is replaced by uracil in RNA. The antisense strand (also called noncoding strand, minus strand, or template strand) contains noncodons and acts as a template for the synthesis of mRNA. Therefore, the antisense strand is complementary to the sense strand and mRNA.15
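The simplest terms in Table 14.6-3, single-nucleotide substitutions, follow a regular pattern: optional prefix, nucleotide number, original nucleotide, the symbol >, and new nucleotide. The Python sketch below checks and splits such terms. It is not an implementation of the full HGVS recommendations; the names `SUBSTITUTION` and `parse_substitution` are hypothetical, and the pattern assumes capital-letter DNA nucleotides and an ASCII hyphen for negative positions.

```python
import re

# Optional prefix (g., c., n., or m.), position, original base, ">", new base.
# Covers only simple DNA substitutions such as "c.4375C>T" or "1691G>A".
SUBSTITUTION = re.compile(r"^(?:([gcnm])\.)?(-?\d+)([ACGT])>([ACGT])$")


def parse_substitution(term: str) -> dict:
    """Split a substitution term into prefix, position, original base, and new base."""
    match = SUBSTITUTION.match(term)
    if match is None:
        raise ValueError(f"not a simple substitution term: {term!r}")
    prefix, position, ref, alt = match.groups()
    return {"prefix": prefix, "position": int(position), "ref": ref, "alt": alt}
```

For example, `parse_substitution("c.4375C>T")` returns the prefix "c", position 4375, original base C, and new base T. A term written without a substitution symbol, such as 1691GA, is rejected.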

When a gene symbol is used with a sequence variation term, only the gene symbol is italicized (see 14.6.2, Human Gene Nomenclature).

ADRB1 1165C>G, with ADRB1 alone in italics (not: the entire term ADRB1 1165C>G in italics)

Note: Sequence variants are often indicated by using virgules, but this is not recommended.12

Avoid:

1721G/A

Preferred:

1721G, 1721A

Avoid:

2417A/G

Preferred:

2417A>G

In practice, means other than the symbol > are commonly used to indicate substitutions. Of the following, the JAMA Network journals prefer the arrow:

1691G→A

1691G-A

1691GtoA

1691G-to-A

Any symbol for substitution is better than no symbol; otherwise the expression may be misinterpreted as indicating a dinucleotide at the site. For instance, 1691GA would imply a change involving the dinucleotide GA (1691G and 1692A).

When genotype is being expressed in terms of nucleotides (eg, sequence variants), italics and other punctuation for the nucleotides are not needed (see 14.6.2, Human Gene Nomenclature):

MTHFR 677 CC and TT genotypes

For nucleotide numbering of a cDNA reference sequence, nucleotide +1 is the A of the ATG initiator codon. The first nucleotide immediately 5′ (upstream) of the ATG initiator codon is −1. So for the sequence 5′ AGC CTG ATG GAC CTC 3′, the G immediately 5′ of the ATG is −1, and A is +1. The nucleotide 3′ of the translation stop codon is *1. For nucleotides in introns, those at the 5′ end of the intron are numbered with a “plus” relative to the last base of the immediately preceding exon, whereas those at the 3′ end are numbered with a “minus” relative to the first base of the immediate downstream exon. For example:

c.77+2T

cDNA, nucleotide 77 of preceding exon, position 2 in intron, T residue

c.78-1G

cDNA, nucleotide 78 of next exon, position 1 in intron, G residue
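The absence of a nucleotide 0 in cDNA numbering is a common source of off-by-one errors. The following Python sketch converts a 0-based index in a sequence string to the cDNA number relative to the ATG initiator codon. The function name `cdna_position` is hypothetical, and the sketch covers only positions up to the stop codon; the * and intronic (+/−) forms are not handled.

```python
def cdna_position(index: int, atg_index: int) -> int:
    """Convert a 0-based sequence index to cDNA numbering (A of ATG is +1; there is no 0)."""
    offset = index - atg_index
    # Positions at or after the A of ATG count up from +1;
    # upstream positions count down from -1.
    return offset + 1 if offset >= 0 else offset
```

For the sequence AGCCTGATGGACCTC, the ATG begins at index 6, so `cdna_position(6, 6)` is 1 (the A) and `cdna_position(5, 6)` is −1 (the G immediately 5′ of the ATG).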

Nucleotide numbering of a gDNA reference sequence is arbitrary (ie, there is no defined starting point as in cDNA). Therefore, authors should describe their numbering scheme. No plus signs or minus signs are used with gDNA reference sequences.

Listing both the official and the traditional names next to each other in the variant summary will help authors and readers become more familiar with the official (preferred) terms.

Preferred (Official):

c.88+2T>G

Replaces (Traditional):

IVS#+2T>G

Promoter variants (promoter sequence variants) have been commonly expressed with terms such as

−765G>A

which implies nucleotide numbering in terms of a cDNA reference sequence. However, authors are advised to instead (or additionally) describe the variant in relation to a gDNA reference sequence (see 14.6.1.1.2, Unique Identifiers).14

L01531.1:g.1561C>T

Terms with a capital delta have been used to indicate exonic deletions. For example:

∆ ex 1a-15

∆ ex 1a-12

∆ ex 3

14.6.1.1.2 Unique Identifiers.

Official recommendations include mentioning a sequence variant’s unique identifier, for instance, a number assigned by a locus-specific curator or the OMIM number.16 Allelic variants are designated by the 6-digit OMIM number, followed by a decimal point and a unique 4-digit variant number. The asterisk that precedes the number indicates that it is a gene (see 14.6.2.1, OMIM, for an explanation of OMIM numbering system and symbols). For a list of locus-specific database curators, see the HGVS website under Nomenclature for the Description of Sequence Variants.6 For example:

1311C>T (OMIM *305900.0018)

880C>T (OMIM *600681.0002)
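The structure of these identifiers (optional asterisk, 6-digit OMIM number, decimal point, 4-digit variant number) can be checked with a short pattern. The Python sketch below is illustrative; `OMIM_VARIANT` and `parse_omim_variant` are hypothetical names, and only the asterisk symbol is considered.

```python
import re

# Optional asterisk, 6-digit OMIM number, decimal point, 4-digit variant number.
OMIM_VARIANT = re.compile(r"^(\*?)(\d{6})\.(\d{4})$")


def parse_omim_variant(identifier: str) -> dict:
    """Split an OMIM allelic variant number into its parts."""
    match = OMIM_VARIANT.match(identifier)
    if match is None:
        raise ValueError(f"not an OMIM allelic variant number: {identifier!r}")
    asterisk, entry, variant = match.groups()
    return {"is_gene": asterisk == "*", "entry": entry, "variant": variant}
```

For example, `parse_omim_variant("*305900.0018")` identifies gene entry 305900 and variant 0018.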

14.6.1.1.3 Database Identifiers for Genomic Sequences.

Several databases record genomic sequence information:

Nucleotides:

GenBank (https://www.ncbi.nlm.nih.gov/genbank)

RefSeq (https://www.ncbi.nlm.nih.gov/refseq/)

EMBL (European Molecular Biology Laboratory) (https://www.embl.de)

DDBJ (DNA Data Bank of Japan) (https://www.ddbj.nig.ac.jp)

International HapMap Project (https://www.genome.gov/10001688/international-hapmap-project)

Proteins:

RCSB Protein Data Bank (https://www.rcsb.org/)

Protein database (https://www.ncbi.nlm.nih.gov/protein)

UniProt Knowledgebase (https://www.uniprot.org)

UniProtKB/Swiss-Prot (web.expasy.org/docs/swiss-prot_guideline.html)

PIR-PSD (Protein Information Resource: Protein Sequence Database) (https://proteininformationresource.org/pirwww/dbinfo/pir_psd.shtml)

For a review of databases in molecular biology, including several of the foregoing, see the 2018 Database Issue of the journal Nucleic Acids Research.17

Accession numbers are assigned when researchers submit unique sequences to any one of the databases. In published articles, accession numbers are useful in indicating specific sequences:

Founder effects were investigated using 2 previously undescribed, highly polymorphic microsatellite markers that flank presenilin 1. The first is a GT repeat at position 33 117 (GenBank AF109907). The second is a CA repeat at position 23 000 of this same sequence.18

Accession numbers should include the version (eg, .1, .2) if possible6:

NM_000130.1

NM_000130.2

L01538.1

The following example shows variation expressed with the accession number7:

NM_004006.1:c.3G>T

For unambiguous identification, both version number and accession number should be used.6 Common formatting for nucleotide data was determined in 1988 by representatives of GenBank, EMBL (European Molecular Biology Laboratory), and DDBJ (DNA Data Bank of Japan), forming the International Nucleotide Sequence Database Collaboration.19
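A variant expressed with its reference sequence has 3 parts: accession number, version, and variant term. The Python sketch below separates them, assuming the form ACCESSION.VERSION:variant shown in the examples above; the function name `split_variant` is hypothetical.

```python
def split_variant(description: str) -> tuple[str, int, str]:
    """Split 'ACCESSION.VERSION:variant' into (accession, version, variant term)."""
    reference, variant = description.split(":", 1)
    # The version follows the last period in the reference part.
    accession, version = reference.rsplit(".", 1)
    return accession, int(version), variant
```

For example, `split_variant("NM_004006.1:c.3G>T")` returns `("NM_004006", 1, "c.3G>T")`. A description that lacks a version raises an error in this sketch, which mirrors the recommendation that both the accession number and the version be given.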

14.6.1.2 RNA.

Functionally associated with DNA is RNA. It contains the 3 bases adenine (A), cytosine (C), and guanine (G) but differs from DNA in having the base uracil (U) instead of thymine (T) and the sugar ribose rather than deoxyribose. The corresponding nucleosides are adenosine, cytidine, guanosine, and uridine.

An example of an RNA sequence is as follows:

5′-UUAGCACGUGCUAA-3′

Examples of RNA codons are as follows:

CAU

UUG

AUU
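Because the sense strand of DNA is the same as the mRNA except that thymine is replaced by uracil, an RNA sequence or codon can be written from the corresponding sense-strand DNA by substituting U for T. The Python sketch below illustrates this; the function name `to_rna` is hypothetical.

```python
def to_rna(sense_strand: str) -> str:
    """Write the RNA equivalent of a sense-strand DNA sequence (T replaced by U)."""
    return sense_strand.upper().replace("T", "U")
```

For example, `to_rna("CAT")` returns "CAU" and `to_rna("ATT")` returns "AUU".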

Expand these common abbreviations at first use:

cRNA: complementary RNA
dsRNA: double-stranded RNA
gRNA: genomic RNA
hnRNA: heteronuclear RNA (heterogeneous nuclear RNA)
mRNA: messenger RNA
miRNA: microRNA
mtRNA: mitochondrial RNA
nRNA: nuclear RNA
RNAi: RNA interference
rRNA: ribosomal RNA
siRNA: short interfering RNA
snRNA: small nuclear RNA
tRNA: transfer RNA

Types of tRNA may be further specified; follow typographic style closely (these need not be expanded after the initial expansion of tRNA):

tRNAMet: tRNA specific for methionine
Met-tRNAMet: methionyl-tRNA
tRNAfMet: tRNA specific for formylmethionine
fMet-tRNAfMet or fMet-tRNAf: N-formylmethionyl-tRNA
tRNAAla: tRNA specific for alanine
tRNAVal: tRNA specific for valine

The 3-dimensional structure of tRNA has several different arms, which allow it to recognize a codon on mRNA and deliver the appropriate amino acid during protein synthesis:

AA (amino acid) arm

DHU (dihydrouridine) arm

anticodon arm

TψC arm (ψ for the unusual base pseudouridine)

14.6.1.2.1 RNA Sequence Variations.

Style for abbreviated sequence variation terms described at the RNA level is essentially the same as for DNA (see 14.6.1.1.1, Sequence Variations, Nucleotides). The main exception is that the RNA nucleotide abbreviations are lowercase. The prefix r. is used to signify RNA12 but is not required.

78a>u

r.76a>c

RNA sequences are quantified by use of the same units as for DNA (ie, base, bp, kb, and Mb) (see 13.12, Units of Measure):

240-bp dsRNA

10-25 RNA bases

a 7.5-kb RNA probe

14.6.1.3 Nucleotides as Precursors and Energy Molecules.

The nucleotides of DNA and RNA are also important individually as the precursors of DNA and RNA and as energy molecules. They may bind 1, 2, or 3 phosphate molecules, giving rise to compounds with the following abbreviations (see 13.11, Clinical, Technical, and Other Common Terms) or alternative shorthand.

14.6.1.3.1 Ribonucleotides.

See Table 14.6-4 for examples of terms and their abbreviations.

Table 14.6-4. Examples of Terms and Abbreviations for Ribonucleotides

Terms | Abbreviation | Alternative shorthand
adenosine monophosphate, adenylic acid | AMP | pA
adenosine diphosphate | ADP | ppA
adenosine triphosphate | ATP | pppA
cytidine monophosphate, cytidylic acid | CMP | pC
cytidine diphosphate | CDP | ppC
cytidine triphosphate | CTP | pppC
guanosine monophosphate, guanylic acid | GMP | pG
guanosine diphosphate | GDP | ppG
guanosine triphosphate | GTP | pppG
uridine monophosphate, uridylic acid | UMP | pU
uridine diphosphate | UDP | ppU
uridine triphosphate | UTP | pppU

14.6.1.3.2 Deoxyribonucleotides.

See Table 14.6-5 for examples of terms and abbreviations for deoxyribonucleotides.

Table 14.6-5. Examples of Terms and Abbreviations for Deoxyribonucleotides

Term | Abbreviation | Alternative shorthandᵃ
deoxyadenosine monophosphate, deoxyadenylic acid | dAMP | pdA
deoxyadenosine diphosphate | dADP |
deoxyadenosine triphosphate | dATP |
deoxycytidine monophosphate, deoxycytidylic acid | dCMP | pdC
deoxycytidine diphosphate | dCDP |
deoxycytidine triphosphate | dCTP |
deoxyguanosine monophosphate, deoxyguanylic acid | dGMP | pdG
deoxyguanosine diphosphate | dGDP |
deoxyguanosine triphosphate | dGTP |
deoxythymidine monophosphate, deoxythymidylic acid | dTMP | pdT
deoxythymidine diphosphate | dTDP |
deoxythymidine triphosphate | dTTP |

ᵃ Terms such as ppdA and pppdA are, by analogy with ribonucleotide shorthand, feasible but not commonly found.
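The abbreviations in Tables 14.6-4 and 14.6-5 are built from a small set of parts: an optional d for deoxy, the base letter, M, D, or T for the number of phosphates, and P. The alternative shorthand uses one p per phosphate before the nucleoside. The Python sketch below assembles both forms; the names `PHOSPHATE_LETTER`, `abbreviation`, and `shorthand` are hypothetical, and the sketch will produce forms such as ppdA that are feasible but not commonly found.

```python
# Letter used for 1, 2, or 3 phosphate groups (mono-, di-, tri-).
PHOSPHATE_LETTER = {1: "M", 2: "D", 3: "T"}


def abbreviation(base: str, phosphates: int, deoxy: bool = False) -> str:
    """Build an abbreviation such as 'ATP' or 'dGMP'."""
    prefix = "d" if deoxy else ""
    return f"{prefix}{base}{PHOSPHATE_LETTER[phosphates]}P"


def shorthand(base: str, phosphates: int, deoxy: bool = False) -> str:
    """Build the alternative shorthand such as 'pppA' or 'pdA'."""
    prefix = "d" if deoxy else ""
    return f"{'p' * phosphates}{prefix}{base}"
```

For example, `abbreviation("A", 3)` returns "ATP", `abbreviation("T", 1, deoxy=True)` returns "dTMP", and `shorthand("A", 1, deoxy=True)` returns "pdA".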

In the foregoing examples, monophosphates are assumed to be phosphorylated at the 5′ position, and the more specific term may be used:

5′-AMP

The additional phosphate groups of diphosphates and triphosphates are linked sequentially to the first phosphate group. Other phosphate positions and variations may be specified as follows:

2′-UMP
3′-UMP: Up
3′,5′-ADP: pAp
3′,5′-AMP: cAMP (cyclic AMP)

Note that the p follows the capital letter when 3′-phosphate is indicated.

14.6.1.4 Nucleic Acid Technology.

Laboratory methods of analyzing DNA make use of special DNA sequences, which include the following:

RFLPs: restriction fragment length polymorphisms
SNPs: single-nucleotide polymorphisms (pronounced “snips”) (note that SNVs is now preferred; see 14.6.1.1.1, Sequence Variations, Nucleotides)
SNVs: single-nucleotide variants
STRs: short tandem repeats
STRPs: STR polymorphisms
STSs: sequence tagged sites
VNTRs: variable number of tandem repeats

Note: Satellite DNA repeats, microsatellite (repeating sequences of 1-9 bp) repeats (or markers), and minisatellite (repeating sequences of 10-100 bp) repeats20 (or markers) are distinct types of tandem repeat sequences.

An SNV sequence may be preceded by rs (for reference SNV ID) or ss (for submitted SNV ID), used for accession numbers assigned by the National Center for Biotechnology Information:

rs1002138(-)

14.6.1.4.1 The Reference Genome.

The publication of the draft human genome sequence in 2001 heralded the beginning of the current era of genomic medicine.21 Since that time, rapid advances in technology have facilitated increasingly accurate and inexpensive methods for interrogating the genomes of humans and model organisms for research and clinical care.

Current sequencing technologies do not sequence chromosomes from end to end. Rather, in a massively parallel process, genomic DNA is fragmented, sequenced, and reassembled for purposes of representation of a nearly complete genome.22 In some applications only the protein coding regions of the genome are sequenced (exome sequencing), but increasingly it is feasible to sequence the entire genome for research or clinical purposes.

Of importance, a genome assembly and a genome are not the same thing. “A genome is the physical entity that defines an organism. An assembly is not a physical object; it is the collection of all sequences used to represent the genome of an organism.”22 Assemblies can be of varying degrees of completeness; for example, some regions of the human genome remain refractory to sequencing or assembly with current technologies. Informaticians and geneticists are continually striving to refine the accuracy of these assemblies known as reference genomes. As sequencing and assembly technologies continue to evolve, so does the notion of a reference assembly. The sequences in the human reference genome assembly do not represent the genome of a single individual but are mosaics constructed from the DNA of many anonymous individuals. Contributions from one individual comprise approximately 70% of the assembly sequence, although more than 50 individuals are represented in GRCh38.23 The human reference genome assembly or build23 (currently GRCh38) acts as the coordinate system for the human genome and the features annotated on it and is often the representation used for comparisons with other human genomes for diagnosis or research. Initially produced by the Human Genome Project, it is now maintained by the Genome Reference Consortium. However, other genomes may also be used as a reference in comparative analyses (eg, a parent’s genome vs an offspring’s genome). In publication, the most reliable means to define a genome assembly is by its unique GenBank (INSDC: http://insdc.org/)24 accession number (eg, GRCh38 = GCA_000001405.15). If an assembly has not been deposited in GenBank, typically, at a minimum, an identifier for the sample from which the assembly is derived is provided (eg, INSDC BioSample accession [eg, SAMN06710886] or other identifier [eg, Coriell: NA10874]). 

Publications may include names for genome assemblies along with accession numbers or sample identifiers (eg, HuRef = GCA_000002125.2) (see 14.6.1.1.3, Database Identifiers for Genomic Sequences). Note: Use care with the term reference. Although the Human Genome Project produced the first notion of a human reference genome assembly, there are several ongoing efforts to create high-quality genome assemblies that could serve as population-specific reference assemblies.25 None of these have yet been formally recognized by the global research community as a reference, but “it may be possible that the future human ‘reference’ genome is a panel of assemblies, rather than a single assembly” (Valerie Schneider, PhD, staff scientist, National Center for Biotechnology Information, written communication, May 2, 2017). There are also multiple efforts under way to use graph formats (rather than the traditional linear sequence format) to create references that represent population-level variation.26

For patients with disorders that have a primarily genetic origin, massively parallel sequencing of the total complement of an individual’s DNA (genome sequencing) has proven to be a powerful diagnostic approach. Genome-scale sequencing can be performed on DNA from white blood cells or on buccal cells from saliva. In sequence analysis, each individual’s genome contains millions of sites where his or her DNA differs from a reference sequence. Clinical interpretation requires assessing whether any of these variants are associated with disease.27 See Figure 14.6-4 for the analysis processing sequence.

Figure 14.6-4. Informatic and Human Analysis Required for Finding Rare Pathogenic Variants in a Human Genome

Genetic variants are informatically filtered to remove those with very low likelihood of pathogenicity (eg, variants known to be benign or present at very high frequency in the general population). This informatic processing incorporates annotations of individual variants (eg, population allele frequencies, prior literature reports, computational predictions of functional effect) for use in manual analysis. Reproduced from Evans et al.27

Image

14.6.1.4.2 Methods of Analysis.

Methods of analysis include the following:

ASO

allele-specific oligonucleotide probes

DGGE

denaturing gradient gel electrophoresis

EMSA

electrophoretic mobility shift assay

FISH

fluorescence in situ hybridization

OSH

oligonucleotide-specific hybridization

PCR

polymerase chain reaction

PTT

protein truncation test

RT-PCR

reverse-transcriptase polymerase chain reaction

SKY

spectral karyotyping, a type of fluorescence in situ hybridization

SSCP

single-stranded conformational polymorphism

14.6.1.4.3 Blotting.

The first blotting technique, used for identifying specific DNA sequences in gDNA isolated in vitro by means of nucleic acid probes, was named Southern blotting for its originator, Edwin Southern. Similar techniques have since been named (with droll intent) for compass directions and include Northern blotting (RNA identified; nucleic acid probe), Western blotting (protein identified; antibody probe), Southwestern blotting (DNA-binding protein identified; DNA probe), and Far Western blotting (protein-protein interaction identified; protein probe).28,29

Recombinant DNA is DNA created by combining isolated DNA sequences of interest. Among the tools used in this process are cloning vectors, such as plasmids, phages (see 14.14.3, Virus Nomenclature, and 14.14.4, Prions), and hybrids of these (cosmids and phagemids). Additional tools are bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs).

Basic explanations of these entities are available in medical dictionaries and textbooks. A few that present special nomenclature problems are described here.

14.6.1.4.4 Cloning Vectors.

Plasmids are typically named with a lowercase p followed by a letter or alphanumeric designation; spacing may vary:

pBR322

pJS97

pUC

pUC18

pSPORT

pSPORT 2

Phage cloning vectors are named for the phages. For example:

phage λ:

λgt10, λgt11, λgt22A

M13 phage:

M13KO7, M13mp

14.6.1.4.5 Restriction Enzymes.

Restriction enzymes (or restriction endonucleases) are special enzymes that cleave DNA at specific sites. They are named for the organism from which they are isolated, usually a bacterial species or strain. An authoritative source of information is REBASE.5 As originally proposed,30 their names consist of a 3-letter term, italicized and beginning with a capital letter, taken from the organism of origin, for example:

Hpa for Haemophilus parainfluenzae

followed by a roman numeral, which is a series number, for example:

HpaI

HpaII

In some cases, the series number is preceded by a capital or lowercase letter (roman, not italic), an arabic numeral, or a number and letter combination, which refers to the strain of bacterium; there are no spaces between any of these elements of the term:

EcoRI

HinfI

Sau96I

Sau3AI

Many variations in the form of the names of these enzymes have appeared (eg, Hin d III, Hin dIII, Hind III, Hind III). It is currently recommended that italics and spacing be given as noted in the preceding paragraph to differentiate the species name, strain designation, and enzyme series number. Table 14.6-6 gives examples of commonly used restriction enzymes.
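The naming scheme described above can be sketched as a small parser that separates the 3-letter organism term, the optional strain designation, and the series number. This is an illustrative sketch only: the function name is hypothetical, prefixed names such as I-CeuI are not handled, and the pattern assumes series numbers built from I and V alone so that a strain letter X (as in BstXI) is not read as part of the roman numeral.

```python
import re
from typing import NamedTuple


class EnzymeName(NamedTuple):
    organism: str  # 3-letter term from the organism of origin (italic in print)
    strain: str    # optional strain designation (roman type); may be empty
    series: str    # roman-numeral series number


# Assumption: series numbers use only I and V (I through VIII).
# The strain group is lazy, so the shortest strain that leaves a valid
# series number at the end of the name is chosen.
_ENZYME_RE = re.compile(r"([A-Z][a-z]{2})([A-Za-z0-9]*?)([IV]+)")


def split_enzyme_name(name: str) -> EnzymeName:
    """Split a closed-up restriction enzyme name into its 3 elements."""
    match = _ENZYME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognizable restriction enzyme name: {name!r}")
    return EnzymeName(*match.groups())
```

Because the pattern allows no spaces, variant forms such as "Hind III" are rejected rather than silently normalized, which matches the recommendation that the elements be closed up.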

Table 14.6-6. Examples of Commonly Used Restriction Enzymes and the Organism of Origin

Enzyme name

Organism of origin

AccI

Acinetobacter calcoaceticus

AluI

Arthrobacter luteus

AlwNI

Acinetobacter lwoffii N

BamHI

Bacillus amyloliquefaciens H

BclI

Bacillus caldolyticus

BstEII

Bacillus stearothermophilus ET

BstXI

Bacillus stearothermophilus X

I-CeuI

Chlamydomonas eugametos

DpnI

Streptococcus (diplococcus) pneumoniae M

EcoRI

Escherichia coli RY13

EcoRII

Escherichia coli R245

HaeII

Haemophilus aegyptius

HincII

Haemophilus influenzae Rc

HindIII

Haemophilus influenzae Rd

HinfI

Haemophilus influenzae Rf

MseI

Micrococcus species

MspI

Moraxella species

PleI

Pseudomonas lemoignei

PmlI

Pseudomonas maltophilia

PstI

Providencia stuartii

Sau3AI

Staphylococcus aureus 3A

Sau96I

Staphylococcus aureus PS96

SmaI

Serratia marcescens

SstI

Streptomyces stanford

TaqI

Thermus aquaticus YT-1

XbaI

Xanthomonas badrii

XhoI

Xanthomonas holcicola

Prefixes may further specify type of enzyme action, for example:

I-CeuI

I: intron-coded endonuclease

Chlamydomonas eugametos

M.MlyI

M: methylase

Micrococcus lylae

N.MlyI

N: nicking enzyme


Restriction enzyme names are often seen as modifiers, for example:

a BamHI fragment

an EcoRI site

For information on recognition sequences, see 14.6.1.1, DNA.

14.6.1.4.6 Modifying Enzymes.

Enzymes exist that synthesize DNA and RNA (polymerases), cleave DNA (nucleases), join nucleic acid fragments (ligases), methylate nucleotides (methylases), and synthesize DNA from RNA (reverse transcriptases) (see 14.10.3, Enzyme Nomenclature). Those in laboratory use come from living systems, often from the same organisms that furnish restriction enzymes. Because the names may be similar, it is essential to specify the type of enzyme so that there is no confusion, for example:

AluI methylase

Pfu DNA polymerase (Pyrococcus furiosus)

TaqI methylase

Taq DNA ligase

Modifying enzyme names are often seen as modifiers, for example:

a TaqI RFLP

In the following enzyme terms, T plus numeral refers to the related phage (see 14.14.3, Virus Nomenclature, and 14.14.4, Prions):

T7 DNA polymerase

T4 DNA polymerase

T4 polynucleotide kinase

T4 RNA ligase

14.6.1.4.7 DNA Families.

Some sequences belonging to non–protein-coding regions of the genome can also be classified by their base content. Non–protein-coding DNA includes that which is transcribed into functional noncoding RNA molecules (eg, transfer RNA, ribosomal RNA, and regulatory RNA, such as microRNA), as well as families of repetitive sequence, some of which include transposons and retrotransposons. Families include the following:

Collective term: SINEs (short interspersed nuclear elements) Example: Alu family (named for AluI; see 14.6.1.4.5, Restriction Enzymes) Category: Interspersed

Collective term: LINEs (long interspersed nuclear elements) Example: L1 family (from LINE 1 family) Category: Interspersed

14.6.1.5 Amino Acids.

Twenty amino acids are encoded by triplet base codons in DNA and are constituents of proteins. Each has 1 or more distinct codons in DNA (eg, GCT, GCC, GCA, and GCG code for alanine).

Table 14.6-7 gives the amino acids of proteins and their preferred 3- and single-letter symbols. Although these amino acids have systematic names (eg, alanine is 2-aminopropanoic acid), the trivial names are the most widely recognized and used. The single-letter symbols are usually used for longer sequences; otherwise, the 3-letter symbols are preferred. Do not mix single-letter and 3-letter amino acid symbols. In publications for a general audience, it may be helpful to define the single-letter symbols (eg, in a key) and to expand the 3-letter symbols at first mention as well.

Table 14.6-7. Amino Acids of Proteins and Their 3- and Single-Letter Symbols

Amino acid

3-Letter symbol

Single-letter symbol

alanine

Ala

A

arginine

Arg

R

asparagine

Asn

N

aspartic acid

Asp

D

asparagine or aspartic acid

Asx

B

cysteine

Cys

C

glutamic acid

Glu

E

glutamic acid or glutamine

Glx

Z

glutamine

Gln

Q

glycine

Gly

G

histidine

His

H

isoleucine

Ile

I

leucine

Leu

L

lysine

Lys

K

methionine

Met

M

phenylalanine

Phe

F

proline

Pro

P

serine

Ser

S

threonine

Thr

T

tryptophan

Trp

W

tyrosine

Tyr

Y

valine

Val

V

unspecified amino acid

Xaa

X

The symbols Asp and Glu apply equally to the anions aspartate and glutamate, respectively, the forms that exist under most physiological conditions.

Other amino acids are also well known by their trivial names and have 3-letter codes. These, however, should always be expanded at first mention, as the example of cystine, whose 3-letter code is the same as that of cysteine, bears out:

citrulline

Cit

cystine

Cys

homocysteine

Hcy

homoserine

Hse

hydroxyproline

Hyp

ornithine

Orn

thyroxine

Thx

The side chains of amino acids are known as R groups, and the letter R is used in molecular formulas when indicating a nonspecified side chain, as in this general formula for an amino acid:

Image

Do not confuse the R with the single-letter abbreviation for arginine (see Table 14.6-7).

Peptide bonds are bonds between the α-carboxyl group of one amino acid and the α-amino group of the next. Long peptide sequences are the backbones of proteins. A peptide sequence might be indicated as follows, with hyphens representing peptide bonds:

Gly-Ile-Val-Glu-Gln-Cys-Cys-Ala-Ser-Val-Cys-Ser-Leu-Tyr

By convention in such a sequence, the amino end of the peptide (the end of the peptide whose amino acid has a free amino group, also known as the N terminal) is on the left and the carboxyl end (the end of the peptide whose amino acid has a free carboxyl group, also known as the C terminal) is on the right. The symbols NH2 and COOH may be included in the representation of the peptide sequence, as follows:

NH2-Gly-Ile-Val-Glu-Gln-Cys-Cys-Ala-Ser-Val-Cys-Ser-Leu-Tyr-COOH

The same left-to-right convention applies to sequences using single letters. The above sequence using single letters would be as follows:

GIVEQCCASVCSLY

When the NH2 group appears on the right of a sequence, it has a meaning other than amino end. For instance, in the following sequence, Val-NH2 indicates the amide derivative of valine:

His-Phe-Arg-Lys-Pro-Val-NH2
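The conversion between the 3-letter and single-letter forms shown above can be sketched in Python as follows. The mapping is taken from Table 14.6-7; the function name is hypothetical, and the handling of terminal labels (a leading NH2 and a trailing COOH are dropped, whereas anything else that is not in the table is reported as an error) is an assumption made for illustration.

```python
# 3-letter to single-letter symbols, as listed in Table 14.6-7.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Asx": "B",
    "Cys": "C", "Glu": "E", "Glx": "Z", "Gln": "Q", "Gly": "G",
    "His": "H", "Ile": "I", "Leu": "L", "Lys": "K", "Met": "M",
    "Phe": "F", "Pro": "P", "Ser": "S", "Thr": "T", "Trp": "W",
    "Tyr": "Y", "Val": "V", "Xaa": "X",
}


def to_single_letter(peptide: str) -> str:
    """Convert a hyphenated 3-letter peptide sequence to single letters."""
    residues = peptide.strip().split("-")
    # A leading NH2 marks the amino end and a trailing COOH the carboxyl end.
    if residues and residues[0] == "NH2":
        residues = residues[1:]
    if residues and residues[-1] == "COOH":
        residues = residues[:-1]
    letters = []
    for residue in residues:
        if residue not in THREE_TO_ONE:
            # Includes a trailing NH2, which denotes an amide, not a terminus.
            raise ValueError(f"no single-letter symbol for {residue!r}")
        letters.append(THREE_TO_ONE[residue])
    return "".join(letters)
```

Applied to the 14-residue example above, with or without the NH2 and COOH labels, the function yields GIVEQCCASVCSLY. A sequence ending in Val-NH2 raises an error because the single-letter form has no way to express the amide derivative.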

To indicate bonds other than the peptide bonds described above, lines, rather than hyphens, are used:

Image

(Adapted with permission from Moss.2 Copyright IUPAC and IUBMB.)

For a multiline peptide sequence in running text, use a hyphen at the right end of one line to indicate a break and at the start of the next line to indicate the peptide bond:

Ala-Ser-Tyr-Phe-Ser-

-Gly-Pro-Gly-Trp-Arg

or, in figures, use a line:

Image

(Adapted with permission from Moss.2 Copyright IUPAC and IUBMB.)

In special cases, such as cyclic compounds (illustrated here by gramicidin S), the bond from C-1 to N-2 can be shown with arrows, as follows:

Image

(Adapted with permission from Moss.2 Copyright IUPAC and IUBMB.)

As with nucleic acid sequences, alignment is important in protein sequences. In the following examples, the amino acid residues must remain aligned with the nucleic acid triplets:

Image

(Adapted with permission from Moss.2 Copyright IUPAC and IUBMB.)

An amino acid term plus number refers to the amino acid by codon number (when known) or by protein residue. For example:

Arg506

14.6.1.5.1 Sequence Variations, Amino Acids.

HGVS has expressed a preference for the 3-letter amino acid abbreviation to be used in shorthand descriptions of sequence variations in proteins because several amino acids start with the same initial letter (eg, Ala, Arg, Asn, Asp). The use of only 1 letter could lead to ambiguity or confusion. The 1-letter style still may be seen but is not recommended. For sequence variations described at the protein level, recommended style for abbreviated terms is similar to that for nucleotides (see 14.6.1.1.1, Sequence Variations, Nucleotides, and 14.6.2, Human Gene Nomenclature). Note, as indicated in Table 14.6-8, that the amino acid abbreviation begins the term, preceding the position number (in contrast to nucleotide sequence variant terms, in which the residue number precedes the residue abbreviation). Explanation of such terms at first mention is recommended. Use of the prefix p. (protein) is another recent recommendation.

Table 14.6-8. Sequence Variations in Proteins and Their 3- and Single-Letter Descriptions

3-Letter style

Single-letter style

Explanation

Arg506Gln

R506Q

arginine at residue 506 replaced by glutamine

Leu10ins

L10ins

leucine inserted at position 10

Leu141del

L141del

leucine deleted at position 141

Gln318X or Gln318ter

Q318X

glutamine at 318 changed to stop codon (X or ter)

p.Trp26Cys

p.W26C

tryptophan at residue 26 replaced by cysteine

X is officially recommended as the symbol for the stop codon, but it can also be the single-letter abbreviation for unspecified or unknown amino acid. Therefore, when an amino acid sequence expressed with single letters that includes X is used, the X should be explained in the text.
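To show how the 2 styles in Table 14.6-8 correspond, the following Python sketch rewrites a 3-letter protein variant term in single-letter style. It is an illustration under stated assumptions, not an implementation of the HGVS recommendations: the function name is hypothetical, only the term shapes shown in the table are handled (substitution, ins, del, and a stop codon written as X or ter), and only the 20 amino acids encoded by DNA codons are mapped.

```python
import re

# The 20 amino acids encoded by DNA codons, per Table 14.6-7.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Optional "p." prefix, reference residue, position, then the change.
_VARIANT_RE = re.compile(
    r"(p\.)?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|ins|del|ter|X)"
)


def variant_to_single_letter(term: str) -> str:
    """Rewrite a 3-letter protein variant term in single-letter style."""
    match = _VARIANT_RE.fullmatch(term)
    if match is None:
        raise ValueError(f"unrecognized variant term: {term!r}")
    prefix, ref, position, change = match.groups()
    if ref not in THREE_TO_ONE:
        raise ValueError(f"unknown amino acid symbol: {ref!r}")
    if change in ("ter", "X"):
        change_out = "X"      # stop codon
    elif change in ("ins", "del"):
        change_out = change   # insertion or deletion is carried over as is
    elif change in THREE_TO_ONE:
        change_out = THREE_TO_ONE[change]
    else:
        raise ValueError(f"unknown amino acid symbol: {change!r}")
    return f"{prefix or ''}{THREE_TO_ONE[ref]}{position}{change_out}"
```

For example, Arg506Gln becomes R506Q and p.Trp26Cys becomes p.W26C. The reverse conversion is the one that matters for publication, since the 3-letter style is preferred, but it follows the same mapping.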

When an amino acid sequence variation is used with a gene symbol, italicize only the gene symbol:

ADRB1 Arg389Gly (not: ADRB1 Arg389Gly)

(See 14.6.2, Human Gene Nomenclature.)

Note: Residue numbering begins at the translation initiator methionine, +1.

For further details on expressing sequence variations in proteins, consult the HGVS recommendations.6

Principal Author: Cheryl Iverson, MA

Acknowledgment

Thanks to the following for reviewing and providing comments: W. Gregory Feero, MD, PhD, JAMA, and Maine-Dartmouth Family Medicine Residency, Augusta; Valerie Schneider, PhD, National Center for Biotechnology Information, Bethesda, Maryland; Trevor Lane, MA, DPhil, Edanz Group, Fukuoka, Japan; and John J. McFadden, MA, JAMA Network. Thanks also to David Song, JAMA Network, for obtaining permissions.

References

1.Cammack R. The biochemical nomenclature committees. IUBMB Life. 2000;50(3):159-161. doi:10.1080/152165400300001453

2.Moss GP. International Union of Biochemistry and Molecular Biology recommendations on biochemical & organic nomenclature, symbols & terminology, etc. Updated May 21, 2018. Accessed June 25, 2018. https://www.qmul.ac.uk/sbcs/iubmb/

3.Nussbaum RL, McInnes RR, Willard HF. Thompson & Thompson Genetics in Medicine. 8th ed. Elsevier; 2016.

4.Cooper NG. The Human Genome Project: Deciphering the Blueprint of Heredity. University Science Books; 1994.

5.Roberts RJ, Vincze T, Posfai J, Macelis D. REBASE: a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 2010;38(database issue):D234-D236. Accessed July 31, 2019. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808884

6.Human Genome Variation Society website. Updated May 17, 2018. Accessed July 31, 2019. http://www.hgvs.org

7.den Dunnen JT, Antonarakis SE. Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat. 2000;15(1):7-12. doi:10.1002/(SICI)1098-1004(200001)15:1<7::AID-HUMU4>3.0.CO;2-N

8.Antonarakis SE; Nomenclature Working Group. Recommendations for a nomenclature system for human gene mutations. Hum Mutat. 1998;11(1):1-3. doi:10.1002/(SICI)1098-1004(1998)11:1<1::AID-HUMU1>3.0.CO;2-O

9.Beutler E, McKusick VA, Motulsky AG, Scriver CR, Hutchinson F. Mutation nomenclature: nicknames, systematic names, and unique identifiers. Hum Mutat. 1996;8(3):203-206. doi:10.1002/(SICI)1098-1004(1996)8:3<203::AID-HUMU1>3.0.CO;2-A

10.Ad Hoc Committee on Mutation Nomenclature. Update on nomenclature for human gene mutations. Hum Mutat. 1996;8(3):197-202. doi:10.1002/humu.1380080302

11.Beaudet AL, Tsui L-C. A suggested nomenclature for designating mutations. Hum Mutat. 1993;2(4):245-248. doi:10.1002/humu.1380020402

12.den Dunnen JT, Antonarakis SE. Nomenclature for the description of human sequence variations. Hum Genet. 2001;109(1):121-124. doi:10.1007/s004390100505

13.Sequence variant nomenclature. HGVS Simple. Accessed June 25, 2018. http://varnomen.hgvs.org/bg-material/simple/

14.den Dunnen JT, Dalgleish R, Maglott DR, et al; Human Genome Variation Society (HGVS) and Human Genome Organization (HUGO). HGVS recommendations for the description of sequence variants: 2016 update. Hum Mutat. 2016;37(6):564-569. doi:10.1002/humu.22981

15.Major Differences. Accessed March 17, 2019. http://www.majordifferences.com/2015/01/difference-between-sense-and-antisense.html

16.Online Mendelian Inheritance in Man (OMIM). National Center for Biotechnology Information website. Updated daily. Accessed July 31, 2019. https://www.ncbi.nlm.nih.gov/omim

17.Rigden DJ, Fernández XM. The 2018 Nucleic Acids Research database issue and the online molecular biology database collection. Nucleic Acids Res. doi:10.1093/nar/gkx1235

18.Athan ES, Williamson J, Ciappa A, et al. A founder mutation in presenilin 1 causing early-onset Alzheimer disease in unrelated Caribbean Hispanic families. JAMA. 2001;286(18):2257-2263. doi:10.1001/jama.286.18.2257

19.About INSDC. International Nucleotide Sequence Database Collaboration website. Accessed July 31, 2019. www.insdc.org/about

20.Difference between minisatellite and microsatellite. July 14, 2017. Accessed July 31, 2019. https://www.differencebetween.com/difference-between-minisatellite-and-vs-microsatellite

21.Pasche B. Whole-genome sequencing: a step closer to personalized medicine. JAMA. 2011;305(15):1596-1597. doi:10.1001/JAMA.2011.484

22.Schneider V, Church D. Genome Reference Consortium. In: The NCBI Handbook. 2nd ed. National Center for Biotechnology Information; 2013.

23.Schneider VA, Graves-Lindsay T, Howe K, et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 2017;27(5):849-864. doi:10.1101/gr.213611.116

24.International Nucleotide Sequence Database Collaboration (INSDC). Accessed July 31, 2019. http://www.insdc.org

25.McDonnell Genome Institute. Reference genome improvement. Accessed July 24, 2017. https://www.genome.wustl.edu/items/reference-genome-improvement/

26.Novak AM, Hickey G, Garrison E, et al. Genome graphs. Accessed July 24, 2017. doi:10.1101/101378

27.Evans JP, Powell BC, Berg JS. Finding the rare pathogenic variants in a human genome. JAMA. 2017;317(18):1904-1905. doi:10.1001/jama.2017.0432

28.Nicholas MW, Nelson K. North, South, or East? blotting techniques. J Invest Dermatol. 2013;133(7):e10. doi:10.1038/jid.2013.216

29.Wu Y, Li Q, Chen X-Z. Detecting protein-protein interaction by Far Western blotting. Nat Protoc. 2007;2(12):3278-3284. doi:10.1038/nprot.2007.459

30.Smith HO, Nathans D. A suggested nomenclature for bacterial host modification and restriction systems and their enzymes. J Mol Biol. 1973;81(3):419-423. doi:10.1016/0022-2836(73)90152-6

14.6.2 Human Gene Nomenclature.

The International System for Human Gene Nomenclature (ISGN), a system for gene symbols, was inaugurated in 19791,2 and has been continually updated. The history of naming genes and proteins is littered with redundancy because investigators often make discoveries separately and choose a name without following any sort of naming convention. Hence, the literature, especially literature more than a decade old, can be confusing because the same gene may have multiple names. Standardization helps both research and clinical care. The Human Gene Mapping Nomenclature Committee (HGNC), which developed the ISGN, put forth a “one human genome—one gene language” principle:

Certainly there exists a genetic and molecular basis for a single human gene language without dialects. All human nuclear genes as we know them follow the same genetic, molecular, and evolutionary principles. . . . Thus it is reasonable and logical to develop a standard and consolidated gene nomenclature system rather than have a human gene language based on different gene systems.3(p12)

The HGNC is 1 of 7 committees of the Human Genome Organisation (HUGO) and is “responsible for gene name validation.”4(p115) Gene names and symbols are assigned by the HGNC.5,6 To date, the HGNC has assigned more than 42 000 gene names.

Gene Symbols: A gene symbol is a short term, typically 3 to 7 characters long, that conveys in abbreviated form the name or other attribute of a gene. Human gene symbols usually consist of uppercase letters and may also contain (but never begin with) arabic numerals. Approved gene symbols do not contain Greek letters, roman numerals, superscripts, or subscripts and, usually, contain no punctuation. Gene symbols should be italicized, per official recommendations.7 Italicizing is a useful way to make clear that a gene, and not a similarly named entity such as a condition or product of the gene, is being discussed. Italics are not necessary in published catalogs of gene symbols.7 For style rules for gene symbols, see Table 14.6-9.
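The usual form of a human gene symbol can be sketched as a simple check that reports departures from the pattern rather than rejecting them outright, because approved symbols include deliberate exceptions (eg, the hyphen in HLA-A). The function name is hypothetical, and the check covers only the character rules stated above; it cannot tell whether a symbol is approved, which must be verified against the HGNC database.

```python
import re
from typing import List


def gene_symbol_warnings(symbol: str) -> List[str]:
    """List ways in which a symbol departs from the usual gene symbol form."""
    if not symbol:
        return ["symbol is empty"]
    warnings = []
    # Symbols may contain arabic numerals but never begin with one.
    if symbol[0].isdigit():
        warnings.append("begins with an arabic numeral")
    # Usual form: uppercase Latin letters and arabic numerals only, so
    # Greek letters, punctuation, and lowercase letters are all reported.
    if re.fullmatch(r"[A-Z0-9]+", symbol) is None:
        warnings.append(
            "contains characters other than uppercase letters and arabic numerals"
        )
    return warnings
```

An empty list means only that the symbol has the usual shape. A nonempty list is a prompt to look the symbol up, not proof of an error.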

Approved symbols may represent other entities, such as chromosomal regions, certain syndromes, genes whose existence is inferred (supported by linkage analysis or association with known markers), cloned DNA segments, pseudogenes, and DNA fragments.

Within larger terms, only the gene symbol is italicized:

ADRB2 46G>A (not: ADRB2 46G>A)

ADRB2 Gly16Arg (not: ADRB2 Gly16Arg)

(For an explanation of 46G>A and Gly16Arg, see 14.6.1, Nucleic Acids and Amino Acids.)

Authors are encouraged to use the most up-to-date gene symbol, which may be verified at the HGNC database (www.genenames.org)5 or at NCBI Gene, previously known as Entrez Gene.8 One area of growth in the HGNC database has been the increase in the number of gene families: to date, the database includes more than 1100 families, “with 51% of the protein coding genes within [the] database associated to at least one family.”6 The HGNC symbols and names are seen as a standard and are used in all the major databases that concentrate on human genes and proteins, for example, UniProt and NCBI Gene, as well as disease and phenotype resources, including Locus Reference Genomic (LRG),9 a manually curated record that contains stable, and thus unversioned, reference sequences designed specifically for reporting sequence variants with clinical implications,6 and Online Mendelian Inheritance in Man (OMIM).

14.6.2.1 OMIM.

Online Mendelian Inheritance in Man (OMIM) is a continually updated catalog of human genes and genetic disorders and traits, with focus on the molecular relationship between genetic variation and phenotypic expression.10,11

When a specific syndrome is mentioned, it is helpful to include the OMIM number (see 14.6.1.1.2, Unique Identifiers):

bronchomalacia (OMIM 211450)

DiGeorge syndrome (OMIM #188400)

Each entry is given a unique 6-digit number. Allelic variants are designated by the OMIM number of the entry, followed by a decimal point and a unique 4-digit variant number. For example:

Allelic variants in the factor IX gene (OMIM 300746) are numbers 300746.0001-300746.0101.

Symbols precede many OMIM numbers. These are explained in the OMIM frequently asked questions (FAQ) site,12 as follows:

■ An asterisk (*) before an entry number indicates a gene.

■ A number symbol (#) indicates that it is a descriptive entry, usually of a phenotype, and does not represent a unique locus.

■ A plus sign (+) indicates that the entry contains the description of a gene of known sequence and a phenotype.

■ A percent sign (%) indicates that the entry describes a confirmed mendelian phenotype or phenotype locus for which the underlying molecular basis is not known.

■ No symbol indicates a description of a phenotype for which the mendelian basis, although suspected, has not been clearly established or the separateness of the phenotype from that in another entry is unclear.

■ A caret (^) indicates that the entry no longer exists because it was removed from the database or moved to another entry.
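The structure of an OMIM number (an optional leading symbol, a 6-digit entry number, and an optional 4-digit allelic variant number after a decimal point) can be sketched in Python as follows. The function and field names are hypothetical, the short meanings paraphrase the list above, and the optional "OMIM" label is accepted only as a convenience.

```python
import re
from typing import NamedTuple, Optional

# Short paraphrases of the symbol meanings listed above.
SYMBOL_MEANINGS = {
    "*": "gene",
    "#": "descriptive entry, usually a phenotype; not a unique locus",
    "+": "gene of known sequence and a phenotype",
    "%": "confirmed mendelian phenotype; molecular basis not known",
    "^": "entry removed from the database or moved to another entry",
    "": "phenotype whose mendelian basis is not clearly established",
}


class OmimNumber(NamedTuple):
    symbol: str             # "", "*", "#", "+", "%", or "^"
    entry: str              # unique 6-digit entry number
    variant: Optional[str]  # 4-digit allelic variant number, if present


_OMIM_RE = re.compile(r"(?:OMIM\s+)?([*#+%^]?)(\d{6})(?:\.(\d{4}))?")


def parse_omim(text: str) -> OmimNumber:
    """Split an OMIM number into symbol, entry number, and variant number."""
    match = _OMIM_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable OMIM number: {text!r}")
    symbol, entry, variant = match.groups()
    return OmimNumber(symbol, entry, variant)
```

For example, "OMIM #188400" yields the symbol #, the entry number 188400, and no variant number, and 300746.0101 yields the entry number 300746 with the variant number 0101.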

Consistent use of the approved gene symbol provides advantages when searching for information in multiple databases.13

Gene Names: Genes are usually named for the molecular product of the gene, the function of the gene, or the condition associated with the gene, if known. Gene names are not italicized. As shown in Table 14.6-9, the approved gene names, available in the above-mentioned databases, expand Greek letters and do not use subscripts (so that, for instance, in searching for a term with α online, one would type “alpha”). Descriptions based on the approved gene names but styled according to the journal in question (eg, using Greek letters and subscripts) or omitting some terms from the full name are permissible in general medical journals.

approved gene name:

the alpha-fetoprotein gene

description:

the α-fetoprotein gene

approved gene name:

the gene for beta-2-microglobulin

description:

the gene for β2-microglobulin

Table 14.6-9. Examples of Style Rules for Gene Symbols

Approved gene name

Approved gene symbol

Rule illustrated

α-fetoprotein

AFP

Greek letter changed to Latin letter (but not moved to end of symbol; exception to recommendation)

α-galactosidase

GLA

Greek letter changed to Latin letter and moved to end of symbol

β1-galactosidase

GLB1

Greek letter changed to Latin letter and moved with numeral to end of term; no subscripts or punctuation

β2-microglobulin

B2M

Greek letter changed to Latin letter; no subscripts or punctuation

coagulation factor VIII

F8

roman numeral changed to arabic numeral

heterogeneous nuclear ribonucleoprotein A2/B1

HNRPA2B1

no punctuation marks or spaces

MCF.2 cell line–derived transforming sequence

MCF2

no punctuation marks

5′-nucleotidase, cytosolic

NT5C

number moved from the start of symbol; no punctuation

5S RNA, cluster 1

RN5S1@

first character is always a letter, not a number; @ sign indicates gene cluster in chromosomal region

thromboxane A2 receptor

TBXA2R

no superscripts or subscripts

A number of conventions are followed when gene symbols and names are officially designated. Related genes are often assigned symbols by sequentially numbering a stem, the root symbol for the gene family:

ABC: root symbol

genes: ABCA1, ABCG4, etc

TNF: root symbol

genes: TNF, TNFAIP1, TNFAIP2, TNFAIP3, etc

Other conventions involve stereotypic abbreviations; for example, CR will usually signify a chromosome region. (However, a given letter or letter combination does not always signify conventional usage. For instance, L at or near the end of a symbol often, but not always, indicates “like.”) In Table 14.6-10, the conventions in column 1 reflect HGNC recommendations.5 (Note: DNA sequences are available from GenBank.)

Gene symbols can be used without expansion, with the identifying OMIM (see 14.6.2.1, OMIM) or GenBank (see 14.6.2.2, GenBank) number given parenthetically, as in the following examples:

Most of these trials included patients with metastatic colorectal cancer or assessed only KRAS (OMIM 190070) exon 2 variants.

The HSD3B1 gene (OMIM 109715) encodes the enzyme 3β-hydroxysteroid dehydrogenase-1 (3βHSD1), which catalyzes the conversion of adrenal androgen precursors into dihydrotestosterone (DHT), the most potent androgen.

Sequencing the APTX gene (OMIM 606350) was performed on request for cases of cerebellar ataxia with hypoalbuminemia and/or early-onset cerebellar ataxia combined with peripheral neuropathy and/or cerebellar atrophy using brain magnetic resonance imaging.

Autosomal dominant cerebellar ataxias are most often caused by CAG repeat expansions in ATXN1 (OMIM 601556), ATXN2 (OMIM 601517), ATXN3 (OMIM 607047), CACNA1A (OMIM 601011), ATXN7 (OMIM 607640), TBP (OMIM 600075), or ATN1 (OMIM 607462).

Patients with stage IV melanoma and established BRAF (GenBank NM_004333.5) or NRAS (GenBank NM_002524.4) variants treated with pembrolizumab or nivolumab alone or in combination between July 3, 2014, and May 24, 2016, were included.
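When a manuscript is checked for this convention, the gene symbols and their parenthetical identifiers can be collected with a short script. The Python sketch below is illustrative only: the function name is hypothetical, and the pattern assumes the exact form used in the examples above (an uppercase symbol, optionally followed by the word gene, then the database name and identifier in parentheses).

```python
import re
from typing import List, Tuple

# Symbol, optional " gene", then "(OMIM nnn)" or "(GenBank xxx)".
_GENE_ID_RE = re.compile(
    r"\b([A-Z][A-Z0-9]+)(?: gene)? \((OMIM|GenBank) ([^)\s]+)\)"
)


def find_gene_identifiers(text: str) -> List[Tuple[str, str, str]]:
    """Return (symbol, database, identifier) for each parenthetical match."""
    return _GENE_ID_RE.findall(text)
```

Run on the melanoma example above, the function returns BRAF with GenBank NM_004333.5 and NRAS with GenBank NM_002524.4. Symbols that appear without an identifier are not returned, so a comparison against the full list of symbols in the text would be needed to find omissions.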

14.6.2.2 GenBank.

GenBank14 is the National Institutes of Health genetic sequence database, an annotated collection of all publicly available DNA sequences. It is part of the International Nucleotide Sequence Database Collaboration, which includes 3 organizations: the DNA DataBank of Japan, the European Nucleotide Archive, and GenBank at the National Center for Biotechnology Information. These organizations exchange data daily, and a new release is issued every 2 months.

Table 14.6-10. Examples of Conventions for Gene Names and Gene Symbols

Convention illustrated

Gene symbol

Gene description

@: gene family or cluster; RN, RNA

RN5S1@

RNA, 5S ribosomal 1q42 cluster

AP: associated protein

BRAP

BRCA1-associated protein

AS: antisense

IGF2-AS

IGF2 antisense RNA (no longer used: insulinlike growth factor 2, antisense)

BP: binding protein

IL18BP

interleukin 18 binding protein

C: catalytic

G6PC

glucose 6-phosphatase, catalytic (glycogen storage disease type I, von Gierke disease)

CASP (stem), sequentially numbered

CASP1, CASP2, CASP3, etc

caspase 1, 2, 3, etc, apoptosis-related cysteine protease

CF (formerly); name modified after discovery of gene product

CFTR

cystic fibrosis transmembrane conductance regulator

CR: chromosome region

ANCR

Angelman syndrome chromosome region

CR: chromosome region

DCR

Down syndrome chromosome region

D: DNA; 19, chromosome 19; S: (unique DNA) segment; E: expressed

TOMM40 (D19S1177E is an alias; the official term should be preferred)

translocase of outer mitochondrial membrane 40 homolog (yeast) (no longer used: DNA segment sequence)

D: domain-containing

BRD1

bromodomain containing 1

F: series letter; X, X chromosome

F8A1 (no longer used: DXS522E)

coagulation factor VIII–associated 1 (no longer used: DNA segment sequence)

F: series letter, X, X chromosome

FRAXF

fragile site, folic acid type, rare, fra(X)(q28) F

FAM: family with sequence similarity

ULK4P1 (no longer used: FAM7A1)

ULK4 pseudogene (no longer used: family with sequence similarity 7, member A1)

FRA: fragile site; 10, chromosome 10; G: series letter

FRA10G

fragile site, aphidicolin type, common, fra(10)(q11.2) (see 14.6.4, Human Chromosomes)

G6PD: glucose-6-phosphate dehydrogenase (named for gene product)

G6PD

glucose-6-phosphate dehydrogenase

HBA: hemoglobin subunit alpha (named for gene product)

HBA1

hemoglobin subunit alpha 1

HCL: hair color (named for characteristic)

HCL1

hair color 1 (brown)

HLA (punctuation exception for HLA genes)

HLA-A

major histocompatibility complex, class I, A

HOX: “homeobox” gene family

HOXA7

homeobox A7

IL: interleukin

IL2RA (no longer used: IDDM10)

interleukin 2 receptor subunit alpha (no longer used: insulin-dependent diabetes mellitus 10)

INS: insulin (named for gene product)

INS

insulin

IP: interacting protein

SCHIP1

schwannomin interacting protein 1

L: “like” sequence

G6PDL

glucose-6-phosphate dehydrogenase–like

L (in this case, L at the end does not signify “like”); named for condition

CDL1

Cornelia de Lange syndrome 1

LG: ligand

CAMLG

calcium modulating ligand

LOH: loss of heterozygosity

LINC00312 (no longer used: LOH3CR2A)

long intergenic non–protein coding RNA 312 (no longer used: loss of heterozygosity 3, chromosomal region 2, gene A)

M: mitochondrial; RP, ribosomal protein

MRPL57 (previously MRP63)

mitochondrial ribosomal protein L57

MAG: melanoma antigen (named for condition and gene product)

MAGEA2

melanoma antigen, family member A2

MT: mitochondrial

MT7SDNA

mitochondrially encoded 7S DNA

MT: mitochondrial, used with hyphen (punctuation exception)

MT-RNR1

mitochondrially encoded 12S RNA

MY: myosin

MYH14 (no longer used: DFNA4)

myosin, heavy chain 14, nonmuscle (no longer used: deafness, autosomal dominant 4)

N: inhibitor

CDKN1B

cyclin-dependent kinase inhibitor 1B

orf (lowercase exception for open reading frame)

TMEM258 (no longer used: C11orf10)

transmembrane protein 258 (no longer used: chromosome 11 open reading frame 10)

P: “pseudogene”

HBAP1

hemoglobin subunit alpha pseudogene 1

P: does not always signify “pseudogene”

HIVEP2

human immunodeficiency virus 1 enhancer binding protein 2

PD: programmed cell death (named for function)

PD-1

programmed cell death 1 protein

PD-L: programmed cell death ligand (named for function)

PD-L1

programmed cell death 1 ligand 1

PD-L: programmed cell death ligand (named for function)

PD-L2

programmed cell death 1 ligand 2

R: receptor

INSR

insulin receptor

R: receptor; L: like

INSRL

insulin receptor–like

REN: renin (named for gene product)

REN

renin

REN: renin (named for gene product); BP, binding protein

RENBP

renin binding protein

RG: regulator

TCIRG1

T-cell, immune regulator 1, ATPase, H+ transporting, lysosomal V0 subunit A3

TTR: transthyretin

TTR (no longer used: CTS1)

transthyretin (no longer used: carpal tunnel syndrome 1)

TUB: tubulin (named for gene product)

TUBA3C

tubulin alpha 3C (α2-tubulin)

ZNF: zinc finger protein

ZNF160

zinc finger protein 160

When a gene name or symbol has been changed, both the new and former names (the latter known as the previous name) are available in gene databases.5,6,8 Authors should use the most up-to-date name. The previous symbol may be included parenthetically at first mention:

CYP2A6 (previously CYP2A3)

SOD1 (previously ALS and ALS1)

ERBB2 (previously HER2/neu)

14.6.2.3 Glossary of Genomic Terms.

To help clinicians understand the latest developments in genetics so that they can make the most informed decisions for their patients, in 2017 JAMA began a series entitled Genomics and Precision Health. Associated with this ongoing series is a glossary of genomics terms. This may be accessed at https://sites.jamanetwork.com/genetics/#glossary.15

14.6.2.4 Writing About Genes: Italicizing Gene Symbols.

Observing the rule of italicizing gene symbols makes clear whether the writer is referring to a gene or to another entity that might be confused with a gene.

In any discussion of a gene, it is recommended that the approved gene symbol be mentioned at some point, preferably in the title and abstract if relevant. However, the gene symbol need not be mentioned every time the writer refers to the gene. Authors may refer to genes (or gene loci) by their official gene names or by other descriptive expressions. Any of these is acceptable, depending on context and syntax. Of names, descriptions, and symbols, only the gene symbol is italicized. Examples are given in Table 14.6-11.

Table 14.6-11. Examples of Expressions of Gene Symbols

Gene symbol

Gene description

Acceptable expression

BRCA1

breast cancer 1, early-onset gene

the breast and ovarian cancer susceptibility gene

CFTR

cystic fibrosis transmembrane conductance regulator gene

the cystic fibrosis locus

F8

coagulation factor VIII, procoagulant component (hemophilia A) gene

the factor VIII locus

F8

coagulation factor VIII, procoagulant component (hemophilia A) gene

the hemophilia A locus

SYN1

synapsin I gene

the gene for synapsin I

TP53

tumor protein p53 (Li-Fraumeni syndrome) gene

the TP53 gene (p53 is the alias term; the official term should be preferred to the alias)

In the foregoing examples, the gene names and descriptions are readily distinguishable from the gene symbols. Sometimes, however, the gene symbol may be easily confused with the abbreviation for the product or condition associated with the gene unless the gene symbol is italicized. See, for instance, Table 14.6-12.

Table 14.6-12. Examples of Potentially Confusing Nongene Terms

Gene

Potentially confusing nongene term

ABO

ABO blood group system (see 14.1, Blood Groups, Platelet Antigens, and Granulocyte Antigens)

APOE

apoE (apolipoprotein E)

EPO

erythropoietin (Epo)

GRIFIN

GRIFIN protein (galectin-related interfiber protein)

HLA-A, HLA-B, etc

HLA-A, HLA-B, etc (see 14.8.5, HLA/Major Histocompatibility Complex)

MS

multiple sclerosis (MS)

many hormone genes (eg, CRH, GHRH, GNRHR, PTH, TRH)

hormone name abbreviations (eg, CRH, GHRH, GNRH receptor, PTH, TRH)

In other expressions, italics distinguish different meanings:

HD

gene for huntingtin (protein), Huntington disease gene

HD

Huntington disease

Person with HD

person with Huntington disease

TH variant

variant of the TH gene

TH deficiency

deficiency of the enzyme TH

Therefore, it is best to make clear by italicizing gene symbols and through context whether the gene or another entity is being discussed.

Gene symbols do not immediately follow the term in the gene name that they might seem to abbreviate but rather should relate to the word gene, usually following it:

the guanylate cyclase 2D gene, GUCY2D (Not: the guanylate cyclase 2D [GUCY2D] gene)

the Huntington disease gene, HD

the tyrosine hydroxylase gene, TH

The cystic fibrosis transmembrane conductance regulator gene, CFTR, is implicated in cystic fibrosis.

In the following examples, both gene aliases and approved symbols are used; however, authors are encouraged to use the approved name (see 13.11, Clinical, Technical, and Other Common Terms):

the retinal guanylate cyclase 2D (GUCY2D) gene, GUCY2D

the retinal guanylate cyclase 2D (RetGC1) gene, GUCY2D (Not: the guanylate cyclase 2D [GUCY2D] gene)

In discussions of variants, the gene symbol remains italicized; specific variants, however, are not italicized (see 14.6.1, Nucleic Acids and Amino Acids):

ADRB2 46G>A

variant of the GUCY2D gene

variant of GUCY2D

GUCY2D variant

The objective of this study was to describe the phenotype in 4 families with dominantly inherited cone-rod dystrophy, 1 with an R838C variant and 1 with an R838H variant in the guanylate cyclase 2D gene (GUCY2D) encoding retinal guanylate cyclase 1.

LRP5v171: valine substitution at codon 171 of the LRP5 gene

In gene mapping, when the order of genes along the chromosome is known, the genes are listed from short-arm end (pter) to the centromere (cen) or long-arm end (qter) (see 14.6.4, Human Chromosomes).

pter-ENO1-PGM1-AMY1-cen

In gene mapping, when the order of genes along the chromosome is not known, the genes are listed alphabetically and parentheses are used:

pter-PGD-AK2-(ACTA,APOA2,REN)-qter
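The two listing conventions above can be sketched in code. The following Python function is only an illustration, not part of any nomenclature standard. The name `format_gene_map` is an assumption made for this example, as is the use of a nested list to mark a group of genes whose order is unknown.

```python
def format_gene_map(start, loci, end):
    """Join loci from one chromosome landmark to another.

    `loci` is a list: a plain string is a gene whose position is known,
    and a nested list is a group whose internal order is unknown.
    (This input convention is hypothetical, chosen for illustration.)
    """
    parts = [start]
    for locus in loci:
        if isinstance(locus, str):
            parts.append(locus)
        else:
            # Unknown order: list alphabetically inside parentheses.
            parts.append("(" + ",".join(sorted(locus)) + ")")
    parts.append(end)
    return "-".join(parts)


print(format_gene_map("pter", ["ENO1", "PGM1", "AMY1"], "cen"))
print(format_gene_map("pter", ["PGD", "AK2", ["REN", "ACTA", "APOA2"]], "qter"))
```

The second call reproduces the string shown above, with the unordered group sorted alphabetically inside the parentheses.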

Table 14.6-13 presents some examples of gene names and symbols from fields covered elsewhere in this chapter.

Table 14.6-13. Gene Names and Symbols From Fields Covered Elsewhere in This Chapter

Approved gene symbol

Gene description

14.1, Blood Groups, Platelet Antigens, and Granulocyte Antigens

A4GALT

α-1,4-galactosyltransferase (P blood group)

ABO

ABO blood group (transferase A, α-1-3-N-acetylgalactosaminyltransferase; transferase B, α-1-3-galactosyltransferase)

ACHE

acetylcholinesterase (Cartwright blood group)

ACKR1 (was DARC)

atypical chemokine receptor 1 (Duffy blood group)

AQP1 (was CO)

aquaporin 1 (Colton blood group)

ART4 (was DO)

ADP-ribosyltransferase 4 (Dombrock blood group)

BCAM (was LU)

basal cell adhesion molecule (Lutheran blood group)

BSG

basigin (OK blood group)

C4A

complement C4A (Rodgers blood group)

C4B

complement C4B (Chido blood group)

CD44

CD44 molecule (Indian blood group)

CD151 (was MER2)

CD151 molecule (Raph blood group)

CR1

complement C3b/C4b receptor 1 (Knops blood group)

CD55 (was DAF)

CD55 molecule (Cromer blood group)

ERMAP (was SC)

erythroblast membrane-associated protein (Scianna blood group)

FUT1

fucosyltransferase 1 (H blood group)

FUT3

fucosyltransferase 3 (Lewis blood group)

GYPA

glycophorin A (MNS blood group)

GYPB

glycophorin B (MNS blood group)

GYPC

glycophorin C (Gerbich blood group)

GYPE

glycophorin E

ICAM4

intercellular adhesion molecule 4 (Landsteiner-Wiener blood group)

KEL

Kell blood group

P1

P blood group (P1 antigen)

RHCE

Rh blood group, CcEe antigens

RHD

Rh blood group, D antigen

SLC4A1

solute carrier family 4, member 1 (Diego blood group)

SLC14A1

solute carrier family 14, member 1 (Kidd blood group)

XG

Xg blood group

XK

Kell blood group precursor (McLeod phenotype)

14.2, Cancer (See 14.6.3, Oncogenes and Tumor Suppressor Genes)

ACTN1

α1-actinin, actinin alpha 1

ACTN2

α2-actinin, actinin alpha 2

BCL2

B-cell CLL/lymphoma 2

BCL7A

BCL tumor suppressor 7A

CCND1 (formerly BCL1)

cyclin D1

CDC2

cell division cycle 2, G1 to S and G2 to M

CDK2

cyclin-dependent kinase 2

CDKN1A

cyclin-dependent kinase inhibitor 1A

CTNNB1

catenin beta 1

MEN1

menin 1

RB1

RB transcriptional corepressor 1

RET (formerly MEN2A, MEN2B)

ret proto-oncogene

TGFA

transforming growth factor alpha

TGFB1

transforming growth factor beta 1

TNF

tumor necrosis factor

TNFRSF1A

TNF receptor superfamily member 1A

TP53

tumor protein p53

14.3, Cardiology

ANK2 (formerly LQT4)

ankyrin 2

APOA1

apolipoprotein AI

APOB

apolipoprotein B

APOC2

apolipoprotein C2

APOD

apolipoprotein D

APOE

apolipoprotein E

GPR1

G protein–coupled receptor 1

HDLBP

high-density lipoprotein-binding protein

KCNH2 (formerly LQT2)

potassium voltage-gated channel, subfamily H, member 2

KCNQ1 (formerly LQT)

potassium voltage-gated channel subfamily Q member 1

LDLR

low-density lipoprotein receptor

LPL

lipoprotein lipase

NOS1

nitric oxide synthase 1

NOS2

nitric oxide synthase 2

NOS2P2

nitric oxide synthase 2 pseudogene 2

NOS2P1

nitric oxide synthase 2 pseudogene 1

NOS3

nitric oxide synthase 3

PLAT

plasminogen activator, tissue type

SCN5A (formerly LQT3)

sodium voltage-gated channel alpha subunit 5

TNNC1

troponin C1, slow skeletal and cardiac type

TNNC2

troponin C2, fast skeletal type

TNNI1

troponin I1, slow skeletal type

TNNI2

troponin I2, fast skeletal type

TNNI3

troponin I3, cardiac type

TNNT1

troponin T1, slow skeletal type

TNNT2

troponin T2, cardiac type

TNNT3

troponin T3, fast skeletal type

VLDLR

very-low-density lipoprotein receptor

14.7, Hemostasis

A2M

α2-macroglobulin

CALM1

calmodulin 1

CCL5

chemokine (C-C motif), ligand 5

CLEC3B (was TNA)

C-type lectin domain family 3, member B

F2

coagulation factor II (thrombin)

F2R

coagulation factor II thrombin receptor

F2RL1

F2R-like trypsin receptor 1

F3

coagulation factor III, tissue factor

F5

coagulation factor V

F7

coagulation factor VII

F7R

coagulation factor VII regulator

F8

coagulation factor VIII

F8A1

coagulation factor VIII associated 1

F9

coagulation factor IX

F10

coagulation factor X

F11

coagulation factor XI

F12

coagulation factor XII

F13A1

coagulation factor XIII, A chain

F13A2

coagulation factor XIII, A2 polypeptide

F13B

coagulation factor XIII, B chain

FGA

fibrinogen, α chain

FGB

fibrinogen, β chain

FGG

fibrinogen, γ chain

FGL1

fibrinogenlike 1

FGL2

fibrinogenlike 2

GP5

glycoprotein V (platelet)

GP6

glycoprotein VI (platelet)

GP9

glycoprotein IX (platelet)

GP1BA

glycoprotein Ib (platelet), alpha subunit

ICAM1

intercellular adhesion molecule 1

ICAM2

intercellular adhesion molecule 2

ITGA1

α1-integrin, integrin subunit alpha 1

ITGA2

α2-integrin, integrin subunit alpha 2

ITGA2B

integrin subunit alpha 2B

ITGA3

α3-integrin, integrin subunit alpha 3

ITGA6

α6-integrin, integrin subunit alpha 6

ITGAV

vitronectin, α polypeptide, antigen V

ITGB1

integrin subunit beta 1

ITGB3

integrin subunit beta 3

ITPKA

inositol-triphosphate 3-kinase A

KLKB1

kallikrein B1

KNG1

kininogen 1

NOS3

nitric oxide synthase 3

PDGFA

platelet-derived growth factor subunit A

PDGFC

platelet-derived growth factor C

PDGFRA

platelet-derived growth factor receptor alpha

PDGFRL

platelet-derived growth factor receptor-like

PECAM1

platelet and endothelial cell adhesion molecule 1

PLAT

plasminogen activator, tissue type

PLAU

plasminogen activator, urokinase

PLAUR

plasminogen activator, urokinase receptor

PLG

plasminogen

PLGLA1

plasminogenlike A

PLGLB1

plasminogenlike B1

PPBP

proplatelet basic protein

PROC

protein C

PROS1

protein S

PROSP

protein S pseudogene

PROZ

protein Z, vitamin K–dependent plasma glycoprotein

PTGDR

prostaglandin D2 receptor

PTGDS

prostaglandin D2 synthase

PTGFR

prostaglandin F receptor

PTGFRN

prostaglandin F2 receptor inhibitor

PTGIR

prostaglandin I2 (prostacyclin) receptor

PTGIS

prostaglandin I2 synthase

PTGS1

prostaglandin-endoperoxide synthase 1

SELE

selectin E

SELP

selectin P

SERPINA1

serpin family A, member 1

SERPINC1

serpin family C, member 1

SERPINE1

serpin family E, member 1

SERPINF2

serpin family F, member 2

TBXA2R

thromboxane A2 receptor

TBXAS1

thromboxane A synthase 1

TFPI

tissue factor pathway inhibitor

TFPI2

tissue factor pathway inhibitor 2

THBD

thrombomodulin

VCAM1

vascular cell adhesion molecule 1

VWF

von Willebrand factor

VWFP1

von Willebrand factor pseudogene 1

14.8, Immunology

14.8.1, Chemokines

CCL1

C-C motif chemokine ligand 1

CX3CL1

C-X3-C motif chemokine ligand 1

CXCL1

C-X-C motif chemokine ligand 1

PF4

platelet factor 4

XCL1

X-C motif chemokine ligand 1

14.8.2, CD Cell Markers

CD14

CD14 molecule

CD19

CD19 molecule

CD1A

CD1a molecule

CD3D

CD3D molecule

CD46

CD46 molecule

CD55

CD55 molecule (Cromer blood group)

CD6

CD6 molecule

CD79A

CD79A molecule

CD97

CD97 molecule

CR1

complement C3b/C4b receptor 1 (Knops blood group)

FCGR3A

Fc fragment of IgG receptor IIIa

ICAM3

intercellular adhesion molecule 3

MME

membrane metalloendopeptidase

14.8.3, Complement

C1QA

complement C1q A chain

C1QB

complement C1q B chain

C1QBP

complement C1q binding protein

C1R

complement C1r

C1S

complement C1s

C2

complement C2

C3

complement C3

C4A

complement C4A (Rodgers blood group)

C4B

complement C4B (Chido blood group)

C4BPA

complement component 4, binding protein alpha

C5

complement component C5

C5AR1

complement C5a receptor 1

C6

complement C6

C7

complement C7

C8A

complement C8, alpha chain

C8B

complement C8, beta chain

C9

complement C9

CD55 (was DAF)

CD55 molecule (Cromer blood group)

CFH

complement factor H

CFP

complement factor properdin

14.8.4, Cytokines

CRLF1

cytokine receptorlike factor 1

CRLF2

cytokine receptorlike factor 2

CSF1

colony-stimulating factor 1

CSF2

colony-stimulating factor 2

CSF3

colony-stimulating factor 3

CSF3R

colony-stimulating factor 3 receptor

EPO

erythropoietin

EPOR

erythropoietin receptor

GH1

growth hormone 1

GH2

growth hormone 2

GHR

growth hormone receptor

IFNA1

interferon alpha 1

IFNA2

interferon alpha 2

IFNB1

interferon beta 1

IFNG

interferon gamma

IFNW1

interferon omega 1

IL1A

interleukin 1 alpha

IL1B

interleukin 1 beta

IL1R1

interleukin 1 receptor type 1

IL1R2

interleukin 1 receptor type 2

IL1RAP

interleukin 1 receptor accessory protein

IL1RN

interleukin 1 receptor antagonist

IL2

interleukin 2

LEP

leptin

LEPR

leptin receptor

PRL

prolactin

SOCS1

suppressor of cytokine signaling 1

TGFA

transforming growth factor alpha

TGFB1

transforming growth factor beta 1

THPO

thrombopoietin

TNF

tumor necrosis factor

14.8.5, HLA/Major Histocompatibility Complex

HLA-A

HLA-A, major histocompatibility complex, class I, A

HLA-B

HLA-B, major histocompatibility complex, class I, B

HLA-C

HLA-C, major histocompatibility complex, class I, C

HLA-DMA

major histocompatibility complex, class II, DM alpha

HLA-DMB

major histocompatibility complex, class II, DM beta

HLA-DOA

major histocompatibility complex, class II, DO alpha

HLA-DOB

major histocompatibility complex, class II, DO beta

HLA-DPA1

major histocompatibility complex, class II, DP alpha

HLA-DQA1

major histocompatibility complex, class II, DQ alpha

HLA-DQB1

major histocompatibility complex, class II, DQ beta

HLA-DRA

major histocompatibility complex, class II, DR alpha

HLA-DRB1

major histocompatibility complex, class II, DR beta 1

HLA-E

major histocompatibility complex, class I, E

HLA-F

major histocompatibility complex, class I, F

HLA-G

major histocompatibility complex, class I, G

HLA-H

major histocompatibility complex, class I, H

HLA-J

major histocompatibility complex, class I, J

14.8.6, Immunoglobulins

IGHA1

immunoglobulin heavy constant alpha 1

IGHA2

immunoglobulin heavy constant alpha 2

IGHD

immunoglobulin heavy constant delta

IGHD1-1

immunoglobulin heavy diversity 1-1

IGHE

immunoglobulin heavy constant epsilon

IGHG1

immunoglobulin heavy constant gamma 1

IGHG2

immunoglobulin heavy constant gamma 2

IGHG3

immunoglobulin heavy constant gamma 3

IGHG4

immunoglobulin heavy constant gamma 4

IGHJ1

immunoglobulin heavy joining 1

IGHM

immunoglobulin heavy constant mu

IGHV1-2

immunoglobulin heavy variable 1-2

IGHV1-18

immunoglobulin heavy variable 1-18

IGKC

immunoglobulin kappa constant

IGKJ2

immunoglobulin kappa joining 2

IGKV1-5

immunoglobulin kappa variable 1-5

IGLC1

immunoglobulin lambda constant 1

IGLJ1

immunoglobulin lambda joining 1

IGLV10-54

immunoglobulin lambda variable 10-54

14.8.7, Lymphocytes

TRAC

T-cell receptor alpha constant

TRBC1

T-cell receptor beta constant 1

TRBC2

T-cell receptor beta constant 2

TRBV10-3

T-cell receptor beta variable 10-3

TRGC1

T-cell receptor gamma constant 1

TRGJ1

T-cell receptor gamma joining 1

TRGJ2

T-cell receptor gamma joining 2

TRDC

T-cell receptor delta constant

14.10, Molecular Medicine

APBA1

amyloid-β precursor protein binding family A, member 1

ADIPOQ

adiponectin, C1Q, and collagen domain containing

ADIPOR1

adiponectin receptor 1

ADIPOR2

adiponectin receptor 2

ACSL1

acyl-CoA synthetase long-chain family member 1

ADAMTS1

ADAM metallopeptidase with thrombospondin type 1 motif 1

AHCY

adenosylhomocysteinase

AMD1

adenosylmethionine decarboxylase 1

AKT1

AKT serine/threonine kinase 1

ATP1A1

ATPase, Na+/K+ transporting subunit, alpha 1 polypeptide

BPGM

bisphosphoglycerate mutase

CALM1

calmodulin 1

CCAR1

cell division cycle and apoptosis regulator 1

CCPG1

cell cycle progression 1

CDK20

cyclin-dependent kinase 20

CDC2

cyclin-dependent kinase 2

CDK2

cyclin-dependent kinase 2

CDK7

cyclin-dependent kinase 7

CDKN1A

cyclin-dependent kinase inhibitor 1A

CDKN1C

cyclin-dependent kinase inhibitor 1C

CDKN2A

cyclin-dependent kinase inhibitor 2A

COASY

coenzyme A (CoA) synthetase

COX4I1

cytochrome c oxidase subunit 4I1

COX5B

cytochrome c oxidase subunit 5b

CRP

C-reactive protein

CYP1A2

cytochrome P450 family 1, subfamily A, member 2

DHFR

dihydrofolate reductase

DKK1

dickkopf WNT signaling pathway, inhibitor 1

ERBB2

erb-b2 receptor tyrosine kinase 2

FBP1

fructose bisphosphatase 1

FDX1

ferredoxin 1

FDX2

ferredoxin 2

FHIT

fragile histidine triad

GNA12

G protein subunit alpha 12

GNG2

G protein subunit gamma 2

GALNT1

polypeptide N-acetylgalactosaminyltransferase 1

G6PD

glucose-6-phosphate dehydrogenase

B3GALT1

beta-1,3-galactosyltransferase

CDKN2A

cyclin-dependent kinase inhibitor 2A

GFI1

growth factor independent 1 transcriptional repressor

GRB2

growth factor receptor-bound protein 2

GRIN1

glutamate ionotropic receptor, N-methyl-D-aspartate (NMDA) type, subunit 1

HBA1

hemoglobin subunit alpha 1

HBB

hemoglobin subunit beta

HMGCS1

3-hydroxy-3-methylglutaryl CoA synthase 1

IGF1

insulinlike growth factor 1

IGF1R

insulinlike growth factor 1 receptor (IGF-R1)

IKBKB

inhibitor of nuclear factor kappa B kinase, subunit beta

ITPKA

inositol-triphosphate 3-kinase A

MNAT1

CDK activating kinase assembly factor

MB

myoglobin

MCM2

minichromosome maintenance complex, component 2

NMNAT1

nicotinamide nucleotide adenylyltransferase 1

NPY

neuropeptide Y

NPPA

natriuretic peptide A

OGDH

oxoglutarate dehydrogenase

INPP5J

inositol polyphosphate-5-phosphatase J

PYY

peptide YY

RBBP4

RB binding protein 4

RNASE1

ribonuclease A family member 1 pancreatic

SFPQ

splicing factor proline and glutamine rich

SNCA

synuclein alpha

TAF1

TATA-box binding protein associated factor 1

TBP

TATA-box binding protein

THPO

thrombopoietin

TNFSF11

TNF superfamily member 11

TP53

tumor protein p53

UCP1

uncoupling protein 1

WNT1

Wnt family member 1

14.11, Neurology

ASIC2

acid sensing ion channel subunit 2

ACHE

acetylcholinesterase (Cartwright blood group)

ADORA1

adenosine A1 receptor

ADRA1A

adrenoreceptor alpha 1A

ADRB1

adrenoreceptor beta 1

BDNF

brain-derived neurotrophic factor

CACNA1A

calcium voltage-gated channel subunit alpha 1A

CHRM1

cholinergic receptor, muscarinic 1

CHRNA1

cholinergic receptor, nicotinic, alpha 1 subunit

CNTF

ciliary neurotrophic factor

COMT

catechol-O-methyltransferase

DRD1

dopamine receptor D1

EGF

epidermal growth factor

GABBR1

gamma-aminobutyric acid type B receptor subunit 1

GDNF

glial cell line–derived neurotrophic factor

GRIA1

glutamate ionotropic receptor, AMPA type, subunit 1

GRIN1

glutamate ionotropic receptor, NMDA type, subunit 1

HRH1

histamine receptor H1

HTR1A

5-hydroxytryptamine receptor 1A

ITPKA

inositol-triphosphate 3-kinase A

KCNJ3

potassium voltage-gated channel, subfamily J, member 3

MAOA

monoamine oxidase A

NGF

nerve growth factor

NGFR

nerve growth factor receptor

NMB

neuromedin B

NOS1

nitric oxide synthase 1

NPY

neuropeptide Y

NPY1R

neuropeptide Y receptor Y1

NRTN

neurturin

NTF3

neurotrophin 3

NTS

neurotensin

NTSR1

neurotensin receptor 1

OPRD1

opioid receptor delta 1

OPRK1

opioid receptor kappa 1

OPRM1

opioid receptor mu 1

SIGMAR1

sigma nonopioid intracellular receptor 1

PCP2

Purkinje cell protein 2

SLC1A1

solute carrier family 1, member 1

SLC18A1

solute carrier family 18, member A1

SNAP25

synaptosomal-associated protein, 25 kDa

SNCA

synuclein alpha

TAC1

tachykinin, precursor 1

TAC3

tachykinin 3

TRPA1

transient receptor potential cation channel, subfamily A, member 1

TSNARE1

t-SNARE domain containing 1 (see 14.11, Neurology, for expansion)

VAMP1

vesicle-associated membrane protein 1

14.14.3 and 14.14.4, Virus and Prion Nomenclature

AAVS1

adeno-associated virus integration site 1

BNIP1

BCL2 interacting protein 1

CR2

complement component C3d receptor 2

CXADR

CXADR, Ig-like cell adhesion molecule

CXB3S

coxsackie virus B3 sensitivity

E11S

ECHO virus (serotypes 4, 6, 11, 19) sensitivity

GPR183

G protein–coupled receptor 183

EBVM1

Epstein-Barr virus modification site 1

EBVS1

Epstein-Barr virus integration site 1

HAVCR1

hepatitis A virus cellular receptor 1

RSF1

remodeling and spacing factor 1

LAMTOR5

late endosomal/lysosomal adaptor, MAPK and MTOR activator 5

HCVS

human coronavirus sensitivity

CCNT1

cyclin T1

HPV6AI1

human papillomavirus (type 6a) integration site 1

FOXN2

forkhead box N2

HV1S

herpes simplex virus type 1 sensitivity

ICAM1

intercellular adhesion molecule 1

MX1

MX dynamin-like GTPase 1

PVR

poliovirus receptor

PRND

prion-like protein doppel

PRNP

prion protein

PRNPIP

prion protein interacting protein

PRNT

prion locus lncRNA, testis expressed

14.6.2.5 Alleles.

Alleles denote alternative forms of a gene. Alleles are often characterized by particular variant sequences (mutations). For variant sequence nomenclature, see 14.6.1, Sequence Variations, Nucleotides.

Because alleles are alternative forms of a particular gene, they are expressed by means of both the gene name or symbol and an appendage that indicates the specific allele.

Classically, allele symbols consist of the gene symbol plus an asterisk plus the italicized allele designation.7 For example:

HBB*S: S allele of the HBB gene

As with gene terms, Greek letters are changed to Latin letters in allele terms:

APOE*E4: allele producing the ε4 type of apolipoprotein E

See HGNC guidelines for Greek to Latin alphabet conversion.16 If clear in context, the allele symbol may be used in a shorthand form that omits the gene symbol and includes only the asterisk and the allele designation that follows. For example:

*S

*E4
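As a rough illustration of the classical form and its shorthand, the following Python sketch builds an allele symbol from a gene symbol and an allele designation. The function names are hypothetical. Only the epsilon-to-E conversion shown above is included in the mapping, and the italic styling of the printed form cannot be represented in a plain string.

```python
# Only the conversion shown in the text (epsilon -> E) is included here;
# consult the HGNC table for the full Greek-to-Latin mapping.
GREEK_TO_LATIN = {"ε": "E"}


def allele_symbol(gene, allele):
    """Return the classical allele symbol: gene symbol, asterisk, allele."""
    latin = "".join(GREEK_TO_LATIN.get(ch, ch) for ch in allele)
    return f"{gene}*{latin}"


def shorthand(symbol):
    """Drop the gene symbol, keeping the asterisk and allele designation."""
    _, _, allele = symbol.partition("*")
    return "*" + allele


print(allele_symbol("HBB", "S"))    # HBB*S
print(allele_symbol("APOE", "ε4"))  # APOE*E4
print(shorthand("APOE*E4"))         # *E4
```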

In the case of alleles of the major histocompatibility locus, which are not italicized (see 14.8.5, HLA/Major Histocompatibility Complex), each HLA allele name has a unique number corresponding to up to 4 sets of digits separated by colons.17 The digits before the first colon describe the type (this often corresponds to the serologic antigen carried by an allotype). The next set of digits lists the subtype (numbers are assigned in the order in which DNA sequences have been determined). A portion of the gene name is usually included in the shortened form:

Full name: HLA-DRB1*03:01

Shortened form: DRB1*03:01

In practice, common or trivial names for alleles, which take various forms, are used. The same allele is often expressed in different ways that diverge from the recommended nomenclature. For example:

s: short allele of serotonin transporter gene (SLC6A4)

l: long allele of SLC6A4

As another example of common allele names, the following expressions are all used for APOE*E4; follow author preference:

ε4 allele

epsilon 4 allele

E4 allele

APOE*4

apo e4

APOEE4

14.6.2.5.1 Genotype and Phenotype Terminology.

The genotype comprises the set of alleles in an individual. Because individuals almost always have 2 of each autosome (nonsex chromosome) (see 14.6.4, Human Chromosomes), individuals have 2 alleles (which may be the same alleles or 2 different alleles) for each autosomal gene.

The simplest genotype term for an individual would describe 1 gene and consist of the names of 2 alleles. Larger genotypes would contain 2 or more allele symbol pairs.

As originally formulated in ISGN, allele groupings may be indicated by placement above and below a horizontal line or on the line. As seen in the following examples (from Shows et al2,3), such placement, as well as order, spacing, and punctuation marks (virgules [/], semicolons, spaces, and commas), has specific meanings.

Alleles of the same gene are indicated by placement above and below a horizontal line or with a virgule:

ADA*1
-----
ADA*2

or ADA*1/ADA*2

In theoretical discussions when a single letter is substituted for the allele symbol, the line or virgule may be dispensed with:

AA

Aa

aa

ss

ll

sl

Semicolons separate pairs of alleles at unlinked loci:

ADA*1   ADH1*1   AMY1*A
----- ; ------ ; ------
ADA*2   ADH1*1   AMY1*B

or

ADA*1/ADA*2; ADH1*1/ADH1*1; AMY1*A/AMY1*B

or

ADA*1/*2; ADH1*1/*1; AMY1*A/*B
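The virgule and semicolon forms can be generated mechanically. The Python sketch below is an illustration under stated assumptions: the function name `format_genotype` and the (gene, allele, allele) tuple input are invented for this example, and italics are not represented.

```python
def format_genotype(pairs, short=False):
    """Render allele pairs at unlinked loci, separated by semicolons.

    `pairs` is a list of (gene, allele_1, allele_2) tuples, a hypothetical
    input shape chosen for this example.
    """
    rendered = []
    for gene, first, second in pairs:
        if short:
            # Shorthand: the gene symbol is omitted from the second allele.
            rendered.append(f"{gene}*{first}/*{second}")
        else:
            rendered.append(f"{gene}*{first}/{gene}*{second}")
    return "; ".join(rendered)


loci = [("ADA", "1", "2"), ("ADH1", "1", "1"), ("AMY1", "A", "B")]
print(format_genotype(loci))
print(format_genotype(loci, short=True))
```

The first call yields the full virgule form and the second the shortened form, in which each second allele keeps only its asterisk and designation.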

A single space separates alleles together on the same chromosome from alleles together on another chromosome (phase [assignment of alleles of genes on the same or different chromosomal copy] known):

AMY1*A PGM1*2
-------------
AMY1*B PGM1*1

or

AMY1*A PGM1*2/AMY1*B PGM1*1

Commas indicate that alleles above and below the line (or on either side of the virgule) are on the same chromosome pair but not on which chromosome of the pair specifically (phase unknown):

PGM1*1   AMY1*A
------ , ------
PGM1*2   AMY1*B

or

PGM1*1/PGM1*2, AMY1*A/AMY1*B

A special form for hemizygous males is

G6PD*A/Y

When genotype is being expressed in terms of nucleotides (eg, a polymorphism), italics and other punctuation are not needed (see 14.6.1, Nucleic Acids and Amino Acids):

MTHFR 677 TT genotype

CC genotype

the “long/short” (5HTTLPR) polymorphism in SLC6A4

(LPR: length polymorphism region)

When the subject is being described in terms of the 2 possible amino acids at 1 position in the protein owing to a single-nucleotide variation (formerly single-nucleotide polymorphism) (nonsynonymous mutation), the corresponding amino acids are separated by a virgule (see 14.6.1, Nucleic Acids and Amino Acids):

Val/Val

(homozygous)

Met/Val

(heterozygous)

Met/Met

(homozygous)
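The labels in parentheses follow directly from whether the 2 amino acid terms match. A minimal Python sketch of that check, with a hypothetical function name, might look like this:

```python
def zygosity(genotype):
    """Classify a virgule-separated amino acid pair such as 'Met/Val'."""
    first, second = genotype.split("/")
    return "homozygous" if first == second else "heterozygous"


for pair in ("Val/Val", "Met/Val", "Met/Met"):
    print(pair, zygosity(pair))
```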

Such terms should be explained at first mention with the amino acid terms expanded:

the common methionine/valine (Met/Val) polymorphism at codon 129

The virgule is not needed in expressions such as the following:

α1-antitrypsin MZ heterozygotes

individuals with the ZZ phenotype

The phenotype is the collection of traits in an individual that result from his or her genotype. Genotypes usually contain pairs of symbols, whereas phenotypes contain single symbols. When phenotypes are expressed in terms of the specific alleles, the phenotype term derives from the genotype term, but no italics are used, and, instead of asterisks, spaces are used.18

Genotype: ADA*1/ADA*1

Phenotype: ADA 1

Genotype: ADA*1/ADA*2

Phenotype: ADA 1, 2

Genotype: C2*C/C2*QO

Phenotype: C2 C, QO

The normal allele of a gene is identified by adding *N. Adding *D or *R to a gene symbol designates a dominant or recessive allele, respectively.

Genotype: CFTR*N/CFTR*R

Phenotype: CFTR N
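The derivation of a phenotype term from a genotype term can be sketched as follows. This Python example is an assumption-laden illustration: the function name is hypothetical, and it handles only the mechanical part of the rule (asterisks become spaces, the gene symbol appears once, repeated alleles collapse to a single symbol). It does not model dominance.

```python
def phenotype_from_genotype(genotype):
    """Derive a phenotype term from a 'GENE*a/GENE*b' genotype term."""
    gene_alleles = [part.split("*") for part in genotype.split("/")]
    gene = gene_alleles[0][0]
    alleles = []
    for symbol, allele in gene_alleles:
        if symbol != gene:
            raise ValueError("both alleles must belong to the same gene")
        # A repeated allele is written once in the phenotype term.
        if allele not in alleles:
            alleles.append(allele)
    return f"{gene} " + ", ".join(alleles)


print(phenotype_from_genotype("ADA*1/ADA*1"))  # ADA 1
print(phenotype_from_genotype("ADA*1/ADA*2"))  # ADA 1, 2
print(phenotype_from_genotype("C2*C/C2*QO"))   # C2 C, QO
```

Because dominance is not modeled, the sketch would return "CFTR N, R" for CFTR*N/CFTR*R rather than the "CFTR N" shown above. Handling that case would require knowing which allele designations mark recessive alleles.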

Principal Author: Cheryl Iverson, MA

Acknowledgment

Thanks to the following for reviewing and providing comments: W. Gregory Feero, MD, PhD, JAMA, and Maine-Dartmouth Family Medicine Residency, Augusta; Trevor Lane, MA, DPhil, Edanz Group, Fukuoka, Japan; and John J. McFadden, MA, JAMA Network.

References

1.Klinger HP. Progress in nomenclature and symbols for cytogenetics and somatic-cell genetics. Ann Intern Med. 1979;91(3):487-488. doi:10.7326/0003-4819-91-3-487

2.Shows TB, Alper CA, Bootsma D, et al. International system for human gene nomenclature (1979). Cytogenet Cell Genet. 1979;25(1-4):96-116. doi:10.1159/000131404

3.Shows TB, McAlpine PJ, Boucheix C, et al. Guidelines for human gene nomenclature: an international system for human gene nomenclature (ISGN, HGM9). Cytogenet Cell Genet. 1987;46(1-4):11-28. doi:10.1159/000132471

4.Rangel P, Giovannetti J. Genomes and Databases on the Internet: A Practical Guide to Functions and Applications. Horizon Scientific Press; 2002.

5.HUGO Gene Nomenclature Committee. Accessed July 31, 2019. https://www.genenames.org/

6.Gray KA, Yates B, Seal RL, Wright MW, Bruford EA. Genenames.org: the HGNC resources in 2015. Nucleic Acids Res. 2015;43(database issue):D1079-D1085. doi:10.1093/nar/gku1071

7.Wain HM, Bruford EA, Lovering RC, Lush MJ, Wright MW, Povey S. Guidelines for human gene nomenclature (2002). Genomics. 2002;79(4):464-470. doi:10.1006/geno.2002.6748

8.Entrez Gene. Accessed January 9, 2018. https://www.ncbi.nlm.nih.gov/gene

9.Locus Reference Genomic (LRG). Accessed July 23, 2019. https://www.lrg-sequence.org

10.Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledge base of human genes and genetic disorders. Nucleic Acids Res. 2005;33(database issue):D514-D517. doi:10.1093/nar/gki033

11.Online Mendelian Inheritance in Man (OMIM). Updated July 22, 2019. Accessed July 23, 2019. https://omim.org

12.OMIM Frequently Asked Questions. Accessed July 23, 2019. https://omim.org/help/faq

13.HGNC. FAQs about gene nomenclatures. Accessed January 9, 2018. https://www.genenames.org/help/faq/

14.GenBank. Updated November 2017. Accessed July 23, 2019. https://www.ncbi.nlm.nih.gov/genbank/

15.Glossary of genetic terms. Accessed July 31, 2019. https://sites.jamanetwork.com/genetics/#glossary

16.HGNC guidelines. Table 1: Greek to Latin alphabet conversion. Accessed July 23, 2019. https://www.genenames.org/about/guidelines

17.Nomenclature for factors of the HLA system. Updated June 7, 2018. Accessed July 23, 2019. https://www.hla.alleles.org/nomenclature/naming.html

18.Pasternak JJ. An Introduction to Human Molecular Genetics: Mechanisms of Inherited Disease. 2nd ed. Published January 27, 2005. Accessed June 13, 2018. http://www.wiley.com/WileyCDA/WileyTitle/product_Cd0471474266.html

14.6.3 Oncogenes and Tumor Suppressor Genes.

Oncogenes and tumor suppressor genes are 2 of the main types of genes that play a central role in cancer. “An important difference between oncogenes and tumor suppressor genes is that oncogenes result from the activation (turning on) of proto-oncogenes, but tumor suppressor genes cause cancer when they are inactivated (turned off).”1

14.6.3.1 Oncogenes.

An oncogene is a “mutated gene that contributes to the development of a cancer. In their normal, unmutated state, oncogenes are called proto-oncogenes, and they play a role in the regulation of cell division.”2 Oncogenes were discovered and characterized in viruses and animal experimental systems. These genes exist widely outside the systems in which they were discovered, and their normal cellular homologues are important in cell division and differentiation.

Human oncogenes should be expressed according to the style for human gene symbols (see 14.6.2, Human Gene Nomenclature). Mouse oncogenes (and other nonhuman oncogenes) should be expressed according to the style for mouse gene symbols (see 14.6.5, Nonhuman Genetic Terms). Retroviral oncogenes are expressed in a style typical of microbial genes (see 14.6.5, Nonhuman Genetic Terms), namely, 3 letters, italicized, lowercase. The protein products of the oncogenes (oncoproteins) typically use the same abbreviation as the oncogene but in roman type. In humans, the protein is all capitals; in mice, the protein has an initial capital. Some examples of human, mouse, and retroviral oncogenes appear in Table 14.6-14.
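The capitalization conventions just described can be sketched as a small Python helper. The function name and the `kind` labels are assumptions made for this example. The sketch covers letter case only: italic versus roman type cannot be shown in a plain string, and it does not map a retroviral oncogene to its homologue when the names differ.

```python
def style_symbol(symbol, kind):
    """Apply the letter-case convention for a gene or protein symbol."""
    if kind == "human":
        return symbol.upper()  # all capitals
    if kind == "mouse":
        # initial capital, remainder lowercase
        return symbol[:1].upper() + symbol[1:].lower()
    if kind == "retroviral":
        return symbol.lower()  # lowercase
    raise ValueError(f"unknown kind: {kind}")


print(style_symbol("myc", "human"))       # MYC
print(style_symbol("myc", "mouse"))       # Myc
print(style_symbol("MYC", "retroviral"))  # myc
```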

Table 14.6-14. Human, Mouse, and Retroviral Oncogenes

Retroviral oncogenes

Human gene homologue(s); mouse gene homologue(s)

Human protein product(s); mouse protein product(s); retroviral oncoprotein

Viral origin

abl

Human: ABL1, ABL2 Mouse: Abl1, Abl2

Human: ABL1, ABL2 Mouse: Abl1, Abl2 Retroviral: abl

Abelson murine leukemia

bcl-2

Human: BCL2 Mouse: Bcl2

Human: BCL2 Mouse: Bcl2 Retroviral: bcl

B-cell CLL/lymphoma 2

erbᵃ

Human: ERBB2, ERBB3, ERBB4 Mouse: Erbb2, Erbb3, Erbb4

Human: ERBB2, ERBB3, ERBB4 Mouse: Erbb2, Erbb3, Erbb4 Retroviral: erb

avian erythroblastic leukemia

ets

Human: ETS1, ETS2 Mouse: Ets1, Ets2

Human: ETS1, ETS2 Mouse: Ets1, Ets2 Retroviral: ets

avian erythroblastosis

fes

Human: FES Mouse: Fes

Human: FES Mouse: Fes Retroviral: fes

Gardner-Arnstein feline sarcoma

fms

Human: CSF1R (formerly FMS) Mouse: Csf1r (formerly Fms)

colony stimulating factor 1 receptor (CSF1R)

McDonough feline sarcoma

fos

Human: FOS, FOSB Mouse: Fos, Fosb

Human: FOS, FOSB Mouse: Fos, Fosb Retroviral: fos

FBJ murine osteogenic sarcoma

jun

Human: JUN, JUNB, JUND Mouse: Jun, Junb, Jund

Human: JUN, JUNB, JUND Mouse: Jun, Junb, Jund Retroviral: jun

avian sarcoma 17

kit

Human: KIT Mouse: Kit

Human: KIT Mouse: Kit Retroviral: kit

Hardy-Zuckerman feline sarcoma

mos

Human: MOS Mouse: Mos

Human: MOS Mouse: Mos Retroviral: mos

Moloney sarcoma

myb

Human: MYB Mouse: Myb

Human: MYB Mouse: Myb Retroviral: myb

avian myeloblastosis

myc

Human: MYC Mouse: Myc

Human: MYC Mouse: Myc Retroviral: myc

avian myelocytomatosis

raf

Human: RAF1, ARAF, BRAF Mouse: Raf1, Araf, Braf

Human: RAF1, ARAF, BRAF Mouse: Raf1, Araf, Braf Retroviral: raf

3611 murine leukemia

ras

Human: family with many human homologues, eg, HRAS, NRAS, RAB9A, RRAS, RRAS2 Mouse: Hras1, Nras, Rab9, Rras, Rras2

Human: HRAS, NRAS, RAB9A, RRAS, RRAS2 Mouse: Rab9a, Rras, Rras2, Hras, Nras, Rab9 Retroviral: ras

retrovirus-associated DNA sequence

sis

Human: PDGFB Mouse: Pdgfb

Human: PDGFB (platelet-derived growth factor, B chain) Mouse: Pdgfb Retroviral: sis

simian sarcoma

src

Human: SRC Mouse: Src

Human: SRC Mouse: Src Retroviral: src

Rous sarcoma

a See 14.6.3.1.1, ERBB2 and HER2/neu.
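The capitalization conventions summarized in Table 14.6-14 can be illustrated with a short Python sketch. It is an illustration only: the function name format_symbol and the species labels are hypothetical, and because a plain string cannot show italic (gene) vs roman (protein) type, only letter case is modeled. It covers just the simple cases shown in the table.

```python
def format_symbol(base: str, species: str) -> str:
    """Return `base` in the letter case used for the given species.

    Hypothetical helper: italic vs roman type is not represented,
    only capitalization.
    """
    if species == "human":
        return base.upper()                          # all capitals
    if species == "mouse":
        return base[:1].upper() + base[1:].lower()   # initial capital
    if species == "retroviral":
        return base.lower()                          # all lowercase
    raise ValueError(f"unsupported species: {species!r}")


for species in ("human", "mouse", "retroviral"):
    print(species, format_symbol("myc", species))
```

The three printed forms correspond to the myc row of the table (MYC, Myc, myc).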

Examples of use are as follows:

ras activation and inactivation

protein derived from the ras gene, ras, functions as a signaling molecule

Commonly, the oncogene term contains a prefix that indicates the source or location of the gene: v- for virus or c- for the oncogene’s cellular or chromosomal counterpart. The c- form is also known as a proto-oncogene and in standard gene nomenclature (see 14.6.2, Human Gene Nomenclature) is given in all capitals, as in the Human Gene Homologues column of Table 14.6-14 and the following examples. Note that the v and the c are set roman.

c-abl (ABL1)

c-mos (MOS)

v-abl

v-mos

The protein product may be similarly prefixed:

c-abl

c-mos

v-abl

v-mos

Additional prefixes may further identify oncogenes. Note that these prefixes are set roman and are hyphenated. Examples of expansions of some prefixes are given below, but it should not be inferred that the gene in question is associated only with the tumor for which it is named:

B-lym

B-cell lymphoma

L-myc

small cell lung carcinoma

N-myc

neuroblastoma

H-ras

Harvey rat sarcoma

K-ras

Kirsten rat sarcoma

N-ras

neuroblastoma

For example:

The K-ras mutation assay is more sensitive than the conventional histologic diagnosis in detecting minute cancer invasion around the superior mesenteric artery.

Numbers or letters designate genes in a series. For example:

K-ras-2

H-ras-1

erb-b2

14.6.3.1.1 ERBB2 and HER2/neu.

The oncogene known as HER2/neu, which stimulates the growth of breast cancer, is actually ERBB2. HER2 (from human epidermal growth factor receptor 2) and neu are the same as ERBB2 and are current aliases for ERBB2.3 Because the term HER2/neu is widely used and recognized, it may be included in parentheses after the first mention of ERBB2.

ERBB2 (formerly HER2 or HER2/neu)

14.6.3.1.2 Fusion Oncogenes and Oncoproteins.

The result of fusion of an oncogene and another gene is known as a fusion oncogene. The product of a fusion oncogene is a fusion oncoprotein. Terms for fusion oncogenes and their products may use traditional oncogene format or standard human gene format, as in the examples in Table 14.6-15.

Table 14.6-15. Examples of Terms for Fusion Oncogenes and Their Products

Fusion oncogene

Fusion oncoprotein

Expansion4

bcr-abl

BCR-ABL

fusion of the BCR and ABL genes

c-fos/c-jun

C-FOS/C-JUN

protein product of FOS and JUN proto-oncogenes

gag-onc

GAG-ONC

general term for fusion proteins of viral gag (group-specific antigen) gene and oncogene

gag-jun

GAG-JUN

general term for fusion proteins of viral gag (group-specific antigen) gene and oncogene, with JUN representing a specific oncogene

PML-RARA

PML-RARα

promyelocytic leukemia–retinoic acid receptor α

Example of use in text:

The BCR-ABL fusion oncoprotein is the key driver of pathogenesis in most cases of chronic myelogenous leukemia.

14.6.3.2 Tumor Suppressor Genes.

Tumor suppressor genes are “normal genes that slow down cell division, repair DNA mistakes, or tell cells when to die. . . . When tumor suppressor genes don’t work properly, cells can grow out of control, which can lead to cancer.”1 Examples are given in Table 14.6-16.

Table 14.6-16. Examples of Tumor Suppressor Genes and Their Products

Gene

Gene product (aliasᵃ)

Expansion

CDKN1A

CDKN1A (p21ᵃ)

cyclin-dependent kinase (CDK) inhibitor 1A

CDKN1B

CDKN1B (p27ᵃ)

CDK inhibitor 1B

CDKN1C

CDKN1C (p57ᵃ)

CDK inhibitor 1C

DCC

DCC, a transmembrane receptor protein

deleted in colorectal carcinoma

GLTSCR1


glioma tumor suppressor candidate region gene 1

NF1

neurofibromin 1


RB1

Rb protein

retinoblastoma 1

TP53

TP53 (p53ᵃ)

a 53-kd protein

WT1

a zinc finger protein

Wilms tumor 1 (also called Wilms tumor protein)

a Although these gene symbol aliases or nicknames may still be used by some, use of the approved gene symbol, not the alias, is strongly preferred. Such use will minimize confusion and make it possible to provide links to genome databases for online versions of the article and to facilitate data retrieval in a number of databases. If an author insists on using an alias, provide the alias parenthetically after the approved gene symbol at first mention in text and abstract. This practice will link the two and provide a learning experience for those not yet familiar with the approved gene symbol.

Principal Author: Cheryl Iverson, MA

Acknowledgment

Thanks to the following for reviewing and providing comments: W. Gregory Feero, MD, PhD, JAMA, and Maine-Dartmouth Family Medicine Residency, Augusta, Maine; Trevor Lane, MA, DPhil, Edanz Group, Fukuoka, Japan; and John J. McFadden, MA, JAMA Network.

References

1.American Cancer Society. Oncogenes and tumor suppressor genes. Last revised June 25, 2014. Accessed July 31, 2019. https://www.cancer.org/cancer/cancercauses/geneticsandcancer/genesandcancer/genes-and-cancer-oncogenes-tumor-suppressor-genes.html

2.National Human Genome Research Institute Talking Glossary of Genetic Terms. Accessed June 6, 2018. https://genome.gov/glossary

3.V-ERB-B2 avian erythroblastic leukemia viral oncogene homolog 2; ERBB2. OMIM. Updated September 27, 2016. Accessed July 31, 2019. https://omim.org/entry/164870

4.NCI Dictionary of Cancer Terms. Accessed June 6, 2018. https://www.cancer.gov/publications/dictionaries/cancer-terms?cdrid=561237

14.6.4 Human Chromosomes.

Chromosomes are structures in the cell nucleus that consist of short and long arms joined at the centromere. They are composed of chromatin (DNA, RNA, and proteins), which carries genetic information (definition after Nussbaum et al1 and Turnpenny and Ellard2). Structural variation of chromosomes has traditionally been studied by direct visualization of bands, using staining techniques. However, sophisticated fluorescent technologies, such as FISH (fluorescence in situ hybridization),3 are now widely used to probe for structural variations (eg, deletions, duplications, and large-scale copy number variants, as well as insertions, inversions, and translocations)4 (see 14.6.4.4, In Situ Hybridization), leading to important gains in medical diagnosis and research, as well as gene ordering and mapping. Microarray technologies are increasingly being used to detect microdeletions, inversions, deletions, and so on. Sequencing technologies are also becoming better able to detect structural variation. Regardless of the development of these technologies, the essential purpose of cytogenetics remains the same: to study genomic organization and the structure, function, and evolution of chromosomes.

Translocations involve a segment of one chromosome being transferred to a nonhomologous chromosome or to a new site on the same chromosome. They are often associated with negative consequences, such as cancer.5

Structural variation in cancer is different from that seen in germline variation and is clearly related to pathogenesis in some cancers (eg, Philadelphia chromosome; see 14.6.4.5, Marker Chromosomes, Derivative Chromosomes, and the Philadelphia Chromosome).

Formalized standard nomenclature for human chromosomes dates from 1960 and, since 1978, has been known as the International System for Human Cytogenetic Nomenclature (ISCN).

Material in this section is based on recommendations in ISCN 2016.6

Human chromosomes are numbered from largest to smallest from 1 to 22. There are 2 additional chromosomes, X and Y. The numbered chromosomes are known as autosomes, and X and Y as the sex chromosomes. Chromosomes can also be grouped based on similar size and centromere position, as follows6(p8):

Group A

chromosomes 1-3

Group B

chromosomes 4, 5

Group C

chromosomes 6-12, X

Group D

chromosomes 13-15

Group E

chromosomes 16-18

Group F

chromosomes 19, 20

Group G

chromosomes 21, 22, Y

A chromosome may be referred to by number or by group:

chromosome 14

a group D chromosome
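The grouping above amounts to a simple lookup. The following Python sketch encodes the 7 groups as listed; the names CHROMOSOME_GROUPS and group_of are hypothetical and are used here only for illustration.

```python
# Chromosome groups A-G as listed above (members stored as strings).
CHROMOSOME_GROUPS = {
    "A": [str(n) for n in range(1, 4)],            # 1-3
    "B": ["4", "5"],
    "C": [str(n) for n in range(6, 13)] + ["X"],   # 6-12, X
    "D": [str(n) for n in range(13, 16)],          # 13-15
    "E": [str(n) for n in range(16, 19)],          # 16-18
    "F": ["19", "20"],
    "G": ["21", "22", "Y"],
}


def group_of(chromosome: str) -> str:
    """Return the group letter for a chromosome such as '14' or 'X'."""
    for group, members in CHROMOSOME_GROUPS.items():
        if chromosome in members:
            return group
    raise ValueError(f"unknown chromosome: {chromosome!r}")


print(group_of("14"))  # D
```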

14.6.4.1 Chromosome Bands.

Chromosome bands are elicited by multiple staining methods; a band is “a part of a chromosome clearly distinguishable from adjacent parts by virtue of its lighter or darker staining intensity.”6(p9-10) Banding pattern terms in the left-hand column of the following list need not be expanded. Their technique or purpose is shown to the right of the banding pattern.

Q-banding, Q-bands

quinacrine

G-banding, G-bands

Giemsa

R-banding, R-bands

reverse Giemsa

C-banding, C-bands

constitutive heterochromatin

T-banding, T-bands

telomeric

NORs

nucleolus organizing regions

Banding technique codes of several letters provide more information about the banding method. These abbreviations must be expanded, but the letters in the list above (Q, G, R, C, T, NOR) within those terms need not be expanded:

QF

Q bands by fluorescence

QFQ

Q bands by fluorescence using quinacrine

CBG

C bands by barium hydroxide using Giemsa stain

Ag-NOR

NOR staining, silver nitrate technique

Figure 14.6-5 shows a chromosome illustrating bands and subbands at different levels of resolution.

Figure 14.6-5. Frequently Altered Chromosome Territories With Significant Associations to Other Territories in the Discovery Set (37 Associations)a

Image

aFrom Bredel et al.7

The short arm is designated by p, for petit, and the long arm by the next letter of the alphabet, q.6(p11) Arm designations follow the chromosome number:

17p

short arm of chromosome 17

3q

long arm of chromosome 3

Xq

long arm of the X chromosome

Expressions such as those on the left need not be expanded. It is incorrect to refer to chromosome arms as chromosomes:

Acceptable:

chromosome arm 17p

short arm of 17

17p

Not Acceptable:

chromosome 17p

Regions are determined by major chromosome band landmarks. Chromosome arms contain 1 to 4 regions, numbered outward from the centromere. The region number follows the p or the q:

4q3  region 3 of long arm of chromosome 4

The regions are divided into bands, also numbered outward from the centromere. Bands have subdivisions or subbands (these are seen only when the chromosomes are extended). The band number follows the region number, and the subband number follows a period after the band number. When a subband is further subdivided, the sub-subband number follows the subband number without a period or other intervening punctuation. A generic formula for the order shown (with punctuation or no punctuation indicated in brackets) is chromosome[no punctuation]arm[no punctuation]region[no punctuation]band[period]subband[no punctuation]sub-subband. Some examples illustrate this:

11q23

chromosome 11, long arm, band 23 (region 2, band 3)

11q23.3

band in above subdivided, resulting in subband 23.3

20p11.23

chromosome 20, short arm, sub-subband 11.23 (region 1, band 1, subband 2, sub-subband 3)

It is correct usage to refer to the previous expressions as “band 11q23,” “band 11q23.3,” and “band 20p11.23.”
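The order of elements in a band designation can be made concrete with a small parser. The Python sketch below is an illustration under stated assumptions, not an ISCN tool: the names Band and parse_band are hypothetical, regions are limited to 1 through 4 as described above, and subbands and sub-subbands are assumed to be single digits, as in the examples shown.

```python
import re
from typing import NamedTuple, Optional


class Band(NamedTuple):
    chromosome: str
    arm: str
    region: int
    band: Optional[int]
    subband: Optional[int]
    sub_subband: Optional[int]


# chromosome, arm, region, optional band, optional ".subband[sub-subband]"
_BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])([1-4])(?:(\d)(?:\.(\d)(\d)?)?)?$")


def _to_int(digit: Optional[str]) -> Optional[int]:
    return int(digit) if digit is not None else None


def parse_band(text: str) -> Band:
    """Split a designation such as '20p11.23' into its numbered parts."""
    match = _BAND_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognized band designation: {text!r}")
    chrom, arm, region, band, sub, subsub = match.groups()
    return Band(chrom, arm, int(region), _to_int(band), _to_int(sub), _to_int(subsub))


for example in ("4q3", "11q23", "11q23.3", "20p11.23"):
    print(example, parse_band(example))
```

For 20p11.23, the parts returned are chromosome 20, arm p, region 1, band 1, subband 2, and sub-subband 3, matching the expansion given above.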

The centromere is designated band 10, as in the following:

p10

(portion of centromere facing short arm)

q10

(portion of centromere facing long arm)

Visualization of genomic information by chromosome region in humans and other organisms is available at the National Center for Biotechnology Information Genome Data Viewer.8

14.6.4.2 Karyotype.

Karyotype is the chromosome complement of an individual, tissue, or cell line. Karyotype is expressed as the number of chromosomes in a cell, including the sex chromosomes, a description of the sex chromosome composition, and, whenever applicable, any chromosome abnormality.

The karyogram and the idiogram are graphic representations of karyotype. The karyogram is “a systemized array of the chromosomes”6(p7) that has been prepared using methods such as photomicrography. An idiogram is a “diagrammatic representation of a karyotype.”6(p7)

In karyotype expressions, the sex chromosomes, which should always be specified, are separated from the chromosome number by a comma, without an intervening space, as in the following examples:

46,XX

46 chromosomes (2 each of chromosomes 1-22 and 2 X chromosomes in human female karyotype)

46,XY

46 chromosomes (2 each of chromosomes 1-22, 1 X and 1 Y in human male karyotype)

45,X

45 chromosomes (2 each of chromosomes 1-22 and 1 X chromosome) (Turner syndrome)

47,XXY

47 chromosomes (2 each of chromosomes 1-22, 2 X chromosomes, and 1 Y chromosome) (Klinefelter syndrome)

47,XYY

47 chromosomes (2 each of chromosomes 1-22, 1 X chromosome, and 2 Y chromosomes)

69,XXX

69 chromosomes (3 each of chromosomes 1-22 and 3 X chromosomes)

A virgule (forward slash) is used to indicate more than 1 karyotype in an individual, tumor, cell line, and so on:

45,X/46,XX

Descriptions of autosomal chromosome abnormalities are presented after the sex chromosomes and listed in numerical order regardless of aberration type, separated from the sex chromosomes by a comma. For instance, the karyotype of a person with trisomy 21 (Down syndrome) with an extra chromosome 21 is specified as follows:

47,XX,+21

or

47,XY,+21
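Because commas separate the chromosome number, the sex chromosomes, and each abnormality, a simple karyotype can be split into those 3 parts mechanically. The following Python sketch uses a hypothetical function name, parse_karyotype, and assumes a single clone with no commas inside an abnormality term; it is not a full ISCN parser.

```python
from typing import List, Tuple


def parse_karyotype(text: str) -> Tuple[int, str, List[str]]:
    """Split '47,XX,+21' into (47, 'XX', ['+21'])."""
    parts = text.split(",")
    if len(parts) < 2 or not parts[0].isdigit():
        raise ValueError(f"expected 'number,sex chromosomes[,...]': {text!r}")
    count, sex_chromosomes, abnormalities = int(parts[0]), parts[1], parts[2:]
    return count, sex_chromosomes, abnormalities


print(parse_karyotype("47,XX,+21"))
print(parse_karyotype("45,X"))
```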

A karyotype description may contain both constitutional and acquired elements. For instance, the karyotype of a tumor cell from a person with trisomy 21 could show both the constitutional anomaly and an acquired neoplastic anomaly (eg, an acquired extra chromosome 8) and would be expressed as follows:

48,XX,+8,+21c

The lowercase c specifies that the trisomy 21 is constitutional, as distinguished from the acquired trisomy 8.
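The chromosome number in such an expression can be checked arithmetically against the listed whole-chromosome gains and losses. The Python sketch below assumes a diploid autosome set (44 autosomes); that baseline is an assumption of this example, not a rule stated in ISCN, so the check does not apply to a karyotype such as 69,XXX. The function names are hypothetical.

```python
import re
from typing import List

# A whole-chromosome gain or loss such as '+21', '+21c', or '−5'
# (hyphen-minus and the minus sign U+2212 are both accepted).
_WHOLE_RE = re.compile(r"^([+\-−])(\d{1,2}|[XY])c?$")


def expected_count(sex_chromosomes: str, abnormalities: List[str]) -> int:
    """44 autosomes + sex chromosomes + whole-chromosome gains - losses."""
    total = 44 + len(sex_chromosomes)
    for term in abnormalities:
        match = _WHOLE_RE.match(term)
        if match:
            total += 1 if match.group(1) == "+" else -1
    return total


def is_consistent(karyotype: str) -> bool:
    """True when the stated number matches the arithmetic above."""
    count, sex_chromosomes, *abnormalities = karyotype.split(",")
    return int(count) == expected_count(sex_chromosomes, abnormalities)


print(is_consistent("48,XX,+8,+21c"))  # True: 44 + 2 + 2 = 48
```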

An individual with more than 1 karyotypic clone may have a mosaic (single-cell origin) karyotype or a chimera (multicell origin) karyotype, which should be specified with a 3-letter abbreviation at first mention of the karyotype. For example:

mos 45,X/46,XY

chi 46,XX/46,XY

Brackets indicate the number of cells observed in a clone:

chi 46,XX[25]/46,XY[10]

A double slash (virgule, forward slash), used in chimeras that result from bone marrow transplants, separates recipient and donor cell lines. Recipient karyotype precedes the double slash, donor karyotype follows the double slash, and either or both may be specified. For example:

46,XY[3]//

//46,XX[17]

46,XY[3]//46,XX[17]

Three cells from the male recipient were identified, along with 17 cells from the female donor.
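The roles of the single slash, the double slash, and the square brackets can be sketched in Python as follows. The function names parse_clones and parse_transplant are hypothetical, and the sketch handles only the forms shown in the examples above.

```python
import re
from typing import List, Optional, Tuple

Clone = Tuple[str, Optional[int]]

# A clone is a karyotype optionally followed by a cell count in brackets.
_CLONE_RE = re.compile(r"^([^\[\]]+?)(?:\[(\d+)\])?$")


def parse_clones(text: str) -> List[Clone]:
    """Split 'chi 46,XX[25]/46,XY[10]' into (karyotype, cell count) pairs."""
    for prefix in ("mos ", "chi "):
        if text.startswith(prefix):
            text = text[len(prefix):]
    if not text.strip():
        return []
    clones = []
    for piece in text.split("/"):
        match = _CLONE_RE.match(piece.strip())
        if match is None:
            raise ValueError(f"unrecognized clone: {piece!r}")
        karyotype, cells = match.groups()
        clones.append((karyotype, int(cells) if cells is not None else None))
    return clones


def parse_transplant(text: str) -> Tuple[List[Clone], List[Clone]]:
    """Return (recipient clones, donor clones) from 'recipient//donor'."""
    recipient, separator, donor = text.partition("//")
    if not separator:
        raise ValueError("expected a double slash between recipient and donor")
    return parse_clones(recipient), parse_clones(donor)


print(parse_transplant("46,XY[3]//46,XX[17]"))
```

For 46,XY[3]//46,XX[17], the recipient list holds the 46,XY clone with 3 cells and the donor list holds the 46,XX clone with 17 cells.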

For details on order in such expressions, consult ISCN 2016.6

Meiotic karyotypes may begin with a term such as MI and contain a haploid or near-haploid number of chromosomes. A comma separates X and Y if the sex chromosomes are separate but not if they are associated:

MI,23,XY

MI,24,X,Y

14.6.4.3 Chromosome Rearrangements.

The abbreviations and symbols in Table 14.6-17 are used in descriptions of chromosomes, including chromosome rearrangements. The symbols in the list of chromosomes from ISCN 2016 are part of an efficient shorthand that describes the exact changes in a karyotype that contains rearranged chromosomes. In publications that range beyond the field of cytogenetics, the symbols should always be defined.

Table 14.6-17. Chromosome Rearrangement Abbreviations and Symbolsa

Abbreviation

Explanation

AI

first meiotic anaphase

AII

second meiotic anaphase

ace

acentric fragment

add

additional material of unknown origin

arr

microarray

b

break

c

constitutional anomaly

cen

centromere

cgh

comparative genomic hybridization

chi

chimera

chr

chromosome

cht

chromatid

cp

composite karyotype

cx

complex rearrangements

del

deletion

der

derivative chromosome

dia

diakinesis

dic

dicentric

dim

diminished

dip

diplotene

dis

distal

dit

dictyotene

dmin

double minute

dn (de novo)

chromosome abnormality not inherited

dup

duplication

e

exchange

end

endoreduplication

enh

enhanced

fem

female

fis

centric fission

fra

fragile site

g

gap

h

heterochromatin, constitutive

hsr

homogeneously staining region

i

isochromosome

idem

stemline karyotype in a subclone

ider

isoderivative chromosome

idic

isodicentric chromosome

inc

incomplete karyotype

ins

insertion

inv

inversion or inverted

ish

in situ hybridization

lep

leptotene

MI

first meiotic metaphase

MII

second meiotic metaphase

mal

male

mar

marker chromosome

mat

maternal origin

med

medial

min

minute acentric fragment

mos

mosaic

neo

neocentromere

nuc

nuclear or interphase

oom

oogonial metaphase

or

alternative interpretation

p

short arm of chromosome

PI

first meiotic prophase

pac

pachytene

pat

paternal origin

pcc

premature chromosome condensation

pcd

premature centromere division

prx

proximal

ps

satellited short arm of chromosome

psu

pseudo-

pvz

pulverization

q

long arm of chromosome

qdp

quadruplication

qr

quadriradial

qs

satellited long arm of chromosome

r

ring chromosome

rea

rearrangement

rec

recombinant chromosome

rev

reverse, including comparative genomic hybridization

rob

robertsonian translocation

roman numerals


 I

univalent structure

 II

bivalent structure

 III

trivalent structure

 IV

quadrivalent structure

s

satellite

sce

sister chromatid exchange

sdl

sideline

sl

stemline

spm

spermatogonial metaphase

stk

satellite stalk

subtel

subtelomeric region

t

translocation

tas

telomeric association

ter

terminal end of chromosome or telomere

tr

triradial

trc

tricentric chromosome

trp

triplication

upd

uniparental disomy

var

variant or variable region

xma

chiasma(ta)

zyg

zygotene

:

break, in detailed system

::

break and reunion, in detailed system

;

separates altered chromosomes and break points in structural rearrangements involving 2 or more chromosomes; separates probes on different derivative chromosomes

→

from-to, in detailed system

+

additional normal or abnormal chromosomes; increase in length; locus present on a specific chromosome

−

loss; decrease in length; locus absent from a specific chromosome

~

intervals and boundaries of a chromosome segment or number of chromosomes, fragments, or markers

<>

angle brackets for ploidy

[]

square brackets for number of cells or genome build

=

number of chiasmata

×

multiple copies of rearranged chromosomes

?

questionable identification of a chromosome or chromosome structure

/

separates clones or contiguous probes

//

separates chimeric clones

a Adapted from McGowan-Jordan et al,6 with permission of S Karger AG.

Single-letter abbreviations combined with other abbreviations are set closed up:

chte  chromatid exchange

Three-letter symbols combined are set with a space:

cht del

chromatid deletion

psu dic

pseudodicentric
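The two spacing rules just given can be expressed as a short Python sketch. The function name combine is hypothetical, and combinations not covered by the two rules stated here are rejected rather than guessed at.

```python
def combine(first: str, second: str) -> str:
    """Join two abbreviations following the two rules stated above."""
    if len(first) == 1 or len(second) == 1:
        return first + second            # single letter: closed up
    if len(first) == 3 and len(second) == 3:
        return f"{first} {second}"       # two 3-letter symbols: spaced
    raise ValueError("combination not covered by the rules sketched here")


print(combine("cht", "e"))    # chte
print(combine("cht", "del"))  # cht del
print(combine("psu", "dic"))  # psu dic
```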

Chromosome rearrangement terms can be written using a short system or short form. Complex abnormalities are designated by the more specific detailed system or long form. The detailed form uses symbols such as arrows to describe individual derivative chromosomes that result from complex rearrangements (even the short system can result in a complex expression). For example:

Short: 46,XY,t(2;5)(q21;q31)

Long: 46,XY,t(2;5)(2pter→2q21::5q31→5qter;5pter→5q31::2q21→2qter)

The complete nomenclature, formulated for consistency in the description of chromosomal rearrangements, is detailed in ISCN 2016.6 The following sections contain terms that illustrate some of the basic principles of the ISCN. Terms such as these may stand alone or may be part of longer expressions such as those previously listed.

14.6.4.3.1 Order.

For aberrations that involve more than 1 chromosome, the sex chromosome appears first, then other chromosomes in numerical order (or, less commonly, in group order if only the group is specified).

t(X;13)(q27;q12)  translocation involving bands Xq27 and 13q12

For 2 breaks in the same chromosome, the short arm precedes the long arm, and there is no internal punctuation:

inv(2)(p21q31)  inversion in chromosome 2

Exceptions to numerical order convey special conditions; for example, when a piece of one chromosome is inserted into another (3-break rearrangement), the recipient chromosome precedes the donor:

ins(5;2)(p14;q21q31)  insertion of portion of long arm of chromosome 2 into short arm of chromosome 5
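In the short system, the chromosomes in the first set of parentheses correspond, in order, to the breakpoints in the second set. The Python sketch below pairs them; the function name parse_rearrangement is hypothetical, and only simple 2-parenthesis terms such as those shown above are handled.

```python
import re
from typing import List, Tuple

# symbol(chromosomes)(breakpoints), eg, ins(5;2)(p14;q21q31)
_TERM_RE = re.compile(r"^([a-z]+)\(([^()]+)\)\(([^()]+)\)$")
# One breakpoint: arm letter, band digits, optional '.subband' digits.
_BREAK_RE = re.compile(r"[pq]\d+(?:\.\d+)?")


def parse_rearrangement(term: str) -> Tuple[str, List[Tuple[str, List[str]]]]:
    """Return the symbol and (chromosome, breakpoints) pairs for a term."""
    match = _TERM_RE.match(term)
    if match is None:
        raise ValueError(f"unrecognized rearrangement term: {term!r}")
    symbol, chromosomes, breakpoints = match.groups()
    chromosome_list = chromosomes.split(";")
    breakpoint_groups = breakpoints.split(";")
    if len(chromosome_list) != len(breakpoint_groups):
        raise ValueError("chromosomes and breakpoint groups do not pair up")
    pairs = [
        (chrom, _BREAK_RE.findall(group))
        for chrom, group in zip(chromosome_list, breakpoint_groups)
    ]
    return symbol, pairs


print(parse_rearrangement("ins(5;2)(p14;q21q31)"))
```

For ins(5;2)(p14;q21q31), the recipient chromosome 5 is paired with p14 and the donor chromosome 2 with q21 and q31.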

14.6.4.3.2 Plus and Minus Signs.

A plus sign preceding a chromosome indicates addition of the entire chromosome:

+14  entire chromosome 14 gained

A plus sign following p or q and the chromosome number indicates an addition to that chromosome:

14p+  addition to 14p

Such a term is ambiguous; it might refer to one of many possible specific additions to 14p of an individual karyotype, to an unknown addition to 14p, or to additions to 14p in general. A term such as 14p+ may be used after context has been provided. In the case of karyotype descriptions, this means using more specific terms that incorporate symbols, such as add, der, and ins:

Shorter Term: 14p+

Karyotype term: add(14)(p13)

Shorter Term: 14q+

Karyotype term: add(14)(q32)

For example:

The 14q+ cytogenetic abnormality was found to be add(14)(q32).

A minus sign preceding a chromosome signifies loss of the entire chromosome:

−5  all of chromosome 5 missing

A minus sign following a chromosome arm signifies loss from that arm, but this should be reserved for text, whereas more specific notation is used in karyotype descriptions. For example:

Text

Karyotype

5q−

del(5)(q13q31)

A deletion of the entire long arm of a chromosome should not be expressed in text with a minus sign.

del(5q) (not 5q−)

Use more specific terms in karyotypes.

14.6.4.3.3 Punctuation.

Parentheses: The number of the affected chromosome follows the rearrangement symbol in parentheses:

inv(2)  inversion in chromosome 2

Details of the aberration follow in a second set of parentheses:

inv(2)(p13p24)  inversion in chromosome 2 involving bands 13 and 24 of the short arm

Semicolon: In structural rearrangements that involve 2 or more chromosomes, a semicolon is used:

t(2;5)(q21;q31)  translocation involving breaks at 2q21 and 5q31

Comma: Commas separate the chromosome number, sex chromosomes, and each term describing an abnormality:

46,XX,r(18)(p11q22)  female karyotype with ring chromosome 18 with ends joined at bands p11 and q22

14.6.4.3.4 Underlining.

In different clones within the same karyotype, an underline (underscore) distinguishes homologous aberrations of the same chromosome (eg, 2 homologous chromosome 1s):

46,XX,der(1)t(1;3)(p34;q21)/46,XX,der(1̲)t(1;3)(p34;q21)

In manuscripts, authors should indicate that the underline is intended, so that it will not be set as italics, per typographic convention, in the published version.

14.6.4.3.5 Or.

The word or indicates “alternative interpretations of an aberration”6(p48) or alternative results (for instance, breaks that appear in consecutive bands using different techniques):

add(19)(p13 or q13)

add(10)(q22 or q23)

14.6.4.3.6 Spacing.

As seen in previous examples, there is no spacing between the elements of a karyotype description (except after mos and chi, between 2 or more 3-letter abbreviations [eg, cht del, rev ish enh], and before and after “or”).

14.6.4.3.7 Long Karyotypes.

Multiline karyotypes carry over from 1 line of text to the next with no punctuation other than that of the original expression (eg, no hyphen at the end of the first line), as in the following tumor karyotype:

46,XX,t(8;21)(q22;q22)[12]/45,idem,−X[19]/46,idem,

−X,+8[5]/47,idem,−X,+8,+9[8]
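The line-break convention above (break the expression without adding a hyphen or any other character) can be sketched in Python. The function name wrap_karyotype is hypothetical, and breaking only after a comma or slash is a choice made for this illustration; the rule itself requires only that no punctuation be added.

```python
import re
from typing import List


def wrap_karyotype(text: str, width: int) -> List[str]:
    """Break a long karyotype after commas or slashes, adding nothing."""
    lines: List[str] = []
    current = ""
    # Each token is a run of characters plus any trailing ',' or '/'.
    for token in re.findall(r"[^,/]+[,/]*|[,/]+", text):
        if current and len(current) + len(token) > width:
            lines.append(current)
            current = ""
        current += token
    if current:
        lines.append(current)
    return lines


karyotype = (
    "46,XX,t(8;21)(q22;q22)[12]/45,idem,−X[19]/46,idem,"
    "−X,+8[5]/47,idem,−X,+8,+9[8]"
)
for line in wrap_karyotype(karyotype, 50):
    print(line)
```

Joining the returned lines with no separator reproduces the original expression exactly, which is the property the convention is meant to preserve.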

14.6.4.4 In Situ Hybridization.

Style for terms that describe karyotypes identified by means of this technique alone or along with cytogenetic analysis (traditional karyotyping techniques) is similar to that described above (see 14.6.4.3, Chromosome Rearrangements). Some symbol meanings may differ. Table 14.6-18 is adapted from ISCN 2016.6

Table 14.6-18. In Situ Hybridization Abbreviations and Symbolsa

Term

Explanation

amp

amplified signal

arr

microarray

cgh

comparative genomic hybridization

con

connected signals

dim

diminished

enh

enhanced

fib ish

extended chromatin/DNA fiber in situ hybridization

ish

in situ hybridization

nuc ish

nuclear or interphase in situ hybridization

pcp

partial chromosome paint

rev ish

reverse in situ hybridization

sep

separated signals

subtel

subtelomeric region

wcp

whole chromosome paint

;

separates altered chromosomes and break points in structural arrangements that involve >1 chromosome; separates probes on different derivative chromosomes

.

[period] separates various techniques

+

additional normal or abnormal chromosomes; increase in length; locus present on a specific chromosome

++

2 hybridization signals or hybridization regions on a specific chromosome

−

loss; decrease in length; locus absent from a specific chromosome

×

multiple copies of rearranged chromosomes; aberrant polyploidy clones in neoplasias; precedes number of signals seen; multiple copies of a chromosome or chromosomal region

a Adapted from McGowan-Jordan et al,6 with permission of S Karger AG.

Examples are as follows:

46,XY.ish del(22)(q11.2q11.2)(D22S75−)

47,XY,+mar.ish der(8)(D8Z1+)

(D22S75 refers to the probe for the DNA segment sequence D22S75; see 14.6.2, Human Gene Nomenclature.)

14.6.4.5 Marker Chromosomes, Derivative Chromosomes, and the Philadelphia Chromosome.

A marker chromosome “is a structurally abnormal chromosome that cannot be unambiguously identified or characterized by conventional banding cytogenetics”6(p70) and might be included in a karyotype as shown below:

47,XX,+mar

A structurally abnormal chromosome in which any part can be recognized is considered a derivative chromosome, defined as “a structurally rearranged chromosome generated either by a rearrangement involving two or more chromosomes or by multiple aberrations within a single chromosome.”6(p60)

A derivative chromosome is specified in parentheses, followed by the aberrations involved in the generation of the derivative chromosome. The aberrations are not separated by a comma. For instance,

der(1)t(1;3)(p32;q21)t(1;11)(q25;q13)

signifies a derivative chromosome 1 generated by 2 translocations, one involving the short arm with a breakpoint in 1p32 and the other involving the long arm with a breakpoint in 1q25.

For example, Philadelphia chromosome is the name given to a particular derivative chromosome found in chronic myelogenous leukemia and some types of acute leukemia. The Philadelphia chromosome can be abbreviated as Ph chromosome or, if clear in context, Ph. Appendages, as in Ph1, Ph¹, Ph₁, or Ph′, are not necessary, and Ph is the preferred form. The Ph chromosome is the derivative chromosome 22 that results from the translocation t(9;22)(q34;q11.2) and may be described as follows:

der(22)t(9;22)(q34;q11.2)

The Ph chromosome is the result of a rearrangement that juxtaposes the oncogene ABL with the breakpoint cluster region gene BCR (see 14.6.2, Human Gene Nomenclature, and 14.6.3, Oncogenes and Tumor Suppressor Genes).

Principal Author: Cheryl Iverson, MA

Acknowledgment

Thanks to the following for reviewing and providing comments: W. Gregory Feero, MD, PhD, JAMA, and Maine-Dartmouth Family Medicine Residency, Augusta; Trevor Lane, MA, DPhil, Edanz Group, Fukuoka, Japan; and John J. McFadden, MA, JAMA Network. Thanks also to David Song, JAMA Network, for obtaining permissions.

References

1.Nussbaum RL, McInnes RR, Willard HF. Thompson & Thompson Genetics in Medicine. 8th ed. Saunders; 2016.

2.Turnpenny PD, Ellard S. Emery’s Elements of Medical Genetics. 14th ed. Churchill Livingstone; 2012.

3.Riegel M. Human molecular cytogenetics: from cells to nucleotides. Genet Mol Biol. 2014;37(suppl 1):194-209.

4.Feuk L, Carson AR, Scherer SW. Structural variation in the human genome. Nat Rev Genet. 2006;7(2):85-97. doi:10.1038/nrg1767

5.O’Connor C. Human chromosome translocations and cancer. Nature Educ. 2008;1(1):56.

6.McGowan-Jordan J, Simons A, Schmid M, eds. ISCN 2016: An International System for Human Cytogenetic Nomenclature (2016). S Karger AG; 2016.

7.Bredel M, Scholtens DM, Harsh GR, et al. A network model of a cooperative genetic landscape in brain tumors. JAMA. 2009;302(3):261-275. doi:10.1001/jama.2009.99

8.Genome Data Viewer. Accessed July 31, 2019. https://www.ncbi.nlm.nih.gov/genome/gdv/

14.6.5 Nonhuman Genetic Terms.

Comparative genome analysis has shown that eukaryote species share genes to a great extent.1 Therefore, similar or identical names designate the same gene across species whenever possible. Italicization of gene symbols is uniformly observed.

14.6.5.1 Vertebrates.

Animal gene symbols resemble human gene symbols (see 14.6.2, Human Gene Nomenclature).2,3 However, unlike human gene symbols, animal gene symbols typically use or include lowercase letters and punctuation marks.

Gene terms for the laboratory mouse (Mus musculus domesticus) and laboratory rat (Rattus norvegicus), often seen in medical publications because of the common use of those species in investigating diseases that affect humans, are prototypic of such style.

14.6.5.1.1 Mouse and Rat Gene Nomenclature.

Mouse and rat gene nomenclature guidelines were unified in 2003 by the International Committee on Standardized Genetic Nomenclature for Mice and the Rat Genome and Nomenclature Committee.4

Mouse and rat gene symbols resemble human symbols in several respects.4,5 They are descriptive, short (typically 3-5 characters), and italicized. Symbols begin with letters not numbers. They contain roman letters in place of Greek letters and arabic numerals in place of roman numerals.

Mouse and rat gene symbols differ from human symbols in the use of lowercase letters. Symbols usually contain an initial capital. Capital letters within a mouse gene symbol may indicate the laboratory code (see 14.6.5.1.4, Laboratory Codes) or code for another species/vector. A symbol with all lowercase letters (ie, no initial capital) indicates a recessive trait. Mouse and rat gene symbols may contain hyphens and other punctuation.

The central source for mouse gene terms is the Mouse Genome Database,6 and for rats, RATMAP: Rat Genome Database3 (Box 14.6-1). Gene names and symbols may be verified by means of the search features at those sites.

Box 14.6-1. Resources/Websites for Nonhuman Species

Website (reference)

URL

Description

ArkDb2

Now closed. See Hu et al2

General genomics and proteomics databases: resources for human, goat, mouse, deer, rat, and horse genomes

RATMAP: Rat Genome Database3

https://rgd.mcw.edu/

Genetic, genomic, phenotype, and disease data generated from rat research; also provides access to corresponding human and mouse data for cross-species comparisons

MGI: Mouse Genome Informatics6

www.informatics.jax.org

Official names for mouse genes, alleles, and strains

FlyBase10

http://flybase.org

Database of Drosophila genes and genomes

WormBase12

https://www.wormbase.org/

Genetics, genomics, and biology of Caenorhabditis elegans and related nematodes

OMIA13

https://omia.org/

Catalog/compendium of inherited disorders, other traits, and genes in animal species other than human, mouse, and rat

SGD15

https://www.yeastgenome.org/nomenclature-conventions

Comprehensive integrated biological information for the budding yeast Saccharomyces cerevisiae

Entrez Genomes18

https://www.ncbi.nlm.nih.gov/genome

More than 3000 completely sequenced organisms, including Archaea, bacteria, eukaryotes, viruses, viroids, and plasmids

Maize Genetics and Genomics Database21

https://www.maizegdb.org/

Federally funded informatics service to researchers focused on the crop and plant and model organism Zea mays

Rice Genome Annotation Project22

rice.plantbiology.msu.edu/

National Science Foundation–sponsored database that provides sequence and annotations for the rice genome

SoyBase and the Soybean Breeder’s Toolbox23

https://soybase.org/

Repository for genetics, genomics, and related data sources for the soybean

Style rules and conventions for mouse and rat gene symbols are given in Tables 14.6-19 through 14.6-21. (Note: The gene descriptions in the tables that follow are based on but not identical to the approved gene names available in the Mouse Genome Informatics database,7 which are more complete and do not use Greek letters and other typographic variants. For instance, in searching for a term with α online, one would type “alpha.”) Note that a given letter or letter combination often, but not always, signifies conventional usage. For instance, l at or near the end of a symbol often, but not always, indicates “like.” OrthoMaM (Orthologous Mammalian Markers),8 a database of orthologous mammalian markers, allows comparative searches of more than 40 vertebrate species. It can be queried to better understand the evolutionary dynamics of genes.
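The advice about typing “alpha” for α can be sketched as a small helper that prepares a description for an online search. The mapping and the function name are hypothetical, the list of Greek letters is deliberately short, and the result is only a search string, not the approved gene name.

```python
# Hypothetical helper: spell out Greek letters before searching online.
GREEK_NAMES = {
    "α": "alpha",
    "β": "beta",
    "γ": "gamma",
    "δ": "delta",
    "μ": "mu",
}


def to_search_text(description: str) -> str:
    """Replace each Greek letter in a gene description with its spelled-out name."""
    for letter, name in GREEK_NAMES.items():
        description = description.replace(letter, name)
    return description
```

With this sketch, to_search_text("α-fetoprotein") returns "alpha-fetoprotein". The approved name in the database may still differ in spacing or punctuation, so the output should be treated as a query, not a citation form.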

Table 14.6-19. Style Rules for Mouse Gene Symbols and Comparison With Human Gene Symbols (Examples)

Mouse gene symbol

Mouse gene description

Rule illustrated

Human gene symbol (when known)

a

nonagouti

lowercase initial letter because named for mutant recessive trait

ASIP

Afp

α-fetoprotein

initial capital, otherwise lowercase, Greek letter changed to roman

AFP

B2m

β2-microglobulin

no subscript

B2M

Gla

α-galactosidase

Greek letter changed to roman and moved to end of symbol

GLA

Gt(ROSA)26Sor

gene trap, ROSA 26, Philippe Sorianoa

parentheses may be used


Rn4.5s

4.5S RNA

period permissible


Rn5s

5S RNA

symbol does not begin with number

RN5S1@ (@ signifies gene family; see 14.6.2, Human Gene Nomenclature)

a The eponymous naming of genes is not uncommon.

Table 14.6-20. Examples of Mouse Gene Symbols Compared With Human Gene Symbols

Mouse gene symbol

Mouse gene description

Convention illustrated

Human gene symbol (when available)

Brca1

breast cancer 1

same as human symbol except for case

BRCA1

Cafq1

caffeine metabolism QTL 1

q: quantitative trait locus


C4bp-ps1

complement component 4 binding protein, pseudogene 1

-ps: pseudogene

C4BPB

D10Mit1

DNA segment, Chr 10, Massachusetts Institute of Technology 1

symbol for DNA segment identified only in the mouse; includes laboratory code (see 14.6.5.1.4, Laboratory Codes)


D17H21S56

DNA segment, Chr 17, human D21S56

H21 indicates DNA segment resides on human chromosome 21

D21S56

G6pdx

glucose-6-phosphate dehydrogenase X-linked

similar but not identical to human gene symbol

G6PD

Gna-rs1

guanine nucleotide binding protein, related sequence 1

-rs: related sequence

GNL1

Gtl10

gene trap locus 10

Gt: gene trap


Gt(ROSA)26Sor

gene trap ROSA 26, Philippe Soriano

vector in parentheses; laboratory code indicated (see 14.6.5.1.4, Laboratory Codes)


H2-Aa

histocompatibility 2, class II antigen A, α


HLA-DQA1

Hbb

hemoglobin β-chain complex

same as human symbol except for case

HBB

Hc9

heterochromatin, Chr 9

Hc: heterochromatin


Hras1

Harvey rat sarcoma virus oncogene 1

see 14.6.3, Oncogenes and Tumor Suppressor Genes

HRAS

Ighmbp2 (formerly nmd)

immunoglobulin heavy chain μ binding protein 2 (formerly neuromuscular degeneration)

name change with new information about gene

IGHMBP2

l17Wis9

lethal, Chr 17, University of Wisconsin 9

initial l: lethal


Lamb1-1

β1 laminin, subunit 1

hyphen separates 2 adjacent numbers

LAMB1

Lzp-s

P lysozyme structural

s: structural


mt-Rnr1

12S RNA, mitochondrial

mt: mitochondrial

MT-RNR1

Mcptl

mast cell protease–like

l: like


Nidd1, Nidd2, Nidd3, Nidd4

non–insulin-dependent diabetes mellitus 1, 2, 3, 4

same stem (root) for gene families


Nup160

nucleoporin 160

name change (formerly Gtl1-13)

NUP160

Rnr13

rRNA, chromosome 13 cluster



Tcrb

T-cell receptor β-chain


TRB@ (formerly TCRB; @ signifies gene family or cluster; see 14.6.2, Human Gene Nomenclature)

Tel10p

telomeric sequence, Chr 10, centromere end

Tel: telomere; 10: Chr 10; p: short arm


Tg(APOE)1Vln

transgene insertion 1, Fred Van Leuven

Tg: transgene; parenthetic material: inserted gene, in this case the human gene APOE; Vln: founder or “laboratory of” designation


Table 14.6-21. Conventions for Mouse Gene Symbols Identified in Collaborative Sequencing Efforts (Examples)a

Mouse gene symbol

Mouse gene description

Convention illustrated

Human gene symbol (when available)

0610005C13Rik

RIKEN cDNA 0610005C13 gene

RIKEN symbol assigned to sequence that does not match known genes in other species; Rik: RIKEN Institute, Japan


Cdc42ep3

CDC42 effector protein (rho GTPase binding) 3; formerly 3200001F04Rik

RIKEN symbol changed when gene identified in another organism

CDC42EP3

BC023055

cDNA sequence BC023055

BC indicates sequence from Mammalian Gene Collection of the National Institutes of Health

C10orf83

Aldob

aldolase 2, B isoform, formerly BC016435

Mammalian Gene Collection symbol changed when gene identified in another organism

ALDOB

AF179933

cDNA sequence AF179933

GenBank symbol for genes with no other information available in other organisms or sequencing efforts


Ppt2

palmitoyl-protein thioesterase 2, formerly AA672937 and 0610007M19Rik

GenBank sequence ID withdrawn when gene identified in other organism

PPT2

a See Database Identifiers for Genomic Sequences in 14.6.1, Nucleic Acids and Amino Acids.
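The symbol sources illustrated in Table 14.6-21 can be told apart roughly by their shape. The sketch below is a heuristic inferred only from the examples in the table (a Rik suffix, a BC prefix followed by 6 digits, and a GenBank-style accession of 1 or 2 capital letters plus 5 or 6 digits). The function name and the patterns are assumptions, not rules taken from the nomenclature guidelines.

```python
import re


def classify_provisional_symbol(symbol: str) -> str:
    """Guess the origin of a sequence-derived mouse gene symbol from its shape."""
    if symbol.endswith("Rik"):
        return "RIKEN cDNA clone"
    # Check the BC prefix first; it would otherwise match the GenBank pattern.
    if re.fullmatch(r"BC\d{6}", symbol):
        return "Mammalian Gene Collection cDNA sequence"
    if re.fullmatch(r"[A-Z]{1,2}\d{5,6}", symbol):
        return "GenBank sequence identifier"
    return "other (possibly an assigned gene symbol)"
```

A symbol that falls into the last category, such as Ppt2, is likely an assigned gene symbol that has replaced an earlier sequence-based one. The current status of any symbol should be confirmed in the Mouse Genome Informatics database.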

14.6.5.1.2 Mouse Alleles.

A mouse allele symbol consists of a mouse gene term often, but not always, with a superscript. As with mouse gene terms, mouse allele terms are italicized.

Allele symbols can be verified within the records of a mouse gene:

■Search for the gene symbol at http://www.informatics.jax.org/marker

■Select the link for the gene symbol that has been located

■Under Phenotypes, select Phenotypic Diseases

Conventions and rules for mouse allele symbols are shown in Table 14.6-22.

Table 14.6-22. Rules and Conventions for Mouse Allele Terms (Examples)

Allele symbol

Allele name

Convention or rule illustrated

abn

abnormal

recessive trait, thus begins with lowercase; because there is no superscript indicating an allelic term, use context to clarify

Dbf

doublefoot

dominant trait, thus begins with capital; because there is no superscript indicating an allelic term, use context to clarify

Dnahc11^iv

situs inversus viscerum allele of dynein, axon, heavy chain 11 gene

allele superscript designation is lowercase (recessive)

Ins2^Akita

Akita allele of insulin 2 gene

allele superscript designation has initial capital (dominant)

Lama2^dy-2J

dystrophia muscularis allele, Jackson 2, of α2-laminin gene (second allele discovered at the Jackson Laboratory)

laboratory code included in superscript (see 14.6.5.1.4, Laboratory Codes); hyphens used

Matp^Uw-dbr

underwhite dominant brown alleles of membrane-associated transporter protein gene

multiple alleles separated by hyphen in superscript

In a phenotype expression, a superscript plus sign indicates wild type, for example,

Nf1^tm1Fcr/Nf1^+

which indicates a phenotype with a mutant neurofibromatosis allele (targeted mutation 1, Frederick Cancer Research and Development Center) and the wild-type neurofibromatosis allele.
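When allele symbols must be handled in plain text, where superscripts are unavailable, the superscript portion needs some explicit marker. The sketch below assumes, purely for illustration, that the superscript follows a caret (eg, Ins2^Akita) and that a virgule separates the 2 alleles. The class and function names are hypothetical. The case hint covers only the simple dominant or recessive convention in Table 14.6-22, not designations such as targeted mutations.

```python
from typing import List, NamedTuple, Optional


class Allele(NamedTuple):
    gene: str
    superscript: Optional[str]  # None when the symbol has no superscript


def parse_allele(text: str) -> Allele:
    """Split 'Gene^superscript' into its gene and superscript parts."""
    gene, caret, superscript = text.partition("^")
    return Allele(gene, superscript if caret else None)


def parse_genotype(text: str) -> List[Allele]:
    """Split 'allele/allele' at the virgule and parse each side."""
    return [parse_allele(part) for part in text.split("/")]


def case_hint(allele: Allele) -> str:
    """Read the case of the first letter, as in the simple cases of Table 14.6-22."""
    if allele.superscript == "+":
        return "wild type"
    marker = allele.superscript if allele.superscript else allele.gene
    if marker[:1].islower():
        return "lowercase initial (recessive by convention)"
    return "initial capital (dominant by convention)"
```

For instance, parse_genotype("Nf1^tm1Fcr/Nf1^+") yields one allele with the superscript tm1Fcr and one with the superscript +, and case_hint reports the second as wild type.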

14.6.5.1.3 Mouse Chromosomes.

Chromosome nomenclature is similar for mice and humans (see 14.6.4, Human Chromosomes). However, in mice, rearrangement terms are capitalized. The following listing and subsequent examples are from the International Committee on Standardized Genetic Nomenclature for Mice4:

Cen

centromere

Del

deletion

Df

deficiency

Dp

duplication

Hc

pericentric heterochromatin

Hsr

homogeneous staining region

In

inversion

Is

insertion

MatDf

maternal deficiency

MatDi

maternal disomy

MatDp

maternal duplication

Ms

monosomy

Ns

nullisomy

PatDf

paternal deficiency

PatDi

paternal disomy

PatDp

paternal duplication

Rb

robertsonian translocation

T

translocation

Tc

transchromosomal

Tel

telomere

Tet

tetrasomy

Tg

transgenic insertion

Tp

transposition

Ts

trisomy

UpDf

uniparental deficiency

UpDi

uniparental disomy

UpDp

uniparental duplication

As with human chromosomes, lowercase p represents the short arm and lowercase q the long arm. When specific chromosomes are referred to, the word Chromosome is capitalized (and abbreviated Chr after first mention), for example:

Human chromosome 1 shows extensive homology to several mouse chromosomes, especially Chromosome (Chr) 4 and Chr 1.

Chromosome anomaly symbols usually include a unique laboratory code (see 14.6.5.1.4, Laboratory Codes) and a series number, for example:

In5Rk

fifth inversion found by Roderick

T37H

37th translocation found at Harwell

Chromosome number appears in parentheses:

In(2)5Rk inversion in Chr 2

Semicolons separate numbers of chromosomes involved in translocations:

T(4;X)37H translocation involving Chr 4 and Chr X

Periods indicate the centromere in robertsonian translocations:

Rb(9.19)163H robertsonian translocation that involves Chr 9 and Chr 19

In insertions, the donor chromosome number comes first:

Is(7;1)40H insertion from Chr 7 to Chr 1

For further rules and conventions for chromosomes, see the Chromosome Nomenclature section of the Mouse Genome Informatics website.4
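The structure of the chromosome anomaly symbols shown above (rearrangement term, optional chromosome numbers in parentheses, series number, laboratory code) can be sketched as a small parser. The names Anomaly and parse_anomaly are hypothetical, the lookup table holds only a few of the terms listed earlier, and the pattern is inferred from the examples in this section, not from the full guidelines.

```python
import re
from typing import NamedTuple, Tuple

# A subset of the rearrangement terms listed above.
REARRANGEMENTS = {
    "Del": "deletion",
    "Dp": "duplication",
    "In": "inversion",
    "Is": "insertion",
    "Rb": "robertsonian translocation",
    "T": "translocation",
    "Tp": "transposition",
}


class Anomaly(NamedTuple):
    kind: str
    chromosomes: Tuple[str, ...]
    series: int
    lab_code: str


ANOMALY = re.compile(
    r"(?P<kind>[A-Z][A-Za-z]*)"      # rearrangement term, eg, In, T, Rb
    r"(?:\((?P<chrs>[^)]+)\))?"      # optional chromosome numbers
    r"(?P<series>\d+)"               # series number
    r"(?P<lab>[A-Z][A-Za-z]*)"       # laboratory code
)


def parse_anomaly(symbol: str) -> Anomaly:
    """Split a chromosome anomaly symbol into its parts."""
    match = ANOMALY.fullmatch(symbol)
    if match is None:
        raise ValueError(f"unrecognized anomaly symbol: {symbol!r}")
    # Semicolons (translocations, insertions) and periods (robertsonian
    # translocations) both separate chromosome numbers.
    chrs = tuple(re.split(r"[;.]", match["chrs"])) if match["chrs"] else ()
    kind = REARRANGEMENTS.get(match["kind"], match["kind"])
    return Anomaly(kind, chrs, int(match["series"]), match["lab"])
```

For an insertion such as Is(7;1)40H, the first chromosome returned is the donor, following the order rule stated above.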

14.6.5.1.4 Laboratory Codes.

Laboratory registration codes appear as 1- to 5-letter symbols in animal genetic terms, including chromosomal, DNA locus, and mouse strain nomenclature (see below). Such codes help identify specific colonies, useful in genetic studies that can extend over many generations. Laboratory codes are registered with the Institute for Laboratory Animal Research at the National Academy of Sciences in Washington, DC.9 These codes uniquely identify an investigator, laboratory, or institution that produces or maintains an animal strain. Laboratory codes have initial capitals and appear without expansion. Examples are as follows:

Arb

Arthritis and Rheumatism Branch, National Institute of Arthritis and Musculoskeletal and Skin Diseases

Ddd

University of Durham, Drug Dependence Group

J

The Jackson Laboratory

Jr

John Rapp

Kyo

Kyoto University

Maar

Silvère van Maarel, Leiden University Medical Center

McW

Medical College of Wisconsin

N

National Institutes of Health

Ty

Benjamin A. Taylor, The Jackson Laboratory

Wil

Jean Wilson, University of Texas
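The format described above (1 to 5 letters with an initial capital) can be checked with a short pattern. This sketch tests only the shape of a code. It cannot confirm that a code is actually registered, and the function name is hypothetical.

```python
import re

# 1 to 5 letters, the first of which is a capital.
LAB_CODE = re.compile(r"[A-Z][A-Za-z]{0,4}")


def is_plausible_lab_code(code: str) -> bool:
    """Return True if the text has the shape of a laboratory registration code."""
    return LAB_CODE.fullmatch(code) is not None
```

All of the examples listed above, including McW with its internal capital, pass this check. Whether a code is in use should be verified in the International Laboratory Code Registry.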

14.6.5.1.5 Mouse Strains.

Mouse strain names6 are registered at the Mouse Genome Informatics website. Mouse strain names are available at the International Committee on Standardized Genetic Nomenclature for Mice database.4 (Rat strain names are registered at the Rat Genome Database.3)

Mouse strain names consist of capital letters or combinations of capital letters and numbers:

A

BXH

CBA

C57BL

FVB

HDA32

A few earlier strains have names that are entirely numeric, for example:

129

A substrain is indicated by a term following the strain name after a virgule, usually the laboratory registration codes (see above), for example:

129/J

A/J

atherosclerosis in CBA/J mice

FVB/N mice used as controls

A serial number may precede the laboratory code, such as the 10 before the J in this example:

C57BL/610J

(Note: The 6 belongs to the substrain name.)

Exceptions to the initial capital after the virgule exist in the case of 2 well-known strains (not substrains) of mouse:

BALB/c

C57BR/cd

Many standard laboratory mouse strains are derived from crosses dating back to the early 20th century or even older lines, and the names reflect abbreviations for characteristics:

A

albino

BALB

Bagg, albino

DBA

dilute, brown, nonagouti

However, mouse strain names are not expanded.

Strain names may be abbreviated using approved abbreviations, for example:

B

C57BL

C

BALB/c

Note that some abbreviations are the same as some names of different strains (eg, the strain C and the abbreviation C), so context must clarify. Additional abbreviations are available at the International Committee on Standardized Genetic Nomenclature for Mice and Rat Genome and Nomenclature Committee.4

Abbreviations and the letter X are used to indicate recombinant inbred strains (female parental strain first), for example:

CXB BALB/c x C57BL

Capital F followed by a number in parentheses may appear after a strain designation to indicate the number of inbred generations:

F(20) 20 inbred generations

For further guidelines on mouse strain nomenclature, see the Mouse Genome Informatics website.4
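The strain conventions in this section can be sketched in a few lines. The sketch assumes that a virgule separates strain from substrain except in the 2 named exceptions, and it expands recombinant inbred names using only the 2 abbreviations given above. All names in the code are hypothetical, and the abbreviation table is far from complete.

```python
from typing import Optional, Tuple

# Strain names in which the text after the virgule is part of the strain name.
WHOLE_NAME_EXCEPTIONS = {"BALB/c", "C57BR/cd"}

# Only the approved abbreviations mentioned in this section.
STRAIN_ABBREVIATIONS = {"B": "C57BL", "C": "BALB/c"}


def split_strain(name: str) -> Tuple[str, Optional[str]]:
    """Return (strain, substrain); substrain is None when none is given."""
    if name in WHOLE_NAME_EXCEPTIONS:
        return name, None
    strain, virgule, substrain = name.partition("/")
    return strain, (substrain if virgule else None)


def expand_recombinant_inbred(name: str) -> str:
    """Expand a name such as CXB into its parental strains, female first."""
    female, x, male = name.partition("X")
    if not x or female not in STRAIN_ABBREVIATIONS or male not in STRAIN_ABBREVIATIONS:
        raise ValueError(f"cannot expand {name!r} with the known abbreviations")
    return f"{STRAIN_ABBREVIATIONS[female]} x {STRAIN_ABBREVIATIONS[male]}"
```

Because some abbreviations coincide with the names of other strains, a lookup like this can only suggest an expansion; context must confirm it.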

14.6.5.2 Invertebrates.

14.6.5.2.1 Drosophila melanogaster.

Gene symbols for the fruit fly Drosophila melanogaster are generally capital and lowercase and, for recessive phenotypes, all lowercase. This convention is also observed for gene names. Gene symbols may include punctuation.10 Nomenclature rules and symbol search are available at FlyBase10 (Box 14.6-1). Examples are as follows:

Ppi

Preproinsulinlike

SerT

Serotonin transporter

su(Hw)

Suppressor of Hairy wing

tRNA:S7:23Ea

Transfer RNA:ser7:23Ea (ser7: seventh isoform of serine; 23E: map position)

As with mouse alleles, Drosophila alleles are indicated with superscripts:

Hn^r, Hn^r2 (Henna gene, eye color–defective alleles)

14.6.5.2.2 Caenorhabditis elegans.

The gene symbols for this nematode (roundworm) (Box 14.6-1) consist of 3 lowercase letters, a hyphen, an arabic numeral (sometimes a decimal), and, sometimes, a roman numeral after a space11,12:

dpy-1

dpy-5 I

let-37 X

sir-2.1

Parentheses indicate mutation in the gene:

let-37(mn138)

Mutation symbols consist of 1- or 2-letter terms plus a number:

mn138

A characteristic of a mutation may be indicated by a 2-letter ending set in roman type:

hc17 ts (ts: temperature sensitive)
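The pattern described in this section (3 lowercase letters, a hyphen, an arabic numeral that may be a decimal, an optional mutation in parentheses, and an optional roman numeral after a space) can be sketched as a regular expression. The function name is hypothetical, and the order in which a mutation and a roman numeral would appear together is an assumption, because the examples above never combine them.

```python
import re
from typing import Dict, Optional

WORM_GENE = re.compile(
    r"(?P<gene>[a-z]{3}-\d+(?:\.\d+)?)"         # eg, dpy-1, sir-2.1
    r"(?:\((?P<mutation>[a-z]{1,2}\d+)\))?"     # eg, (mn138)
    r"(?: (?P<chromosome>[IVX]+))?"             # eg, a trailing I or X
)


def parse_worm_gene(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the parts of a C elegans gene term, or None if it does not fit."""
    match = WORM_GENE.fullmatch(text)
    return match.groupdict() if match else None
```

A term such as Dpy-1, with an initial capital, does not fit the pattern and returns None.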

14.6.5.3 Online Mendelian Inheritance in Animals.

Online Mendelian Inheritance in Animals (OMIA) (Box 14.6-1) is the counterpart to Online Mendelian Inheritance in Man (OMIM; see 14.6.2, Human Gene Nomenclature)13,14 and includes a database of inherited disorders, other traits, and genes in animal species other than humans, mice, and rats.

14.6.5.4 Microorganism Gene Nomenclature.

14.6.5.4.1 Yeasts.

Gene symbols for the fungus Saccharomyces cerevisiae (Box 14.6-1) consist of 3 capital letters plus a number (or, occasionally, a number-letter) ending,15 for example:

ACT1

actin

CDC25

adenylate cyclase regulatory protein

COX5A

cytochrome c oxidase chain Va

This represents a change from earlier style in which all-lowercase symbols were used for loci named for recessive mutations and all-capital symbols for loci named for dominant mutations. Allele symbols still follow the case convention (ie, capital for dominant, lowercase for recessive).
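As a rough illustration of the 2 yeast conventions just described, the sketch below checks the shape of a locus symbol and reads the case of an allele symbol. Both function names are hypothetical, the pattern is inferred from the 3 examples above, and any lowercase allele symbol passed in is an invented input, not one taken from the Saccharomyces Genome Database.

```python
import re

# 3 capital letters, a number, and occasionally a trailing letter.
YEAST_LOCUS = re.compile(r"[A-Z]{3}\d+[A-Z]?")


def is_yeast_locus_symbol(symbol: str) -> bool:
    """Return True if the text has the shape of a yeast locus symbol."""
    return YEAST_LOCUS.fullmatch(symbol) is not None


def allele_case_hint(symbol: str) -> str:
    """Apply the case convention for alleles: capital dominant, lowercase recessive."""
    if symbol.isupper():
        return "dominant"
    if symbol.islower():
        return "recessive"
    return "undetermined"
```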

14.6.5.4.2 Bacterial Gene Nomenclature.

Gene terms typically consist of an italicized lowercase 3-letter abbreviation often with an uppercase locus designator. The phenotype or encoded entity (eg, enzyme) is in all roman letters with an initial capital.16,17 See examples below.

araA

AraA (L-arabinose isomerase)

asr

Asr (acid shock protein)

imp (formerly ostA)

OstA (organic solvent intolerance; imp: increased membrane permeability)

katE

KatE (catalase)

sodA

SodA (superoxide dismutase, manganese)

sodB

SodB (superoxide dismutase, iron)

The genetic nomenclature for bacteriophages is different from that for bacteria; there may be a separate convention for each phage.17

A number of bacterial genome databases are available on the internet. The National Center for Biotechnology Information sponsors Entrez Genomes18 (select Gene, then search for the gene in question) (Box 14.6-1).

Alleles are designated with a number after the uppercase letter or following a hyphen, when not assigned to a locus. Wild-type alleles are designated with a superscript plus sign, mutant phenotypes with a superscript minus sign:

ara^+

araA1

ara-23

sodA1
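The relationship between a bacterial gene term and its parts, and between the gene and its product, can be sketched as follows. The pattern (3 lowercase letters, an optional capital locus designator with an optional allele number, or a hyphen and number for an allele not assigned to a locus) is inferred from the examples above. The function names are hypothetical, and superscript plus or minus signs are not handled.

```python
import re
from typing import Optional, Tuple

BACTERIAL_GENE = re.compile(
    r"(?P<abbr>[a-z]{3})"
    r"(?:(?P<locus>[A-Z])(?P<allele>\d+)?"   # eg, araA, araA1
    r"|-(?P<unassigned>\d+))?"               # eg, ara-23
)


def parse_bacterial_gene(symbol: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (abbreviation, locus designator, allele number)."""
    match = BACTERIAL_GENE.fullmatch(symbol)
    if match is None:
        raise ValueError(f"unrecognized gene term: {symbol!r}")
    return match["abbr"], match["locus"], match["allele"] or match["unassigned"]


def product_name(gene: str) -> str:
    """Form the product name: same letters, initial capital (set in roman type)."""
    return gene[:1].upper() + gene[1:]
```

The product_name helper reflects typography only. Renamed genes such as imp (formerly ostA), whose product is still given as OstA, show why the output needs checking against a database.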

14.6.5.4.3 Retroviral Gene Nomenclature.

HIV and other retroviruses contain 3 main structural genes and a number of regulatory genes19 (see 14.6.3, Oncogenes and Tumor Suppressor Genes):

Structural:

env

envelope gene

gag

group-specific core antigen gene

pol

polymerase gene

Regulatory:

nef

negative factor

rev

regulator of viral protein expression

tat

transactivator of viral transcription

vif

viral infectivity

vpr

viral protein R

vpu

viral protein U

vpx

viral protein X

Compare typographic style (Table 14.6-23) of gene names and their products (p stands for protein, gp for glycoprotein).

Table 14.6-23. Some Examples of Typographic Style of Gene Names and Their Products

Gene

Gene product (protein or polypeptide)

Protein products (examples)a

env

Env

gp41, gp120

gag

Gag

p6, p7, p17, p24

pol

Pol

p12, p32, p66/51

nef

Nef

p27

rev

Rev

p19

tat

Tat

p14

vif

Vif

p24

vpr

Vpr

p15

vpu

Vpu

p16

vpx

Vpx

p14

a A helpful resource for protein nomenclature is UniProt,20 a central resource for functional information on proteins, including amino acid sequence, protein name or description, taxonomic data, and citation information.

14.6.5.4.4 Plant Genetics.

Plants are extremely important food sources, and genetic alteration of plants is increasingly used to confer disease and pest resistance as well as to enhance the nutritional value of food crops. Such genetically modified organisms in food sources have generated controversy and relate to biomedicine. Included below are a few guidelines for 3 common food sources for which complete genome sequence data are available: corn (maize), rice, and soybeans.

Corn: The name and symbol of the gene should be lowercase and italic, eg, defective kernel12, dek12. Note: There is no hyphen between the gene name and the numerical suffix.21

Rice: A transcription unit, equivalent to a gene or locus, uses the naming scheme x.tyyyy, where x refers to the pseudomolecule assembly identifier and yyyy to the distinct identifier of the transcription unit.22

Soybeans: The full locus identifier can be used as part of each gene name, or a locus name can be provided separately to describe a set of genes, for example:

We studied Glyma.01g123450 in genotype, assembly, and annotation version Glyma.Wm82.a2.v1.

Thereafter, the shorter locus name, Glyma.01g123450, may be used.23

Principal Author: Cheryl Iverson, MA

Acknowledgment

Thanks to the following for reviewing and providing comments: W. Gregory Feero, MD, PhD, JAMA, and Maine-Dartmouth Family Medicine Residency, Augusta; Trevor Lane, MA, DPhil, Edanz Group, Fukuoka, Japan; and Garth D. Ehrlich, PhD, Center for Advanced Microbial Processing, Drexel University College of Medicine, Philadelphia, Pennsylvania.

References

1.Gene Ontology Consortium. Accessed June 7, 2018. http://www.geneontology.org/

2.Hu J, Mungall C, Law A, et al. The ARKdb: genome databases for farmed and other animals. Nucleic Acids Res. 2001;29(1):106-110. doi:10.1093/nar/29.1.106

3.RatMapGroup. RATMAP: Rat Genome Database. Accessed June 7, 2018. https://rgd.mcw.edu/

4.International Committee on Standardized Genetic Nomenclature for Mice and Rat Genome and Nomenclature Committee. Guidelines for nomenclature of mouse and rat strains. Revised January 2016. Accessed July 31, 2019. www.informatics.jax.org/mgihome/nomen/strains.shtml

5.Maltais LJ, Blake JA, Chu T, Lutz CM, Eppig JT, Jackson I. Rules and guidelines for mouse gene, allele, and mutation nomenclature: a condensed version. Genomics. 2002;79(4):471-474. doi:10.1006/geno.2002.6747

6.Jackson Laboratory. MGI: Mouse Genome Informatics. Updated May 29, 2018. Accessed July 31, 2019. www.informatics.jax.org

7.Mouse Genome Informatics. Mammalian Orthology Query Form. Accessed August 5, 2019. http://www.informatics.jax.org

8.The OrthoMaM (Orthologous Mammalian Markers) database. April 2015. Accessed August 5, 2019. http://www.orthomam.univ-montp2.fr/orthomam/html/index.php

9.ILAR: Institute for Laboratory Animal Research. International Laboratory Code Registry. Accessed August 5, 2019. http://dels.nas.edu/global/ilar/Lab-Codes

10.FlyBase: a database of Drosophila genes & genomes. Released May 3, 2018. Accessed July 31, 2019. http://flybase.org

11.C elegans genetic nomenclature basics. Last modified March 5, 2014. Accessed August 5, 2019. http://home.sandiego.edu/~cloerlab/nomenclature.html

12.WormBase. Last edited June 4, 2018. Accessed July 31, 2019. https://www.wormbase.org

13.Nicholas F. Online Mendelian Inheritance in Animals (OMIA). Updated May 31, 2018. Accessed July 31, 2019. http://omia.org/

14.Rangel P, Giovannetti J. Genomes and Databases on the Internet: A Practical Guide to Functions and Applications. Horizon Scientific Press; 2002.

15.SGD gene nomenclature conventions. Accessed July 31, 2019. https://www.yeastgenome.org/nomenclature-conventions

16.Demerec M, Adelberg EA, Clark AJ, Hartman PE. A proposal for a uniform nomenclature in bacterial genetics. Genetics. 1966;54(1):61-76.

17.Journal of Bacteriology instructions to authors. Updated January 2019. Accessed August 5, 2019. https://jb.asm.org/sites/additional-assets/JB-ITA.pdf

18.National Center for Biotechnology Information (NCBI). Entrez Genomes. Accessed July 31, 2019. https://www.ncbi.nlm.nih.gov/genome

19.Collins DR, Collins KL. HIV-1 accessory proteins adapt cellular adaptors to facilitate immune evasion. PLoS Pathogens. Published January 23, 2014. doi:10.1371/journal.ppat.1003851

20.UniProt. Updated 2018. Accessed June 7, 2018. www.uniprot.org

21.Maize Genetics and Genomics Database. Updated May 8, 2018. Accessed July 31, 2019. https://www.maizegdb.org/

22.Rice Genome Annotation Project. Accessed July 31, 2019. rice.plantbiology.msu.edu/

23.Soybase and the Soybean Breeder’s Toolbox. Accessed July 31, 2019. https://soybase.org/

14.6.6 Pedigrees.

Pedigree format recommendations are established by the Pedigree Standardization Task Force (now called the Pedigree Standardization Work Group) of the National Society of Genetic Counselors1,2 (see 5.8.3, Rights in Published Reports of Genetic Studies). The 2008 update recommends including on the pedigree the reason for referral (eg, abnormal findings on ultrasonography, family history of cancer).

A square represents a male individual; a circle, a female individual; and a diamond, an individual whose sex is not specified, a person with a congenital disorder of sex development, or a person who is transgender (Figure 14.6-6).2

Figure 14.6-6. Shapes Used to Represent an Individual in a Pedigree

Square indicates male; circle, female; and diamond, individual whose sex is not specified, a person with a congenital disorder of sex development, or a person who is transgender.

Image

Shading indicates an affected individual (Figure 14.6-7). Partitions with different shading should be used for individuals with more than one condition. Define all shading in a legend or key.

Figure 14.6-7. Use of Shading in a Pedigree

Image

Multiple individuals are indicated by a number inside the shape (Figure 14.6-8). For unknown number, a roman “n” is preferred to a question mark.

Figure 14.6-8. Indication of Number of Individuals in a Pedigree

Image

A slash mark (Figure 14.6-9) indicates a deceased individual.

Figure 14.6-9. Indication of a Deceased Individual in a Pedigree

Image

A pregnancy is indicated with a capital “P” inside the shape (Figure 14.6-10). Symbols would not be shaded unless the pregnancy was affected.

Figure 14.6-10. Indication of a Pregnancy in a Pedigree

Image

The proband (the first affected family member who seeks medical attention) is indicated by a capital “P” with an arrow outside the shape (Figure 14.6-11).

Figure 14.6-11. Indication of the Proband in a Pedigree

Image

The consultand (person seeking medical attention) is indicated with an arrow (Figure 14.6-12).

Figure 14.6-12. Indication of the Consultand (Person Seeking Medical Attention) in a Pedigree

Image

Textual information appears below the individual symbol (Figure 14.6-13). Preferred order is age information, evaluation, and pedigree number.

Figure 14.6-13. Indication of Textual Information About an Individual in a Pedigree

Image

An obligate carrier (ie, unaffected individual inferred by pedigree analysis to carry a trait) is indicated with a central dot (Figure 14.6-14).

Figure 14.6-14. Indication of an Obligate Carrier (Unaffected Individual Inferred by Pedigree Analysis to Carry a Trait) in a Pedigree

Image

A small triangle indicates a pregnancy not carried to term (Figure 14.6-15). Sex, if known, is indicated with text. (Sex is often unknown, especially with miscarriages.) Shading is used as described above for affected individuals. The symbol should be shaded only if the cause of the abnormality is known, and the abnormality should be defined in the key or under the symbol.

Figure 14.6-15. Indication of a Pregnancy Not Carried to Term in a Pedigree

ECT indicates ectopic pregnancy. A slash indicates termination of pregnancy.

Image

Stillborn individuals use full-sized shapes with SB in the caption (Figure 14.6-16).

Figure 14.6-16. Indication of Stillborn Individuals in a Pedigree

Image

Partner relationships are indicated by a straight, horizontal line (Figure 14.6-17). It is preferred that the male partner be shown on the left.

Figure 14.6-17. Indication of Partner Relationships in a Pedigree

Image

A vertical line (the line of descent) indicates the offspring (Figure 14.6-18).

Figure 14.6-18. Indication of the Line of Descent in a Pedigree

Image

Siblings should appear in order of birth (oldest to the left), connected by lines as shown in Figure 14.6-19.

Figure 14.6-19. Indication of Siblings in a Pedigree

Image

Offspring are indicated by vertical lines (Figure 14.6-20). Use of a shorter line to indicate a pregnancy not carried to term is no longer recommended because it is made redundant graphically by the use of a triangle for pregnancies not carried to term.

Figure 14.6-20. Indication of Offspring in a Pedigree

Image

An ended relationship is indicated by a double slash (Figure 14.6-21).

Figure 14.6-21. Indication of an Ended Relationship in a Pedigree

Image

Consanguinity (kinship because of common ancestry) is indicated by a double line (Figure 14.6-22), and the relationship should be noted (eg, first cousins, second cousins).

Figure 14.6-22. Indication of Consanguinity in a Pedigree

Image

Two diagonal lines indicate twins; 3, triplets (Figure 14.6-23). A horizontal bar specifies monozygotic; no horizontal bar, dizygotic; and a question mark, unknown.

Figure 14.6-23. Indication of Twins or Triplets in a Pedigree

A horizontal bar specifies monozygotic; no horizontal bar, dizygotic; and a question mark, unknown.

Image

No offspring is indicated by perpendicular lines; infertility, by perpendicular lines with a double horizontal line (Figure 14.6-24).

Figure 14.6-24. Indication of No Offspring or of Infertility in a Pedigree

Image

Brackets indicate an adopted individual and dashed lines legal parentage, for example, adoptive parent (Figure 14.6-25).

Figure 14.6-25. Indication of an Adopted Individual and of Legal Parentage in a Pedigree

Image

In pedigrees that show relationships defined by assisted reproductive technologies (Figure 14.6-26), D indicates donor (sperm or ovum) and S, surrogate carrier of the pregnancy.

Figure 14.6-26. Indication of Relationships That Are Defined by Assisted Reproductive Techniques in a Pedigree

Image

Diagonal lines indicate other parental relationships (Figure 14.6-27).

Figure 14.6-27. Indication of Other Parental Relationships in a Pedigree

Image

Haplotypes may be indicated with shaded rectangles below the individual (Figure 14.6-28). Meaning should be clarified by means of a key.

Figure 14.6-28. Indication of Haplotypes in a Pedigree

Image

In a complete pedigree (Figure 14.6-29), generations are indicated on the left by roman numerals. See Bennett3 for more examples of complete pedigrees.

Figure 14.6-29. Example of a Complete Pedigree, With Generations Indicated on the Left by Roman Numerals

Image
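For those who draw pedigrees with software, a few of the conventions above (shape by sex, shading for affected individuals, a slash for deceased individuals, and roman numerals for generations) can be sketched as a small data structure. The class and function names are hypothetical, the text descriptions merely stand in for the drawn symbols, and most of the symbols in this section are left out.

```python
from dataclasses import dataclass
from typing import Optional

SHAPES = {"male": "square", "female": "circle"}


@dataclass
class Individual:
    sex: Optional[str] = None  # "male", "female", or None if not specified
    affected: bool = False
    deceased: bool = False


def symbol_description(person: Individual) -> str:
    """Describe the pedigree symbol for one individual in words."""
    # A diamond is used when sex is not specified.
    shape = SHAPES.get(person.sex, "diamond")
    description = ("shaded " if person.affected else "") + shape
    if person.deceased:
        description += " with slash"
    return description


def generation_label(n: int) -> str:
    """Return the roman numeral that labels generation n (1 through 39)."""
    if not 1 <= n <= 39:
        raise ValueError("generation number out of supported range")
    parts = []
    for value, numeral in [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]:
        while n >= value:
            parts.append(numeral)
            n -= value
    return "".join(parts)
```

Any shading produced this way still needs to be defined in a legend or key, as noted above.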

14.6.6.1 Deidentification of Pedigrees.

As noted in 5.8.3, Rights in Published Reports of Genetic Studies, the rules for ethical approval of studies, obtaining informed consent, and protecting patients’ rights to privacy in scientific publication also apply to genetic studies of family pedigrees. If appropriate consent cannot be obtained from those identified in pedigree charts, nonessential identifying information can be removed or not presented. However, data should not be altered or “scrambled” in an attempt to protect the identities of individuals or family members, although relevant information may be masked. For example, in pedigree charts, diamonds or another sex-neutral symbol can be used instead of squares or circles if the sex of family members is not essential to the report (eg, if the disease or condition is known not to be sex linked). Sections of pedigrees may also be excluded from pedigree charts or not described in detail if appropriate consent could not be obtained, as long as such omissions are noted.

Principal Author: Cheryl Iverson, MA

Acknowledgment

Thanks to the following for reviewing and providing comments: Robin L. Bennett, MS, CGC, Division of Medical Genetics, University of Washington, Seattle; Trevor Lane, MA, DPhil, Edanz Group, Fukuoka, Japan; and John J. McFadden, MA, JAMA Network. Thanks also to Karen Bucher, JAMA, for preparing the illustrations.

References

1.Bennett RL, Steinhaus KA, Uhrich SB, et al. Recommendations for standardized human pedigree nomenclature. Am J Hum Genet. 1995;56(3):745-752. Also published in J Genet Counseling. 1995;4(4):267-279.

2.Bennett RL, French KS, Resta RG, Doyle DL. Standardized human pedigree nomenclature: update and assessment of the recommendations of the National Society of Genetic Counselors. J Genet Counseling. 2008;17:424-433. doi:10.1007/s10897-008-9169-9

3.Bennett RL. Handy reference tables of pedigree nomenclature. In: The Practical Guide to the Genetic Family History. Wiley-Liss Inc; 1999.