Tuesday, July 5, 2011

Genome Update: Representing variation in the LRC on chr. 19q13.4


Human GRCh37 patch release 5 includes eight NOVEL patches representing different haplotypes in the Leukocyte Receptor Complex (LRC) region on chromosome 19q13.4 (GL949746.1, GL949747.1, GL949748.1, GL949749.1, GL949750.1, GL949751.1, GL949752.1, GL949753.1). This region contains multiple clusters of genes belonging to the immunoglobulin superfamily, including killer immunoglobulin-like receptors (KIRs), leukocyte immunoglobulin-like receptors (LILRs) and leukocyte-associated immunoglobulin-like receptors (LAIRs). The LRC is of major importance in a wide range of human diseases. Research efforts have focused in particular on the KIR cluster, because this ~150 kb region displays extensive haplotypic variation, arising both from differences in coding sequence and from the presence or absence of particular loci.

Several reports have indicated problems with the representation of the LRC region in both NCBI36 and GRCh37. One improvement was made in GRCh37: the NCBI36 chr. 19 unlocalized scaffold NT_113949.1, which contained a second representation of this region, was determined to be mis-assembled and was excluded from the assembly (tracked in HG-196). However, in both assembly versions the chromosome 19 sequence for this variable region is derived from multiple clone libraries, suggesting that it does not represent a single haplotype. Ongoing GRC efforts to replace this region of chromosome 19 in future assembly versions with a new single haplotype from the CHORI-17 hydatidiform mole library are being tracked in HG-1079. The NOVEL LRC patches released now provide partial representations of the LRC region for eight different haplotypes.

Four of the NOVEL patch LRC haplotypes are derived from the same PGF and COX cell lines that were used in the Major Histocompatibility Complex (MHC) project (seven haplotypes from the MHC project have already been incorporated into GRCh37 as alternate loci: GL000250.1, GL000251.1, GL000252.1, GL000253.1, GL000254.1, GL000255.1, GL000256.1). However, whilst PGF and COX are homozygous for the HLA region of the MHC, they are heterozygous for the KIR region of the LRC, and hence are represented here as PGF1 and PGF2, and COX1 and COX2 (PMID: 17092261). The other four LRC haplotypes, named s, t, j and i, are derived from a study by Traherne et al. that identified rare contracted KIR haplotypes in families of European origin (PMID: 19959527).

The sequence coverage of the s, t, j and i haplotypes is limited to the KIR region, whilst that of COX1/2 and PGF1/2 extends into the LILR and LAIR clusters. Corresponding manual gene annotation for each of these haplotypes has been generated as part of the Vega project.

Figure 1 (below): Alignment of the 8 LRC region NOVEL patches to GRCh37 chr. 19. The blue bars at top represent the tiling path of chr. 19 (NC_000019.9). Genes annotated on this sequence are shown in green. The gray tracks below represent the alignments: the thin horizontal lines indicate gaps, while the small vertical red bars indicate mismatches.