Clustering of Juvenile Canavan disease in an Indian community due to population bottleneck and isolation: genomic signatures of a founder event

Kotambail, Ananthapadmanabha; Selvam, Pavalan; Muthusamy, Karthik; Thomas, Maya; Sudhakar, Sniya Valsa; Ghati, Chetan; Danda, Sumita; Arunachal, Gautham

doi:10.1038/s41431-022-01198-4

Article
Published: 07 October 2022

European Journal of Human Genetics volume 31, pages 73–80 (2023)

Abstract

Mild/juvenile Canavan disease (M/JCD) is less frequently reported in the literature, and little is known about its pathogenetic mechanisms. We report a comprehensive investigation into the pathogenetic mechanism of a novel NM_000049.4(ASPA):c.526G>A variant in two families. The families belong to the Telugu Devanga Chettiar (TDC) community from southern India. TDC has a complex history of migration from its historical origin centuries ago, with high endogamy. TDC probably has the highest clustering of M/JCD recorded historically (around 24 cases). The pathogenic variant was shown to cause a non-classical splicing defect resulting in two different transcripts. The splicing aberration, a loss-of-function mechanism, coupled with a milder missense effect can explain the milder phenotype compared with infantile-onset CD. The high clustering of an extremely rare form of neurodegenerative disorder with reduced fitness led us to speculate on the possibility of a founder event. Genotyping arrays of TDC and multiple distinct populations of Indian origin were analyzed for several population genetic parameters. The analysis yielded robust signatures of a founder event in TDC: a high fixation index; increased runs of homozygosity and identity-by-descent in the absence of consanguinity; a large haplotype with high linkage disequilibrium among markers comprising the pathogenic variant; a robust population structure; mutation dating that estimates the age of the potential founder of TDC at around 375 years; and possibly a high carrier rate in TDC. This study not only focuses on natural history and pathogenetics but also paves the way for carrier screening programs in TDC and future therapeutic studies.

Introduction

Canavan disease (CD) is an autosomal recessive neurodegenerative disorder due to homozygous or compound heterozygous pathogenic variants in ASPA. Though it is pan-ethnic, the highest prevalence is seen in Ashkenazi Jews. The overall prevalence of CD is 1 in 200,000 to 400,000 [1], whereas the prevalence in the Ashkenazi population is around 1 in 500 to 1 in 6700 [2, 3]. CD presents in early infancy, by 3 to 6 months, with evident macrocephaly and global developmental delay. As the disease advances, spasticity, seizures and optic atrophy set in, with early mortality. In contrast, the less common M/JCD presents with a near-normal period of infancy or only mild delay in motor milestones. Most children with M/JCD achieve walking, attend school and are normocephalic. M/JCD has a highly variable and protracted course with gradual development of neurological manifestations. It is established that the severity of the presentation depends on the degree of residual activity of the aspartoacylase enzyme. For the mildest cases, which have the highest residual activity, a clear genotype–phenotype correlation is currently possible [1].

We report here a comprehensive genetic investigation into the pathogenetics of a novel pathogenic variant in two families (Fig. 1) from a community in southern India with probably the highest clustering of M/JCD recorded historically. The probands belong to a community referred to as Telugu Devanga Chettiars (TDC). As per the historical records, this weaver community migrated from Andhra Pradesh during the expansion of the Vijayanagara empire between the 15th and 16th centuries and settled in Tiruchirappalli [4]. A second wave of migration began about 150 years ago, from Tiruchirappalli to a small geographical region near Tiruvannamalai in Tamil Nadu. Currently TDC has a population size of just over 3000, inhabiting three closely situated villages. The language spoken by TDC is Telugu, while the dominant language spoken in the district is Tamil. They are orthodox Hindus and customarily marry only within the community. Overall, their culture, language, mating pattern and geographical isolation resemble those of an island population. Historically, at least 24 cases of M/JCD have been observed by the community within the last 3–4 generations, with an average life span of 20–25 years. Currently around 10 such cases are alive, 3 of whom have been evaluated by us. The mating pattern alone is insufficient to explain the high rate of an extremely rare form of neurodegenerative disorder with reduced fitness. Hence, our objectives were (1) to explore the mechanism underlying disease causation by the novel pathogenic variant and (2) to explore the possibility of a founder event and subsequent endogamy leading to this high prevalence.

**Fig. 1: Pedigree of the family representing the generations, MRCA, and segregation of *ASPA* pathogenic variant.**

Materials and methods

Clinical presentation

Proband 1, currently an 8-year-old male, was referred to us at the age of 3 years for evaluation of developmental delay. His best achieved developmental milestones at presentation were walking with support, an immature pincer grasp and speaking bi-syllables. Antenatal and perinatal history was uneventful, with a normal birth weight of 3.24 kg and head circumference of 36 cm. Neurological examination revealed normocephaly, horizontal nystagmus, hypotonia with diminished deep tendon reflexes, and gait ataxia. Fundus examination showed no evidence of optic atrophy. Slow developmental gains were observed during follow-up.

Brain MRI performed at 18 and 27 months showed characteristic features of M/JCD, confirmed by an elevated N-acetyl aspartate peak on MRS (Supplementary Fig. 1). There was no interval progression between the scans. Urine gas chromatography/mass spectrometry analysis revealed a 15-fold elevation of N-acetyl aspartic acid levels.

Proband 2 is a 7-year-old male who was referred at the age of 1 year for evaluation of developmental delay. Antenatal and perinatal history was uneventful, with a normal birth weight of 3 kg and head circumference of 34.5 cm. At his first visit, a mild lag in all developmental domains was present (developmental age ~10–11 months). On examination, growth parameters, including head circumference, were within the normal range. Nervous system examination showed no deficits. During follow-up at 2 years of age, ataxia of gait and stance was observed. Deficits in hand coordination were observed at 3.5 years of age. Brain MRI showed findings similar to those of proband 1.

Mutation analysis

Sanger sequencing of the coding exons of ASPA was performed in the probands and family members using the BigDye Terminator v3.1 kit and an ABI 3500 Genetic Analyzer following the standard manufacturer's protocol (Supplementary Fig. 2). Various in silico analysis tools were used, and the variant was categorized according to the American College of Medical Genetics and Genomics (ACMG) guidelines [5]. Whole exome sequencing (WES) was performed for both probands using the Twist core exome kit (Twist Biosciences, USA) and sequenced on an Illumina NextSeq 550 sequencer. The data were analyzed using Sentieon and VarSeq (Golden Helix, Inc, USA).

cDNA studies

Total RNA was isolated from the patient-derived fibroblast cell line [6]. cDNA was synthesized using RevertAid first strand cDNA synthesis kit (Thermo Fisher Scientific, Inc., USA). Sanger sequencing was performed using primers for exons 2–5 (Supplementary Table S6).

Genome-wide genotyping array

A total of 14 individuals from 2 families of TDC (Fig. 1) and 20 healthy control individuals were genotyped. The controls comprised (i) TML, a Tamil-speaking population sharing the same geographical locality as TDC; (ii) north Indians (NOI), who natively speak Indo-European languages and reside in the northern part of India; and (iii) a local population with second- or third-degree consanguineous parentage (CSG). Genotyping was performed using the Infinium HumanCytoSNP-12 v2.1 BeadChip as per the manufacturer’s protocol (Illumina, Inc., USA). Genotype data were extracted and exported as PLINK .ped/.map files [7, 8]. Following QC steps, a total of 299,140 markers were retained. Only autosomal markers were considered for downstream analysis; after removing markers on the sex chromosomes, 275,544 remained. Furthermore, we extracted data from the 1000 Genomes Project [9] for the Indian Telugu in the UK (ITU, n = 20), Gujarati Indians in Houston (GIH, n = 20), and Yoruba in Ibadan (YRI, n = 20) populations for the markers that were genotyped in-house. SNPs with missing genotypes in more than 10% of individuals were excluded, resulting in a final set of 261,599 markers.
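The missingness filter described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline, which used PLINK; the function name `filter_by_missingness`, the toy marker IDs and the genotype coding (0/1/2 copies of an allele, with `None` for a missing call) are assumptions made for the example.

```python
from typing import Dict, List, Optional

# Assumed coding: 0/1/2 copies of the alternate allele; None marks a missing call.
Genotypes = Dict[str, List[Optional[int]]]


def filter_by_missingness(genotypes: Genotypes, max_missing: float = 0.10) -> Genotypes:
    """Keep markers whose fraction of missing calls does not exceed max_missing."""
    kept: Genotypes = {}
    for marker, calls in genotypes.items():
        missing_fraction = sum(call is None for call in calls) / len(calls)
        if missing_fraction <= max_missing:
            kept[marker] = calls
    return kept


# Toy data for ten individuals (hypothetical marker names).
toy = {
    "rs_toy_1": [0, 1, 2, 1, 0, 0, 1, 2, 0, 1],        # no missing calls
    "rs_toy_2": [0, None, 2, 1, 0, 0, 1, 2, 0, 1],     # 10% missing, retained
    "rs_toy_3": [None, None, 2, 1, 0, 0, 1, 2, 0, 1],  # 20% missing, excluded
}
retained = filter_by_missingness(toy)
print(sorted(retained))
```

A marker sitting exactly at the 10% boundary is retained here, matching the wording "more than 10%".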

Estimation of fixation index and runs of homozygosity

The fixation index (F), using LD-pruned data (r² threshold of 0.3), and runs of homozygosity (ROH) were calculated with PLINK1.9 [8]. LD pruning was not performed for ROH estimation, as our aim was to investigate not only parental relatedness but also any occurrence of population bottleneck and isolation [10]. A threshold size of more than 500 kb was used, as this largely eliminates the effect of linkage disequilibrium (LD) and therefore the frequent short ROH due to haplotype structure [10]. The ROH so estimated were further subdivided into 3 length categories: 0.5–1.99 Mb (class I ROH), 2–4.99 Mb (class II ROH) and >5 Mb (class III ROH). This subdivision is based on the role of diverse population genetic processes in determining the size and frequency of ROH [11, 12]. The individual autozygosity measure based on ROH (F_ROH) was computed for three different ROH length thresholds as described previously [10]. The sum total length of ROH (SROH) was plotted on the x-axis against the total number of ROH (NROH) on the y-axis. Median values for all population groups were computed as described previously [11]. ROH spanning the entire ASPA gene were also explored among the population groups. ROH called from the genotyping array were also checked against the WES data for accuracy in size and marker details.
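The ROH summary statistics above can be illustrated with a short Python sketch. It assumes the usual definition of F_ROH as the summed length of ROH divided by the length of the autosomal genome covered by markers; the function names, the toy ROH lengths, the handling of exact class boundaries and the 2800 Mb genome length are all placeholders chosen for illustration, not values from the study.

```python
from typing import Dict, List


def roh_class(length_mb: float) -> str:
    """Assign a ROH to the length classes used in the text (boundary handling assumed)."""
    if length_mb < 0.5:
        return "below threshold"
    if length_mb < 2.0:
        return "class I"
    if length_mb < 5.0:
        return "class II"
    return "class III"


def summarize_roh(lengths_mb: List[float], autosome_length_mb: float,
                  min_length_mb: float = 0.5) -> Dict[str, float]:
    """Return NROH, SROH (Mb) and F_ROH for one individual."""
    kept = [length for length in lengths_mb if length >= min_length_mb]
    sroh = sum(kept)
    return {
        "NROH": len(kept),
        "SROH_Mb": sroh,
        "F_ROH": sroh / autosome_length_mb,  # fraction of the covered autosome in ROH
    }


# Hypothetical ROH lengths (Mb) for a single individual.
toy_lengths = [0.3, 0.8, 1.5, 2.5, 7.96]
print([roh_class(length) for length in toy_lengths])
print(summarize_roh(toy_lengths, autosome_length_mb=2800.0))
```

Raising `min_length_mb` to 2.0 or 5.0 gives F_ROH at the longer thresholds, which is one way to separate background relatedness from recent parental relatedness.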

Estimation of identity-by-descent (IBD) and kinship coefficients

Kinship coefficients and proportion of the genome that is IBD (propIBD) were computed using KING v2.2.7 [13]. We then computed pair-wise kinship coefficients among the members within families and between families. To robustly detect IBD segments, Refined IBD (v17Jan20.102) was used after phasing genotypes by Beagle 5.4 for seven unrelated individuals in the pedigree [14, 15].
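For orientation, kinship coefficients reported by KING are conventionally mapped to degrees of relatedness using the inference ranges given in the KING documentation. The sketch below encodes those ranges; the function name `king_degree` is hypothetical, and the lowest cut-off (0.0442) corresponds to the rounded threshold of 0.044 used in this study to define unrelated individuals.

```python
def king_degree(kinship: float) -> str:
    """Map a KING kinship coefficient to an inferred relationship category."""
    if kinship > 0.354:
        return "duplicate/MZ twin"
    if kinship >= 0.177:
        return "1st degree"
    if kinship >= 0.0884:
        return "2nd degree"
    if kinship >= 0.0442:
        return "3rd degree"
    return "unrelated"


# Theoretical kinship values: parent-offspring 0.25, half-sibs 0.125, first cousins 0.0625.
for phi in (0.25, 0.125, 0.0625, 0.01):
    print(phi, king_degree(phi))
```

Cryptic relatedness would show up as pairs that are unrelated by pedigree yet fall into one of the related categories, or sit well above zero within the unrelated range.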

Phasing and haplotype reconstruction in pedigrees

Haplotype reconstruction of the chromosome 17p region (2980 markers) was performed using MERLIN v1.12 [16]. The haplotypes, including recombination events, were visualized using HaploPainter v1.043 [17]. The ROH regions around the ASPA pathogenic variant in both probands were taken as the starting point for haplotype analysis. The chromosomes carrying the pathogenic variant from each of the carrier family members were aligned and examined for marker similarity among all the carriers and the affected individuals. The haplotype regions so inferred were re-examined for accuracy using PHASE v2.1.1 [18]. The LD estimates (r² and Dʹ) for the markers in the delineated haplotype region were compared among the population groups.
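As a reminder of what the two LD measures capture, the following Python sketch computes |Dʹ| and r² for a pair of biallelic markers from phased haplotype counts using the standard definitions. The function name, argument names and the toy counts are assumptions for illustration; the study itself does not specify the software settings used for these estimates.

```python
from typing import Tuple


def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic loci given the four haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total

    variance_term = p_A * (1 - p_A) * p_B * (1 - p_B)
    if variance_term == 0:
        raise ValueError("both loci must be polymorphic")

    d = p_AB - p_A * p_B  # raw disequilibrium coefficient D
    r2 = d * d / variance_term

    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return abs(d) / d_max, r2


print(ld_stats(50, 0, 0, 50))    # complete LD: only two haplotypes present
print(ld_stats(25, 25, 25, 25))  # linkage equilibrium
print(ld_stats(40, 10, 0, 50))   # one haplotype missing: D' stays maximal, r^2 drops
```

The third case shows why both measures are reported: Dʹ stays at its maximum as long as one of the four haplotypes is absent, whereas r² also reflects how well one marker predicts the other.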

Population structure and admixture analysis

To investigate the current population structure and any admixture between local populations, we applied three separate statistical approaches to data from 88 unrelated individuals with kinship coefficients < 0.044 as per KING. Principal component analysis (PCA) was performed using SmartPCA from EIGENSOFT v7.2.1 [19, 20], with the results visualized in R. To eliminate the distorting effect of LD on eigenvector structure and eigenvalues, ‘ldregress’ was applied. Further, in view of the significant endogamy in the Indian population, which leads to inflated F_ST estimates, we chose the ‘inbreed: YES’ parameter [21]. The F_ST estimates, Tracy-Widom statistics and ANOVA p values were tabulated for the inferred populations. A Bayesian model-based admixture analysis was performed using STRUCTURE v2.3.4 [22, 23] on the LD-pruned data with correlated allele frequencies and the admixture model. The output was visualized using CLUMPAK [24]. Maximum-likelihood-based inference of the population tree was performed using TreeMix v1.13 [25] on the LD-pruned data with YRI as the root population. The output was visualized using the R ‘plotting_funcs.R’ script.

Mutation age estimation

In view of the small sample size and the variant being novel, the Gamma method was used as described previously [26].

Results

Both probands harbored a novel homozygous variant c.526G>A (NC_000017.10:g.3386886G>A; NM_000049.4(ASPA):c.526G>A) in exon 3. The missense variant p.Gly176Ser has not been reported previously in ClinVar/HGMD or in population databases such as gnomAD, 1000 Genomes, GenomeAsia100K, and our in-house database. The variant is predicted to be damaging by both SIFT and PolyPhen2. The nucleotide c.526 is predicted to be highly conserved across 100 vertebrates by GERP++ and PhyloP. Analysis using MaxEntScan and NNSplice predicted a splicing defect. The varSEAK splice tool predicted the replacement of the consensus splice donor site by the activation of a cryptic splice donor site 4 bp upstream (new position: c.523G). The variant segregated in a Mendelian recessive fashion among the family members. Hence, this variant was classified as pathogenic, with ClinVar accession number VCV001700565.1.

The cDNA sequencing revealed a frameshift pattern starting 1 bp upstream of the novel variant (Supplementary Fig. 2). Deconvoluting the cDNA sequence revealed the presence of 2 possible transcripts. In the first transcript, an upstream cryptic splice site GT (at position c.523_524) is activated, along with splicing of the 4th exon as predicted. This would result in premature termination at codon 176 (p.Gly176Ter) and theoretically many more premature termination codons further downstream. Such an mRNA is expected to undergo nonsense-mediated decay (NMD), as the variant is 223 bp upstream of the last exon-junction complex (EJC). In the second transcript, the canonical splice site is retained (escaping the splicing aberration), resulting in a complete transcript. This probably yields a normal-length protein harboring the missense change p.Gly176Ser. This non-classical splicing defect due to an exonic mutation has been classified as a type IV splicing defect [27]. Such examples have been reported previously [28, 29, 30]. This is in line with the reported literature that a milder missense variant together with another pathogenic loss-of-function (LOF) variant often leads to M/JCD [1]. Further studies, in the form of NMD analysis and enzyme activity assays, are required to confirm the above possibilities.

Fixation index

The history of migration of the TDC from their historical origin centuries ago, probably with a bottleneck, together with a history of endogamy, leads one to expect low genetic diversity. The fixation index (F) is the most basic measure of such variation, and the results were clear. On average, there was a 0.5–0.7% reduction in heterozygosity in the TDC compared with the estimates expected under random mating. We also noticed a higher median F value compared with individuals (with no history of consanguinity) inhabiting the same geographical location (Table 1). The deviation, however, is an order of magnitude smaller than that in individuals born of consanguineous parentage. This was an early indication of the possible role of endogamy in the TDC, although to a lesser degree than consanguinity. Reduction in heterozygosity also implies an increased proportion of loci that are homozygous: the Wahlund effect [31].
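The link between F and the percentage reduction in heterozygosity can be made concrete with a small Python sketch. It uses the method-of-moments estimator reported by PLINK's heterozygosity output, F = (observed homozygous − expected homozygous) / (non-missing genotypes − expected homozygous), which is algebraically the same as 1 − H_obs/H_exp. The function names and the toy counts are illustrative assumptions and are not taken from Table 1.

```python
def inbreeding_f(observed_hom: float, expected_hom: float, n_genotypes: float) -> float:
    """Method-of-moments F from homozygous genotype counts."""
    return (observed_hom - expected_hom) / (n_genotypes - expected_hom)


def heterozygosity_reduction(observed_hom: float, expected_hom: float,
                             n_genotypes: float) -> float:
    """Same quantity written as 1 - H_obs / H_exp."""
    h_obs = n_genotypes - observed_hom
    h_exp = n_genotypes - expected_hom
    return 1.0 - h_obs / h_exp


# Hypothetical individual: 1000 genotyped markers, 650 expected homozygous, 652.1 observed.
f_value = inbreeding_f(652.1, 650.0, 1000.0)
print(round(f_value, 4))  # an F of 0.006 is a 0.6% reduction in heterozygosity
print(round(heterozygosity_reduction(652.1, 650.0, 1000.0), 4))
```

Under this reading, the reported 0.5–0.7% reduction corresponds to F values of roughly 0.005 to 0.007.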

Table 1 Various subpopulations studied with n samples.

ROH frequency and size among subpopulations

In Fig. 2A, the class I ROH represent the similarity of populations living in the same geographical region as opposed to a population from a distant continent. For the class I ROH, no single subpopulation of Indian origin could be statistically differentiated based on SROH and NROH. However, each of the subpopulations showed a significantly higher median SROH and NROH compared with YRI (Supplementary Table S4). This is comparable to the previously reported positive correlation of mean SROH with increasing geographical distance of the study population from East Africa [12].

**Fig. 2: Total number of ROH (NROH) and sum total length of ROH (SROH) distribution for different length thresholds, represented as box and whisker plots.**

The size criterion for class II ROH was based on well-documented evidence in the literature that subpopulations begin to show differences in both mean SROH and NROH at this length [10, 12]. It is clear from the data (Fig. 2B) that TDC had significantly higher SROH and NROH compared with other subpopulations with no history of parental relatedness (Supplementary Table S4). Even though a small difference in the median values was found between TDC and individuals with parental relatedness (consanguinity), the difference was not statistically significant (p = 0.117). These findings are concordant with the reported distribution of intermediate-sized ROH [10, 11]. Intermediate-sized ROH are often due to increased background relatedness, which is determined by the effective population size [10, 11]. The historical evidence of a probable population bottleneck in TDC, cultural isolation with consequent endogamy, and the current population size together explain the increased background relatedness. Finally, for the class III ROH (Fig. 2C), TDC showed higher SROH and NROH (p < 0.005) compared with subpopulations with no history of parental relatedness. However, CSG differed significantly from TDC (p < 0.0016), which is in line with the increased occurrence of multiple long ROH in individuals with recent parental relatedness. It is apparent from the data that, despite the absence of consanguinity, there are substantially higher than expected levels of intermediate and long ROH in the TDC. As can be seen in Fig. 2D, the TDC lies to the left of the trendline, suggesting that NROH is proportionately greater than SROH. This is because class III ROH are few in TDC, whereas the population is enriched with class II ROH. In contrast, the CSG group showed a right shift because of numerous long class III ROH without a proportional increase in NROH. The pattern observed in TDC is analogous to worldwide patterns in numerous native populations with a history of bottleneck, genetic isolation with a small effective population size, and consequent endogamy [11].

F_ROH is a better measure of autozygosity even in the absence of accurate and deep pedigrees, as F_ROH captures true IBD, as opposed to F values from pedigrees (F_PED), where the founders are often poorly defined. F_ROH also reliably captures autozygosity originating from ancient as well as recent parental relatedness [11]. TDC had higher F_ROH values compared with all other subpopulations (p < 0.05) except CSG (Table 1). The overall results suggest an increased amount of autozygosity in the TDC in spite of the absence of recent parental relatedness.

IBD and kinship coefficients

The kinship coefficients φ estimated by KING between pairs of individuals accurately reflected relatedness among individuals with a self-reported pedigree relationship. The results also unexpectedly unearthed cryptic relatedness, with higher φ than expected among TDC individuals who were unrelated as per the pedigree (Supplementary Table S1). The propIBD-based relationship inference provides a more robust estimation of relatedness, and its results were concordant with the φ results. Refined IBD, a robust IBD segment caller, showed that the summed fraction of the genome that is identical by descent (LOD > 3.0) in the TDC founders was several fold higher than in unrelated individual pairs, corroborating the propIBD estimates. This can be explained by the equation \(f = \left(\frac{1}{2}\right)^{i}\left(1 + F_a\right)\) [32], where F_a is the cryptic probability that a founder in the pedigree is himself/herself a product of inbreeding. F_a values increase in a population with endogamy alone or with a bottleneck superimposed on endogamy. Given the history of TDC, inflated kinship coefficient and IBD estimates are not surprising.

Haplotype analysis

As evident in Fig. 3, family 2 harbored an unbroken segment of 7.96 Mb which segregated in a Mendelian fashion. The size exactly matches that of the ROH found in the proband. However, in family 1 the haplotype has been reduced in size, probably because of at least two independent ancestral recombination events, one each on the paternal and maternal sides. The sizes of the segregating haplotypes were 4.73 Mb and 1.02 Mb, respectively. The core haplotype of 1.02 Mb, consisting of 147 SNPs, is common to all the carriers and the affected individuals. The Dʹ and r² values between SNPs in the region of the haplotype were substantially higher than in the other subpopulations. The sequence of the haplotype was also confirmed using PHASE 2.2 program. Further, a haplotype of similar size was not found in any other individual from the various populations analyzed.

**Fig. 3: Phased haplotypes on chromosome 17p representing shared haplotypes spanning *ASPA* locus between individuals from the two families.**

Population structure

PCA clearly grouped individuals into distinct clusters which correlated well with their geographical location and native languages (Table 2). The data were in good agreement with the eigenvalues, Tracy–Widom statistic, and ANOVA P values (Supplementary Table S5). Clear-cut population differentiation was noted between TDC and TML, which inhabit the same geographical region (Fig. 4A). Both populations were far removed from the other subpopulations, such as NOI, GIH, and ITU, along the PC1 and PC2 axes. Further, all these populations aligned on the PC2 axis and were at a notable distance from YRI on the PC1 axis (Supplementary Fig. 4A), as documented in previous studies [21]. This was the first clear indication of the differentiation of TDC from other populations and of the presence of a population structure. The F_ST output after correcting for inbreeding, as per the Balding-Nichols model (Table 3), showed moderately great differentiation of TDC from YRI and moderate differentiation from the other subpopulations of Indian ancestry, without any overlap [33]. The only overlap was between ITU and GIH (F_ST = 0.003; threshold for overlap considered < 0.004) [34].

Table 2 Statistical significance of population differentiation in the PCA analysis.

**Fig. 4: Population clustering and pattern of divergence among the populations studied.**

Table 3 Wright’s F-statistics (F_ST) estimated using the Balding-Nichols model [20, 39] indicating the pair-wise differentiation among the subpopulations.

TreeMix provided the conclusive maximum-likelihood tree representing the topological relationship of the subpopulations. The Newick-format ML tree (Fig. 4B) shows TDC and TML splitting from a common ancestral population, which is itself a split from the common ancestors of ITU. The length of the branch representing TDC clearly shows a significant divergence (p = 7 × 10⁻⁵) and probably minimal or no migration between TDC and TML. The STRUCTURE results led to the same conclusion (Supplementary Fig. 4B).

The results of F_ST, PCA, STRUCTURE and TreeMix provide enough evidence to support a process akin to ‘lineage sorting’ having taken place. This is probably due to the significant population contraction (leading to drift) and long genetic isolation. As per the Island model (expected \(F_{\mathrm{ST}} = 1 - e^{-\frac{1}{2N_e}}\)), a significant population contraction resulting in drift will increase differentiation and thereby F_ST [35]. One of the main assumptions of the Island model is that no migration has taken place between the subpopulations. Based on the self-report of individuals from TDC and knowledge of their strict cultural practices, together with the above-mentioned findings, it is evident that the TDC has undergone genetic isolation after a founding event, with hardly any migration thereafter.
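To see how the expression quoted above responds to population contraction, the following Python sketch evaluates it exactly as printed in the text for a few effective population sizes. The function name and the N_e values are hypothetical choices for illustration only; no effective population size is estimated in this study.

```python
import math


def expected_fst(effective_size: float) -> float:
    """Expected F_ST from the expression quoted in the text: 1 - exp(-1 / (2 * Ne))."""
    return 1.0 - math.exp(-1.0 / (2.0 * effective_size))


# Illustrative effective population sizes (not estimates for TDC).
for ne in (5000, 500, 50):
    print(ne, round(expected_fst(ne), 5))
```

The expected value rises as N_e falls, which is the qualitative point being made: a contraction strengthens drift and therefore differentiation.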

Mutation dating

Assuming a correlated genealogy, the mutation arose 15.1 generations ago (confidence interval 3.3–26.5 generations), corresponding to around 375 years (assuming one generation = 25 years). This coincides well with the timing of the historical migration of the first founders of TDC to their current location [4]. The Gamma method, as opposed to coalescent-based methods, in fact estimates the age of the most recent common ancestor (MRCA) rather than the exact age of the mutational event.
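The conversion from generations to calendar years is simple arithmetic, sketched here in Python with the figures given above. The 25-year generation time is the assumption stated in the text; the function name is hypothetical.

```python
YEARS_PER_GENERATION = 25  # generation time assumed in the text


def generations_to_years(generations: float,
                         years_per_generation: float = YEARS_PER_GENERATION) -> float:
    """Convert an age expressed in generations into years."""
    return generations * years_per_generation


point_estimate = generations_to_years(15.1)
lower_bound = generations_to_years(3.3)
upper_bound = generations_to_years(26.5)
print(round(point_estimate, 1), round(lower_bound, 1), round(upper_bound, 1))
```

The point estimate works out to about 377.5 years, which the text rounds to around 375, and the confidence interval spans roughly 80 to 660 years, so the dating is compatible with the historical account but not a precise confirmation of it.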

Discussion

M/JCD often results from retained residual enzymatic activity of aspartoacylase. Various studies of M/JCD have observed a milder missense pathogenic variant, such as p.Tyr288Cys or p.Pro257Arg, in compound heterozygous trans state with another pathogenic variant, usually a loss-of-function (LOF) variant or a missense variant in exons coding for the major functional domains [1, 36, 37]. The novel pathogenic variant c.526G>A found in both probands has been demonstrated to cause both a splicing defect (LOF) and, probably, a full-length protein with a missense change. This escape of a portion of transcripts from probable NMD is in line with previous findings on the pathogenetics of M/JCD. This evidence explains the milder phenotype in the probands, with a significantly longer life span compared with infantile CD.

The pedigree suggests no consanguinity whatsoever in either family. The two probands are related through the MRCA with not less than 9th-degree relatedness. The known relatedness is through the paternal side of proband 1 and the maternal side of proband 2 (Fig. 1). Even though there is an MRCA between the two families, the deleterious allele is not passed through the individual connecting the MRCA with the probands (IV-6 is not a carrier). Further, there is evidence for a high prevalence of the disorder in the population, with at least 10 cases currently alive. The approximate prevalence provides an indirect estimate of a very high carrier frequency of about 1 in 10, calculated as per Hardy–Weinberg equilibrium (HWE). M/JCD is a severe disorder with zero fitness. Ideally, such deleterious recessive alleles are purged from an inbred population, as homozygosity for such alleles will eventually be selected against. Hence, the novel variant is unlikely to be a new mutation of the recent past. Only random genetic drift or selection can change allele frequencies; inbreeding alone can only change genotype frequencies [32].
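One way to arrive at a carrier frequency of this order is sketched below in Python. It assumes Hardy–Weinberg proportions and takes the approximate prevalence as the roughly 10 living cases in a community of just over 3000, both figures from the Introduction; whether the authors used exactly these inputs is not stated, so the calculation should be read as an illustration rather than a reproduction.

```python
import math


def carrier_frequency(prevalence: float) -> float:
    """Carrier (heterozygote) frequency 2pq under HWE, given disease prevalence q^2."""
    q = math.sqrt(prevalence)  # frequency of the pathogenic allele
    p = 1.0 - q                # frequency of the normal allele
    return 2.0 * p * q


# Assumed inputs: about 10 living cases in a population of about 3000.
prevalence = 10 / 3000
carriers = carrier_frequency(prevalence)
print(f"allele frequency q = {math.sqrt(prevalence):.3f}")
print(f"carrier frequency  = {carriers:.3f} (about 1 in {1 / carriers:.0f})")
```

With these inputs the carrier frequency comes out near 0.11, or about 1 in 9, in line with the figure of about 1 in 10 quoted above. HWE is only an approximation in an endogamous population, so the true value may differ.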

The robust signatures of the founder event in TDC are supported by the following observations. (a) The high prevalence of a deleterious allele with reduced fitness. (b) Significant population differentiation as evidenced by the F_ST, PCA and TreeMix results, without the probable effects of selection and migration. (c) A moderately high fixation index with correspondingly high autozygosity (F_ROH) in the absence of consanguinity in TDC which, together with the robust population structure, implies the Wahlund effect [31]. (d) The left-shift pattern of NROH vs. SROH in TDC for class II and III ROH, which is indicative of a population bottleneck followed by endogamy [10]. (e) Elevated kinship coefficients, propIBD estimates and proportion of the autosomal genome under IBD in TDC in the absence of consanguinity, which suggest increased background relatedness. Higher proportions of IBD with longer IBD segments are often caused by founder events [38]. (f) A large haplotype spanning ASPA in the absence of consanguinity. The ASPA locus and its flanking region on chromosome 17p are not a known ROH hotspot or cold spot [12]. The genomic region encompassing ASPA is usually kept out of ROH by recombination events acted upon by selection pressures that avoid the occurrence of homozygous deleterious alleles, hence the smaller haplotypes observed in the 17p region among the rest of the subpopulations. The ROH and the large haplotype in this region, however, appear to be specific to TDC. (g) Mutation dating, which suggests that the age of the MRCA for the entire community (the founder) coincides with the documented historical migration [4]. Here we have provided a consilience of multiple lines of genetic evidence to support the founder event in TDC. Numerous founder events related to recessive disorders, similar to the one described here, have been well studied in the Indian population [21, 34].

The study has not only shed light on the pathogenetics and natural history of M/JCD in TDC but also highlighted the need for a community carrier screening program to prevent further occurrences of this disabling disorder. The findings can also inform future therapeutic studies, as the protracted course of M/JCD makes it amenable to early intervention.

Data availability

The novel variant identified has been submitted to the ClinVar database with the accession number VCV001700565.1 (https://www.ncbi.nlm.nih.gov/clinvar/variation/1700565/). The ethical approval and informed consent of this study do not allow the genetic data to be publicly released. Any additional data can be provided upon official request to the corresponding author after appropriate regulatory clearances.

References

Mendes MI, Smith DE, Pop A, Lennertz P, Fernandez Ojeda MR, Kanhai WA, et al. Clinically distinct phenotypes of Canavan disease correlate with residual aspartoacylase enzyme activity. Hum Mutat. 2017;38:524–31.
Article Google Scholar
Matalon R, Michals K, Kaul R. Canavan disease: from spongy degeneration to molecular analysis. J Pediatr. 1995;127:511–7.
Article Google Scholar
Feigenbaum A, Moore R, Clarke J, Hewson S, Chitayat D, Ray PN, et al. Canavan disease: carrier-frequency determination in the Ashkenazi Jewish population and development of a novel molecular diagnostic assay. Am J Med Genet A. 2004;124A:142–7.
Article Google Scholar
Ramaswamy V. Mapping Migrations of South Indian Weavers before, during and after the Vijayanagar Period: Thirteenth to Eighteenth Centuries. Globalising Migration History. Leiden: Brill; 2014. p. 89–121.
Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17:405–24.
Article Google Scholar
Villegas J, McPhaul M. Establishment and culture of human skin fibroblasts. Curr Protoc Mol Biol. New Jersey: John Wiley & Sons Inc; 2005, Chapter 28:Unit 28 3.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.
Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526:68–74.
McQuillan R, Leutenegger AL, Abdel-Rahman R, Franklin CS, Pericic M, Barac-Lauc L, et al. Runs of homozygosity in European populations. Am J Hum Genet. 2008;83:359–72.
Ceballos FC, Joshi PK, Clark DW, Ramsay M, Wilson JF. Runs of homozygosity: windows into population history and trait architecture. Nat Rev Genet. 2018;19:220–34.
Pemberton TJ, Absher D, Feldman MW, Myers RM, Rosenberg NA, Li JZ. Genomic patterns of homozygosity in worldwide human populations. Am J Hum Genet. 2012;91:275–92.
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26:2867–73.
Browning BL, Tian X, Zhou Y, Browning SR. Fast two-stage phasing of large-scale sequence data. Am J Hum Genet. 2021;108:1880–90.
Browning BL, Browning SR. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics. 2013;194:459–71.
Abecasis GR, Cherny SS, Cookson WO, Cardon LR. Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002;30:97–101.
Thiele H, Nurnberg P. HaploPainter: a tool for drawing pedigrees with complex haplotypes. Bioinformatics. 2005;21:1730–2.
Stephens M, Smith NJ, Donnelly P. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet. 2001;68:978–89.
Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2:e190.
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–9.
Reich D, Thangaraj K, Patterson N, Price AL, Singh L. Reconstructing Indian population history. Nature. 2009;461:489–94.
Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: dominant markers and null alleles. Mol Ecol Notes. 2007;7:574–78.
Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003;164:1567–87.
Kopelman NM, Mayzel J, Jakobsson M, Rosenberg NA, Mayrose I. Clumpak: a program for identifying clustering modes and packaging population structure inferences across K. Mol Ecol Resour. 2015;15:1179–91.
Pickrell JK, Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012;8:e1002967.
Gandolfo LC, Bahlo M, Speed TP. Dating rare mutations from small samples with dense marker data. Genetics. 2014;197:1315–27.
Anna A, Monika G. Splicing mutations in human genetic disorders: examples, detection, and confirmation. J Appl Genet. 2018;59:253–68.
Eng L, Coutinho G, Nahas S, Yeo G, Tanouye R, Babaei M, et al. Nonclassical splicing mutations in the coding and noncoding regions of the ATM Gene: maximum entropy estimates of splice junction strengths. Hum Mutat. 2004;23:67–76.
Xu W, Yang X, Hu X, Li S. Fifty-four novel mutations in the NF1 gene and integrated analyses of the mutations that modulate splicing. Int J Mol Med. 2014;34:53–60.
Siavriene E, Petraityte G, Mikstiene V, Rancelis T, Maldziene Z, Morkuniene A, et al. A novel CHD7 variant disrupting acceptor splice site in a patient with mild features of CHARGE syndrome: a case report. BMC Med Genet. 2019;20:127.
Wahlund S. Zusammensetzung von Populationen und Korrelationserscheinungen vom Standpunkt der Vererbungslehre aus betrachtet. Hereditas. 1928;11:65–106.
Hamilton MB. Population genetics. 2nd ed. New Jersey: John Wiley & Sons Inc; 2021.
Wright S. Evolution and the genetics of populations, volume 4: variability within and among natural populations. Chicago: University of Chicago Press; 1984.
Nakatsuka N, Moorjani P, Rai N, Sarkar B, Tandon A, Patterson N, et al. The promise of discovering population-specific disease-associated genes in South Asia. Nat Genet. 2017;49:1403–07.
Wright S. Evolution in Mendelian populations. Genetics. 1931;16:97–159.
Delaney KE, Kralik SF, Hainline BE, Golomb MR. An atypical case of Canavan disease with stroke-like presentation. Pediatr Neurol. 2015;52:218–21.
Bley A, Denecke J, Kohlschutter A, Schon G, Hischke S, Guder P, et al. The natural history of Canavan disease: 23 new cases and comparison with patients from literature. Orphanet J Rare Dis. 2021;16:227.
Bray SM, Mulle JG, Dodd AF, Pulver AE, Wooding S, Warren ST. Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population. Proc Natl Acad Sci USA. 2010;107:16222–7.
Nicholson G, Smith AV, Jonsson F, Gustafsson O, Stefansson K, Donnelly P. Assessing population differentiation and isolation from single-nucleotide polymorphism data. J R Stat Soc Series B Stat Methodol. 2002;64:695–715.


Acknowledgements

We thank the patients and their families of the TDC for their participation and support for this study.

Funding

Sample testing, including sequencing, segregation analysis and chromosomal microarray, was carried out as part of diagnostic testing. Part of the research, such as the genotyping of population samples whose data were used to analyse population genetic parameters, was performed under the intramural project NIMH/PROJ/GAU/00580/2018-19 funded by the National Institute of Mental Health and Neuro Sciences (NIMHANS), Bengaluru.

Author information

These authors contributed equally: Ananthapadmanabha Kotambail, Pavalan Selvam.

Authors and Affiliations

Department of Human Genetics, National Institute of Mental Health and Neuro Sciences (NIMHANS), Bengaluru, Karnataka, India
Ananthapadmanabha Kotambail, Chetan Ghati & Gautham Arunachal
Department of Clinical Genetics, Christian Medical College and Hospital, Vellore, Tamil Nadu, India
Pavalan Selvam & Sumita Danda
Paediatric Neurology Unit, Department of Neurological Sciences, Christian Medical College and Hospital, Vellore, Tamil Nadu, India
Karthik Muthusamy & Maya Thomas
Department of Radiodiagnosis, Christian Medical College and Hospital, Vellore, Tamil Nadu, India
Sniya Valsa Sudhakar

Contributions

AK: designed and performed experiments such as genotyping, data analysis, interpretation, and drafting of the manuscript. PS: performed sequencing, mutation analysis, and drafting of the manuscript. KM and MT: involved in clinical care of probands and provided inputs to the manuscript. SVS: provided expert inputs regarding brain imaging and contributed to writing of the manuscript. CG: supported the overall work and provided expert inputs. SD: involved in genetic counseling of the families and supported overall work. GA: conceptualized the study, experimental design, data analysis and interpretation, obtaining funding, study supervision, and drafting/revising the final manuscript.

Corresponding author

Correspondence to Gautham Arunachal.

Ethics declarations

Ethical approval

The study protocol was approved by the Institutional Review Board of the Christian Medical College (CMC), Vellore, and the Institutional Ethics Committee of the National Institute of Mental Health and Neuro Sciences (NIMHANS), Bengaluru, India. Subjects were recruited after the appropriate consenting process. The initial part of this work was carried out at CMC, where GA was a faculty member in the Department of Clinical Genetics. After GA moved to NIMHANS, the rest of the work was completed there.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary figures

Supplementary tables

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Kotambail, A., Selvam, P., Muthusamy, K. et al. Clustering of Juvenile Canavan disease in an Indian community due to population bottleneck and isolation: genomic signatures of a founder event. Eur J Hum Genet 31, 73–80 (2023). https://doi.org/10.1038/s41431-022-01198-4


Received: 14 June 2022
Revised: 10 August 2022
Accepted: 15 September 2022
Published: 07 October 2022
Issue Date: January 2023
DOI: https://doi.org/10.1038/s41431-022-01198-4

