Abstract
Mild/juvenile Canavan disease (M/JCD) is infrequently reported in the literature, and little is known about its pathogenetic mechanisms. We report a comprehensive investigation into the pathogenetic mechanism of a novel NM_000049.4(ASPA):c.526G>A variant in two families. The families belong to the Telugu Devanga Chettiar community (TDC) from southern India. TDC has a complex history of migration from its historical origin centuries ago, with high endogamy. TDC probably has the highest clustering of M/JCD recorded historically (around 24 cases). The pathogenic variant was shown to cause a non-classical splicing defect resulting in two different transcripts. The splicing aberration, a loss-of-function mechanism, coupled with a milder missense effect can explain the milder phenotype compared to infantile-onset CD. The high clustering of an extremely rare form of neurodegenerative disorder with reduced fitness led us to speculate on the possibility of a founder event. Genotyping arrays of TDC and multiple distinct populations of Indian origin were analyzed for several population genetic parameters. This yielded robust signatures of a founder event in TDC: a high fixation index; increased runs of homozygosity and identity-by-descent in the absence of consanguinity; a large haplotype with high linkage disequilibrium among markers encompassing the pathogenic variant; a robust population structure; mutation dating that estimates the age of the potential founder of TDC at around 375 years; and possibly a high carrier rate in TDC. This study not only focuses attention on natural history and pathogenetics but also paves the way for carrier screening programs in TDC and future therapeutic studies.
Introduction
Canavan disease (CD) is an autosomal recessive neurodegenerative disorder caused by homozygous or compound heterozygous pathogenic variants in ASPA. Though it is pan-ethnic, the highest prevalence is seen in Ashkenazi Jews. The overall prevalence of CD is 1 in 200,000 to 400,000 [1], whereas the prevalence in the Ashkenazi population is around 1 in 500 to 1 in 6700 [2, 3]. CD presents in early infancy, by 3 to 6 months, with evident macrocephaly and global developmental delay. As the disease advances, spasticity, seizures and optic atrophy set in, with early mortality. In contrast, the less common M/JCD presents with a near-normal period of infancy or only mild delay in motor milestones. Most children with M/JCD achieve walking, attend school and are normocephalic. M/JCD has a highly variable and protracted course with gradual development of neurological manifestations. It is established that the severity of the presentation depends on the degree of residual activity of the aspartoacylase enzyme. For the mildest cases, which have the highest residual activity, a clear genotype–phenotype correlation is currently possible [1].
We report here a comprehensive genetic investigation into the pathogenetics of two families (Fig. 1) hailing from a community of southern India with probably the highest clustering of M/JCD recorded historically, with a novel pathogenic variant. The probands belong to a community referred to as Telugu Devanga Chettiars. As per the historical records, this weaver community migrated during the expansion of the Vijayanagara empire between the 15th and 16th centuries from Andhra Pradesh and settled in Tiruchirappalli [4]. A second wave of migration began about 150 years ago from Tiruchirappalli to a small geographical region near Tiruvannamalai in Tamil Nadu. Currently TDC has a population size of just over 3000, inhabiting three closely situated villages. The language spoken by TDC is Telugu, while the dominant language spoken in the district is Tamil. They are orthodox Hindus and customarily marry only within the community. Overall, their culture, language, mating pattern and geographical isolation resemble those of an island population. Historically, at least 24 such cases of M/JCD have been observed by the community within the last 3–4 generations, with an average life span of 20–25 years. Currently around 10 such cases are alive, 3 of whom have been evaluated by us. The mating pattern is insufficient to explain the high rates of an extremely rare form of neurodegenerative disorder with reduced fitness. Hence, our objectives were (1) to explore the mechanism underlying disease causation by the novel pathogenic variant and (2) to explore the possibility of a founder event and subsequent endogamy leading to this high prevalence.
Materials and methods
Clinical presentation
Proband 1, currently an 8-year-old male, was referred to us at the age of 3 years for the evaluation of developmental delay. His best achieved developmental milestones at presentation were walking with support, immature pincer grasp and speaking bi-syllables. Antenatal and perinatal history was uneventful, with a normal birth weight of 3.24 kg and head circumference of 36 cm. Neurological examination revealed normocephaly, horizontal nystagmus, hypotonia with diminished deep tendon reflexes and gait ataxia. Fundus examination showed no evidence of optic atrophy. Slow gain in development was observed during follow-up.
MR Brain imaging done at 18 and 27 months showed characteristic features of M/JCD; confirmed by an elevated N-acetyl aspartate peak in MRS (Supplementary Fig. 1). There was no interval progression between the scans. Urine gas chromatography/mass spectrometry analysis revealed a 15-fold elevation of N-acetyl aspartic acid levels.
Proband 2 is a 7-year-old male who was referred at the age of 1 year for evaluation of developmental delay. Antenatal and perinatal history was uneventful, with a normal birth weight of 3 kg and head circumference of 34.5 cm. During his first visit, a mild lag in all developmental domains was present (developmental age ~10–11 months). On examination, growth parameters, including head circumference, were within the normal range. Nervous system examination showed no deficits. During follow-up at 2 years of age, ataxia of gait and stance was observed. Deficits in hand coordination were observed at 3.5 years of age. MRI of the brain showed findings similar to those of proband 1.
Mutation analysis
Sanger sequencing of the coding exons of ASPA was performed in the probands and family members using the BigDye Terminator v3.1 kit and an ABI 3500 Genetic Analyzer following the standard manufacturer's protocol (Supplementary Fig. 2). Various in silico analysis tools were used, and the variant was categorized according to the American College of Medical Genetics and Genomics (ACMG) guidelines [5]. Whole exome sequencing (WES) was performed for both probands using the Twist core exome kit (Twist Biosciences, USA) and sequenced on an Illumina NextSeq 550 sequencer. The data were analyzed using Sentieon and VarSeq (Golden Helix, Inc, USA).
cDNA studies
Total RNA was isolated from the patient-derived fibroblast cell line [6]. cDNA was synthesized using RevertAid first strand cDNA synthesis kit (Thermo Fisher Scientific, Inc., USA). Sanger sequencing was performed using primers for exons 2–5 (Supplementary Table S6).
Genome-wide genotyping array
A total of 14 individuals from 2 families of TDC (Fig. 1) and 20 healthy control individuals were genotyped. Controls comprised (i) TML, a Tamil-speaking population sharing the same geographical locality as TDC, (ii) north Indians (NOI), native speakers of Indo-European languages residing in the northern part of India, and (iii) a local population with second- or third-degree consanguineous parentage (CSG). Genotyping was performed using the Infinium HumanCytoSNP-12 v2.1 BeadChip as per the manufacturer’s protocol (Illumina, Inc., USA). Genotype data were extracted and exported as PLINK .ped/.map files [7, 8]. Following QC steps, a total of 299,140 markers were retained, of which only autosomal markers were considered for downstream analysis. After removing markers from sex chromosomes, a total of 275,544 remained. Furthermore, we extracted data from the 1000 Genomes Project [9] for the Indian Telugu in the UK (ITU, n = 20), Gujarati Indians in Houston (GIH, n = 20), and Yoruba in Ibadan (YRI, n = 20) populations for the markers that were genotyped in-house. SNPs with missing genotypes in more than 10% of individuals were excluded. This resulted in a final set of 261,599 markers.
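The per-marker missingness filter described above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study (which relied on PLINK); the function name, the marker IDs and the toy genotype calls are hypothetical and serve only to show the 10% rule.

```python
from typing import Dict, List, Optional


def filter_by_missingness(
    genotypes: Dict[str, List[Optional[int]]],
    max_missing: float = 0.10,
) -> List[str]:
    """Return marker IDs whose fraction of missing calls is at most max_missing.

    Each marker maps to one call per individual; None denotes a missing genotype.
    """
    kept = []
    for marker, calls in genotypes.items():
        if not calls:
            continue  # no individuals genotyped at this marker
        missing_fraction = sum(1 for c in calls if c is None) / len(calls)
        if missing_fraction <= max_missing:
            kept.append(marker)
    return kept


# Hypothetical toy data: 10 individuals, genotypes coded as 0/1/2 allele counts.
toy_genotypes = {
    "rs_hypothetical_1": [0, 1, 2, 0, 1, 1, 0, 2, 1, 0],        # 0% missing
    "rs_hypothetical_2": [0, None, None, 0, 1, 1, 0, 2, 1, 0],  # 20% missing
}
kept_markers = filter_by_missingness(toy_genotypes)
```

Here the second marker is dropped because 2 of its 10 calls are missing, which exceeds the 10% threshold.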
Estimation of fixation index and runs of homozygosity
Fixation index (F) with LD pruned (with r2 threshold of 0.3) data and runs of homozygosity (ROH) were calculated by PLINK1.9 [8]. LD pruning was not performed for ROH estimation, as our aim was to investigate not only parental relatedness but also to determine any occurrence of population bottleneck and isolation [10]. The threshold size of more than 500 kb was considered, as this largely eliminates the effect of linkage disequilibrium (LD) and therefore the frequent short ROH due to haplotype structure [10]. ROH so estimated were further subdivided into 3 length categories i.e., 0.5–1.99 Mb (class I ROH), 2–4.99 Mb (class II ROH) and >5 Mb (class III ROH). This subdivision is based on the knowledge of the role of diverse population genetic processes determining the size and frequency of ROH [11, 12]. The individual autozygosity measure based on ROH (FROH) was computed for three different ROH length thresholds as described previously [10]. Sum total length of ROH (SROH) against the total number of ROH (NROH) on x and y-axes respectively were plotted. Median values for all population groups were computed as described previously [11]. ROH spanning the entire ASPA was also explored among the population groups. ROH calculated from genotyping array was also compared for accuracy in size and marker details from the WES data.
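As a rough illustration of the ROH length classes and of FROH, the following Python sketch bins ROH segments by length and divides the summed length by the autosomal length covered by markers. It is a sketch under stated assumptions: the autosomal length of 2700 Mb is a placeholder and not the value used in the study, a segment of exactly 5 Mb is assigned to class III by assumption, and the segment lengths are hypothetical.

```python
from typing import List, Optional


def roh_class(length_mb: float) -> Optional[str]:
    """Assign an ROH segment to a length class; segments under 0.5 Mb are ignored."""
    if length_mb < 0.5:
        return None
    if length_mb < 2.0:
        return "I"
    if length_mb < 5.0:
        return "II"
    return "III"  # assumption: exactly 5 Mb is treated as class III


def froh(lengths_mb: List[float], min_length_mb: float, autosome_mb: float) -> float:
    """Fraction of the covered autosome lying in ROH of at least min_length_mb."""
    total = sum(length for length in lengths_mb if length >= min_length_mb)
    return total / autosome_mb


# Hypothetical ROH segment lengths (Mb) for one individual.
segments = [0.3, 1.0, 3.0, 6.0]
classes = [roh_class(s) for s in segments]
froh_05 = froh(segments, min_length_mb=0.5, autosome_mb=2700.0)  # placeholder length
```

Calling `froh` with thresholds of 0.5, 2 and 5 Mb gives the three FROH measures, one per length threshold.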
Estimation of identity-by-descent (IBD) and kinship coefficients
Kinship coefficients and proportion of the genome that is IBD (propIBD) were computed using KING v2.2.7 [13]. We then computed pair-wise kinship coefficients among the members within families and between families. To robustly detect IBD segments, Refined IBD (v17Jan20.102) was used after phasing genotypes by Beagle 5.4 for seven unrelated individuals in the pedigree [14, 15].
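For orientation, pairwise kinship coefficients can be mapped to relationship categories using the inference ranges published with KING. The sketch below assumes those commonly cited cut-offs (0.354, 0.177, 0.0884 and 0.0442); the function name is hypothetical and the sketch is not part of the KING software.

```python
def relationship_from_kinship(phi: float) -> str:
    """Map a kinship coefficient to a relationship category (KING-style cut-offs)."""
    if phi > 0.354:
        return "duplicate/monozygotic twin"
    if phi > 0.177:
        return "first-degree"
    if phi > 0.0884:
        return "second-degree"
    if phi > 0.0442:
        return "third-degree"
    return "unrelated"


# Hypothetical kinship values for illustration only.
examples = {phi: relationship_from_kinship(phi) for phi in (0.25, 0.10, 0.05, 0.03)}
```

Cryptic relatedness would appear as pairs that the pedigree reports as unrelated but that fall in a third-degree or closer category.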
Phasing and haplotype reconstruction in pedigrees
Haplotype reconstruction of chromosome 17p region (2980 markers) was performed using MERLIN v1.12 [16]. The haplotypes including recombination events were visualized using HaploPainter v1.043 [17]. The ROH regions around the ASPA pathogenic variant from both the probands were taken as starting point for haplotype analysis. The chromosome carrying the pathogenic variant from each of the carrier family members were aligned together and examined for similarity in markers among all the carriers and the affected. The haplotype regions so inferred were further re-examined for accuracy using PHASE v2.1.1 [18]. The LD (r2 and Dʹ) estimates for the markers in the so delineated haplotype region among the population groups were compared.
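The two LD measures compared here, r2 and Dʹ, can be computed from phased two-SNP haplotypes as in the minimal sketch below. It uses the standard textbook definitions rather than the specific software used in the study, and the haplotype data are hypothetical.

```python
from typing import List, Tuple


def ld_stats(haplotypes: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (r2, |D'|) for two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # raw disequilibrium coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Hypothetical phased haplotypes: complete LD versus no association.
complete_ld = ld_stats([(1, 1), (1, 1), (0, 0), (0, 0)])
no_ld = ld_stats([(1, 1), (1, 0), (0, 1), (0, 0)])
```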
Population structure and admixture analysis
To investigate the current population structure and any admixture between local populations, we applied three separate statistical approaches to data from 88 unrelated individuals with kinship coefficients < 0.044 as per KING. Principal component analysis (PCA) was performed using SmartPCA from EIGENSOFT v7.2.1 [19, 20], with the results visualized in R. To eliminate the distorting effect of LD on eigenvector structure and eigenvalues, ‘ldregress’ was applied. Further, in view of significant endogamy in Indian populations leading to inflated FST estimates, we chose the ‘inbreed: YES’ parameter [21]. The FST estimates, Tracy–Widom statistics and ANOVA p values were tabulated for the inferred populations. Bayesian model-based admixture mapping was performed using STRUCTURE v2.3.4 [22, 23] on the LD-pruned data with correlated allele frequencies and the admixture model. The output was visualized using CLUMPAK [24]. Maximum-likelihood-based inference of the population tree was performed using TreeMix v1.13 [25] on the LD-pruned data with YRI as the root population. The output was visualized using the R ‘plotting_funcs.R’ script.
Mutation age estimation
In view of the small sample size and the variant being novel, the Gamma method was used as described previously [26].
Results
Both the probands harbored a novel homozygous variant c.526G>A (NC_000017.10:g.3386886G>A; NM_000049.4(ASPA):c.526G>A) in exon 3. The missense variant p.Gly176Ser has not been reported previously in ClinVar/HGMD or in the population databases such as gnomAD, 1000 Genomes, GenomeAsia100K, and our in-house database. The variant is predicted to be damaging by both SIFT and PolyPhen2. The nucleotide c.526 is predicted to be highly conserved by GERP++ and PhyloP across 100 vertebrates. Analysis using MaxEntScan and NNSplice revealed a splicing defect. varSEAK splice tool predicted the replacement of the consensus splice donor site by the activation of a cryptic splice donor site 4 bp upstream (new position; c.523G). The variant segregated in a Mendelian recessive fashion among the family members. Hence, this variant was classified as pathogenic with ClinVar accession number VCV001700565.1.
The cDNA sequencing revealed a frameshift pattern starting 1 bp upstream of the novel variant (Supplementary Fig. 2). Deconvoluting the cDNA sequence revealed the presence of 2 possible transcripts. In the first transcript, an upstream cryptic splice site GT (at position c.523_524) is activated and spliced to the 4th exon, as predicted. This would result in a premature termination at codon 176 (p.Gly176Ter) and theoretically many more premature termination codons further downstream. Such an mRNA is expected to undergo nonsense-mediated decay (NMD), as the variant is 223 bp upstream of the last exon-junction complex (EJC). In the second transcript, the canonical splice site is retained (escaping the splicing aberration), resulting in a complete transcript. This probably results in a normal-length protein harboring the missense change p.Gly176Ser. This non-classical splicing defect due to an exonic mutation has been classified as a type IV splicing defect [27]. Such examples have been reported previously [28,29,30]. This is in line with the reported literature that a milder missense variant together with another pathogenic LOF variant often leads to M/JCD [1]. Further studies in the form of NMD analysis and enzyme activity measurements are required to confirm the above possibilities.
Fixation index
The history of migration of the TDC from their historical origin centuries ago, probably with a bottleneck, and a history of endogamy lead one to expect low genetic diversity. Fixation index (F) is the most basic measure of such variation, and the results were clear. On average, there was a 0.5–0.7% reduction in heterozygosity in the TDC compared to the estimates expected under random mating. We also noticed a higher median F value compared to individuals (with no history of consanguinity) inhabiting the same geographical location (Table 1). The deviation, however, is an order of magnitude smaller than in individuals born of consanguineous parentage. This was an early indication of the possible role of endogamy, but to a lesser degree than consanguinity, in the TDC. Reduction in heterozygosity also implies an increased proportion of loci that are homozygous: the Wahlund effect [31].
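For readers unfamiliar with the method-of-moments F statistic reported by PLINK's heterozygosity routine, it can be sketched as the excess of observed over expected homozygous genotypes, scaled by the maximum possible excess. The counts below are hypothetical and are not taken from Table 1.

```python
def inbreeding_f(observed_hom: float, expected_hom: float, n_genotypes: int) -> float:
    """Method-of-moments F: (O(HOM) - E(HOM)) / (N - E(HOM)).

    observed_hom: number of homozygous genotypes observed in the individual
    expected_hom: number expected under Hardy-Weinberg proportions
    n_genotypes:  number of non-missing genotypes considered
    """
    return (observed_hom - expected_hom) / (n_genotypes - expected_hom)


# Hypothetical counts: 1000 markers, 700 homozygotes expected, 705 observed.
f_example = inbreeding_f(observed_hom=705, expected_hom=700.0, n_genotypes=1000)
```

Equivalently, F equals one minus the ratio of observed to expected heterozygosity, so a small positive F corresponds to a small proportional loss of heterozygosity.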
ROH frequency and size among subpopulations
In Fig. 2A, the class I ROH represents similarity of the populations living in the same geographical location as opposed to a population from a distant continent. For the class I ROH, no single subpopulation of Indian origin could be statistically differentiated based on SROH and NROH. However, each of the subpopulations showed higher median SROH and NROH compared to YRI, and these differences were statistically significant (Supplementary Table S4). This is comparable to the previously reported positive correlation of mean SROH with the increasing geographical distance of the study population from East Africa [12].
The size criterion for class II ROH was based on well-documented literature evidence, where subpopulations begin to show differences in both mean SROH and NROH [10, 12]. It is clear from the data (Fig. 2B) that TDC had significantly higher SROH and NROH compared to other subpopulations with no history of parental relatedness (Supplementary Table S4). Even though a small difference in the median values was found between TDC and individuals with parental relatedness (consanguinity), the difference was not statistically significant (p = 0.117). These findings are concordant with the literature evidence of distribution of intermediate-sized ROH [10, 11]. The intermediate-sized ROH are often due to the effect of increased background relatedness determined by the effective population size [10, 11]. Historical evidence of a probable population bottleneck in TDC and cultural isolation with consequent endogamy as well as the current population size, effectively explains increased background relatedness. Finally, for the class III ROH (Fig. 2C), TDC showed higher SROH and NROH (p < 0.005) compared to subpopulations with no history of parental relatedness. However, CSG differed significantly from TDC (p < 0.0016), which is in line with the increased occurrence of multiple long ROH in individuals with recent parental relatedness. It is apparent from the data that, despite the absence of consanguinity, there is substantially higher than expected levels of intermediate and long-sized ROH in the TDC. As can be noticed in Fig. 2D the TDC is on the left of the trendline suggesting that NROH is proportionately more than SROH. This is due to the fact that the class III ROH are few in TDC but the population is enriched with class II ROH. In contrast, the CSG group had the right shift because of numerous long class III ROH without a proportional increase in NROH. 
The pattern observed in TDC is analogous to worldwide patterns of numerous native populations with history of bottleneck, genetic isolation with a small effective population size, and consequent endogamy [11].
FROH is a better measure of autozygosity even in the absence of accurate and deep pedigrees, as FROH captures true IBD, as opposed to F values from pedigrees (FPED), where the founders are often poorly defined. FROH also reliably captures autozygosity originating from ancient as well as recent parental relatedness [11]. TDC had higher FROH values compared to all other subpopulations (p < 0.05) except CSG (Table 1). The overall results suggest an increased amount of autozygosity in the TDC, in spite of the absence of recent parental relatedness.
IBD and kinship coefficients
The kinship coefficients φ computed by KING between pairs of individuals accurately reflected relatedness among individuals with a self-reported pedigree relationship. The results also unexpectedly unearthed cryptic relatedness, with higher φ than expected among TDC individuals who were unrelated as per the pedigree (Supplementary Table S1). The propIBD-based relationship inference provides a more robust estimation of relatedness, and its results were concordant with the φ results. Refined IBD, a robust IBD segment caller, showed that the summed fraction of the genome that is identical by descent (LOD > 3.0) in the TDC founders was several-fold higher than in unrelated individual pairs, corroborating the propIBD estimates. This can be explained by the equation \(f = \left(\frac{1}{2}\right)^{i}\left(1 + F_a\right)\) [32], where \(F_a\) is the inbreeding coefficient of the common ancestor, i.e., the cryptic probability that a founder in the pedigree is himself/herself a product of inbreeding. \(F_a\) values increase in a population with endogamy alone or with a bottleneck superimposed on endogamy. Given the history of TDC, inflated kinship coefficient and IBD estimates are not surprising.
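The effect of an inbred founder on the path-based coefficient can be made concrete with a short numerical sketch of the equation above. It assumes that i counts the individuals along a single path through the common ancestor, as in the standard path-counting formulation; the value of 0.1 for the founder's inbreeding coefficient is purely illustrative.

```python
def path_coefficient(i: int, f_a: float = 0.0) -> float:
    """Contribution of one pedigree path: (1/2)**i * (1 + F_a).

    i:   number of individuals in the path through the common ancestor (assumed meaning)
    f_a: inbreeding coefficient of that common ancestor (0 if the ancestor is outbred)
    """
    return 0.5 ** i * (1.0 + f_a)


# Same path length, outbred versus (hypothetically) inbred common ancestor.
outbred = path_coefficient(3)            # 0.125
inbred = path_coefficient(3, f_a=0.1)    # 0.1375
inflation = inbred / outbred             # relative inflation equals 1 + F_a
```

With the pedigree path held fixed, the coefficient is inflated by exactly the factor 1 + F_a, which is why unrecorded ancestral inbreeding raises kinship and IBD estimates above pedigree expectations.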
Haplotype analysis
As evident in Fig. 3, family 2 harbored an unbroken segment of 7.96 Mb that segregated in a Mendelian fashion. The size exactly matches that of the ROH found in the proband. However, in family 1 the haplotype has undergone a reduction in size, probably because of at least two independent ancestral recombination events, one each on the paternal and maternal sides. The sizes of the segregating haplotypes were 4.73 Mb and 1.02 Mb, respectively. The core haplotype of 1.02 Mb, consisting of 147 SNPs, is common to all the carriers and the affected. The Dʹ and r2 values between SNPs in the region of the haplotype were substantially higher than in the other subpopulations. The sequence of the haplotype was also confirmed using the PHASE 2.2 program. Further, a haplotype of similar size was not found in any other individuals from the various populations analyzed.
Population structure
PCA clearly grouped individuals into distinct clusters which correlated well with their geographical location and native languages (Table 2). The data were in good agreement with eigenvalues, Tracy–Widom statistic, and ANOVA P values (Supplementary Table S5). Clear cut population differentiation between TDC and TML who are inhabiting the same geographical region was noted (Fig. 4A). Both the populations were far removed from the other subpopulations such as NOI, GIH, and ITU along the PC1 and PC2 axes. Further, all these populations aligned on the PC2 axis and were at a notable distance from YRI on the PC1 axis (Supplementary Fig. 4A) as documented in previous studies [21]. This was the first clear indication as to the differentiation of TDC from other populations and the presence of a population structure. FST output after correcting for inbreeding, as per the Balding-Nichols model (Table 3) showed moderately great differentiation of TDC from YRI and moderate from other subpopulations of the Indian ancestry without any overlap [33]. The only overlap was between ITU with GIH (FST = 0.003; threshold for overlap considered < 0.004) [34].
TreeMix provided the maximum-likelihood tree representing the topological relationship of the subpopulations. The Newick-format ML tree (Fig. 4B) shows TDC and TML splitting from a common ancestral population, which is itself a split from the common ancestors of ITU. The length of the branch representing TDC clearly shows a significant divergence (p = 7 × 10−5) and probably minimal or no migration between TDC and TML. STRUCTURE results led to the same conclusion (Supplementary Fig. 4B).
There is enough evidence from the results of FST, PCA, STRUCTURE and TreeMix to support a process akin to ‘lineage sorting’ having taken place. This is probably due to the significant population contraction (leading to drift) and long genetic isolation. As per the Island model (expected \(F_{\mathrm{ST}} = 1 - e^{-\frac{1}{2N_e}}\)), a significant population contraction resulting in drift will increase differentiation and thereby FST [35]. One of the main assumptions of the Island model is that no migration has taken place between the subpopulations. Based on the self-report of individuals from TDC and knowledge of their strict cultural practices, together with the above-mentioned findings, it is evident that the TDC has undergone genetic isolation after a founding event, with hardly any migration thereafter.
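The dependence of expected FST on effective population size can be illustrated numerically. The sketch below evaluates the expression given above and, as an assumption that goes beyond the formula as printed, exposes a `generations` parameter corresponding to the standard drift expectation \(1 - e^{-t/(2N_e)}\); with the default of one generation it reduces to the expression in the text. The effective sizes used are hypothetical.

```python
import math


def expected_fst(ne: float, generations: float = 1.0) -> float:
    """Expected FST from drift alone: 1 - exp(-generations / (2 * Ne)).

    generations defaults to 1, which matches the expression quoted in the text;
    other values are an assumed generalization for illustration.
    """
    return 1.0 - math.exp(-generations / (2.0 * ne))


# Hypothetical effective sizes: a small isolate versus a larger population,
# both followed for 15 generations without migration.
fst_small = expected_fst(ne=50, generations=15)
fst_large = expected_fst(ne=500, generations=15)
```

The smaller the effective population size, the faster differentiation accumulates, which is the qualitative point made in the paragraph above.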
Mutation dating
Assuming a correlated genealogy, the mutation arose 15.1 generations ago, with a confidence interval of 3.3–26.5 corresponding to around 375 years (assuming one generation = 25 years). This coincides well with the age of historical migration of the first founders of TDC to the current location of inhabitation [4]. The Gamma method as opposed to coalescent-based methods, in fact provides an estimation of the age of the most recent common ancestor (MRCA) rather than the exact age of the mutational process.
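The conversion from generations to calendar years is simple arithmetic, reproduced here for transparency. The generation time of 25 years is the assumption stated in the text; the variable names are illustrative.

```python
YEARS_PER_GENERATION = 25  # assumption stated in the text


def generations_to_years(generations: float) -> float:
    """Convert an age in generations to years using a fixed generation time."""
    return generations * YEARS_PER_GENERATION


point_estimate_years = generations_to_years(15.1)   # 377.5, i.e. around 375 years
ci_lower_years = generations_to_years(3.3)          # 82.5
ci_upper_years = generations_to_years(26.5)         # 662.5
```

The wide interval (roughly 80 to 660 years) reflects the small sample size, so the agreement with the historical migration date should be read as approximate.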
Discussion
M/JCD often results from retained residual enzymatic activity of aspartoacylase. Various studies have observed that M/JCD typically involves a milder missense pathogenic variant, such as p.Tyr288Cys or p.Pro257Arg, in the compound heterozygous trans state with another pathogenic variant, usually a loss-of-function (LOF) variant or a missense variant in exons coding for the major functional domains [1, 36, 37]. The novel pathogenic variant c.526G>A found in both probands has been demonstrated to cause both a splicing defect (LOF) and probably a full-length protein with a missense change. This escape of a portion of transcripts from probable NMD is in line with previous findings on the pathogenetics of M/JCD. This evidence explains the milder phenotype in the probands, with a significantly longer life span compared to infantile CD.
The pedigree suggests no consanguinity whatsoever in either family. The two probands are related through the MRCA with not less than 9th-degree relatedness. The known relatedness is through the paternal side of proband 1 and the maternal side of proband 2 (Fig. 1). Even though the two families share an MRCA, the deleterious allele is not passed through the individual connecting the MRCA with the probands (IV-6 is not a carrier). Further, there is evidence for a high prevalence of the disorder in the population, with at least 10 cases currently alive. The approximate prevalence provides an indirect estimate of a very high carrier frequency of about 1 in 10, calculated as per Hardy–Weinberg equilibrium (HWE). M/JCD is a severe disorder with zero fitness. Ideally, such deleterious recessive alleles are purged from an inbred population, as homozygosity for such alleles is eventually selected against. Hence, the novel variant is unlikely to be a new mutation in the recent past. Only random genetic drift or selection can change allele frequencies; inbreeding alone can change only genotype frequencies [32].
The robust signatures of the founder event in TDC are supported by the following observations. (a) The high prevalence of the deleterious allele with reduced fitness. (b) Significant population differentiation as evidenced by FST, PCA and TreeMix results, without the probable effects of selection and migration. (c) A moderately high fixation index value with correspondingly high autozygosity (FROH) in the absence of consanguinity in TDC, which together with the robust population structure implies the Wahlund effect [31]. (d) The left-shift pattern of NROH vs. SROH of TDC for the class II and III ROH, which is indicative of a population bottleneck followed by endogamy [10]. (e) Elevated kinship coefficients, propIBD estimates and proportion of the autosomal genome under IBD in TDC in the absence of consanguinity, which suggest an increased background relatedness. Higher proportions of IBD with longer IBD segments are often caused by founder events [38]. (f) A large haplotype spanning ASPA in the absence of consanguinity. The ASPA locus and the flanking region on chromosome 17p are not a known ROH hotspot or cold spot [12]. The genomic region encompassing ASPA is usually kept out of ROH by recombination events acted upon by selection pressures that avoid the occurrence of homozygous deleterious alleles; hence the observation of smaller haplotypes in the 17p region among the rest of the subpopulations. However, the ROH and the large haplotype in this region appear to be specific to TDC. (g) Mutation dating, which suggests that the age of the MRCA for the entire community (the founder) coincides with the documented historical migration [4]. Here we have provided a consilience of multiple lines of genetic evidence to support the founder event in TDC. Numerous founder events related to recessive disorders, similar to the one described here, have been well studied in the Indian population [21, 34].
The study has not only shed light on the pathogenetics and natural history of M/JCD in TDC but has also highlighted the need for a community carrier screening program to prevent further occurrences of this disabling disorder. The findings of this effort can also be a harbinger for future therapeutic studies, considering that the protracted course of M/JCD makes it amenable to early interventions.
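The Hardy–Weinberg reasoning behind the carrier estimate can be sketched as follows. The prevalence used here, about 10 living cases in a community of roughly 3000, is an assumption assembled from figures given elsewhere in the text; the exact prevalence used for the published estimate is not stated, and the calculation ignores inbreeding and reduced survival of affected individuals.

```python
import math


def carrier_frequency(prevalence: float) -> float:
    """Carrier (heterozygote) frequency 2pq under HWE for a recessive disorder.

    The disease-allele frequency q is taken as the square root of the prevalence (q**2).
    """
    q = math.sqrt(prevalence)
    p = 1.0 - q
    return 2.0 * p * q


# Assumed inputs for illustration: 10 affected individuals among about 3000 people.
assumed_prevalence = 10 / 3000
carriers = carrier_frequency(assumed_prevalence)
one_in = 1.0 / carriers
```

Under these assumptions the carrier frequency comes out at roughly 11%, i.e. on the order of 1 in 9 to 1 in 10, consistent with the estimate quoted above.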
Data availability
The novel variant identified has been submitted to the ClinVar database with the accession number VCV001700565.1 (https://www.ncbi.nlm.nih.gov/clinvar/variation/1700565/). The ethical approval and informed consent of this study do not allow the genetic data to be publicly released. Any additional data can be provided upon official request to the corresponding author after appropriate regulatory clearances.
References
Mendes MI, Smith DE, Pop A, Lennertz P, Fernandez Ojeda MR, Kanhai WA, et al. Clinically distinct phenotypes of Canavan disease correlate with residual aspartoacylase enzyme activity. Hum Mutat. 2017;38:524–31.
Matalon R, Michals K, Kaul R. Canavan disease: from spongy degeneration to molecular analysis. J Pediatr. 1995;127:511–7.
Feigenbaum A, Moore R, Clarke J, Hewson S, Chitayat D, Ray PN, et al. Canavan disease: carrier-frequency determination in the Ashkenazi Jewish population and development of a novel molecular diagnostic assay. Am J Med Genet A. 2004;124A:142–7.
Ramaswamy V. Mapping Migrations of South Indian Weavers before, during and after the Vijayanagar Period: Thirteenth to Eighteenth Centuries. Globalising Migration History. Leiden: Brill; 2014. p. 89–121.
Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17:405–24.
Villegas J, McPhaul M. Establishment and culture of human skin fibroblasts. Curr Protoc Mol Biol. New Jersey: John Wiley & Sons Inc; 2005, Chapter 28:Unit 28 3.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.
Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526:68–74.
McQuillan R, Leutenegger AL, Abdel-Rahman R, Franklin CS, Pericic M, Barac-Lauc L, et al. Runs of homozygosity in European populations. Am J Hum Genet. 2008;83:359–72.
Ceballos FC, Joshi PK, Clark DW, Ramsay M, Wilson JF. Runs of homozygosity: windows into population history and trait architecture. Nat Rev Genet. 2018;19:220–34.
Pemberton TJ, Absher D, Feldman MW, Myers RM, Rosenberg NA, Li JZ. Genomic patterns of homozygosity in worldwide human populations. Am J Hum Genet. 2012;91:275–92.
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26:2867–73.
Browning BL, Tian X, Zhou Y, Browning SR. Fast two-stage phasing of large-scale sequence data. Am J Hum Genet. 2021;108:1880–90.
Browning BL, Browning SR. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics. 2013;194:459–71.
Abecasis GR, Cherny SS, Cookson WO, Cardon LR. Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002;30:97–101.
Thiele H, Nurnberg P. HaploPainter: a tool for drawing pedigrees with complex haplotypes. Bioinformatics. 2005;21:1730–2.
Stephens M, Smith NJ, Donnelly P. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet. 2001;68:978–89.
Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2:e190.
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–9.
Reich D, Thangaraj K, Patterson N, Price AL, Singh L. Reconstructing Indian population history. Nature. 2009;461:489–94.
Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: dominant markers and null alleles. Mol Ecol Notes. 2007;7:574–78.
Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003;164:1567–87.
Kopelman NM, Mayzel J, Jakobsson M, Rosenberg NA, Mayrose I. Clumpak: a program for identifying clustering modes and packaging population structure inferences across K. Mol Ecol Resour. 2015;15:1179–91.
Pickrell JK, Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012;8:e1002967.
Gandolfo LC, Bahlo M, Speed TP. Dating rare mutations from small samples with dense marker data. Genetics. 2014;197:1315–27.
Anna A, Monika G. Splicing mutations in human genetic disorders: examples, detection, and confirmation. J Appl Genet. 2018;59:253–68.
Eng L, Coutinho G, Nahas S, Yeo G, Tanouye R, Babaei M, et al. Nonclassical splicing mutations in the coding and noncoding regions of the ATM Gene: maximum entropy estimates of splice junction strengths. Hum Mutat. 2004;23:67–76.
Xu W, Yang X, Hu X, Li S. Fifty-four novel mutations in the NF1 gene and integrated analyses of the mutations that modulate splicing. Int J Mol Med. 2014;34:53–60.
Siavriene E, Petraityte G, Mikstiene V, Rancelis T, Maldziene Z, Morkuniene A, et al. A novel CHD7 variant disrupting acceptor splice site in a patient with mild features of CHARGE syndrome: a case report. BMC Med Genet. 2019;20:127.
Wahlund S. Zusammensetzung von Populationen und Korrelationserscheinungen vom Standpunkt der Vererbungslehre aus betrachtet. Hereditas. 1928;11:65–106.
Hamilton MB. Population genetics. 2nd ed. New Jersey: John Wiley & Sons Inc; 2021.
Wright S. Evolution and the genetics of populations, volume 4: variability within and among natural populations. Chicago: University of Chicago Press; 1984.
Nakatsuka N, Moorjani P, Rai N, Sarkar B, Tandon A, Patterson N, et al. The promise of discovering population-specific disease-associated genes in South Asia. Nat Genet. 2017;49:1403–07.
Wright S. Evolution in Mendelian populations. Genetics. 1931;16:97–159.
Delaney KE, Kralik SF, Hainline BE, Golomb MR. An atypical case of Canavan disease with stroke-like presentation. Pediatr Neurol. 2015;52:218–21.
Bley A, Denecke J, Kohlschutter A, Schon G, Hischke S, Guder P, et al. The natural history of Canavan disease: 23 new cases and comparison with patients from literature. Orphanet J Rare Dis. 2021;16:227.
Bray SM, Mulle JG, Dodd AF, Pulver AE, Wooding S, Warren ST. Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population. Proc Natl Acad Sci USA. 2010;107:16222–7.
Nicholson G, Smith AV, Jonsson F, Gustafsson O, Stefansson K, Donnelly P. Assessing population differentiation and isolation from single-nucleotide polymorphism data. J R Stat Soc Series B Stat Methodol. 2002;64:695–715.
Acknowledgements
We thank the patients and families of the TDC for their participation in and support of this study.
Funding
The sample testing, including sequencing, segregation analysis and chromosomal microarray, was carried out as part of diagnostic testing. Part of the research, such as the genotyping of population samples whose data were used for the analysis of population genetic parameters, was performed under the intramural project NIMH/PROJ/GAU/00580/2018-19 funded by the National Institute of Mental Health and Neuro Sciences (NIMHANS), Bengaluru.
Author information
Contributions
AK: designed and performed experiments such as genotyping, data analysis, interpretation, and drafting of the manuscript. PS: performed sequencing, mutation analysis, and drafting of the manuscript. KM and MT: involved in clinical care of probands and provided inputs to the manuscript. SVS: provided expert inputs regarding brain imaging and contributed to writing of the manuscript. CG: supported the overall work and provided expert inputs. SD: involved in genetic counseling of the families and supported overall work. GA: conceptualized the study, experimental design, data analysis and interpretation, obtaining funding, study supervision, and drafting/revising the final manuscript.
Ethics declarations
Ethical approval
The study protocol was approved by the Institutional Review Board of the Christian Medical College (CMC), Vellore, and the Institutional Ethics Committee of the National Institute of Mental Health and Neuro Sciences (NIMHANS), Bengaluru, India. Subjects were recruited after an appropriate consent process. The initial part of this work was carried out at CMC, where GA was a faculty member in the Department of Clinical Genetics. After GA moved to NIMHANS, the rest of the work was completed there.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kotambail, A., Selvam, P., Muthusamy, K. et al. Clustering of Juvenile Canavan disease in an Indian community due to population bottleneck and isolation: genomic signatures of a founder event. Eur J Hum Genet 31, 73–80 (2023). https://doi.org/10.1038/s41431-022-01198-4
DOI: https://doi.org/10.1038/s41431-022-01198-4