Introduction

Dysthyroid optic neuropathy (DON) is the commonest blinding complication of thyroid associated orbitopathy (TAO), affecting 4–8% of patients, with an estimated annual incidence of 0.6–1.3 cases per 100,000 population1,2. While the exact mechanisms of DON remain elusive, apical compression by enlarged extraocular muscles and/or fat (crowding)3,4, ischemia due to increased retrobulbar pressure, mechanical stretch due to proptosis and perineural inflammation have been proposed5. Empirical treatments including surgical apical decompression, systemic steroids and orbital radiotherapy are often effective in restoring vision. It is thus imperative to confirm the diagnosis early, both to avoid irreversible visual loss and to avoid unnecessary treatment when the visual loss has an alternative cause5. Ancillary tests, for example optical coherence tomography6, orbital imaging7 and electrophysiological studies (EPS), including visual evoked potential (VEP) and electroretinogram (ERG), have been used in attempts to objectively assess the presence, predict the development and correlate with the severity of DON8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23. However, methodologies and results were heterogeneous across studies. In this systematic review, we studied published reports on EPS in DON.

Results

Characteristics of included studies

Our search yielded 768 reports from databases. After removing 331 duplicated records, we screened 437 publications. Among them, 415 studies were found to be irrelevant according to our eligibility criteria (see Methods below). Of the remaining 22 studies, 8 reports were excluded: 1 report on a duplicated study population24, 1 case report25, 1 review article26, and 5 studies with irrelevant or insufficient results1,27,28,29,30. Two additional studies were identified from manual search of references8,9. In total, 16 studies were included in the systematic review (Fig. 1)8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23. No clinical trial was identified.
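The selection counts above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and not part of the original report, and the numbers are those stated in the paragraph above.

```python
# Counts as reported in the text; variable names are illustrative only.
records_identified = 768
duplicates_removed = 331
screened = records_identified - duplicates_removed

excluded_as_irrelevant = 415
remaining_after_screening = screened - excluded_as_irrelevant

# Reasons for exclusion among the remaining studies, as listed in the text.
exclusions = {
    "duplicated study population": 1,
    "case report": 1,
    "review article": 1,
    "irrelevant or insufficient results": 5,
}
excluded_after_assessment = sum(exclusions.values())

# Studies found by manual search of reference lists.
added_from_references = 2

included = remaining_after_screening - excluded_after_assessment + added_from_references

print(f"screened: {screened}")
print(f"remaining after screening: {remaining_after_screening}")
print(f"excluded after assessment: {excluded_after_assessment}")
print(f"included: {included}")
```

The totals agree with the figures reported in the text (437 screened, 22 remaining, 8 excluded, 16 included).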

Figure 1 Flowchart of the literature search and study selection process.

The pooled sample included 787 patients (1,327 eyes). The age of patients with DON ranged from 14 to 77 years old13. VEP was used in 14 studies8,9,10,11,12,13,15,16,17,18,19,20,21,22. 3 studies tested pERG12,14,23. No study was found on flash or multifocal ERG (Table 1).

Table 1 Characteristics of included studies in the systematic review.

Phenotypic definition of subjects

Study populations were phenotypically defined as patients with DON, TAO, or healthy subjects. Clinical features of DON included optic disc swelling, relative afferent pupillary defect, decreased visual acuity, impaired color vision, and visual field defect31. DON was considered “definite” if there was optic disc swelling or 2 of the four other clinical features above without alternative explanation in a patient with TAO32. Subclinical or “equivocal” DON was proposed by some as the presence of optic nerve dysfunction in TAO patients without the full-blown clinical features of DON15, often identified by abnormal electrophysiological changes8,12,13,16,18,22.
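The "definite" DON rule described above can be written as a small decision function. This is only a sketch of the rule as worded in this section: the class and field names are hypothetical, and it assumes that the "without alternative explanation" condition applies to optic disc swelling as well as to the other four features.

```python
from dataclasses import dataclass


@dataclass
class EyeFindings:
    """Clinical findings for one eye (hypothetical structure for illustration)."""
    has_tao: bool
    optic_disc_swelling: bool = False
    rapd: bool = False
    decreased_visual_acuity: bool = False
    impaired_color_vision: bool = False
    visual_field_defect: bool = False
    alternative_explanation: bool = False


def is_definite_don(f: EyeFindings) -> bool:
    """Return True if the findings meet the 'definite' DON rule quoted in the text."""
    # The rule only applies to TAO patients whose findings have no other explanation
    # (assumption: this condition covers disc swelling too).
    if not f.has_tao or f.alternative_explanation:
        return False
    # Count the four clinical features other than optic disc swelling.
    other_features = sum([
        f.rapd,
        f.decreased_visual_acuity,
        f.impaired_color_vision,
        f.visual_field_defect,
    ])
    return f.optic_disc_swelling or other_features >= 2


example = EyeFindings(has_tao=True, decreased_visual_acuity=True, impaired_color_vision=True)
print(is_definite_don(example))
```

Eyes that fail this rule but show abnormal electrophysiology would fall into the "equivocal" or subclinical group discussed above, which the sketch does not attempt to classify.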

Flash VEP (fVEP) in TAO & DON

Only 2 earlier studies reported use of fVEP in TAO and DON patients (Tables 2 and 3)9,11. Alteration in P2 amplitude was reported in clinically evident DON11. Tsaloumas et al. found significantly smaller P2 amplitude in DON eyes, which improved either after orbital decompression (6.83 ± 0.92 vs. 13.12 ± 1.65 µV; P < 0.05) or after 2 weeks of high-dose systemic steroids (7.00 ± 1.10 vs. 9.61 ± 1.43 µV; P < 0.05)11. However, treatment-related improvement was not shown in Setala’s study after decompression or radiotherapy9.

Table 2 Summary outcomes of observational case series and case-control studies on the use of VEP in DON/TAO.
Table 3 Summary outcomes of longitudinal case series comparing VEP changes before and after treatment for DON/TAO.

Pattern VEP (pVEP) in TAO & DON

Comparison of pVEP results in DON, TAO, and normal controls

P100 latency, P100 amplitude, and N75 latency were compared between DON and normal controls in 3 studies10,11,17. An increase in P100 latency of patients with DON was reported by Shawkat et al. (115.2 ± 5.7 vs. 103.2 ± 4.3 ms, P = 0.0005)10, Tsaloumas et al. (129.2 ± 7.1 vs. 108.2 ± 1.2 ms, P < 0.005)11, and Ambrosio et al. (P < 0.0001)17. A decrease in P100 amplitude was found in eyes with DON compared to control by Tsaloumas et al. (3.67 ± 0.81 vs. 8.97 ± 0.59 µV, P < 0.001)11 and Ambrosio et al. (P < 0.0001)17.

Comparisons between TAO eyes with or without DON were reported in 3 studies10,11,15. Significant increases in P100 latency in eyes with DON were shown by Shawkat et al. (115.2 ± 5.7 vs. 110.3 ± 5.1 ms, P = 0.043)10, Tsaloumas et al. (129.2 ± 7.13 vs. 111 ± 1.86 ms, P < 0.005)11, and Rutecka-Debniak et al. (124.4 ± 15.4 vs. 114.9 ± 11.2 ms, P = 0.05)15. Significant decreases in P100 amplitude in DON patients were reported by Shawkat et al. (11.9 ± 6.4 vs. 21.2 ± 9.7 µV, P = 0.018)10 and Tsaloumas et al. (3.67 ± 0.81 vs. 8.55 ± 0.73 µV, P < 0.001)11. Moreover, the mean N75 latency of eyes with DON was also increased (90.0 ± 17.9 vs. 80.3 ± 14.7 ms, P = 0.01)15.

Five studies reported significant increases in P100 and N75 latencies comparing eyes from TAO patients without DON to healthy eyes (Table 2)8,12,13,16,18. Wijngaarde et al. first reported a significant increase in P100 latency of TAO eyes compared with healthy eyes (P < 0.01)8. Spadea et al. (126.7 ± 10.7 vs. 118.5 ± 5.7 ms, P < 0.05)12, Salvi et al. (105.6 ± 0.5 vs. 102.0 ± 0.5 ms, P < 0.001)13, Acaroglu et al. (122.0 ± 14.4 vs. 105.9 ± 7.7 ms, P = 0.0004)16, and Pawlowski et al. (106.2 ± 4.4 vs. 102.4 ± 2.7 ms, P < 0.01)18 also found increased P100 latencies in eyes from TAO subjects without clinical evidence of DON when compared with controls. In addition, Pawlowski et al. found an increase in N75 latency (79.0 ± 3.7 vs. 73.9 ± 2.8 ms, P < 0.001)18, while Spadea et al. showed a decrease in P100 amplitude (3.47 ± 3.81 vs. 9.78 ± 4.26 µV, P < 0.05) in TAO patients compared with normal subjects12. However, the differences between eyes from TAO patients and normal controls in N75 and P100 latencies were not significant in other studies (Shawkat et al.10 and Tsaloumas et al.11). While these TAO patients did not show clinical evidence of DON, abnormal pVEP, in particular prolonged P100 latency, may represent electrophysiological evidence of early or subclinical optic nerve dysfunction in TAO patients.

Correlation of pVEP latencies with clinical parameters

Four studies investigated correlation between pVEP latencies and clinical parameters (Table 4)8,16,18,20. Wijngaarde et al. reported a mild but significant correlation (r = 0.27, P value not available) of P100 latency with Snellen visual acuity8, while Wei et al. reported a similar degree of correlation without statistical significance (r = 0.278, P > 0.05) using LogMAR visual acuity20. In the latter study, correlation of P100 latency was moderate and statistically significant with total cross-sectional areas of all extraocular rectus muscles (EOM-A) (r = 0.496, P < 0.01); moderate but insignificant with ratio between the total cross-sectional area of all extraocular rectus muscles and the orbital area (r = 0.482, P > 0.05), mild and insignificant with total error of 100-hue color sensation (r = 0.363, P > 0.05) and with mean deviation of retinal sensitivity (MD) in perimetry (r = −0.342, P > 0.05). On the other hand, the correlation between peripapillary nerve fiber layer thickness and degree of exophthalmos with P100 latency was insignificant20. Acaroglu et al. reported a mild but significant correlation between the disease activity (clinical activity score) and P100 latency (r = 0.364, P = 0.04)16.

Table 4 Correlations between pVEP latencies and clinical measurements of DON/TAO.

The correlation between degree of exophthalmos and pVEP varied among studies. Pawlowski et al. reported a moderate and significant correlation between degree of proptosis and N75 latency (r = 0.51, P < 0.01) but not with P100 latency18. On the other hand, Wijngaarde et al. described a mild correlation between degree of proptosis and P100 latency (r and P value not available)8, while Wei et al. reported a poor and insignificant correlation (r = −0.126, P value not available)20.

pVEP after treatments

Four studies reported the pVEP results before and after treatments including high-dose steroids, orbital radiotherapy and/or decompression (Table 3)11,15,19,21. While treatment strategies varied, an increase in P100 amplitude and/or a decrease in P100 latency post-treatment were generally observed. Greater improvements were observed in eyes with DON than in those without. Three studies reported a decrease of more than 10% in P100 latency after treatment of DON. Tsaloumas et al. reported a significant decrease (from 129.2 ± 7.13 to 114.0 ± 4.47 ms, P < 0.01)11, and so did Rutecka-Debniak et al. (from 126.0 ± 15.9 to 108.0 ± 5.3 ms, P = 0.01)15 and Liao et al. (from 134.8 ± 22.1 to 107.3 ± 4.0 ms, P < 0.001)19. Rutecka-Debniak et al. also reported a significant decrease in N75 latency in eyes with DON after treatment (from 93.3 ± 18.7 to 78.8 ± 7.7 ms, P = 0.01)15. A significant increase of more than 50% in P100 amplitude was reported by Tsaloumas et al. after decompression (from 3.67 ± 0.81 to 6.50 ± 0.67 µV, P < 0.01) and after high-dose steroid treatment (from 5.30 ± 0.89 to 8.06 ± 0.80 µV, P < 0.01)11. Lipski et al. also reported a significant increase in P100 amplitude after bony orbital decompression (from 4.45 ± 2.3 to 8.8 ± 6.32 µV, P < 0.05)21.
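The relative changes quoted above (more than 10% for latency, more than 50% for amplitude) can be reproduced from the reported group means. The sketch below uses only the mean values given in this paragraph; the function name and dictionary layout are illustrative, and percentages computed from group means are an approximation rather than a per-eye analysis.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from the pre-treatment mean, in percent."""
    return (after - before) / before * 100.0


# Mean P100 latency (ms) before and after treatment of DON, as reported in the text.
latency_ms = {
    "Tsaloumas et al.": (129.2, 114.0),
    "Rutecka-Debniak et al.": (126.0, 108.0),
    "Liao et al.": (134.8, 107.3),
}

# Mean P100 amplitude (µV) before and after treatment, as reported in the text.
amplitude_uv = {
    "Tsaloumas et al. (decompression)": (3.67, 6.50),
    "Tsaloumas et al. (steroids)": (5.30, 8.06),
    "Lipski et al. (decompression)": (4.45, 8.8),
}

latency_change = {k: percent_change(*v) for k, v in latency_ms.items()}
amplitude_change = {k: percent_change(*v) for k, v in amplitude_uv.items()}

for study, change in latency_change.items():
    print(f"P100 latency, {study}: {change:+.1f}%")
for study, change in amplitude_change.items():
    print(f"P100 amplitude, {study}: {change:+.1f}%")
```

All three latency changes computed this way are decreases of more than 10%, and all three amplitude changes are increases of more than 50%, consistent with the summary in the text.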

In TAO eyes with no clinical evidence of DON but prolonged P100 latency, Rutecka-Debniak et al. reported a significant decrease after treatment (from 114.8 ± 12.6 to 107.3 ± 13.2 ms, P = 0.05)15. There was no post-treatment change in TAO eyes with normal pre-treatment VEP.

Multifocal VEP (mfVEP) in TAO

In 2012, Perez-Rico et al. first reported the use of mfVEP in TAO patients without DON22. There was a significant increase in mean latency in TAO group compared to age-matched control (2.12 ± 1.72 vs. 6.57 ± 1.90 ms, P < 0.05) and 23 eyes (35.4%) had abnormal mfVEP amplitude and/or latency. By interocular comparison, 12.3% of TAO eyes showed decreased amplitude and 13.8% of them showed increased latency. Visual acuity was significantly related to mfVEP amplitude changes (mean difference = −0.104, P = 0.018), while intraocular pressure measured at upgaze was significantly related to mfVEP latency changes (mean difference = 2.595, P = 0.028). No statistically significant relationship was observed between mfVEP parameters and standard automated perimetry results or nerve fiber layer thickness measured on optical coherence tomography22.

Electroretinography (ERG) in TAO

Comparing TAO eyes with controls, Spadea et al. found significant decreases in amplitudes for both P50 (1.17 ± 0.58 vs. 1.74 ± 0.50 µV, P < 0.05) and N95 (1.71 ± 1.10 vs. 2.37 ± 0.59 µV, P < 0.05)12. No significant difference was found in latency12. Genovesi-Ebert et al. reported significantly smaller (P < 0.0001) pERG amplitude in TAO eyes without providing numerical results14. They also described a negative correlation of pERG amplitude with optic nerve diameter measured by ultrasonography. Pawlowski et al. reported significant decrease in P50 amplitude in TAO eyes (2.04 ± 0.99 vs. 2.69 ± 0.88 µV, P < 0.05) but not in N95 amplitude or latencies23. 3 studies reported drop in P50 amplitude12,14,23, with statistical significance shown by Spadea et al. and Pawlowski et al.12,23.

Assessment of the quality of study and grading of clinical recommendation

The 12 studies on VEPs were assessed according to the NOS (Newcastle-Ottawa Scale) quality assessment of case-control studies33 (Table 5). The study with best quality was carried out by Tsaloumas et al. in 199411. Clinical recommendation of EPS in detecting and monitoring visual dysfunction in TAO was rated according to the American Academy of Ophthalmology on preparing Preferred Practice Pattern (PPP) guidelines (Table 6)34. pVEP was given level A importance in application and level II in strength of evidence.

Table 5 Quality Assessment for Included Case-control Studies.
Table 6 Clinical recommendation of VEP or ERG in detecting visual dysfunction in TAO.

Discussion

Clinical features of DON may include impaired visual acuity and color vision, visual field defects, afferent or relative afferent pupillary defect (APD/RAPD), and optic disc hyperemia or swelling5,31,35. In practice, these features rarely co-exist, while ocular co-morbidities often confound clinical assessment35. The European Group on Graves’ Orbitopathy (EUGOGO) was the first to propose that the presence of optic disc swelling alone or any other two of the above abnormalities without an alternative explanation suggested the presence of DON in any TAO patient35,36. Among the 94 eyes recruited, impaired visual acuity (<20/40), color vision, visual field defects, relative afferent pupillary defect and optic disc swelling were present in only 73%, 77%, 71%, 45%, and 56% of eyes subsequently diagnosed to have “definite DON”. On the other hand, these abnormalities were also found in 32%, 7%, 13%, 0% and 5% of eyes subsequently diagnosed to have “no DON”. These results implied that none of the individual findings of optic nerve dysfunction was sensitive or specific enough to diagnose or exclude DON. Proptosis or increased clinical activity scores (≥3/7) were absent in more than one-third of eyes with “definite” DON35. Despite its serious visual consequences, no widespread consensus on the diagnostic criteria of DON is available to date. The challenge in diagnosing DON at its early stage or in patients with ocular comorbidities remains.
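The percentages above can be read as approximate sensitivity and specificity figures for each individual finding. The sketch below assumes that the "definite DON" and "no DON" groups serve as the reference classification and ignores equivocal eyes; under that assumption, sensitivity is the proportion of "definite DON" eyes with the finding, and specificity is 100% minus the proportion of "no DON" eyes with the finding. Names are illustrative.

```python
# Findings in the order given in the text.
findings = [
    "impaired visual acuity",
    "impaired color vision",
    "visual field defect",
    "relative afferent pupillary defect",
    "optic disc swelling",
]
# Percentage of eyes with each finding, as reported in the text.
present_in_definite_don = [73, 77, 71, 45, 56]
present_in_no_don = [32, 7, 13, 0, 5]


def sensitivity_specificity(pct_definite: int, pct_no_don: int) -> tuple:
    """Sensitivity = % positive among 'definite DON'; specificity = % negative among 'no DON'."""
    return pct_definite, 100 - pct_no_don


table = {
    name: sensitivity_specificity(d, n)
    for name, d, n in zip(findings, present_in_definite_don, present_in_no_don)
}

for name, (sens, spec) in table.items():
    print(f"{name}: sensitivity {sens}%, specificity {spec}%")
```

Read this way, no single finding reaches a sensitivity above 77%, which illustrates why a negative result on any one test cannot exclude DON.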

Electrophysiological studies (EPS), including visual evoked potential (VEP) and electroretinogram (ERG), were adopted to provide objective evaluation and correlation with the presence and/or severity of DON. VEP refers to the electrophysiological signals extracted from the visual cortex during visual stimulation of the retina37. Any disturbance along the visual pathway or visual cortex results in VEP abnormalities (decrease in amplitude or increase in latency). VEP was first reported in 1972 by Halliday et al. for the assessment of optic neuritis38. Subsequently it was used in patients with DON in 1980 by Wijngaarde et al.8. Three types of VEP have been used: flash VEP (fVEP), pattern VEP (pVEP), and multifocal VEP (mfVEP) (Table 7). fVEP uses a diffuse flash stimulating the entire retina for a mass response. Therefore, a localized abnormal response may be averaged out and left undetected. pVEP uses checkerboard pattern reversal stimulation covering the central 15° visual field. The major components of pVEP are a large positive wave at peak latency of about 100 milliseconds (P100) and a negative wave peaking at 70 milliseconds (N70). Any delay in P100 latency or decrease in amplitude measured from N70 to P100 suggests the presence of optic neuropathy37. Since the first report on pVEP in assessing visual function in TAO patients by Wijngaarde et al.8 in 1980, nine other studies were published comparing the use of pVEP in TAO patients with or without DON (Table 2). mfVEP records signals from multiple stimuli given simultaneously across 20° to 25° of the central visual field, enabling assessment of small local defects39.

Table 7 Features of included studies.

ERG records the electrical response of the retina upon light stimulation by various types of corneal electrodes. ERG is widely used in retinal disorders but rarely in TAO40. Pattern electroretinogram (pERG) uses a reversing black and white checkerboard stimulus to collect signals from the inner retina and indirectly measure retinal ganglion cell function. Commonly used parameters of pERG include a prominent positive wave at approximately 50 milliseconds (P50) and a larger negative wave at about 95 milliseconds (N95)41. pERG has been used for evaluating early ganglion cell dysfunction in glaucoma patients since the 1980s42,43. pERG alteration was also reported in animal models of optic nerve transection during retrograde degeneration of retinal ganglion cells44,45. In clinical practice, combined interpretation of pVEP and pERG helps to differentiate retinal disorders (abnormal pVEP and pERG) from optic nerve disorders (abnormal pVEP and normal pERG)46.
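The combined pVEP/pERG reading described above amounts to a two-input rule. The sketch below encodes only the two combinations stated in the text; the function name and the wording of the returned labels are illustrative, and any other combination is reported as not covered rather than guessed.

```python
def localize_lesion(pvep_abnormal: bool, perg_abnormal: bool) -> str:
    """Apply the combined pVEP/pERG interpretation rule quoted in the text."""
    if pvep_abnormal and perg_abnormal:
        # Both tests abnormal: the text attributes this pattern to retinal disorders.
        return "suggests retinal disorder"
    if pvep_abnormal and not perg_abnormal:
        # Abnormal pVEP with normal pERG: the text attributes this to optic nerve disorders.
        return "suggests optic nerve disorder"
    # Combinations with a normal pVEP are not addressed by the rule in the text.
    return "not covered by this rule"


print(localize_lesion(pvep_abnormal=True, perg_abnormal=False))
```

What counts as "abnormal" for each test depends on laboratory reference ranges, which this sketch deliberately leaves outside the function.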

Here we report the first systematic review on the use of EPS in DON. pVEP has been the most widely reported EPS in DON. Case-control studies reported significant differences in pVEP parameters among eyes with DON, eyes with TAO only and control eyes8,10,11,12,13,15,16,17,18,22. Prolonged P100 latency was found when comparing eyes with DON to eyes without DON from TAO patients, and when comparing eyes from TAO patients to controls. P100 latency correlated with visual acuity, clinical activity score, color vision, visual field, and orbital imaging parameters8,20. Significant improvements in pVEP were found in patients after successful treatment of DON11,15,19,21.

We acknowledge that there is insufficient evidence to support the use of pVEP as part of the diagnostic criteria of DON due to its limited availability and inherent variability. To improve generalizability for meta-analysis, future studies should adopt testing protocols that follow the International Society for the Clinical Electrophysiology of Vision (ISCEV) standards37,41,47,48,49, and include age and/or gender-specific reference ranges, post-treatment follow-up results and all clinical parameters recommended by the EUGOGO5,31,35,37. Longitudinal follow-up of pVEP in TAO patients with equivocal or early clinical features of DON may provide insight into the natural history, treatment response and clinical implications of the evolving entity of “subclinical” DON.

In conclusion, pVEP was the most studied EPS in DON. Latency and amplitude of P100 were shown to be promising for the diagnosis and monitoring of DON. Future studies on pVEP using standardized settings will be required to fully evaluate its diagnostic accuracy and clinical utility in the management of DON.

Methods

Literature search

The literature search was performed in MEDLINE, EMBASE, and the Cochrane databases via the Ovid platform. We formulated sensitive search strategies using Boolean logic and search terms with controlled vocabularies (Medical Subject Heading terms): (“thyroid associated” OR “endocrine” OR “dysthyroid” OR “Graves”) AND (“orbitopathy[ies]” OR “ophthalmopathy[ies]”) OR (“ophthalmic Graves’ disease”) in combination with “optic neuropathy(ies)” (Table 8). The search was supplemented by manual screening of the reference lists of the relevant articles and reviews. No language filter was applied in the search. We identified records published from January 1st, 1977 to August 20th, 2017.

Table 8 Search strategies used in MEDLINE and EMBASE.

Eligibility criteria

Studies were included in the systematic review according to the following criteria: (1) studies that used electrophysiological tests (e.g. VEP or ERG) to evaluate optic nerve dysfunction in patients with TAO or DON; and (2) studies can be observational case series, case-control study, cohort study, interventional case series, and clinical trials. Animal studies, case reports, reviews, abstracts, conference proceedings, and editorials were excluded.

Assessment of the quality of study and level of evidence

NOS (Newcastle-Ottawa Scale)33 was adopted to evaluate the quality of the case-control studies. The clinical recommendation of VEP or ERG in detecting and monitoring visual dysfunction in TAO was rated on 2 aspects, “importance to the care process” and “the strength of evidence in the available literature”, according to the American Academy of Ophthalmology on preparing Preferred Practice Pattern (PPP) guidelines34. “Importance to the care process” represents the value of this application to improve the quality of the patient’s care in a meaningful way. Level A indicates the most important; level B indicates moderately important and level C indicates relevant but not critical application. “Strength of evidence” was rated in 3 levels. Level I includes evidence obtained from at least one properly conducted, well-designed, randomized, controlled trial. It also includes meta-analysis of randomized controlled trials. Level II includes well-designed controlled trials without randomization, well-designed cohort or case-control analytic studies, preferably from more than one center, or multiple-time series with or without the intervention. Level III includes evidence obtained from descriptive studies or case reports.