Introduction

As an aggressive malignancy, head and neck squamous cell carcinoma (HNSCC) arises in the squamous epithelium of the head and neck region, including the nasal cavity, oral cavity and tongue, pharynx (nasopharynx, oropharynx, hypopharynx) and larynx. In 2018, HNSCC was estimated to affect approximately 650 000 people and to cause over 350 000 deaths worldwide annually1. The 5-year overall survival rate of treated HNSCC patients has been reported to be approximately 50%2. The current standard treatment consists of radical surgical resection followed by adjuvant radiotherapy alone, or definitive chemoradiotherapy followed by chemotherapy or targeted therapy3. Despite advances in treatment, up to 50% of patients develop recurrence after curative treatment, which remains the major obstacle to long-term survival in HNSCC4.

HNSCC is a heterogeneous disease comprising subsets with distinct outcomes. This heterogeneity may reflect differences in the biologic behavior of the tumors. Traditional prognostic factors are of limited help in predicting which patients with HNSCC will develop recurrence. Molecular investigation of HNSCC could provide information for predicting recurrence and for identifying the patients who may require and benefit from adjuvant therapies. Hence, reliable and accurate predictive markers or models are urgently needed to identify the subset of patients with HNSCC who are at high risk of recurrence.

Previous genomic studies have revealed that more than 98% of the human genome is transcribed as non-coding RNAs (ncRNAs)5. Conventionally, ncRNAs are roughly classified into two groups by molecular size: small ncRNAs (e.g., microRNAs; <200 nt) and long non-coding RNAs (lncRNAs; >200 nt)6. Accumulating evidence shows that lncRNAs act as key regulators of gene expression at the transcriptional, post-transcriptional and chromosomal levels6 and are involved in a wide range of biological processes, particularly in cancer7,8. Compared with protein-coding RNAs, lncRNAs have more specific expression patterns, representing a vast and largely unstudied source of potential molecular drivers of human cancer and a new class of cancer biomarkers9. Previous genome-wide studies have developed lncRNA classifiers with accurate predictive value for overall survival (OS)10,11,12,13, but not for recurrence-free survival (RFS). Because OS is more likely to be influenced by post-recurrence treatment and comorbidity, RFS reflects the biologic behavior of HNSCC more precisely. Thus, identifying specific lncRNAs involved in HNSCC recurrence would be more practical and valuable.

In the current study, we hypothesized that an integrated nomogram incorporating genomic and clinicopathologic factors might accurately predict recurrence of HNSCC. We selected candidate lncRNAs significantly associated with recurrence and then built a multi-lncRNA classifier in the training set. The lncRNA classifier was further combined with clinicopathological factors to develop an integrated nomogram for predicting recurrence of HNSCC. We assessed the predictive ability and clinical applicability of the nomogram and compared it with TNM stage. Additionally, we validated it in internal and external validation sets.

Materials and Methods

Collection of lncRNAs data and clinicopathologic characteristics of HNSCC patients

The lncRNA profiling data of 502 HNSCC patients and 44 normal controls from The Cancer Genome Atlas (TCGA) were downloaded from The Atlas of ncRNA in Cancer (TANRIC) (http://ibl.mdanderson.org/tanric/_design/basic/query.html). The matched clinical parameters, including age, sex, primary site, smoking history, alcohol history, history of other malignancy, history of neoadjuvant treatment, lymph node neck dissection, number of lymph nodes (LNs), number of positive LNs, margin status, tumor grade, clinical T stage, clinical N stage, clinical TNM stage, fraction genome altered, mutation count and RFS time, were obtained from cBioPortal (http://www.cbioportal.org/). RFS was defined as the time from final surgical excision to recurrence. Patients without recurrence and patients who died without recurrence were censored at the time of last follow-up. After removing patients without available RFS information or lncRNA data, a total of 371 HNSCC patients were included in further analysis. TNM stage was based on the American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) staging system, seventh edition, as provided by the database. HPV status determined by RNA-Seq analysis was consistent with HPV status defined by in situ hybridization. p16 staining is an indirect immunohistochemical method of HPV detection and is considered less accurate than measurement of HPV RNA expression; therefore, RNA-Seq analysis was used as the primary measure of HPV status in our analysis. The 371 HNSCC patients were then randomly assigned to a training set (N = 187) and a validation set (N = 184) using R software. In addition, the GSE65858 dataset from Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/), comprising 270 HNSCC tissue samples and 30 adjacent non-tumor tissue samples, with complete recurrence status and RFS time information for all 270 tumor samples, was used for external validation.

Construction and validation of lncRNAs classifier for RFS

Initially, the moderated t-statistic method and the Benjamini–Hochberg procedure were used to identify differentially expressed lncRNAs between HNSCC tissues and normal tissues. The cut-off criteria for differential lncRNAs were P < 0.05 and false discovery rate (FDR) < 0.05. Univariate Cox regression analysis was then used to select RFS-related lncRNAs in the training set (P < 0.05). After this primary filtering, least absolute shrinkage and selection operator (LASSO) logistic regression analysis14, with the penalty parameter tuned by 10-fold cross-validation, was used to pick out candidate lncRNAs, and L1-penalized Cox analysis was finally performed to further narrow down the lncRNAs in the training set15. The lncRNAs retained after these screening steps were used to construct a classifier. From each sample's expression levels and the corresponding coefficients, we calculated risk scores for HNSCC patients and divided them into high-risk and low-risk subgroups at the optimal cut-off value, chosen to maximize sensitivity and specificity on the (time-independent) receiver operating characteristic (ROC) curve in the training set. The RFS difference between the high-risk and low-risk groups was compared by Kaplan-Meier analysis, and P-values, together with hazard ratios (HR) and 95% confidence intervals (CI), were generated with log-rank tests. Additionally, because human papillomavirus (HPV) status is a very important parameter for HNSCC patients, we performed a sensitivity analysis excluding oropharyngeal cases. Furthermore, stratified analysis based on clinical characteristics (e.g., HPV status, TNM stage) was conducted to evaluate the discriminative ability of the lncRNA signature in the TCGA cohort and in the GEO cohort, respectively. Because HPV status had missing values in the TCGA cohort, we performed the stratified analysis in the entire dataset. The flowchart of the present study is shown in Fig. 1.
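The Kaplan-Meier comparison and log-rank test described above can be sketched in Python as follows. This is a minimal illustration, not the authors' R code: the function names `kaplan_meier` and `logrank_test` are hypothetical, the inputs are assumed to be lists of RFS times, recurrence indicators (1 = recurrence, 0 = censored) and 0/1 risk-group labels, and the p-value is taken from a chi-square distribution with one degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time for each patient (e.g. months of RFS)
    events -- 1 if recurrence was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        n_events = removed = 0
        # Group all patients who leave the risk set at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            removed += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def logrank_test(times, events, group):
    """Two-sample log-rank test comparing group 1 (e.g. high risk) with group 0.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [(tt, e, g) for tt, e, g in zip(times, events, group) if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d1 = sum(1 for tt, e, g in at_risk if tt == t and e and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the number of events in group 1 at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square(1) variable: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

In this setting, `group` would be 1 for patients whose risk score exceeds the classifier cut-off and 0 otherwise; `kaplan_meier` is then called separately on each group to draw the two curves.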

Figure 1

The flowchart of study design. LASSO: least absolute shrinkage and selection operator.

Development and validation of genomic-clinicopathologic nomogram

To build a genomic-clinicopathologic nomogram, we used univariate and multivariate Cox regression analyses to identify clinical risk factors associated with RFS in the training set. The lncRNA classifier, together with these risk factors, was then used to develop an integrated nomogram in the training set.

The performance of the model was evaluated in terms of calibration and discrimination. Discrimination is the model's ability to distinguish patients who will experience HNSCC recurrence from those who will not, and it was quantified with the concordance index (C-index). In addition, based on the score generated by the nomogram, we illustrated discrimination by dividing the dataset into three groups and plotting Kaplan–Meier curves for all three. Calibration was assessed graphically by plotting observed rates against nomogram-predicted probabilities.
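Harrell's C-index is a common way to compute the concordance index for censored RFS data, and a minimal sketch is shown below. The section does not state which C-index variant was used, so Harrell's definition is an assumption here; for simplicity, pairs with tied event times are skipped, and the function name is hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    times  -- RFS time per patient
    events -- 1 if recurrence observed, 0 if censored
    risk   -- predicted risk (higher = expected earlier recurrence)
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:  # j was still recurrence-free when i recurred
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable
```

A value of 1.0 means the model ranks every comparable pair correctly, and 0.5 corresponds to random ranking.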

ROC analysis was used to compare the discriminative ability of the nomogram with that of TNM stage and the lncRNA-based classifier. The clinical usefulness and net benefit of the predictive models were estimated with decision curve analysis (DCA)16 and compared with those of TNM stage and the lncRNA classifier.

Sample size

To develop a prediction nomogram with time-to-event data, the sample size should be based on the number of events per variable (EPV), which should be at least 10. There were 62 recurrences in the training cohort and 77 in the validation cohort, which allows a prediction nomogram with at most six predictors in the training cohort (EPV = 62/6 = 10.3 ≥ 10) and at most seven predictors in the validation cohort (EPV = 77/7 = 11 ≥ 10).
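The EPV arithmetic above can be checked with a few lines of Python. The helper names are hypothetical; the threshold of 10 and the recurrence counts (62 and 77) are taken from the EPV calculation in the text.

```python
MIN_EPV = 10  # minimum events per candidate predictor, as stated in the text


def events_per_variable(n_events, n_predictors):
    """Number of outcome events (recurrences) available per predictor."""
    return n_events / n_predictors


def max_predictors(n_events, min_epv=MIN_EPV):
    """Largest number of predictors that keeps EPV >= min_epv."""
    return n_events // min_epv


for cohort, n_events in (("training", 62), ("validation", 77)):
    k = max_predictors(n_events)
    print(cohort, k, round(events_per_variable(n_events, k), 1))
# prints:
# training 6 10.3
# validation 7 11.0
```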

Statistical analysis

Normally distributed data are described as mean (standard deviation [SD]), whereas non-normally distributed data are expressed as median (interquartile range [IQR]). Categorical variables are given as proportions (%). Using recurrence as the outcome, we determined the optimal cut-off values for number of lymph nodes, number of positive LNs, mutation count and fraction genome altered as the point at which the Youden index (sensitivity + specificity − 1) reached its maximum on receiver operating characteristic (ROC) curve analysis.
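The Youden-index cut-off search can be sketched as below. This assumes that every observed value is a candidate cut-off and that a patient is classified as positive when the value is greater than or equal to the cut-off; the section does not say how ties or the direction of comparison were handled, and the function name is hypothetical.

```python
def youden_cutoff(values, recurred):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    values   -- continuous or count predictor (e.g. number of positive LNs)
    recurred -- 1 if the patient recurred, 0 otherwise
    A patient is called 'positive' when value >= cutoff (an assumed convention).
    """
    n_pos = sum(recurred)
    n_neg = len(recurred) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, recurred) if v >= cutoff and y == 1)
        tn = sum(1 for v, y in zip(values, recurred) if v < cutoff and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # keep the first (lowest) cut-off among ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```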

Missing values in potential predictors were imputed, because analyzing all cases rather than only complete cases improves statistical power and reduces potential bias17. Multiple imputation was used because, after the patterns of missingness were examined, the data were considered missing at random18.
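The section does not describe how results from the imputed datasets were combined. Rubin's rules are the standard way to pool estimates after multiple imputation, and the sketch below shows them under the assumption that each imputed dataset yields one point estimate (for example a log hazard ratio) and its squared standard error; the function name is hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results with Rubin's rules.

    estimates -- point estimate (e.g. a log hazard ratio) from each of the m imputed datasets
    variances -- squared standard error of that estimate in each dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)          # average of the m point estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The square root of the total variance gives the pooled standard error, from which a confidence interval for the pooled estimate can be formed.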

LASSO analysis was performed with the “glmnet” package, and ROC analysis with the “timeROC” and “survivalROC” packages. The nomogram and calibration plots were generated with the “rms” package, and DCA was performed with the “stdca.R” script.

SPSS Statistics 22.0 and R software (version 3.5.2) were used for statistical analysis. A two-sided P < 0.05 was considered statistically significant.

Ethics approval and consent to participate

Institutional ethical approval was not required because the data were obtained from the publicly available databases TANRIC and cBioPortal, and written informed consent had been obtained from the patients before our study.

Results

Demographic parameters and RFS outcome of HNSCC patients

In the current study, 371 HNSCC patients with available lncRNA data and corresponding clinicopathologic information were included. Their basic clinicopathologic characteristics are summarized in Table 1. The median follow-up times were 20.83 months (range: 1.81 to 180.03 months) and 20.17 months (range: 1.51 to 172.54 months) for the training and validation cohorts, respectively. Of the 371 HNSCC patients, 139 (37.5%) developed recurrence during follow-up. The estimated 3-year and 5-year RFS rates were 64% (56.2–71.8%) and 55.4% (44.4–66.4%), respectively, in the training set, and 57.6% (49.6–65.6%) and 47.3% (37.1–57.5%), respectively, in the validation set.

Table 1 Characteristics of patients in the training and validation sets from TANRIC (n = 371).

Development and validation of lncRNAs-based classifier

First, 1446 differentially expressed lncRNAs between HNSCC and normal tissues were obtained using the filter criteria described in the Methods (Supplementary Material 1). Univariable Cox regression analysis then identified 32 RFS-related lncRNAs in the training set (Supplementary Material 2). Next, these 32 lncRNAs were entered into a LASSO logistic regression model, and 26 had non-zero coefficients (Fig. S1). Finally, a LASSO Cox regression model further narrowed down the RFS-related lncRNAs in the training cohort to 15: AC012531.2, AC020551.1, AC020637.1, AC076966.1, AC079789.1, AC090826.2, AC092132.1, AC097521.2, AC104051.2, AC145207.3, ADARB2.AS1, AL122019.1, AL138974.1, ATP6V1B1.AS1 and LINC02471 (Fig. 2A,B). Based on the coefficients from the LASSO Cox regression analysis, a classifier was developed with the following risk score: risk score = (−0.02235 × AC012531.2) + (0.01734 × AC020551.1) + (0.00017 × AC020637.1) + (−0.00203 × AC076966.1) + (0.06052 × AC079789.1) + (−0.00037 × AC090826.2) + (0.00943 × AC092132.1) + (0.00188 × AC097521.2) + (0.01343 × AC104051.2) + (0.00086 × AC145207.3) + (0.00513 × ADARB2.AS1) + (0.00285 × AL122019.1) + (0.01173 × AL138974.1) + (0.00176 × ATP6V1B1.AS1) + (0.00116 × LINC02471). Patients were categorized into high-risk and low-risk groups at the optimal risk-score cut-off determined from the ROC curve. As shown in Fig. S2, patients with high risk scores were more likely to develop recurrence and had shorter RFS than those with low risk scores in the training set (5.93 vs 29.2 months, HR = 4.92, 95% CI: 2.98–8.09, P < 0.0001) (Fig. S3A). Likewise, using the same cut-off value, the lncRNA classifier also separated patients into high-risk and low-risk subgroups in the internal and external validation sets.
The median RFS of high-risk patients was shorter than that of low-risk patients in the internal validation set (14.22 vs 27.2 months, HR = 1.941, 95% CI: 1.28–2.94, P < 0.0001) (Fig. S3B) and in the external validation set (12.12 vs 54.6 months, HR = 6.735, 95% CI: 3.802–11.93, P < 0.0001) (Fig. S3C). Additionally, the lncRNA classifier showed favorable predictive efficacy, with AUCs of 0.833 (3-year RFS) and 0.771 (5-year RFS) in the training cohort, 0.695 (3-year RFS) and 0.718 (5-year RFS) in the internal validation cohort, and 0.846 (3-year RFS) and 0.79 (5-year RFS) in the external validation cohort (Fig. S3D–F). Furthermore, in a sensitivity analysis excluding oropharyngeal cases, the lncRNA classifier showed similar predictive efficacy in non-oropharyngeal HNSCC patients as in the entire cohort, with AUCs of 0.822 (3-year RFS) and 0.756 (5-year RFS) in the training cohort and 0.717 (3-year RFS) and 0.701 (5-year RFS) in the internal validation cohort (Fig. S4). Finally, the 15-lncRNA signature was examined by stratification analysis in subsets of patients with different clinical variables in the TCGA and GEO cohorts. When stratified by clinical variables (HPV status, TNM stage), the 15-lncRNA signature remained a clinically and statistically significant prognostic model in both the TCGA cohort (P < 0.0001) (Fig. S5) and the GEO cohort (P < 0.0001) (Fig. S6).
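The classifier's risk score is a linear combination of lncRNA expression values weighted by the LASSO Cox coefficients. The sketch below uses the lncRNA-to-coefficient pairing reported in the Fig. 2 legend. The expression scale expected as input and the numerical cut-off are not reported in this section, so `cutoff` is a placeholder the user must supply, and treating scores strictly above the cut-off as high risk is an assumption; the function names are hypothetical.

```python
# Coefficients of the 15-lncRNA classifier as reported in the Fig. 2 legend.
COEFFICIENTS = {
    "AC012531.2": -0.02235,
    "AC020551.1": 0.01734,
    "AC020637.1": 0.00017,
    "AC076966.1": -0.00203,
    "AC079789.1": 0.06052,
    "AC090826.2": -0.00037,
    "AC092132.1": 0.00943,
    "AC097521.2": 0.00188,
    "AC104051.2": 0.01343,
    "AC145207.3": 0.00086,
    "ADARB2.AS1": 0.00513,
    "AL122019.1": 0.00285,
    "AL138974.1": 0.01173,
    "ATP6V1B1.AS1": 0.00176,
    "LINC02471": 0.00116,
}


def risk_score(expression):
    """Weighted sum of one patient's lncRNA expression values using the LASSO Cox coefficients."""
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[name] for name, coef in COEFFICIENTS.items())


def risk_group(score, cutoff):
    """Dichotomize at a cut-off; 'cutoff' is a placeholder whose value is not given here."""
    return "high" if score > cutoff else "low"
```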

Figure 2

(A) Fifteen lncRNAs selected by LASSO Cox regression analysis. The two dotted vertical lines are drawn at the optimal values by the minimum criterion (left) and the 1-SE criterion (right). (B) LASSO coefficient profiles of the 26 lncRNAs. A vertical line is drawn at the optimal value by the minimum criterion, resulting in fifteen non-zero coefficients. Fifteen lncRNAs (AC012531.2, AC020551.1, AC020637.1, AC076966.1, AC079789.1, AC090826.2, AC092132.1, AC097521.2, AC104051.2, AC145207.3, ADARB2.AS1, AL122019.1, AL138974.1, ATP6V1B1.AS1 and LINC02471), with coefficients of −0.02235, 0.01734, 0.00017, −0.00203, 0.06052, −0.00037, 0.00943, 0.00188, 0.01343, 0.00086, 0.00513, 0.00285, 0.01173, 0.00176 and 0.00116, respectively, were selected in the LASSO Cox regression model.

Development and validation of genomic-clinicopathologic nomogram

Univariate Cox analysis identified four variables (number of positive LNs, margin status, mutation count and the lncRNA classifier) associated with RFS in the training set (Table 2). Multivariable analysis confirmed that number of positive LNs, margin status, mutation count and the lncRNA classifier were independent risk factors for RFS in the training set. Based on the multivariable analysis of RFS, we built a genomic-clinicopathologic nomogram to predict 1-year, 3-year and 5-year RFS (Fig. 3). The C-index of the integrated nomogram was 0.76 (0.72–0.79) (Table 3), and the calibration plots showed good agreement between predicted and observed probabilities of 3-year and 5-year RFS (Figs. 4A and S7A). Consistent results were found in the validation set: the C-index of the integrated nomogram was 0.74 (0.71–0.76) (Table 3), with good agreement between predicted and observed RFS (Figs. 4B and S7B). In addition, tertiles of the total points were used to divide patients into high-, intermediate- and low-risk groups. Kaplan-Meier analysis of the three risk subgroups (log-rank P < 0.0001) demonstrated the utility of the integrated nomogram in the training set (Fig. S8A) and in the validation set (Fig. S8B).
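Splitting patients into risk groups by tertiles of total nomogram points can be sketched as follows. This assumes that more points indicate a higher recurrence risk and that patients exactly on a tertile boundary fall into the lower group; `statistics.quantiles` with its default method computes the two cut points, which may differ slightly from the quantile definition used in R. The function name is hypothetical.

```python
from statistics import quantiles


def tertile_groups(total_points):
    """Assign each patient to a low/intermediate/high risk group by tertiles of total points."""
    low_cut, high_cut = quantiles(total_points, n=3)  # two cut points splitting the data into thirds
    groups = []
    for p in total_points:
        if p <= low_cut:
            groups.append("low")
        elif p <= high_cut:
            groups.append("intermediate")
        else:
            groups.append("high")
    return groups
```

The resulting labels can be passed, together with RFS times and recurrence indicators, to a Kaplan-Meier routine to draw one curve per group.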

Table 2 Univariable and multivariable Cox regression analysis for prediction of RFS.
Figure 3

(A) Nomogram for predicting 1-year, 3-year and 5-year RFS probability of HNSCC after radical surgery. To estimate risk, obtain the points for each variable by drawing a straight line from the patient's variable value to the axis labeled “Points.” Sum all points and draw a straight line from the total point axis to the 1-year, 3-year and 5-year RFS axes.

Table 3 Predictive performance of TNM stage, the lncRNA classifier and the nomogram in the training and validation sets.
Figure 4

(A,B) ROC curves compare the prognostic accuracy of the nomogram with TNM staging or lncRNAs classifier in predicting survival probability in the training set and in the validation set. (C,D) Decision curve analysis for the nomogram, TNM staging and lncRNAs classifier in prediction of recurrence of patients in the training set and in the validation set.

Comparison of predictive performance and clinical usefulness between nomogram and TNM stage or lncRNAs classifier

To further evaluate the predictive ability of the genomic-clinicopathologic nomogram, we compared its C-index and ROC analysis results with those of TNM stage and the lncRNA classifier in the training and validation sets. As shown in Table 3, the C-index of the integrated nomogram was higher than that of TNM stage (0.57 (0.52–0.59) in the training set and 0.55 (0.52–0.58) in the validation set) and that of the lncRNA classifier (0.67 (0.64–0.70) in the training set and 0.63 (0.61–0.65) in the validation set). The likelihood ratio test, linear trend χ2 test and Akaike information criterion all indicated that the integrated nomogram had better predictive performance than TNM stage or the lncRNA classifier alone. Consistent with the C-index, ROC analysis also showed that the integrated nomogram (AUC 0.809 in the training set and 0.845 in the validation set) outperformed TNM stage (AUC 0.58 and 0.542, respectively) and the lncRNA classifier (AUC 0.712 and 0.637, respectively) in predicting RFS (Fig. 5A,B).
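The likelihood ratio test and the Akaike information criterion both work from the maximized (partial) log-likelihoods of fitted models. The helpers below are a sketch of these standard formulas: the log-likelihood values would come from fitted Cox models (they are not reported in this section), the likelihood ratio statistic applies to nested models whereas AIC can also compare non-nested ones, the function names are hypothetical, and `chi2_sf` handles only integer degrees of freedom.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off of fit and complexity."""
    return 2 * n_params - 2 * log_likelihood


def likelihood_ratio_stat(ll_reduced, ll_full):
    """Likelihood ratio statistic for a reduced model nested in a full model."""
    return 2 * (ll_full - ll_reduced)


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1 (closed-form series)."""
    y = x / 2
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y^i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{(df-1)/2} y^(i - 1/2) / Gamma(i + 1/2)
    tail = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail
```

For a likelihood ratio test, the p-value is `chi2_sf(likelihood_ratio_stat(ll_reduced, ll_full), df)`, where `df` is the number of extra parameters in the full model.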

Figure 5

ROC curves compare the prognostic accuracy of the nomogram with TNM staging or lncRNAs classifier in predicting survival probability (A) in the training set and (B) in the validation set.

Finally, DCA was used to compare the clinical usefulness of the integrated nomogram with that of traditional TNM stage and the lncRNA classifier. By plotting the net benefit of using each model to risk-stratify patients (y axis) against a continuum of threshold probabilities for recurrence (x axis), relative to assuming that all patients will recur, DCA showed graphically that the nomogram outperformed traditional TNM stage and the lncRNA classifier (Fig. 6A,B).
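Net benefit, the quantity on the y axis of a decision curve, weighs true positives against false positives at a threshold probability p_t: NB = TP/n − FP/n × p_t/(1 − p_t). The sketch below computes it for a binary outcome (recurrence by a fixed time, with no censoring). This is a simplification of the stdca.R approach used in the study, which handles censored survival data, and the function names are hypothetical.

```python
def net_benefit(predicted_risk, recurred, threshold):
    """Net benefit of treating patients whose predicted risk is at or above threshold."""
    n = len(recurred)
    tp = sum(1 for p, y in zip(predicted_risk, recurred) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(predicted_risk, recurred) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(recurred, threshold):
    """Reference strategy: assume every patient will recur (treat all)."""
    prevalence = sum(recurred) / len(recurred)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# The "treat none" strategy has a net benefit of 0 at every threshold.
```

Evaluating both functions over a grid of thresholds (for example 0.01 to 0.99) and plotting the results gives the model curve and the treat-all reference line of a decision curve.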

Figure 6

Decision curve analysis for the nomogram, TNM staging and lncRNAs classifier in prediction of recurrence of patients (A) in the training set and (B) in the validation set.

Discussion

By analyzing lncRNA profiling data and the corresponding clinicopathologic variables of 371 HNSCC patients from TANRIC and cBioPortal, we identified fifteen lncRNAs relevant to RFS. Based on these lncRNAs, we developed a classifier that accurately classified patients into high-risk and low-risk groups. Additionally, we developed an integrated nomogram combining the lncRNA classifier and clinicopathologic parameters to predict recurrence in HNSCC patients who underwent surgical resection. The nomogram effectively predicted recurrence risk, with a bootstrap-corrected C-index of 0.76 and an AUC of 0.809, and showed better predictive ability and clinical usefulness than TNM stage alone.

Numerous studies have found that lncRNAs may serve as effective biomarkers for the diagnosis, progression and prognosis of HNSCC19,20,21,22,23.

By analyzing sixty-five formalin-fixed, paraffin-embedded HNSCC samples, Guan et al.19 revealed that H19 was significantly overexpressed in HNSCC cells and patient tissues compared with adjacent normal specimens. Higher H19 expression was correlated with tumor recurrence and was a prognostic factor for disease-free survival independent of other confounders. In a study of 19 HNSCC patients, Haque et al.20 used a quantitative real-time polymerase chain reaction array interrogating lncRNAs with established involvement in numerous cancers and found that, of seven differentially expressed lncRNAs (SPRY4-IT1, HEIH, LUCAT1, LINC00152, HAND2-AS1, MEG3 and TERC), low MEG3 expression was associated with more favorable 3-year RFS. In a lncRNA microarray study, Wu et al.21 found that high expression of lncRNA LOC541471 was significantly associated with perineural invasion and lymph node metastasis classification; in multivariate Cox regression analysis, high LOC541471 expression was an independent predictor of poor RFS. Recently, Diao et al.22 identified ZEB2-AS1 as a putative oncogenic lncRNA and a novel prognostic biomarker in HNSCC, showing that ZEB2-AS1 overexpression is associated with tumor aggressiveness and unfavorable prognosis. Notably, Troiano et al.23 performed a systematic meta-analysis of the prognostic value of lncRNA HOTAIR in HNSCC and found that high HOTAIR expression, a biomarker of aggressiveness, was associated with lymph node metastasis (odds ratio (OR), 3.31; 95% CI: [1.24, 8.79]; P = 0.02). These studies suggest that lncRNAs could improve recurrence prediction in HNSCC. Nevertheless, small patient numbers and reliance on single lncRNAs with insufficient precision have limited their clinical application.
A classifier comprising multiple lncRNAs can markedly improve predictive accuracy in various cancers, such as breast cancer, hepatocellular carcinoma and gastric cancer24,25,26. However, no lncRNA classifier predicting RFS in HNSCC has been reported to date.

To the best of our knowledge, this is the first study to construct a comprehensive nomogram combining a lncRNA classifier and clinicopathologic factors to predict recurrence in patients with HNSCC. The lncRNA classifier we built, consisting of AC012531.2, AC020551.1, AC020637.1, AC076966.1, AC079789.1, AC090826.2, AC092132.1, AC097521.2, AC104051.2, AC145207.3, ADARB2.AS1, AL122019.1, AL138974.1, ATP6V1B1.AS1 and LINC02471, effectively categorized patients into a high-risk group with shorter RFS and a low-risk group with longer RFS. In addition, we identified four independent predictors (number of positive LNs, margin status, mutation count and the lncRNA classifier), all of which were incorporated into the nomogram. In terms of homogeneity, discrimination and risk stratification, the nomogram outperformed the TNM staging system in predicting recurrence. A strength of the current nomogram is that it integrates genomic and clinicopathological variables that are important for predicting recurrence risk but are not captured by the TNM staging system. Notably, DCA showed that recurrence-related treatment decisions based on the nomogram yielded more net benefit than decisions based on TNM stage, or than treating either all patients or none. Taken together, the present nomogram could help clinicians tailor recurrence-related treatment decisions.

Among the fifteen RFS-related lncRNAs, ADARB2.AS1 and LINC02471 have previously been reported to be associated with cancers, including breast cancer, pancreatic ductal adenocarcinoma and papillary thyroid carcinoma27,28,29. ADARB2-AS1, which had the highest k-core score, was identified as a core gene in HER-2-enriched breast cancer and may become a novel molecular biomarker and therapeutic target27. Subsequently, Permuth et al.28 analyzed plasma from 57 intraductal papillary mucinous neoplasm (IPMN) cases and 24 non-diseased controls frequency-matched by age group and gender, and identified an 8-lncRNA signature (ADARB2-AS1, ANRIL, GLIS3-AS1, LINC00472, MEG3, PANDA, PVT1 and UCA1) that was more accurate than standard clinical and radiologic features in distinguishing indolent/benign IPMNs from aggressive/malignant IPMNs. Using The Cancer Genome Atlas (TCGA) database, Cai et al.29 found that LINC02471 was closely associated with tumor stage, lymph node metastasis, metastasis and pathological stage of papillary thyroid carcinoma, suggesting that it could reflect tumor progression more precisely and serve as a molecular biomarker for tumor progression and prognosis. However, the other lncRNAs (AC012531.2, AC020551.1, AC020637.1, AC076966.1, AC079789.1, AC090826.2, AC092132.1, AC097521.2, AC104051.2, AC145207.3, AL122019.1, AL138974.1 and ATP6V1B1.AS1), which may provide new insights into HNSCC development and progression, have not been thoroughly investigated. Further characterization of these molecules is therefore needed to explore their potential application value.

Consistent with previous studies30,31, the number of positive LNs was associated with a higher risk of recurrence among patients with HNSCC after surgery. Based on ROC analysis, we selected 3 as the optimal cut-off point, and more than 3 positive LNs was an independent risk factor for recurrence. Recently, Zumsteg et al.32 found no benefit from postoperative adjuvant chemoradiation in patients with 0–2 positive LNs, whereas patients with more than 3 positive LNs benefited significantly. Moreover, the authors found an approximately linear positive trend between lymph node burden and the efficacy of postoperative adjuvant chemoradiation. Similarly, margin status and mutation count have frequently been reported as risk factors for recurrence in HNSCC, including oral cavity, oropharyngeal and laryngeal cancers33,34,35. In addition to these clinicopathologic factors, as expected, the lncRNA classifier was an effective independent risk factor for recurrence in patients with HNSCC.

Although our nomogram performed well in predicting HNSCC recurrence, this study has several limitations. First, the nomogram was based on a single public database and is not yet suitable for general use before further validation of the predictive model with external datasets. Large external and multicenter prospective cohorts are still needed to validate the clinical application of our model.

Second, missing variables were a weakness of this analysis. Factors known to be associated with recurrence, such as extracapsular spread30,34, lymphovascular invasion status34, perineural invasion34 and human papillomavirus (HPV) status36, are important parameters for HNSCC patients but were not well recorded in the database and therefore could not be investigated. A recent large study using centralized testing and controlling for other risk factors examined the prognostic utility of HPV biomarkers in HNSCC across different global regions37 and found that HPV positivity was a strong biomarker of improved survival. In addition, HPV-positive patients are sensitive to radiotherapy and chemotherapy and show superior survival38,39. We therefore recommend that future studies assess the added value of these factors in a multivariable prediction model to further improve prediction accuracy in HNSCC patients.

Third, our study included a variety of tumors of the head and neck region, such as cancers of the oral cavity, oral tongue, oropharynx, hypopharynx and larynx. Although they all arise from squamous epithelial cells, there is marked heterogeneity among them. Because the sample size for each specific tumor type was insufficient (fewer than 100 patients per cancer), we could not construct site-specific nomograms to estimate the risk of type-specific recurrence, which may reduce prediction accuracy. Even so, our nomogram yielded a similar C-index in the validation dataset and was significantly superior to TNM stage for recurrence prediction.

Fourth, we did not explore the biological functions and pathways of these lncRNAs; further studies are needed to uncover the underlying mechanisms.

Conclusion

We have built a visual, comprehensive nomogram incorporating genomic and clinicopathologic factors to predict recurrence in patients with HNSCC. Compared with TNM stage, it appears to be a more effective tool for HNSCC recurrence prediction in terms of predictive value and clinical usefulness. The integrated nomogram may help clinicians make more appropriate individualized therapeutic decisions for HNSCC patients.