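Building a High-Performance Code Editor

Preface

Why build an editor

A long time ago, when I had just moved from ModPE into the Android development scene, I ran into the editor that got me started: AIDE.
Back then I knew basically nothing about Java, and AIDE happened to land in my hands.
It ships with Android development tutorials, so I suppose that is how I got one foot into Android development.

After that I studied for a few years, but only scratched the surface.
Then one day an idea hit me: port CreateJS so that I could have my own code editing, running, and packaging features.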

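CreateJS: a pseudo-IDE built with rhino, apkeditor, xml2view, textwarrior, and android-rhino
Features its author has implemented so far:

1. Interpreting JS (not compiled)

2. Packaging an app (not compiled; signing has a bug)

3. Packaging a modpkg (not compiled)

4. Editor highlighting and pseudo-completion

(Scoping is not yet implemented for variable and function names declared with var and function)

Fully customizable themes (highlighting has bugs, specifically with regexes, ints, and strings)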

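And so I copied CreateJS's editor into my own app (a bit shameless, I know).
I had not bought a computer yet and was still using AIDE. I opened the editor source to study it, and could not understand a single thing...
(A side complaint about LingSaTuo: whenever I tried to contact him, all I got was "The sea will never stop being blue, and I will never be away, except...")
Only later did I find out that Celivad in the group chat was LingSaTuo...
So naturally I thanked him properly.
Looking back now, Lexer is just lexical analysis, and AutoCompletePanel is just the auto-completion panel.
After a stretch of grinding through it, I did produce my first editor (even if it was really someone else's editor).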

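CideCompat was my first editor

It can edit and run most code, with syntax highlighting and auto-completion (now abandoned)

Afterwards I tried to build two more IDEs and failed both times. They were never open-sourced, so I will not go into them here.
Fast forward to 2022, by which point my Android skills had reached a fairly advanced level.
For my new IDE, Reverse (溯), I decided to write an editor myself.
I remembered a blog post about editors that I had read earlier, dug it up again, and started working behind closed doors.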

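Sora-editor is a cool, well-optimized code editor for the Android platform

Made by Rosemoe

After a period of studying on my own, let's get started!

MuCodeEditor -> a fast-rendering code editor, with features still being added...

This is the editor I have working today. It can highlight and auto-complete, and more features are on the way.

If you are going to study it, don't forget to leave a Star. No freeloading, okay?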

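Building it

The drawing pitfall

Since we are writing an editor, we obviously need to look at how Android draws.
Find canvas.drawText, hold Ctrl, and click it!
We see that it calls super.drawText, so keep following.
We finally land on the drawText method in BaseCanvas. Let's analyze it:
In the second if, it checks the type of text.
If it is a String, SpannedString, or SpannableString,
it calls toString on it once!
We all know to be especially careful about creating objects in onDraw. If onDraw creates objects
and we also call invalidate frequently, the result is memory churn (the object from the last frame has barely been garbage-collected before a new one is created), and over time that makes the UI stutter.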

public void drawText(@NonNull CharSequence text, int start, int end, float x, float y,
        @NonNull Paint paint) {
    if ((start | end | (end - start) | (text.length() - end)) < 0) {
        throw new IndexOutOfBoundsException();
    }
    throwIfHasHwBitmapInSwMode(paint);
    if (text instanceof String || text instanceof SpannedString ||
            text instanceof SpannableString) {
        nDrawText(mNativeCanvasWrapper, text.toString(), start, end, x, y,
                paint.mBidiFlags, paint.getNativeInstance());
    } else if (text instanceof GraphicsOperations) {
        ((GraphicsOperations) text).drawText(this, start, end, x, y,
                paint);
    } else {
        char[] buf = TemporaryBuffer.obtain(end - start);
        TextUtils.getChars(text, start, end, buf, 0);
        nDrawText(mNativeCanvasWrapper, buf, 0, end - start, x, y,
                paint.mBidiFlags, paint.getNativeInstance());
        TemporaryBuffer.recycle(buf);
    }
}
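
To make the allocation point concrete, here is a minimal sketch in plain Kotlin (no Android dependency) of reusing one scratch buffer across frames instead of creating objects on every draw. ScratchBuffer and copyLine are hypothetical names for illustration, not part of MuCodeEditor or the Android framework. On Android, the filled array could then be passed to the Canvas.drawText overload that takes a char array, an index, and a count.

```kotlin
/**
 * Hypothetical helper: one reusable CharArray that grows only when a longer
 * range is requested, so a steady-state draw loop allocates nothing.
 */
class ScratchBuffer(initialCapacity: Int = 256) {
    private var chars = CharArray(initialCapacity)

    /** Number of times the backing array had to be (re)allocated. */
    var allocations = 1
        private set

    /** Copies text[start, end) into the shared buffer and returns that buffer. */
    fun copyLine(text: CharSequence, start: Int, end: Int): CharArray {
        require(start in 0..end && end <= text.length) { "bad range $start..$end" }
        val count = end - start
        if (count > chars.size) {
            // Grow geometrically so repeated long lines do not reallocate each time.
            chars = CharArray(maxOf(count, chars.size * 2))
            allocations++
        }
        for (i in 0 until count) {
            chars[i] = text[start + i]
        }
        return chars
    }
}
```

Only the first count characters of the returned array are valid for the current line; the rest may hold leftovers from earlier calls, so the caller has to pass the count along when drawing.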

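Extending View

First, you have to extend View.
What? You ask why not extend EditText or TextView?
Fine, go try it yourself and see whether it freezes on you.
Put simply, those two widgets cannot cope with large text. Once there is a lot of text, you are done for.
The root cause is how they handle caching.

Of course you could extend SurfaceView to draw off the main thread.
The benefit is obvious: drawing will not interfere with the rest of the app.
So why extend View, you ask?
Mainly because I tried SurfaceView before and found that having to keep a while(true) loop running to update the editor was painful.
Another point is that the scrollTo and scrollBy methods do not work with SurfaceView.
I got stuck on handling scrolling along the x axis...
Still, I do encourage you to extend SurfaceView. Whoever pulls it off will have an editor at least on par with MT.
The key implementation is below.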

import android.app.Activity
import android.content.Context
import android.graphics.Canvas
import android.util.AttributeSet
import android.view.View
import android.view.WindowManager

@Suppress("LeakingThis", "unused", "MemberVisibilityCanBePrivate", "DEPRECATION")
open class MuCodeEditor @JvmOverloads constructor(
    context: Context, attrs: AttributeSet? = null, defStyleAttr: Int = 0,
) : View(context, attrs, defStyleAttr) {

    // Controller for the editor, used to change its properties and so on
    protected open val mController = EditorController(this)

    // Text content provider
    protected open val mContentProvider = ContentProvider(this)

    // Gesture event handling
    protected open val mEventHandler = EventHandler(this)

    // Gesture detection (createGestureDetector() is not shown in this excerpt)
    protected open val mGestureDetector = createGestureDetector()

    // Painter that draws the editor
    protected open val mPainter: EditorPainter

    // Holds the paints
    protected open val mPaints: EditorPaints

    init {
        if (context is Activity) {
            context.window.setSoftInputMode(WindowManager.LayoutParams.SOFT_INPUT_ADJUST_RESIZE)
        }

        // Initialize the painter
        mPainter = EditorPainter(this, mContentProvider)

        // Initialize the paint holder
        mPaints = EditorPaints()
    }

    // Drawing
    @UnsupportedUserUsage
    override fun onDraw(canvas: Canvas) {
        // Delegate to the painter class
        mPainter.onDraw(canvas)
    }

}

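EditorController is the controller. It decides the editor's properties, features, and so on.
ContentProvider is our content provider class. It handles inserting, deleting, and querying content.
EventHandler is our gesture handling class, which deals with gesture input.
EditorPainter is the class whose onDraw method does the actual drawing. Splitting it out reduces coupling in the code.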

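Implementing the lexical analyzer: Lexer

The essentials of an editor are the lexer and highlighting, so let's start with the lexer.
A Lexer scans the content and turns it into individual lexemes.
An example: English sentences use spaces to separate words. Without them, Hello World becomes HelloWorld,
and you cannot tell at a glance what it says. That is why words are separated by spaces.
Lexical analysis is the very first step in compiler theory, so it has to be accurate.
We will take the Lexer implementation slowly so that it is easy to follow.
The thing most closely tied to a Lexer is the Token. The Chinese term means "category code", and you can also think of it as a tag.
A Token is each individual result the lexer produces once it has finished analyzing; each such result is one Token.
For example, for Kotlin's val keyword (a read-only variable), our Token would be VAL. By convention Token names are uppercase.
Now let's talk about how to implement the Lexer.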
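
Before the real implementation, a tiny standalone sketch may help show what "text in, Tokens out" means. TokenKind, Token, and tokenize are hypothetical names for this illustration only, and the sketch assumes a drastically simplified language: spaces, letters, digits, and one-character symbols, with val as the only keyword.

```kotlin
/** Hypothetical token kinds for this sketch; a real lexer has many more. */
enum class TokenKind { VAL, IDENTIFIER, NUMBER, SYMBOL, WHITESPACE }

/** A token: its kind plus the exact text it covers. */
data class Token(val kind: TokenKind, val text: String)

/** Splits one line into tokens by grouping characters of the same class. */
fun tokenize(line: String): List<Token> {
    val tokens = ArrayList<Token>()
    var i = 0
    while (i < line.length) {
        val start = i
        val c = line[i]
        val kind = when {
            c == ' ' -> {
                while (i < line.length && line[i] == ' ') i++
                TokenKind.WHITESPACE
            }
            c.isLetter() -> {
                while (i < line.length && line[i].isLetterOrDigit()) i++
                // A keyword is just a word whose text appears in the keyword table.
                if (line.substring(start, i) == "val") TokenKind.VAL else TokenKind.IDENTIFIER
            }
            c.isDigit() -> {
                while (i < line.length && line[i].isDigit()) i++
                TokenKind.NUMBER
            }
            else -> {
                i++ // any other character is treated as a one-character symbol
                TokenKind.SYMBOL
            }
        }
        tokens.add(Token(kind, line.substring(start, i)))
    }
    return tokens
}
```

Calling tokenize on the line val x = 1 yields a VAL, an IDENTIFIER, a SYMBOL, and a NUMBER, with WHITESPACE tokens in between. Keeping whitespace as tokens means the token texts concatenate back to the original line, which is convenient when the tokens later drive highlighting.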

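Several ways to write a Lexer

1. Loop-based analysis: assemble Tokens one at a time with a pile of loops.
Pros: fast and flexible
Cons: hard to maintain; when you need to add or change something, it is hard to know where to start
2. Recursive analysis: analyze the content through recursion (later experiments showed this does not hold up, because the deeper the call stack gets, the more likely a StackOverflowError becomes)
Pros: simple to maintain and quick to write
Cons: recursion adds call-stack overhead, which you can reduce with tail recursion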
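
As a rough illustration of the two styles, the sketch below does the same small job both ways: cutting a line into words at spaces. splitLoop and splitRec are hypothetical names, not code from MuCodeEditor. The recursive version relies on Kotlin's tailrec modifier, which asks the compiler to rewrite a self-call in tail position into a loop.

```kotlin
/** Loop style: one while loop walks the line and cuts it at spaces. */
fun splitLoop(line: String): List<String> {
    val out = ArrayList<String>()
    var i = 0
    while (i < line.length) {
        if (line[i] == ' ') {
            i++
            continue
        }
        val start = i
        while (i < line.length && line[i] != ' ') i++
        out.add(line.substring(start, i))
    }
    return out
}

/**
 * Recursive style: handle one word, then call yourself for the rest.
 * Because the self-call is the last operation, tailrec lets the Kotlin
 * compiler turn it into a loop, so long inputs do not deepen the call stack.
 */
tailrec fun splitRec(
    line: String,
    from: Int = 0,
    out: MutableList<String> = ArrayList()
): List<String> {
    var start = from
    while (start < line.length && line[start] == ' ') start++
    if (start >= line.length) return out
    var end = start
    while (end < line.length && line[end] != ' ') end++
    out.add(line.substring(start, end))
    return splitRec(line, end, out)
}
```

Without tailrec, splitRec would use one stack frame per word, which is the kind of depth that leads to the StackOverflowError mentioned above. The modifier only helps when the recursive call really is the last thing the function does.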

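Using BaseLexer to pave the way for subclasses

BaseLexer is the base class for every Lexer. It hosts the "does this character belong to X" checks.
mTokens stores every Token scanned so far
sources stores all the source lines to scan
scannedColumnSource is the line currently being scanned
column is the current line number (1-based; note that the naming is the reverse of the usual row/column convention)
row is the current character offset within that line (0-based)
scannedChar is the character currently being scanned
mKeywordTables is the keyword table
mSymbolTables is the symbol table
analyze is what performs the lexical analysis!
Let's look at the code.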

@Suppress("MemberVisibilityCanBePrivate", "LeakingThis")
abstract class BaseLexer<T : BaseToken> {

protected val mTokens: MutableList<Pair<T, Pair<ColumnRowPosition, ColumnRowPosition>>> =
ArrayList()

protected lateinit var sources: List<CharSequence>
private set

protected lateinit var scannedColumnSource: CharSequence

protected var column = 1

protected var row = 0

protected var scannedChar = '\u0000'

protected lateinit var mSpecialTables: Map<String, T>

protected lateinit var mKeywordTables: Map<String, T>

protected lateinit var mSymbolTables: Map<Char, T>

open fun getSpecialTables(): Map<String, T> {
return mSpecialTables
}

open fun getKeywordTables(): Map<String, T> {
return mKeywordTables
}

open fun analyze() {

}

protected open fun addToken(token: T, startPos: ColumnRowPosition, endPos: ColumnRowPosition) {
mTokens.add(token to (startPos to endPos))
}

protected open fun getChar() {
scannedChar = scannedColumnSource[row]
}

protected open fun yyChar() {
++row
scannedChar = if (isNotRowEOF()) {
scannedColumnSource[row]
} else {
'\u0000'
}
}

open fun columnSize(): Int {
return sources.size
}

open fun rowSize(): Int {
return scannedColumnSource.length
}

open fun setSources(sources: List<CharSequence>) {
clearAll()
this.sources = sources

column = 1
row = 0
}

open fun getTokens(): List<Pair<T, Pair<ColumnRowPosition, ColumnRowPosition>>> {
return mTokens
}

open fun isNotRowEOF(): Boolean {
return !isRowEOF()
}

open fun isRowEOF(): Boolean {
return row >= rowSize()
}

open fun isNearRowEOF(): Boolean {
return row + 1 == rowSize()
}

open fun isWhitespace(): Boolean {
return scannedChar == ' '
}

open fun isWhitespace(target: Char): Boolean {
return target == ' '
}

open fun isLetter(): Boolean {
return scannedChar in 'a'..'z' || scannedChar in 'A'..'Z'
}

open fun isLetter(target: Char): Boolean {
return target in 'a'..'z' || target in 'A'..'Z'
}

open fun isDigit(): Boolean {
return scannedChar in '0'..'9'
}

open fun isDigit(target: Char): Boolean {
return target in '0'..'9'
}

open fun isSymbol(): Boolean {
return mSymbolTables.containsKey(scannedChar)
}

open fun isSymbol(target: Char): Boolean {
return mSymbolTables.containsKey(target)
}

open fun isKeyword(buffer: String): Boolean {
return mKeywordTables.containsKey(buffer)
}

open fun isSpecial(buffer: String): Boolean {
return mSpecialTables.containsKey(buffer)
}

open fun clearAll() {
mTokens.clear()
}

protected abstract fun setup()

init {
setup()
}

}
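
The cursor logic in getChar and yyChar is easy to get wrong at the end of a line, so here is a stripped-down, standalone sketch of the same idea. LineCursor is a hypothetical class for illustration. It assumes, like the code above, that the NUL character '\u0000' can serve as an end-of-line sentinel, so scanning loops stop on their own once the offset runs past the last character.

```kotlin
/** Hypothetical standalone version of the BaseLexer cursor for a single line. */
class LineCursor(private val line: CharSequence) {
    /** Offset of the current character within the line. */
    var offset = 0
        private set

    /** Current character, or the NUL sentinel when the cursor is past the end. */
    var current: Char = if (line.isEmpty()) EOF else line[0]
        private set

    /** True once the cursor has moved past the last character. */
    val isEof: Boolean get() = offset >= line.length

    /** Moves one character forward (like yyChar) and refreshes current. */
    fun advance() {
        if (isEof) return // already past the end: stay put
        offset++
        current = if (isEof) EOF else line[offset]
    }

    /** Consumes characters while predicate holds and returns the consumed text. */
    fun takeWhile(predicate: (Char) -> Boolean): String {
        val start = offset
        while (!isEof && predicate(current)) advance()
        return line.subSequence(start, offset).toString()
    }

    companion object {
        const val EOF = '\u0000'
    }
}
```

One difference from the code above is deliberate: advance refuses to move once the end has been reached, so the offset can never drift beyond the line length.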

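Creating BaseToken

BaseToken is the parent class of all Tokens. Writing it this way makes the generic typing convenient,
and it also paves the way for highlighting later.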

abstract class BaseToken(
    protected val type: CodeEditorColorToken?,
    protected val value: String,
) {

    open fun valueCount(): Int {
        return value.length
    }

    open fun getColorType(): CodeEditorColorToken? {
        return type
    }

    override fun toString(): String {
        return value
    }

}
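
To show how a color type on each Token can pave the way for highlighting, here is a small sketch that turns positioned tokens into colored spans. Everything in it is an assumption made for illustration: ColorKind stands in for CodeEditorColorToken, and PositionedToken, ColorSpan, toSpans, and the theme map are hypothetical. Merging neighbouring spans of the same color is a design choice of this sketch, not something the original code is known to do.

```kotlin
/** Hypothetical color categories, standing in for CodeEditorColorToken. */
enum class ColorKind { KEYWORD, IDENTIFIER, NUMBER, SYMBOL }

/** A token on one line: half-open range [start, end) plus its color category. */
data class PositionedToken(val start: Int, val end: Int, val color: ColorKind?)

/** A run of characters that is drawn with a single ARGB color. */
data class ColorSpan(val start: Int, val end: Int, val argb: Int)

/**
 * Turns tokens into drawable spans using a theme table. Tokens without a
 * color category, or with one missing from the theme, fall back to
 * defaultArgb. Adjacent tokens that resolve to the same color are merged
 * into one span so a painter would need fewer draw calls.
 */
fun toSpans(
    tokens: List<PositionedToken>,
    theme: Map<ColorKind, Int>,
    defaultArgb: Int
): List<ColorSpan> {
    val spans = ArrayList<ColorSpan>()
    for (token in tokens) {
        val argb = token.color?.let { theme[it] } ?: defaultArgb
        val last = spans.lastOrNull()
        if (last != null && last.argb == argb && last.end == token.start) {
            // Same color and directly adjacent: extend the previous span.
            spans[spans.size - 1] = last.copy(end = token.end)
        } else {
            spans.add(ColorSpan(token.start, token.end, argb))
        }
    }
    return spans
}
```

Because the theme is just a map from category to color, swapping themes does not require running the lexer again; only the span conversion is repeated.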

Creating EcmaScriptToken to tokenize ES

Next we need to write the Tokens.
Careful: delimiters and the like all have to go in. Don't leave any out.

@Suppress("MemberVisibilityCanBePrivate")
class EcmaScriptToken private constructor(type: CodeEditorColorToken, value: String) : BaseToken(type, value) {

companion object {

val REGEX = EcmaScriptToken(CodeEditorColorToken.SPECIAL_COLOR, "/regex/mode")
val SINGLE_COMMENT = EcmaScriptToken(CodeEditorColorToken.COMMENT_COLOR, "//")
val MULTI_COMMENT_START = EcmaScriptToken(CodeEditorColorToken.COMMENT_COLOR, "/*")
val MULTI_COMMENT_PART = EcmaScriptToken(CodeEditorColorToken.COMMENT_COLOR, "Comment")
val MULTI_COMMENT_END = EcmaScriptToken(CodeEditorColorToken.COMMENT_COLOR, "*/")

val SINGLE_STRING = EcmaScriptToken(CodeEditorColorToken.STRING_COLOR, "\"String\"")
val TEMPLATE_STRING = EcmaScriptToken(CodeEditorColorToken.STRING_COLOR, "`Template`")

val DIGIT_NUMBER =
EcmaScriptToken(CodeEditorColorToken.NUMERICAL_VALUE_COLOR, "DIGIT_NUMBER")
val IDENTIFIER = EcmaScriptToken(CodeEditorColorToken.IDENTIFIER_COLOR, "IDENTIFIER")

val WHITESPACE = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "WHITESPACE")
val NEW_LINE = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "NEW_LINE")

val FALSE = EcmaScriptToken(CodeEditorColorToken.SPECIAL_COLOR, "FALSE")
val TRUE = EcmaScriptToken(CodeEditorColorToken.SPECIAL_COLOR, "TRUE")
val NAN = EcmaScriptToken(CodeEditorColorToken.SPECIAL_COLOR, "NAN")
val UNDEFINED = EcmaScriptToken(CodeEditorColorToken.SPECIAL_COLOR, "UNDEFINED")
val NULL = EcmaScriptToken(CodeEditorColorToken.SPECIAL_COLOR, "NULL")

val VAR = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "VAR")
val LET = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "LET")
val CONST = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "CONST")

val IF = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "IF")
val ELSE = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "ELSE")
val SWITCH = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "SWITCH")
val CASE = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "CASE")
val DEFAULT = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "DEFAULT")

val FOR = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "FOR")
val WHILE = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "WHILE")
val DO = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "DO")
val BREAK = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "BREAK")
val CONTINUE = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "CONTINUE")

val FUNCTION = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "FUNCTION")
val RETURN = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "RETURN")
val YIELD = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "YIELD")
val ASYNC = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "ASYNC")
val AWAIT = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "AWAIT")

val THROW = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "THROW")
val TRY = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "TRY")
val CATCH = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "CATCH")
val FINALLY = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "FINALLY")

val THIS = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "THIS")
val WITH = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "WITH")
val IN = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "IN")
val OF = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "OF")
val DELETE = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "DELETE")
val INSTANCEOF = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "INSTANCEOF")
val TYPEOF = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "TYPEOF")

val NEW = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "NEW")
val CLASS = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "CLASS")
val EXTEND = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "EXTEND")
val SET = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "SET")
val GET = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "GET")

val IMPORT = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "IMPORT")
val AS = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "AS")
val FROM = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "FROM")
val EXPORT = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "EXPORT")

val VOID = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "VOID")
val DEBUGGER = EcmaScriptToken(CodeEditorColorToken.KEYWORD_COLOR, "DEBUGGER")

val PLUS = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "+")
val MINUS = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "-")
val MULTI = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "*")
val DIV = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "/")
val NOT = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "!")
val MOD = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "%")
val XOR = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "^")
val AND = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "&")
val QUESTION = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "?")
val COMP = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "~")
val DOT = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, ".")
val COMMA = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, ",")
val SEMICOLON = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, ";")
val EQUALS = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "=")
val LEFT_PARENTHESIS = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "(")
val RIGHT_PARENTHESIS = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, ")")
val LEFT_BRACKET = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "[")
val RIGHT_BRACKET = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "]")
val LEFT_BRACE = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "{")
val RIGHT_BRACE = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "}")
val OR = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "|")
val LESS_THAN = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, "<")
val MORE_THAN = EcmaScriptToken(CodeEditorColorToken.SYMBOL_COLOR, ">")

fun getTokens(): List<EcmaScriptToken> {
return listOf(
DIGIT_NUMBER,
IDENTIFIER,
WHITESPACE,
VAR,
LET,
CONST,
IF,
ELSE,
SWITCH,
CASE,
DEFAULT,
FOR,
WHILE,
DO,
BREAK,
CONTINUE,
FUNCTION,
RETURN,
YIELD,
ASYNC,
AWAIT,
THROW,
TRY,
CATCH,
FINALLY,
THIS,
WITH,
IN,
OF,
DELETE,
INSTANCEOF,
TYPEOF,
NEW,
CLASS,
EXTEND,
SET,
GET,
IMPORT,
AS,
FROM,
EXPORT,
VOID,
DEBUGGER,

PLUS,
MINUS,
MULTI,
DIV,
NOT,
MOD,
XOR,
AND,
QUESTION,
COMP,
DOT,
COMMA,
SEMICOLON,
EQUALS,
LEFT_PARENTHESIS,
RIGHT_PARENTHESIS,
LEFT_BRACKET,
RIGHT_BRACKET,
LEFT_BRACE,
RIGHT_BRACE,
)
}
}

}

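Using EcmaScriptLexer for ES lexical analysis

This follows the structure of the recursive approach:
if the current lexeme matches, it is added as a Token and then the scan runs again. In the version below, the rescan is driven by continue inside a while (true) loop rather than by an actual recursive call.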

open class EcmaScriptLexer : BaseLexer<EcmaScriptToken>() {

@Synchronized
override fun analyze() {
if (column > columnSize()) {
return
}

while (true) {

if (column > columnSize()) {
return
}

scannedColumnSource = sources[column - 1]

if (row >= rowSize()) {
++column
row = 0
continue
}

getChar()

if (handleWhitespace()) continue

if (handleComments()) continue

if (handleString()) continue

if (handleRegex()) continue

if (handleSymbol()) continue

if (handleSpecial()) continue

if (handleKeyword()) continue

if (handleIdentifier()) continue

if (handleDigit()) continue

++row

}

}

protected open fun handleWhitespace(): Boolean {
if (!isWhitespace()) {
return false
}

val start = row
while (isWhitespace() && isNotRowEOF()) {
yyChar()
}
val end = row
addToken(
EcmaScriptToken.WHITESPACE,
ColumnRowPosition(column, start),
ColumnRowPosition(column, end)
)
return true
}

protected open fun handleComments(): Boolean {
if (scannedChar != '/') {
return false
}

val start = row
yyChar()
if (scannedChar == '/') {
val end = rowSize()
addToken(
EcmaScriptToken.SINGLE_COMMENT,
ColumnRowPosition(column, start),
ColumnRowPosition(column, end)
)
row = end
return true
}

if (scannedChar == '*') {
val currentFindPos = scannedColumnSource.indexOf("*/", row + 1)
if (currentFindPos != -1) {
val end = currentFindPos + 2
addToken(
EcmaScriptToken.SINGLE_COMMENT,
ColumnRowPosition(column, start),
ColumnRowPosition(column, end)
)
row = end
return true
}

addToken(
EcmaScriptToken.MULTI_COMMENT_START,
ColumnRowPosition(column, start),
ColumnRowPosition(column, rowSize())
)

++column
while (column <= columnSize()) {
row = 0
scannedColumnSource = sources[column - 1]
if (scannedColumnSource.isEmpty()) {
++column
continue
}

val findPos = scannedColumnSource.indexOf("*/")
if (findPos != -1) {
val end = findPos + 2
addToken(
EcmaScriptToken.MULTI_COMMENT_END,
ColumnRowPosition(column, 0),
ColumnRowPosition(column, end)
)
row = end
return true
}

addToken(
EcmaScriptToken.MULTI_COMMENT_PART,
ColumnRowPosition(column, 0),
ColumnRowPosition(column, rowSize())
)
++column
}
return true
}

row = start
getChar()
return false
}

protected open fun handleSymbol(): Boolean {
if (!isSymbol()) {
return false
}

val token = mSymbolTables[scannedChar]!!
addToken(
token,
ColumnRowPosition(column, row),
ColumnRowPosition(column, row + 1)
)

++row
return true
}

protected open fun handleSpecial(): Boolean {
if (!isLetter()) {
return false
}

val start = row
val buffer = StringBuilder()
while (isLetter() && isNotRowEOF()) {
buffer.append(scannedChar)
yyChar()
}
val end = row
val text = buffer.toString()
if (isSpecial(text)) {
val token = mSpecialTables[text]!!
addToken(
token,
ColumnRowPosition(column, start),
ColumnRowPosition(column, end)
)
return true
}

row = start
getChar()
return false
}

protected open fun handleKeyword(): Boolean {
if (!isLetter()) {
return false
}

val buffer = StringBuilder()
val start = row
while (isLetter() && isNotRowEOF()) {
buffer.append(scannedChar)
yyChar()
}
val end = row
val text = buffer.toString()

if (isKeyword(text)) {
val token = mKeywordTables[text]!!
addToken(
token,
ColumnRowPosition(column, start),
ColumnRowPosition(column, end)
)
return true
}

row = start
getChar()
return false
}

protected open fun handleString(): Boolean {
if (scannedChar != '\'' && scannedChar != '"' && scannedChar != '`') {
return false
}

val start = row
if (scannedChar == '"') {
yyChar()

while (isNotRowEOF()) {
if (scannedChar == '"') {
yyChar()
break
}
yyChar()
}

val end = row
addToken(
EcmaScriptToken.SINGLE_STRING,
ColumnRowPosition(column, start),
ColumnRowPosition(column, end)
)
return true
}

if (scannedChar == '\'') {
yyChar()

while (isNotRowEOF()) {
if (scannedChar == '\'') {
yyChar()
break
}
yyChar()
}

val end = row
addToken(
EcmaScriptToken.SINGLE_STRING,
ColumnRowPosition(column, start),
ColumnRowPosition(column, end)
)
return true
}

if (scannedChar == '`') {
val currentFindPos = scannedColumnSource.indexOf('`', row + 1)

if (currentFindPos != -1) {
val end = currentFindPos + 1
addToken(
EcmaScriptToken.TEMPLATE_STRING,
ColumnRowPosition(column, start),
ColumnRowPosition(column, end)
)
row = end
return true
}

addToken(
EcmaScriptToken.TEMPLATE_STRING,
ColumnRowPosition(column, start),
ColumnRowPosition(column, rowSize())
)

++column
while (column <= columnSize()) {
row = 0
scannedColumnSource = sources[column - 1]
if (scannedColumnSource.isEmpty()) {
++column
continue
}

val findPos = scannedColumnSource.indexOf('`')
if (findPos != -1) {
val end = findPos + 1
addToken(
EcmaScriptToken.TEMPLATE_STRING,
ColumnRowPosition(column, 0),
ColumnRowPosition(column, end)
)
row = end
return true
}

addToken(
EcmaScriptToken.TEMPLATE_STRING,
ColumnRowPosition(column, 0),
ColumnRowPosition(column, rowSize())
)
++column
}

return true
}

row = start
getChar()
return false
}

protected open fun handleRegex(): Boolean {
if (scannedChar != '/') {
return false
}

val start = row
val currentFindPos = scannedColumnSource.indexOf('/', row + 1)
if (currentFindPos != -1) {
var end = currentFindPos
row = end
yyChar()
if (isLetter()) {
while (isLetter() && isNotRowEOF()) {
yyChar()
}
end = row
addToken(
EcmaScriptToken.REGEX,
ColumnRowPosition(column, start),
ColumnRowPosition(column, end)
)
return true
}
++end
addToken(
EcmaScriptToken.REGEX,
ColumnRowPosition(column, start),
ColumnRowPosition(column, end)
)
row = end
return true
}

row = start
getChar()
return false
}

protected open fun handleIdentifier(): Boolean {
if (isWhitespace() || isSymbol() || isDigit()) {
return false
}

val start = row
while (!isWhitespace() && !isSymbol() && isNotRowEOF()) {
yyChar()
}
val end = row

addToken(
EcmaScriptToken.IDENTIFIER,
ColumnRowPosition(column, start),
ColumnRowPosition(column, end)
)
return true
}

protected open fun handleDigit(): Boolean {
if (!isDigit()) {
return false
}

val start = row
if (scannedChar == '0') {
yyChar()
if (scannedChar == 'x') {
while ((isDigit() || isLetter()) && isNotRowEOF()) {
yyChar()
}
val end = row
addToken(
EcmaScriptToken.DIGIT_NUMBER,
ColumnRowPosition(column, start),
ColumnRowPosition(column, end)
)
return true
}
row = start
getChar()
}

while (isDigit() && isNotRowEOF()) {
yyChar()
}

val end = row

addToken(
EcmaScriptToken.DIGIT_NUMBER,
ColumnRowPosition(column, start),
ColumnRowPosition(column, end)
)
return true
}

override fun setup() {
mKeywordTables = createKeywordTable()
mSymbolTables = createSymbolTable()
mSpecialTables = createSpecialTable()
}

    private fun createKeywordTable(): Map<String, EcmaScriptToken> {
        return hashMapOf(
            "var" to EcmaScriptToken.VAR,
            "let" to EcmaScriptToken.LET,
            "const" to EcmaScriptToken.CONST,
            "if" to EcmaScriptToken.IF,
            "else" to EcmaScriptToken.ELSE,
            "switch" to EcmaScriptToken.SWITCH,
            "case" to EcmaScriptToken.CASE,
            "default" to EcmaScriptToken.DEFAULT,
            "for" to EcmaScriptToken.FOR,
            "while" to EcmaScriptToken.WHILE,
            "do" to EcmaScriptToken.DO,
            "break" to EcmaScriptToken.BREAK,
            "continue" to EcmaScriptToken.CONTINUE,
            "function" to EcmaScriptToken.FUNCTION,
            "return" to EcmaScriptToken.RETURN,
            "yield" to EcmaScriptToken.YIELD,
            "async" to EcmaScriptToken.ASYNC,
            "await" to EcmaScriptToken.AWAIT,
            "throw" to EcmaScriptToken.THROW,
            "try" to EcmaScriptToken.TRY,
            "catch" to EcmaScriptToken.CATCH,
            "finally" to EcmaScriptToken.FINALLY,
            "this" to EcmaScriptToken.THIS,
            "with" to EcmaScriptToken.WITH,
            "in" to EcmaScriptToken.IN,
            "of" to EcmaScriptToken.OF,
            "delete" to EcmaScriptToken.DELETE,
            "instanceof" to EcmaScriptToken.INSTANCEOF,
            "typeof" to EcmaScriptToken.TYPEOF,
            "new" to EcmaScriptToken.NEW,
            "class" to EcmaScriptToken.CLASS,
            // The ECMAScript keyword is "extends"; the token constant keeps its original name.
            "extends" to EcmaScriptToken.EXTEND,
            "set" to EcmaScriptToken.SET,
            "get" to EcmaScriptToken.GET,
            "import" to EcmaScriptToken.IMPORT,
            "as" to EcmaScriptToken.AS,
            "from" to EcmaScriptToken.FROM,
            "export" to EcmaScriptToken.EXPORT,
            "void" to EcmaScriptToken.VOID,
            "debugger" to EcmaScriptToken.DEBUGGER
        )
    }

private fun createSymbolTable(): Map<Char, EcmaScriptToken> {
return hashMapOf(
'+' to EcmaScriptToken.PLUS,
'-' to EcmaScriptToken.MINUS,
'*' to EcmaScriptToken.MULTI,
'/' to EcmaScriptToken.DIV,
'!' to EcmaScriptToken.NOT,
'%' to EcmaScriptToken.MOD,
'^' to EcmaScriptToken.XOR,
'&' to EcmaScriptToken.AND,
'?' to EcmaScriptToken.QUESTION,
'~' to EcmaScriptToken.COMP,
'.' to EcmaScriptToken.DOT,
',' to EcmaScriptToken.COMMA,
';' to EcmaScriptToken.SEMICOLON,
'=' to EcmaScriptToken.EQUALS,
'(' to EcmaScriptToken.LEFT_PARENTHESIS,
')' to EcmaScriptToken.RIGHT_PARENTHESIS,
'[' to EcmaScriptToken.LEFT_BRACKET,
']' to EcmaScriptToken.RIGHT_BRACKET,
'{' to EcmaScriptToken.LEFT_BRACE,
'}' to EcmaScriptToken.RIGHT_BRACE,
'|' to EcmaScriptToken.OR,
'<' to EcmaScriptToken.LESS_THAN,
'>' to EcmaScriptToken.MORE_THAN
)
}

private fun createSpecialTable(): Map<String, EcmaScriptToken> {
return hashMapOf(
"false" to EcmaScriptToken.FALSE,
"true" to EcmaScriptToken.TRUE,
"NaN" to EcmaScriptToken.NAN,
"undefined" to EcmaScriptToken.UNDEFINED,
"null" to EcmaScriptToken.NULL
)
}

}
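
The trickiest part of the lexer above is the block comment, because the comment can start on one line and end several lines later. The sketch below isolates just that idea as a standalone function. CommentSpan and findBlockComments are hypothetical names, and the sketch makes a simplifying assumption the real lexer does not: it ignores string literals and line comments, so a comment opener inside a string would be misread.

```kotlin
/** One highlighted piece of a block comment: line index plus [start, end) offsets. */
data class CommentSpan(val line: Int, val start: Int, val end: Int)

/**
 * Finds block comments in lines, emitting one span per line that a comment touches.
 * Simplification: string literals and line comments are not considered.
 */
fun findBlockComments(lines: List<String>): List<CommentSpan> {
    val spans = ArrayList<CommentSpan>()
    var inComment = false // state carried from one line to the next
    for ((lineIndex, text) in lines.withIndex()) {
        var pos = 0
        while (pos < text.length) {
            if (inComment) {
                // Continue a comment opened on an earlier line.
                val close = text.indexOf("*/", pos)
                val end = if (close == -1) text.length else close + 2
                spans.add(CommentSpan(lineIndex, pos, end))
                inComment = close == -1
                pos = end
            } else {
                val open = text.indexOf("/*", pos)
                if (open == -1) break
                // Search for the terminator after the opener so "/*/" is not closed early.
                val close = text.indexOf("*/", open + 2)
                val end = if (close == -1) text.length else close + 2
                spans.add(CommentSpan(lineIndex, open, end))
                inComment = close == -1
                pos = end
            }
        }
    }
    return spans
}
```

Empty lines inside a comment produce no span, which matches how the handler above skips empty lines. The single inComment flag is the whole cross-line state; an incremental highlighter could store that flag per line and resume scanning from the first edited line instead of from the top.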