<a name="dli_08_0110"></a><a name="dli_08_0110"></a>
<h1 class="topictitle1">Anomaly Detection</h1>
<div id="body1574406512582"><p id="dli_08_0110__p3243542173612">Anomaly detection applies to a variety of scenarios, including intrusion detection, financial fraud detection, sensor data monitoring, medical diagnosis, and natural data detection. Typical anomaly detection algorithms include statistical modeling, distance-based methods, linear models, and nonlinear models.</p>
<p id="dli_08_0110__p187193817372">DLI uses an anomaly detection method based on random forests. The method has the following characteristics:</p>
<ul id="dli_08_0110__ul871103818372"><li id="dli_08_0110__li47153813718">It is a one-pass algorithm with O(1) amortized time complexity and O(1) space complexity.</li><li id="dli_08_0110__li3711138193715">The random forest structure is built only once. Model updates change only the data distribution values stored in the nodes.</li><li id="dli_08_0110__li971143833711">Each node stores data distribution information for multiple windows, which allows the algorithm to detect changes in the data distribution.</li><li id="dli_08_0110__li1471538153711">Anomaly detection and model updates are performed within the same framework.</li></ul>
<div class="section" id="dli_08_0110__section8768110989"><h4 class="sectiontitle">Syntax</h4><div class="codecoloring" codetype="Sql" id="dli_08_0110__screen148620188157"><div class="highlight"><pre><span></span><span class="n">SRF_UNSUP</span><span class="p">(</span><span class="nb">ARRAY</span><span class="p">[</span><span class="n">field1</span><span class="p">,</span><span class="w"> </span><span class="n">field2</span><span class="p">,</span><span class="w"> </span><span class="p">...],</span><span class="w"> </span><span class="s1">'optional parameter list'</span><span class="p">)</span>
</pre></div>
</div>
<div class="note" id="dli_08_0110__note696020431233"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="dli_08_0110__ul99601519161917"><li id="dli_08_0110__li11960151971915">The function returns an anomaly score as a DOUBLE value in the range [0, 1].</li><li id="dli_08_0110__li196061921916">All fields in the array must be of the same type. If their types differ, use the CAST function to convert them, for example, ARRAY[a, CAST(b AS DOUBLE)].</li><li id="dli_08_0110__li9960111915192">The optional parameter list uses the format "key1=value1,key2=value2,...".</li></ul>
</div></div>
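<p>The following query is a minimal sketch of the rules in the note above. It assumes a hypothetical stream <strong>MyTable</strong> with a DOUBLE field <strong>a</strong>, an INT field <strong>b</strong>, and a processing-time attribute <strong>proctime</strong>; the parameter values are illustrative only. Field <strong>b</strong> is cast to DOUBLE so that both array elements have the same type, and the optional parameters are passed together as one string literal.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Hypothetical schema (assumption): a DOUBLE, b INT, proctime = processing-time attribute.
SELECT a, b,
  -- Cast b so that every element of the array is a DOUBLE value.
  SRF_UNSUP(ARRAY[a, CAST(b AS DOUBLE)], 'numTrees=20,seed=1234')
    OVER (ORDER BY proctime RANGE BETWEEN INTERVAL '60' SECOND PRECEDING AND CURRENT ROW) AS anomaly_score
FROM MyTable
</pre></div></div>
<p>Parameters that are not listed in the string keep the default values given in Table 1.</p>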
</div>
<div class="section" id="dli_08_0110__section1032282412160"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_0110__table2060911914818" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameters</caption><thead align="left"><tr id="dli_08_0110__row1960910913480"><th align="left" class="cellrowborder" valign="top" width="25.22%" id="mcps1.3.5.2.2.5.1.1"><p id="dli_08_0110__p176092984816">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="11.21%" id="mcps1.3.5.2.2.5.1.2"><p id="dli_08_0110__p03846300421">Mandatory</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="53.93%" id="mcps1.3.5.2.2.5.1.3"><p id="dli_08_0110__p136107924817">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="9.64%" id="mcps1.3.5.2.2.5.1.4"><p id="dli_08_0110__p6233369126">Default Value</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_0110__row2061019914488"><td class="cellrowborder" valign="top" width="25.22%" headers="mcps1.3.5.2.2.5.1.1 "><p id="dli_08_0110__p17610179154820">transientThreshold</p>
</td>
<td class="cellrowborder" valign="top" width="11.21%" headers="mcps1.3.5.2.2.5.1.2 "><p id="dli_08_0110__p6385193044210">No</p>
</td>
<td class="cellrowborder" valign="top" width="53.93%" headers="mcps1.3.5.2.2.5.1.3 "><p id="dli_08_0110__p1743794112133">Threshold at which a change in the histogram is considered to indicate a change in the data distribution.</p>
</td>
<td class="cellrowborder" valign="top" width="9.64%" headers="mcps1.3.5.2.2.5.1.4 "><p id="dli_08_0110__p2231236161216">5</p>
</td>
</tr>
<tr id="dli_08_0110__row166109917484"><td class="cellrowborder" valign="top" width="25.22%" headers="mcps1.3.5.2.2.5.1.1 "><p id="dli_08_0110__p5628149111218">numTrees</p>
</td>
<td class="cellrowborder" valign="top" width="11.21%" headers="mcps1.3.5.2.2.5.1.2 "><p id="dli_08_0110__p126167301495">No</p>
</td>
<td class="cellrowborder" valign="top" width="53.93%" headers="mcps1.3.5.2.2.5.1.3 "><p id="dli_08_0110__p2614163012492">Number of trees composing the random forest.</p>
</td>
<td class="cellrowborder" valign="top" width="9.64%" headers="mcps1.3.5.2.2.5.1.4 "><p id="dli_08_0110__p1623193613129">15</p>
</td>
</tr>
<tr id="dli_08_0110__row15475164618501"><td class="cellrowborder" valign="top" width="25.22%" headers="mcps1.3.5.2.2.5.1.1 "><p id="dli_08_0110__p1059320592129">maxLeafCount</p>
</td>
<td class="cellrowborder" valign="top" width="11.21%" headers="mcps1.3.5.2.2.5.1.2 "><p id="dli_08_0110__p1647674655015">No</p>
</td>
<td class="cellrowborder" valign="top" width="53.93%" headers="mcps1.3.5.2.2.5.1.3 "><p id="dli_08_0110__p12476746115012">Maximum number of leaf nodes one tree can have.</p>
</td>
<td class="cellrowborder" valign="top" width="9.64%" headers="mcps1.3.5.2.2.5.1.4 "><p id="dli_08_0110__p18234363121">15</p>
</td>
</tr>
<tr id="dli_08_0110__row88971315155116"><td class="cellrowborder" valign="top" width="25.22%" headers="mcps1.3.5.2.2.5.1.1 "><p id="dli_08_0110__p119183420159">maxTreeHeight</p>
</td>
<td class="cellrowborder" valign="top" width="11.21%" headers="mcps1.3.5.2.2.5.1.2 "><p id="dli_08_0110__p189812159519">No</p>
</td>
<td class="cellrowborder" valign="top" width="53.93%" headers="mcps1.3.5.2.2.5.1.3 "><p id="dli_08_0110__p1970415411524">Maximum height of each tree.</p>
</td>
<td class="cellrowborder" valign="top" width="9.64%" headers="mcps1.3.5.2.2.5.1.4 "><p id="dli_08_0110__p152323611129">12</p>
</td>
</tr>
<tr id="dli_08_0110__row37911714195215"><td class="cellrowborder" valign="top" width="25.22%" headers="mcps1.3.5.2.2.5.1.1 "><p id="dli_08_0110__p57845810563">seed</p>
</td>
<td class="cellrowborder" valign="top" width="11.21%" headers="mcps1.3.5.2.2.5.1.2 "><p id="dli_08_0110__p8791914115216">No</p>
</td>
<td class="cellrowborder" valign="top" width="53.93%" headers="mcps1.3.5.2.2.5.1.3 "><p id="dli_08_0110__p7126114635812">Random seed value used by the algorithm.</p>
</td>
<td class="cellrowborder" valign="top" width="9.64%" headers="mcps1.3.5.2.2.5.1.4 "><p id="dli_08_0110__p82353616122">4010</p>
</td>
</tr>
<tr id="dli_08_0110__row146111810573"><td class="cellrowborder" valign="top" width="25.22%" headers="mcps1.3.5.2.2.5.1.1 "><p id="dli_08_0110__p6615183570">numClusters</p>
</td>
<td class="cellrowborder" valign="top" width="11.21%" headers="mcps1.3.5.2.2.5.1.2 "><p id="dli_08_0110__p17716188575">No</p>
</td>
<td class="cellrowborder" valign="top" width="53.93%" headers="mcps1.3.5.2.2.5.1.3 "><p id="dli_08_0110__p11937672190">Number of data categories to be distinguished. The default value 2 corresponds to two categories: anomalous data and normal data.</p>
</td>
<td class="cellrowborder" valign="top" width="9.64%" headers="mcps1.3.5.2.2.5.1.4 "><p id="dli_08_0110__p523036191214">2</p>
</td>
</tr>
<tr id="dli_08_0110__row87951424133"><td class="cellrowborder" valign="top" width="25.22%" headers="mcps1.3.5.2.2.5.1.1 "><p id="dli_08_0110__p77951241833">dataViewMode</p>
</td>
<td class="cellrowborder" valign="top" width="11.21%" headers="mcps1.3.5.2.2.5.1.2 "><p id="dli_08_0110__p079518243315">No</p>
</td>
<td class="cellrowborder" valign="top" width="53.93%" headers="mcps1.3.5.2.2.5.1.3 "><p id="dli_08_0110__p11322133311193">Algorithm learning mode.</p>
<ul id="dli_08_0110__ul0800155105912"><li id="dli_08_0110__li2800855195910">Value <strong id="dli_08_0110__b11829151775817">history</strong> indicates that all historical data is considered.</li><li id="dli_08_0110__li6800145565911">Value <strong id="dli_08_0110__b895672419598">horizon</strong> indicates that only data from a recent time period (typically 4 windows long) is considered.</li></ul>
</td>
<td class="cellrowborder" valign="top" width="9.64%" headers="mcps1.3.5.2.2.5.1.4 "><p id="dli_08_0110__p15233367123">history</p>
</td>
</tr>
</tbody>
</table>
</div>
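<p>To illustrate how several parameters from Table 1 can be combined in one optional parameter list, the following sketch switches the learning mode to <strong>horizon</strong> and enlarges the forest. The values are examples only, not tuning recommendations. The query assumes the same <strong>MyTable</strong> stream, field <strong>c</strong>, and processing-time attribute <strong>proctime</strong> that the Example section uses.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Illustrative values only (assumption): learn from recent windows and use a larger forest.
-- Pairs are written as key=value and separated by commas without spaces.
SELECT c,
  SRF_UNSUP(ARRAY[c], 'dataViewMode=horizon,numTrees=30,maxLeafCount=20,seed=4010')
    OVER (ORDER BY proctime RANGE BETWEEN INTERVAL '99' SECOND PRECEDING AND CURRENT ROW) AS anomaly_score
FROM MyTable
</pre></div></div>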
</div>
<div class="section" id="dli_08_0110__section10313104614100"><h4 class="sectiontitle">Example</h4><p id="dli_08_0110__p15649165174610">The following example performs anomaly detection on the <strong id="dli_08_0110__b12331333175010">c</strong> field of the <strong id="dli_08_0110__b17541154345016">MyTable</strong> data stream. The anomaly score is calculated over an OVER window that covers the rows from the preceding 99 seconds of processing time up to the current row. If the score is greater than 0.8, the record is labeled "anomaly"; otherwise, it is labeled "not anomaly".</p>
<div class="codecoloring" codetype="Sql" id="dli_08_0110__screen573418131113"><div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="k">c</span><span class="p">,</span>
<span class="w"> </span><span class="k">CASE</span><span class="w"> </span><span class="k">WHEN</span><span class="w"> </span><span class="n">SRF_UNSUP</span><span class="p">(</span><span class="nb">ARRAY</span><span class="p">[</span><span class="k">c</span><span class="p">],</span><span class="w"> </span><span class="s1">'numTrees=15,seed=4010'</span><span class="p">)</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="p">(</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">proctime</span><span class="w"> </span><span class="n">RANGE</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'99'</span><span class="w"> </span><span class="k">SECOND</span><span class="w"> </span><span class="n">PRECEDING</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">CURRENT</span><span class="w"> </span><span class="k">ROW</span><span class="p">)</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">8</span>
<span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="s1">'anomaly'</span>
<span class="w"> </span><span class="k">ELSE</span><span class="w"> </span><span class="s1">'not anomaly'</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">detection_result</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">MyTable</span>
</pre></div>
</div>
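<p>When choosing a threshold such as 0.8, it can help to inspect the raw scores. The following variant is a sketch that computes the score once in a subquery and returns it alongside the label. It assumes that your DLI Flink SQL version supports OVER aggregations inside a subquery; the names <strong>t</strong>, <strong>score</strong>, and <strong>detection_result</strong> are illustrative.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Assumption: OVER aggregations are allowed inside a subquery in your DLI Flink SQL version.
SELECT c, score,
  CASE WHEN score &gt; 0.8 THEN 'anomaly' ELSE 'not anomaly' END AS detection_result
FROM (
  -- Compute the anomaly score once per record over the preceding 99 seconds.
  SELECT c,
    SRF_UNSUP(ARRAY[c], 'numTrees=15,seed=4010')
      OVER (ORDER BY proctime RANGE BETWEEN INTERVAL '99' SECOND PRECEDING AND CURRENT ROW) AS score
  FROM MyTable
) AS t
</pre></div></div>
<p>Because the score appears as its own column, you can adjust the threshold in the outer CASE expression without repeating the window definition.</p>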
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_08_0109.html">StreamingML</a></div>
</div>
</div>